This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.
Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.
As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).
Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is sill
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.
Closes#2527
This PR adds a set of tests to the Consul test suite for testing
Nomad OSS's behavior around setting Consul Namespace on groups,
which is to ignore the setting (as Consul Namespaces are currently
an Enterprise feature).
Tests are generally a reduced facsimile of existing tests, modified
to check behavior of when group.consul.namespace is set and not set.
Verification is oriented around what happens in Consul; the in-depth
functional correctness of these features is left to the original tests.
Nomad ENT will get its own version of these tests in `namespaces_ent.go`.
Prefer testutil.WaitForResultRetries that emits more descriptive errors on
failures. `require.Evatually` fails with opaque "Condition never satisfied"
error message.
We directly parse job files in e2eutil, but currently using jobspec
package. Instead, use the Parse method from the jobspec2 package so
we can parse job files with new features.
Adds 2 tests around Connect Native. Both make use of the example connect native
services in https://github.com/hashicorp/nomad-connect-examples
One of them runs without Consul ACLs enabled, the other with.
This changeset:
* adds eval status to the error messages emitted when we have
placement failure in tests. The implementation here isn't quite
perfect but it's a lot better than "condition not met".
* enforces the ordering of teardown of the CSI test
* doesn't pass the purge flag to one of the two CSI tests, so that we
exercise both code paths.
- In script checks, ensure we're running `Exec` against the new running
allocation and not the earlier stopped one.
- In script checks, allow `Exec` calls to error due to lack of pty when
we use the exec to kill the task.
- In `utils.go/RegisterAllocs`, force query for allocations to wait on
wait index returned by registration call.
The e2e test code is absolutely hideous and leaks processes and files
on disk. NomadAgent seems useful, but the clientstate e2e tests are very
messy and slow. The last test "Corrupt" is probably the most useful as
it explicitly corrupts the state file whereas the other tests attempt to
reproduce steps thought to cause corruption in earlier releases of
Nomad.