Tools like `nomad-nodesim` are unable to implement a minimal implementation of
an allocrunner so that we can test the client communication without having to
lug around the entire allocrunner/taskrunner code base. The allocrunner was
implemented with an interface specifically for this purpose, but there were
circular imports that made it challenging to use in practice.
Move the AllocRunner interface into an inner package and provide a factory
function type. Provide a minimal test that exercises the new function so that
consumers have some idea of what the minimum implementation required is.
* api: enable support for setting original source alongside job
This PR adds support for setting job source material along with
the registration of a job.
This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.
The job source (if submitted, and doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.
The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.
* api: avoid writing var content to disk for parsing
* api: move submission validation into RPC layer
* api: return an error if updating a job submission without namespace or job id
* api: be exact about the job index we associate a submission with (modify)
* api: reword api docs scheduling
* api: prune all but the last 6 job submissions
* api: protect against nil job submission in job validation
* api: set max job source size in test server
* api: fixups from pr
* use msgtype in upsert node
adds message type to signature for upsert node, update tests, remove placeholder method
* UpsertAllocs msg type test setup
* use upsertallocs with msg type in signature
update test usage of delete node
delete placeholder msgtype method
* add msgtype to upsert evals signature, update test call sites with test setup msg type
handle snapshot upsert eval outside of FSM and ignore eval event
remove placeholder upsertevalsmsgtype
handle job plan rpc and prevent event creation for plan
msgtype cleanup upsertnodeevents
updatenodedrain msgtype
msg type 0 is a node registration event, so set the default to the ignore type
* fix named import
* fix signature ordering on upsertnode to match
The test inserts an alloc in the server state, but expect the client to
start the alloc runner for it almost immediately.
Here, we add a retry loop to check that the client start all expected
alloc runners eventually.
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.
Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.
This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
The previous integration test was broken during the client refactor, and
it seems to be some sort of race with state updating.
I'm going to try and construct a replacement test as part of work on
performance, but for now, the underlying behaviour is still being
tested.
The default job here contains some exec task config (for setting
command and args) that aren't used for mock driver. Now, the alloc
runner seems stricter about validating fields and errors on unexpected
fields.
Updating configs in tests so we can have an explicit task config
whenever driver is set explicitly.
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.
Also stops logging "marked for GC" twice.
* alloc_runner
* Random tests
* parallel task_runner and no exec compatible check
* Parallel client
* Fail fast and use random ports
* Fix docker port mapping
* Make concurrent pull less timing dependant
* up parallel
* Fixes
* don't build chroots in parallel on travis
* Reduce parallelism on travis with lxc/rkt
* make java test app not run forever
* drop parallelism a little
* use docker ports that are out of the os's ephemeral port range
* Limit even more on travis
* rkt deadline