* api: enable support for setting original source alongside job
This PR adds support for setting job source material along with
the registration of a job.
This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.
The job source (if submitted, and doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.
The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.
* api: avoid writing var content to disk for parsing
* api: move submission validation into RPC layer
* api: return an error if updating a job submission without namespace or job id
* api: be exact about the job index we associate a submission with (modify)
* api: reword api docs scheduling
* api: prune all but the last 6 job submissions
* api: protect against nil job submission in job validation
* api: set max job source size in test server
* api: fixups from pr
This changeset refactors the tests of the draining node watcher so that we don't
mock the node watcher's `Remove` and `Update` methods for its own tests. Instead
we'll mock the node watcher's dependencies (the job watcher and deadline
notifier) and now unit tests can cover the real code. This allows us to remove a
bunch of TODOs in `watch_nodes.go` around testing and clarify some important
behaviors:
* Nodes that are down or disconnected will still be watched until the scheduler
decides what to do with their allocations. This will drive the job watcher but
not the node watcher, and that lets the node watcher gracefully handle cases
where a heartbeat fails but the node heartbeats again before its allocs can be
evicted.
* Stop watching nodes that have been deleted. The blocking query for nodes set
the maximum index to the highest index of a node it found, rather than the
index of the nodes table. This misses updates to the index from deleting
nodes. This was done as an performance optimization to avoid excessive
unblocking, but because the query is over all nodes anyways there's no
optimization to be had here. Remove the optimization so we can detect deleted
nodes without having to wait for an update to an unrelated node.
* use msgtype in upsert node
adds message type to signature for upsert node, update tests, remove placeholder method
* UpsertAllocs msg type test setup
* use upsertallocs with msg type in signature
update test usage of delete node
delete placeholder msgtype method
* add msgtype to upsert evals signature, update test call sites with test setup msg type
handle snapshot upsert eval outside of FSM and ignore eval event
remove placeholder upsertevalsmsgtype
handle job plan rpc and prevent event creation for plan
msgtype cleanup upsertnodeevents
updatenodedrain msgtype
msg type 0 is a node registration event, so set the default to the ignore type
* fix named import
* fix signature ordering on upsertnode to match