open-nomad

Author	SHA1	Message	Date
Drew Bailey	f3dcefe5a9	remove event durability (#9147 ) * remove event durability temporarily removing go-memdb event durability until a new strategy is developed on how to best handled increased durability needs * drop events table schema and state store methods * fix neweventbuffer invocations	2020-10-22 12:21:03 -04:00
Drew Bailey	6c788fdccd	Events/msgtype cleanup (#9117 ) * use msgtype in upsert node adds message type to signature for upsert node, update tests, remove placeholder method * UpsertAllocs msg type test setup * use upsertallocs with msg type in signature update test usage of delete node delete placeholder msgtype method * add msgtype to upsert evals signature, update test call sites with test setup msg type handle snapshot upsert eval outside of FSM and ignore eval event remove placeholder upsertevalsmsgtype handle job plan rpc and prevent event creation for plan msgtype cleanup upsertnodeevents updatenodedrain msgtype msg type 0 is a node registration event, so set the default to the ignore type * fix named import * fix signature ordering on upsertnode to match	2020-10-19 09:30:15 -04:00
Drew Bailey	c463479848	filter on additional filter keys, remove switch statement duplication properly wire up durable event count move newline responsibility moves newline creation from NDJson to the http handler, json stream only encodes and sends now ignore snapshot restore if broker is disabled enable dev mode to access event steam without acl use mapping instead of switch use pointers for config sizes, remove unused ttl, simplify closed conn logic	2020-10-14 14:14:33 -04:00
Drew Bailey	df96b89958	Add EvictCallbackFn to handle removing entries from go-memdb when they are removed from the event buffer. Wire up event buffer size config, use pointers for structs.Events instead of copying.	2020-10-14 12:44:42 -04:00
Drew Bailey	315f77a301	rehydrate event publisher on snapshot restore address pr feedback	2020-10-14 12:44:41 -04:00
Drew Bailey	d793529d61	event durability count and cfg	2020-10-14 12:44:40 -04:00
Drew Bailey	b4c135358d	use Events to wrap index and events, store in events table	2020-10-14 12:44:39 -04:00
Michael Schurter	9dd59ceaa7	core: improve job deregister error logging Noticed this error in some production logs, and they were far from helpful. Changes: 1. Include job ID in logs 2. Wrap errors and log once instead of double log lines 3. Test fsm error handling behavior	2020-09-21 08:59:03 -07:00
Drew Bailey	0af749c92e	Transaction change tracking This commit wraps memdb.DB with a changeTrackerDB, which is a thin wrapper around memdb.DB which enables go-memdb's TrackChanges on all write transactions. When the transaction is comitted the changes are sent to an eventPublisher which will be used to create and emit change events. debugging TestFSM_ReconcileSummaries wip revert back rebase revert back rebase fix snapshot to actually use a snapshot	2020-09-01 10:27:20 -04:00
Mahmood Ali	78568b8e63	Remove unused state.TestInitState	2020-07-20 09:55:55 -04:00
Drew Bailey	01e2cc5054	allow ClusterMetadata to accept a watchset (#8299 ) * allow ClusterMetadata to accept a watchset * use nil instead of empty watchset	2020-06-26 13:23:32 -04:00
Mahmood Ali	2c963885b0	handle upgrade path and defaults Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on upgrades or on API handling when user misses the value. The scheduler already treats `""` value as binpack. This PR merely ensures that the operator API returns the effective value.	2020-05-09 12:34:08 -04:00
Mahmood Ali	f492ab6d9e	implement MinQuorum	2020-02-16 16:04:59 -06:00
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to recieve RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	2b66ce93bb	nomad: ensure a unique ClusterID exists when leader (gh-6702) Enable any Server to lookup the unique ClusterID. If one has not been generated, and this node is the leader, generate a UUID and attempt to apply it through raft. The value is not yet used anywhere in this changeset, but is a prerequisite for gh-6701.	2020-01-31 19:03:26 -06:00
Mahmood Ali	d740d347ce	Migrate old alloc structs on read This commit ensures that Alloc.AllocatedResources is properly populated when read from persistence stores (namely Raft and client state store). The alloc struct may have been written previously by an arbitrary old version that may only populate Alloc.TaskResources.	2020-01-09 08:46:50 -05:00
Lang Martin	a95225d754	NodeDeregisterBatch -> NodeBatchDeregister match JobBatch pattern	2019-07-10 13:56:20 -04:00
Lang Martin	ce0f03651a	fsm support new NodeDeregisterBatchRequest	2019-07-10 13:56:20 -04:00
Lang Martin	a97407e030	fsm NodeDeregisterRequest is now a batch	2019-07-10 13:56:19 -04:00
Preetha Appan	aa2b4b4e00	Undo removal of node drain compat changes Decided to remove that in 0.10	2019-07-01 15:12:01 -05:00
Preetha Appan	10e7d6df6d	Remove compat code associated with many previous versions of nomad This removes compat code for namespaces (0.7), Drain(0.8) and other older features from releases older than Nomad 0.7	2019-06-25 19:05:25 -05:00
Preetha Appan	ad3c263d3f	Rename to match system scheduler config. Also added docs	2019-05-03 14:06:12 -05:00
Preetha Appan	6615d5c868	Add config to disable preemption for batch/service jobs	2019-04-29 18:48:07 -05:00
Preetha Appan	0f8a113ead	Refactor to find jobs with child instances more effeciently also added unit tests	2019-01-17 14:29:48 -06:00
Preetha Appan	81a8f18cac	Fix bug in reconcile summaries that affects periodic/parameterized jobs This fixes incorrect parent job summaries by recomputing them in the ReconcileJobSummaries method in the state store	2019-01-17 12:01:01 -06:00
Alex Dadgar	08dc2ea702	Merge pull request #4867 from hashicorp/b-deployment-progress-deadline Blocked evaluation fixes	2018-11-13 10:29:03 -08:00
Alex Dadgar	6d8bb3a7bd	Duplicate blocked evals cancelling improved The old logic for cancelling duplicate blocked evaluations by job id had the issue where the newer evaluation could have additional node classes that it is (in)eligible for that we would not capture. This could make it such that cluster state could change such that the job would make progress but no evaluation was unblocked.	2018-11-07 10:08:23 -08:00
Preetha Appan	32cc764072	Add fsm layer tests	2018-10-30 11:06:32 -05:00
Preetha Appan	7b8156fc47	Restore/Snapshot plus unit tests for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	c1c1c230e4	Make preemption config a struct to allow for enabling based on scheduler type	2018-10-30 11:06:32 -05:00
Preetha Appan	bd34cbb1f7	Support for new scheduler config API, first use case is to disable preemption	2018-10-30 11:06:32 -05:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Alex Dadgar	300b1a7a15	Tests only use testlog package logger	2018-06-13 15:40:56 -07:00
Preetha Appan	4134fcd2c7	Fix test setup for FSMSnapshotRestore_Deployments to use a valid job that exists	2018-05-31 14:39:39 -05:00
Alex Dadgar	352f2e03b5	Clean up leaked deployments on restoration This PR cancels deployments that are active but do not have a job associated with them. This is a broken invariant that causes issues in the deployment watcher since it will not track them. Thus they are objects that can't be operated on or cleaned up. Fixes https://github.com/hashicorp/nomad/issues/4286	2018-05-23 16:44:21 -07:00
Alex Dadgar	17aac1c9de	node heartbeat missed event	2018-05-22 14:05:46 -07:00
Alex Dadgar	5f2080bc26	Emit events based on eligibility	2018-05-22 14:04:59 -07:00
Alex Dadgar	a35248d1d8	Plumb event via FSM	2018-05-10 16:30:54 -07:00
Alex Dadgar	9617a13a2b	Correctly handle the upgrade path of a node being drained when applying Raft logs	2018-04-03 15:32:44 -07:00
Alex Dadgar	301704091b	Handle upgrade where Node doesn't have eligiblity This PR handles upgrading a node that has no scheduling eligiblity set.	2018-03-29 16:52:23 -07:00
Chelsea Holland Komlo	31557cc44f	move tests to use time.Time	2018-03-27 15:43:57 -04:00
Michael Schurter	cb61a4bdc7	Fix linting errors	2018-03-21 16:51:45 -07:00
Alex Dadgar	9d23c965da	fix comment	2018-03-21 16:51:45 -07:00
Alex Dadgar	2d91b9dfba	Batch drain update	2018-03-21 16:51:44 -07:00
Alex Dadgar	010a6b8ca5	Unblock evals once eligible	2018-03-21 16:51:44 -07:00
Alex Dadgar	0fba0101b6	RPC/FSM/State Store for Eligibility	2018-03-21 16:51:44 -07:00
Alex Dadgar	2f5309d82a	Remove update time	2018-03-21 16:51:43 -07:00
Alex Dadgar	0965c9ed28	Fix tests	2018-03-21 16:51:43 -07:00
Alex Dadgar	e459a666ed	Node.Drain takes strategy	2018-03-21 16:49:48 -07:00

1 2 3 4

159 commits