open-nomad

Commit Graph

Author	SHA1	Message	Date
Juana De La Cuesta	21b675244e	style: rename ForceRun to ForceEval, for clarity (#16617)	2023-03-27 15:38:48 +02:00
Tim Gross	3c0eaba9db	remove backcompat support for non-atomic job registration (#16305) In Nomad 0.12.1 we introduced atomic job registration/deregistration, where the new eval was written in the same raft entry. Backwards-compatibility checks were supposed to have been removed in Nomad 1.1.0, but we missed that. This is long safe to remove.	2023-03-03 15:52:22 -05:00
Tim Gross	0e1b554299	handle `FSM.Apply` errors in `raftApply` (#16287) The signature of the `raftApply` function requires that the caller unwrap the first returned value (the response from `FSM.Apply`) to see if it's an error. This puts the burden on the caller to remember to check two different places for errors, and we've done so inconsistently. Update `raftApply` to do the unwrapping for us and return any `FSM.Apply` error as the error value. Similar work was done in Consul in https://github.com/hashicorp/consul/pull/9991. This eliminates some boilerplate and surfaces a few minor bugs in the process: * job deregistrations of already-GC'd jobs were still emitting evals * reconcile job summaries does not return scheduler errors * node updates did not report errors associated with inconsistent service discovery or CSI plugin states Note that although _most_ of the `FSM.Apply` functions return only errors (which makes it tempting to remove the first return value entirely), there are few that return `bool` for some reason and Variables relies on the response value for proper CAS checking.	2023-03-02 13:51:09 -05:00
Tim Gross	3c78980b78	make version checks specific to region (1.4.x) (#14912) * One-time tokens are not replicated between regions, so we don't want to enforce that the version check across all of serf, just members in the same region. * Scheduler: Disconnected clients handling is specific to a single region, so we don't want to enforce that the version check across all of serf, just members in the same region. * Variables: enforce version check in Apply RPC * Cleans up a bunch of legacy checks. This changeset is specific to 1.4.x and the changes for previous versions of Nomad will be manually backported in a separate PR.	2022-10-17 16:23:51 -04:00
Mahmood Ali	b0e048bfa4	periodic: always reset periodic children status Fixes a bug where Nomad reports negative or incorrect running children counts for periodic jobs. The periodic dispatcher derives a child job without resetting the status. If the periodic job has a `running` status, the derived job will start as `running` status and transition to `pending`. Since this is an unexpected transition, the counting in StateStore.setJobSummary gets out of sync and results in negative/incorrect values. Note that this only affects periodic jobs after a leader transition. During the first job registration, the job is added with `pending` or `""` status. However, after a leader transition, the new leader repopulates the dispatcher heap with `"running"` status and triggers the bug.	2021-03-25 11:27:09 -04:00
Mahmood Ali	fbfe4ab1bd	Atomic eval insertion with job (de-)registration This fixes a bug where jobs may get "stuck" unprocessed that disproportionately affects periodic jobs around leadership transitions. When registering a job, the job registration and the eval to process it get applied to raft as two separate transactions; if the job registration succeeds but eval application fails, the job may remain unprocessed. Operators may detect such a failure when submitting a job update and getting a 500 error code, and they could retry; periodic job failures are more likely to go unnoticed, and no further periodic invocations will be processed until an operator forces an evaluation. This fixes the issue by ensuring that the job registration and eval application get persisted and processed atomically in the same raft log entry. Also, applies the same change to ensure atomicity in job deregistration. Backward Compatibility We must maintain compatibility in two scenarios: mixed clusters where a leader can handle atomic updates but followers cannot, and a recent cluster processes old log entries from legacy or mixed cluster mode. To handle these constraints: ensure that the leader continues to emit the Evaluation log entry until all servers have upgraded; also, when processing raft logs, the servers honor evaluations found in both spots, the Eval in job (de-)registration and the eval update entries. When an updated server sees mix-mode behavior where an eval is inserted into the raft log twice, it ignores the second instance. I made one compromise in consistency in the mixed-mode scenario: servers may disagree on the eval.CreateIndex value: the leader and updated servers will report the job registration index while old servers will report the index of the eval update log entry. This discrepancy doesn't seem to be material - it's the eval.JobModifyIndex that matters.	2020-07-14 11:59:29 -04:00
Jasmine Dahilig	8d980edd2e	add create and modify timestamps to evaluations (#5881)	2019-08-07 09:50:35 -07:00
Mahmood Ali	1f2473263e	fix more cases of logging arity errors	2019-01-09 09:22:47 -05:00
Michael Schurter	80263861aa	test: fix race around updateCh handling PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and PeriodicDispatch.run accesses updateCh in another. The race can be prevented by having SetEnabled pass updateCh to run. Race detector output from `go test -race -run TestServer_RPC` in nomad/ ``` ================== WARNING: DATA RACE Write at 0x00c0001d3f48 by goroutine 75: github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468 +0x256 github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724 +0x267 github.com/hashicorp/nomad/nomad.(*Server).leaderLoop.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131 +0x3c github.com/hashicorp/nomad/nomad.(*Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163 +0x4dd github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Previous read at 0x00c0001d3f48 by goroutine 515: github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).run() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338 +0x177 Goroutine 75 (running) created at: github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 Goroutine 515 (running) created at: github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176 +0x1bc github.com/hashicorp/nomad/nomad.(*Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231 +0x582 github.com/hashicorp/nomad/nomad.(*Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c ================== ```	2018-12-19 15:48:02 -08:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Alex Dadgar	d0f237086b	UX touchups	2018-04-26 15:24:27 -07:00
Chelsea Holland Komlo	fca0169dbc	handle potential panic in cron parsing	2018-04-26 16:57:45 -04:00
Josh Soref	173ce63fe9	spelling: transition	2018-03-11 19:06:05 +00:00
Alex Dadgar	86608124ca	Fix followers not creating periodic launch Fix an issue in which periodic launches wouldn't be made on followers.	2017-12-11 13:55:17 -08:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Alex Dadgar	e5ec915ac3	sync	2017-09-19 10:08:23 -05:00
Alex Dadgar	e3dbcdcb44	Fix restoration of stopped periodic jobs This PR fixes an issue in which we would add a stopped periodic job to the periodic launcher.	2017-09-12 14:25:40 -07:00
Alex Dadgar	84d06f6abe	Sync namespace changes	2017-09-07 17:04:21 -07:00
Alex Dadgar	2284e59b57	Fix double close and cleanup code	2017-08-03 13:40:34 -07:00
Alex Dadgar	146f3f5cb2	Don't restore parameterized periodic jobs	2017-08-03 12:37:58 -07:00
Alex Dadgar	2471b86dec	Show submit time	2017-07-07 12:07:07 -07:00
Alex Dadgar	c58494fdb6	Handle periodic parameterized jobs Fixes https://github.com/hashicorp/nomad/issues/2382	2017-03-01 11:45:20 -08:00
Alex Dadgar	7e918003ba	Allow specification of timezones	2017-02-15 14:37:06 -08:00
Alex Dadgar	b69b357c7f	Nomad builds	2017-02-07 20:31:23 -08:00
Alex Dadgar	5f3e27ecd8	Fix case in periodic dispatch and blocked evals where lock was not released	2016-06-03 13:46:57 -07:00
Alex Dadgar	273dfaf2c7	Periodic jobs always are evaluated in UTC TZ	2016-04-12 09:47:25 -07:00
Alex Dadgar	47390c5186	remove the GC field on the job and use the job type	2016-03-23 18:02:01 -07:00
Alex Dadgar	260b50c2b3	Mark evals from periodic as triggered by a periodic job	2016-01-21 14:21:58 -08:00
Alex Dadgar	80dd30b03d	Add force spawn endpoint	2016-01-13 10:19:53 -08:00
Alex Dadgar	877630e7d1	Debug log for skipping overlapping periodic jobs	2016-01-07 20:12:07 -08:00
Alex Dadgar	f843e95cbf	Check parent id of prefix jobs and special case the output if no child job has been launched	2016-01-07 14:43:55 -08:00
Alex Dadgar	ce5a7b73ed	periodic status	2016-01-07 14:25:17 -08:00
Alex Dadgar	19faa4bb00	Remove parent index	2016-01-07 12:54:41 -08:00
Alex Dadgar	f289bdc76b	Remove debug message	2016-01-07 11:23:44 -08:00
Alex Dadgar	24fd4a8c27	Add ProhibitOverlap option to PeriodicConfig	2016-01-07 11:19:46 -08:00
Alex Dadgar	e87f3e6ca7	Simplify periodic nextLaunch, dispatch and run	2015-12-23 18:54:51 -08:00
Alex Dadgar	bf2aa9f733	Always remove periodic jobs in fsm	2015-12-23 18:26:39 -08:00
Alex Dadgar	e6f9a5bbb3	fix vet	2015-12-23 18:26:39 -08:00
Alex Dadgar	e3231171b8	Fix deadlock and test	2015-12-23 18:26:39 -08:00
Alex Dadgar	6bc0737970	Unix timestamps not UnixNano	2015-12-23 18:26:39 -08:00
Alex Dadgar	b3e87b6719	Remove the periodicRunner interface and pass the server as an interface to the periodicDispatcher	2015-12-23 18:26:39 -08:00
Alex Dadgar	a60783a4ca	Simplify run function and add nextLaunch test	2015-12-23 18:26:39 -08:00
Alex Dadgar	49dd0dc461	fixes from review	2015-12-23 18:26:39 -08:00
Alex Dadgar	642219ba5d	Race condition fixed	2015-12-23 18:26:39 -08:00
Alex Dadgar	ca65daf4c0	move created evals to the test package	2015-12-23 18:26:39 -08:00
Alex Dadgar	ea799b88cb	merge	2015-12-23 18:26:39 -08:00
Alex Dadgar	610cfe4b34	Small fixes and test fixes	2015-12-23 18:26:39 -08:00
Alex Dadgar	f6769c3d96	Leader election restore, add structs to api jobs	2015-12-23 18:26:39 -08:00
Alex Dadgar	670cc50a02	merge	2015-12-23 18:26:39 -08:00

51 Commits