open-nomad

Author	SHA1	Message	Date
Mahmood Ali	69bb42acf8	tests: prefix agent logs to identify agent sources	2020-06-07 16:38:11 -04:00
Mahmood Ali	47a163b63f	reassert leadership	2020-06-07 15:47:06 -04:00
Mahmood Ali	9eb13ae144	basic snapshot restore	2020-06-07 15:46:23 -04:00
Mahmood Ali	bf7a3583e5	Merge pull request #8089 from hashicorp/b-leader-worker-count leadership: pause and unpause workers consistently	2020-06-04 12:01:01 -04:00
Mahmood Ali	cd8e1b4d62	stop periodic dispatch at end of tests (#8111 )	2020-06-04 09:15:00 -04:00
Lang Martin	ac7c39d3d3	Delayed evaluations for `stop_after_client_disconnect` can cause unwanted extra followup evaluations around job garbage collection (#8099 ) * client/heartbeatstop: reversed time condition for startup grace * scheduler/generic_sched: use `delayInstead` to avoid a loop Without protecting the loop that creates followUpEvals, a delayed eval is allowed to create an immediate subsequent delayed eval. For both `stop_after_client_disconnect` and the `reschedule` block, a delayed eval should always produce some immediate result (running or blocked) and then only after the outcome of that eval produce a second delayed eval. * scheduler/reconcile: lostLater are different than delayedReschedules Just slightly. `lostLater` allocs should be used to create batched evaluations, but `handleDelayedReschedules` assumes that the allocations are in the untainted set. When it creates the in-place updates to those allocations at the end, it causes the allocation to be treated as running over in the planner, which causes the initial `stop_after_client_disconnect` evaluation to be retried by the worker.	2020-06-03 09:48:38 -04:00
Mahmood Ali	70fbcb99c2	leadership: pause and unpause workers consistently This fixes a bug where leadership establishment pauses 3/4 of workers but stepping down unpauses only 1/2!	2020-06-01 10:57:53 -04:00
Mahmood Ali	891fb3f8a9	test for paused workers upon leadership revocation	2020-06-01 10:48:42 -04:00
Mahmood Ali	de44d9641b	Merge pull request #8047 from hashicorp/f-snapshot-save API for atomic snapshot backups	2020-06-01 07:55:16 -04:00
Mahmood Ali	e37a3312d5	If leadership fails, consider it handled The callers for `forward` and the old implementation expect failures to be accompanied by a true value! This fixes the issue and has tests passing!	2020-05-31 22:06:17 -04:00
Mahmood Ali	30ab9c84e5	more review feedback	2020-05-31 21:39:09 -04:00
Mahmood Ali	a73cd01a00	Merge pull request #8001 from hashicorp/f-jobs-list-across-nses endpoint to expose all jobs across all namespaces	2020-05-31 21:28:03 -04:00
Mahmood Ali	082c085068	Merge pull request #8036 from hashicorp/f-background-vault-revoke-on-restore Speed up leadership establishment	2020-05-31 21:27:16 -04:00
Mahmood Ali	1af32e65bc	clarify rpc consistency readiness comment	2020-05-31 21:26:41 -04:00
Mahmood Ali	0819ea60ea	Apply suggestions from code review Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2020-05-31 21:04:39 -04:00
Mahmood Ali	37c6160b96	Handle nil/empty cluster metadata Handle case where a snapshot is made before cluster metadata is created. This fixes a bug where a server may have empty cluster metadata if it created and installed a Raft snapshot before a new cluster metadata ID is generated. This case is very unlikely to arise. Most likely reason is when upgrading from an old version slowly where servers may use snapshots before all servers upgrade. This happened for a user with a log line like: ``` 2020-05-21T15:21:56.996Z [ERROR] nomad.fsm: ClusterSetMetadata failed: error="set cluster metadata failed: refusing to set new cluster id, previous: , new: <<redacted> ```	2020-05-29 13:34:21 -04:00
Drew Bailey	23d24c7a7f	removes pro tags (#8014 )	2020-05-28 15:40:17 -04:00
Mahmood Ali	475b3b77ad	Merge pull request #8060 from hashicorp/tests-deflake-20200526 Deflake some tests - 2020-05-27 edition	2020-05-27 15:24:31 -04:00
Drew Bailey	34871f89be	Oss license support for ent builds (#8054 ) * changes necessary to support oss licensing shims revert nomad fmt changes update test to work with enterprise changes update tests to work with new ent enforcements make check update cas test to use scheduler algorithm back out preemption changes add comments * remove unused method	2020-05-27 13:46:52 -04:00
Mahmood Ali	61e4f5aaf9	tests: use GreaterOrEqual and apply change to other tests	2020-05-27 11:22:48 -04:00
Mahmood Ali	6dfe0f5d3b	tests: use t.Fatalf when it's clearer	2020-05-27 10:09:56 -04:00
Mahmood Ali	ec1fcedb93	tests: node drain events may be duplicated	2020-05-27 08:59:06 -04:00
Mahmood Ali	c3c2a85314	tests: wait until clients are in the state store	2020-05-26 18:53:24 -04:00
Mahmood Ali	5d80d2a511	tests: eval may be processed quickly	2020-05-26 18:53:24 -04:00
Mahmood Ali	19141f8103	{volume|deployment}watcher: check for nil batcher	2020-05-26 14:54:27 -04:00
Mahmood Ali	81ac098a22	deploymentwatcher: no batcher when disabling When disabling deploymentwatcher (at the end of a test), avoid starting a new update batcher with its new goroutine.	2020-05-26 14:44:47 -04:00
Mahmood Ali	ccc89f940a	terminate leader goroutines on shutdown Ensure that nomad steps down (and terminates leader goroutines) on shutdown, when the server is the leader. Without this change, `monitorLeadership` may handle `shutdownCh` event and exit early before handling the raft `leaderCh` event and end up leaking leadership goroutines.	2020-05-26 10:18:10 -04:00
Mahmood Ali	e671913e56	fix a trace logline	2020-05-26 10:18:09 -04:00
Mahmood Ali	1c79c3b93d	refactor: context is first parameter By convention, go functions take `context.Context` as the first argument.	2020-05-26 10:18:09 -04:00
Mahmood Ali	1eff8b0ed8	volumewatcher: no batcher when disabling When disabling volumewatcher (at the end of a test), avoid starting a new update batcher with its new goroutine.	2020-05-26 10:18:09 -04:00
Mahmood Ali	b895cef622	always set purgeFunc purgeFunc cannot be nil, so ensure it's set to a no-op function in tests.	2020-05-21 21:05:53 -04:00
Mahmood Ali	2108681c1d	Endpoint for snapshotting server state	2020-05-21 20:04:38 -04:00
Mahmood Ali	fbe140b26c	vault: ensure ttl expired tokens are purged If a token that is scheduled for revocation expires before we revoke it, ensure that it is marked as purged in raft and is only removed from local vault state if the purge operation succeeds. Prior to this change, we may remove the accessor from local state but not purge it from Raft. This causes unnecessary churn in the next leadership elections (and until 0.11.2 resulted in indefinite retries).	2020-05-21 19:54:50 -04:00
Mahmood Ali	aa8e79e55b	Reorder leadership handling Start serving RPC immediately after leader components are enabled, and move clean up to the bottom as they don't block leadership responsibilities.	2020-05-21 08:30:31 -04:00
Mahmood Ali	1cf1114627	apply the same change to consul revocation	2020-05-21 08:30:31 -04:00
Mahmood Ali	1399d02f45	rate limit revokeDaemon	2020-05-21 08:30:31 -04:00
Mahmood Ali	6e749d12a0	on leadership establishment, revoke Vault tokens in background Establishing leadership should be very fast and never make external API calls. This fixes a situation where there is a long backlog of Vault tokens to be revoked when leadership is gained. In such a case, revoking the tokens will significantly slow down leadership establishment and slow down processing. Worse, the revocation call does not honor leadership `stopCh` signals, so it will not stop when the leader loses leadership.	2020-05-21 07:38:27 -04:00
Tim Gross	72430a4e62	csi: don't pass volume claim releases thru GC eval (#8021 ) Following the new volumewatcher in #7794 and performance improvements to it that landed afterwards, there's no particular reason we should be threading claim releases through the GC eval rather than writing an empty `CSIVolumeClaimRequest` with the mode set to `CSIVolumeClaimRelease`, just as the GC evaluation would do. Also, by batching up these raft messages, we can reduce the amount of raft writes by 1 and cross-server RPCs by 1 per volume we release claims on.	2020-05-20 15:22:51 -04:00
Tim Gross	3902709c0a	csi: check for empty arguments on CSI endpoint (#8027 ) Some of the CSI RPC endpoints were missing validation that the ID or the Volume definition was present. This could result in nonsense `CSIVolume` structs being written to raft during registration. This changeset corrects that bug and adds validation checks to present nicer error messages to operators in some other cases.	2020-05-20 10:22:24 -04:00
Charlie Voiselle	70303c906c	Simplify comments Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>	2020-05-19 15:05:24 -04:00
Charlie Voiselle	6976a7699e	Set Updated to true for all non-CAS requests	2020-05-19 12:59:39 -04:00
Mahmood Ali	406fce90c3	list all jobs on namespaces the token can access	2020-05-19 09:51:41 -04:00
Seth Hoenig	f6c8db8a8a	consul/connect: use task kind to get service name Fixes #8000 When requesting a Service Identity token from Consul, use the TaskKind of the Task to get at the service name associated with the task. In the past using the TaskName worked because it was generated as a sidecar task with a name that included the service. In the Native context, we need to get at the service name in a more correct way, i.e. using the TaskKind which is defined to include the service name.	2020-05-18 13:46:00 -06:00
Mahmood Ali	5ab2d52e27	endpoint to expose all jobs across all namespaces Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all namespaces. The returned list is to contain a `Namespace` field indicating the job namespace. If ACL is enabled, the request token needs to be a management token or have `namespace:list-jobs` capability on all existing namespaces.	2020-05-18 13:50:46 -04:00
Tim Gross	2082cf738a	csi: support for VolumeContext and VolumeParameters (#7957 ) The MVP for CSI in the 0.11.0 release of Nomad did not include support for opaque volume parameters or volume context. This changeset adds support for both. This also moves args for ControllerValidateCapabilities into a struct. The CSI plugin `ControllerValidateCapabilities` struct that we turn into a CSI RPC is accumulating arguments, so moving it into a request struct will reduce the churn of this internal API, make the plugin code more readable, and make this method consistent with the other plugin methods in that package.	2020-05-15 08:16:01 -04:00
Mahmood Ali	b385a5d063	Merge pull request #7959 from hashicorp/b-deleted-vault-accessors vault: ensure that token revocation is idempotent	2020-05-14 12:39:06 -04:00
Mahmood Ali	077342c528	vault: ensure that token revocation is idempotent This ensures that token revocation is idempotent and can handle when tokens are revoked out of band. Idempotency is important to handle some transient failures and retries. Consider when a single token of a batch fails to be revoked, nomad would retry revoking the entire batch; tokens already revoked should be gracefully handled, otherwise, nomad may retry revoking the same tokens forever.	2020-05-14 11:30:32 -04:00
Mahmood Ali	6ac166e1aa	vault: failing test for repeated revocation	2020-05-14 11:30:29 -04:00
Lang Martin	d3c4700cd3	server: stop after client disconnect (#7939 ) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was setup to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Mahmood Ali	3b4116e0db	Merge pull request #7894 from hashicorp/b-cronexpr-dst-fix Fix Daylight saving transition handling	2020-05-12 16:36:11 -04:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923 ) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Mahmood Ali	938e916d9c	When serializing msgpack, only consider codec tag When serializing structs with msgpack, only consider type tags of `codec`. Hashicorp/go-msgpack (based on ugorji/go) defaults to interpreting the `codec` tag if it's available, but falls back to using `json` if `codec` isn't present. This behavior is surprising in cases where we want to serialize json differently from msgpack, e.g. serializing `ConsulExposeConfig`.	2020-05-11 14:14:10 -04:00
Mahmood Ali	b4fa8e9588	codec: we use hashicorp/go-msgpack exclusively No need to maintain two msgpack handles!	2020-05-11 14:05:29 -04:00
Tim Gross	6554e9ee37	csi: log fallthrough on invalid node IDs for client RPC (#7918 ) When a CSI client RPC is given a specific node for a controller but the lookup fails (because the node is gone or is an older version), we fall through to select a node from all those available. This adds logging to this case to aid in diagnostics.	2020-05-11 12:26:10 -04:00
Tim Gross	1ec41b6770	volumewatcher: stop watcher goroutines when there's no work (#7909 ) The watcher goroutines will be automatically started if a volume has updates, but when idle we shouldn't keep a goroutine running and taking up memory.	2020-05-11 09:32:05 -04:00
Mahmood Ali	061a439f2c	Merge pull request #7912 from hashicorp/f-scheduler-algorithm-followup Scheduler Algorithm Defaults handling and docs	2020-05-11 09:30:58 -04:00
Mahmood Ali	0384543d05	Merge pull request #7913 from hashicorp/deflake-TestTaskTemplateManager_BlockedEvents Deflake TestTaskTemplateManager_BlockedEvents test	2020-05-11 09:30:44 -04:00
Mahmood Ali	dff0fcf2f3	Merge pull request #7914 from hashicorp/b-csi-fix-slice-initialization Fix slice initialization	2020-05-11 09:27:01 -04:00
Tim Gross	3aa761b151	Periodic GC for volume claims (#7881 ) This changeset implements a periodic garbage collection of CSI volumes with missing allocations. This can happen in a scenario where a node update fails partially and the allocation updates are written to raft but the evaluations to GC the volumes are dropped. This feature will cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1 get any stray claims cleaned up.	2020-05-11 08:20:50 -04:00
James Rasell	aaf2fe033e	Merge pull request #7903 from hashicorp/b-gh-7902 api: validate scale count value is not negative.	2020-05-11 09:17:01 +02:00
Mahmood Ali	9fac6ea5d9	Fix slice initialization	2020-05-09 21:35:42 -04:00
Mahmood Ali	64de395df0	tests: ease debugging TestClientEndpoint_CreateNodeEvals TestClientEndpoint_CreateNodeEvals flakes a bit but its output is very confusing, as `structs.Evaluations` overrides GoString. Here, we emit the entire struct of the evaluation, and hopefully we'll figure out the problem the next time it happens	2020-05-09 16:04:32 -04:00
Mahmood Ali	ff5c3e81b0	avoid logging after a test completes	2020-05-09 14:40:00 -04:00
Mahmood Ali	2c963885b0	handle upgrade path and defaults Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on upgrades or on API handling when user misses the value. The scheduler already treats `""` value as binpack. This PR merely ensures that the operator API returns the effective value.	2020-05-09 12:34:08 -04:00
Tim Gross	8373e917fc	volumewatcher: set maximum batch size for raft update (#7907 ) The `volumewatcher` has a 250ms batch window so claim updates will not typically be large enough to risk exceeding the maximum raft message size. But large jobs might have enough volume claims that this could be a danger. Set a maximum batch size of 100 messages per batch (roughly 33K), as a very conservative safety/robustness guard. Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>	2020-05-08 16:53:57 -04:00
James Rasell	55a2ad3854	api: validate scale count value is not negative. An operator could submit a scale request including a negative count value. This negative value caused the Nomad server to panic. The fix adds validation to the submitted count, returning an error to the caller if it is negative.	2020-05-08 16:51:40 +02:00
Mahmood Ali	57435950d7	Update current DST and some code style issues	2020-05-07 19:27:05 -04:00
Mahmood Ali	c8fb132956	Update cronexpr to point to hashicorp/cronexpr	2020-05-07 17:50:45 -04:00
Mahmood Ali	507c0b8f64	tests for periodic job scheduling and DST	2020-05-07 17:36:59 -04:00
Tim Gross	42f9d517d8	CSI volumewatcher testability improvements (#7889 ) * volumewatcher: remove redundant log fields The constructor for `volumeWatcher` already sets a `logger.With` that includes the volume ID and namespace fields. Remove them from the various trace logs. * volumewatcher: advance state for controller already released One way of bypassing client RPCs in testing is to set a claim status to controller-detached, but this results in an incorrect claim state when we checkpoint.	2020-05-07 15:57:24 -04:00
Tim Gross	801ebcfe8d	periodic GC for CSI plugins (#7878 ) This changeset implements a periodic garbage collection of unused CSI plugins. Plugins are self-cleaning when the last allocation for a plugin is stopped, but this feature will cover any missing edge cases and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins cleaned up.	2020-05-06 16:49:12 -04:00
Tim Gross	00c9bd7ff0	reorder volume claim batch request raft message (#7871 ) For backwards compatibility during upgrades, new raft message types need to come at the end of the enum.	2020-05-06 08:57:51 -04:00
Tim Gross	ce86a594a6	csi: fix plugin counts on node update (#7844 ) In this changeset: * If a Nomad client node is running both a controller and a node plugin (which is a common case), then if only the controller or the node is removed, the plugin was not being updated with the correct counts. * The existing test for plugin cleanup didn't go back to the state store, which normally is ok but is complicated in this case by denormalization which changes the behavior. This commit makes the test more comprehensive. * Set "controller required" when plugin has `PUBLISH_READONLY`. All known controllers that support `PUBLISH_READONLY` also support `PUBLISH_UNPUBLISH_VOLUME` but we shouldn't assume this. * Only create plugins when the allocs for those plugins are healthy. If we allow a plugin to be created for the first time when the alloc is not healthy, then we'll recreate deleted plugins when the job's allocs all get marked terminal. * Terminal plugin alloc updates should cleanup the plugin. The client fingerprint can't tell if the plugin is unhealthy intentionally (for the case of updates or job stop). Allocations that are server-terminal should delete themselves from the plugin and trigger a plugin self-GC, the same as an unused node.	2020-05-05 15:39:57 -04:00
Tim Gross	22e3815e8c	docstring improvements and typo fixes (#7862 )	2020-05-05 10:30:50 -04:00
Tim Gross	1c6dcab56b	volumewatcher: remove spurious nil-check (#7858 ) The nil-check here is left over from an earlier approach that didn't get merged. It doesn't do anything for us now as we can't ever pass it `nil`, and if we leave it in, the `getVolume` call it guards will panic anyway.	2020-05-04 12:28:32 -04:00
Mahmood Ali	78ae7b885a	Merge pull request #7810 from hashicorp/spread-configuration spread scheduling algorithm	2020-05-01 13:15:19 -04:00
Mahmood Ali	3da74068dd	changelog and fix typo	2020-05-01 13:14:20 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	d8e5e02398	Wiring algorithm to scheduler calls	2020-05-01 13:13:29 -04:00
Charlie Voiselle	663fb677cf	Add SchedulerAlgorithm to SchedulerConfig	2020-05-01 13:13:29 -04:00
Lang Martin	28bac139cb	client/heartbeatstop: destroy allocs when disconnected from servers - track lastHeartbeat, the client local time of the last successful heartbeat round trip - track allocations with `stop_after_client_disconnect` configured - trigger allocation destroy (which handles cleanup) - restore heartbeat/killable allocs tracking when allocs are recovered from disk - on client restart, stop those allocs after a grace period if the servers are still partitioned	2020-05-01 12:35:49 -04:00
Michael Schurter	c901d0e7dd	Merge branch 'master' into b-reserved-scoring	2020-04-30 14:48:14 -07:00
Tim Gross	52e805a6a6	csi: ensure Read/WriteAllocs aren't released early (#7841 ) We should only remove the `ReadAllocs`/`WriteAllocs` values for a volume after the claim has entered the "ready to free" state. The volume will eventually be released as expected. But querying the volume API will show the volume is released before the controller unpublish has finished and this can cause a race with starting new jobs. Test updates are to cover cases where we're dropping claims but not running through the whole reaping process.	2020-04-30 17:11:31 -04:00
Tim Gross	a7a64443e1	csi: move volume claim release into volumewatcher (#7794 ) This changeset adds a subsystem to run on the leader, similar to the deployment watcher or node drainer. The `Watcher` performs a blocking query on updates to the `CSIVolumes` table and triggers reaping of volume claims. This will avoid tying up scheduling workers by immediately sending volume claim workloads into their own loop, rather than blocking the scheduling workers in the core GC job doing things like talking to CSI controllers. The volume watcher is enabled on leader step-up and disabled on leader step-down. The volume claim GC mechanism now makes an empty claim RPC for the volume to trigger an index bump. That in turn unblocks the blocking query in the volume watcher so it can assess which claims can be released for a volume.	2020-04-30 09:13:00 -04:00
Tim Gross	e34f099d20	csi: read-repair CSI volume claims (#7824 ) The `CSIVolumeClaim` fields were added after 0.11.1, so claims made before that may be missing the value. Repair this when we read the volume out of the state store. The `NodeID` field was added after 0.11.0, so we need to ensure it's been populated during upgrades from 0.11.0.	2020-04-29 11:57:19 -04:00
Mahmood Ali	18f16cfb12	Merge pull request #7818 from greut/codegen structs: give codecgen import	2020-04-28 12:16:41 -04:00
Chris Baker	315bcf1060	Merge pull request #7816 from hashicorp/b-7789-job-scaling-status-issues fix issues in Job.ScaleStatus	2020-04-28 06:33:42 -05:00
Yoan Blanc	5ca31f23e5	structs: give codecgen import Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-04-28 08:23:20 +02:00
Nick Ethier	4b810b697a	nomad: build dynamic port for exposed checks if not specified (#7800 )	2020-04-28 00:07:41 -04:00
Chris Baker	73f1390316	modified Job.ScaleStatus to ignore deployments and look directly at the allocations, ignoring canaries	2020-04-27 21:45:39 +00:00
Tim Gross	083b35d651	csi: checkpoint volume claim garbage collection (#7782 ) Adds a `CSIVolumeClaim` type to be tracked as current and past claims on a volume. Allows for a client RPC failure during node or controller detachment without having to keep the allocation around after the first garbage collection eval. This changeset lays groundwork for moving the actual detachment RPCs into a volume watching loop outside the GC eval.	2020-04-23 11:06:23 -04:00
Chris Baker	09d980be2b	modify state store so that autoscaling policies are deleted from their table as job is stopped (and recreated when job is started)	2020-04-21 23:01:26 +00:00
Tim Gross	bd74b593d0	csi: nil-check allocs for VolumeDenormalize and claim methods (#7760 )	2020-04-21 08:32:24 -04:00
Michael Dwan	ba70c54340	fix panic while deleting CSI plugins for missing job (#7758 )	2020-04-20 17:13:33 -04:00
Seth Hoenig	40e0f8a346	Merge pull request #7690 from hashicorp/b-inspect-proxy-output two fixes for inspect on connect proxy	2020-04-20 10:17:54 -06:00
Anthony Scalisi	9664c6b270	fix spelling errors (#6985 )	2020-04-20 09:28:19 -04:00
Michael Schurter	4c5a0cae35	core: fix node reservation scoring The BinPackIter accounted for node reservations twice when scoring nodes which could bias scores toward nodes with reservations. Pseudo-code for previous algorithm: ``` proposed = reservedResources + sum(allocsResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are added to the total resources used by allocations, and then the node's reserved resources are later subtracted from the node's overall resources. The new algorithm is: ``` proposed = sum(allocResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are no longer added to the total resources used by allocations. My guess as to how this bug happened is that the resource utilization variable (`util`) is calculated and returned by the `AllocsFit` function which needs to take reserved resources into account as a basic feasibility check. To avoid re-calculating alloc resource usage (because there may be a large number of allocs), we reused `util` in the `ScoreFit` function. `ScoreFit` properly accounts for reserved resources by subtracting them from the node's overall resources. However since `util` _also_ took reserved resources into account the score would be incorrect. Prior to the fix the added test output: ``` Node: reserved Score: 1.0000 Node: reserved2 Score: 1.0000 Node: no-reserved Score: 0.9741 ``` The scores being 1.0 for both nodes with reserved resources is a good hint something is wrong as they should receive different scores. Upon further inspection the double accounting of reserved resources caused their scores to be >1.0 and clamped. After the fix the added test outputs: ``` Node: no-reserved Score: 0.9741 Node: reserved Score: 0.9480 Node: reserved2 Score: 0.8717 ```	2020-04-15 15:13:30 -07:00
Seth Hoenig	d5ad580d5c	structs: fix compatibility between api and nomad/structs proxy definitions The field names within the structs representing the Connect proxy definition were not the same (nomad/structs/ vs api/), causing the values to be lost in translation for the 'nomad job inspect' command. Since the field names already shipped in v0.11.0 we cannot simply fix the names. Instead, use the json struct tag on the structs/ structs to remap the name to match the publicly exposed api/ package on json encoding. This means existing jobs from v0.11.0 will continue to work, and the JSON API for job submission will remain backwards compatible.	2020-04-13 15:59:45 -06:00
Tim Gross	4e9bd1e1d1	refactor: consolidate private methods for CSI RPC (#7702 ) Follow-up for a method missed in the refactor for #7688. The `volAndPluginLookup` method is only ever called from the server's `CSI` RPC and never the `ClientCSI` RPC, so move it into that scope.	2020-04-13 10:46:43 -04:00
Tim Gross	f37e986b1b	refactor: make nodeForControllerPlugin private to ClientCSI (#7688 ) The current design of `ClientCSI` RPC requires that callers in the server know about the free-standing `nodeForControllerPlugin` function. This makes it difficult to send `ClientCSI` RPC messages from subpackages of `nomad` and adds a bunch of boilerplate to every server-side caller of a controller RPC. This changeset makes it so that the `ClientCSI` RPCs will populate and validate the controller's client node ID if it hasn't been passed by the caller, centralizing the logic of picking and validating controller targets into the `nomad.ClientCSI` struct.	2020-04-10 16:47:21 -04:00
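
The double accounting described in commit 4c5a0cae35 ("core: fix node reservation scoring") can be sketched in a few lines. This is only an illustration of the simplified pseudo-code quoted in that commit message, not the actual `ScoreFit` or `AllocsFit` implementation; the function names, the scalar resource model, and the sample numbers are all assumptions made for the example.

```python
# Illustrative sketch of the pseudo-code in commit 4c5a0cae35.
# Function names and the single scalar "resource" are hypothetical;
# the real scheduler scores several resource dimensions and clamps the result.

def score_before_fix(node, reserved, allocs):
    """Previous pseudo-code: reserved resources are counted twice."""
    proposed = reserved + sum(allocs)  # reserved added to usage...
    available = node - reserved        # ...and also removed from capacity
    return 1 - proposed / available


def score_after_fix(node, reserved, allocs):
    """New pseudo-code: reserved resources only reduce the capacity."""
    proposed = sum(allocs)
    available = node - reserved
    return 1 - proposed / available


if __name__ == "__main__":
    # Hypothetical node: 5000 units total, 1000 reserved, two 1000-unit allocs.
    node, reserved, allocs = 5000, 1000, [1000, 1000]
    print(score_before_fix(node, reserved, allocs))  # 0.25
    print(score_after_fix(node, reserved, allocs))   # 0.5
```

With no reserved resources the two functions agree, which matches the commit's observation that only the nodes with reservations were scored incorrectly.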

3376 commits