open-nomad

Author	SHA1	Message	Date
Lars Lehtonen	2638cbb31d	nomad: TestEvalBroker_EnqueueAll_Dequeue_Fair() proper goroutine error handling (#6636 ) nomad: TestEvalBroker_EnqueueAll_Dequeue_Fair() improve test readability	2019-11-07 10:39:29 -05:00
Drew Bailey	a5e2e1805f	return after request has been forwarded	2019-11-07 08:33:53 -05:00
Lars Lehtonen	e64f98837c	nomad: fix dropped error in TestJobEndpoint_Deregister_ACL (#6602 )	2019-11-06 16:40:45 -05:00
Drew Bailey	f4a7e3dc75	coordinate closing of doneCh, use interface to simplify callers comments	2019-11-05 11:44:26 -05:00
Drew Bailey	fe542680dc	log-json -> json fix typo command/agent/monitor/monitor.go Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com> Update command/agent/monitor/monitor.go Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com> address feedback, lock to prevent send on closed channel fix lock/unlock for dropped messages	2019-11-05 09:51:59 -05:00
Drew Bailey	298b8358a9	move forwarded monitor request into helper	2019-11-05 09:51:56 -05:00
Drew Bailey	8726b685de	address feedback	2019-11-05 09:51:56 -05:00
Drew Bailey	0e759c401c	moving endpoints over to frames	2019-11-05 09:51:54 -05:00
Drew Bailey	17d876d5ef	rename function, initialize log level better underscores instead of dashes for query params	2019-11-05 09:51:53 -05:00
Drew Bailey	8178beecf0	address feedback, use agent_endpoint instead of monitor	2019-11-05 09:51:53 -05:00
Drew Bailey	db65b1f4a5	agent:read acl policy for monitor	2019-11-05 09:51:52 -05:00
Drew Bailey	2533617888	rpc acl tests for both monitor endpoints	2019-11-05 09:51:51 -05:00
Drew Bailey	3c33747e1f	client monitor endpoint tests	2019-11-05 09:51:50 -05:00
Drew Bailey	4bc68855d0	use intercepting loggers for rpchandlers	2019-11-05 09:51:50 -05:00
Drew Bailey	3b9c33a5f0	new hclog with standardlogger intercept	2019-11-05 09:51:49 -05:00
Drew Bailey	a45ae1cd58	enable json formatting, use queryoptions	2019-11-05 09:51:49 -05:00
Drew Bailey	786989dbe3	New monitor pkg for shared monitor functionality Adds new package that can be used by client and server RPC endpoints to facilitate monitoring based off of a logger clean up old code small comment about write rm old comment about minsize rename to Monitor Removes connection logic from monitor command Keep connection logic in endpoints, use a channel to send results from monitoring use new multisink logger and interfaces small test for dropped messages update go-hclogger and update sink/intercept logger interfaces	2019-11-05 09:51:49 -05:00
Lars Lehtonen	0a4542fadc	nomad: fix test goroutine (#6593 )	2019-10-31 08:23:32 -04:00
Seth Hoenig	98592113a3	Merge pull request #6582 from hashicorp/b-vault-createToken-log-msg nomad: fix vault.CreateToken log message printing wrong error	2019-10-29 17:35:05 -05:00
Mahmood Ali	7f2e4dc5d8	Merge pull request #6574 from hashicorp/b-gh-6570-vault-role-validation vault: honor new `token_period` in vault token role	2019-10-29 10:18:59 -04:00
Seth Hoenig	838c6e3329	nomad: fix vault.CreateToken log message printing wrong error Fixes typo in word "failed". Fixes bug where incorrect error is printed. The old code would only ever print a nil error, instead of the validationErr which is being created.	2019-10-28 23:05:32 -05:00
Mahmood Ali	c5d8d66787	Fix admissionValidators `admissionValidators` doesn't aggregate errors correctly, as it aggregates errors in `errs` reference yet it always returns the nil `err`. Here, we avoid shadowing `err`, and move variable declarations to where they are used.	2019-10-28 10:52:53 -04:00
Mahmood Ali	abb930249a	consul connect: do basic validation before mutating job `groupConnectHook` assumes that Networks is a non-empty slice, but TG hasn't been validated yet and validation may depend on mutation results. As such, we do basic check here before dereferencing network slice elements.	2019-10-28 10:49:02 -04:00
Mahmood Ali	bb45a7a776	add tests for consul connect validation	2019-10-28 10:41:51 -04:00
Mahmood Ali	4c64658397	vault: Support new role field `token_role` Vault 1.2.0 deprecated `period` field in favor of `token_period` in auth role: > * Token store roles use new, common token fields for the values > that overlap with other auth backends. `period`, `explicit_max_ttl`, and > `bound_cidrs` will continue to work, with priority being given to the > `token_` prefixed versions of those parameters. They will also be returned > when doing a read on the role if they were used to provide values initially; > however, in Vault 1.4 if `period` or `explicit_max_ttl` is zero they will no > longer be returned. (`explicit_max_ttl` was already not returned if empty.) https://github.com/hashicorp/vault/blob/master/CHANGELOG.md#120-july-30th-2019	2019-10-28 09:33:26 -04:00
Seth Hoenig	8b03477f46	Merge pull request #6448 from hashicorp/f-set-connect-sidecar-tags connect: enable setting tags on consul connect sidecar service in job…	2019-10-17 15:14:09 -05:00
Seth Hoenig	039fbd3f3b	connect: enable setting tags on consul connect sidecar service in jobspec (#6415 )	2019-10-17 19:25:20 +00:00
Mahmood Ali	4e4a9b252c	Merge pull request #6290 from hashicorp/r-generated-code-refactor dev: avoid codecgen code in downstream projects	2019-10-15 08:22:31 -04:00
Danielle	fee482ae6c	Merge pull request #6331 from hashicorp/dani/f-volume-mount-propagation volumes: Add support for mount propagation	2019-10-14 14:29:40 +02:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows the container to see new mounts created on the host and, importantly, _allows the container to create mounts that are visible in other containers and on the host_. private and host-to-task are safe, but bidirectional mounts can be dangerous: if the code inside a container creates a mount and does not clean it up before tearing down the container, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a privileged execution environment inside the container.	2019-10-14 14:09:58 +02:00
Mahmood Ali	4b2ba62e35	acl: check ACL against object namespace Fix a bug where a malicious user can access or manipulate an alloc in a namespace they don't have access to. The allocation endpoints perform ACL checks against the request namespace, not the allocation namespace, and perform the allocation lookup independently from namespaces. Here, we check that the requester can access the alloc namespace regardless of the declared request namespace. Ideally, we'd enforce that the declared request namespace matches the actual allocation namespace. Unfortunately, we haven't documented alloc endpoints as namespaced functions; we suspect starting to enforce this will be very disruptive and inappropriate for a nomad point release. As such, we maintain current behavior that doesn't require passing the proper namespace in request. A future major release may start enforcing checking declared namespace.	2019-10-08 12:59:22 -04:00
Mahmood Ali	674a457865	use RequestNamespace(), the canonical way to get namespace	2019-09-27 07:40:58 -04:00
Mahmood Ali	e29ee4c400	nomad: defensive check for namespaces in job registration call In a job registration request, ensure that the request namespace "header" and job namespace field match. This should be the case already in prod, as the HTTP handlers ensure that the values match [1]. This mitigates bugs where we may check one value but act on another, resulting in bypassing the ACL system. [1] https://github.com/hashicorp/nomad/blob/v0.9.5/command/agent/job_endpoint.go#L415-L418	2019-09-26 17:02:47 -04:00
Lang Martin	fb41dd86ba	default raft protocol v2	2019-09-24 14:37:55 -04:00
Lang Martin	31d7f116dd	nomad/server comments	2019-09-24 14:36:18 -04:00
Tim Gross	cd9c23617f	client/connect: ConsulProxy LocalServicePort/Address (#6358 ) Without a `LocalServicePort`, Connect services will try to use the mapped port even when delivering traffic locally. A user can override this behavior by pinning the port value in the `service` stanza but this prevents us from using the Consul service name to reach the service. This commits configures the Consul proxy with its `LocalServicePort` and `LocalServiceAddress` fields.	2019-09-23 14:30:48 -04:00
Danielle Lancashire	78b61de45f	config: Hoist volume.config.source into volume Currently, using a Volume in a job uses the following configuration: ``` volume "alias-name" { type = "volume-type" read_only = true config { source = "host_volume_name" } } ``` This commit migrates to the following: ``` volume "alias-name" { type = "volume-type" source = "host_volume_name" read_only = true } ``` The original design stemmed from uncertainty about the future of storage plugins and aimed to allow maximum flexibility. However, this causes a few issues, namely: - We frequently need to parse this configuration during submission, scheduling, and mounting - It complicates the configuration from an end user's perspective - It complicates the ability to do validation As we understand the problem space of CSI a little more, it has become clear that we won't need the `source` to be in config, as it will be used in the majority of cases: - Host Volumes: Always need a source - Preallocated CSI Volumes: Always need a source from a volume or claim name - Dynamic Persistent CSI Volumes: Always need a source to attach the volumes to for managing upgrades and to avoid dangling. - Dynamic Ephemeral CSI Volumes: Less thought out, but `source` will probably point to the plugin name, and a `config` block will allow you to pass meta to the plugin. Or will point to a pre-configured ephemeral config. *If implemented The new design simplifies this by merging the source into the volume stanza to solve the above issues with usability, performance, and error handling.	2019-09-13 04:37:59 +02:00
Mahmood Ali	4b8280e51d	remove generated code	2019-09-06 19:24:15 +00:00
Nomad Release bot	dc7d728a82	Generate files for 0.10.0-beta1 release	2019-09-06 18:47:09 +00:00
Mahmood Ali	01f42053e4	dev: avoid codecgen code in downstream projects This is an attempt to ease dependency management for external driver plugins, by avoiding requiring them to compile ugorji/go generated files. Plugin developers reported some pain with the brittleness of ugorji/go dependency in particular, especially when using go mod, the default go mod manager in golang 1.13. Context -------- Nomad uses msgpack to persist and serialize internal structs, using ugorji/go library. As an optimization, we use ugorji/go code generation to speed up the process and avoid the reflection-based slow path. We commit these generated files in repository when we cut and tag the release to ease reproducibility and debugging old releases. Thus, downstream projects that depend on a release tag indirectly depend on ugorji/go generated code. Sadly, the generated code is brittle and specific to the version of ugorji/go being used. When go mod picks another version of ugorji/go than nomad (go mod by default uses release according to semver), downstream projects face compilation errors. Interestingly, downstream projects don't commonly serialize nomad internal structs. Drivers and device plugins use grpc instead of msgpack for the most part. In the few cases where they use msgpack (e.g. decoding task config), they do without codegen path as they run on driver specific structs not the nomad internal structs. Also, the ugorji/go serialization through reflection is generally backward compatible (mod some ugorji/go regression bugs that get introduced every now and then :( ). Proposal --------- The proposal here is to keep committing ugorji/go codec generated files for releases but to use a go tag for them. All nomad development through the makefile, including releasing, CI and dev flow, has the tag enabled. Downstream plugin projects, by default, will skip these files and life proceeds as normal for them. The downside is that nomad developers who use generated code but avoid using make must start passing additional go tag argument. Though this is not a blessed configuration.	2019-09-06 09:22:00 -04:00
Mahmood Ali	6d73ca0cfb	Merge pull request #6250 from hashicorp/f-raft-protocol-v3 Update default raft protocol to version 3	2019-09-04 09:34:41 -04:00
Mahmood Ali	c94a5ef1f8	tests: give up on TestAutopilot_CleanupStaleRaftServer for now	2019-09-04 09:10:53 -04:00
Nick Ethier	6a90a9f505	structs: canonicalize tg Services and Networks (#6257 )	2019-09-04 08:55:47 -04:00
Mahmood Ali	6cefd8f97e	tests: attempt to fix TestAutopilot_CleanupStaleRaftServer Also add a utility function for waiting for stable leadership	2019-09-04 08:49:33 -04:00
Mahmood Ali	035a7a94d9	tests: update time sensitive tests Fix tests whose messages seem timing dependent.	2019-09-04 08:45:25 -04:00
Mahmood Ali	0beb757b6f	tests: disable server auto join by default Tests typically call join cluster directly rather than rely on consul discovery. Worse, consul discovery seems to cause additional leadership transitions when a server is shutdown in tests than tests expect.	2019-09-04 07:54:54 -04:00
Mahmood Ali	3e2ab6e2a3	address review feedback	2019-09-03 21:44:39 -04:00
Mahmood Ali	0a6d73020c	use current nomad version in testing	2019-09-03 21:42:41 -04:00
Mahmood Ali	9bd56587cd	Fix raft tests Wait until leadership stabilizes and all non-voters get promoted before killing leader	2019-09-03 14:53:29 -04:00
Michael Schurter	5957030d18	connect: add unix socket to proxy grpc for envoy (#6232 ) * connect: add unix socket to proxy grpc for envoy Fixes #6124 Implement a L4 proxy from a unix socket inside a network namespace to Consul's gRPC endpoint on the host. This allows Envoy to connect to Consul's xDS configuration API. * connect: pointer receiver on structs with mutexes * connect: warn on all proxy errors	2019-09-03 08:43:38 -07:00
Buck Doyle	21ec6a237c	Merge branch 'master' into f-policy-json # Conflicts: # CHANGELOG.md	2019-09-03 09:56:25 -05:00
Jasmine Dahilig	4edebe389a	add default update stanza and max_parallel=0 disables deployments (#6191 )	2019-09-02 10:30:09 -07:00
Buck Doyle	ab96785fc9	Change test to use valid HCL for rules	2019-08-29 16:09:02 -05:00
Buck Doyle	4a159f5dcf	Change parsing error to set rules to nil	2019-08-29 15:50:34 -05:00
Buck Doyle	5495a7e689	Add standard error-handling for parse failure	2019-08-29 11:12:02 -05:00
Buck Doyle	8b06712d21	Merge branch 'master' into f-policy-json	2019-08-29 11:11:21 -05:00
Mahmood Ali	3da10b5cb3	scheduler: tests for multiple drivers in TG	2019-08-29 09:03:31 -04:00
Mahmood Ali	a67f5f0565	update tests to run with v2	2019-08-28 16:42:08 -04:00
Mahmood Ali	6eabf53b91	Default raft protocol to version 3	2019-08-28 15:56:59 -04:00
Michael Schurter	f5792635ca	Merge pull request #6218 from hashicorp/f-consul-defaults consul: use Consul's defaults and env vars	2019-08-28 11:54:44 -07:00
Nick Ethier	9e96971a75	cli: display group ports and address in alloc status command output (#6189 ) * cli: display group ports and address in alloc status command output * add assertions for port.To = -1 case and convert assertions to testify	2019-08-27 23:59:36 -04:00
Nick Ethier	cbb27e74bc	Add environment variables for connect upstreams (#6171 ) * taskenv: add connect upstream env vars + test * set taskenv upstreams instead of appending * Update client/taskenv/env.go Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-27 23:41:38 -04:00
Michael Schurter	3b0e1d8ef7	consul: use Consul's defaults and env vars Use Consul's API package defaults and env vars as Nomad's defaults.	2019-08-27 14:56:52 -07:00
Mahmood Ali	3791a70aa9	Merge pull request #5676 from hashicorp/f-b-upgrade-ugorji-dep-20190508 Update ugorji/go to latest	2019-08-23 18:29:49 -04:00
Jerome Gravel-Niquet	cbdc1978bf	Consul service meta (#6193 ) * adds meta object to service in job spec, sends it to consul * adds tests for service meta * fix tests * adds docs * better hashing for service meta, use helper for copying meta when registering service * tried to be DRY, but looks like it would be more work to use the helper function	2019-08-23 12:49:02 -04:00
Michael Schurter	95b8048553	Merge pull request #6121 from hashicorp/f-connect-bootstrap connect: task hook for bootstrapping envoy sidecar	2019-08-22 10:58:31 -07:00
Michael Schurter	59e0b67c7f	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, bootstrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Danielle Lancashire	2e5f28029f	remove hidden field from host volumes We're not shipping support for "hidden" volumes in 0.10 any more, I'll convert this to an issue+mini RFC for future enhancement.	2019-08-22 08:48:05 +02:00
Danielle	0428284aee	Merge pull request #6180 from hashicorp/dani/readonly-acl Fine grained ACLs for Host Volumes	2019-08-21 22:22:14 +02:00
Danielle Lancashire	91bb67f713	acls: Break mount acl into mount-rw and mount-ro	2019-08-21 21:17:30 +02:00
Nick Ethier	c8556daf37	structs: validate no tcp checks for connect services (#6169 )	2019-08-21 12:42:53 -04:00
Michael Schurter	050cc32fde	Merge pull request #6157 from hashicorp/f-connect-register Register connect enabled group services with Consul	2019-08-20 14:45:38 -07:00
Tim Gross	7dc6ee2d27	structs: add taskgroup networks and services to plan diffs Adds a check for differences in `job.Diff` so that task group networks and services, including new Consul connect stanzas, show up in the job plan outputs.	2019-08-20 16:18:30 -04:00
Michael Schurter	b008fd1724	connect: register group services with Consul Fixes #6042 Add new task group service hook for registering group services like Connect-enabled services. Does not yet support checks.	2019-08-20 12:25:10 -07:00
Tim Gross	a0e923f46c	add optional task field to group service checks	2019-08-20 09:35:31 -04:00
Mahmood Ali	d699a70875	Merge pull request #5911 from hashicorp/b-rpc-consistent-reads Block rpc handling until state store is caught up	2019-08-20 09:29:37 -04:00
Nick Ethier	24f5a4c276	sidecar_task override in connect admission controller (#6140 ) * structs: use separate SidecarTask struct for sidecar_task stanza and add merge * nomad: merge SidecarTask into proxy task during connect Mutate hook	2019-08-20 01:22:46 -04:00
Nick Ethier	965f00b2fc	Builtin Admission Controller Framework (#6116 ) * nomad: add admission controller framework * nomad: add admission controller framework and Consul Connect hooks * run admission controllers before checking permissions * client: add default node meta for connect configurables * nomad: remove validateJob func since it has been moved to admission controller * nomad: use new TaskKind type * client: use consts for connect sidecar image and log level * Apply suggestions from code review Co-Authored-By: Michael Schurter <mschurter@hashicorp.com> * nomad: add job register test with connect sidecar * Update nomad/job_endpoint_hooks.go Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-15 11:22:37 -04:00
Preetha Appan	72e45dd01e	More code review feedback	2019-08-12 17:41:40 -05:00
Preetha	76c8a11b31	Apply suggestions from code review Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-12 17:03:30 -05:00
Preetha Appan	219dc05541	Fix type for kind	2019-08-12 14:39:50 -05:00
Preetha Appan	35506c516d	Improve validation logic and add table driven tests	2019-08-12 14:39:50 -05:00
Preetha Appan	d324a9864e	Add validation for kind field if it is a consul connect proxy	2019-08-12 14:39:50 -05:00
Danielle Lancashire	b38c1d810e	job_endpoint: Validate volume permissions	2019-08-12 15:39:09 +02:00
Danielle Lancashire	33db40d4e6	structs: Document VolumeMount	2019-08-12 15:39:08 +02:00
Danielle Lancashire	861caa9564	HostVolumeConfig: Source -> Path	2019-08-12 15:39:08 +02:00
Danielle Lancashire	e132a30899	structs: Unify Volume and VolumeRequest	2019-08-12 15:39:08 +02:00
Danielle Lancashire	6d7b417e54	structs: Add declarations of basic structs for volume support	2019-08-12 15:39:08 +02:00
Nick Ethier	1871c1edbc	Add sidecar_task stanza parsing (#6104 ) * jobspec: breakup parse.go into smaller files * add sidecar_task parsing to jobspec and api * jobspec: combine service parsing logic for task and group service stanzas * api: use slice of ConsulUpstream values instead of pointers	2019-08-09 15:18:53 -04:00
Preetha Appan	a393ea79e8	Add field "kind" to task for use in connect tasks	2019-08-07 18:43:36 -05:00
Jasmine Dahilig	8d980edd2e	add create and modify timestamps to evaluations (#5881 )	2019-08-07 09:50:35 -07:00
Michael Schurter	3e4796799a	Merge pull request #6003 from pete-woods/add-job-status-metrics nomad: add job status metrics	2019-08-07 08:02:16 -07:00
Michael Schurter	d2862b33e6	Merge pull request #6045 from hashicorp/f-connect-groupservice consul: add Connect structs	2019-08-06 15:43:38 -07:00
Michael Schurter	ef9d100d2f	Merge pull request #6082 from hashicorp/b-vault-deadlock vault: fix deadlock in SetConfig	2019-08-06 15:30:17 -07:00
Michael Schurter	ecb1a65bb9	Merge pull request #6077 from hashicorp/b-vault-revlock vault: fix race in accessor revocations	2019-08-06 14:28:47 -07:00
Michael Schurter	b8e127b3c0	vault: ensure SetConfig calls are serialized This is a defensive measure as SetConfig should only be called serially.	2019-08-06 11:17:10 -07:00
Michael Schurter	5022341b27	vault: fix deadlock in SetConfig This seems to be the minimum viable patch for fixing a deadlock between establishConnection and SetConfig. SetConfig calls tomb.Kill+tomb.Wait while holding v.lock. establishConnection needs to acquire v.lock to exit but SetConfig is holding v.lock until tomb.Wait exits. tomb.Wait can't exit until establishConnect does! ``` SetConfig -> tomb.Wait ^ \| \| v v.lock <- establishConnection ```	2019-08-06 10:40:14 -07:00
Michael Schurter	17fd82d6ad	consul: add Connect structs Refactor all Consul structs into {api,structs}/services.go because api/tasks.go didn't make sense anymore and structs/structs.go is gigantic.	2019-08-06 08:15:07 -07:00
Michael Schurter	d0a83eb818	vault: fix race in accessor revocations	2019-08-05 15:08:04 -07:00
Preetha Appan	8b298621ef	Add more comments to clarify job.Stable field	2019-08-05 15:00:53 -05:00
Preetha Appan	e6a496bac0	Code review feedback	2019-07-31 01:04:08 -04:00
Preetha Appan	99eca85206	Scheduler changes to support network at task group level Also includes unit tests for binpacker and preemption. The tests verify that network resources specified at the task group level are properly accounted for	2019-07-31 01:04:08 -04:00
Michael Schurter	4501fe3c4d	structs: deepcopy shared alloc resources Also DRY up Networks code by using Networks.Copy	2019-07-31 01:04:06 -04:00
Michael Schurter	fb487358fb	connect: add group.service stanza support	2019-07-31 01:04:05 -04:00
Nick Ethier	a03f6a95a2	structs: refactor network validation to separate fn	2019-07-31 01:03:16 -04:00
Danielle	1e7571eb85	fix structs comment Co-Authored-By: nickethier <ncethier@gmail.com>	2019-07-31 01:03:16 -04:00
Nick Ethier	aa7c08679e	structs: Add validations for task group networks	2019-07-31 01:03:16 -04:00
Nick Ethier	6c160df689	fix tests from introducing new struct fields	2019-07-31 01:03:16 -04:00
Nick Ethier	8650429e38	Add network stanza to group Adds a network stanza and additional options to the task group level in prep for allowing shared networking between tasks of an alloc.	2019-07-31 01:03:12 -04:00
Preetha Appan	d048029b5a	remove generated code and change version to 0.10.0	2019-07-30 15:56:05 -05:00
Nomad Release bot	e39fb11531	Generate files for 0.9.4 release	2019-07-30 19:05:18 +00:00
Buck Doyle	0a1a0419cb	Combine conditionals	2019-07-29 10:38:07 -05:00
Buck Doyle	0a082c1e5e	Update assertion to use better failure-reporting	2019-07-29 10:35:07 -05:00
Buck Doyle	c3deb7703d	Update policy endpoint to permit anonymous access	2019-07-26 13:07:42 -05:00
Pete Woods	9096aa3d23	Add job status metrics This avoids having to write services to repeatedly hit the jobs API	2019-07-26 10:12:49 +01:00
Buck Doyle	77f5a38c8f	Add parsed rules to policy response	2019-07-25 10:43:57 -05:00
Preetha Appan	6b4c40f5a8	remove generated code	2019-07-23 12:07:49 -05:00
Nomad Release bot	04187c8b86	Generate files for 0.9.4-rc1 release	2019-07-22 21:42:36 +00:00
Jasmine Dahilig	2157f6ddf1	add formatting for hcl parsing error messages (#5972 )	2019-07-19 10:04:39 -07:00
Lang Martin	f282da4ced	blocked_evals_test disable calls Flush	2019-07-18 10:32:13 -04:00
Lang Martin	8f7a20839e	worker comment system -> core	2019-07-18 10:32:13 -04:00
Lang Martin	83d20169f6	blocked_evals reset system evals on Flush	2019-07-18 10:32:13 -04:00
Lang Martin	6e3425babf	blocked_evals_test Test_UnblockNode	2019-07-18 10:32:12 -04:00
Lang Martin	ea275d5ce7	fsm attach UnblockNode on node updates	2019-07-18 10:32:12 -04:00
Lang Martin	3bf618f217	blocked_evals system evals indexed by job and node	2019-07-18 10:32:12 -04:00
Michael Schurter	81b4b6f19b	Merge pull request #5791 from hashicorp/b-plan-snapshotindex nomad: include snapshot index when submitting plans	2019-07-17 09:25:00 -07:00
Mahmood Ali	ad39bcef60	rpc: use tls wrapped connection for streaming rpc This ensures that server-to-server streaming RPC calls use the tls wrapped connections. Prior to this, `streamingRpcImpl` function uses tls for setting header and invoking the rpc method, but returns unwrapped tls connection. Thus, streaming writes fail with tls errors. This tls streaming bug existed since 0.8.0[1], but PR #5654[2] exacerbated it in 0.9.2. Prior to PR #5654, nomad client used to shuffle servers at every heartbeat -- `servers.Manager.setServers`[3] always shuffled servers and was called by heartbeat code[4]. Shuffling servers meant that a nomad client would heartbeat and establish a connection against all nomad servers eventually. When handling streaming RPC calls, nomad servers used these local connection to communicate directly to the client. The server-to-server forwarding logic was left mostly unexercised. PR #5654 means that a nomad client may connect to a single server only and caused the server-to-server forward streaming RPC code to get exercised more and unearthed the problem. [1] https://github.com/hashicorp/nomad/blob/v0.8.0/nomad/rpc.go#L501-L515 [2] https://github.com/hashicorp/nomad/pull/5654 [3] https://github.com/hashicorp/nomad/blob/v0.9.1/client/servers/manager.go#L198-L216 [4] https://github.com/hashicorp/nomad/blob/v0.9.1/client/client.go#L1603	2019-07-12 14:41:44 +08:00
Mahmood Ali	9c9bec62fd	rpc: add positive tests for server streaming RPC	2019-07-12 14:32:52 +08:00
Lang Martin	0b97175a16	node_endpoint preserve both messages as rpcs and in raft	2019-07-10 13:56:20 -04:00
Lang Martin	ee4848167c	core_sched add compat comment for later removal	2019-07-10 13:56:20 -04:00
Lang Martin	c13c97c6c2	structs drop deprecation warning, revert unnecessary comment change	2019-07-10 13:56:20 -04:00
Lang Martin	a95225d754	NodeDeregisterBatch -> NodeBatchDeregister match JobBatch pattern	2019-07-10 13:56:20 -04:00
Lang Martin	a8e72a5b68	state_store error if called without node_ids	2019-07-10 13:56:20 -04:00
Lang Martin	44cbca9b98	fsm new NodeDeregisterBatchRequestType sorted at the end of the case	2019-07-10 13:56:20 -04:00
Lang Martin	91e139dcb5	structs NodeDeregisterBatchRequestType must go at the end	2019-07-10 13:56:20 -04:00
Lang Martin	1cc6b4062c	fsm label batch_deregister_node metrics explicitly Co-Authored-By: Mahmood Ali <mahmood@notnoop.com>	2019-07-10 13:56:20 -04:00
Lang Martin	ad3549f906	core_sched use the new rpc names	2019-07-10 13:56:20 -04:00
Lang Martin	ce0f03651a	fsm support new NodeDeregisterBatchRequest	2019-07-10 13:56:20 -04:00
Lang Martin	fa5649998e	node endpoint support new NodeDeregisterBatchRequest	2019-07-10 13:56:19 -04:00
Lang Martin	683ab8d1d2	structs add NodeDeregisterBatchRequest	2019-07-10 13:56:19 -04:00
Lang Martin	82349aba5d	node_endpoint argument setup	2019-07-10 13:56:19 -04:00
Lang Martin	6dbf5d7d13	fsm return an error on both NodeDeregisterRequest fields set	2019-07-10 13:56:19 -04:00
Lang Martin	fbc78ba96c	fsm variable names for consistency	2019-07-10 13:56:19 -04:00
Lang Martin	09fd05bd8f	node_endpoint raft store then shutdown, test deprecation	2019-07-10 13:56:19 -04:00
Lang Martin	4610c70777	util simplify partitionAll	2019-07-10 13:56:19 -04:00
Lang Martin	d22d9fb5b2	core_sched check ServersMeetMinimumVersion	2019-07-10 13:56:19 -04:00
Lang Martin	3bf41211fb	fsm honor new and old style NodeDeregisterRequests	2019-07-10 13:56:19 -04:00
Lang Martin	3fb82e83a5	structs add back NodeDeregisterRequest.NodeID, compatibility	2019-07-10 13:56:19 -04:00
Lang Martin	a4472e3d34	core_sched check ServersMeetMinimumVersion, send old node deregister	2019-07-10 13:56:19 -04:00
Lang Martin	8e53c105fc	state_store just one index update, test deletion	2019-07-10 13:56:19 -04:00
Lang Martin	3e2d1f0338	node_endpoint improve error messages	2019-07-10 13:56:19 -04:00
Lang Martin	5a6a947e98	state_store improve error messages	2019-07-10 13:56:19 -04:00
Lang Martin	fd14cedf95	drainer watch_nodes_test batch of 1	2019-07-10 13:56:19 -04:00
Lang Martin	b176066d42	node_endpoint deregister the batch of nodes	2019-07-10 13:56:19 -04:00
Lang Martin	a97407e030	fsm NodeDeregisterRequest is now a batch	2019-07-10 13:56:19 -04:00
Lang Martin	d5ff2834ca	core_sched batch node deregistration requests	2019-07-10 13:56:19 -04:00
Lang Martin	10848841be	util partitionAll for paging	2019-07-10 13:56:19 -04:00
Lang Martin	be2d6853cb	state_store DeleteNode operates on a batch of ids	2019-07-10 13:56:19 -04:00
Lang Martin	77cf037bff	struct NodeDeregisterRequest has a batch of NodeIDs	2019-07-10 13:56:19 -04:00
Mahmood Ali	ea3a98357f	Block rpc handling until state store is caught up Here, we ensure that the leader only responds to RPC calls when the state store is up to date. At leadership transition or launch with restored state, the server local store might not be caught up with latest raft logs and may return a stale read. The solution here is to have an RPC consistency read gate, enabled when `establishLeadership` completes before we respond to RPC calls. `establishLeadership` is gated by a `raft.Barrier` which ensures that all prior raft logs have been applied. Conversely, the gate is disabled when leadership is lost. This is very much inspired by https://github.com/hashicorp/consul/pull/3154/files	2019-07-02 16:07:37 +08:00
Preetha Appan	3cb798235d	Missed one revert of backwards compatibility for node drain	2019-07-01 16:46:05 -05:00
Preetha Appan	aa2b4b4e00	Undo removal of node drain compat changes Decided to remove that in 0.10	2019-07-01 15:12:01 -05:00
Preetha Appan	3484f18984	Fix more tests	2019-06-26 16:30:53 -05:00
Preetha Appan	ff1b80dba6	Fix node drain test	2019-06-26 16:12:07 -05:00
Preetha Appan	23319e04d6	Restore accidentally deleted block	2019-06-26 13:59:14 -05:00
Michael Schurter	69ba495f0c	nomad: expand comments on subtle plan apply behaviors	2019-06-26 08:49:24 -07:00
Preetha Appan	66fa6a67ec	newline	2019-06-25 19:41:09 -05:00
Preetha Appan	10e7d6df6d	Remove compat code associated with many previous versions of nomad This removes compat code for namespaces (0.7), Drain(0.8) and other older features from releases older than Nomad 0.7	2019-06-25 19:05:25 -05:00
Michael Schurter	e4bc943a68	nomad: SnapshotAfter -> SnapshotMinIndex Rename SnapshotAfter to SnapshotMinIndex. The old name was not technically accurate. SnapshotAtOrAfter is more accurate, but wordy and still lacks context about what precisely it is at or after (the index). SnapshotMinIndex was chosen as it describes the action (snapshot), a constraint (minimum), and the object of the constraint (index).	2019-06-24 12:16:46 -07:00
Michael Schurter	0f8164b2f1	nomad: evaluate plans after previous plan index The previous commit prevented evaluating plans against a state snapshot which is older than the snapshot at which the plan was created. This is correct and prevents failures trying to retrieve referenced objects that may not exist until the plan's snapshot. However, this is insufficient to guarantee consistency if the following events occur: 1. P1, P2, and P3 are enqueued with snapshot @ 100 2. Leader evaluates and applies Plan P1 with snapshot @ 100 3. Leader evaluates Plan P2 with snapshot+P1 @ 100 4. P1 commits @ 101 5. Leader evaluates and applies Plan P3 with snapshot+P2 @ 100 Since only the previous plan is optimistically applied to the state store, the snapshot used to evaluate a plan may not contain the N-2 plan! To ensure plans are evaluated and applied serially we must consider all previous plans' committed indexes when evaluating further plans. Therefore combined with the last PR, the minimum index at which to evaluate a plan is: min(previousPlanResultIndex, plan.SnapshotIndex)	2019-06-24 12:16:46 -07:00
Michael Schurter	e10fea1d7a	nomad: include snapshot index when submitting plans Plan application should use a state snapshot at or after the Raft index at which the plan was created otherwise it risks being rejected based on stale data. This commit adds a Plan.SnapshotIndex which is set by workers when submitting plan. SnapshotIndex is set to the Raft index of the snapshot the worker used to generate the plan. Plan.SnapshotIndex plays a similar role to PlanResult.RefreshIndex. While RefreshIndex informs workers their StateStore is behind the leader's, SnapshotIndex is a way to prevent the leader from using a StateStore behind the worker's. Plan.SnapshotIndex should be considered the lower bound index for consistently handling plan application. Plans must also be committed serially, so Plan N+1 should use a state snapshot containing Plan N. This is guaranteed for plans after the first plan after a leader election. The Raft barrier on leader election ensures the leader's statestore has caught up to the log index at which it was elected. This guarantees its StateStore is at an index > lastPlanIndex.	2019-06-24 12:16:46 -07:00
Chris Baker	59fac48d92	alloc lifecycle: 404 when attempting to stop non-existent allocation	2019-06-20 21:27:22 +00:00
Preetha	586e50d1a4	Merge pull request #5841 from hashicorp/f-raft-snapshot-metrics Raft and state store indexes as metrics	2019-06-19 12:01:03 -05:00
Preetha Appan	dc0ac81609	Change interval of raft stats collection to 10s	2019-06-19 11:58:46 -05:00
Preetha Appan	104d66f10c	Changed name of metric	2019-06-17 15:51:31 -05:00
Chris Baker	e0170e1c67	metrics: add namespace label to allocation metrics	2019-06-17 20:50:26 +00:00
Preetha Appan	c54b4a5b17	Emit metrics with raft commit and apply index and statestore latest index	2019-06-14 16:30:27 -05:00
Jasmine Dahilig	ed9740db10	Merge pull request #5664 from hashicorp/f-http-hcl-region backfill region from hcl for jobUpdate and jobPlan	2019-06-13 12:25:01 -07:00
Jasmine Dahilig	51e141be7a	backfill region from job hcl in jobUpdate and jobPlan endpoints - updated region in job metadata that gets persisted to nomad datastore - fixed many unrelated unit tests that used an invalid region value (they previously passed because hcl wasn't getting picked up and the job would default to global region)	2019-06-13 08:03:16 -07:00
Nick Ethier	1b7fa4fe29	Optional Consul service tags for nomad server and agent services (#5706 ) Optional Consul service tags for nomad server and agent services	2019-06-13 09:00:35 -04:00
Mahmood Ali	e31159bf1f	Prepare for 0.9.4 dev cycle	2019-06-12 18:47:50 +00:00
Nomad Release bot	4803215109	Generate files for 0.9.3 release	2019-06-12 16:11:16 +00:00
Mahmood Ali	07f2c77c44	comment DenormalizeAllocationDiffSlice applies to terminal allocs only	2019-06-12 08:28:43 -04:00
Lang Martin	fe8a4781d8	config merge maintains *HCL string fields used for duration conversion	2019-06-11 16:34:04 -04:00
Mahmood Ali	392f5bac44	Stop updating allocs.Job on stopping or preemption	2019-06-10 18:30:20 -04:00
Mahmood Ali	6c8e329819	test that stopped alloc jobs aren't modified When an alloc is stopped, test that we don't update the job found in alloc with a new job that is no longer relevant for this alloc.	2019-06-10 17:14:26 -04:00
Mahmood Ali	d30c3d10b0	Merge pull request #5747 from hashicorp/b-test-fixes-20190521-1 More test fixes	2019-06-05 19:09:18 -04:00
Mahmood Ali	87173111de	Merge pull request #5746 from hashicorp/b-no-updating-inmem-node set node.StatusUpdatedAt in raft	2019-06-05 19:05:21 -04:00
Mahmood Ali	97957fbf75	Prepare for 0.9.3 dev cycle	2019-06-05 14:54:00 +00:00
Nomad Release bot	43bfbf3fcc	Generate files for 0.9.2 release	2019-06-05 11:59:27 +00:00
Michael Schurter	073893f529	nomad: disable service+batch preemption by default Enterprise only. Disable preemption for service and batch jobs by default. Maintain backward compatibility in a x.y.Z release. Consider switching the default for new clusters in the future.	2019-06-04 15:54:50 -07:00
Michael Schurter	a8fc50cc1b	nomad: revert use of SnapshotAfter in planApply Revert plan_apply.go changes from #5411 Since non-Command Raft messages do not update the StateStore index, SnapshotAfter may unnecessarily block and needlessly fail in idle clusters where the last Raft message is a non-Command message. This is trivially reproducible with the dev agent and a job that has 2 tasks, 1 of which fails. The correct logic would be to SnapshotAfter the previous plan's index to ensure consistency. New clusters or newly elected leaders will not have a previous plan, so the index the leader was elected should be used instead.	2019-06-03 15:34:21 -07:00
Mahmood Ali	a4ead8ff79	remove 0.9.2-rc1 generated code	2019-05-23 11:14:24 -04:00
Nomad Release bot	6d6bc59732	Generate files for 0.9.2-rc1 release	2019-05-22 19:29:30 +00:00
Lang Martin	d46613ff44	structs check TaskGroup.Update for nil	2019-05-22 12:34:57 -04:00
Lang Martin	10a3fd61b0	comment replace COMPAT 0.7.0 for job.Update with more current info	2019-05-22 12:34:57 -04:00
Lang Martin	67ebcc47dd	structs comment todo DeploymentStatus & DeploymentStatusDescription	2019-05-22 12:34:57 -04:00
Lang Martin	21bf9fdf90	structs job warnings for taskgroup with mixed auto_promote settings	2019-05-22 12:34:57 -04:00
Lang Martin	0f6f543a5f	deployment_watcher auto promote iff every task group is auto promotable	2019-05-22 12:34:57 -04:00
Lang Martin	d27d6f8ede	structs validate requires Canary for AutoPromote	2019-05-22 12:32:08 -04:00
Lang Martin	0c668ecc7a	log error on autoPromoteDeployment failure	2019-05-22 12:32:08 -04:00
Lang Martin	f23f9fd99e	describe a pending deployment without auto_promote more explicitly	2019-05-22 12:32:08 -04:00
Lang Martin	34230577df	describe a pending deployment with auto_promote accurately	2019-05-22 12:32:08 -04:00
Lang Martin	b5fd735960	add update AutoPromote bool	2019-05-22 12:32:08 -04:00
Lang Martin	3c5a9fed22	deployments_watcher_test new TestWatcher_AutoPromoteDeployment	2019-05-22 12:32:08 -04:00
Lang Martin	0bebf5d7f8	deployment_watcher when it's ok to autopromote, do so	2019-05-22 12:32:08 -04:00
Lang Martin	0cf4168ed9	deployments_watcher comments	2019-05-22 12:32:08 -04:00
Lang Martin	0c403eafde	state_store typo in a comment	2019-05-22 12:32:08 -04:00
Lang Martin	e1e28307be	new deploymentwatcher/doc.go for package level documentation	2019-05-22 12:32:08 -04:00
Mahmood Ali	9ff5f163b5	update callers in tests	2019-05-21 21:10:17 -04:00
Mahmood Ali	6bdbeed319	set node.StatusUpdatedAt in raft Fix a case where `node.StatusUpdatedAt` was manipulated directly in memory. This ensures that StatusUpdatedAt is set in raft layer, and ensures that the field is updated when node drain/eligibility is updated too.	2019-05-21 16:13:32 -04:00
Mahmood Ali	2159d0f3ac	tests: fix some nomad/drainer test data races	2019-05-21 14:40:58 -04:00
Mahmood Ali	3b0152d778	tests: fix deploymentwatcher tests data races	2019-05-21 14:29:45 -04:00
Michael Schurter	689794e08d	nomad: fix deadlock in UnblockClassAndQuota Previous commit could introduce a deadlock if the capacityChangeCh was full and the receiving side exited before freeing a slot for the sending side could send. Flush would then block forever waiting to acquire the lock just to throw the pending update away. The race is around getting/setting the chan field, not chan operations, so only lock around getting the chan field.	2019-05-20 15:41:52 -07:00
Michael Schurter	8c99214f69	nomad: fix race in BlockedEvals I assume the mutex was being released before sending on capacityChangeCh to avoid blocking in the critical section, but: 1. This is a race. 2. capacityChangeCh has a huge buffer (8096). If it's full things already seem Very Bad, and a little backpressure seems appropriate.	2019-05-20 15:26:20 -07:00
Michael Schurter	05a9c6aedb	Merge pull request #5411 from hashicorp/b-snapshotafter Block plan application until state store has caught up to raft	2019-05-20 14:03:10 -07:00
Mahmood Ali	cd64ada95d	Run TestClientAllocations_Restart_ACL test	2019-05-17 20:30:23 -04:00
Michael Schurter	0e39927782	nomad: emit more detailed error Avoid returning context.DeadlineExceeded as it lacks helpful information and is often ignored or handled specially by callers.	2019-05-17 14:37:42 -07:00
Michael Schurter	b80a7e0feb	nomad: wait for state store to sync in plan apply Wait for state store to catch up with raft when applying plans.	2019-05-17 14:37:12 -07:00
Michael Schurter	1bc731da47	nomad: remove unused NotifyGroup struct I don't think it's been used for a long time.	2019-05-17 13:30:23 -07:00
Michael Schurter	9732bc37ff	nomad: refactor waitForIndex into SnapshotAfter Generalize wait for index logic in the state store for reuse elsewhere. Also begin plumbing in a context to combine handling of timeouts and shutdown.	2019-05-17 13:30:23 -07:00
Preetha	c8fdf20c66	Merge pull request #5717 from hashicorp/b-plan-apply-preemptions Fix bug in plan applier introduced in PR-5602	2019-05-16 11:01:05 -05:00
Preetha	2dcd4291f8	Merge pull request #5702 from hashicorp/f-filter-by-create-index Filter deployments by create index	2019-05-15 21:50:41 -05:00
Preetha	555dd23c2c	remove stray newline Co-Authored-By: Danielle <dani@builds.terrible.systems>	2019-05-15 21:11:52 -05:00
Preetha Appan	2b787aad7e	Fix bug in plan applier introduced in PR-5602 This fixes a bug in the state store during plan apply. When denormalizing preempted allocations it incorrectly set the preemptor's job during the update. This eventually causes a panic downstream in the client. Added a test assertion that failed before and passes after this fix	2019-05-15 20:34:06 -05:00
Danielle	d202582502	Merge pull request #5699 from hashicorp/dani/b-eval-broker-lifetime Eval Broker: Prevent redundant enqueue's when a node is not a leader	2019-05-15 23:30:52 +01:00
Danielle Lancashire	2fb93a6229	evalbroker: test for no enqueue on disabled	2019-05-15 11:02:21 +02:00
Nick Ethier	ade97bc91f	fixup #5172 and rebase against master	2019-05-14 14:37:34 -04:00
Nick Ethier	cab6a95668	Merge branch 'master' into pr/5172 * master: (912 commits) Update redirects.txt Added redirect for Spark guide link client: log when server list changes docs: mention regression in task config validation fix update to changelog update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665 typo: "atleast" -> "at least" implement nomad exec for rkt docs: fixed typo use pty/tty terminology similar to github.com/kr/pty vendor github.com/kr/pty drivers: implement streaming exec for executor based drivers executors: implement streaming exec executor: scaffolding for executor grpc handling client: expose allocated memory per task client improve a comment in updateNetworks stalebot: Add 'thinking' as an exempt label (#5684) Added Sparrow link update links to use new canonical location Add redirects for restructing done in GH-5667 ...	2019-05-14 14:10:33 -04:00
Michael Schurter	d7e5ace1ed	client: do not restart dead tasks until server is contacted Fixes #1795 Running restored allocations and pulling what allocations to run from the server happen concurrently. This means that if a client is rebooted, and has its allocations rescheduled, it may restart the dead allocations before it contacts the server and determines they should be dead. This commit makes tasks that fail to reattach on restore wait until the server is contacted before restarting.	2019-05-14 10:53:27 -07:00
Danielle Lancashire	d9815888ed	evalbroker: Simplify nextDelayedEval locking	2019-05-14 14:06:27 +02:00
Danielle Lancashire	38562afbc1	evalbroker: No new enqueues when disabled Currently when an evalbroker is disabled, it still receives delayed enqueues via log application in the fsm. This causes an ever growing heap of evaluations that will never be drained, and can cause memory issues in larger clusters, or when left running for an extended period of time without a leader election. This commit prevents the enqueuing of evaluations while we are disabled, and relies on the leader restoreEvals routine to handle reconciling state during a leadership transition. Existing dequeues during an Enabled->Disabled broker state transition are handled by the enqueueLocked function dropping evals.	2019-05-14 13:59:10 +02:00
Danielle Lancashire	c91ae21a6c	evalbroker: Flush within update lock Primarily a cleanup commit, however, currently there is a potential race condition (that I'm not sure we've ever actually hit) during a flapping SetEnabled/Disabled state where we may never correctly restart the eval broker, if it was being called from multiple routines.	2019-05-14 13:26:56 +02:00
Preetha Appan	4d3f74e161	Fix test setup to have correct jobcreateindex for deployments	2019-05-13 18:53:47 -05:00
Preetha Appan	d448750449	Lookup job only once, and fix tests	2019-05-13 18:33:41 -05:00
Preetha Appan	07690d6f9e	Add flag similar to --all for allocs to be able to filter deployments by latest	2019-05-13 18:33:41 -05:00
Jasmine Dahilig	30d346ca15	Merge pull request #5665 from hashicorp/b-empty-datacenters add non-empty string validation for datacenters	2019-05-13 10:23:26 -07:00
Mahmood Ali	cf1f3625b4	Update ugorji/go to latest Our testing so far indicates that ugorji/go/codec maintains backward compatibility with the version we are using now, for purposes of Nomad serialization. Using latest ugorji/go allows us to get back to using upstream library, and get the optimization benefits in RPC paths (including code generation optimizations). ugorji/go introduced two significant changes: * time binary format in `debb8e2d2e`. Setting `h.BasicHandle.TimeNotBuiltin = true` restores old behavior * ugorji/go started honoring `json` tag as well: v1.1.4 is the latest but has a bug in handling RawString that's fixed in `d09a80c1e0` .	2019-05-09 19:35:58 -04:00
Mahmood Ali	919827f2df	Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base nomad exec part 1: plumbing and docker driver	2019-05-09 18:09:27 -04:00
Mahmood Ali	3c668732af	server: server forwarding logic for nomad exec endpoint	2019-05-09 16:49:08 -04:00
Jasmine Dahilig	0ba2bd15b9	add unit tests for datacenter non-empty string validation	2019-05-08 11:51:52 -07:00
Mahmood Ali	9d3f13e9b3	remove Index field from EmitNodeEventsResponse `Index` is already included as part of `WriteMeta` embedding. This is a backward compatible change: Clients never read the field; and Server references to `EmitNodeEventsResponse.Index` would be using the value in `WriteMeta`, which is consistent with other response structs.	2019-05-08 08:42:26 -04:00
Preetha	1538913a2a	Merge pull request #5628 from hashicorp/f-preemption-config Add config to disable preemption for batch/service jobs	2019-05-06 15:40:35 -05:00
Mahmood Ali	f35ad92a8b	Merge pull request #5646 from hashicorp/some-ugorji-fixes Codegen codec helpers for all nomad structs	2019-05-06 13:23:12 -04:00
Lang Martin	9f3f11df97	Merge pull request #5601 from hashicorp/b-config-parse-direct-hcl config parse direct hcl	2019-05-06 12:05:19 -04:00
Mahmood Ali	92c133b905	Update peers info with new raft config details	2019-05-03 16:55:53 -04:00
Preetha Appan	ad3c263d3f	Rename to match system scheduler config. Also added docs	2019-05-03 14:06:12 -05:00
Jasmine Dahilig	016495c368	add non-empty string validation for datacenters	2019-05-03 06:48:02 -07:00
Hemanth Basappa	3fef02aa93	Add support in nomad for supporting raft 3 protocol peers.json	2019-05-02 09:11:23 -07:00
Mahmood Ali	21d21baf8b	codegen codecs for nomad structs `ls *[!_test].go` was ignoring any file that ends with `s.go` (or any of the letter inside `[]`), including `structs.go`!	2019-05-01 12:42:55 -04:00
Lang Martin	598112a1cc	tag HCL bookkeeping keys with json:"-" to keep them out of the api	2019-04-30 10:29:14 -04:00
Lang Martin	5ebae65d1a	agent/config, config/* mapstructure tags -> hcl tags	2019-04-30 10:29:14 -04:00
Preetha Appan	6615d5c868	Add config to disable preemption for batch/service jobs	2019-04-29 18:48:07 -05:00
Lang Martin	371014b781	Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config client fingerprinter doesn't overwrite manual configuration	2019-04-26 12:55:34 -04:00
Danielle Lancashire	3409e0be89	allocs: Add nomad alloc signal command This command will be used to send a signal to either a single task within an allocation, or all of the tasks if <task-name> is omitted. If the sent signal terminates the allocation, it will be treated as if the allocation has crashed, rather than as if it was operator-terminated. Signal validation is currently handled by the driver itself and nomad does not attempt to restrict or validate them.	2019-04-25 12:43:32 +02:00
Arshneet Singh	b7b050cdd1	Change min version required for plan optimization	2019-04-24 12:36:07 -07:00
Arshneet Singh	9cc39edb67	Return error when preempted/stopped alloc doesn't exist during denormalization	2019-04-24 12:36:07 -07:00
Lang Martin	19ba0f4882	structs_test use testify require.True instead of t.Fatal	2019-04-23 17:00:11 -04:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	0dd4c109e8	Compat tags	2019-04-23 09:18:01 -07:00
Arshneet Singh	65f5fab131	Add tests for plan normalization	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Danielle	198a838b61	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	832f607433	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Lang Martin	8aa97cff13	tests over setwise equality of fingerprinted parts	2019-04-19 15:49:24 -04:00
Lang Martin	7de6e28ddc	structs need to keep assert Equal interface implementation for tests	2019-04-19 15:23:49 -04:00
Lang Martin	977d33970b	structs equals use labeled continue for clarity	2019-04-19 15:23:48 -04:00
Lang Martin	7b99488afa	struct equals use a working pattern for setwise comparison	2019-04-19 15:23:48 -04:00
Lang Martin	eba4e29440	client fingerprinter doesn't overwrite manual configuration Revert "Revert accidental merge of pr #5482" This reverts commit c45652ab8c113487b9d4fbfb107782cbcf8a85b0.	2019-04-19 15:23:48 -04:00
Preetha Appan	22109d1e20	Add preemption related fields to AllocationListStub	2019-04-18 10:36:44 -05:00
Lang Martin	a2a1e7829d	Revert accidental merge of pr #5482 Revert "fingerprint Constraints and Affinities have Equals, as set" This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609. Revert "client tests assert the independent handling of interface and speed" This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40. Revert "structs missed applying a style change from the review" This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2. Revert "client, structs comments" This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2. Revert "client fingerprint updateNetworks preserves the network configuration" This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c. Revert "client_test cleanup comments from review" This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445. Revert "client Networks Equals is set equality" This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573. Revert "struct cleanup indentation in RequestedDevice Equals" This reverts commit f4746411cab328215def6508955b160a53452da3. Revert "struct Equals checks for identity before value checking" This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226. Revert "fix client-test, avoid hardwired platform dependecy on lo0" This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7. Revert "refactor error in client fingerprint to include the offending data" This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5. Revert "add client updateNodeResources to merge but preserve manual config" This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032. Revert "refactor struts.RequestedDevice to have its own Equals" This reverts commit 689782524090e51183474516715aa2f34908b8e6. Revert "refactor structs.Resource.Networks to have its own Equals" This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1. Revert "refactor structs.Resource.Devices to have its own Equals" This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9. Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources" This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5. Revert "add structs.Resources Equals" This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b. Revert "test that fingerprint resources are updated, net not clobbered" This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.	2019-04-11 10:29:40 -04:00
Lang Martin	07ff740408	fingerprint Constraints and Affinities have Equals, as set	2019-04-11 09:56:22 -04:00
Lang Martin	8f07698c03	structs missed applying a style change from the review	2019-04-11 09:56:22 -04:00
Lang Martin	7258a13c72	client, structs comments	2019-04-11 09:56:22 -04:00
Lang Martin	1878bf694e	client Networks Equals is set equality	2019-04-11 09:56:22 -04:00
Lang Martin	e1c91afd19	struct cleanup indentation in RequestedDevice Equals	2019-04-11 09:56:22 -04:00
Lang Martin	0c90efebdc	struct Equals checks for identity before value checking	2019-04-11 09:56:22 -04:00
Lang Martin	1a594b53f6	refactor struts.RequestedDevice to have its own Equals	2019-04-11 09:56:21 -04:00
Lang Martin	ec1ccdeda0	refactor structs.Resource.Networks to have its own Equals NodeResource.Networks uses the same function	2019-04-11 09:56:21 -04:00
Lang Martin	06008465c4	refactor structs.Resource.Devices to have its own Equals	2019-04-11 09:56:21 -04:00
Lang Martin	36f3022246	add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources	2019-04-11 09:56:21 -04:00
Lang Martin	d4567e9909	add structs.Resources Equals	2019-04-11 09:56:21 -04:00
Danielle Lancashire	e135876493	allocs: Add nomad alloc restart This adds a `nomad alloc restart` command and api that allows a job operator with the alloc-lifecycle acl to perform an in-place restart of a Nomad allocation, or a given subtask.	2019-04-11 14:25:49 +02:00
Chris Baker	34e100cc96	server vault client: use two vault clients, one with namespace, one without for /sys calls	2019-04-10 10:34:10 -05:00
Michael Schurter	cc7768c170	Update nomad/structs/config/vault.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-04-10 10:34:10 -05:00
Chris Baker	a26d4fe1e5	docs: -vault-namespace, VAULT_NAMESPACE, and config agent: added VAULT_NAMESPACE env-based configuration	2019-04-10 10:34:10 -05:00
Chris Baker	d3041cdb17	wip: added config parsing support, CLI flag, still need more testing, VAULT_ var, documentation	2019-04-10 10:34:10 -05:00
Chris Baker	0eaeef872f	config/docs: added `namespace` to vault config server/client: process `namespace` config, setting on the instantiated vault client	2019-04-10 10:34:10 -05:00
Michael Schurter	c0cd96ef75	Update nomad/job_endpoint_test.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-04-10 10:34:10 -05:00
Michael Schurter	188c32421a	Update nomad/job_endpoint.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-04-10 10:34:10 -05:00
Chris Baker	0ba1600545	server/job_endpoint: accept vault token and pass as part of Job.RegisterRequest [#4555 ]	2019-04-10 10:34:10 -05:00
James Rasell	9470507cf4	Add NodeName to the alloc/job status outputs. Currently when operators need to log onto a machine where an alloc is running they will need to perform both an alloc/job status call and then a call to discover the node name from the node list. This updates both the job status and alloc status output to include the node name within the information to make operator use easier. Closes #2359 Closes #1180	2019-04-10 10:34:10 -05:00
Michael Schurter	45b4827ad7	Bump to 0.9.1-dev	2019-04-09 09:01:48 -07:00
Nomad Release bot	e307734e4a	Generate files for 0.9.0 release	2019-04-09 01:56:00 +00:00
Michael Schurter	3af602b633	Remove 0.9.0-rc2 generated files	2019-04-03 07:41:09 -07:00
Nomad Release bot	16b4336ccf	Generate files for 0.9.0-rc2 release	2019-04-03 01:54:29 +00:00
Michael Schurter	9afbc45cff	Bump to dev post-0.9.0-rc1 release	2019-03-22 08:26:30 -07:00
Nomad Release bot	3ab3dd4105	Generate files for 0.9.0-rc1 release	2019-03-21 19:06:13 +00:00
HashedDan	caad68e799	server: inconsistent receiver notation corrected Signed-off-by: HashedDan <georgedanielmangum@gmail.com>	2019-03-16 17:53:53 -05:00
Alex Dadgar	e779d9444b	Update nomad/eval_endpoint_test.go Co-Authored-By: schmichael <michael.schurter@gmail.com>	2019-03-05 15:19:15 -08:00
Alex Dadgar	1857f5d7c1	Update nomad/eval_endpoint.go Co-Authored-By: schmichael <michael.schurter@gmail.com>	2019-03-05 15:19:07 -08:00
Michael Schurter	e37bbb21a5	nomad: simplify code and improve parameter name	2019-03-04 13:44:14 -08:00
Michael Schurter	05f51499ba	nomad: compare current eval when setting WaitIndex Consider currently dequeued Evaluation's ModifyIndex when determining its WaitIndex. Normally the Evaluation itself would already be in the state store snapshot used to determine the WaitIndex. However, since the FSM applies Raft messages to the state store concurrently with Dequeueing, it's possible the currently dequeued Evaluation won't yet exist in the state store snapshot used by JobsForEval. This can be solved by always considering the current eval's modify index and using it if it is greater than all of the evals returned by the state store.	2019-03-01 15:23:39 -08:00
Michael Schurter	3f386e3951	Remove generated files for 0.9.0-beta3	2019-02-26 10:34:08 -08:00
Michael Schurter	d74755900e	Generate files for 0.9.0-beta3 release	2019-02-26 09:44:49 -08:00
Charlie Voiselle	604c49beb8	Merge pull request #5344 from hashicorp/b-nexteval-for-failed-follow-up Set NextEval when making `failed-follow-up` evals	2019-02-22 14:14:41 -08:00
Charlie Voiselle	006afdca9b	Added comments * caller should create eval id * prev/next eval used in failed-follow-up	2019-02-22 10:22:52 -08:00
Charlie Voiselle	c28c195f42	Set NextEval when making `failed-follow-up` evals This allows users to locate failed-follow-up evals more easily	2019-02-20 16:07:11 -08:00
Michael Schurter	6580ed668e	client: don't redownload completed artifacts on retries Track the download status of each artifact independently so that if only one of many artifacts fails to download, completed artifacts aren't downloaded again.	2019-02-20 08:45:12 -08:00
Michael Schurter	2db91425e3	Remove 0.9.0-beta2 generated files	2019-02-01 08:28:44 -08:00
Alex Dadgar	84d0afccae	Generate files for 0.9.0-beta2	2019-01-30 13:31:50 -08:00
Alex Dadgar	d2e5ede119	remove generated structs	2019-01-30 12:38:34 -08:00
Alex Dadgar	41265d4d61	Change types of weights on spread/affinity	2019-01-30 12:20:38 -08:00
Alex Dadgar	bc804dda2e	Nomad 0.9.0-beta1 generated code	2019-01-30 10:49:44 -08:00
Preetha Appan	c848a1d387	ensure tests run a 0.9 server	2019-01-29 16:19:45 -06:00
Preetha Appan	496eb1de0c	Guard operator endpoints for minimum server version	2019-01-29 15:50:36 -06:00
Preetha Appan	7578522f58	variable name fix	2019-01-29 13:48:45 -06:00
Preetha Appan	a6cebbbf9e	Make sure that all servers are 0.9 before applying scheduler config entry	2019-01-29 12:47:42 -06:00
Michael Schurter	3aba7ee826	nomad: fix panic when no node conn found A missing return would cause a panic when a server could find no route to a client.	2019-01-28 21:55:35 -08:00
Mahmood Ali	f9164dae67	Merge pull request #5228 from hashicorp/f-vault-err-tweaks server/vault: tweak error messages	2019-01-25 11:17:31 -05:00
Mahmood Ali	f4560d8a2a	server/vault: tweak error messages Closes #5139	2019-01-25 10:33:54 -05:00
Preetha	ec92bf673c	Merge pull request #5223 from hashicorp/f-jobs-list-datacenters Add Datacenters to the JobListStub struct	2019-01-24 08:13:30 -06:00
Michael Schurter	13f061a83f	Merge pull request #5196 from hashicorp/f-plugin-utils Make plugins/shared external and make pluginutls/	2019-01-23 06:59:32 -08:00
Michael Schurter	32daa7b47b	goimports until make check is happy	2019-01-23 06:27:14 -08:00
Michael Schurter	be0bab7c3f	move pluginutils -> helper/pluginutils I wanted a different color bikeshed, so I get to paint it	2019-01-22 15:50:08 -08:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Alex Dadgar	cdcd3c929c	loader and singleton	2019-01-22 15:11:57 -08:00
Alex Dadgar	6c2782f037	move catalog + grpcutils	2019-01-22 15:11:57 -08:00
Preetha Appan	38422642cb	Use DesiredState to determine whether to stop sending task events	2019-01-22 16:43:32 -06:00
Michael Lange	ce7bc4f56f	Add Datacenters to the JobsListStub struct So it can be used for filtering the full list of jobs	2019-01-22 11:16:35 -08:00
Mahmood Ali	e1803b685b	tests: deflake TestClientAllocations_GarbageCollect_Remote Use the same strategy as one in f2f383b07543a09ca989b82738926f7248e1ab28	2019-01-19 09:07:27 -05:00
Mahmood Ali	b2203a3a22	Merge pull request #5215 from hashicorp/test-fix-garbagecollect test: fix flaky garbage collect test	2019-01-18 21:10:01 -05:00
Mahmood Ali	05e32fb525	Merge pull request #5213 from hashicorp/b-api-separate Slimmer /api package	2019-01-18 20:52:53 -05:00
Michael Schurter	0cd35ba335	test: fix flaky garbage collect test This seems to fix TestClientAllocations_GarbageCollectAll_Remote being flaky. This test confuses me. It joins 2 servers, but then goes out of its way to make sure the test client only interacts with one. There are not enough comments for me to figure out the precise assertions this test is trying to make. A good old fashioned wait-for-the-client-to-register seems to fix the flakiness though. The error was that the node could not be found, so this makes some sense. However, lots of other tests seem to use the same "wait for node" logic and don't appear to be flaky, so who knows why waiting fixes this one. Passes with -race.	2019-01-18 16:01:30 -08:00
Mahmood Ali	7bdd43f3e0	api: avoid codegen for syncing Given that the values will rarely change, especially considering that any change would be backward incompatible. As such, it's simpler to keep syncing manually on the rare occasion and avoid the syncing code overhead.	2019-01-18 18:52:31 -05:00
Preetha Appan	510d7839e4	code review comments	2019-01-18 17:41:39 -06:00
Mahmood Ali	253532ec00	api: avoid import nomad/structs pkg nomad/structs is an internal package and imports many libraries (e.g. raft, codec) that are not relevant to api clients, and may cause unnecessary dependency pain (e.g. `github.com/ugorji/go/codec` version is very old now). Here, we add a code generator that imports the relevant constants from `nomad/structs`. I considered using this approach for other structs, but didn't find a quick viable way to reduce duplication. `nomad/structs` use values as struct fields (e.g. `string`), while `api` uses value pointer (e.g. `*string`) instead. Also, sometimes, `api` structs contain deprecated fields or additional documentation, so simple copy-paste doesn't work. For these reasons, I opt to keep the status quo.	2019-01-18 14:51:19 -05:00
Preetha Appan	be9656d195	fix linting	2019-01-17 15:36:33 -06:00
Preetha Appan	0f8a113ead	Refactor to find jobs with child instances more efficiently also added unit tests	2019-01-17 14:29:48 -06:00
Preetha Appan	be36fee48e	Use IsParameterized/isPeriodic methods	2019-01-17 12:15:42 -06:00
Preetha Appan	81a8f18cac	Fix bug in reconcile summaries that affects periodic/parameterized jobs This fixes incorrect parent job summaries by recomputing them in the ReconcileJobSummaries method in the state store	2019-01-17 12:01:01 -06:00
Nick Ethier	597b7b751d	tr: add retry /w backoff to stats_hook failure	2019-01-12 12:18:24 -05:00
Mahmood Ali	4414a2ce1c	tests: remove tests for unsupported features With switching to driver plugins, driver validation is quite tricky and we need to do some design thinking before supporting it again.	2019-01-10 10:21:48 -05:00
Nick Wales	7a7b5da0df	Adds optional Consul service tags to nomad server and agent services, gh#4297	2019-01-09 22:02:46 +00:00
Mahmood Ali	1f2473263e	fix more cases of logging arity errors	2019-01-09 09:22:47 -05:00
Mahmood Ali	6f077a73dc	Fix panic on failure Error expects an odd number of arguments, and panics otherwise.	2019-01-08 12:19:44 -05:00
Michael Schurter	324e989327	Merge pull request #5034 from hashicorp/test-fix-races Test fix races	2019-01-08 07:04:09 -08:00
Alex Dadgar	79cfe26021	vet	2019-01-07 14:49:41 -08:00
Alex Dadgar	8a35d7b1dd	Test recovery	2019-01-07 14:49:41 -08:00
Nick Ethier	a96afb6c91	fix tests that fail as a result of async client startup	2018-12-20 00:53:44 -05:00
Michael Schurter	6c1dbb659d	test: fix race and nil panic in nomad/ tests Race was test only and due to unlocked map access. Panic was test only and due to checking a field on a struct even when we knew the struct was nil. Race output that was fixed: ``` ================== WARNING: DATA RACE Read at 0x00c000697dd0 by goroutine 768: runtime.mapaccess2() /usr/local/go/src/runtime/map.go:439 +0x0 github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds.func8() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:402 +0xe6 github.com/hashicorp/nomad/testutil.WaitForResultRetries() /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:30 +0x5a github.com/hashicorp/nomad/testutil.WaitForResult() /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:22 +0x57 github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:401 +0xb53 testing.tRunner() /usr/local/go/src/testing/testing.go:827 +0x162 Previous write at 0x00c000697dd0 by goroutine 569: runtime.mapassign() /usr/local/go/src/runtime/map.go:549 +0x0 github.com/hashicorp/nomad/nomad.(PeriodicDispatch).Add() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:224 +0x2eb github.com/hashicorp/nomad/nomad.(Server).restorePeriodicDispatcher() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:394 +0x29a github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:234 +0x593 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Goroutine 768 (running) created at: testing.(T).Run() /usr/local/go/src/testing/testing.go:878 +0x650 testing.runTests.func1() 
/usr/local/go/src/testing/testing.go:1119 +0xa8 testing.tRunner() /usr/local/go/src/testing/testing.go:827 +0x162 testing.runTests() /usr/local/go/src/testing/testing.go:1117 +0x4ee testing.(M).Run() /usr/local/go/src/testing/testing.go:1034 +0x2ee main.main() _testmain.go:1150 +0x221 Goroutine 569 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 ================== ```	2018-12-19 15:48:02 -08:00
Michael Schurter	004fa574cb	test: fix race in eval broker update chan Similar to previous commits the delayed eval update chan was set and access from different goroutines causing a race. Passing the chan on the stack resolves the race. Race output from `go test -race -run 'Server_RPC$'` in nomad/ ``` ================== WARNING: DATA RACE Write at 0x00c000339150 by goroutine 63: github.com/hashicorp/nomad/nomad.(EvalBroker).flush() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:708 +0x3dc github.com/hashicorp/nomad/nomad.(EvalBroker).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:174 +0xc4 github.com/hashicorp/nomad/nomad.(Server).revokeLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:718 +0x1fd github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Previous read at 0x00c000339150 by goroutine 73: github.com/hashicorp/nomad/nomad.(EvalBroker).runDelayedEvalsWatcher() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:771 +0x176 Goroutine 63 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 Goroutine 73 (running) created at: github.com/hashicorp/nomad/nomad.(EvalBroker).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:170 +0x173 github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:207 +0x355 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1() 
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c ================== ```	2018-12-19 15:48:02 -08:00
Michael Schurter	1c137690c4	test: fix race around block eval chans Similar to previous commit, stop and change chans were being set and accessed from different goroutines. Passing the chans on the stack resolves the race. Output from `go test -race -run 'Server_RPC$' in nomad/ ``` ================== WARNING: DATA RACE Write at 0x00c0002b4e10 by goroutine 63: github.com/hashicorp/nomad/nomad.(BlockedEvals).Flush() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:648 +0x32a github.com/hashicorp/nomad/nomad.(BlockedEvals).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149 +0x12b github.com/hashicorp/nomad/nomad.(Server).revokeLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721 +0x232 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Previous read at 0x00c0002b4e10 by goroutine 75: github.com/hashicorp/nomad/nomad.(BlockedEvals).watchCapacity() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:483 +0xfe Goroutine 63 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 Goroutine 75 (finished) created at: github.com/hashicorp/nomad/nomad.(BlockedEvals).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:141 +0xba github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210 +0x392 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() 
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c ================== ================== WARNING: DATA RACE Write at 0x00c0002b4e50 by goroutine 63: github.com/hashicorp/nomad/nomad.(BlockedEvals).Flush() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:649 +0x388 github.com/hashicorp/nomad/nomad.(BlockedEvals).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149 +0x12b github.com/hashicorp/nomad/nomad.(Server).revokeLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721 +0x232 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Previous read at 0x00c0002b4e50 by goroutine 77: github.com/hashicorp/nomad/nomad.(BlockedEvals).prune() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:690 +0xae Goroutine 63 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 Goroutine 77 (finished) created at: github.com/hashicorp/nomad/nomad.(BlockedEvals).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:142 +0xdc github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210 +0x392 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c ================== ```	2018-12-19 15:48:02 -08:00
Michael Schurter	80263861aa	test: fix race around updateCh handling PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and PeriodicDispatch.run accesses updateCh in another. The race can be prevented by having SetEnabled pass updateCh to run. Race detector output from `go test -race -run TestServer_RPC` in nomad/ ``` ================== WARNING: DATA RACE Write at 0x00c0001d3f48 by goroutine 75: github.com/hashicorp/nomad/nomad.(PeriodicDispatch).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468 +0x256 github.com/hashicorp/nomad/nomad.(Server).revokeLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724 +0x267 github.com/hashicorp/nomad/nomad.(Server).leaderLoop.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131 +0x3c github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163 +0x4dd github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Previous read at 0x00c0001d3f48 by goroutine 515: github.com/hashicorp/nomad/nomad.(PeriodicDispatch).run() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338 +0x177 Goroutine 75 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 Goroutine 515 (running) created at: github.com/hashicorp/nomad/nomad.(PeriodicDispatch).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176 +0x1bc github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231 +0x582 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1() 
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c ================== ```	2018-12-19 15:48:02 -08:00
Danielle Tomlinson	3647b701a6	taskrunner: Emit task events when a hook fails	2018-12-13 18:20:18 +01:00
Chris Baker	af593c401c	Merge pull request #4974 from hashicorp/b-1173-log-spam rpc accept loop: added backoff on logging	2018-12-12 16:54:42 -08:00
Chris Baker	121a9eb8cb	some changes for more idiomatic code	2018-12-12 23:11:17 +00:00
Alex Dadgar	fbe4d67d1b	fix iops related tests	2018-12-12 14:32:22 -08:00
Chris Baker	34600f8b75	fixed bug in loop delay	2018-12-12 19:16:41 +00:00
Chris Baker	89c64932c1	gofmt	2018-12-12 19:09:06 +00:00
Chris Baker	22c11d8799	improved code for readability	2018-12-12 18:52:06 +00:00
Preetha	f406e66ab8	Merge pull request #4881 from hashicorp/f-device-preemption Device preemption	2018-12-11 18:34:19 -06:00
Alex Dadgar	1531b6d534	Merge pull request #4970 from hashicorp/f-no-iops Deprecate IOPS	2018-12-11 12:51:22 -08:00
Chris Baker	59beae35df	nomad/rpc listener: modified to throttle logging on "permanent" Accept() errors as well (with a higher delay cap)	2018-12-07 22:14:15 +00:00
Chris Baker	707bac0a7b	rpc accept loop: added backoff on logging for failed connections, in case there is a fast fail loop (NMD-1173)	2018-12-07 20:12:55 +00:00
Alex Dadgar	c918a96490	Warn if IOPS is being used	2018-12-06 16:17:09 -08:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Alex Dadgar	14a61ea3ea	Don't GC running but desired stop allocations This PR fixes an edge case where we could GC an allocation that was in a desired stop state but had not terminated yet. This can be hit if the client hasn't shutdown the allocation yet or if the allocation is still shutting down (long kill_timeout). Fixes https://github.com/hashicorp/nomad/issues/4940	2018-12-05 13:01:12 -08:00
Mahmood Ali	adb4d69576	Merge pull request #4956 from hashicorp/b-vault-client-tweaks-followup server/vault: Lock Vault expiration tracking	2018-12-04 19:46:59 -05:00
Mahmood Ali	50e38104a5	server/nomad: Lock Vault expiration tracking `currentExpiration` field is accessed in multiple goroutines: Stats and renewal, so needs locking. I don't anticipate high contention, so simple mutex suffices.	2018-12-04 09:29:48 -05:00
Preetha Appan	8656d3379f	Add guards around subtracting summary count	2018-12-03 11:16:35 -06:00
Danielle Tomlinson	51a9f7369e	Merge pull request #4936 from hashicorp/f-legacy-refactor Refactor and repackage client/driver	2018-11-30 13:38:06 +01:00
Danielle Tomlinson	d4cbd608ff	nomad: Remove on-submission job validation With the introduction of driver plugins, we're temporarily relying on _run time validation_ of driver configurations, rather than submission time.	2018-11-30 10:47:08 +01:00
Nick Ethier	80ae7e34f4	Merge pull request #4906 from hashicorp/f-metric-prefix-master Port metric prefix filtering to master	2018-11-29 22:27:47 -05:00
Nick Ethier	b1484aec33	nomad: fix hclog usage	2018-11-29 22:27:39 -05:00
Mahmood Ali	0a2611e41f	vault: protect against empty Vault secret response Also, fix a case where a successful second attempt of loading token can cause a panic.	2018-11-29 09:34:17 -05:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Nick Ethier	95362eaa02	Merge pull request #4844 from hashicorp/f-docker-plugin Docker driver plugin	2018-11-20 20:43:03 -05:00
Mahmood Ali	2e6133fd33	nil secrets as recoverable to keep renew attempts	2018-11-20 17:11:55 -05:00
Mahmood Ali	5827438983	Renew past recorded expiry till unrecoverable error Keep attempting to renew Vault token past locally recorded expiry, just in case the token was renewed out of band, e.g. on another Nomad server, until Vault returns an unrecoverable error.	2018-11-20 17:10:55 -05:00
Mahmood Ali	5836a341dd	fix typo	2018-11-20 17:10:55 -05:00
Mahmood Ali	93add67e04	round ttl duration for users	2018-11-20 17:10:55 -05:00
Mahmood Ali	4a0544b369	Track renewal expiration properly	2018-11-20 17:10:55 -05:00
Mahmood Ali	79aa934a4b	reconcile interface	2018-11-20 17:10:55 -05:00
Mahmood Ali	6efea6d8fc	Populate agent-info with vault Return Vault TTL info to /agent/self API and `nomad agent-info` command.	2018-11-20 17:10:55 -05:00
Mahmood Ali	6034af5084	Avoid explicit precomputed stats field Seems like the stats field is a micro-optimization that doesn't justify the complexity it introduces. Removing it and computing the stats from revoking field directly.	2018-11-20 17:10:54 -05:00
Mahmood Ali	14842200ec	More metrics for Server vault Add a gauge to track remaining time-to-live, duration of renewal request API call.	2018-11-20 17:10:54 -05:00
Mahmood Ali	e1994e59bd	address review comments	2018-11-20 17:10:54 -05:00
Mahmood Ali	35179c9655	Wrap Vault API api errors for easing debugging	2018-11-20 17:10:54 -05:00
Mahmood Ali	55456fc823	Set a 1s floor for Vault renew operation backoff	2018-11-20 17:10:54 -05:00
Mahmood Ali	7ad8f6c103	Merge pull request #4903 from hashicorp/b-delete-versions-mod-while-iter Fix a panic related to batch GC	2018-11-20 15:16:02 -05:00
Mahmood Ali	6281700c0c	address review comments	2018-11-20 13:21:39 -05:00
Nick Ethier	5c5cae79ab	nomad: only lookup job if disable_dispatched_job_summary_metrics is set	2018-11-19 23:22:23 -05:00
Nick Ethier	8ac69f440d	nomad: lookup job instead of adding Dispatched to summary	2018-11-19 23:22:02 -05:00
Nick Ethier	85b221a1d6	nomad: add flag to disable publishing of job_summary metrics for dispatched jobs	2018-11-19 23:21:19 -05:00
Nick Ethier	29591a7c2e	task_runner: emit event on task exit with exit result details	2018-11-19 22:59:17 -05:00
Mahmood Ali	d744e71fa9	add a missing no error assertion	2018-11-19 21:44:00 -05:00
Mahmood Ali	b93643cd96	Fix a panic related to batch GC `deleteJobVersions` does concurrent modifications to iterated items while iterating, by deleting job versions while it's iterating on them,	2018-11-19 20:59:45 -05:00
Mahmood Ali	bff9c3b3e9	Reproduce a panic related to batch GC Test case that reproduces a panic with the following stacktrace: ``` panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715] goroutine 35 [running]: testing.tRunner.func1(0xc0001e2200) /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387 panic(0x167e400, 0x1c43a30) /usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9 github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530) /go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5 github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(radixIterator).Next(0xc0003a0420, 0x1756059, 0xb) /go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e github.com/hashicorp/nomad/nomad/state.(StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1 github.com/hashicorp/nomad/nomad/state.(StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100) /go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2 github.com/hashicorp/nomad/nomad/state.(StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79 github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200) 
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685 testing.tRunner(0xc0001e2200, 0x1777138) /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf created by testing.(T).Run /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353 ```	2018-11-19 20:58:32 -05:00
Michael Schurter	56ed4f01be	vault: fix panic by checking for nil secret Vault's RenewSelf(...) API may return (nil, nil). We failed to check if secret was nil before attempting to use it. RenewSelf: `e3eee5b4fb/api/auth_token.go (L138-L155)` Calls ParseSecret: `e3eee5b4fb/api/secret.go (L309-L311)` If anyone has an idea on how to test this I didn't see any options. We use a real Vault service, so there's no opportunity to mock the response.	2018-11-19 17:07:59 -08:00
Danielle Tomlinson	8bf17fe22d	Merge pull request #4875 from hashicorp/f-constraints scheduler: Make != constraints more flexible	2018-11-15 11:04:21 -08:00
Danielle Tomlinson	9c72dafc95	scheduler: Add is_set/is_not_set constraints This adds constraints for asserting that a given attribute or value exists, or does not exist. This acts as a companion to =, or != operators, e.g: ```hcl constraint { attribute = "${attrs.type}" operator = "!=" value = "database" } constraint { attribute = "${attrs.type}" operator = "is_set" } ```	2018-11-15 11:00:32 -08:00
Preetha Appan	e5de50fba8	Initial implementation of device preemption	2018-11-15 11:09:26 -06:00
Mahmood Ali	046f098bac	Track Node Device attributes and serve them in API	2018-11-14 14:42:29 -05:00
Mahmood Ali	a4a9347501	fix comment typos	2018-11-14 08:36:14 -05:00
Mahmood Ali	1e92161f14	Merge pull request #4858 from hashicorp/b-fix-master-20181109 Fix some tests in master	2018-11-13 16:08:26 -05:00
Alex Dadgar	08dc2ea702	Merge pull request #4867 from hashicorp/b-deployment-progress-deadline Blocked evaluation fixes	2018-11-13 10:29:03 -08:00
Mahmood Ali	865419e756	convert all config durations to strings in tests	2018-11-13 10:21:40 -05:00
Mahmood Ali	4e18846fd9	Adjust streaming duration This test expects 11 repeats of the same message emitted at intervals of 200ms; so we need more than 2 seconds to adjust for time sleep variations and the like. So raising it to 3s here that should be enough.	2018-11-13 10:21:40 -05:00
Mahmood Ali	1403ad21b9	Changelog job re-run fix	2018-11-13 07:52:51 -05:00
Mahmood Ali	e2d668f21c	Merge pull request #4861 from hashicorp/b-batch-deregister-transaction Run job deregistering in a single transaction	2018-11-12 20:59:44 -05:00
Alex Dadgar	a90dc978e1	Handle new eval being the duplicate properly	2018-11-12 16:02:23 -08:00
Mahmood Ali	8513b3cccb	Comment public functions and batch write txn	2018-11-12 16:09:39 -05:00
Preetha Appan	7ef126a027	Smaller methods, and added tests for RPC layer	2018-11-10 17:37:33 -06:00
Preetha Appan	75662b50d1	Use response object/querymeta/writemeta in scheduler config API	2018-11-10 10:31:10 -06:00
Mahmood Ali	9c0a15f3ce	Run job deregistering in a single transaction Fixes https://github.com/hashicorp/nomad/issues/4299 Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals. Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation. This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.	2018-11-09 22:35:26 -05:00
Preetha	3739713ce1	Merge pull request #4839 from hashicorp/b-gc-alloc-jobversion Remove terminal allocations associated with older job modify index	2018-11-09 12:21:42 -06:00
Preetha Appan	39072977d6	Use create index as trigger condition to gc old terminal allocs	2018-11-09 11:44:21 -06:00
Alex Dadgar	2f06d88f47	Merge pull request #4847 from hashicorp/b-blocked-eval Blocked evaluation fixes	2018-11-08 13:40:01 -08:00
Alex Dadgar	98398a8a44	Merge pull request #4842 from hashicorp/b-deployment-progress-deadline Fix multiple bugs with progress deadline handling	2018-11-08 13:31:54 -08:00
Alex Dadgar	991791a513	typo fix	2018-11-08 13:28:27 -08:00
Alex Dadgar	be54e56570	review fixes	2018-11-08 09:48:36 -08:00
Preetha Appan	5f0a9d2cfd	Show preemption output in plan CLI	2018-11-08 09:48:43 -06:00
Alex Dadgar	dbb05357bc	fix test	2018-11-07 11:59:24 -08:00
Alex Dadgar	36abd3a3d8	review comments	2018-11-07 10:33:22 -08:00
Alex Dadgar	e3cbb2c82e	allocs fit checks if devices get oversubscribed	2018-11-07 10:33:22 -08:00
Alex Dadgar	4f9b3ede87	Split device accounter and allocator	2018-11-07 10:32:03 -08:00
Alex Dadgar	6fa893c801	affinities	2018-11-07 10:32:03 -08:00
Alex Dadgar	feb83a2be3	assign devices	2018-11-07 10:32:03 -08:00
Alex Dadgar	2d2248e209	Add devices to allocated resources	2018-11-07 10:32:03 -08:00
Alex Dadgar	b1c5d52817	Track jobs by namespace	2018-11-07 10:22:08 -08:00
Alex Dadgar	6d8bb3a7bd	Duplicate blocked evals cancelling improved The old logic for cancelling duplicate blocked evaluations by job id had the issue where the newer evaluation could have additional node classes that it is (in)eligible for that we would not capture. This could make it such that cluster state could change such that the job would make progress but no evaluation was unblocked.	2018-11-07 10:08:23 -08:00
Preetha Appan	a9aec7e628	Fix failing resource subtraction test	2018-11-06 12:26:26 -06:00
Alex Dadgar	261aae32b1	more robust merging of the deployment status when getting updates from the client	2018-11-05 16:39:09 -08:00
Alex Dadgar	1c31970464	Fix multiple tgs with progress deadline handling Fix an issue in which the deployment watcher would fail the deployment based on the earliest progress deadline of the deployment regardless of if the task group has finished. Further fix an issue where the blocked eval optimization would make it so no evals were created to progress the deployment. To reproduce this issue, prior to this commit, you can create a job with two task groups. The first group has count 1 and resources such that it can not be placed. The second group has count 3, max_parallel=1, and can be placed. Run this first and then update the second group to do a deployment. It will place the first of three, but never progress since there exists a blocked eval. However, that doesn't capture the fact that there are two groups being deployed.	2018-11-05 16:06:17 -08:00
Preetha Appan	6fdc84cce3	add comment	2018-11-02 18:11:36 -05:00
Preetha Appan	a6b714b81c	update preemption tests to use new node resource structs also includes a fix to remove unnecessary subtraction of network mbits	2018-11-02 17:59:53 -05:00
Preetha	b2b52b1ada	Merge pull request #4794 from hashicorp/f-preemption-systemjobs Preemption for system jobs	2018-11-02 16:28:06 -05:00
Preetha Appan	c33469157d	unit test plan apply with preemptions	2018-11-01 20:06:32 -05:00
Preetha Appan	57fe5050f0	more minor review feedback	2018-11-01 17:05:17 -05:00
Preetha Appan	fd60e66f86	Plumb alloc resource cache in a few more places. also removed now unused method	2018-11-01 16:44:43 -05:00
Preetha Appan	e586817ce7	batch jobs GC removes terminal allocs if job modifyindex is older than running job	2018-11-01 00:05:31 -05:00
Mahmood Ali	9da19c6450	address review comments	2018-10-30 13:58:52 -04:00
Mahmood Ali	4937095389	Allow artifacts checksum interpolation Fixes https://github.com/hashicorp/nomad/issues/4814	2018-10-30 13:24:30 -04:00
Preetha Appan	f1c3eb2792	Introduce interface with multiple implementations for resource distance	2018-10-30 11:06:32 -05:00
Preetha Appan	8f7eb61823	Introduce a response object for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	1a5421f5d7	more minor cleanup	2018-10-30 11:06:32 -05:00
Preetha Appan	0494a098ce	More style and readability fixes from review	2018-10-30 11:06:32 -05:00
Preetha Appan	1415032c13	More review comments	2018-10-30 11:06:32 -05:00
Preetha Appan	b97f85e3e0	style fixes	2018-10-30 11:06:32 -05:00
Preetha Appan	12278527c7	make default config a variable	2018-10-30 11:06:32 -05:00
Preetha Appan	32cc764072	Add fsm layer tests	2018-10-30 11:06:32 -05:00
Preetha Appan	7b8156fc47	Restore/Snapshot plus unit tests for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	8807c25b11	Modify preemption code to use new style of resource structs	2018-10-30 11:06:32 -05:00
Preetha Appan	c1c1c230e4	Make preemption config a struct to allow for enabling based on scheduler type	2018-10-30 11:06:32 -05:00
Preetha Appan	bd34cbb1f7	Support for new scheduler config API, first use case is to disable preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	3190a2c29b	Fix linting	2018-10-30 11:06:32 -05:00
Preetha Appan	eb38488d08	Fix logic bug, unit test for plan apply method in state store	2018-10-30 11:06:32 -05:00
Preetha Appan	9e4a35fff0	Fix comment	2018-10-30 11:06:32 -05:00
Preetha Appan	cc295b90de	Implement preemption for system jobs. This commit implements an allocation selection algorithm for finding allocations to preempt. It currently special cases network resource asks from others (cpu/memory/disk/iops).	2018-10-30 11:06:32 -05:00
Preetha Appan	d11064d6ba	structs and API changes to plan and alloc structs needed for preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	9257387a69	Add number of evictions to DesiredUpdates struct to use in CLI/API	2018-10-30 11:06:32 -05:00
Preetha Appan	5ff4b8e36f	Review feedback	2018-10-30 11:06:32 -05:00
Preetha Appan	5b3bfb63eb	structs and API changes to plan and alloc structs needed for preemption	2018-10-30 11:06:32 -05:00
Michael Schurter	5d49832de4	tests: fix usages of TestClient cleanup and mock driver	2018-10-29 14:21:05 -07:00
Michael Schurter	e060174130	ar: fix leader handling, state restoring, and destroying unrun ARs * Migrated all of the old leader task tests and got them passing * Refactor and consolidate task killing code in AR to always kill leader tasks first * Fixed lots of issues with state restoring * Fixed deadlock in AR.Destroy if AR.Run had never been called * Added a new in memory statedb for testing	2018-10-19 09:45:45 -07:00
Alex Dadgar	6f0ed6184b	Fix client reloading and pass the plugin loaders to server and client	2018-10-16 16:56:55 -07:00
Michael Schurter	a4b4d7b266	consul service hook Deregistration works but difficult to test due to terminal updates not being fully implemented in the new client/ar/tr.	2018-10-16 16:53:29 -07:00
Alex Dadgar	e401c660e7	Implement lifecycle hooks on the task runner	2018-10-16 16:53:29 -07:00
Alex Dadgar	a78cefec18	use int64	2018-10-16 15:34:32 -07:00
Preetha Appan	7c0d8c646c	Change CPU/Disk/MemoryMB to int everywhere in new resource structs	2018-10-16 16:21:42 -05:00
Alex Dadgar	f5a76d8411	review comments	2018-10-15 15:31:13 -07:00
Alex Dadgar	f9b056e1d1	Replace attributes map with new Attribute object	2018-10-13 14:08:58 -07:00
Alex Dadgar	04ba425dd5	validate constraints/affinities	2018-10-13 12:27:49 -07:00
Alex Dadgar	9b5aaac410	Device feasability checker	2018-10-13 12:27:49 -07:00
Alex Dadgar	bfb4caa2e7	node devices	2018-10-13 12:27:49 -07:00
Alex Dadgar	5a07f9f96e	parse affinities and constraints on devices	2018-10-11 14:05:19 -07:00
Alex Dadgar	a2a56a930c	Diff	2018-10-08 17:02:58 -07:00
Alex Dadgar	6b08b9d6b6	Define device request structs	2018-10-08 15:38:03 -07:00
Alex Dadgar	01f8e5b95f	renames	2018-10-04 14:57:25 -07:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	bac5cb1e8b	Scheduler uses allocated resources	2018-10-02 17:08:25 -07:00
Alex Dadgar	147d2430a1	allocated resources structs	2018-09-29 18:47:28 -07:00
Alex Dadgar	5c8697667e	Node reserved resources	2018-09-29 18:44:55 -07:00
Alex Dadgar	3183153315	Node resources on client	2018-09-29 17:23:41 -07:00
Alex Dadgar	9b793531d6	Merge pull request #4720 from hashicorp/b-jet-fixes Series of scheduler fixes / debugging enhancements	2018-09-25 13:25:11 -07:00
Alex Dadgar	bd420692f3	fix logging	2018-09-25 10:49:55 -07:00
Preetha Appan	86e725e84c	Added logging around nacked evals in the scheduler worker	2018-09-25 10:49:02 -07:00
Alex Dadgar	6a21f9fe96	Unique TriggerBy for blocked evals Give blocked evals a unique triggerby reason to make debugging a chain of evaluations easier.	2018-09-24 14:47:49 -07:00
Alex Dadgar	e1a102f58c	test allocs fit	2018-09-24 13:59:01 -07:00
Alex Dadgar	d7f5be9148	Better comment on snapshotindex	2018-09-24 13:53:43 -07:00
Alex Dadgar	99498da6ed	Denormalize jobs in plan and ignore resources of terminal allocs Denormalize jobs in AppendAllocs: AppendAlloc was originally only ever called for inplace upgrades and new allocations. Both these code paths would remove the job from the allocation. Now we use this to also add fields such as FollowupEvalID which did not normalize the job. This is only a performance enhancement. Ignore terminal allocs: Failed allocations are annotated with the followup Eval ID when one is created to replace the failed allocation. However, in the plan applier, when we check if allocations fit, these terminal allocations were not filtered. This could result in the plan being rejected if the node would be overcommited if the terminal allocations resources were considered.	2018-09-24 13:53:43 -07:00
Alex Dadgar	de442226ae	Fix other instances of blocking queries	2018-09-24 13:52:39 -07:00
Alex Dadgar	7f0d241ef4	always handle failed allocation	2018-09-21 15:13:54 -07:00
Alex Dadgar	b2449ae1ce	Fix deployment watcher index usage Fixes three issues: 1. Retrieving the latest evaluation index was not properly selecting the greatest index. This would undermine checks we had to reduce the number of evaluations created when the latest eval index was greater than any alloc change 2. Fix an issue where the blocking query code was using the incorrect index such that the index was higher than necassary. 3. Special case handling of blocked evaluation since the create/snapshot index is no particularly useful since they can be reblocked.	2018-09-21 13:59:11 -07:00
Alex Dadgar	5009566503	do not bootstrap with non voters	2018-09-19 17:17:39 -07:00
Alex Dadgar	e8f89597f5	fix rpc test	2018-09-19 10:17:54 -07:00
Alex Dadgar	9971b3393f	yamux	2018-09-17 14:22:40 -07:00
Alex Dadgar	b2f500b48c	Serf/Raft/Memberlist logger	2018-09-17 13:57:52 -07:00
Alex Dadgar	ca28afa3b2	small fixes	2018-09-15 16:42:38 -07:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Alex Dadgar	7739ef51ce	agent + consul	2018-09-13 10:43:40 -07:00
Preetha Appan	996484981c	Fix panic when reschedule policy for allocation can't be looked up because its task group changed	2018-09-05 17:01:02 -05:00
Alex Dadgar	4f89cabd34	Merge pull request #4631 from hashicorp/f-plugin-config Parse plugin configs	2018-09-04 17:04:13 -07:00
Alex Dadgar	cc92cd92cd	Merge pull request #4642 from hashicorp/b-vet Fix vet errors and use newer go version in travis	2018-09-04 17:04:02 -07:00
Alex Dadgar	c6576ddac1	Fix make check errors	2018-09-04 16:03:52 -07:00
Preetha Appan	26288b9522	Fix more review feedback	2018-09-04 16:10:11 -05:00
Preetha Appan	751c0eb5a5	code review feedback	2018-09-04 16:10:11 -05:00
Preetha Appan	4f8e925b54	Move topk and delay heap to separate packages under lib	2018-09-04 16:10:11 -05:00
Preetha Appan	9bc0962527	Track top k nodes by norm score rather than top k nodes per scorer	2018-09-04 16:10:11 -05:00
Preetha Appan	6ed527c636	Use heap to store top K scoring nodes. Scoring metadata is now aggregated by scorer type to make it easier to parse when reading it in the CLI.	2018-09-04 16:10:11 -05:00
Preetha Appan	dd5fe6373f	Fix scoring logic for uneven spread to incorporate current alloc count Also addressed other small code review comments	2018-09-04 16:10:11 -05:00
Preetha Appan	e72c0fe527	more cleanup	2018-09-04 16:10:11 -05:00
Preetha Appan	92d37acc2a	comment and formatting cleanup	2018-09-04 16:10:11 -05:00
Preetha Appan	5812f906c8	Allow empty spread targets, and validate target percentages.	2018-09-04 16:10:11 -05:00
Preetha Appan	71bff00326	validate spread from job/task group validate methods	2018-09-04 16:10:11 -05:00
Preetha Appan	fbd0004707	Fix warnings	2018-09-04 16:10:11 -05:00
Preetha Appan	5eb82b6260	Validate method, and rename ratio field to percent	2018-09-04 16:10:11 -05:00
Preetha Appan	0037d72fa8	Structs and validation for spread	2018-09-04 16:10:11 -05:00
Preetha Appan	c407e3626f	More review comments	2018-09-04 16:10:11 -05:00
Preetha Appan	dbbb4a957a	Fail validation if system job has affinities	2018-09-04 16:10:11 -05:00
Preetha Appan	0bc030c6fb	Treat set_contains as a synonym of set_contains_all	2018-09-04 16:10:11 -05:00
Preetha Appan	e85a721cfb	Include affinities in job and task diff, and more test cases	2018-09-04 16:10:11 -05:00
Preetha Appan	f06c7ab2ad	Fix Copy method for job and task to include affinities	2018-09-04 16:10:11 -05:00
Preetha Appan	9f0caa9c3d	Affinity parsing, api and structs	2018-09-04 16:10:11 -05:00
Preetha Appan	9e29cfee76	Use readlock	2018-09-04 11:45:05 -05:00
Preetha Appan	062e5f1898	Use eval broker lock when reading/modifying delay heap	2018-08-31 10:59:48 -05:00
Alex Dadgar	bff1669ee4	Plugin config parsing	2018-08-29 17:06:01 -07:00
Chelsea Komlo	0a69cdb304	Merge pull request #4565 from hashicorp/b-compare-cert-alg Error if TLS Certificate signature algorithm isn't supported in cipher suites	2018-08-15 16:09:46 -04:00
Xopherus	8d747578e8	Close multiplexer when context is cancelled Multiplexer continues to create rpc connections even when the context which is passed to the underlying rpc connections is cancelled by the server. This was causing #4413 - when a SIGHUP causes everything to reload, it uses context to cancel the underlying http/rpc connections so that they may come up with the new configuration. The multiplexer was not being cancelled properly so it would continue to create rpc connections and constantly fail, causing communication issues with other nomad agents. Fixes #4413	2018-08-13 19:32:49 -04:00
Chelsea Holland Komlo	31d6d00381	add simple getter for certificate	2018-08-10 12:37:21 -04:00
Andrei Burd	444ee45aff	Parametrized/periodic jobs per child tagged metric emmision	2018-06-21 10:40:56 +03:00
Alex Dadgar	b61051b3cd	Merge pull request #4409 from hashicorp/r-client-packages Refactor client packages	2018-06-13 17:32:25 -07:00
Alex Dadgar	300b1a7a15	Tests only use testlog package logger	2018-06-13 15:40:56 -07:00
Chelsea Komlo	03075b603a	Merge pull request #4399 from hashicorp/r-reload-refactor Refactor logic for dynamic reloading	2018-06-13 13:35:12 -04:00
Alex Dadgar	d0043691fb	remove structs + bump version	2018-06-11 13:52:19 -07:00
Alex Dadgar	af5753d2cd	bump version + generated files	2018-06-11 13:39:42 -07:00
Nick Ethier	e75e3ae665	nomad: use require pkg for tests	2018-06-11 13:50:50 -04:00
Nick Ethier	50c72adbd7	nomad: code review comments	2018-06-11 13:27:48 -04:00
Nick Ethier	a581cc9c01	nomad/structs: fix job diff test	2018-06-11 13:06:49 -04:00
Nick Ethier	41e010cdc2	nomad: add 'Dispatch' field to Job New 'Dispatch' field is used to denote if the Job is a child dispatched job of a parameterized job.	2018-06-11 11:59:03 -04:00
Chelsea Holland Komlo	de03ce8070	move logic to determine whether to reload tls configuration to tlsutil helper	2018-06-08 14:33:58 -04:00
Chelsea Komlo	d738976234	Merge pull request #4395 from hashicorp/b-vault-second Fix for dynamically reloading vault	2018-06-07 18:03:00 -04:00
Chelsea Holland Komlo	dcc9cdfeb7	fixup! comment and move to always log server reload operation	2018-06-07 17:12:36 -04:00
Chelsea Holland Komlo	41e35edf0c	fix test that now requires different config for test assertions	2018-06-07 17:07:06 -04:00
Chelsea Holland Komlo	9f6bd7bf3a	move logic for testing equality for vault config	2018-06-07 16:23:50 -04:00
Chelsea Holland Komlo	282f37b1ee	fix for dynamically reloading vault	2018-06-07 15:34:18 -04:00
Nick Ethier	2555bff4f5	nomad: add error check in test	2018-06-06 14:08:42 -04:00
Nick Ethier	d35bf6d184	nomad: handle edge case where node drain event shouldn't be emitted	2018-06-06 14:02:10 -04:00

3456 commits