open-nomad

Author	SHA1	Message	Date
Drew Bailey	7420446458	Merge pull request #6639 from hashicorp/return-after-forward return after request has been forwarded	2019-11-08 09:48:35 -05:00
Lars Lehtonen	39b68e0b88	TestEvalBroker_Dequeue_Blocked() proper goroutine error handling (#6651 ) TestEvalBroker_Dequeue_Blocked() improve test readability	2019-11-08 08:52:23 -05:00
Nick Ethier	e947aaed4f	nomad: fix bug that didn't allow for multiple connect services in same tg	2019-11-08 04:33:39 -05:00
Lars Lehtonen	6deae70e35	TestEvalBroker_PauseResumeNackTimeout() proper goroutine error handling (#6649 ) TestEvalBroker_PauseResumeNackTimeout() improve test readability	2019-11-07 16:04:59 -05:00
Lars Lehtonen	2638cbb31d	nomad: TestEvalBroker_EnqueueAll_Dequeue_Fair() proper goroutine error handling (#6636 ) nomad: TestEvalBroker_EnqueueAll_Dequeue_Fair() improve test readability	2019-11-07 10:39:29 -05:00
Drew Bailey	a5e2e1805f	return after request has been forwarded	2019-11-07 08:33:53 -05:00
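The fix in a5e2e1805f is a control-flow one: once a request has been forwarded, the handler must stop rather than fall through and also serve it locally. Below is a minimal Go sketch of that pattern. The `handler` type, its fields and `forward` are hypothetical stand-ins, not Nomad's endpoint code.

```go
package main

import "fmt"

// handler is a hypothetical stand-in for an RPC endpoint; names are illustrative.
type handler struct {
	isLeader bool
	forwards int // requests sent on to another server
	locals   int // requests handled on this server
}

// forward stands in for forwarding the request to the responsible server.
func (h *handler) forward() error {
	h.forwards++
	return nil
}

// Handle forwards when this server is not responsible and then returns
// immediately, so a forwarded request is never also handled locally.
func (h *handler) Handle() error {
	if !h.isLeader {
		if err := h.forward(); err != nil {
			return fmt.Errorf("forwarding request: %w", err)
		}
		return nil // omitting this return lets execution fall through below
	}
	h.locals++
	return nil
}

func main() {
	h := &handler{isLeader: false}
	if err := h.Handle(); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(h.forwards, h.locals) // prints "1 0"
}
```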
Lars Lehtonen	e64f98837c	nomad: fix dropped error in TestJobEndpoint_Deregister_ACL (#6602 )	2019-11-06 16:40:45 -05:00
Drew Bailey	f4a7e3dc75	coordinate closing of doneCh, use interface to simplify callers comments	2019-11-05 11:44:26 -05:00
Drew Bailey	fe542680dc	log-json -> json fix typo command/agent/monitor/monitor.go Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com> Update command/agent/monitor/monitor.go Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com> address feedback, lock to prevent send on closed channel fix lock/unlock for dropped messages	2019-11-05 09:51:59 -05:00
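The "lock to prevent send on closed channel" and dropped-message handling mentioned above can be sketched roughly as follows. This is a hypothetical, simplified `monitor` type, not the actual `command/agent/monitor` code: writes never block the logger, a full buffer drops and counts the message, and one mutex guards both the send and the close, so a late `Write` cannot panic on a closed channel.

```go
package main

import (
	"fmt"
	"sync"
)

// monitor is a hypothetical, simplified log monitor.
type monitor struct {
	mu      sync.Mutex
	logCh   chan []byte
	closed  bool
	dropped int
}

func newMonitor(buf int) *monitor {
	return &monitor{logCh: make(chan []byte, buf)}
}

// Write never blocks: if the buffer is full the message is dropped and counted.
// Holding mu means Stop cannot close logCh in the middle of a send.
func (m *monitor) Write(p []byte) (int, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.closed {
		return len(p), nil // monitor stopped; silently discard
	}
	msg := append([]byte(nil), p...) // copy: io.Writer callers may reuse p
	select {
	case m.logCh <- msg:
	default:
		m.dropped++
	}
	return len(p), nil
}

// Stop closes the channel exactly once, under the same lock as Write.
func (m *monitor) Stop() {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.closed {
		m.closed = true
		close(m.logCh)
	}
}

func main() {
	m := newMonitor(2)
	for i := 0; i < 5; i++ {
		fmt.Fprintf(m, "line %d\n", i)
	}
	m.Stop()
	m.Stop()                       // second Stop is a no-op
	fmt.Fprintf(m, "after stop\n") // ignored, no panic
	n := 0
	for range m.logCh {
		n++
	}
	fmt.Println(n, m.dropped) // prints "2 3"
}
```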
Drew Bailey	298b8358a9	move forwarded monitor request into helper	2019-11-05 09:51:56 -05:00
Drew Bailey	8726b685de	address feedback	2019-11-05 09:51:56 -05:00
Drew Bailey	0e759c401c	moving endpoints over to frames	2019-11-05 09:51:54 -05:00
Drew Bailey	17d876d5ef	rename function, initialize log level better underscores instead of dashes for query params	2019-11-05 09:51:53 -05:00
Drew Bailey	8178beecf0	address feedback, use agent_endpoint instead of monitor	2019-11-05 09:51:53 -05:00
Drew Bailey	db65b1f4a5	agent:read acl policy for monitor	2019-11-05 09:51:52 -05:00
Drew Bailey	2533617888	rpc acl tests for both monitor endpoints	2019-11-05 09:51:51 -05:00
Drew Bailey	3c33747e1f	client monitor endpoint tests	2019-11-05 09:51:50 -05:00
Drew Bailey	4bc68855d0	use intercepting loggers for rpchandlers	2019-11-05 09:51:50 -05:00
Drew Bailey	3b9c33a5f0	new hclog with standardlogger intercept	2019-11-05 09:51:49 -05:00
Drew Bailey	a45ae1cd58	enable json formatting, use queryoptions	2019-11-05 09:51:49 -05:00
Drew Bailey	786989dbe3	New monitor pkg for shared monitor functionality Adds new package that can be used by client and server RPC endpoints to facilitate monitoring based off of a logger clean up old code small comment about write rm old comment about minsize rename to Monitor Removes connection logic from monitor command Keep connection logic in endpoints, use a channel to send results from monitoring use new multisink logger and interfaces small test for dropped messages update go-hclogger and update sink/intercept logger interfaces	2019-11-05 09:51:49 -05:00
Lars Lehtonen	0a4542fadc	nomad: fix test goroutine (#6593 )	2019-10-31 08:23:32 -04:00
Seth Hoenig	98592113a3	Merge pull request #6582 from hashicorp/b-vault-createToken-log-msg nomad: fix vault.CreateToken log message printing wrong error	2019-10-29 17:35:05 -05:00
Mahmood Ali	7f2e4dc5d8	Merge pull request #6574 from hashicorp/b-gh-6570-vault-role-validation vault: honor new `token_period` in vault token role	2019-10-29 10:18:59 -04:00
Seth Hoenig	838c6e3329	nomad: fix vault.CreateToken log message printing wrong error Fixes typo in word "failed". Fixes bug where incorrect error is printed. The old code would only ever print a nil error, instead of the validationErr which is being created.	2019-10-28 23:05:32 -05:00
Mahmood Ali	c5d8d66787	Fix admissionValidators `admissionValidators` doesn't aggregate errors correctly, as it aggregates errors in `errs` reference yet it always returns the nil `err`. Here, we avoid shadowing `err`, and move variable declarations to where they are used.	2019-10-28 10:52:53 -04:00
Mahmood Ali	abb930249a	consul connect: do basic validation before mutating job `groupConnectHook` assumes that Networks is a non-empty slice, but TG hasn't been validated yet and validation may depend on mutation results. As such, we do basic check here before dereferencing network slice elements.	2019-10-28 10:49:02 -04:00
Mahmood Ali	bb45a7a776	add tests for consul connect validation	2019-10-28 10:41:51 -04:00
Mahmood Ali	4c64658397	vault: Support new role field `token_role` Vault 1.2.0 deprecated `period` field in favor of `token_period` in auth role: > * Token store roles use new, common token fields for the values > that overlap with other auth backends. `period`, `explicit_max_ttl`, and > `bound_cidrs` will continue to work, with priority being given to the > `token_` prefixed versions of those parameters. They will also be returned > when doing a read on the role if they were used to provide values initially; > however, in Vault 1.4 if `period` or `explicit_max_ttl` is zero they will no > longer be returned. (`explicit_max_ttl` was already not returned if empty.) https://github.com/hashicorp/vault/blob/master/CHANGELOG.md#120-july-30th-2019	2019-10-28 09:33:26 -04:00
Seth Hoenig	8b03477f46	Merge pull request #6448 from hashicorp/f-set-connect-sidecar-tags connect: enable setting tags on consul connect sidecar service in job…	2019-10-17 15:14:09 -05:00
Seth Hoenig	039fbd3f3b	connect: enable setting tags on consul connect sidecar service in jobspec (#6415 )	2019-10-17 19:25:20 +00:00
Mahmood Ali	4e4a9b252c	Merge pull request #6290 from hashicorp/r-generated-code-refactor dev: avoid codecgen code in downstream projects	2019-10-15 08:22:31 -04:00
Danielle	fee482ae6c	Merge pull request #6331 from hashicorp/dani/f-volume-mount-propagation volumes: Add support for mount propagation	2019-10-14 14:29:40 +02:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows the container to see new mounts created on the host but, importantly, also _allows the container to create mounts that are visible in other containers and on the host_. Private and host-to-task are safe, but bidirectional mounts can be dangerous: if the code inside a container creates a mount and does not clean it up before the container is torn down, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a privileged execution environment inside the container.	2019-10-14 14:09:58 +02:00
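The three propagation modes and their Linux equivalents listed above can be captured as a small lookup plus the bidirectional safety rule. This is a hypothetical sketch, not Nomad's implementation: the "ReadWrite permissions" requirement is simplified here to a `readOnly` flag, and defaulting an empty mode to `private` is an assumption.

```go
package main

import "fmt"

// propagationModes maps the jobspec names from the commit message to the
// Linux propagation flags they are described as equivalent to.
var propagationModes = map[string]string{
	"private":       "rprivate",
	"host-to-task":  "rslave",
	"bidirectional": "rshared",
}

// linuxPropagation validates a requested mode. Bidirectional additionally
// requires a writable volume (simplified from the ReadWrite permission rule).
func linuxPropagation(mode string, readOnly bool) (string, error) {
	if mode == "" {
		mode = "private" // assumption: private is the default
	}
	flag, ok := propagationModes[mode]
	if !ok {
		return "", fmt.Errorf("unknown propagation mode %q", mode)
	}
	if mode == "bidirectional" && readOnly {
		return "", fmt.Errorf("bidirectional propagation requires a read-write volume")
	}
	return flag, nil
}

func main() {
	for _, m := range []string{"private", "host-to-task", "bidirectional"} {
		flag, err := linuxPropagation(m, false)
		fmt.Println(m, flag, err)
	}
}
```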
Mahmood Ali	4b2ba62e35	acl: check ACL against object namespace Fix a bug where a malicious user can access or manipulate an alloc in a namespace they don't have access to. The allocation endpoints perform ACL checks against the request namespace, not the allocation namespace, and perform the allocation lookup independently of namespaces. Here, we check that the requester can access the alloc namespace regardless of the declared request namespace. Ideally, we'd enforce that the declared request namespace matches the actual allocation namespace. Unfortunately, we haven't documented alloc endpoints as namespaced functions; we suspect that starting to enforce this would be very disruptive and inappropriate for a nomad point release. As such, we maintain the current behavior, which doesn't require passing the proper namespace in the request. A future major release may start enforcing the declared namespace.	2019-10-08 12:59:22 -04:00
Mahmood Ali	674a457865	use RequestNamespace(), the canonical way to get namespace	2019-09-27 07:40:58 -04:00
Mahmood Ali	e29ee4c400	nomad: defensive check for namespaces in job registration call In a job registration request, ensure that the request namespace "header" and the job namespace field match. This should already be the case in prod, as http handlers ensure that the values match [1]. This mitigates bugs where we may check one value but act on another, resulting in bypassing the ACL system. [1] https://github.com/hashicorp/nomad/blob/v0.9.5/command/agent/job_endpoint.go#L415-L418	2019-09-26 17:02:47 -04:00
Lang Martin	fb41dd86ba	default raft protocol v2	2019-09-24 14:37:55 -04:00
Lang Martin	31d7f116dd	nomad/server comments	2019-09-24 14:36:18 -04:00
Tim Gross	cd9c23617f	client/connect: ConsulProxy LocalServicePort/Address (#6358 ) Without a `LocalServicePort`, Connect services will try to use the mapped port even when delivering traffic locally. A user can override this behavior by pinning the port value in the `service` stanza but this prevents us from using the Consul service name to reach the service. This commits configures the Consul proxy with its `LocalServicePort` and `LocalServiceAddress` fields.	2019-09-23 14:30:48 -04:00
Danielle Lancashire	78b61de45f	config: Hoist volume.config.source into volume Currently, using a Volume in a job uses the following configuration: ``` volume "alias-name" { type = "volume-type" read_only = true config { source = "host_volume_name" } } ``` This commit migrates to the following: ``` volume "alias-name" { type = "volume-type" source = "host_volume_name" read_only = true } ``` The original design was chosen because we were uncertain about the future of storage plugins, and to allow maximum flexibility. However, this causes a few issues, namely: - We frequently need to parse this configuration during submission, scheduling, and mounting - It complicates the configuration from an end user's perspective - It complicates the ability to do validation As we understand the problem space of CSI a little more, it has become clear that we won't need the `source` to be in config, as it will be used in the majority of cases: - Host Volumes: Always need a source - Preallocated CSI Volumes: Always need a source from a volume or claim name - Dynamic Persistent CSI Volumes: Always need a source to attach the volumes to for managing upgrades and to avoid dangling. - Dynamic Ephemeral CSI Volumes: Less thought out, but `source` will probably point to the plugin name, and a `config` block will allow you to pass meta to the plugin. Or will point to a pre-configured ephemeral config. *If implemented The new design simplifies this by merging the source into the volume stanza to solve the above issues with usability, performance, and error handling.	2019-09-13 04:37:59 +02:00
Mahmood Ali	4b8280e51d	remove generated code	2019-09-06 19:24:15 +00:00
Nomad Release bot	dc7d728a82	Generate files for 0.10.0-beta1 release	2019-09-06 18:47:09 +00:00
Mahmood Ali	01f42053e4	dev: avoid codecgen code in downstream projects This is an attempt to ease dependency management for external driver plugins by not requiring them to compile ugorji/go generated files. Plugin developers reported some pain with the brittleness of the ugorji/go dependency in particular, especially when using go mod, the default dependency manager in golang 1.13. Context -------- Nomad uses msgpack to persist and serialize internal structs, using the ugorji/go library. As an optimization, we use ugorji/go code generation to speed up the process and avoid the reflection-based slow path. We commit these generated files to the repository when we cut and tag a release to ease reproducibility and debugging of old releases. Thus, downstream projects that depend on a release tag indirectly depend on ugorji/go generated code. Sadly, the generated code is brittle and specific to the version of ugorji/go being used. When go mod picks a different version of ugorji/go than nomad does (go mod by default uses releases according to semver), downstream projects face compilation errors. Interestingly, downstream projects don't commonly serialize nomad internal structs. Drivers and device plugins use grpc instead of msgpack for the most part. In the few cases where they use msgpack (e.g. decoding task config), they do so without the codegen path, as they operate on driver-specific structs, not the nomad internal structs. Also, ugorji/go serialization through reflection is generally backward compatible (modulo some ugorji/go regression bugs that get introduced every now and then :( ). Proposal --------- The proposal here is to keep committing ugorji/go codec generated files for releases but to put them behind a go build tag. All nomad development through the makefile, including releasing, CI and dev flow, has the tag enabled. Downstream plugin projects, by default, will skip these files and life proceeds as normal for them. 
The downside is that nomad developers who use generated code but avoid using make must start passing additional go tag argument. Though this is not a blessed configuration.	2019-09-06 09:22:00 -04:00
Mahmood Ali	6d73ca0cfb	Merge pull request #6250 from hashicorp/f-raft-protocol-v3 Update default raft protocol to version 3	2019-09-04 09:34:41 -04:00
Mahmood Ali	c94a5ef1f8	tests: give up on TestAutopilot_CleanupStaleRaftServer for now	2019-09-04 09:10:53 -04:00
Nick Ethier	6a90a9f505	structs: canonicalize tg Services and Networks (#6257 )	2019-09-04 08:55:47 -04:00
Mahmood Ali	6cefd8f97e	tests: attempt to fix TestAutopilot_CleanupStaleRaftServer Also add a utility function for waiting for stable leadership	2019-09-04 08:49:33 -04:00
Mahmood Ali	035a7a94d9	tests: update time sensitive tests Fix tests whose messages seem timing dependent.	2019-09-04 08:45:25 -04:00
Mahmood Ali	0beb757b6f	tests: disable server auto join by default Tests typically call join cluster directly rather than rely on consul discovery. Worse, consul discovery seems to cause more leadership transitions than tests expect when a server is shut down in tests.	2019-09-04 07:54:54 -04:00
Mahmood Ali	3e2ab6e2a3	address review feedback	2019-09-03 21:44:39 -04:00
Mahmood Ali	0a6d73020c	use current nomad version in testing	2019-09-03 21:42:41 -04:00
Mahmood Ali	9bd56587cd	Fix raft tests Wait until leadership stabilizes and all non-voters get promoted before killing leader	2019-09-03 14:53:29 -04:00
Michael Schurter	5957030d18	connect: add unix socket to proxy grpc for envoy (#6232 ) * connect: add unix socket to proxy grpc for envoy Fixes #6124 Implement a L4 proxy from a unix socket inside a network namespace to Consul's gRPC endpoint on the host. This allows Envoy to connect to Consul's xDS configuration API. * connect: pointer receiver on structs with mutexes * connect: warn on all proxy errors	2019-09-03 08:43:38 -07:00
Buck Doyle	21ec6a237c	Merge branch 'master' into f-policy-json # Conflicts: # CHANGELOG.md	2019-09-03 09:56:25 -05:00
Jasmine Dahilig	4edebe389a	add default update stanza and max_parallel=0 disables deployments (#6191 )	2019-09-02 10:30:09 -07:00
Buck Doyle	ab96785fc9	Change test to use valid HCL for rules	2019-08-29 16:09:02 -05:00
Buck Doyle	4a159f5dcf	Change parsing error to set rules to nil	2019-08-29 15:50:34 -05:00
Buck Doyle	5495a7e689	Add standard error-handling for parse failure	2019-08-29 11:12:02 -05:00
Buck Doyle	8b06712d21	Merge branch 'master' into f-policy-json	2019-08-29 11:11:21 -05:00
Mahmood Ali	3da10b5cb3	scheduler: tests for multiple drivers in TG	2019-08-29 09:03:31 -04:00
Mahmood Ali	a67f5f0565	update tests to run with v2	2019-08-28 16:42:08 -04:00
Mahmood Ali	6eabf53b91	Default raft protocol to version 3	2019-08-28 15:56:59 -04:00
Michael Schurter	f5792635ca	Merge pull request #6218 from hashicorp/f-consul-defaults consul: use Consul's defaults and env vars	2019-08-28 11:54:44 -07:00
Nick Ethier	9e96971a75	cli: display group ports and address in alloc status command output (#6189 ) * cli: display group ports and address in alloc status command output * add assertions for port.To = -1 case and convert assertions to testify	2019-08-27 23:59:36 -04:00
Nick Ethier	cbb27e74bc	Add environment variables for connect upstreams (#6171 ) * taskenv: add connect upstream env vars + test * set taskenv upstreams instead of appending * Update client/taskenv/env.go Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-27 23:41:38 -04:00
Michael Schurter	3b0e1d8ef7	consul: use Consul's defaults and env vars Use Consul's API package defaults and env vars as Nomad's defaults.	2019-08-27 14:56:52 -07:00
Mahmood Ali	3791a70aa9	Merge pull request #5676 from hashicorp/f-b-upgrade-ugorji-dep-20190508 Update ugorji/go to latest	2019-08-23 18:29:49 -04:00
Jerome Gravel-Niquet	cbdc1978bf	Consul service meta (#6193 ) * adds meta object to service in job spec, sends it to consul * adds tests for service meta * fix tests * adds docs * better hashing for service meta, use helper for copying meta when registering service * tried to be DRY, but looks like it would be more work to use the helper function	2019-08-23 12:49:02 -04:00
Michael Schurter	95b8048553	Merge pull request #6121 from hashicorp/f-connect-bootstrap connect: task hook for bootstrapping envoy sidecar	2019-08-22 10:58:31 -07:00
Michael Schurter	59e0b67c7f	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, bootstrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Danielle Lancashire	2e5f28029f	remove hidden field from host volumes We're not shipping support for "hidden" volumes in 0.10 any more, I'll convert this to an issue+mini RFC for future enhancement.	2019-08-22 08:48:05 +02:00
Danielle	0428284aee	Merge pull request #6180 from hashicorp/dani/readonly-acl Fine grained ACLs for Host Volumes	2019-08-21 22:22:14 +02:00
Danielle Lancashire	91bb67f713	acls: Break mount acl into mount-rw and mount-ro	2019-08-21 21:17:30 +02:00
Nick Ethier	c8556daf37	structs: validate no tcp checks for connect services (#6169 )	2019-08-21 12:42:53 -04:00
Michael Schurter	050cc32fde	Merge pull request #6157 from hashicorp/f-connect-register Register connect enabled group services with Consul	2019-08-20 14:45:38 -07:00
Tim Gross	7dc6ee2d27	structs: add taskgroup networks and services to plan diffs Adds a check for differences in `job.Diff` so that task group networks and services, including new Consul connect stanzas, show up in the job plan outputs.	2019-08-20 16:18:30 -04:00
Michael Schurter	b008fd1724	connect: register group services with Consul Fixes #6042 Add new task group service hook for registering group services like Connect-enabled services. Does not yet support checks.	2019-08-20 12:25:10 -07:00
Tim Gross	a0e923f46c	add optional task field to group service checks	2019-08-20 09:35:31 -04:00
Mahmood Ali	d699a70875	Merge pull request #5911 from hashicorp/b-rpc-consistent-reads Block rpc handling until state store is caught up	2019-08-20 09:29:37 -04:00
Nick Ethier	24f5a4c276	sidecar_task override in connect admission controller (#6140) * structs: use separate SidecarTask struct for sidecar_task stanza and add merge * nomad: merge SidecarTask into proxy task during connect Mutate hook	2019-08-20 01:22:46 -04:00
Nick Ethier	965f00b2fc	Builtin Admission Controller Framework (#6116 ) * nomad: add admission controller framework * nomad: add admission controller framework and Consul Connect hooks * run admission controllers before checking permissions * client: add default node meta for connect configurables * nomad: remove validateJob func since it has been moved to admission controller * nomad: use new TaskKind type * client: use consts for connect sidecar image and log level * Apply suggestions from code review Co-Authored-By: Michael Schurter <mschurter@hashicorp.com> * nomad: add job register test with connect sidecar * Update nomad/job_endpoint_hooks.go Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-15 11:22:37 -04:00
Preetha Appan	72e45dd01e	More code review feedback	2019-08-12 17:41:40 -05:00
Preetha	76c8a11b31	Apply suggestions from code review Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-12 17:03:30 -05:00
Preetha Appan	219dc05541	Fix type for kind	2019-08-12 14:39:50 -05:00
Preetha Appan	35506c516d	Improve validation logic and add table driven tests	2019-08-12 14:39:50 -05:00
Preetha Appan	d324a9864e	Add validation for kind field if it is a consul connect proxy	2019-08-12 14:39:50 -05:00
Danielle Lancashire	b38c1d810e	job_endpoint: Validate volume permissions	2019-08-12 15:39:09 +02:00
Danielle Lancashire	33db40d4e6	structs: Document VolumeMount	2019-08-12 15:39:08 +02:00
Danielle Lancashire	861caa9564	HostVolumeConfig: Source -> Path	2019-08-12 15:39:08 +02:00
Danielle Lancashire	e132a30899	structs: Unify Volume and VolumeRequest	2019-08-12 15:39:08 +02:00
Danielle Lancashire	6d7b417e54	structs: Add declarations of basic structs for volume support	2019-08-12 15:39:08 +02:00
Nick Ethier	1871c1edbc	Add sidecar_task stanza parsing (#6104 ) * jobspec: breakup parse.go into smaller files * add sidecar_task parsing to jobspec and api * jobspec: combine service parsing logic for task and group service stanzas * api: use slice of ConsulUpstream values instead of pointers	2019-08-09 15:18:53 -04:00
Preetha Appan	a393ea79e8	Add field "kind" to task for use in connect tasks	2019-08-07 18:43:36 -05:00
Jasmine Dahilig	8d980edd2e	add create and modify timestamps to evaluations (#5881 )	2019-08-07 09:50:35 -07:00
Michael Schurter	3e4796799a	Merge pull request #6003 from pete-woods/add-job-status-metrics nomad: add job status metrics	2019-08-07 08:02:16 -07:00
Michael Schurter	d2862b33e6	Merge pull request #6045 from hashicorp/f-connect-groupservice consul: add Connect structs	2019-08-06 15:43:38 -07:00
Michael Schurter	ef9d100d2f	Merge pull request #6082 from hashicorp/b-vault-deadlock vault: fix deadlock in SetConfig	2019-08-06 15:30:17 -07:00
Michael Schurter	ecb1a65bb9	Merge pull request #6077 from hashicorp/b-vault-revlock vault: fix race in accessor revocations	2019-08-06 14:28:47 -07:00
Michael Schurter	b8e127b3c0	vault: ensure SetConfig calls are serialized This is a defensive measure as SetConfig should only be called serially.	2019-08-06 11:17:10 -07:00
Michael Schurter	5022341b27	vault: fix deadlock in SetConfig This seems to be the minimum viable patch for fixing a deadlock between establishConnection and SetConfig. SetConfig calls tomb.Kill+tomb.Wait while holding v.lock. establishConnection needs to acquire v.lock to exit but SetConfig is holding v.lock until tomb.Wait exits. tomb.Wait can't exit until establishConnection does! ``` SetConfig -> tomb.Wait ^ | | v v.lock <- establishConnection ```	2019-08-06 10:40:14 -07:00
Michael Schurter	17fd82d6ad	consul: add Connect structs Refactor all Consul structs into {api,structs}/services.go because api/tasks.go didn't make sense anymore and structs/structs.go is gigantic.	2019-08-06 08:15:07 -07:00
Michael Schurter	d0a83eb818	vault: fix race in accessor revocations	2019-08-05 15:08:04 -07:00
Preetha Appan	8b298621ef	Add more comments to clarify job.Stable field	2019-08-05 15:00:53 -05:00
Preetha Appan	e6a496bac0	Code review feedback	2019-07-31 01:04:08 -04:00
Preetha Appan	99eca85206	Scheduler changes to support network at task group level Also includes unit tests for binpacker and preemption. The tests verify that network resources specified at the task group level are properly accounted for	2019-07-31 01:04:08 -04:00
Michael Schurter	4501fe3c4d	structs: deepcopy shared alloc resources Also DRY up Networks code by using Networks.Copy	2019-07-31 01:04:06 -04:00
Michael Schurter	fb487358fb	connect: add group.service stanza support	2019-07-31 01:04:05 -04:00
Nick Ethier	a03f6a95a2	structs: refactor network validation to separate fn	2019-07-31 01:03:16 -04:00
Danielle	1e7571eb85	fix structs comment Co-Authored-By: nickethier <ncethier@gmail.com>	2019-07-31 01:03:16 -04:00
Nick Ethier	aa7c08679e	structs: Add validations for task group networks	2019-07-31 01:03:16 -04:00
Nick Ethier	6c160df689	fix tests from introducing new struct fields	2019-07-31 01:03:16 -04:00
Nick Ethier	8650429e38	Add network stanza to group Adds a network stanza and additional options to the task group level in prep for allowing shared networking between tasks of an alloc.	2019-07-31 01:03:12 -04:00
Preetha Appan	d048029b5a	remove generated code and change version to 0.10.0	2019-07-30 15:56:05 -05:00
Nomad Release bot	e39fb11531	Generate files for 0.9.4 release	2019-07-30 19:05:18 +00:00
Buck Doyle	0a1a0419cb	Combine conditionals	2019-07-29 10:38:07 -05:00
Buck Doyle	0a082c1e5e	Update assertion to use better failure-reporting	2019-07-29 10:35:07 -05:00
Buck Doyle	c3deb7703d	Update policy endpoint to permit anonymous access	2019-07-26 13:07:42 -05:00
Pete Woods	9096aa3d23	Add job status metrics This avoids having to write services to repeatedly hit the jobs API	2019-07-26 10:12:49 +01:00
Buck Doyle	77f5a38c8f	Add parsed rules to policy response	2019-07-25 10:43:57 -05:00
Preetha Appan	6b4c40f5a8	remove generated code	2019-07-23 12:07:49 -05:00
Nomad Release bot	04187c8b86	Generate files for 0.9.4-rc1 release	2019-07-22 21:42:36 +00:00
Jasmine Dahilig	2157f6ddf1	add formatting for hcl parsing error messages (#5972 )	2019-07-19 10:04:39 -07:00
Lang Martin	f282da4ced	blocked_evals_test disable calls Flush	2019-07-18 10:32:13 -04:00
Lang Martin	8f7a20839e	worker comment system -> core	2019-07-18 10:32:13 -04:00
Lang Martin	83d20169f6	blocked_evals reset system evals on Flush	2019-07-18 10:32:13 -04:00
Lang Martin	6e3425babf	blocked_evals_test Test_UnblockNode	2019-07-18 10:32:12 -04:00
Lang Martin	ea275d5ce7	fsm attach UnblockNode on node updates	2019-07-18 10:32:12 -04:00
Lang Martin	3bf618f217	blocked_evals system evals indexed by job and node	2019-07-18 10:32:12 -04:00
Michael Schurter	81b4b6f19b	Merge pull request #5791 from hashicorp/b-plan-snapshotindex nomad: include snapshot index when submitting plans	2019-07-17 09:25:00 -07:00
Mahmood Ali	ad39bcef60	rpc: use tls wrapped connection for streaming rpc This ensures that server-to-server streaming RPC calls use the tls wrapped connections. Prior to this, the `streamingRpcImpl` function used tls for setting the header and invoking the rpc method, but returned the unwrapped connection. Thus, streaming writes failed with tls errors. This tls streaming bug has existed since 0.8.0[1], but PR #5654[2] exacerbated it in 0.9.2. Prior to PR #5654, the nomad client used to shuffle servers at every heartbeat -- `servers.Manager.setServers`[3] always shuffled servers and was called by heartbeat code[4]. Shuffling servers meant that a nomad client would eventually heartbeat and establish a connection against all nomad servers. When handling streaming RPC calls, nomad servers used these local connections to communicate directly with the client. The server-to-server forwarding logic was left mostly unexercised. With PR #5654, a nomad client may connect to only a single server, which caused the server-to-server forwarding streaming RPC code to be exercised more and unearthed the problem. [1] https://github.com/hashicorp/nomad/blob/v0.8.0/nomad/rpc.go#L501-L515 [2] https://github.com/hashicorp/nomad/pull/5654 [3] https://github.com/hashicorp/nomad/blob/v0.9.1/client/servers/manager.go#L198-L216 [4] https://github.com/hashicorp/nomad/blob/v0.9.1/client/client.go#L1603	2019-07-12 14:41:44 +08:00
Mahmood Ali	9c9bec62fd	rpc: add positive tests for server streaming RPC	2019-07-12 14:32:52 +08:00
Lang Martin	0b97175a16	node_endpoint preserve both messages as rpcs and in raft	2019-07-10 13:56:20 -04:00
Lang Martin	ee4848167c	core_sched add compat comment for later removal	2019-07-10 13:56:20 -04:00
Lang Martin	c13c97c6c2	structs drop deprecation warning, revert unnecessary comment change	2019-07-10 13:56:20 -04:00
Lang Martin	a95225d754	NodeDeregisterBatch -> NodeBatchDeregister match JobBatch pattern	2019-07-10 13:56:20 -04:00
Lang Martin	a8e72a5b68	state_store error if called without node_ids	2019-07-10 13:56:20 -04:00
Lang Martin	44cbca9b98	fsm new NodeDeregisterBatchRequestType sorted at the end of the case	2019-07-10 13:56:20 -04:00
Lang Martin	91e139dcb5	structs NodeDeregisterBatchRequestType must go at the end	2019-07-10 13:56:20 -04:00
Lang Martin	1cc6b4062c	fsm label batch_deregister_node metrics explicitly Co-Authored-By: Mahmood Ali <mahmood@notnoop.com>	2019-07-10 13:56:20 -04:00
Lang Martin	ad3549f906	core_sched use the new rpc names	2019-07-10 13:56:20 -04:00
Lang Martin	ce0f03651a	fsm support new NodeDeregisterBatchRequest	2019-07-10 13:56:20 -04:00
Lang Martin	fa5649998e	node endpoint support new NodeDeregisterBatchRequest	2019-07-10 13:56:19 -04:00
Lang Martin	683ab8d1d2	structs add NodeDeregisterBatchRequest	2019-07-10 13:56:19 -04:00
Lang Martin	82349aba5d	node_endpoint argument setup	2019-07-10 13:56:19 -04:00
Lang Martin	6dbf5d7d13	fsm return an error on both NodeDeregisterRequest fields set	2019-07-10 13:56:19 -04:00
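During the compatibility window described by this series, a deregister request may carry either the old single `NodeID` or the new batch `NodeIDs`, and setting both (or neither) is an error. A hypothetical sketch of normalizing such a request follows. The struct is trimmed and `targetNodeIDs` is an illustrative helper, not the FSM code.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeDeregisterRequest is a hypothetical, trimmed version of the request.
type NodeDeregisterRequest struct {
	NodeID  string   // old style: a single node, kept for compatibility
	NodeIDs []string // new style: a batch of nodes
}

// targetNodeIDs accepts exactly one of the two forms and returns the batch.
func (r *NodeDeregisterRequest) targetNodeIDs() ([]string, error) {
	switch {
	case r.NodeID != "" && len(r.NodeIDs) > 0:
		return nil, errors.New("only one of NodeID or NodeIDs may be set")
	case r.NodeID != "":
		return []string{r.NodeID}, nil
	case len(r.NodeIDs) > 0:
		return r.NodeIDs, nil
	default:
		return nil, errors.New("node ID(s) required")
	}
}

func main() {
	ids, err := (&NodeDeregisterRequest{NodeID: "n1"}).targetNodeIDs()
	fmt.Println(ids, err) // prints "[n1] <nil>"
}
```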
Lang Martin	fbc78ba96c	fsm variable names for consistency	2019-07-10 13:56:19 -04:00
Lang Martin	09fd05bd8f	node_endpoint raft store then shutdown, test deprecation	2019-07-10 13:56:19 -04:00
Lang Martin	4610c70777	util simplify partitionAll	2019-07-10 13:56:19 -04:00
Lang Martin	d22d9fb5b2	core_sched check ServersMeetMinimumVersion	2019-07-10 13:56:19 -04:00
Lang Martin	3bf41211fb	fsm honor new and old style NodeDeregisterRequests	2019-07-10 13:56:19 -04:00
Lang Martin	3fb82e83a5	structs add back NodeDeregisterRequest.NodeID, compatibility	2019-07-10 13:56:19 -04:00
Lang Martin	a4472e3d34	core_sched check ServersMeetMinimumVersion, send old node deregister	2019-07-10 13:56:19 -04:00
Lang Martin	8e53c105fc	state_store just one index update, test deletion	2019-07-10 13:56:19 -04:00
Lang Martin	3e2d1f0338	node_endpoint improve error messages	2019-07-10 13:56:19 -04:00
Lang Martin	5a6a947e98	state_store improve error messages	2019-07-10 13:56:19 -04:00
Lang Martin	fd14cedf95	drainer watch_nodes_test batch of 1	2019-07-10 13:56:19 -04:00
Lang Martin	b176066d42	node_endpoint deregister the batch of nodes	2019-07-10 13:56:19 -04:00
Lang Martin	a97407e030	fsm NodeDeregisterRequest is now a batch	2019-07-10 13:56:19 -04:00
Lang Martin	d5ff2834ca	core_sched batch node deregistration requests	2019-07-10 13:56:19 -04:00
Lang Martin	10848841be	util partitionAll for paging	2019-07-10 13:56:19 -04:00
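A `partitionAll` helper for paging typically splits a slice into consecutive chunks of bounded size, so each raft batch stays small. Below is a minimal sketch; the signature is assumed for illustration and may differ from the one in the util file.

```go
package main

import "fmt"

// partitionAll splits xs into consecutive pages of at most size elements.
// Pages are sub-slices that alias xs; copy them if xs will be mutated.
func partitionAll(size int, xs []string) [][]string {
	if size < 1 {
		size = len(xs) // treat a non-positive size as "one page"
	}
	var pages [][]string
	for i := 0; i < len(xs); i += size {
		end := i + size
		if end > len(xs) {
			end = len(xs)
		}
		pages = append(pages, xs[i:end])
	}
	return pages
}

func main() {
	fmt.Println(partitionAll(2, []string{"a", "b", "c"})) // prints "[[a b] [c]]"
}
```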
Lang Martin	be2d6853cb	state_store DeleteNode operates on a batch of ids	2019-07-10 13:56:19 -04:00
Lang Martin	77cf037bff	struct NodeDeregisterRequest has a batch of NodeIDs	2019-07-10 13:56:19 -04:00
Mahmood Ali	ea3a98357f	Block rpc handling until state store is caught up Here, we ensure that the leader only responds to RPC calls when its state store is up to date. At a leadership transition or launch with restored state, the server's local store might not be caught up with the latest raft logs and may return a stale read. The solution here is an RPC consistent-read gate, enabled when `establishLeadership` completes, before we respond to RPC calls. `establishLeadership` is gated by a `raft.Barrier`, which ensures that all prior raft logs have been applied. Conversely, the gate is disabled when leadership is lost. This is very much inspired by https://github.com/hashicorp/consul/pull/3154/files	2019-07-02 16:07:37 +08:00
Preetha Appan	3cb798235d	Missed one revert of backwards compatibility for node drain	2019-07-01 16:46:05 -05:00
Preetha Appan	aa2b4b4e00	Undo removal of node drain compat changes Decided to remove that in 0.10	2019-07-01 15:12:01 -05:00
Preetha Appan	3484f18984	Fix more tests	2019-06-26 16:30:53 -05:00
Preetha Appan	ff1b80dba6	Fix node drain test	2019-06-26 16:12:07 -05:00
Preetha Appan	23319e04d6	Restore accidentally deleted block	2019-06-26 13:59:14 -05:00
Michael Schurter	69ba495f0c	nomad: expand comments on subtle plan apply behaviors	2019-06-26 08:49:24 -07:00
Preetha Appan	66fa6a67ec	newline	2019-06-25 19:41:09 -05:00
Preetha Appan	10e7d6df6d	Remove compat code associated with many previous versions of nomad This removes compat code for namespaces (0.7), Drain(0.8) and other older features from releases older than Nomad 0.7	2019-06-25 19:05:25 -05:00
Michael Schurter	e4bc943a68	nomad: SnapshotAfter -> SnapshotMinIndex Rename SnapshotAfter to SnapshotMinIndex. The old name was not technically accurate. SnapshotAtOrAfter is more accurate, but wordy and still lacks context about what precisely it is at or after (the index). SnapshotMinIndex was chosen as it describes the action (snapshot), a constraint (minimum), and the object of the constraint (index).	2019-06-24 12:16:46 -07:00
Michael Schurter	0f8164b2f1	nomad: evaluate plans after previous plan index The previous commit prevented evaluating plans against a state snapshot which is older than the snapshot at which the plan was created. This is correct and prevents failures trying to retrieve referenced objects that may not exist until the plan's snapshot. However, this is insufficient to guarantee consistency if the following events occur: 1. P1, P2, and P3 are enqueued with snapshot @ 100 2. Leader evaluates and applies Plan P1 with snapshot @ 100 3. Leader evaluates Plan P2 with snapshot+P1 @ 100 4. P1 commits @ 101 5. Leader evaluates and applies Plan P3 with snapshot+P2 @ 100 Since only the previous plan is optimistically applied to the state store, the snapshot used to evaluate a plan may not contain the N-2 plan! To ensure plans are evaluated and applied serially, we must consider all previous plans' committed indexes when evaluating further plans. Therefore, combined with the last PR, the minimum index at which to evaluate a plan is: max(previousPlanResultIndex, plan.SnapshotIndex)	2019-06-24 12:16:46 -07:00
Michael Schurter	e10fea1d7a	nomad: include snapshot index when submitting plans Plan application should use a state snapshot at or after the Raft index at which the plan was created; otherwise it risks being rejected based on stale data. This commit adds a Plan.SnapshotIndex which is set by workers when submitting plans. SnapshotIndex is set to the Raft index of the snapshot the worker used to generate the plan. Plan.SnapshotIndex plays a similar role to PlanResult.RefreshIndex. While RefreshIndex informs workers that their StateStore is behind the leader's, SnapshotIndex is a way to prevent the leader from using a StateStore behind the worker's. Plan.SnapshotIndex should be considered the lower bound index for consistently handling plan application. Plans must also be committed serially, so Plan N+1 should use a state snapshot containing Plan N. This is guaranteed for plans after the first plan after a leader election. The Raft barrier on leader election ensures the leader's statestore has caught up to the log index at which it was elected. This guarantees its StateStore is at an index > lastPlanIndex.	2019-06-24 12:16:46 -07:00
Chris Baker	59fac48d92	alloc lifecycle: 404 when attempting to stop non-existent allocation	2019-06-20 21:27:22 +00:00
Preetha	586e50d1a4	Merge pull request #5841 from hashicorp/f-raft-snapshot-metrics Raft and state store indexes as metrics	2019-06-19 12:01:03 -05:00
Preetha Appan	dc0ac81609	Change interval of raft stats collection to 10s	2019-06-19 11:58:46 -05:00
Preetha Appan	104d66f10c	Changed name of metric	2019-06-17 15:51:31 -05:00
Chris Baker	e0170e1c67	metrics: add namespace label to allocation metrics	2019-06-17 20:50:26 +00:00
Preetha Appan	c54b4a5b17	Emit metrics with raft commit and apply index and statestore latest index	2019-06-14 16:30:27 -05:00
Jasmine Dahilig	ed9740db10	Merge pull request #5664 from hashicorp/f-http-hcl-region backfill region from hcl for jobUpdate and jobPlan	2019-06-13 12:25:01 -07:00
Jasmine Dahilig	51e141be7a	backfill region from job hcl in jobUpdate and jobPlan endpoints - updated region in job metadata that gets persisted to nomad datastore - fixed many unrelated unit tests that used an invalid region value (they previously passed because hcl wasn't getting picked up and the job would default to global region)	2019-06-13 08:03:16 -07:00
Nick Ethier	1b7fa4fe29	Optional Consul service tags for nomad server and agent services (#5706 ) Optional Consul service tags for nomad server and agent services	2019-06-13 09:00:35 -04:00
Mahmood Ali	e31159bf1f	Prepare for 0.9.4 dev cycle	2019-06-12 18:47:50 +00:00
Nomad Release bot	4803215109	Generate files for 0.9.3 release	2019-06-12 16:11:16 +00:00
Mahmood Ali	07f2c77c44	comment DenormalizeAllocationDiffSlice applies to terminal allocs only	2019-06-12 08:28:43 -04:00
Lang Martin	fe8a4781d8	config merge maintains *HCL string fields used for duration conversion	2019-06-11 16:34:04 -04:00
Mahmood Ali	392f5bac44	Stop updating allocs.Job on stopping or preemption	2019-06-10 18:30:20 -04:00
Mahmood Ali	6c8e329819	test that stopped alloc jobs aren't modified When an alloc is stopped, test that we don't update the job found in the alloc with a new job that is no longer relevant for this alloc.	2019-06-10 17:14:26 -04:00
Mahmood Ali	d30c3d10b0	Merge pull request #5747 from hashicorp/b-test-fixes-20190521-1 More test fixes	2019-06-05 19:09:18 -04:00
Mahmood Ali	87173111de	Merge pull request #5746 from hashicorp/b-no-updating-inmem-node set node.StatusUpdatedAt in raft	2019-06-05 19:05:21 -04:00
Mahmood Ali	97957fbf75	Prepare for 0.9.3 dev cycle	2019-06-05 14:54:00 +00:00
Nomad Release bot	43bfbf3fcc	Generate files for 0.9.2 release	2019-06-05 11:59:27 +00:00
Michael Schurter	073893f529	nomad: disable service+batch preemption by default Enterprise only. Disable preemption for service and batch jobs by default. Maintain backward compatibility in a x.y.Z release. Consider switching the default for new clusters in the future.	2019-06-04 15:54:50 -07:00
Michael Schurter	a8fc50cc1b	nomad: revert use of SnapshotAfter in planApply Revert plan_apply.go changes from #5411 Since non-Command Raft messages do not update the StateStore index, SnapshotAfter may unnecessarily block and needlessly fail in idle clusters where the last Raft message is a non-Command message. This is trivially reproducible with the dev agent and a job that has 2 tasks, 1 of which fails. The correct logic would be to SnapshotAfter the previous plan's index to ensure consistency. New clusters or newly elected leaders will not have a previous plan, so the index at which the leader was elected should be used instead.	2019-06-03 15:34:21 -07:00
Mahmood Ali	a4ead8ff79	remove 0.9.2-rc1 generated code	2019-05-23 11:14:24 -04:00
Nomad Release bot	6d6bc59732	Generate files for 0.9.2-rc1 release	2019-05-22 19:29:30 +00:00
Lang Martin	d46613ff44	structs check TaskGroup.Update for nil	2019-05-22 12:34:57 -04:00
Lang Martin	10a3fd61b0	comment replace COMPAT 0.7.0 for job.Update with more current info	2019-05-22 12:34:57 -04:00
Lang Martin	67ebcc47dd	structs comment todo DeploymentStatus & DeploymentStatusDescription	2019-05-22 12:34:57 -04:00
Lang Martin	21bf9fdf90	structs job warnings for taskgroup with mixed auto_promote settings	2019-05-22 12:34:57 -04:00
Lang Martin	0f6f543a5f	deployment_watcher auto promote iff every task group is auto promotable	2019-05-22 12:34:57 -04:00
Lang Martin	d27d6f8ede	structs validate requires Canary for AutoPromote	2019-05-22 12:32:08 -04:00
Lang Martin	0c668ecc7a	log error on autoPromoteDeployment failure	2019-05-22 12:32:08 -04:00
Lang Martin	f23f9fd99e	describe a pending deployment without auto_promote more explicitly	2019-05-22 12:32:08 -04:00
Lang Martin	34230577df	describe a pending deployment with auto_promote accurately	2019-05-22 12:32:08 -04:00
Lang Martin	b5fd735960	add update AutoPromote bool	2019-05-22 12:32:08 -04:00
Lang Martin	3c5a9fed22	deployments_watcher_test new TestWatcher_AutoPromoteDeployment	2019-05-22 12:32:08 -04:00
Lang Martin	0bebf5d7f8	deployment_watcher when it's ok to autopromote, do so	2019-05-22 12:32:08 -04:00
Lang Martin	0cf4168ed9	deployments_watcher comments	2019-05-22 12:32:08 -04:00
Lang Martin	0c403eafde	state_store typo in a comment	2019-05-22 12:32:08 -04:00
Lang Martin	e1e28307be	new deploymentwatcher/doc.go for package level documentation	2019-05-22 12:32:08 -04:00
Mahmood Ali	9ff5f163b5	update callers in tests	2019-05-21 21:10:17 -04:00
Mahmood Ali	6bdbeed319	set node.StatusUpdatedAt in raft Fix a case where `node.StatusUpdatedAt` was manipulated directly in memory. This ensures that StatusUpdatedAt is set in raft layer, and ensures that the field is updated when node drain/eligibility is updated too.	2019-05-21 16:13:32 -04:00
Mahmood Ali	2159d0f3ac	tests: fix some nomad/drainer test data races	2019-05-21 14:40:58 -04:00
Mahmood Ali	3b0152d778	tests: fix deploymentwatcher tests data races	2019-05-21 14:29:45 -04:00
Michael Schurter	689794e08d	nomad: fix deadlock in UnblockClassAndQuota Previous commit could introduce a deadlock if the capacityChangeCh was full and the receiving side exited before freeing a slot so the sending side could send. Flush would then block forever waiting to acquire the lock just to throw the pending update away. The race is around getting/setting the chan field, not chan operations, so only lock around getting the chan field.	2019-05-20 15:41:52 -07:00
Michael Schurter	8c99214f69	nomad: fix race in BlockedEvals I assume the mutex was being released before sending on capacityChangeCh to avoid blocking in the critical section, but: 1. This is racy. 2. capacityChangeCh has a huge buffer (8096). If it's full things already seem Very Bad, and a little backpressure seems appropriate.	2019-05-20 15:26:20 -07:00
Michael Schurter	05a9c6aedb	Merge pull request #5411 from hashicorp/b-snapshotafter Block plan application until state store has caught up to raft	2019-05-20 14:03:10 -07:00
Mahmood Ali	cd64ada95d	Run TestClientAllocations_Restart_ACL test	2019-05-17 20:30:23 -04:00
Michael Schurter	0e39927782	nomad: emit more detailed error Avoid returning context.DeadlineExceeded as it lacks helpful information and is often ignored or handled specially by callers.	2019-05-17 14:37:42 -07:00
Michael Schurter	b80a7e0feb	nomad: wait for state store to sync in plan apply Wait for state store to catch up with raft when applying plans.	2019-05-17 14:37:12 -07:00
Michael Schurter	1bc731da47	nomad: remove unused NotifyGroup struct I don't think it's been used for a long time.	2019-05-17 13:30:23 -07:00
Michael Schurter	9732bc37ff	nomad: refactor waitForIndex into SnapshotAfter Generalize wait for index logic in the state store for reuse elsewhere. Also begin plumbing in a context to combine handling of timeouts and shutdown.	2019-05-17 13:30:23 -07:00
Preetha	c8fdf20c66	Merge pull request #5717 from hashicorp/b-plan-apply-preemptions Fix bug in plan applier introduced in PR-5602	2019-05-16 11:01:05 -05:00
Preetha	2dcd4291f8	Merge pull request #5702 from hashicorp/f-filter-by-create-index Filter deployments by create index	2019-05-15 21:50:41 -05:00
Preetha	555dd23c2c	remove stray newline Co-Authored-By: Danielle <dani@builds.terrible.systems>	2019-05-15 21:11:52 -05:00
Preetha Appan	2b787aad7e	Fix bug in plan applier introduced in PR-5602 This fixes a bug in the state store during plan apply. When denormalizing preempted allocations it incorrectly set the preemptor's job during the update. This eventually causes a panic downstream in the client. Added a test assertion that failed before and passes after this fix	2019-05-15 20:34:06 -05:00
Danielle	d202582502	Merge pull request #5699 from hashicorp/dani/b-eval-broker-lifetime Eval Broker: Prevent redundant enqueue's when a node is not a leader	2019-05-15 23:30:52 +01:00
Danielle Lancashire	2fb93a6229	evalbroker: test for no enqueue on disabled	2019-05-15 11:02:21 +02:00
Nick Ethier	ade97bc91f	fixup #5172 and rebase against master	2019-05-14 14:37:34 -04:00
Nick Ethier	cab6a95668	Merge branch 'master' into pr/5172 * master: (912 commits) Update redirects.txt Added redirect for Spark guide link client: log when server list changes docs: mention regression in task config validation fix update to changelog update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665 typo: "atleast" -> "at least" implement nomad exec for rkt docs: fixed typo use pty/tty terminology similar to github.com/kr/pty vendor github.com/kr/pty drivers: implement streaming exec for executor based drivers executors: implement streaming exec executor: scaffolding for executor grpc handling client: expose allocated memory per task client improve a comment in updateNetworks stalebot: Add 'thinking' as an exempt label (#5684) Added Sparrow link update links to use new canonical location Add redirects for restructing done in GH-5667 ...	2019-05-14 14:10:33 -04:00
Michael Schurter	d7e5ace1ed	client: do not restart dead tasks until server is contacted Fixes #1795 Running restored allocations and pulling what allocations to run from the server happen concurrently. This means that if a client is rebooted, and has its allocations rescheduled, it may restart the dead allocations before it contacts the server and determines they should be dead. This commit makes tasks that fail to reattach on restore wait until the server is contacted before restarting.	2019-05-14 10:53:27 -07:00
Danielle Lancashire	d9815888ed	evalbroker: Simplify nextDelayedEval locking	2019-05-14 14:06:27 +02:00
Danielle Lancashire	38562afbc1	evalbroker: No new enqueues when disabled Currently when an evalbroker is disabled, it still receives delayed enqueues via log application in the fsm. This causes an ever-growing heap of evaluations that will never be drained, and can cause memory issues in larger clusters, or when left running for an extended period of time without a leader election. This commit prevents the enqueuing of evaluations while we are disabled, and relies on the leader restoreEvals routine to handle reconciling state during a leadership transition. Existing dequeues during an Enabled->Disabled broker state transition are handled by the enqueueLocked function dropping evals.	2019-05-14 13:59:10 +02:00
Danielle Lancashire	c91ae21a6c	evalbroker: Flush within update lock Primarily a cleanup commit, however, currently there is a potential race condition (that I'm not sure we've ever actually hit) during a flapping SetEnabled/Disabled state where we may never correctly restart the eval broker, if it was being called from multiple routines.	2019-05-14 13:26:56 +02:00
Preetha Appan	4d3f74e161	Fix test setup to have correct jobcreateindex for deployments	2019-05-13 18:53:47 -05:00
Preetha Appan	d448750449	Lookup job only once, and fix tests	2019-05-13 18:33:41 -05:00
Preetha Appan	07690d6f9e	Add flag similar to --all for allocs to be able to filter deployments by latest	2019-05-13 18:33:41 -05:00
Jasmine Dahilig	30d346ca15	Merge pull request #5665 from hashicorp/b-empty-datacenters add non-empty string validation for datacenters	2019-05-13 10:23:26 -07:00
Mahmood Ali	cf1f3625b4	Update ugorji/go to latest Our testing so far indicates that ugorji/go/codec maintains backward compatibility with the version we are using now, for purposes of Nomad serialization. Using the latest ugorji/go allows us to get back to using the upstream library and get the optimization benefits in RPC paths (including code generation optimizations). ugorji/go introduced two significant changes: * time binary format in `debb8e2d2e`. Setting `h.BasicHandle.TimeNotBuiltin = true` restores old behavior * ugorji/go started honoring `json` tag as well: v1.1.4 is the latest but has a bug in handling RawString that's fixed in `d09a80c1e0`.	2019-05-09 19:35:58 -04:00
Mahmood Ali	919827f2df	Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base nomad exec part 1: plumbing and docker driver	2019-05-09 18:09:27 -04:00
Mahmood Ali	3c668732af	server: server forwarding logic for nomad exec endpoint	2019-05-09 16:49:08 -04:00
Jasmine Dahilig	0ba2bd15b9	add unit tests for datacenter non-empty string validation	2019-05-08 11:51:52 -07:00
Mahmood Ali	9d3f13e9b3	remove Index field from EmitNodeEventsResponse `Index` is already included as part of `WriteMeta` embedding. This is a backward compatible change: Clients never read the field, and Server references to `EmitNodeEventsResponse.Index` would be using the value in `WriteMeta`, which is consistent with other response structs.	2019-05-08 08:42:26 -04:00
Preetha	1538913a2a	Merge pull request #5628 from hashicorp/f-preemption-config Add config to disable preemption for batch/service jobs	2019-05-06 15:40:35 -05:00
Mahmood Ali	f35ad92a8b	Merge pull request #5646 from hashicorp/some-ugorji-fixes Codegen codec helpers for all nomad structs	2019-05-06 13:23:12 -04:00
Lang Martin	9f3f11df97	Merge pull request #5601 from hashicorp/b-config-parse-direct-hcl config parse direct hcl	2019-05-06 12:05:19 -04:00
Mahmood Ali	92c133b905	Update peers info with new raft config details	2019-05-03 16:55:53 -04:00
Preetha Appan	ad3c263d3f	Rename to match system scheduler config. Also added docs	2019-05-03 14:06:12 -05:00
Jasmine Dahilig	016495c368	add non-empty string validation for datacenters	2019-05-03 06:48:02 -07:00
Hemanth Basappa	3fef02aa93	Add support in nomad for supporting raft 3 protocol peers.json	2019-05-02 09:11:23 -07:00
Mahmood Ali	21d21baf8b	codegen codecs for nomad structs `ls *[!_test].go` was ignoring any file that ends with `s.go` (or any of the letter inside `[]`), including `structs.go`!	2019-05-01 12:42:55 -04:00
Lang Martin	598112a1cc	tag HCL bookkeeping keys with json:"-" to keep them out of the api	2019-04-30 10:29:14 -04:00
Lang Martin	5ebae65d1a	agent/config, config/* mapstructure tags -> hcl tags	2019-04-30 10:29:14 -04:00
Preetha Appan	6615d5c868	Add config to disable preemption for batch/service jobs	2019-04-29 18:48:07 -05:00
Lang Martin	371014b781	Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config client fingerprinter doesn't overwrite manual configuration	2019-04-26 12:55:34 -04:00
Danielle Lancashire	3409e0be89	allocs: Add nomad alloc signal command This command will be used to send a signal to either a single task within an allocation, or all of the tasks if <task-name> is omitted. If the sent signal terminates the allocation, it will be treated as if the allocation has crashed, rather than as if it was operator-terminated. Signal validation is currently handled by the driver itself and nomad does not attempt to restrict or validate them.	2019-04-25 12:43:32 +02:00
Arshneet Singh	b7b050cdd1	Change min version required for plan optimization	2019-04-24 12:36:07 -07:00
Arshneet Singh	9cc39edb67	Return error when preempted/stopped alloc doesn't exist during denormalization	2019-04-24 12:36:07 -07:00
Lang Martin	19ba0f4882	structs_test use testify require.True instead of t.Fatal	2019-04-23 17:00:11 -04:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	0dd4c109e8	Compat tags	2019-04-23 09:18:01 -07:00
Arshneet Singh	65f5fab131	Add tests for plan normalization	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Danielle	198a838b61	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	832f607433	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Lang Martin	8aa97cff13	tests over setwise equality of fingerprinted parts	2019-04-19 15:49:24 -04:00
Lang Martin	7de6e28ddc	structs need to keep assert Equal interface implementation for tests	2019-04-19 15:23:49 -04:00
Lang Martin	977d33970b	structs equals use labeled continue for clarity	2019-04-19 15:23:48 -04:00
Lang Martin	7b99488afa	struct equals use a working pattern for setwise comparison	2019-04-19 15:23:48 -04:00
Lang Martin	eba4e29440	client fingerprinter doesn't overwrite manual configuration Revert "Revert accidental merge of pr #5482" This reverts commit c45652ab8c113487b9d4fbfb107782cbcf8a85b0.	2019-04-19 15:23:48 -04:00
Preetha Appan	22109d1e20	Add preemption related fields to AllocationListStub	2019-04-18 10:36:44 -05:00
Lang Martin	a2a1e7829d	Revert accidental merge of pr #5482 Revert "fingerprint Constraints and Affinities have Equals, as set" This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609. Revert "client tests assert the independent handling of interface and speed" This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40. Revert "structs missed applying a style change from the review" This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2. Revert "client, structs comments" This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2. Revert "client fingerprint updateNetworks preserves the network configuration" This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c. Revert "client_test cleanup comments from review" This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445. Revert "client Networks Equals is set equality" This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573. Revert "struct cleanup indentation in RequestedDevice Equals" This reverts commit f4746411cab328215def6508955b160a53452da3. Revert "struct Equals checks for identity before value checking" This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226. Revert "fix client-test, avoid hardwired platform dependecy on lo0" This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7. Revert "refactor error in client fingerprint to include the offending data" This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5. Revert "add client updateNodeResources to merge but preserve manual config" This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032. Revert "refactor struts.RequestedDevice to have its own Equals" This reverts commit 689782524090e51183474516715aa2f34908b8e6. Revert "refactor structs.Resource.Networks to have its own Equals" This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1. Revert "refactor structs.Resource.Devices to have its own Equals" This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9. Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources" This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5. Revert "add structs.Resources Equals" This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b. Revert "test that fingerprint resources are updated, net not clobbered" This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.	2019-04-11 10:29:40 -04:00
Lang Martin	07ff740408	fingerprint Constraints and Affinities have Equals, as set	2019-04-11 09:56:22 -04:00
Lang Martin	8f07698c03	structs missed applying a style change from the review	2019-04-11 09:56:22 -04:00
Lang Martin	7258a13c72	client, structs comments	2019-04-11 09:56:22 -04:00
Lang Martin	1878bf694e	client Networks Equals is set equality	2019-04-11 09:56:22 -04:00
Lang Martin	e1c91afd19	struct cleanup indentation in RequestedDevice Equals	2019-04-11 09:56:22 -04:00
Lang Martin	0c90efebdc	struct Equals checks for identity before value checking	2019-04-11 09:56:22 -04:00
Lang Martin	1a594b53f6	refactor struts.RequestedDevice to have its own Equals	2019-04-11 09:56:21 -04:00
Lang Martin	ec1ccdeda0	refactor structs.Resource.Networks to have its own Equals NodeResource.Networks uses the same function	2019-04-11 09:56:21 -04:00
Lang Martin	06008465c4	refactor structs.Resource.Devices to have its own Equals	2019-04-11 09:56:21 -04:00
Lang Martin	36f3022246	add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources	2019-04-11 09:56:21 -04:00
Lang Martin	d4567e9909	add structs.Resources Equals	2019-04-11 09:56:21 -04:00
Danielle Lancashire	e135876493	allocs: Add nomad alloc restart This adds a `nomad alloc restart` command and api that allows a job operator with the alloc-lifecycle acl to perform an in-place restart of a Nomad allocation, or a given subtask.	2019-04-11 14:25:49 +02:00
Chris Baker	34e100cc96	server vault client: use two vault clients, one with namespace, one without for /sys calls	2019-04-10 10:34:10 -05:00
Michael Schurter	cc7768c170	Update nomad/structs/config/vault.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-04-10 10:34:10 -05:00
Chris Baker	a26d4fe1e5	docs: -vault-namespace, VAULT_NAMESPACE, and config agent: added VAULT_NAMESPACE env-based configuration	2019-04-10 10:34:10 -05:00
Chris Baker	d3041cdb17	wip: added config parsing support, CLI flag, still need more testing, VAULT_ var, documentation	2019-04-10 10:34:10 -05:00
Chris Baker	0eaeef872f	config/docs: added `namespace` to vault config server/client: process `namespace` config, setting on the instantiated vault client	2019-04-10 10:34:10 -05:00
Michael Schurter	c0cd96ef75	Update nomad/job_endpoint_test.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-04-10 10:34:10 -05:00
Michael Schurter	188c32421a	Update nomad/job_endpoint.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-04-10 10:34:10 -05:00
Chris Baker	0ba1600545	server/job_endpoint: accept vault token and pass as part of Job.RegisterRequest [#4555]	2019-04-10 10:34:10 -05:00
James Rasell	9470507cf4	Add NodeName to the alloc/job status outputs. Currently, when operators need to log onto a machine where an alloc is running, they must perform both an alloc/job status call and then a call to discover the node name from the node list. This updates both the job status and alloc status output to include the node name, to make operator use easier. Closes #2359 Closes #1180	2019-04-10 10:34:10 -05:00
Michael Schurter	45b4827ad7	Bump to 0.9.1-dev	2019-04-09 09:01:48 -07:00
Nomad Release bot	e307734e4a	Generate files for 0.9.0 release	2019-04-09 01:56:00 +00:00
Michael Schurter	3af602b633	Remove 0.9.0-rc2 generated files	2019-04-03 07:41:09 -07:00
Nomad Release bot	16b4336ccf	Generate files for 0.9.0-rc2 release	2019-04-03 01:54:29 +00:00
Michael Schurter	9afbc45cff	Bump to dev post-0.9.0-rc1 release	2019-03-22 08:26:30 -07:00
Nomad Release bot	3ab3dd4105	Generate files for 0.9.0-rc1 release	2019-03-21 19:06:13 +00:00
HashedDan	caad68e799	server: inconsistent receiver notation corrected Signed-off-by: HashedDan <georgedanielmangum@gmail.com>	2019-03-16 17:53:53 -05:00
Alex Dadgar	e779d9444b	Update nomad/eval_endpoint_test.go Co-Authored-By: schmichael <michael.schurter@gmail.com>	2019-03-05 15:19:15 -08:00
Alex Dadgar	1857f5d7c1	Update nomad/eval_endpoint.go Co-Authored-By: schmichael <michael.schurter@gmail.com>	2019-03-05 15:19:07 -08:00
Michael Schurter	e37bbb21a5	nomad: simplify code and improve parameter name	2019-03-04 13:44:14 -08:00
Michael Schurter	05f51499ba	nomad: compare current eval when setting WaitIndex Consider currently dequeued Evaluation's ModifyIndex when determining its WaitIndex. Normally the Evaluation itself would already be in the state store snapshot used to determine the WaitIndex. However, since the FSM applies Raft messages to the state store concurrently with Dequeueing, it's possible the currently dequeued Evaluation won't yet exist in the state store snapshot used by JobsForEval. This can be solved by always considering the current eval's modify index and using it if it is greater than all of the evals returned by the state store.	2019-03-01 15:23:39 -08:00
Michael Schurter	3f386e3951	Remove generated files for 0.9.0-beta3	2019-02-26 10:34:08 -08:00
Michael Schurter	d74755900e	Generate files for 0.9.0-beta3 release	2019-02-26 09:44:49 -08:00
Charlie Voiselle	604c49beb8	Merge pull request #5344 from hashicorp/b-nexteval-for-failed-follow-up Set NextEval when making `failed-follow-up` evals	2019-02-22 14:14:41 -08:00
Charlie Voiselle	006afdca9b	Added comments * caller should create eval id * prev/next eval used in failed-follow-up	2019-02-22 10:22:52 -08:00
Charlie Voiselle	c28c195f42	Set NextEval when making `failed-follow-up` evals This allows users to locate failed-follow-up evals more easily	2019-02-20 16:07:11 -08:00
Michael Schurter	6580ed668e	client: don't redownload completed artifacts on retries Track the download status of each artifact independently so that if only one of many artifacts fails to download, completed artifacts aren't downloaded again.	2019-02-20 08:45:12 -08:00
Michael Schurter	2db91425e3	Remove 0.9.0-beta2 generated files	2019-02-01 08:28:44 -08:00
Alex Dadgar	84d0afccae	Generate files for 0.9.0-beta2	2019-01-30 13:31:50 -08:00
Alex Dadgar	d2e5ede119	remove generated structs	2019-01-30 12:38:34 -08:00
Alex Dadgar	41265d4d61	Change types of weights on spread/affinity	2019-01-30 12:20:38 -08:00
Alex Dadgar	bc804dda2e	Nomad 0.9.0-beta1 generated code	2019-01-30 10:49:44 -08:00
Preetha Appan	c848a1d387	ensure tests run a 0.9 server	2019-01-29 16:19:45 -06:00
Preetha Appan	496eb1de0c	Guard operator endpoints for minimum server version	2019-01-29 15:50:36 -06:00
Preetha Appan	7578522f58	variable name fix	2019-01-29 13:48:45 -06:00
Preetha Appan	a6cebbbf9e	Make sure that all servers are 0.9 before applying scheduler config entry	2019-01-29 12:47:42 -06:00
Michael Schurter	3aba7ee826	nomad: fix panic when no node conn found A missing return would cause a panic when a server could find no route to a client.	2019-01-28 21:55:35 -08:00
Mahmood Ali	f9164dae67	Merge pull request #5228 from hashicorp/f-vault-err-tweaks server/vault: tweak error messages	2019-01-25 11:17:31 -05:00
Mahmood Ali	f4560d8a2a	server/vault: tweak error messages Closes #5139	2019-01-25 10:33:54 -05:00
Preetha	ec92bf673c	Merge pull request #5223 from hashicorp/f-jobs-list-datacenters Add Datacenters to the JobListStub struct	2019-01-24 08:13:30 -06:00
Michael Schurter	13f061a83f	Merge pull request #5196 from hashicorp/f-plugin-utils Make plugins/shared external and make pluginutls/	2019-01-23 06:59:32 -08:00
Michael Schurter	32daa7b47b	goimports until make check is happy	2019-01-23 06:27:14 -08:00
Michael Schurter	be0bab7c3f	move pluginutils -> helper/pluginutils I wanted a different color bikeshed, so I get to paint it	2019-01-22 15:50:08 -08:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Alex Dadgar	cdcd3c929c	loader and singleton	2019-01-22 15:11:57 -08:00
Alex Dadgar	6c2782f037	move catalog + grpcutils	2019-01-22 15:11:57 -08:00
Preetha Appan	38422642cb	Use DesiredState to determine whether to stop sending task events	2019-01-22 16:43:32 -06:00
Michael Lange	ce7bc4f56f	Add Datacenters to the JobsListStub struct So it can be used for filtering the full list of jobs	2019-01-22 11:16:35 -08:00
Mahmood Ali	e1803b685b	tests: deflake TestClientAllocations_GarbageCollect_Remote Use the same strategy as one in f2f383b07543a09ca989b82738926f7248e1ab28	2019-01-19 09:07:27 -05:00
Mahmood Ali	b2203a3a22	Merge pull request #5215 from hashicorp/test-fix-garbagecollect test: fix flaky garbage collect test	2019-01-18 21:10:01 -05:00
Mahmood Ali	05e32fb525	Merge pull request #5213 from hashicorp/b-api-separate Slimmer /api package	2019-01-18 20:52:53 -05:00
Michael Schurter	0cd35ba335	test: fix flaky garbage collect test This seems to fix TestClientAllocations_GarbageCollectAll_Remote being flaky. This test confuses me. It joins 2 servers, but then goes out of its way to make sure the test client only interacts with one. There are not enough comments for me to figure out the precise assertions this test is trying to make. A good old fashioned wait-for-the-client-to-register seems to fix the flakiness though. The error was that the node could not be found, so this makes some sense. However, lots of other tests seem to use the same "wait for node" logic and don't appear to be flaky, so who knows why waiting fixes this one. Passes with -race.	2019-01-18 16:01:30 -08:00
Mahmood Ali	7bdd43f3e0	api: avoid codegen for syncing Given that the values will rarely change, especially considering that any change would be a backward-incompatible change. As such, it's simpler to keep syncing manually in the rare occasion and avoid the syncing code overhead.	2019-01-18 18:52:31 -05:00
Preetha Appan	510d7839e4	code review comments	2019-01-18 17:41:39 -06:00
Mahmood Ali	253532ec00	api: avoid import nomad/structs pkg nomad/structs is an internal package and imports many libraries (e.g. raft, codec) that are not relevant to api clients, and may cause unnecessary dependency pain (e.g. `github.com/ugorji/go/codec` version is very old now). Here, we add a code generator that imports the relevant constants from `nomad/structs`. I considered using this approach for other structs, but didn't find a quick viable way to reduce duplication. `nomad/structs` use values as struct fields (e.g. `string`), while `api` uses value pointer (e.g. `*string`) instead. Also, sometimes, `api` structs contain deprecated fields or additional documentation, so simple copy-paste doesn't work. For these reasons, I opt to keep the status quo.	2019-01-18 14:51:19 -05:00
Preetha Appan	be9656d195	fix linting	2019-01-17 15:36:33 -06:00
Preetha Appan	0f8a113ead	Refactor to find jobs with child instances more efficiently also added unit tests	2019-01-17 14:29:48 -06:00
Preetha Appan	be36fee48e	Use IsParameterized/isPeriodic methods	2019-01-17 12:15:42 -06:00
Preetha Appan	81a8f18cac	Fix bug in reconcile summaries that affects periodic/parameterized jobs This fixes incorrect parent job summaries by recomputing them in the ReconcileJobSummaries method in the state store	2019-01-17 12:01:01 -06:00
Nick Ethier	597b7b751d	tr: add retry /w backoff to stats_hook failure	2019-01-12 12:18:24 -05:00
Mahmood Ali	4414a2ce1c	tests: remove tests for unsupported features With switching to driver plugins, driver validation is quite tricky and we need to do some design thinking before supporting it again.	2019-01-10 10:21:48 -05:00
Nick Wales	7a7b5da0df	Adds optional Consul service tags to nomad server and agent services, gh#4297	2019-01-09 22:02:46 +00:00
Mahmood Ali	1f2473263e	fix more cases of logging arity errors	2019-01-09 09:22:47 -05:00

3260 commits