open-nomad

Author	SHA1	Message	Date
Seth Hoenig	87be8c4c4b	consul: correctly check consul acl token namespace when using consul oss This PR fixes the Nomad Object Namespace <-> Consul ACL Token relationship check when using Consul OSS (or Consul ENT without namespace support). Nomad v1.1.0 introduced a regression where Nomad would fail the validation when submitting Connect jobs and allow_unauthenticated set to true, with Consul OSS - because it would do the namespace check against the Consul ACL token assuming the "default" namespace, which does not work because Consul OSS does not have namespaces. Instead of making the bad assumption, expand the namespace check to handle each special case explicitly. Fixes #10718	2021-06-08 13:55:57 -05:00
Seth Hoenig	209e2d6d81	consul: pr cleanup namespace probe function signatures	2021-06-07 15:41:01 -05:00
Seth Hoenig	519429a2de	consul: probe consul namespace feature before using namespace api This PR changes Nomad's wrapper around the Consul NamespaceAPI so that it will detect if the Consul Namespaces feature is enabled before making a request to the Namespaces API. Namespaces are not enabled in Consul OSS, and require a suitable license to be used with Consul ENT. Previously Nomad would check for a 404 status code when makeing a request to the Namespaces API to "detect" if Consul OSS was being used. This does not work for Consul ENT with Namespaces disabled, which returns a 500. Now we avoid requesting the namespace API altogether if Consul is detected to be the OSS sku, or if the Namespaces feature is not licensed. Since Consul can be upgraded from OSS to ENT, or a new license applied, we cache the value for 1 minute, refreshing on demand if expired. Fixes https://github.com/hashicorp/nomad-enterprise/issues/575 Note that the ticket originally describes using attributes from https://github.com/hashicorp/nomad/issues/10688. This turns out not to be possible due to a chicken-egg situation between bootstrapping the agent and setting up the consul client. Also fun: the Consul fingerprinter creates its own Consul client, because there is no [currently] no way to pass the agent's client through the fingerprint factory.	2021-06-07 12:19:25 -05:00
Seth Hoenig	d026ff1f66	consul/connect: add support for connect mesh gateways This PR implements first-class support for Nomad running Consul Connect Mesh Gateways. Mesh gateways enable services in the Connect mesh to make cross-DC connections via gateways, where each datacenter may not have full node interconnectivity. Consul docs with more information: https://www.consul.io/docs/connect/gateways/mesh-gateway The following group level service block can be used to establish a Connect mesh gateway. service { connect { gateway { mesh { // no configuration } } } } Services can make use of a mesh gateway by configuring so in their upstream blocks, e.g. service { connect { sidecar_service { proxy { upstreams { destination_name = "<service>" local_bind_port = <port> datacenter = "<datacenter>" mesh_gateway { mode = "<mode>" } } } } } } Typical use of a mesh gateway is to create a bridge between datacenters. A mesh gateway should then be configured with a service port that is mapped from a host_network configured on a WAN interface in Nomad agent config, e.g. client { host_network "public" { interface = "eth1" } } Create a port mapping in the group.network block for use by the mesh gateway service from the public host_network, e.g. network { mode = "bridge" port "mesh_wan" { host_network = "public" } } Use this port label for the service.port of the mesh gateway, e.g. service { name = "mesh-gateway" port = "mesh_wan" connect { gateway { mesh {} } } } Currently Envoy is the only supported gateway implementation in Consul. By default Nomad client will run the latest official Envoy docker image supported by the local Consul agent. The Envoy task can be customized by setting `meta.connect.gateway_image` in agent config or by setting the `connect.sidecar_task` block. Gateways require Consul 1.8.0+, enforced by the Nomad scheduler. Closes #9446	2021-06-04 08:24:49 -05:00
Mahmood Ali	102763c979	Support disabling TCP checks for connect sidecar services	2021-05-07 12:10:26 -04:00
Seth Hoenig	09cd01a5f3	e2e: add e2e tests for consul namespaces on ent with acls This PR adds e2e tests for Consul Namespaces for Nomad Enterprise with Consul ACLs enabled. Needed to add support for Consul ACL tokens with `namespace` and `namespace_prefix` blocks, which Nomad parses and validates before tossing the token. These bits will need to be picked back to OSS.	2021-04-27 14:45:54 -06:00
Seth Hoenig	509490e5d2	e2e: consul namespace tests from nomad ent (cherry-picked from ent without _ent things) This is part 2/4 of e2e tests for Consul Namespaces. Took a first pass at what the parameterized tests can look like, but only on the ENT side for this PR. Will continue to refactor in the next PRs. Also fixes 2 bugs: - Config Entries registered by Nomad Server on job registration were not getting Namespace set - Group level script checks were not getting Namespace set Those changes will need to be copied back to Nomad OSS. Nomad OSS + no ACLs (previously, needs refactor) Nomad ENT + no ACLs (this) Nomad OSS + ACLs (todo) Nomad ENT + ALCs (todo)	2021-04-19 15:35:31 -06:00
Nick Spain	653d84ef68	Add a 'body' field to the check stanza Consul allows specifying the HTTP body to send in a health check. Nomad uses Consul for health checking so this just plumbs the value through to where the Consul API is called. There is no validation that `body` is not used with an incompatible check method like GET.	2021-04-13 09:15:35 -04:00
Seth Hoenig	fe8fce00d9	consul: minor CR cleanup	2021-04-05 10:10:16 -06:00
Seth Hoenig	f17ba33f61	consul: plubming for specifying consul namespace in job/group This PR adds the common OSS changes for adding support for Consul Namespaces, which is going to be a Nomad Enterprise feature. There is no new functionality provided by this changeset and hopefully no new bugs.	2021-04-05 10:03:19 -06:00
Charlie Voiselle	0473f35003	Fixup uses of `sanity` (#10187 ) * Fixup uses of `sanity` * Remove unnecessary comments. These checks are better explained by earlier comments about the context of the test. Per @tgross, moved the tests together to better reinforce the overall shared context. * Update nomad/fsm_test.go	2021-03-16 18:05:08 -04:00
Andre Ilhicas	f45fc6c899	consul/connect: enable setting local_bind_address in upstream	2021-02-26 11:37:31 +00:00
Drew Bailey	86d9e1ff90	Merge pull request #9955 from hashicorp/on-update-services Service and Check on_update configuration option (readiness checks)	2021-02-24 10:11:05 -05:00
Seth Hoenig	d557d6bf94	consul/connect: Fix bug where connect sidecar services would be unnecessarily re-registered This PR fixes a bug where sidecar services would be re-registered into Consul every ~30 seconds, caused by the parent service having its tags field set and the sidecar_service tags unset. Nomad would directly compare the tags between its copy of the sidecar service definition and the tags of the sidecar service reported by Consul. This does not work, because Consul will under-the-hood set the sidecar service tags to inherit the parent service tags if the sidecar service tags are unset. The comparison then done by Nomad would not match, if the parent sidecar tags are set. Fixes #10025	2021-02-22 12:02:58 -06:00
AndrewChubatiuk	157f8de903	pr comments	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	cb84d8ef11	fixed typo	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	4f49614760	fixed typo	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	c4bece2b09	fixed typo	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	cb527c5d83	fixed tests	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	abb7fd1e36	fixed connect port label	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	3e11860b6d	fixed connect port label	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	cd152643fb	fixed connect port label	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	af7a16d774	tests fix	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	3d0aa2ef56	allocate sidecar task port on host_network interface	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	47c690479c	fixed tests	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	46c639c5d6	fixed tests	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	77f9b84a30	fixed tests	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	99201412da	removed proxy suffix	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	844ac16900	fixed variable initialization	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	78465bbd23	customized default sidecar checks	2021-02-13 02:42:13 +02:00
Drew Bailey	82f971f289	OnUpdate configuration for services and checks Allow for readiness type checks by configuring nomad to ignore warnings or errors reported by a service check. This allows the deployment to progress and while Consul handles introducing the sercive into a resource pool once the check passes.	2021-02-08 08:32:40 -05:00
Seth Hoenig	8b05efcf88	consul/connect: Add support for Connect terminating gateways This PR implements Nomad built-in support for running Consul Connect terminating gateways. Such a gateway can be used by services running inside the service mesh to access "legacy" services running outside the service mesh while still making use of Consul's service identity based networking and ACL policies. https://www.consul.io/docs/connect/gateways/terminating-gateway These gateways are declared as part of a task group level service definition within the connect stanza. service { connect { gateway { proxy { // envoy proxy configuration } terminating { // terminating-gateway configuration entry } } } } Currently Envoy is the only supported gateway implementation in Consul. The gateay task can be customized by configuring the connect.sidecar_task block. When the gateway.terminating field is set, Nomad will write/update the Configuration Entry into Consul on job submission. Because CEs are global in scope and there may be more than one Nomad cluster communicating with Consul, there is an assumption that any terminating gateway defined in Nomad for a particular service will be the same among Nomad clusters. Gateways require Consul 1.8.0+, checked by a node constraint. Closes #9445	2021-01-25 10:36:04 -06:00
Nick Ethier	6705f845f2	Merge pull request #9739 from hashicorp/b-alloc-netmode-ports Use port's to value when building service address under 'alloc' addr_mode	2021-01-07 09:16:27 -05:00
Nick Ethier	bb060bd46b	command/agent/consul: remove duplicated tests	2021-01-06 14:11:31 -05:00
Kris Hicks	868ba0cea5	consul: Refactor parts of UpdateWorkload (#9737 ) This removes modification of ops in methods that UpdateWorkload calls, keeping them local to UpdateWorkload. It also includes some rewrites of checkRegs for clarity.	2021-01-06 11:11:28 -08:00
Nick Ethier	ab01e19df3	command/agent/consul: use port's to value when building service address under 'alloc' addr_mode	2021-01-06 13:52:48 -05:00
Seth Hoenig	6c9366986b	consul/connect: avoid NPE from unset connect gateway proxy Submitting a job with an ingress gateway in host networking mode with an absent gateway.proxy block would cause the Nomad client to panic on NPE. The consul registration bits would assume the proxy stanza was not nil, but it could be if the user does not supply any manually configured envoy proxy settings. Check the proxy field is not nil before using it. Fixes #9669	2021-01-05 09:27:01 -06:00
Seth Hoenig	e81e9223ef	consul/connect: enable setting datacenter in connect upstream Before, upstreams could only be defined using the default datacenter. Now, the `datacenter` field can be set in a connect upstream definition, informing consul of the desire for an instance of the upstream service in the specified datacenter. The field is optional and continues to default to the local datacenter. Closes #8964	2020-11-30 10:38:30 -06:00
Seth Hoenig	b19bc6be2b	consul: prevent re-registration churn by correctly comparing sidecar tags Previously, connect sidecars would be re-registered with consul every cycle of Nomad's reconciliation loop around Consul service registrations. This is because part of the comparison used `reflect.DeepEqual` on []string objects, which returns false when one object is `[]string{}` and the other is `[]string{}(nil)`. Unforunately, this was always the case, and every Connect sidecar service would be re-registered on every iteration, which happens every 30 seconds.	2020-11-11 18:01:17 -06:00
Nick Ethier	04f5c4ee5f	ar/groupservice: remove drivernetwork (#9233 ) * ar/groupservice: remove drivernetwork * consul: allow host address_mode to accept raw port numbers * consul: fix logic for blank address	2020-11-05 15:00:22 -05:00
Nick Ethier	4903e5b114	Consul with CNI and host_network addresses (#9095 ) * consul: advertise cni and multi host interface addresses * structs: add service/check address_mode validation * ar/groupservices: fetch networkstatus at hook runtime * ar/groupservice: nil check network status getter before calling * consul: comment network status can be nil	2020-10-15 15:32:21 -04:00
Seth Hoenig	ed13e5723f	consul/connect: dynamically select envoy sidecar at runtime As newer versions of Consul are released, the minimum version of Envoy it supports as a sidecar proxy also gets bumped. Starting with the upcoming Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the default implementation of Connect sidecar proxy. This PR introduces a change such that each Nomad Client will query its local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545) and then launch the Connect sidecar proxy task using the latest supported version of Envoy. If the `SupportedProxies` API component is not available from Consul, Nomad will fallback to the old version of Envoy supported by old versions of Consul. Setting the meta configuration option `meta.connect.sidecar_image` or setting the `connect.sidecar_task` stanza will take precedence as is the current behavior for sidecar proxies. Setting the meta configuration option `meta.connect.gateway_image` will take precedence as is the current behavior for connect gateways. `meta.connect.sidecar_image` and `meta.connect.gateway_image` may make use of the special `${NOMAD_envoy_version}` variable interpolation, which resolves to the newest version of Envoy supported by the Consul agent. Addresses #8585 #7665	2020-10-13 09:14:12 -05:00
Pierre Cauchois	1efe05f516	Do not double-remove checks removed by Consul When deregistering a service, consul also deregisters the associated checks. The current state keeps track of all services and all checks separately and deregisters them in sequence, which leads, whether during syncs or shutdowns, to check deregistrations happening twice and failing the second time (generating errors in logs) This fix includes: - a fix to the sync logic that just pulls the checks after the services have been synced - a fix to the shutdown mechanism that gets an updated list of checks after deregistering the services, so that we get a cleaner check deregistration process.	2020-10-06 00:30:29 +00:00
Seth Hoenig	26e77623e5	consul/connect: fixup tests to use new consul sdk	2020-08-24 12:02:41 -05:00
Seth Hoenig	c4fa644315	consul/connect: remove envoy dns option from gateway proxy config	2020-08-24 09:11:55 -05:00
Seth Hoenig	5b072029f2	consul/connect: add initial support for ingress gateways This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza. ```hcl service { connect { gateway { proxy { // envoy proxy configuration } ingress { // ingress-gateway configuration entry } } } } ``` A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value. Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver. Aims to address #8294 and tangentially #8647	2020-08-21 16:21:54 -05:00
Seth Hoenig	fd4804bf26	consul: able to set pass/fail thresholds on consul service checks This change adds the ability to set the fields `success_before_passing` and `failures_before_critical` on Consul service check definitions. This is a feature added to Consul v1.7.0 and later. https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical Nomad doesn't do much besides pass the fields through to Consul. Fixes #6913	2020-08-10 14:08:09 -05:00
Seth Hoenig	e79b79034d	connect/native: fixup command/agent/consul/connect test cases	2020-06-24 09:05:56 -05:00
Seth Hoenig	4d71f22a11	consul/connect: add support for running connect native tasks This PR adds the capability of running Connect Native Tasks on Nomad, particularly when TLS and ACLs are enabled on Consul. The `connect` stanza now includes a `native` parameter, which can be set to the name of task that backs the Connect Native Consul service. There is a new Client configuration parameter for the `consul` stanza called `share_ssl`. Like `allow_unauthenticated` the default value is true, but recommended to be disabled in production environments. When enabled, the Nomad Client's Consul TLS information is shared with Connect Native tasks through the normal Consul environment variables. This does NOT include auth or token information. If Consul ACLs are enabled, Service Identity Tokens are automatically and injected into the Connect Native task through the CONSUL_HTTP_TOKEN environment variable. Any of the automatically set environment variables can be overridden by the Connect Native task using the `env` stanza. Fixes #6083	2020-06-22 14:07:44 -05:00
Seth Hoenig	9aa9721143	connect: fix bug where absent connect.proxy stanza needs default config In some refactoring, a bug was introduced where if the connect.proxy stanza in a submitted job was nil, the default proxy configuration would not be initialized with default values, effectively breaking Connect. connect { sidecar_service {} # should work } In contrast, by setting an empty proxy stanza, the config values would be inserted correctly. connect { sidecar_service { proxy {} # workaround } } This commit restores the original behavior, where having a proxy stanza present is not required. The unit test for this case has also been corrected.	2020-04-01 11:19:32 -06:00
Seth Hoenig	0266f056b8	connect: enable proxy.passthrough configuration Enable configuration of HTTP and gRPC endpoints which should be exposed by the Connect sidecar proxy. This changeset is the first "non-magical" pass that lays the groundwork for enabling Consul service checks for tasks running in a network namespace because they are Connect-enabled. The changes here provide for full configuration of the connect { sidecar_service { proxy { expose { paths = [{ path = <exposed endpoint> protocol = <http or grpc> local_path_port = <local endpoint port> listener_port = <inbound mesh port> }, ... ] } } } stanza. Everything from `expose` and below is new, and partially implements the precedent set by Consul: https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference Combined with a task-group level network port-mapping in the form: port "exposeExample" { to = -1 } it is now possible to "punch a hole" through the network namespace to a specific HTTP or gRPC path, with the anticipated use case of creating Consul checks on Connect enabled services. A future PR may introduce more automagic behavior, where we can do things like 1) auto-fill the 'expose.path.local_path_port' with the default value of the 'service.port' value for task-group level connect-enabled services. 2) automatically generate a port-mapping 3) enable an 'expose.checks' flag which automatically creates exposed endpoints for every compatible consul service check (http/grpc checks on connect enabled services).	2020-03-31 17:15:27 -06:00
Seth Hoenig	1ce4eb17fa	client: use consistent name for struct receiver parameter This helps reduce the number of squiggly lines in Goland.	2020-03-31 17:15:27 -06:00
Seth Hoenig	b3664c628c	Merge pull request #7524 from hashicorp/docs-consul-acl-minimums consul: annotate Consul interfaces with ACLs	2020-03-30 13:27:27 -06:00
Seth Hoenig	0a812ab689	consul: annotate Consul interfaces with ACLs	2020-03-30 10:17:28 -06:00
Mahmood Ali	b0cc23ae63	tests: deflake TestConsul_PeriodicSync	2020-03-30 07:06:47 -04:00
Jasmine Dahilig	7b3f3497ed	mock task hook coordinator in consul integration test	2020-03-21 17:52:55 -04:00
Seth Hoenig	0f99cdd0d9	Merge pull request #7192 from hashicorp/b-connect-stanza-ignore consul/connect: in-place update sidecar service registrations on changes	2020-02-21 09:24:53 -06:00
Seth Hoenig	54b5173eca	consul/connect: in-place update sidecar service registrations on changes Fix a bug where consul service definitions would not be updated if changes were made to the service in the Nomad job. Currently this only fixes the bug for cases where the fix is a matter of updating consul agent's service registration. There is related bug where destructive changes are required (see #6877) which will be fixed in another PR. The enable_tag_override configuration setting for the parent service is applied to the sidecar service. Fixes #6459	2020-02-19 13:07:04 -06:00
Mahmood Ali	98ad59b1de	update rest of consul packages	2020-02-16 16:25:04 -06:00
Seth Hoenig	0e44094d1a	client: enable configuring enable_tag_override for services Consul provides a feature of Service Definitions where the tags associated with a service can be modified through the Catalog API, overriding the value(s) configured in the agent's service configuration. To enable this feature, the flag enable_tag_override must be configured in the service definition. Previously, Nomad did not allow configuring this flag, and thus the default value of false was used. Now, it is configurable. Because Nomad itself acts as a state machine around the the service definitions of the tasks it manages, it's worth describing what happens when this feature is enabled and why. Consider the basic case where there is no Nomad, and your service is provided to consul as a boring JSON file. The ultimate source of truth for the definition of that service is the file, and is stored in the agent. Later, Consul performs "anti-entropy" which synchronizes the Catalog (stored only the leaders). Then with enable_tag_override=true, the tags field is available for "external" modification through the Catalog API (rather than directly configuring the service definition file, or using the Agent API). The important observation is that if the service definition ever changes (i.e. the file is changed & config reloaded OR the Agent API is used to modify the service), those "external" tag values are thrown away, and the new service definition is once again the source of truth. In the Nomad case, Nomad itself is the source of truth over the Agent in the same way the JSON file was the source of truth in the example above. That means any time Nomad sets a new service definition, any externally configured tags are going to be replaced. When does this happen? Only on major lifecycle events, for example when a task is modified because of an updated job spec from the 'nomad job run <existing>' command. Otherwise, Nomad's periodic re-sync's with Consul will now no longer try to restore the externally modified tag values (as long as enable_tag_override=true). Fixes #2057	2020-02-10 08:00:55 -06:00
Seth Hoenig	78a7d1e426	comments: cleanup some leftover debug comments and such	2020-01-31 19:04:35 -06:00
Seth Hoenig	8219c78667	nomad: handle SI token revocations concurrently Be able to revoke SI token accessors concurrently, and also ratelimit the requests being made to Consul for the various ACL API uses.	2020-01-31 19:04:14 -06:00
Seth Hoenig	2c7ac9a80d	nomad: fixup token policy validation	2020-01-31 19:04:08 -06:00
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to recieve RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Nick Ethier	5636203d4e	consul: fix var name from rebase	2020-01-27 14:00:19 -05:00
Nick Ethier	0ae99b3c9c	consul: fix var name from rebase	2020-01-27 12:55:52 -05:00
Nick Ethier	5cbb94e16e	consul: add support for canary meta	2020-01-27 09:53:30 -05:00
Nick Ethier	bd454a4c6f	client: improve group service stanza interpolation and check_re… (#6586 ) * client: improve group service stanza interpolation and check_restart support Interpolation can now be done on group service stanzas. Note that some task runtime specific information that was previously available when the service was registered poststart of a task is no longer available. The check_restart stanza for checks defined on group services will now properly restart the allocation upon check failures if configured.	2019-11-18 13:04:01 -05:00
Michael Schurter	9fed8d1bed	client: fix panic from 0.8 -> 0.10 upgrade makeAllocTaskServices did not do a nil check on AllocatedResources which causes a panic when upgrading directly from 0.8 to 0.10. While skipping 0.9 is not supported we intend to fix serious crashers caused by such upgrades to prevent cluster outages. I did a quick audit of the client package and everywhere else that accesses AllocatedResources appears to be properly guarded by a nil check.	2019-11-01 07:47:03 -07:00
Seth Hoenig	039fbd3f3b	connect: enable setting tags on consul connect sidecar service in jobspec (#6415 )	2019-10-17 19:25:20 +00:00
Tim Gross	cd9c23617f	client/connect: ConsulProxy LocalServicePort/Address (#6358 ) Without a `LocalServicePort`, Connect services will try to use the mapped port even when delivering traffic locally. A user can override this behavior by pinning the port value in the `service` stanza but this prevents us from using the Consul service name to reach the service. This commits configures the Consul proxy with its `LocalServicePort` and `LocalServiceAddress` fields.	2019-09-23 14:30:48 -04:00
Tim Gross	e3e30c15a9	remove resolved TODO from UpdateTTL docstring (#6336 )	2019-09-16 16:26:06 -04:00
Tim Gross	0f29dcc935	support script checks for task group services (#6197 ) In Nomad prior to Consul Connect, all Consul checks work the same except for Script checks. Because the Task being checked is running in its own container namespaces, the check is executed by Nomad in the Task's context. If the Script check passes, Nomad uses the TTL check feature of Consul to update the check status. This means in order to run a Script check, we need to know what Task to execute it in. To support Consul Connect, we need Group Services, and these need to be registered in Consul along with their checks. We could push the Service down into the Task, but this doesn't work if someone wants to associate a service with a task's ports, but do script checks in another task in the allocation. Because Nomad is handling the Script check and not Consul anyways, this moves the script check handling into the task runner so that the task runner can own the script check's configuration and lifecycle. This will allow us to pass the group service check configuration down into a task without associating the service itself with the task. When tasks are checked for script checks, we walk back through their task group to see if there are script checks associated with the task. If so, we'll spin off script check tasklets for them. The group-level service and any restart behaviors it needs are entirely encapsulated within the group service hook.	2019-09-03 15:09:04 -04:00
Evan Ercolano	fcf66918d0	Remove unused canary param from MakeTaskServiceID	2019-08-31 16:53:23 -04:00
Michael Schurter	67b7bc1e90	consul: ignore connect services when syncing Consul registers Connect services automatically, however Nomad thinks it owns them due to the _nomad prefix. Since the services are managed by Consul, Nomad needs to explicitly ignore them or otherwies they will be removed.	2019-08-30 11:53:41 -07:00
Jerome Gravel-Niquet	cbdc1978bf	Consul service meta (#6193 ) * adds meta object to service in job spec, sends it to consul * adds tests for service meta * fix tests * adds docs * better hashing for service meta, use helper for copying meta when registering service * tried to be DRY, but looks like it would be more work to use the helper function	2019-08-23 12:49:02 -04:00
Michael Schurter	59e0b67c7f	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, boostrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Michael Schurter	b008fd1724	connect: register group services with Consul Fixes #6042 Add new task group service hook for registering group services like Connect-enabled services. Does not yet support checks.	2019-08-20 12:25:10 -07:00
Michael Schurter	db4de5fae9	Merge pull request #5975 from hashicorp/b-check-watcher-deadlock consul: fix deadlock in check-based restarts	2019-07-18 13:13:40 -07:00
Michael Schurter	6d095b3b36	consul: add test for check watcher deadlock	2019-07-18 08:24:09 -07:00
Michael Schurter	826d2503e6	Update command/agent/consul/check_watcher.go Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2019-07-18 07:08:27 -07:00
Michael Schurter	5407584bc3	consul: fix deadlock in check-based restarts Fixes #5395 Alternative to #5957 Make task restarting asynchronous when handling check-based restarts. This matches the pre-0.9 behavior where TaskRunner.Restart was an asynchronous signal. The check-based restarting code was not designed to handle blocking in TaskRunner.Restart. 0.9 made it reentrant and could easily overwhelm the buffered update chan and deadlock. Many thanks to @byronwolfman for his excellent debugging, PR, and reproducer! I created this alternative as changing the functionality of TaskRunner.Restart has a much larger impact. This approach reverts to old known-good behavior and minimizes the number of places changes are made.	2019-07-17 15:22:21 -07:00
Mahmood Ali	ec7e258d71	address review feedback	2019-07-17 10:43:13 +07:00
Mahmood Ali	e07413c420	Avoid de-registering slowly restored services When a nomad client restarts/upgraded, nomad restores state from running task and starts the sync loop. If sync loop runs early, it may deregister services from Consul prematurely even when Consul has the running service as healthy. This is not ideal, as re-registering the service means potentially waiting a whole service health check interval before declaring the service healthy. We attempt to mitigate this by introducing an initialization probation period. During this time, we only deregister services and checks that were explicitly deregistered, and leave unrecognized ones alone. This serves as a grace period for restoring to complete, or for operators to restore should they recognize they restored with the wrong nomad data directory.	2019-06-14 11:15:21 -04:00
Danielle Lancashire	8112177503	consul: Include port-label in service registration It is possible to provide multiple identically named services with different port assignments in a Nomad configuration. We introduced a regression when migrating to stable service identifiers where multiple services with the same name would conflict, and the last definition would take precedence. This commit includes the port label in the stable service identifier to allow the previous behaviour where this was supported, for example providing: ```hcl service { name = "redis-cache" tags = ["global", "cache"] port = "db" check { name = "alive" type = "tcp" interval = "10s" timeout = "2s" } } service { name = "redis-cache" tags = ["global", "foo"] port = "foo" check { name = "alive" type = "tcp" port = "db" interval = "10s" timeout = "2s" } } service { name = "redis-cache" tags = ["global", "bar"] port = "bar" check { name = "alive" type = "tcp" port = "db" interval = "10s" timeout = "2s" } } ``` in a nomad task definition is now completely valid. Each service definition with the same name must still have a unique port label however.	2019-06-13 15:24:54 +02:00
Mahmood Ali	2ddc39973d	Merge pull request #5668 from hashicorp/flaky-test-20190430 fix flaky test by allowing for call invocation overhead	2019-05-13 12:33:44 -04:00
Danielle Lancashire	0da2924b2a	consul: Document example check id	2019-05-09 13:22:22 +02:00
Mahmood Ali	d405fcb093	fix flaky test by allowing for call invocation overhead	2019-05-08 18:04:37 -04:00
Danielle Lancashire	d824e00d1a	consul: Do not deregister external checks This commit causes sync to skip deregistering checks that are not managed by nomad, such as service maintenance mode checks. This is handled in the same way as service registrations - by doing a Nomad specific prefix match.	2019-05-02 16:54:18 +02:00
Danielle Lancashire	0b8e85118e	consul: Use a stable identifier for services The current implementation of Service Registration uses a hash of the nomad-internal state of a service to register it with Consul, this means that any update to the service invalidates this name and we then deregister, and recreate the service in Consul. While this behaviour slightly simplifies reasoning about service registration, this becomes problematic when we add consul health checks to a service. When the service is re-registered, so are the checks, which default to failing for at least one check period. This commit migrates us to using a stable identifier based on the allocation, task, and service identifiers, and uses the difference between the remote and local state to decide when to push updates. It uses the existing hashing mechanic to decide when UpdateTask should regenerate service registrations for providing to Sync, but this should be removable as part of a future refactor. It additionally introduces the _nomad-check- prefix for check definitions, to allow for future allowing of consul features like maintenance mode.	2019-05-02 16:54:18 +02:00
Michael Schurter	e3e1797850	consul: squelch noisy useless logs Only log when syncing actually did something.	2019-02-04 11:07:57 -08:00
Michael Schurter	fc1bb95ef8	Remove old comment; it's been fixed!	2019-01-14 09:56:53 -08:00
Mahmood Ali	916a40bb9e	move cstructs.DeviceNetwork to drivers pkg	2019-01-08 09:11:47 -05:00
Nick Ethier	82175d1328	client/drivermananger: add driver manager The driver manager is modeled after the device manager and is started by the client. It's responsible for handling driver lifecycle and reattachment state, as well as processing the incomming fingerprint and task events from each driver. The mananger exposes a method for registering event handlers for task events that is used by the task runner to update the server when a task has been updated with an event. Since driver fingerprinting has been implemented by the driver manager, it is no longer needed in the fingerprint mananger and has been removed.	2018-12-18 22:55:18 -05:00
Michael Schurter	8fa5e90095	consul: add ScriptExecutor context wrapper Since d335a82859ca2177bc6deda0c2c85b559daf2db3 ScriptExecutors now take a timeout duration instead of a context. This broke the script check removal code which used context cancelation propagation to remove script checks while they were executing. This commit adds a wrapper around ScriptExecutors that obeys context cancelation again. The only downside is that it leaks a goroutine until the underlying Exec call completes or timeouts. Since check removal is relatively rare, check timeouts usually low, and scripts usually fast, the risk of leaking a goroutine seems very small.	2018-12-03 20:26:31 -08:00
Michael Schurter	6459c19ffc	consul: fix script checks exiting after 1 run Fixes a regression caused in d335a82859ca2177bc6deda0c2c85b559daf2db3 The removal of the inner context made the remaining cancels cancel the outer context and cause script checks to exit prematurely.	2018-12-03 18:50:02 -08:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Preetha Appan	18708d3f0b	Pass service metadata "external-source" for consul UI integration	2018-11-16 11:28:56 -06:00
Mahmood Ali	c7610d8c22	mark and skip failing consul failing tests	2018-11-13 10:21:40 -05:00
Michael Schurter	6bdbfb8129	tests: get consul integration tests building	2018-11-05 12:32:05 -08:00

1 2 3 4 5 ...

325 commits