Commit Graph

3604 Commits

Author SHA1 Message Date
Seth Hoenig 720780992c consul/connect: copy bind address map if empty
This parameter is now supposed to be non-nil even if
empty, and the Copy method should also maintain that
invariant.
2021-01-25 10:36:04 -06:00
Seth Hoenig 1ad219c441 consul/connect: remove debug line 2021-01-25 10:36:04 -06:00
Seth Hoenig 8b05efcf88 consul/connect: Add support for Connect terminating gateways
This PR implements Nomad built-in support for running Consul Connect
terminating gateways. Such a gateway can be used by services running
inside the service mesh to access "legacy" services running outside
the service mesh while still making use of Consul's service identity
based networking and ACL policies.

https://www.consul.io/docs/connect/gateways/terminating-gateway

These gateways are declared as part of a task group level service
definition within the connect stanza.

service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      terminating {
        // terminating-gateway configuration entry
      }
    }
  }
}

Currently Envoy is the only supported gateway implementation in
Consul. The gateay task can be customized by configuring the
connect.sidecar_task block.

When the gateway.terminating field is set, Nomad will write/update
the Configuration Entry into Consul on job submission. Because CEs
are global in scope and there may be more than one Nomad cluster
communicating with Consul, there is an assumption that any terminating
gateway defined in Nomad for a particular service will be the same
among Nomad clusters.

Gateways require Consul 1.8.0+, checked by a node constraint.

Closes #9445
2021-01-25 10:36:04 -06:00
Drew Bailey 007158ee75
ignore setting job summary when oldstatus == newstatus (#9884) 2021-01-25 10:34:27 -05:00
Drew Bailey 630babb886
prevent double job status update (#9768)
* Prevent Job Statuses from being calculated twice

https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard
2021-01-22 09:18:17 -05:00
Kris Hicks 8f9e47a8e7
Clean up Task Validation tests (#9833)
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2021-01-21 11:53:02 -08:00
Dennis Schön 3eaf1432aa
validate connect block allowed only within group.service 2021-01-20 14:34:23 -05:00
Seth Hoenig f213b8c51b consul/connect: always set gateway proxy default timeout
If the connect.proxy stanza is left unset, the connection timeout
value is not set but is assumed to be, and may cause a non-fatal NPE
on job submission.
2021-01-19 11:23:41 -06:00
Kris Hicks d71a90c8a4
Fix some errcheck errors (#9811)
* Throw away result of multierror.Append

When given a *multierror.Error, it is mutated, therefore the return
value is not needed.

* Simplify MergeMultierrorWarnings, use StringBuilder

* Hash.Write() never returns an error

* Remove error that was always nil

* Remove error from Resources.Add signature

When this was originally written it could return an error, but that was
refactored away, and callers of it as of today never handle the error.

* Throw away results of io.Copy during Bridge

* Handle errors when computing node class in test
2021-01-14 12:46:35 -08:00
Kris Hicks 39e369c3bb
csi: Return error when deleting node (#9803)
In this change we'll properly return the error in the
CSIPluginTypeMonolith case (which is the type given in DeleteNode()),
and also return the error when the given ID is not found.

This was found via errcheck.
2021-01-14 12:44:50 -08:00
Kris Hicks abb8f2ebc0
Refactor Job.Scale() (#9771) 2021-01-14 12:40:42 -08:00
Kris Hicks f77ffb3b5b
Add missing sink.Cancel() in fsm (#9818) 2021-01-14 12:39:20 -08:00
Drew Bailey 03a9541822
ignore poststop task in alloc health tracker (#9548), fixes #9361
* investigating where to ignore poststop task in alloc health tracker

* ignore poststop when setting latest start time for allocation

* clean up logic

* lifecycle: isolate mocks for poststop deployment test

* lifecycle: update comments in tracker

Co-authored-by: Jasmine Dahilig <jasmine@dahilig.com>
2021-01-12 10:03:48 -08:00
Chris Baker 3546469205 nicer error message 2021-01-08 21:13:29 +00:00
Chris Baker d43e0d10c0 appease the linter and fix an incorrect test 2021-01-08 19:38:25 +00:00
Chris Baker 49effd5840 in Job.Scale, ensure that new count is within [min,max] configured in scaling policy
resolves #9758
2021-01-08 19:24:36 +00:00
Seth Hoenig 6c9366986b consul/connect: avoid NPE from unset connect gateway proxy
Submitting a job with an ingress gateway in host networking mode
with an absent gateway.proxy block would cause the Nomad client
to panic on NPE.

The consul registration bits would assume the proxy stanza was
not nil, but it could be if the user does not supply any manually
configured envoy proxy settings.

Check the proxy field is not nil before using it.

Fixes #9669
2021-01-05 09:27:01 -06:00
Chris Baker fd6beefe11 simple test to ensure that scaling endpoint methods support IsRead for
stale read support
2021-01-05 13:42:18 +00:00
Mahmood Ali ae0be24abb tweak bootstrap testing 2021-01-04 09:00:40 -05:00
Mahmood Ali 2ea8ae7584 Only bootstrap when `bootstrap_expect` 2020-12-17 20:06:14 -05:00
Mahmood Ali 421380a300 add a failing test for unexpected bootstrapping 2020-12-17 20:06:14 -05:00
Seth Hoenig 3a3a175e1a consul/connect: enable configuring custom gateway task
Add the ability to configure the Task used for Connect gateways,
similar to how sidecar Task can be configured.

The implementation here simply re-uses the sidecar_task stanza,
and now gets applied whether connect.sidecar_service or
connect.gateway is the thing being defined. In retrospect,
connect.sidecar_task could have been more generically named
like connect.task to make it a little more re-usable.

Closes #9474
2020-12-17 08:51:52 -06:00
Seth Hoenig beaa6359d5 consul/connect: fix regression where client connect images ignored
Nomad v1.0.0 introduced a regression where the client configurations
for `connect.sidecar_image` and `connect.gateway_image` would be
ignored despite being set. This PR restores that functionality.

There was a missing layer of interpolation that needs to occur for
these parameters. Since Nomad 1.0 now supports dynamic envoy versioning
through the ${NOMAD_envoy_version} psuedo variable, we basically need
to first interpolate

  ${connect.sidecar_image} => envoyproxy/envoy:v${NOMAD_envoy_version}

then use Consul at runtime to resolve to a real image, e.g.

  envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.16.0

Of course, if the version of Consul is too old to provide an envoy
version preference, we then need to know to fallback to the old
version of envoy that we used before.

  envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09

Beyond that, we also need to continue to support jobs that set the
sidecar task themselves, e.g.

  sidecar_task { config { image: "custom/envoy" } }

which itself could include teh pseudo envoy version variable.
2020-12-14 09:47:55 -06:00
Drew Bailey 54becaab7d
Events/acl events (#9595)
* fix acl event creation

* allow way to access secretID without exposing it to stream

test that values are omitted

test event creation

test acl events

payloads are pointers

fix failing tests, do all security steps inside constructor

* increase time

* ignore empty tokens

* uncomment line

* changelog
2020-12-11 10:40:50 -05:00
Seth Hoenig 52c9dbbb91 consul/connect: set default Envoy worker threads for gateways
Applying the default --concurrency for gateways was missed before.
Set the default Envoy concurrency to 1 for connect gateways. The
same override value meta.connect.proxy_concurrency applies.
2020-12-10 10:36:29 -06:00
Kris Hicks 0cf9cae656
Apply some suggested fixes from staticcheck (#9598) 2020-12-10 07:29:18 -08:00
Seth Hoenig b3d744fea3
Merge pull request #9586 from hashicorp/f-connect-interp
consul/connect: interpolate connect block
2020-12-09 13:21:50 -06:00
Kris Hicks 0a3a748053
Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Seth Hoenig b51459a879 consul/connect: interpolate connect block
This PR enables job submitters to use interpolation in the connect
block of jobs making use of consul connect. Before, only the name of
the connect service would be interpolated, and only for a few select
identifiers related to the job itself (#6853). Now, all connect fields
can be interpolated using the full spectrum of runtime parameters.

Note that the service name is interpolated at job-submission time,
and cannot make use of values known only at runtime.

Fixes #7221
2020-12-09 09:10:00 -06:00
Kris Hicks 93155ba3da
Add gocritic to golangci-lint config (#9556) 2020-12-08 12:47:04 -08:00
Dennis Schön a9c97d9257
use os.ErrDeadlineExceeded in tests 2020-12-07 10:40:28 -05:00
Drew Bailey c0bd238eb2
fix allocation spelling error, update docs (#9527)
* fix allocation spelling error, update docs

* assign TopicACLPolicy and TopicACLToken properly
2020-12-04 12:04:58 -05:00
James Rasell fd53963afb
core: fix typo msg used when job ID/name contains a null char. 2020-12-04 09:49:31 +01:00
Drew Bailey ce85288f2f
ensure node secret ID is not included in event stream (#9510) 2020-12-03 12:27:14 -05:00
Drew Bailey 17de8ebcb1
API: Event stream use full name instead of Eval/Alloc (#9509)
* use full name for events

use evaluation and allocation instead of short name

* update api event stream package and shortnames

* update docs

* make sync; fix typo

* backwards compat not from 1.0.0-beta event stream api changes

* use api types instead of string

* rm backwards compat note that only changed between prereleases

* remove backwards incompat that only existed in prereleases
2020-12-03 11:48:18 -05:00
Seth Hoenig 3b2b083cbf
Merge pull request #9487 from hashicorp/f-connect-sidecar-concurrency
consul/connect: default envoy concurrency to 1
2020-12-01 15:51:41 -06:00
Drew Bailey f9f5fe8236
Events switch on memdb change table instead of type to prevent duplicates (#9486)
* prevent duplicate job events

when a job is updated, the job_version table is updated with a structs.Job, this caused there to be multiple job events since we are switching off the change type and not the table

* test length

* add table value to tests
2020-12-01 15:14:05 -05:00
Seth Hoenig bf857684d1 consul/connect: default envoy concurrency to 1
Previously, every Envoy Connect sidecar would spawn as many worker
threads as logical CPU cores. That is Envoy's default behavior when
`--concurrency` is not explicitly set. Nomad now sets the concurrency
flag to 1, which is sensible for the default cpu = 250 Mhz resources
allocated for sidecar proxies. The concurrency value can be configured
in Client configuration by setting `meta.connect.proxy_concurrency`.

Closes #9341
2020-12-01 13:12:45 -06:00
Drew Bailey 1f8e1aa631
pass in msgType for UpsertJob (#9475) 2020-12-01 14:00:52 -05:00
Michael Schurter ea0e1789f4
Merge pull request #9435 from hashicorp/f-allocupdate-timer
client: always wait 200ms before sending updates
2020-12-01 08:45:17 -08:00
Drew Bailey 9adca240f8
Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447)
* upsertaclpolicies

* delete acl policies msgtype

* upsert acl policies msgtype

* delete acl tokens msgtype

* acl bootstrap msgtype

wip unsubscribe on token delete

test that subscriptions are closed after an ACL token has been deleted

Start writing policyupdated test

* update test to use before/after policy

* add SubscribeWithACLCheck to run acl checks on subscribe

* update rpc endpoint to use broker acl check

* Add and use subscriptions.closeSubscriptionFunc

This fixes the issue of not being able to defer unlocking the mutex on
the event broker in the for loop.

handle acl policy updates

* rpc endpoint test for terminating acl change

* add comments

Co-authored-by: Kris Hicks <khicks@hashicorp.com>
2020-12-01 11:11:34 -05:00
Drew Bailey 70ae7ec621
return potential errors from txn.Commit (#9483) 2020-12-01 10:05:37 -05:00
Benjamin Buzbee e0acbbfcc6
Fix RPC retry logic in nomad client's rpc.go for blocking queries (#9266) 2020-11-30 15:11:10 -05:00
Drew Bailey a0b7f05a7b
Remove Managed Sinks from Nomad (#9470)
* Remove Managed Sinks from Nomad

Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.

* update comment

* changelog
2020-11-30 14:00:31 -05:00
Seth Hoenig e81e9223ef consul/connect: enable setting datacenter in connect upstream
Before, upstreams could only be defined using the default datacenter.
Now, the `datacenter` field can be set in a connect upstream definition,
informing consul of the desire for an instance of the upstream service
in the specified datacenter. The field is optional and continues to
default to the local datacenter.

Closes #8964
2020-11-30 10:38:30 -06:00
Tim Gross b2cd0da0a2
CSI: fix transaction handling in state store (#9438)
When making updates to CSI plugins, the state store methods that have open
write transactions were querying the state store using the same methods used
by the CSI RPC endpoint, but these method creates their own top-level read
transactions. During concurrent plugin updates (as happens when a plugin job
is stopped), this can cause write skew in the plugin counts.

* Refactor the CSIPlugin query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
* Refactor the CSIVolume query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
* CSI volumes need to be "denormalized" with their plugins and (optionally)
allocations. Read-only RPC endpoints should take a snapshot so that we can
make multiple state store method calls with a consistent view.
2020-11-25 11:15:57 -05:00
Michael Schurter 9bd1f267d2 nomad: try to avoid slice resizing when batching 2020-11-24 09:14:00 -08:00
Seth Hoenig 74a34704c5
Merge pull request #8743 from hashicorp/f-task_network_warning
Validate and document 0.12 mbits/network deprecations
2020-11-23 15:36:18 -06:00
Drew Bailey c8b1a84d1e
Events/mv structs (#9430)
* move structs to structs/event.go to avoid import cycle
2020-11-23 14:01:10 -05:00
Seth Hoenig a35c0db6c7 nomad/structs: validate deprecated task.resource.network port labels
Enable users to submit jobs that still make use of the deprecated
task.resources.network stanza. Such jobs can be submitted, but
will emit a warning.
2020-11-23 12:40:40 -06:00