Commit graph

3767 commits

Author SHA1 Message Date
Charlie Voiselle 8afb9eb05d
Fix parameterized <-> non-parameterized job error (#10357)
The error messages are reversed from tests performed above them.  The test uses the `validateJobUpdate()` function, but ignores the text of the error message itself.
2021-04-12 09:27:04 -04:00
Tim Gross 3113cced7b CSI: ensure page slices are within bounds
Plugins could potentially ignore the `max_entries` field and return a list of
entries that is greater, so we slice the return value in the server RPC to
enforce these value. But page sizes less than the number of entries for the
external CSI ListVolumes and ListSnapshots RPCs could cause a panic, so fix
the boundary checking.
2021-04-09 14:12:38 -04:00
Lars Lehtonen 61d3c3b480 nomad/structs: fix diff 2021-04-09 08:21:46 -04:00
Tim Gross 0892d34ff9 CSI: capability block is required for volume registration 2021-04-08 13:02:24 -04:00
Tim Gross d2e479505c CSI: capability check ListVolumes at RPC for nicer error messages
The plugin stub object does not include fine-grained capability checks, which
means `nomad volume status -verbose` will return ugly and verbose error
"Unimplemented" messages from the plugin if it does not support the CSI
`ListVolumes` RPC. Return a nicer error message from our RPC handler instead.
2021-04-07 12:00:22 -04:00
Tim Gross 276633673d CSI: use AccessMode/AttachmentMode from CSIVolumeClaim
Registration of Nomad volumes previously allowed for a single volume
capability (access mode + attachment mode pair). The recent `volume create`
command requires that we pass a list of requested capabilities, but the
existing workflow for claiming volumes and attaching them on the client
assumed that the volume's single capability was correct and unchanging.

Add `AccessMode` and `AttachmentMode` to `CSIVolumeClaim`, use these fields to
set the initial claim value, and add backwards compatibility logic to handle
the existing volumes that already have claims without these fields.
2021-04-07 11:24:09 -04:00
Tim Gross dbcc2694b0 refactor: move VolumeRequest validation to Validate method 2021-04-07 11:24:09 -04:00
Tim Gross 72c07f15fb refactor: internal claim methods should be private 2021-04-07 11:24:09 -04:00
Seth Hoenig fe8fce00d9 consul: minor CR cleanup 2021-04-05 10:10:16 -06:00
Seth Hoenig f17ba33f61 consul: plubming for specifying consul namespace in job/group
This PR adds the common OSS changes for adding support for Consul Namespaces,
which is going to be a Nomad Enterprise feature. There is no new functionality
provided by this changeset and hopefully no new bugs.
2021-04-05 10:03:19 -06:00
Chris Baker 21bc48ca29 json handles were moved to a new package in #10202
this was unecessary after refactoring, so this moves them back to their
original location in package structs
2021-04-02 13:31:10 +00:00
Chris Baker 436d46bd19
Merge branch 'main' into f-node-drain-api 2021-04-01 15:22:57 -05:00
Tim Gross 0856483115 CSI: fingerprint detailed node capabilities
In order to support new node RPCs, we need to fingerprint plugin capabilities
in more detail. This changeset mirrors recent work to fingerprint controller
capabilities, but is not yet in use by any Nomad RPC.
2021-04-01 16:00:58 -04:00
Tim Gross 466b620fa4
CSI: volume snapshot 2021-04-01 11:16:52 -04:00
Tim Gross 71b9daffb9 CSI: fix misleading HTTP test
The HTTP test to create CSI volumes depends on having a controller plugin to
talk to, but the test was using a node-only plugin, which allows it to
silently ignore the missing controller.
2021-03-31 16:37:09 -04:00
Tim Gross 9fc4cf1419 CSI: fingerprint detailed controller capabilities
In order to support new controller RPCs, we need to fingerprint volume
capabilities in more detail and perform controller RPCs only when the specific
capability is present. This fixes a bug in Ceph support where the plugin can
only suport create/delete but we assume that it also supports attach/detach.
2021-03-31 16:37:09 -04:00
Tim Gross f149abfa41 CSI: volume creation/registration should not validate attachment
The CSI specification requires that we validate a list of `Capability` (access
mode + accessibility) when we create volume, but the existing volume
registration workflow incorrectly validates a single capability. The
specific capability required by a volume claim is checked at the time we make
the claim, so remove the check for `AttachmentMode`/`AcccessMode`.
2021-03-31 16:37:09 -04:00
Tim Gross aec5337862 CSI: HTTP handlers for create/delete/list 2021-03-31 16:37:09 -04:00
Tim Gross d38008176e CSI: create/delete/list volume RPCs
This commit implements the RPC handlers on the client that talk to the CSI
plugins on that client for the Create/Delete/List RPC.
2021-03-31 16:37:09 -04:00
Tim Gross 43622680fa test infrastructure for mock client RPCs (#10193)
This commit includes a new test client that allows overriding the RPC
protocols. Only the RPCs that are passed in are registered, which lets you
implement a mock RPC in the server tests. This commit includes an example of
this for the ClientCSI RPC server.
2021-03-31 16:37:09 -04:00
Mahmood Ali e24dac2f39 fixup! oversubscription: Add MemoryMaxMB to internal structs 2021-03-31 09:26:26 -04:00
Mahmood Ali 4ec2d8f5e4 fixup! oversubscription: Add MemoryMaxMB to internal structs 2021-03-31 08:52:20 -04:00
Mahmood Ali 0c2551270a oversubscription: Add MemoryMaxMB to internal structs
Start tracking a new MemoryMaxMB field that represents the maximum memory a task
may use in the client. This allows tasks to specify a memory reservation (to be
used by scheduler when placing the task) but use excess memory used on the
client if the client has any.

This commit adds the server tracking for the value, and ensures that allocations
AllocatedResource fields include the value.
2021-03-30 16:55:58 -04:00
Nick Ethier daecfa61e6
Merge pull request #10203 from hashicorp/f-cpu-cores
Reserved Cores [1/4]: Structs and scheduler implementation
2021-03-29 14:05:54 -04:00
Chris Baker 16e37b986a reworked Node.Canonicalize() to enforce invariants, fixed a broken test 2021-03-26 18:58:38 +00:00
Chris Baker 99ef00d5da reinserted/expanded fsm node.canonicalize test that was still needed 2021-03-26 17:10:39 +00:00
Chris Baker 770c9cecb5 restored Node.Sanitize() for RPC endpoints
multiple other updates from code review
2021-03-26 17:03:15 +00:00
Mahmood Ali dbc3850358
Merge pull request #10145 from hashicorp/b-periodic-init-status
periodic: always reset periodic children status
2021-03-26 09:19:08 -04:00
Mahmood Ali 5d75705edd dispatched parameterized job should clear status too 2021-03-25 15:14:21 -04:00
Mahmood Ali e643742a38 Add a test for parameterized summary counts 2021-03-25 11:27:09 -04:00
Mahmood Ali b0e048bfa4 periodic: always reset periodic children status
Fixes a bug where Nomad reports negative or incorrect running children
counts for periodic jobs.

The periodic dispatcher derives a child job without reseting the status.
If the periodic job has a `running` status, the derived job will start
as `running` status and transition to `pending`.  Since this is
unexpected transition, the counting in StateStore.setJobSummary gets out of sync and
result in negative/incorrect values.

Note that this only affects periodic jobs after a leader transition.
During the first job registration, the job is added with `pending` or
`""` status. However, after a leader transition, the new leader
repopulates the dispatcher heap with `"running"` status and triggers the
bug.
2021-03-25 11:27:09 -04:00
Chris Baker 07646316e5 some comments on the new json extensions/encoding 2021-03-23 18:18:51 +00:00
Chris Baker a43b3a8736 change to fail-safe in json encoding 2021-03-23 18:13:10 +00:00
Chris Baker cb540ed691 added tests that the API doesn't leak Node.SecretID
added more documentation on JSON encoding to the contributing guide
2021-03-23 18:09:20 +00:00
Drew Bailey 74836b95b2
configuration and oss components for licensing (#10216)
* configuration and oss components for licensing

* vendor sync
2021-03-23 09:08:14 -04:00
Mahmood Ali 73b7b98018
testing: default nomad test nodes to 1.0.0 (#10213) 2021-03-23 08:24:26 -04:00
Chris Baker a186badf35 moved JSON handlers and extension code around a bit for proper order of
initialization
2021-03-22 14:12:42 +00:00
Chris Baker 9f7bc5a575 refactor? 2021-03-22 01:49:21 +00:00
Chris Baker dd291e69f4 removed deprecated fields from Drain structs and API
node drain: use msgtype on txn so that events are emitted
wip: encoding extension to add Node.Drain field back to API responses

new approach for hiding Node.SecretID in the API, using `json` tag
documented this approach in the contributing guide
refactored the JSON handlers with extensions
modified event stream encoding to use the go-msgpack encoders with the extensions
2021-03-21 15:30:11 +00:00
Nick Ethier 648ade63ad scheduler: implement scheduling of reserved cores 2021-03-19 00:29:07 -04:00
Nick Ethier 26b200e8bd api: add new 'cores' field to task resources 2021-03-18 23:13:30 -04:00
Nick Ethier 4b2912d343 structs: add struct fields and funcs for reservable cpu cores 2021-03-18 22:49:06 -04:00
Tim Gross fa25e048b2
CSI: unique volume per allocation
Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.

Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
2021-03-18 15:35:11 -04:00
Tim Gross 9b2b580d1a
CSI: remove prefix matching from CSIVolumeByID and fix CLI prefix matching (#10158)
Callers of `CSIVolumeByID` are generally assuming they should receive a single
volume. This potentially results in feasibility checking being performed
against the wrong volume if a volume's ID is a prefix substring of other
volume (for example: "test" and "testing").

Removing the incorrect prefix matching from `CSIVolumeByID` breaks prefix
matching in the command line client. Add the required elements for prefix
matching to the commands and API.
2021-03-18 14:32:40 -04:00
Charlie Voiselle 0473f35003
Fixup uses of sanity (#10187)
* Fixup uses of `sanity`
* Remove unnecessary comments.

These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.

* Update nomad/fsm_test.go
2021-03-16 18:05:08 -04:00
Mahmood Ali 1d48433356
server: handle invalid jobs in expose handler hook (#10154)
The expose handler hook must handle if the submitted job is invalid. Without this validation, the rpc handler panics on invalid input.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-03-10 09:12:46 -05:00
Tim Gross 7010a344d6 one-time token: never return expired tokens 2021-03-10 08:17:56 -05:00
Tim Gross 97b0e26d1f RPC endpoints to support 'nomad ui -login'
RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and
`ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`).
Includes adding expiration to the periodic core GC job.
2021-03-10 08:17:56 -05:00
Tim Gross 6b2c4b56d0 state store updates for one-time tokens
The `OneTimeToken` struct is to support the `nomad ui -login` command. This
changeset adds the struct to the Nomad state store.
2021-03-10 08:17:56 -05:00
Andre Ilhicas f45fc6c899
consul/connect: enable setting local_bind_address in upstream 2021-02-26 11:37:31 +00:00
Drew Bailey 86d9e1ff90
Merge pull request #9955 from hashicorp/on-update-services
Service and Check on_update configuration option (readiness checks)
2021-02-24 10:11:05 -05:00
Tim Gross b764f52ab9
deploymentwatcher: reset progress deadline on promotion (#10042)
In a deployment with two groups (ex. A and B), if group A's canary becomes
healthy before group B's, the deadline for the overall deployment will be set
to that of group A. When the deployment is promoted, if group A is done it
will not contribute to the next deadline cutoff. Group B's old deadline will
be used instead, which will be in the past and immediately trigger a
deployment progress failure. Reset the progress deadline when the job is
promotion to avoid this bug, and to better conform with implicit user
expectations around how the progress deadline should interact with promotions.
2021-02-22 16:44:03 -05:00
James Rasell 6553cc3da7
drainer: fix error message when handling drain deadlined nodes. 2021-02-18 11:45:44 +01:00
AndrewChubatiuk cd152643fb fixed connect port label 2021-02-13 02:42:14 +02:00
AndrewChubatiuk 3d0aa2ef56 allocate sidecar task port on host_network interface 2021-02-13 02:42:13 +02:00
Nick Ethier fcc1f4c805
Merge pull request #9946 from hashicorp/b-9477
structs: namespace port validation by host_network
2021-02-11 12:53:28 -05:00
Seth Hoenig 45e0e70a50 consul/connect: enable custom sidecars to use expose checks
This PR enables jobs configured with a custom sidecar_task to make
use of the `service.expose` feature for creating checks on services
in the service mesh. Before we would check that sidecar_task had not
been set (indicating that something other than envoy may be in use,
which would not support envoy's expose feature). However Consul has
not added support for anything other than envoy and probably never
will, so having the restriction in place seems like an unnecessary
hindrance. If Consul ever does support something other than Envoy,
they will likely find a way to provide the expose feature anyway.

Fixes #9854
2021-02-09 10:49:37 -06:00
Drew Bailey 8507d54e3b
e2e test for on_update service checks
check_restart not compatible with on_update=ignore

reword caveat
2021-02-08 08:32:40 -05:00
Drew Bailey 82f971f289
OnUpdate configuration for services and checks
Allow for readiness type checks by configuring nomad to ignore warnings
or errors reported by a service check. This allows the deployment to
progress and while Consul handles introducing the sercive into a
resource pool once the check passes.
2021-02-08 08:32:40 -05:00
Nick Ethier eacc4da499
Merge branch 'master' into b-9477 2021-02-05 11:58:13 -05:00
Tim Gross eb3dd17fb2 volumes: implement plan diff for volume requests
The details of host volume and CSI volume requests do not show up in `nomad
plan` outputs, although the updates are detected by the scheduler and result
in an update as expected.
2021-02-04 16:55:17 -05:00
Chris Baker ebbb760ec4 support for scaling_policy in global prefix search 2021-02-03 19:26:57 +00:00
Nick Ethier 43a4d72fda structs: namespace port validation by host_network 2021-02-02 14:56:52 -05:00
Seth Hoenig 720780992c consul/connect: copy bind address map if empty
This parameter is now supposed to be non-nil even if
empty, and the Copy method should also maintain that
invariant.
2021-01-25 10:36:04 -06:00
Seth Hoenig 1ad219c441 consul/connect: remove debug line 2021-01-25 10:36:04 -06:00
Seth Hoenig 8b05efcf88 consul/connect: Add support for Connect terminating gateways
This PR implements Nomad built-in support for running Consul Connect
terminating gateways. Such a gateway can be used by services running
inside the service mesh to access "legacy" services running outside
the service mesh while still making use of Consul's service identity
based networking and ACL policies.

https://www.consul.io/docs/connect/gateways/terminating-gateway

These gateways are declared as part of a task group level service
definition within the connect stanza.

service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      terminating {
        // terminating-gateway configuration entry
      }
    }
  }
}

Currently Envoy is the only supported gateway implementation in
Consul. The gateay task can be customized by configuring the
connect.sidecar_task block.

When the gateway.terminating field is set, Nomad will write/update
the Configuration Entry into Consul on job submission. Because CEs
are global in scope and there may be more than one Nomad cluster
communicating with Consul, there is an assumption that any terminating
gateway defined in Nomad for a particular service will be the same
among Nomad clusters.

Gateways require Consul 1.8.0+, checked by a node constraint.

Closes #9445
2021-01-25 10:36:04 -06:00
Drew Bailey 007158ee75
ignore setting job summary when oldstatus == newstatus (#9884) 2021-01-25 10:34:27 -05:00
Drew Bailey 630babb886
prevent double job status update (#9768)
* Prevent Job Statuses from being calculated twice

https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard
2021-01-22 09:18:17 -05:00
Kris Hicks 8f9e47a8e7
Clean up Task Validation tests (#9833)
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2021-01-21 11:53:02 -08:00
Dennis Schön 3eaf1432aa
validate connect block allowed only within group.service 2021-01-20 14:34:23 -05:00
Seth Hoenig f213b8c51b consul/connect: always set gateway proxy default timeout
If the connect.proxy stanza is left unset, the connection timeout
value is not set but is assumed to be, and may cause a non-fatal NPE
on job submission.
2021-01-19 11:23:41 -06:00
Kris Hicks d71a90c8a4
Fix some errcheck errors (#9811)
* Throw away result of multierror.Append

When given a *multierror.Error, it is mutated, therefore the return
value is not needed.

* Simplify MergeMultierrorWarnings, use StringBuilder

* Hash.Write() never returns an error

* Remove error that was always nil

* Remove error from Resources.Add signature

When this was originally written it could return an error, but that was
refactored away, and callers of it as of today never handle the error.

* Throw away results of io.Copy during Bridge

* Handle errors when computing node class in test
2021-01-14 12:46:35 -08:00
Kris Hicks 39e369c3bb
csi: Return error when deleting node (#9803)
In this change we'll properly return the error in the
CSIPluginTypeMonolith case (which is the type given in DeleteNode()),
and also return the error when the given ID is not found.

This was found via errcheck.
2021-01-14 12:44:50 -08:00
Kris Hicks abb8f2ebc0
Refactor Job.Scale() (#9771) 2021-01-14 12:40:42 -08:00
Kris Hicks f77ffb3b5b
Add missing sink.Cancel() in fsm (#9818) 2021-01-14 12:39:20 -08:00
Drew Bailey 03a9541822
ignore poststop task in alloc health tracker (#9548), fixes #9361
* investigating where to ignore poststop task in alloc health tracker

* ignore poststop when setting latest start time for allocation

* clean up logic

* lifecycle: isolate mocks for poststop deployment test

* lifecycle: update comments in tracker

Co-authored-by: Jasmine Dahilig <jasmine@dahilig.com>
2021-01-12 10:03:48 -08:00
Chris Baker 3546469205 nicer error message 2021-01-08 21:13:29 +00:00
Chris Baker d43e0d10c0 appease the linter and fix an incorrect test 2021-01-08 19:38:25 +00:00
Chris Baker 49effd5840 in Job.Scale, ensure that new count is within [min,max] configured in scaling policy
resolves #9758
2021-01-08 19:24:36 +00:00
Seth Hoenig 6c9366986b consul/connect: avoid NPE from unset connect gateway proxy
Submitting a job with an ingress gateway in host networking mode
with an absent gateway.proxy block would cause the Nomad client
to panic on NPE.

The consul registration bits would assume the proxy stanza was
not nil, but it could be if the user does not supply any manually
configured envoy proxy settings.

Check the proxy field is not nil before using it.

Fixes #9669
2021-01-05 09:27:01 -06:00
Chris Baker fd6beefe11 simple test to ensure that scaling endpoint methods support IsRead for
stale read support
2021-01-05 13:42:18 +00:00
Mahmood Ali ae0be24abb tweak bootstrap testing 2021-01-04 09:00:40 -05:00
Mahmood Ali 2ea8ae7584 Only bootstrap when bootstrap_expect 2020-12-17 20:06:14 -05:00
Mahmood Ali 421380a300 add a failing test for unexpected bootstrapping 2020-12-17 20:06:14 -05:00
Seth Hoenig 3a3a175e1a consul/connect: enable configuring custom gateway task
Add the ability to configure the Task used for Connect gateways,
similar to how sidecar Task can be configured.

The implementation here simply re-uses the sidecar_task stanza,
and now gets applied whether connect.sidecar_service or
connect.gateway is the thing being defined. In retrospect,
connect.sidecar_task could have been more generically named
like connect.task to make it a little more re-usable.

Closes #9474
2020-12-17 08:51:52 -06:00
Seth Hoenig beaa6359d5 consul/connect: fix regression where client connect images ignored
Nomad v1.0.0 introduced a regression where the client configurations
for `connect.sidecar_image` and `connect.gateway_image` would be
ignored despite being set. This PR restores that functionality.

There was a missing layer of interpolation that needs to occur for
these parameters. Since Nomad 1.0 now supports dynamic envoy versioning
through the ${NOMAD_envoy_version} psuedo variable, we basically need
to first interpolate

  ${connect.sidecar_image} => envoyproxy/envoy:v${NOMAD_envoy_version}

then use Consul at runtime to resolve to a real image, e.g.

  envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.16.0

Of course, if the version of Consul is too old to provide an envoy
version preference, we then need to know to fallback to the old
version of envoy that we used before.

  envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09

Beyond that, we also need to continue to support jobs that set the
sidecar task themselves, e.g.

  sidecar_task { config { image: "custom/envoy" } }

which itself could include teh pseudo envoy version variable.
2020-12-14 09:47:55 -06:00
Drew Bailey 54becaab7d
Events/acl events (#9595)
* fix acl event creation

* allow way to access secretID without exposing it to stream

test that values are omitted

test event creation

test acl events

payloads are pointers

fix failing tests, do all security steps inside constructor

* increase time

* ignore empty tokens

* uncomment line

* changelog
2020-12-11 10:40:50 -05:00
Seth Hoenig 52c9dbbb91 consul/connect: set default Envoy worker threads for gateways
Applying the default --concurrency for gateways was missed before.
Set the default Envoy concurrency to 1 for connect gateways. The
same override value meta.connect.proxy_concurrency applies.
2020-12-10 10:36:29 -06:00
Kris Hicks 0cf9cae656
Apply some suggested fixes from staticcheck (#9598) 2020-12-10 07:29:18 -08:00
Seth Hoenig b3d744fea3
Merge pull request #9586 from hashicorp/f-connect-interp
consul/connect: interpolate connect block
2020-12-09 13:21:50 -06:00
Kris Hicks 0a3a748053
Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Seth Hoenig b51459a879 consul/connect: interpolate connect block
This PR enables job submitters to use interpolation in the connect
block of jobs making use of consul connect. Before, only the name of
the connect service would be interpolated, and only for a few select
identifiers related to the job itself (#6853). Now, all connect fields
can be interpolated using the full spectrum of runtime parameters.

Note that the service name is interpolated at job-submission time,
and cannot make use of values known only at runtime.

Fixes #7221
2020-12-09 09:10:00 -06:00
Kris Hicks 93155ba3da
Add gocritic to golangci-lint config (#9556) 2020-12-08 12:47:04 -08:00
Dennis Schön a9c97d9257
use os.ErrDeadlineExceeded in tests 2020-12-07 10:40:28 -05:00
Drew Bailey c0bd238eb2
fix allocation spelling error, update docs (#9527)
* fix allocation spelling error, update docs

* assign TopicACLPolicy and TopicACLToken properly
2020-12-04 12:04:58 -05:00
James Rasell fd53963afb
core: fix typo msg used when job ID/name contains a null char. 2020-12-04 09:49:31 +01:00
Drew Bailey ce85288f2f
ensure node secret ID is not included in event stream (#9510) 2020-12-03 12:27:14 -05:00
Drew Bailey 17de8ebcb1
API: Event stream use full name instead of Eval/Alloc (#9509)
* use full name for events

use evaluation and allocation instead of short name

* update api event stream package and shortnames

* update docs

* make sync; fix typo

* backwards compat not from 1.0.0-beta event stream api changes

* use api types instead of string

* rm backwards compat note that only changed between prereleases

* remove backwards incompat that only existed in prereleases
2020-12-03 11:48:18 -05:00
Seth Hoenig 3b2b083cbf
Merge pull request #9487 from hashicorp/f-connect-sidecar-concurrency
consul/connect: default envoy concurrency to 1
2020-12-01 15:51:41 -06:00
Drew Bailey f9f5fe8236
Events switch on memdb change table instead of type to prevent duplicates (#9486)
* prevent duplicate job events

when a job is updated, the job_version table is updated with a structs.Job, this caused there to be multiple job events since we are switching off the change type and not the table

* test length

* add table value to tests
2020-12-01 15:14:05 -05:00
Seth Hoenig bf857684d1 consul/connect: default envoy concurrency to 1
Previously, every Envoy Connect sidecar would spawn as many worker
threads as logical CPU cores. That is Envoy's default behavior when
`--concurrency` is not explicitly set. Nomad now sets the concurrency
flag to 1, which is sensible for the default cpu = 250 Mhz resources
allocated for sidecar proxies. The concurrency value can be configured
in Client configuration by setting `meta.connect.proxy_concurrency`.

Closes #9341
2020-12-01 13:12:45 -06:00
Drew Bailey 1f8e1aa631
pass in msgType for UpsertJob (#9475) 2020-12-01 14:00:52 -05:00
Michael Schurter ea0e1789f4
Merge pull request #9435 from hashicorp/f-allocupdate-timer
client: always wait 200ms before sending updates
2020-12-01 08:45:17 -08:00
Drew Bailey 9adca240f8
Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447)
* upsertaclpolicies

* delete acl policies msgtype

* upsert acl policies msgtype

* delete acl tokens msgtype

* acl bootstrap msgtype

wip unsubscribe on token delete

test that subscriptions are closed after an ACL token has been deleted

Start writing policyupdated test

* update test to use before/after policy

* add SubscribeWithACLCheck to run acl checks on subscribe

* update rpc endpoint to use broker acl check

* Add and use subscriptions.closeSubscriptionFunc

This fixes the issue of not being able to defer unlocking the mutex on
the event broker in the for loop.

handle acl policy updates

* rpc endpoint test for terminating acl change

* add comments

Co-authored-by: Kris Hicks <khicks@hashicorp.com>
2020-12-01 11:11:34 -05:00
Drew Bailey 70ae7ec621
return potential errors from txn.Commit (#9483) 2020-12-01 10:05:37 -05:00
Benjamin Buzbee e0acbbfcc6
Fix RPC retry logic in nomad client's rpc.go for blocking queries (#9266) 2020-11-30 15:11:10 -05:00
Drew Bailey a0b7f05a7b
Remove Managed Sinks from Nomad (#9470)
* Remove Managed Sinks from Nomad

Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.

* update comment

* changelog
2020-11-30 14:00:31 -05:00
Seth Hoenig e81e9223ef consul/connect: enable setting datacenter in connect upstream
Before, upstreams could only be defined using the default datacenter.
Now, the `datacenter` field can be set in a connect upstream definition,
informing consul of the desire for an instance of the upstream service
in the specified datacenter. The field is optional and continues to
default to the local datacenter.

Closes #8964
2020-11-30 10:38:30 -06:00
Tim Gross b2cd0da0a2
CSI: fix transaction handling in state store (#9438)
When making updates to CSI plugins, the state store methods that have open
write transactions were querying the state store using the same methods used
by the CSI RPC endpoint, but these method creates their own top-level read
transactions. During concurrent plugin updates (as happens when a plugin job
is stopped), this can cause write skew in the plugin counts.

* Refactor the CSIPlugin query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
* Refactor the CSIVolume query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
* CSI volumes need to be "denormalized" with their plugins and (optionally)
allocations. Read-only RPC endpoints should take a snapshot so that we can
make multiple state store method calls with a consistent view.
2020-11-25 11:15:57 -05:00
Michael Schurter 9bd1f267d2 nomad: try to avoid slice resizing when batching 2020-11-24 09:14:00 -08:00
Seth Hoenig 74a34704c5
Merge pull request #8743 from hashicorp/f-task_network_warning
Validate and document 0.12 mbits/network deprecations
2020-11-23 15:36:18 -06:00
Drew Bailey c8b1a84d1e
Events/mv structs (#9430)
* move structs to structs/event.go to avoid import cycle
2020-11-23 14:01:10 -05:00
Seth Hoenig a35c0db6c7 nomad/structs: validate deprecated task.resource.network port labels
Enable users to submit jobs that still make use of the deprecated
task.resources.network stanza. Such jobs can be submitted, but
will emit a warning.
2020-11-23 12:40:40 -06:00
Nick Ethier f1ea79f5a8 remove references to default mbits 2020-11-23 10:32:13 -06:00
Nick Ethier d21cbeb30f command: remove task network usage from init examples 2020-11-23 10:25:11 -06:00
Nick Ethier 9471892df4 mock: add default host network 2020-11-23 10:11:00 -06:00
Nick Ethier 7266376ae6 nomad: update validate to check group networks for task port usage 2020-11-23 10:11:00 -06:00
Nick Ethier c4ddb0a43a website: add mbits and network deprecation notice 2020-11-23 10:09:36 -06:00
Tim Gross c320c1ba57
CSI: fix struct copying errors (#9239)
The CSIVolume struct "denormalizes" allocations when it's first queried from
the state store. The CSIVolumeByID method on the state store copies the volume
before denormalizing so that we don't end up with unexpected changes. The
copying has some subtle bugs that meant that Allocations (as well as
Topologies and MountOptions) were not getting copied when expected.

Also, ensure we never write allocations attached to volumes to the state store
during claims.
2020-11-18 10:59:25 -05:00
Seth Hoenig 4cc3c01d5b
Merge pull request #9352 from hashicorp/f-artifact-headers
jobspec: add support for headers in artifact stanza
2020-11-13 14:04:27 -06:00
Seth Hoenig bb8a5816a0 jobspec: add support for headers in artifact stanza
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.

The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.

Closes #9306
2020-11-13 12:03:54 -06:00
Lars Lehtonen 60936f554c
nomad/structs: fix noop breaks (#9348) 2020-11-13 08:28:11 -05:00
Jasmine Dahilig d6110cbed4
lifecycle: add poststop hook (#8194) 2020-11-12 08:01:42 -08:00
Nick Ethier 5e1634eda1
structs: canonicalize allocatedtaskresources to populate shared ports (#9309) 2020-11-11 16:21:47 -05:00
Tim Gross 60874ebe25
csi: Postrun hook should not change mode (#9323)
The unpublish workflow requires that we know the mode (RW vs RO) if we want to
unpublish the node. Update the hook and the Unpublish RPC so that we mark the
claim for release in a new state but leave the mode alone. This fixes a bug
where RO claims were failing node unpublish.

The core job GC doesn't know the mode, but we don't need it for that workflow,
so add a mode specifically for GC; the volumewatcher uses this as a sentinel
to check whether claims (with their specific RW vs RO modes) need to be claimed.
2020-11-11 13:06:30 -05:00
Chris Baker fbe3670b74
Merge pull request #9317 from hashicorp/f-recommendations-cli-autocomplete
recommendations CLI: autocomplete support
2020-11-11 12:04:23 -06:00
Chris Baker e3c0ea654d auto-complete for recommendations CLI, plus OSS components of recommendations prefix search 2020-11-11 11:13:43 +00:00
Chris Baker 53aa5e75c9 fix #9227: use both job and type query on scaling policy list endpoint 2020-11-10 23:26:35 +00:00
Luiz Aoqui ea81ac5d3d
Merge pull request #9296 from hashicorp/b-remove-namespace-from-scale-request
Remove Namespace field from JobScaleRequest
2020-11-09 15:13:33 -05:00
Luiz Aoqui c536286c7a
remove Namespace field from JobScaleRequest 2020-11-09 13:02:05 -05:00
Kris Hicks 0b590a5040
events: Use single eventsFromChanges func (#9281) 2020-11-05 13:06:52 -08:00
Chris Baker b2a4f64b65
Merge pull request #9278 from hashicorp/b-9268-all-namespace-allocs-acl
fix ACL bugs in listing allocs across all namespaces
2020-11-05 14:59:47 -06:00
Kris Hicks bcb460c36e
Fix handling of deleted change (#9280) 2020-11-05 11:06:41 -08:00
Chris Baker be32fb7d3c updated Allocation.List to properly handle ACL checking for namespace=* 2020-11-05 17:26:33 +00:00
Kris Hicks 20f5fa7f99
Refactor GenericEventsFromChanges (#9279) 2020-11-05 09:06:08 -08:00
Chris Baker 6743803e5c documenting test for #9268 2020-11-05 16:19:55 +00:00
Mahmood Ali be11f735c2
Merge pull request #9205 from hashicorp/b-gh-7703
Repurpose dispatch-job capability to dispatch periodic jobs
2020-11-02 13:11:58 -05:00
Drew Bailey d62d8a8587
Event sink manager improvements (#9206)
* Improve managed sink run loop and reloading

resetCh no longer needed

length of buffer equal to count of items, not count of events in each item

update equality fn name, pr feedback

clean up sink manager sink creation

* update test to reflect changes

* bad editor find and replace

* pr feedback
2020-11-02 09:21:32 -05:00
Kris Hicks a98a8253d8
Update subscription filter func (#9232)
This adds support for specifying a global topic match for a specific
key.
2020-10-30 10:07:38 -07:00
Chris Baker 719077a26d added new policy capabilities for recommendations API
state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling polices in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type
2020-10-28 14:32:16 +00:00
Mahmood Ali 320239264f dispatch-job capability to dispatch periodic jobs 2020-10-27 16:33:01 -04:00
Drew Bailey 86080e25a9
Send events to EventSinks (#9171)
* Process to send events to configured sinks

This PR adds a SinkManager to a server which is responsible for managing
managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the eventstore are
made the sinkmanager and managed sink are responsible for reloading or
starting a new managed sink.

* periodically check in sink progress to raft

Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change

dereference eventsink so we can accurately use the watchch

When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger
2020-10-26 17:27:54 -04:00
Drew Bailey 1ae39a9ed9
event sink crud operation api (#9155)
* network sink rpc/api plumbing

state store methods and restore

upsert sink test

get sink

delete sink

event sink list and tests

go generate new msg types

validate sink on upsert

* go generate
2020-10-23 14:23:00 -04:00
Michael Schurter c2dd9bc996 core: open source namespaces 2020-10-22 15:26:32 -07:00
Drew Bailey f3dcefe5a9
remove event durability (#9147)
* remove event durability

temporarily removing go-memdb event durability until a new strategy is developed on how to best handled increased durability needs

* drop events table schema and state store methods

* fix neweventbuffer invocations
2020-10-22 12:21:03 -04:00
Tim Gross 8459f1ead5
csi: prevent in-use plugin GC from blocking volume GC (#9141)
During CSI plugin GC, we don't return an error if the volume is in use,
because this is not an error condition. If we were to return an error during a
`nomad system gc`, we would not continue on to GC volumes.

But check for the specific error message fails if the GC is performed on a
worker rather than on the leader, due to RPC forwarding wrapping the error
message. Use a less specific test so that we don't return an error.
2020-10-21 16:54:28 -04:00
Alexander Shtuchkin 90fd8bb85f
Implement 'batch mode' for persisting allocations on the client. (#9093)
Fixes #9047, see problem details there.

As a solution, we use BoltDB's 'Batch' mode that combines multiple
parallel writes into small number of transactions. See
https://github.com/boltdb/bolt#batch-read-write-transactions for
more information.
2020-10-20 16:15:37 -04:00
Drew Bailey 8451de99b2
adds two base event stream e2e tests (#9126)
* adds two base event stream e2e tests

test evaluation filter keys are included

* Apply suggestions from code review

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* gc aftereach

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-10-20 08:26:21 -04:00
Drew Bailey 6c788fdccd
Events/msgtype cleanup (#9117)
* use msgtype in upsert node

adds message type to signature for upsert node, update tests, remove placeholder method

* UpsertAllocs msg type test setup

* use upsertallocs with msg type in signature

update test usage of delete node

delete placeholder msgtype method

* add msgtype to upsert evals signature, update test call sites with test setup msg type

handle snapshot upsert eval outside of FSM and ignore eval event

remove placeholder upsertevalsmsgtype

handle job plan rpc and prevent event creation for plan

msgtype cleanup upsertnodeevents

updatenodedrain msgtype

msg type 0 is a node registration event, so set the default  to the ignore type

* fix named import

* fix signature ordering on upsertnode to match
2020-10-19 09:30:15 -04:00
Drew Bailey c57e760933
remove special node drain event type
rely on standardized events instead of special node drain event
2020-10-15 16:44:36 -04:00