Commit Graph

735 Commits

Author SHA1 Message Date
Danielle Lancashire 78b7784f2b api: Include CSI metadata on nodes 2020-03-23 13:58:29 -04:00
Danielle Lancashire 426c26d7c0 CSI Plugin Registration (#6555)
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:

* A `csi_plugin` stanza as part of a Nomad task configuration, to
  allow a task to expose that it is a plugin.

* A new task runner hook: `csi_plugin_supervisor`. This hook does two
  things. When the `csi_plugin` stanza is detected, it will
  automatically configure the plugin task to receive bidirectional
  mounts to the CSI intermediary directory. At runtime, it will then
  perform an initial heartbeat of the plugin and handle submitting it to
  the new `dynamicplugins.Registry` for further use by the client, and
  then run a lightweight heartbeat loop that will emit task events
  when health changes.

* The `dynamicplugins.Registry` for handling plugins that run
  as Nomad tasks, in contrast to the existing catalog that requires
  `go-plugin` type plugins and to know the plugin configuration in
  advance.

* The `csimanager` which fingerprints CSI plugins, in a similar way to
  `drivermanager` and `devicemanager`. It currently only fingerprints
  the NodeID from the plugin, and assumes that all plugins are
  monolithic.

Missing features

* We do not use the live updates of the `dynamicplugin` registry in
  the `csimanager` yet.

* We do not deregister the plugins from the client when they shutdown
  yet, they just become indefinitely marked as unhealthy. This is
  deliberate until we figure out how we should manage deploying new
  versions of plugins/transitioning them.
2020-03-23 13:58:28 -04:00
Jasmine Dahilig 73a64e4397 change jobspec lifecycle stanza to use sidecar attribute instead of
block_until status
2020-03-21 17:52:57 -04:00
Jasmine Dahilig 1485b342e2 remove deadline code for now 2020-03-21 17:52:56 -04:00
Jasmine Dahilig b69e8e3a42 remove api package dependency on structs package 2020-03-21 17:52:55 -04:00
Jasmine Dahilig 7064deaafb put lifecycle nil and empty checks in api Canonicalize 2020-03-21 17:52:50 -04:00
Jasmine Dahilig 39b5eb245c remove api dependency on structs package, copy lifecycle defaults to api package 2020-03-21 17:52:49 -04:00
Jasmine Dahilig f6e58d6dad add canonicalize in the right place 2020-03-21 17:52:41 -04:00
Jasmine Dahilig fc13fa9739 change TaskLifecycle RunLevel to Hook and add Deadline time duration 2020-03-21 17:52:37 -04:00
Mahmood Ali 3b5786ddb3 add lifecycle to api and parser 2020-03-21 17:52:36 -04:00
James Rasell 8a5acf7fd5
Merge pull request #5970 from jrasell/bug-gh-5506
Fix returned EOF error when calling Nodes GC/GcAlloc API
2020-03-12 10:04:17 +01:00
Michael Schurter 2dcc85bed1 jobspec: fixup vault_grace deprecation
Followup to #7170

- Moved canonicalization of VaultGrace back into `api/` package.
- Fixed tests.
- Made docs styling consistent.
2020-03-10 14:58:49 -07:00
Michael Schurter b72b3e765c
Merge pull request #7170 from fredrikhgrelland/consul_template_upgrade
Update consul-template to v0.24.1 and remove deprecated vault grace
2020-03-10 14:15:47 -07:00
Michael Schurter 452b6f004f
Merge pull request #7231 from hashicorp/b-alloc-dev-panic
api: fix panic when displaying devices w/o stat
2020-03-09 07:34:59 -07:00
Mahmood Ali 37e0598344 api: alloc exec recovers from bad client connection
If alloc exec fails to connect to the nomad client associated with the
alloc, fail over to using a server.

The code attempted to special case `net.Error` for failover to rule out
other permanent non-networking errors, by reusing a pattern in the
logging handling.

But this pattern does not apply here.  `net/http.Http` wraps all errors
as `*url.Error` that is net.Error.  The websocket doesn't, and instead
returns the raw error.  If the raw error isn't a `net.Error`, like in
the case of TLS handshake errors, the api package would fail immediately
rather than failover.
2020-03-04 17:43:00 -05:00
Michael Schurter ac3db90497 api: fix panic when displaying devices w/o stat
"<none>" mathces `node status -verbose` output
2020-02-26 21:24:31 -05:00
Fredrik Hoem Grelland edb3bd0f3f Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170) 2020-02-23 16:24:53 +01:00
James Rasell 6463532577
Fix panic when canonicalizing a jobspec with incorrect job type.
When canonicalizing the ReschedulePolicy a panic was possible if
the passed job type was not valid. This change protects against
this possibility, in a verbose way to ensure the code path is
clear.
2020-02-21 09:14:36 +01:00
James Rasell e7eb49fe84 api: check response content length before decoding.
The API decodeBody function will now check the content length
before attempting to decode. If the length is zero, and the out
interface is nil then it is safe to assume the API call is not
returning any data to the user. This allows us to better handle
passing nil to API calls in a single place.
2020-02-20 10:07:44 +01:00
Mahmood Ali f492ab6d9e implement MinQuorum 2020-02-16 16:04:59 -06:00
Seth Hoenig 0e44094d1a client: enable configuring enable_tag_override for services
Consul provides a feature of Service Definitions where the tags
associated with a service can be modified through the Catalog API,
overriding the value(s) configured in the agent's service configuration.

To enable this feature, the flag enable_tag_override must be configured
in the service definition.

Previously, Nomad did not allow configuring this flag, and thus the default
value of false was used. Now, it is configurable.

Because Nomad itself acts as a state machine around the the service definitions
of the tasks it manages, it's worth describing what happens when this feature
is enabled and why.

Consider the basic case where there is no Nomad, and your service is provided
to consul as a boring JSON file. The ultimate source of truth for the definition
of that service is the file, and is stored in the agent. Later, Consul performs
"anti-entropy" which synchronizes the Catalog (stored only the leaders). Then
with enable_tag_override=true, the tags field is available for "external"
modification through the Catalog API (rather than directly configuring the
service definition file, or using the Agent API). The important observation
is that if the service definition ever changes (i.e. the file is changed &
config reloaded OR the Agent API is used to modify the service), those
"external" tag values are thrown away, and the new service definition is
once again the source of truth.

In the Nomad case, Nomad itself is the source of truth over the Agent in
the same way the JSON file was the source of truth in the example above.
That means any time Nomad sets a new service definition, any externally
configured tags are going to be replaced. When does this happen? Only on
major lifecycle events, for example when a task is modified because of an
updated job spec from the 'nomad job run <existing>' command. Otherwise,
Nomad's periodic re-sync's with Consul will now no longer try to restore
the externally modified tag values (as long as enable_tag_override=true).

Fixes #2057
2020-02-10 08:00:55 -06:00
Seth Hoenig f030a22c7c command, docs: create and document consul token configuration for connect acls (gh-6716)
This change provides an initial pass at setting up the configuration necessary to
enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul
Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert`
commands (similar to Vault tokens).

These values are not actually used yet in this changeset.
2020-01-31 19:02:53 -06:00
Drew Bailey da4af9bef3
fix tests, update changelog 2020-01-29 13:55:39 -05:00
Nick Ethier 5cbb94e16e consul: add support for canary meta 2020-01-27 09:53:30 -05:00
Drew Bailey f97d2e96c1
refactor api profile methods
comment why we ignore errors parsing params
2020-01-09 15:15:12 -05:00
Drew Bailey b702dede49
adds qc param, address pr feedback 2020-01-09 15:15:11 -05:00
Drew Bailey 45210ed901
Rename profile package to pprof
Address pr feedback, rename profile package to pprof to more accurately
describe its purpose. Adds gc param for heap lookup profiles.
2020-01-09 15:15:10 -05:00
Drew Bailey 1b8af920f3
address pr feedback 2020-01-09 15:15:09 -05:00
Drew Bailey 92469ffcb3
comments for api usage of agent profile 2020-01-09 15:15:09 -05:00
Drew Bailey 9a80938fb1
region forwarding; prevent recursive forwards for impossible requests
prevent region forwarding loop, backfill tests

fix failing test
2020-01-09 15:15:06 -05:00
Drew Bailey aec81a0b99
api agent endpoints
helper func to return serverPart based off of serverID
2020-01-09 15:15:05 -05:00
Drew Bailey 49ad5fbc85
agent pprof endpoints
wip, agent endpoint and client endpoint for pprof profiles

agent endpoint test
2020-01-09 15:15:02 -05:00
Mahmood Ali 0ec9532ab1
Merge pull request #6831 from hashicorp/add_inmemory_certificate
Add option to set certificate in-memory
2019-12-19 08:54:32 -05:00
Drew Bailey 24929776a2
shutdown delay for task groups
copy struct values

ensure groupserviceHook implements RunnerPreKillhook

run deregister first

test that shutdown times are delayed

move magic number into variable
2019-12-16 11:38:16 -05:00
Michel Vocks 5cb462fd13 Add raw field for ClientCert and ClientKey 2019-12-16 14:30:00 +01:00
Michel Vocks 6e413b3929 Update go mod 2019-12-16 12:47:10 +01:00
Michel Vocks 3864d91d03 Add option to set certificate in-memory via SDK 2019-12-16 10:59:27 +01:00
Michael Schurter ecf970b5a5
Merge pull request #6370 from pmcatominey/tls-server-name
command: add -tls-server-name flag
2019-11-20 08:44:54 -08:00
Michael Schurter 796758b8a5 core: add semver constraint
The existing version constraint uses logic optimized for package
managers, not schedulers, when checking prereleases:

- 1.3.0-beta1 will *not* satisfy ">= 0.6.1"
- 1.7.0-rc1 will *not* satisfy ">= 1.6.0-beta1"

This is due to package managers wishing to favor final releases over
prereleases.

In a scheduler versions more often represent the earliest release all
required features/APIs are available in a system. Whether the constraint
or the version being evaluated are prereleases has no impact on
ordering.

This commit adds a new constraint - `semver` - which will use Semver
v2.0 ordering when evaluating constraints. Given the above examples:

- 1.3.0-beta1 satisfies ">= 0.6.1" using `semver`
- 1.7.0-rc1 satisfies ">= 1.6.0-beta1" using `semver`

Since existing jobspecs may rely on the old behavior, a new constraint
was added and the implicit Consul Connect and Vault constraints were
updated to use it.
2019-11-19 08:40:19 -08:00
Luiz Aoqui 5bd7cdd5c3
api: add `StartedAt` in `Node.DrainStrategy` 2019-11-13 17:54:40 -05:00
Mahmood Ali 90d81fcd55 api: go-uuid is no longer needed 2019-11-12 11:02:33 -05:00
Mahmood Ali d4514c7b73 api: avoid depending on helper internal package 2019-11-12 11:02:33 -05:00
Chris Raborg 763735d449 Update MonitorDrain comment to indicate channel is closed on errors (#6671)
Fixes #6645
2019-11-11 14:15:17 -05:00
Drew Bailey 0e49da7f55
update test 2019-11-08 15:49:04 -05:00
Drew Bailey 3b4d44d030
switch to uuid helper package 2019-11-08 09:28:06 -05:00
Drew Bailey e53788c47f
Remove response body from websocket error
If a websocket connection errors we currently return the error with a
copy of the response body. The response body from the websocket can
often times be completely illegible so remove it from the error string.

make alloc id empty for more reliable failure

un-gzip if content encoding header present
2019-11-08 09:28:02 -05:00
Ben Barnard b87ecd5f8c Escape job ID in API requests (#2411)
Jobs can be created with user-provided IDs containing any character
except spaces. The jobId needs to be escaped when used in a request
path, otherwise jobs created with names such as "why?" can't be managed
after they are created.
2019-11-07 08:35:39 -05:00
James Rasell 4ee23df7ae Remove trailing dot on drain message to ensure better consistency. (#5956) 2019-11-05 16:53:38 -05:00
Drew Bailey ddfa20b993
address feedback, fix gauge metric name 2019-11-05 09:51:57 -05:00
Drew Bailey e4b3e1d7d4
allow more time for streaming message
remove unused struct
2019-11-05 09:51:55 -05:00