Commit graph

1832 commits

Author SHA1 Message Date
Seth Hoenig 40d36fc0ec agent: revert use of http connlimit
https://github.com/hashicorp/nomad/pull/9608 introduced the use of the
built-in HTTP 429 response handler provided by go-connlimit. There is
concern though around plausible DOS attacks that need to be addressed,
so this PR reverts that functionality.

It keeps a fix in the tests around the use of an HTTPS enabled client
for when the server is listening on HTTPS. Previously, the tests would
fail deterministically with io.EOF because that's how the TLS server
terminates invalid connections.

Now, the result is much less deterministic. The state of the client
connection and the server socket depends on when the connection is
closed and how far along the handshake was.
2020-12-14 14:40:14 -06:00
Seth Hoenig 0091325721 command: give flag-helpers a better name 2020-12-14 10:07:27 -06:00
Seth Hoenig d3f1c3adcf
Merge pull request #9608 from hashicorp/f-go-connlimit
Use go-connlimit to ratelimit with 429 responses
2020-12-10 11:05:07 -06:00
Seth Hoenig a28cd45988 client: fix plumbing of testing object into helper 2020-12-10 11:04:38 -06:00
Kris Hicks 0cf9cae656
Apply some suggested fixes from staticcheck (#9598) 2020-12-10 07:29:18 -08:00
Seth Hoenig 2cc5787f97 client: fix https test cases in client rate limits 2020-12-10 09:20:28 -06:00
Ben Buzbee a8e4aa76c6 Use new go-connlimit with HTTP 429 response
This is essentially a port of Consul's similar fix
Changes are:
go get -u github.com/hashicorp/go-connlimit
go mod vendor
Use new HTTP429 handler

20d1ea7d2d
2020-12-09 17:57:16 -06:00
Kris Hicks 0a3a748053
Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Kris Hicks 93155ba3da
Add gocritic to golangci-lint config (#9556) 2020-12-08 12:47:04 -08:00
Dennis Schön 038f2cce57
return 405 on non-GET requests to /v1/event/stream (fixes #9526) (#9564) 2020-12-08 13:09:20 -05:00
Dave May be0a14d70b
fix AgentHostRequest panic found in GH-9546 (#9554)
* debug: refactor nodeclass test
* debug: add case to track down SIGSEGV on client to server Agent.Host RPC
* verify server to avoid panic on AgentHostRequest RPC call, fixes GH-9546
* simplify Agent.Host RPC lookup logic
2020-12-07 17:34:40 -05:00
Dennis Schön a9c97d9257
use os.ErrDeadlineExceeded in tests 2020-12-07 10:40:28 -05:00
Drew Bailey 9adca240f8
Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447)
* upsertaclpolicies

* delete acl policies msgtype

* upsert acl policies msgtype

* delete acl tokens msgtype

* acl bootstrap msgtype

wip unsubscribe on token delete

test that subscriptions are closed after an ACL token has been deleted

Start writing policyupdated test

* update test to use before/after policy

* add SubscribeWithACLCheck to run acl checks on subscribe

* update rpc endpoint to use broker acl check

* Add and use subscriptions.closeSubscriptionFunc

This fixes the issue of not being able to defer unlocking the mutex on
the event broker in the for loop.

handle acl policy updates

* rpc endpoint test for terminating acl change

* add comments

Co-authored-by: Kris Hicks <khicks@hashicorp.com>
2020-12-01 11:11:34 -05:00
Drew Bailey a0b7f05a7b
Remove Managed Sinks from Nomad (#9470)
* Remove Managed Sinks from Nomad

Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.

* update comment

* changelog
2020-11-30 14:00:31 -05:00
Seth Hoenig e81e9223ef consul/connect: enable setting datacenter in connect upstream
Before, upstreams could only be defined using the default datacenter.
Now, the `datacenter` field can be set in a connect upstream definition,
informing consul of the desire for an instance of the upstream service
in the specified datacenter. The field is optional and continues to
default to the local datacenter.

Closes #8964
2020-11-30 10:38:30 -06:00
Tim Gross 4e79ddea45
csi/api: populate ReadAllocs/WriteAllocs fields (#9377)
The API is missing values for `ReadAllocs` and `WriteAllocs` fields, resulting
in allocation claims not being populated in the web UI. These fields mirror
the fields in `nomad/structs.CSIVolume`. Returning a separate list of stubs
for read and write would be ideal, but this can't be done without either
bloating the API response with repeated full `Allocation` data, or causing a
panic in previous versions of the CLI.

The `nomad/structs` fields are persisted with nil values and are populated
during RPC, so we'll do the same in the HTTP API and populate the `ReadAllocs`
and `WriteAllocs` fields with a map of allocation IDs, but with null
values. The web UI will then create its `ReadAllocations` and
`WriteAllocations` fields by mapping from those IDs to the values in
`Allocations`, instead of flattening the map into a list.
2020-11-25 16:44:06 -05:00
Seth Hoenig 3c17dc2ecc api: safely access legacy MBits field 2020-11-23 10:36:10 -06:00
Nick Ethier 9471892df4 mock: add default host network 2020-11-23 10:11:00 -06:00
Chris Baker 7c6071e6c4 updated alloc_endpoint to mutate a copy of the returned allocation, instead of the instance in the state store 2020-11-15 17:52:50 +00:00
Chris Baker b244d5e949 documenting test for #9367 2020-11-15 17:47:50 +00:00
Seth Hoenig bb8a5816a0 jobspec: add support for headers in artifact stanza
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.

The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.

Closes #9306
2020-11-13 12:03:54 -06:00
Seth Hoenig b19bc6be2b consul: prevent re-registration churn by correctly comparing sidecar tags
Previously, connect sidecars would be re-registered with consul every cycle
of Nomad's reconciliation loop around Consul service registrations. This is
because part of the comparison used `reflect.DeepEqual` on []string objects,
which returns false when one object is `[]string{}` and the other is `[]string{}(nil)`.

Unforunately, this was always the case, and every Connect sidecar service
would be re-registered on every iteration, which happens every 30 seconds.
2020-11-11 18:01:17 -06:00
Nick Ethier 5e1634eda1
structs: canonicalize allocatedtaskresources to populate shared ports (#9309) 2020-11-11 16:21:47 -05:00
Tim Gross 60874ebe25
csi: Postrun hook should not change mode (#9323)
The unpublish workflow requires that we know the mode (RW vs RO) if we want to
unpublish the node. Update the hook and the Unpublish RPC so that we mark the
claim for release in a new state but leave the mode alone. This fixes a bug
where RO claims were failing node unpublish.

The core job GC doesn't know the mode, but we don't need it for that workflow,
so add a mode specifically for GC; the volumewatcher uses this as a sentinel
to check whether claims (with their specific RW vs RO modes) need to be claimed.
2020-11-11 13:06:30 -05:00
Mahmood Ali 69849a42a5
Merge pull request #9298 from hashicorp/f-hcl2-localsvars
HCL2: Variables and Locals
2020-11-09 16:44:37 -05:00
Luiz Aoqui ea81ac5d3d
Merge pull request #9296 from hashicorp/b-remove-namespace-from-scale-request
Remove Namespace field from JobScaleRequest
2020-11-09 15:13:33 -05:00
Mahmood Ali 1ae3e8a8eb Start using the new jobspec2 API 2020-11-09 15:01:31 -05:00
Luiz Aoqui c536286c7a
remove Namespace field from JobScaleRequest 2020-11-09 13:02:05 -05:00
Nick Ethier 04f5c4ee5f
ar/groupservice: remove drivernetwork (#9233)
* ar/groupservice: remove drivernetwork

* consul: allow host address_mode to accept raw port numbers

* consul: fix logic for blank address
2020-11-05 15:00:22 -05:00
Kris Hicks 1da9e7fc67
Add event sink API and CLI commands (#9226)
Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>
2020-11-02 09:57:35 -08:00
Chris Baker 719077a26d added new policy capabilities for recommendations API
state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling polices in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type
2020-10-28 14:32:16 +00:00
Drew Bailey 86080e25a9
Send events to EventSinks (#9171)
* Process to send events to configured sinks

This PR adds a SinkManager to a server which is responsible for managing
managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the eventstore are
made the sinkmanager and managed sink are responsible for reloading or
starting a new managed sink.

* periodically check in sink progress to raft

Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change

dereference eventsink so we can accurately use the watchch

When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger
2020-10-26 17:27:54 -04:00
Drew Bailey 1ae39a9ed9
event sink crud operation api (#9155)
* network sink rpc/api plumbing

state store methods and restore

upsert sink test

get sink

delete sink

event sink list and tests

go generate new msg types

validate sink on upsert

* go generate
2020-10-23 14:23:00 -04:00
Michael Schurter c2dd9bc996 core: open source namespaces 2020-10-22 15:26:32 -07:00
Mahmood Ali 059e87c862
Merge pull request #9142 from hashicorp/f-hclv2-2.3
Support HCLv2 for Nomad jobs
2020-10-22 12:26:28 -05:00
Drew Bailey f3dcefe5a9
remove event durability (#9147)
* remove event durability

temporarily removing go-memdb event durability until a new strategy is developed on how to best handled increased durability needs

* drop events table schema and state store methods

* fix neweventbuffer invocations
2020-10-22 12:21:03 -04:00
Mahmood Ali f52bda4c30 api: update /render api to parse hclv2 2020-10-21 15:46:57 -04:00
Drew Bailey 6c788fdccd
Events/msgtype cleanup (#9117)
* use msgtype in upsert node

adds message type to signature for upsert node, update tests, remove placeholder method

* UpsertAllocs msg type test setup

* use upsertallocs with msg type in signature

update test usage of delete node

delete placeholder msgtype method

* add msgtype to upsert evals signature, update test call sites with test setup msg type

handle snapshot upsert eval outside of FSM and ignore eval event

remove placeholder upsertevalsmsgtype

handle job plan rpc and prevent event creation for plan

msgtype cleanup upsertnodeevents

updatenodedrain msgtype

msg type 0 is a node registration event, so set the default  to the ignore type

* fix named import

* fix signature ordering on upsertnode to match
2020-10-19 09:30:15 -04:00
Drew Bailey fba0d6dc6a
event buffer size and durable count must be non negative 2020-10-15 16:34:33 -04:00
Nick Ethier 4903e5b114
Consul with CNI and host_network addresses (#9095)
* consul: advertise cni and multi host interface addresses

* structs: add service/check address_mode validation

* ar/groupservices: fetch networkstatus at hook runtime

* ar/groupservice: nil check network status getter before calling

* consul: comment network status can be nil
2020-10-15 15:32:21 -04:00
Michael Schurter ea55c497b7
Merge pull request #9094 from hashicorp/f-1.0
s/0.13/1.0/g
2020-10-15 08:53:33 -07:00
James Rasell 42a6e7140f
Merge pull request #9083 from hashicorp/b-fix-enterprise-config-merge
agent: fix enterprise config overlay merging.
2020-10-15 08:40:49 +02:00
Michael Schurter 9c3972937b s/0.13/1.0/g
1.0 here we come!
2020-10-14 15:17:47 -07:00
Michael Schurter dd09fa1a4a
Merge pull request #9055 from hashicorp/f-9017-resources
api: add field filters to /v1/{allocations,nodes}
2020-10-14 14:49:39 -07:00
Michael Schurter 6890cffd7a unify boolean parameter parsing 2020-10-14 12:23:25 -07:00
Drew Bailey c463479848
filter on additional filter keys, remove switch statement duplication
properly wire up durable event count

move newline responsibility

moves newline creation from NDJson to the http handler, json stream only encodes and sends now

ignore snapshot restore if broker is disabled

enable dev mode to access event steam without acl

use mapping instead of switch

use pointers for config sizes, remove unused ttl, simplify closed conn logic
2020-10-14 14:14:33 -04:00
Michael Schurter 8ccbd92cb6 api: add field filters to /v1/{allocations,nodes}
Fixes #9017

The ?resources=true query parameter includes resources in the object
stub listings. Specifically:

- For `/v1/nodes?resources=true` both the `NodeResources` and
  `ReservedResources` field are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
  included.

The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
2020-10-14 10:35:22 -07:00
Drew Bailey 684807bddb
namespace filtering 2020-10-14 12:44:43 -04:00
Drew Bailey df96b89958
Add EvictCallbackFn to handle removing entries from go-memdb when they
are removed from the event buffer.

Wire up event buffer size config, use pointers for structs.Events
instead of copying.
2020-10-14 12:44:42 -04:00
Drew Bailey 315f77a301
rehydrate event publisher on snapshot restore
address pr feedback
2020-10-14 12:44:41 -04:00
Drew Bailey d793529d61
event durability count and cfg 2020-10-14 12:44:40 -04:00
Drew Bailey b4c135358d
use Events to wrap index and events, store in events table 2020-10-14 12:44:39 -04:00
Drew Bailey 559517455a
wire up enable_event_publisher 2020-10-14 12:44:38 -04:00
Drew Bailey 4793bb4e01
Events/deployment events (#9004)
* Node Drain events and Node Events (#8980)

Deployment status updates

handle deployment status updates (paused, failed, resume)

deployment alloc health

generate events from apply plan result

txn err check, slim down deployment event

one ndjson line per index

* consolidate down to node event + type

* fix UpdateDeploymentAllocHealth test invocations

* fix test
2020-10-14 12:44:37 -04:00
Drew Bailey a4a2975edf
Event Stream API/RPC (#8947)
This Commit adds an /v1/events/stream endpoint to stream events from.

The stream framer has been updated to include a SendFull method which
does not fragment the data between multiple frames. This essentially
treats the stream framer as a envelope to adhere to the stream framer
interface in the UI.

If the `encode` query parameter is omitted events will be streamed as
newline delimted JSON.
2020-10-14 12:44:36 -04:00
James Rasell e0734bed77
agent: fix enterprise config overlay merging. 2020-10-14 09:35:16 +02:00
Chris Baker 1d35578bed removed backwards-compatible/untagged metrics deprecated in 0.7 2020-10-13 20:18:39 +00:00
Seth Hoenig ed13e5723f consul/connect: dynamically select envoy sidecar at runtime
As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.

This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.

Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.

Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.

`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.

Addresses #8585 #7665
2020-10-13 09:14:12 -05:00
Yoan Blanc 891accb89a
use allow/deny instead of the colored alternatives (#9019)
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-10-12 08:47:05 -04:00
Seth Hoenig 6cffbecb3a
Merge pull request #9033 from pierreca/verify-remove-checks
Do not double-remove checks removed by Consul
2020-10-06 10:16:13 -05:00
James Rasell ffe6533ad1
Merge pull request #9027 from hashicorp/f-gh-9026
cli: move tests to use NewMockUi func.
2020-10-06 08:28:18 +02:00
Pierre Cauchois 1efe05f516 Do not double-remove checks removed by Consul
When deregistering a service, consul also deregisters the associated
checks. The current state keeps track of all services and all checks
separately and deregisters them in sequence, which leads, whether during
syncs or shutdowns, to check deregistrations happening twice and failing
the second time (generating errors in logs)

This fix includes:
- a fix to the sync logic that just pulls the checks *after* the
services have been synced
- a fix to the shutdown mechanism that gets an updated list of checks
after deregistering the services, so that we get a cleaner check
deregistration process.
2020-10-06 00:30:29 +00:00
Chris Baker 7f701fddd0 updated docs and validation to further prohibit null chars in region, datacenter, and job name 2020-10-05 18:01:50 +00:00
James Rasell 2ed78b8a7e
cli: move tests to use NewMockUi func. 2020-10-05 16:07:41 +02:00
Kent 'picat' Gruber 5e1c716835
Merge pull request #8998 from hashicorp/keygen-32-bytes
Use 32-byte key for gossip encryption to enable AES-256
2020-10-02 17:17:55 -04:00
Kent 'picat' Gruber b03f79700c Fix panic in test due to the agent's logger not being initialized yet
So a null logger is used to avoid the problem.
2020-10-02 11:10:27 -04:00
Fredrik Hoem Grelland 953d4de8dd
update consul-template to v0.25.1 (#8988) 2020-10-01 14:08:49 -04:00
Kent 'picat' Gruber 90e85f9add Fix other usages of initKeyring func to use logger as third argument 2020-10-01 11:13:06 -04:00
Kent 'picat' Gruber b98bb99dfe Log AES-128 and AES-192 key sizes during keyring initialization 2020-10-01 11:12:14 -04:00
Michael Schurter 765473e8b0 jobspec: lower min cpu resources from 10->1
Since CPU resources are usually a soft limit it is desirable to allow
setting it as low as possible to allow tasks to run only in "idle" time.

Setting it to 0 is still not allowed to avoid potential unintentional
side effects with allowing a zero value. While there may not be any side
effects this commit attempts to minimize risk by avoiding the issue.

This does *not* change the defaults.
2020-09-30 12:15:13 -07:00
Michael Schurter 1544341f09
Merge pull request #8862 from hashicorp/release-0.12.4
Prepare for 0.13 development cycle
2020-09-10 09:14:44 -07:00
Mahmood Ali d4f385d6e1
Upgrade to golang 1.15 (#8858)
Upgrade to golang 1.15

Starting with golang 1.5, setting Ctty value result in `Setctty set but Ctty not valid in child` error, as part of https://github.com/golang/go/issues/29458 .
This commit lifts the fix in https://github.com/creack/pty/pull/97 .
2020-09-09 15:59:29 -04:00
Nomad Release bot 3b8a2f22dc Generate files for 0.12.4-rc1 release 2020-09-03 02:59:23 +00:00
Tim Gross b77fe023b5
MRD: move 'job stop -global' handling into RPC (#8776)
The initial implementation of global job stop for MRD looped over all the
regions in the CLI for expedience. This changeset includes the OSS parts of
moving this into the RPC layer so that API consumers don't have to implement
this logic themselves.
2020-08-28 14:28:13 -04:00
Lang Martin 7d483f93c0
csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699)
* nomad/structs/csi: add explicit job support
* nomad/state/state_store: capture job updates directly
* api/nodes: CSIInfo needs the AllocID
* command/agent/csi_endpoint: AllocID was missing
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-08-27 17:20:00 -04:00
Seth Hoenig 9f1f2a5673 Merge branch 'master' into f-cc-ingress 2020-08-26 15:31:05 -05:00
Seth Hoenig dfe179abc5 consul/connect: fixup some comments and context timeout 2020-08-26 13:17:16 -05:00
Tim Gross f9b6c8153c
csi: fix panic in serializing nil allocs in volume API (#8735)
- fix panic in serializing nil allocs in volume API
- prevent potential panic in serializing plugin allocs
2020-08-25 10:13:05 -04:00
Seth Hoenig 26e77623e5 consul/connect: fixup tests to use new consul sdk 2020-08-24 12:02:41 -05:00
Seth Hoenig c4fa644315 consul/connect: remove envoy dns option from gateway proxy config 2020-08-24 09:11:55 -05:00
Yoan Blanc 327d17e0dc
fixup! vendor: consul/api, consul/sdk v1.6.0
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-08-24 08:59:03 +02:00
Seth Hoenig 5b072029f2 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
Nick Ethier 3cd5f46613
Update UI to use new allocated ports fields (#8631)
* nomad: canonicalize alloc shared resources to populate ports

* ui: network ports

* ui: remove unused task network references and update tests with new shared ports model

* ui: lint

* ui: revert auto formatting

* ui: remove unused page objects

* structs: remove unrelated test from bad conflict resolution

* ui: formatting
2020-08-20 11:07:13 -04:00
Tim Gross 22e77bb03c
mrd: remove redundant validation in HTTP endpoint (#8685)
The `regionForJob` function in the HTTP job endpoint overrides the region for
multiregion jobs to `global`, which is used as a sentinel value in the
server's job endpoint to avoid re-registration loops. This changeset removes
an extraneous check that results in errors in the web UI and makes
round-tripping through the HTTP API cumbersome for all consumers.
2020-08-18 16:48:09 -04:00
Lang Martin 6d8165c410
command/agent/csi_endpoint: explicit allocations (#8669) 2020-08-13 15:48:08 -04:00
Tim Gross 7dca72acbe
csi: fix panic from assignment to nil map in plugin API (#8666) 2020-08-13 11:36:41 -04:00
Tim Gross 3faa138732
fix panic converting structs to API in CSI endpoint (#8659) 2020-08-12 15:59:10 -04:00
Nomad Release bot 1ea9d4eb22 Generate files for 0.12.2 release 2020-08-12 00:50:49 +00:00
Lang Martin 07ea822c6a
nomad debug renamed to nomad operator debug (#8602)
* renamed: command/debug.go -> command/operator_debug.go
* website: rename debug -> operator debug
* website/pages/api-docs/agent: name in api docs
2020-08-11 15:39:44 -04:00
Lang Martin c82b2a2454
CSI: volume and plugin allocations in the API (#8590)
* command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints
2020-08-11 12:24:41 -04:00
Tim Gross 443fdaa86b
csi: nomad volume detach command (#8584)
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.

The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
2020-08-11 10:18:54 -04:00
Seth Hoenig fd4804bf26 consul: able to set pass/fail thresholds on consul service checks
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. This is a
feature added to Consul v1.7.0 and later.
  https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical

Nomad doesn't do much besides pass the fields through to Consul.

Fixes #6913
2020-08-10 14:08:09 -05:00
Drew Bailey b296558b8e
oss compoments for multi-vault namespaces
adds in oss components to support enterprise multi-vault namespace feature

upgrade specific doc on vault multi-namespaces

vault docs

update test to reflect new error
2020-07-24 10:14:59 -04:00
James Rasell 95db43eaf0
Merge pull request #8491 from hashicorp/b-gh-8481
api: task groups in system jobs do not support scaling stanzas.
2020-07-24 14:20:26 +02:00
Nomad Release bot f2f50bf48e Generate files for 0.12.1 release 2020-07-23 13:17:59 +00:00
Lars Lehtonen e26ea30b7e
command/agent: fix dropped test error (#8504) 2020-07-22 15:06:35 -04:00
James Rasell 2da8bd8f58
agent: task groups in system jobs do not support scaling stanzas. 2020-07-22 11:10:59 +02:00
Mahmood Ali 72ac33e4e7 Refactor setupLoggers 2020-07-17 11:05:57 -04:00
Mahmood Ali ad2d484974 Set AgentShutdown 2020-07-17 11:04:57 -04:00
Chris Baker f8478b6f82 Merge branch 'master' of github.com:hashicorp/nomad into release-0.12.0 2020-07-08 21:16:31 +00:00
Nick Ethier 119ece09a0
docs: add CNI and host_network docs (#8391)
Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>
2020-07-08 15:45:04 -04:00
Nomad Release bot 549e766eab Generate files for 0.12.0-rc1 release 2020-07-07 03:17:05 +00:00
Nick Ethier e0fb634309
ar: support opting into binding host ports to default network IP (#8321)
* ar: support opting into binding host ports to default network IP

* fix config plumbing

* plumb node address into network resource

* struct: only handle network resource upgrade path once
2020-07-06 18:51:46 -04:00
Tim Gross 18250f71fd
fix region flag vs job region handling in plan/submit (#8347) 2020-07-06 15:46:09 -04:00
Chris Baker 9100b6b7c0 changes to make sure that Max is present and valid, to improve error messages
* made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected
* added checks to HCL parsing that it is present
* when Scaling.Max is absent/invalid, don't return extraneous error messages during validation
* tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs

resolves #8355
2020-07-04 19:05:50 +00:00
Mahmood Ali 329969b97e tests: make testagent shutdown idempotent
Avoid double freeing ports if an agent.Shutdown() is called multiple
times.
2020-07-03 09:16:01 -04:00
Lang Martin 6c22cd587d
api: nomad debug new /agent/host (#8325)
* command/agent/host: collect host data, multi platform

* nomad/structs/structs: new HostDataRequest/Response

* client/agent_endpoint: add RPC endpoint

* command/agent/agent_endpoint: add Host

* api/agent: add the Host endpoint

* nomad/client_agent_endpoint: add Agent Host with forwarding

* nomad/client_agent_endpoint: use findClientConn

This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.

* command/debug: call agent hosts

* command/agent/host: eliminate calling external programs
2020-07-02 09:51:25 -04:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Tim Gross e52f76ed53 update compiled static assets 2020-06-24 16:37:13 -04:00
Tim Gross a449009e9f
multiregion validation fixes (#8265)
Multi-region jobs need to bypass validating counts otherwise we get spurious
warnings in Job.Plan.
2020-06-24 12:18:51 -04:00
Seth Hoenig e79b79034d connect/native: fixup command/agent/consul/connect test cases 2020-06-24 09:05:56 -05:00
Seth Hoenig 6c5ab7f45e consul/connect: split connect native flag and task in service 2020-06-23 10:22:22 -05:00
Seth Hoenig 4d71f22a11 consul/connect: add support for running connect native tasks
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
2020-06-22 14:07:44 -05:00
Michael Schurter 562704124d
Merge pull request #8208 from hashicorp/f-multi-network
multi-interface network support
2020-06-19 15:46:48 -07:00
Mahmood Ali 963b1251ff
Merge pull request #8082 from hashicorp/f-raft-multipler
Implement raft multipler flag
2020-06-19 10:04:59 -04:00
Nick Ethier f0559a8162
multi-interface network support 2020-06-19 09:42:10 -04:00
Mahmood Ali 38a01c050e
Merge pull request #8192 from hashicorp/f-status-allnamespaces-2
CLI Allow querying all namespaces for jobs and allocations - Try 2
2020-06-18 20:16:52 -04:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Mahmood Ali e784fe331a use '*' to indicate all namespaces
This reverts the introduction of AllNamespaces parameter that was merged
earlier but never got released.
2020-06-17 16:27:43 -04:00
Tim Gross 7b12445f29 multiregion: change AutoRevert to OnFailure 2020-06-17 11:05:45 -04:00
Tim Gross b09b7a2475 Multiregion job registration
Integration points for multiregion jobs to be registered in the enterprise
version of Nomad:
* hook in `Job.Register` for enterprise to send job to peer regions
* remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs
2020-06-17 11:04:58 -04:00
Tim Gross 161bcd9479 use constants from http package 2020-06-17 11:04:02 -04:00
Tim Gross b93efc16d5 multiregion CLI: nomad deployment unblock 2020-06-17 11:03:44 -04:00
Drew Bailey 9263fcb0d3 Multiregion deploy status and job status CLI 2020-06-17 11:03:34 -04:00
Tim Gross 6851024925 Multiregion structs
Initial struct definitions, jobspec parsing, validation, and conversion
between Nomad structs and API structs for multi-region deployments.
2020-06-17 11:00:14 -04:00
Chris Baker 1e3563e08c wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register 2020-06-16 18:45:17 +00:00
Mahmood Ali 69bb42acf8 tests: prefix agent logs to identify agent sources 2020-06-07 16:38:11 -04:00
Mahmood Ali 9eb13ae144 basic snapshot restore 2020-06-07 15:46:23 -04:00
Mahmood Ali de44d9641b
Merge pull request #8047 from hashicorp/f-snapshot-save
API for atomic snapshot backups
2020-06-01 07:55:16 -04:00
Mahmood Ali a73cd01a00
Merge pull request #8001 from hashicorp/f-jobs-list-across-nses
endpoint to expose all jobs across all namespaces
2020-05-31 21:28:03 -04:00
Mahmood Ali 0e8fafd739 implement raft multiplier 2020-05-31 12:24:27 -04:00
Drew Bailey 23d24c7a7f
removes pro tags (#8014) 2020-05-28 15:40:17 -04:00
Drew Bailey 34871f89be
Oss license support for ent builds (#8054)
* changes necessary to support oss licesning shims

revert nomad fmt changes

update test to work with enterprise changes

update tests to work with new ent enforcements

make check

update cas test to use scheduler algorithm

back out preemption changes

add comments

* remove unused method
2020-05-27 13:46:52 -04:00
Mahmood Ali 2108681c1d Endpoint for snapshotting server state 2020-05-21 20:04:38 -04:00
James Rasell ae0fb98c6b
api: return custom error if API attempts to decode empty body. 2020-05-19 15:46:31 +02:00
Mahmood Ali 5ab2d52e27 endpoint to expose all jobs across all namespaces
Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all
namespaces.  The returned list is to contain a `Namespace` field
indicating the job namespace.

If ACL is enabled, the request token needs to be a management token or
have `namespace:list-jobs` capability on all existing namespaces.
2020-05-18 13:50:46 -04:00
Nomad Release bot 189a378549 Generate files for 0.11.2 release 2020-05-14 20:49:42 +00:00
Mahmood Ali 9366181be6 always check default_scheduler_config config
Also, avoid early return on validation to avoid masking some validation
bugs in dev setup.
2020-05-14 14:16:12 -04:00
Lang Martin d3c4700cd3
server: stop after client disconnect (#7939)
* jobspec, api: add stop_after_client_disconnect

* nomad/state/state_store: error message typo

* structs: alloc methods to support stop_after_client_disconnect

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

* scheduler/reconcile_util: delayByStopAfterClientDisconnect

* scheduler/reconcile: use delayByStopAfterClientDisconnect

* scheduler/util: updateNonTerminalAllocsToLost comments

This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now

* scheduler/testing: AssertEvalStatus just use require

* scheduler/generic_sched: don't create a blocked eval if delayed

* scheduler/generic_sched_test: several scheduling cases
2020-05-13 16:39:04 -04:00
Tim Gross 4374c1a837
csi: support Secrets parameter in CSI RPCs (#7923)
CSI plugins can require credentials for some publishing and
unpublishing workflow RPCs. Secrets are configured at the time of
volume registration, stored in the volume struct, and then passed
around as an opaque map by Nomad to the plugins.
2020-05-11 17:12:51 -04:00
Mahmood Ali 061a439f2c
Merge pull request #7912 from hashicorp/f-scheduler-algorithm-followup
Scheduler Algorithm Defaults handling and docs
2020-05-11 09:30:58 -04:00
Tim Gross 3aa761b151
Periodic GC for volume claims (#7881)
This changeset implements a periodic garbage collection of CSI volumes
with missing allocations. This can happen in a scenario where a node
update fails partially and the allocation updates are written to raft
but the evaluations to GC the volumes are dropped. This feature will
cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1
get any stray claims cleaned up.
2020-05-11 08:20:50 -04:00
Mahmood Ali 2c963885b0 handle upgrade path and defaults
Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on
upgrades or on API handling when user misses the value.

The scheduler already treats `""` value as binpack.  This PR merely
ensures that the operator API returns the effective value.
2020-05-09 12:34:08 -04:00
Tim Gross 801ebcfe8d
periodic GC for CSI plugins (#7878)
This changeset implements a periodic garbage collection of unused CSI
plugins. Plugins are self-cleaning when the last allocation for a
plugin is stopped, but this feature will cover any missing edge cases
and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins
cleaned up.
2020-05-06 16:49:12 -04:00
Mahmood Ali b9e3cde865 tests and some clean up 2020-05-01 13:13:30 -04:00
Charlie Voiselle 663fb677cf Add SchedulerAlgorithm to SchedulerConfig 2020-05-01 13:13:29 -04:00
Drew Bailey 42075ef30e
allow test to check if server is enterprise 2020-04-30 14:46:21 -04:00
Drew Bailey 59b76f90e8
hcl fmt from editor
license cli formatting, license endpoints ent only

test oss error

type assertions
2020-04-30 14:46:18 -04:00
Mahmood Ali b8fb32f5d2 http: adjust log level for request failure
Failed requests due to API client errors are to be marked as DEBUG.

The Error log level should be reserved to signal problems with the
cluster and are actionable for nomad system operators.  Logs due to
misbehaving API clients don't represent a system level problem and seem
spurius to nomad maintainers at best.  These log messages can also be
attack vectors for deniel of service attacks by filling servers disk
space with spurious log messages.
2020-04-22 16:19:59 -04:00
Mahmood Ali 5b42796f1e
Merge pull request #7704 from hashicorp/b-agent-shutdown-order
agent: shutdown agent http server last
2020-04-20 10:37:26 -04:00