Commit graph

2111 commits

Author SHA1 Message Date
AndrewChubatiuk 157f8de903 pr comments 2021-02-13 02:42:14 +02:00
AndrewChubatiuk cb84d8ef11 fixed typo 2021-02-13 02:42:14 +02:00
AndrewChubatiuk 4f49614760 fixed typo 2021-02-13 02:42:14 +02:00
AndrewChubatiuk c4bece2b09 fixed typo 2021-02-13 02:42:14 +02:00
AndrewChubatiuk cb527c5d83 fixed tests 2021-02-13 02:42:14 +02:00
AndrewChubatiuk abb7fd1e36 fixed connect port label 2021-02-13 02:42:14 +02:00
AndrewChubatiuk 3e11860b6d fixed connect port label 2021-02-13 02:42:14 +02:00
AndrewChubatiuk cd152643fb fixed connect port label 2021-02-13 02:42:14 +02:00
AndrewChubatiuk af7a16d774 tests fix 2021-02-13 02:42:14 +02:00
AndrewChubatiuk 3d0aa2ef56 allocate sidecar task port on host_network interface 2021-02-13 02:42:13 +02:00
AndrewChubatiuk 47c690479c fixed tests 2021-02-13 02:42:13 +02:00
AndrewChubatiuk 46c639c5d6 fixed tests 2021-02-13 02:42:13 +02:00
AndrewChubatiuk 77f9b84a30 fixed tests 2021-02-13 02:42:13 +02:00
AndrewChubatiuk 99201412da removed proxy suffix 2021-02-13 02:42:13 +02:00
AndrewChubatiuk 844ac16900 fixed variable initialization 2021-02-13 02:42:13 +02:00
AndrewChubatiuk 78465bbd23 customized default sidecar checks 2021-02-13 02:42:13 +02:00
Buck Doyle c22d1114d8
Add handling for license requests in OSS (#9963)
This changes the license-fetching endpoint to respond with 204 in
OSS instead of 501. It closes #9827.
2021-02-08 12:53:06 -06:00
Drew Bailey 82f971f289
OnUpdate configuration for services and checks
Allow for readiness type checks by configuring nomad to ignore warnings
or errors reported by a service check. This allows the deployment to
progress and while Consul handles introducing the sercive into a
resource pool once the check passes.
2021-02-08 08:32:40 -05:00
Chris Baker ce68ee164b Version 1.0.3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJgEuOKAAoJEFGFLYc0j/xMxF8H/3TTU6Tu+Xm0YvcsDaYDphZ/
 X7KQBV0aFiuL5VkTw4PzKEsgryIy9/sqEPyxxyKRowAmos9qhiusjNAIfqdP4TF8
 tdZmTedkfWir9uPD+hyv/LXpwbQ2T8kTwS3xHTYvaOmaCxZr710FEn+imnMk1AUn
 Xs5itkd/CYGr0nBLm+I5GutWSDPmL7Uw8J5Z30fFyoaxoCPAbCWQQNk793SCRUc5
 f/uo18V2tFInmQ+3sAdnM4gPewyStK/a5VvzWavL9fVDtYK83wlqWSchTXY5jpVz
 zNEzt/rYhbBzakPQQKb5zieblh2iGI8aHWpD5w4WduqO2Sg6B/5lAeNZIlW0UJg=
 =2g3c
 -----END PGP SIGNATURE-----

Merge tag 'v1.0.3' into post-release-1.0.3

Version 1.0.3
2021-01-29 19:30:08 +00:00
Seth Hoenig 8b05efcf88 consul/connect: Add support for Connect terminating gateways
This PR implements Nomad built-in support for running Consul Connect
terminating gateways. Such a gateway can be used by services running
inside the service mesh to access "legacy" services running outside
the service mesh while still making use of Consul's service identity
based networking and ACL policies.

https://www.consul.io/docs/connect/gateways/terminating-gateway

These gateways are declared as part of a task group level service
definition within the connect stanza.

service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      terminating {
        // terminating-gateway configuration entry
      }
    }
  }
}

Currently Envoy is the only supported gateway implementation in
Consul. The gateay task can be customized by configuring the
connect.sidecar_task block.

When the gateway.terminating field is set, Nomad will write/update
the Configuration Entry into Consul on job submission. Because CEs
are global in scope and there may be more than one Nomad cluster
communicating with Consul, there is an assumption that any terminating
gateway defined in Nomad for a particular service will be the same
among Nomad clusters.

Gateways require Consul 1.8.0+, checked by a node constraint.

Closes #9445
2021-01-25 10:36:04 -06:00
Drew Bailey 630babb886
prevent double job status update (#9768)
* Prevent Job Statuses from being calculated twice

https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard
2021-01-22 09:18:17 -05:00
Dennis Schön 3eaf1432aa
validate connect block allowed only within group.service 2021-01-20 14:34:23 -05:00
Dennis Schön c376545f49
don't prefix json logging 2021-01-20 09:09:31 -05:00
Nomad Release bot f36d983863 Generate files for 1.0.2 release 2021-01-13 16:52:51 +00:00
Nick Ethier 6705f845f2
Merge pull request #9739 from hashicorp/b-alloc-netmode-ports
Use port's to value when building service address under 'alloc' addr_mode
2021-01-07 09:16:27 -05:00
Nick Ethier bb060bd46b command/agent/consul: remove duplicated tests 2021-01-06 14:11:31 -05:00
Kris Hicks 868ba0cea5
consul: Refactor parts of UpdateWorkload (#9737)
This removes modification of ops in methods that UpdateWorkload calls, keeping
them local to UpdateWorkload. It also includes some rewrites of checkRegs for
clarity.
2021-01-06 11:11:28 -08:00
Nick Ethier ab01e19df3 command/agent/consul: use port's to value when building service address under 'alloc' addr_mode 2021-01-06 13:52:48 -05:00
Seth Hoenig 6c9366986b consul/connect: avoid NPE from unset connect gateway proxy
Submitting a job with an ingress gateway in host networking mode
with an absent gateway.proxy block would cause the Nomad client
to panic on NPE.

The consul registration bits would assume the proxy stanza was
not nil, but it could be if the user does not supply any manually
configured envoy proxy settings.

Check the proxy field is not nil before using it.

Fixes #9669
2021-01-05 09:27:01 -06:00
Seth Hoenig 40d36fc0ec agent: revert use of http connlimit
https://github.com/hashicorp/nomad/pull/9608 introduced the use of the
built-in HTTP 429 response handler provided by go-connlimit. There is
concern though around plausible DOS attacks that need to be addressed,
so this PR reverts that functionality.

It keeps a fix in the tests around the use of an HTTPS enabled client
for when the server is listening on HTTPS. Previously, the tests would
fail deterministically with io.EOF because that's how the TLS server
terminates invalid connections.

Now, the result is much less deterministic. The state of the client
connection and the server socket depends on when the connection is
closed and how far along the handshake was.
2020-12-14 14:40:14 -06:00
Seth Hoenig 0091325721 command: give flag-helpers a better name 2020-12-14 10:07:27 -06:00
Seth Hoenig d3f1c3adcf
Merge pull request #9608 from hashicorp/f-go-connlimit
Use go-connlimit to ratelimit with 429 responses
2020-12-10 11:05:07 -06:00
Seth Hoenig a28cd45988 client: fix plumbing of testing object into helper 2020-12-10 11:04:38 -06:00
Kris Hicks 0cf9cae656
Apply some suggested fixes from staticcheck (#9598) 2020-12-10 07:29:18 -08:00
Seth Hoenig 2cc5787f97 client: fix https test cases in client rate limits 2020-12-10 09:20:28 -06:00
Ben Buzbee a8e4aa76c6 Use new go-connlimit with HTTP 429 response
This is essentially a port of Consul's similar fix
Changes are:
go get -u github.com/hashicorp/go-connlimit
go mod vendor
Use new HTTP429 handler

20d1ea7d2d
2020-12-09 17:57:16 -06:00
Kris Hicks 0a3a748053
Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Kris Hicks 93155ba3da
Add gocritic to golangci-lint config (#9556) 2020-12-08 12:47:04 -08:00
Dennis Schön 038f2cce57
return 405 on non-GET requests to /v1/event/stream (fixes #9526) (#9564) 2020-12-08 13:09:20 -05:00
Dave May be0a14d70b
fix AgentHostRequest panic found in GH-9546 (#9554)
* debug: refactor nodeclass test
* debug: add case to track down SIGSEGV on client to server Agent.Host RPC
* verify server to avoid panic on AgentHostRequest RPC call, fixes GH-9546
* simplify Agent.Host RPC lookup logic
2020-12-07 17:34:40 -05:00
Dennis Schön a9c97d9257
use os.ErrDeadlineExceeded in tests 2020-12-07 10:40:28 -05:00
Drew Bailey 9adca240f8
Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447)
* upsertaclpolicies

* delete acl policies msgtype

* upsert acl policies msgtype

* delete acl tokens msgtype

* acl bootstrap msgtype

wip unsubscribe on token delete

test that subscriptions are closed after an ACL token has been deleted

Start writing policyupdated test

* update test to use before/after policy

* add SubscribeWithACLCheck to run acl checks on subscribe

* update rpc endpoint to use broker acl check

* Add and use subscriptions.closeSubscriptionFunc

This fixes the issue of not being able to defer unlocking the mutex on
the event broker in the for loop.

handle acl policy updates

* rpc endpoint test for terminating acl change

* add comments

Co-authored-by: Kris Hicks <khicks@hashicorp.com>
2020-12-01 11:11:34 -05:00
Drew Bailey a0b7f05a7b
Remove Managed Sinks from Nomad (#9470)
* Remove Managed Sinks from Nomad

Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.

* update comment

* changelog
2020-11-30 14:00:31 -05:00
Seth Hoenig e81e9223ef consul/connect: enable setting datacenter in connect upstream
Before, upstreams could only be defined using the default datacenter.
Now, the `datacenter` field can be set in a connect upstream definition,
informing consul of the desire for an instance of the upstream service
in the specified datacenter. The field is optional and continues to
default to the local datacenter.

Closes #8964
2020-11-30 10:38:30 -06:00
Tim Gross 4e79ddea45
csi/api: populate ReadAllocs/WriteAllocs fields (#9377)
The API is missing values for `ReadAllocs` and `WriteAllocs` fields, resulting
in allocation claims not being populated in the web UI. These fields mirror
the fields in `nomad/structs.CSIVolume`. Returning a separate list of stubs
for read and write would be ideal, but this can't be done without either
bloating the API response with repeated full `Allocation` data, or causing a
panic in previous versions of the CLI.

The `nomad/structs` fields are persisted with nil values and are populated
during RPC, so we'll do the same in the HTTP API and populate the `ReadAllocs`
and `WriteAllocs` fields with a map of allocation IDs, but with null
values. The web UI will then create its `ReadAllocations` and
`WriteAllocations` fields by mapping from those IDs to the values in
`Allocations`, instead of flattening the map into a list.
2020-11-25 16:44:06 -05:00
Seth Hoenig 3c17dc2ecc api: safely access legacy MBits field 2020-11-23 10:36:10 -06:00
Nick Ethier 9471892df4 mock: add default host network 2020-11-23 10:11:00 -06:00
Chris Baker 7c6071e6c4 updated alloc_endpoint to mutate a copy of the returned allocation, instead of the instance in the state store 2020-11-15 17:52:50 +00:00
Chris Baker b244d5e949 documenting test for #9367 2020-11-15 17:47:50 +00:00
Seth Hoenig bb8a5816a0 jobspec: add support for headers in artifact stanza
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.

The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.

Closes #9306
2020-11-13 12:03:54 -06:00
Seth Hoenig b19bc6be2b consul: prevent re-registration churn by correctly comparing sidecar tags
Previously, connect sidecars would be re-registered with consul every cycle
of Nomad's reconciliation loop around Consul service registrations. This is
because part of the comparison used `reflect.DeepEqual` on []string objects,
which returns false when one object is `[]string{}` and the other is `[]string{}(nil)`.

Unforunately, this was always the case, and every Connect sidecar service
would be re-registered on every iteration, which happens every 30 seconds.
2020-11-11 18:01:17 -06:00
Nick Ethier 5e1634eda1
structs: canonicalize allocatedtaskresources to populate shared ports (#9309) 2020-11-11 16:21:47 -05:00
Tim Gross 60874ebe25
csi: Postrun hook should not change mode (#9323)
The unpublish workflow requires that we know the mode (RW vs RO) if we want to
unpublish the node. Update the hook and the Unpublish RPC so that we mark the
claim for release in a new state but leave the mode alone. This fixes a bug
where RO claims were failing node unpublish.

The core job GC doesn't know the mode, but we don't need it for that workflow,
so add a mode specifically for GC; the volumewatcher uses this as a sentinel
to check whether claims (with their specific RW vs RO modes) need to be claimed.
2020-11-11 13:06:30 -05:00
Mahmood Ali 69849a42a5
Merge pull request #9298 from hashicorp/f-hcl2-localsvars
HCL2: Variables and Locals
2020-11-09 16:44:37 -05:00
Luiz Aoqui ea81ac5d3d
Merge pull request #9296 from hashicorp/b-remove-namespace-from-scale-request
Remove Namespace field from JobScaleRequest
2020-11-09 15:13:33 -05:00
Mahmood Ali 1ae3e8a8eb Start using the new jobspec2 API 2020-11-09 15:01:31 -05:00
Luiz Aoqui c536286c7a
remove Namespace field from JobScaleRequest 2020-11-09 13:02:05 -05:00
Nick Ethier 04f5c4ee5f
ar/groupservice: remove drivernetwork (#9233)
* ar/groupservice: remove drivernetwork

* consul: allow host address_mode to accept raw port numbers

* consul: fix logic for blank address
2020-11-05 15:00:22 -05:00
Kris Hicks 1da9e7fc67
Add event sink API and CLI commands (#9226)
Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>
2020-11-02 09:57:35 -08:00
Chris Baker 719077a26d added new policy capabilities for recommendations API
state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling polices in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type
2020-10-28 14:32:16 +00:00
Drew Bailey 86080e25a9
Send events to EventSinks (#9171)
* Process to send events to configured sinks

This PR adds a SinkManager to a server which is responsible for managing
managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the eventstore are
made the sinkmanager and managed sink are responsible for reloading or
starting a new managed sink.

* periodically check in sink progress to raft

Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change

dereference eventsink so we can accurately use the watchch

When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger
2020-10-26 17:27:54 -04:00
Drew Bailey 1ae39a9ed9
event sink crud operation api (#9155)
* network sink rpc/api plumbing

state store methods and restore

upsert sink test

get sink

delete sink

event sink list and tests

go generate new msg types

validate sink on upsert

* go generate
2020-10-23 14:23:00 -04:00
Michael Schurter c2dd9bc996 core: open source namespaces 2020-10-22 15:26:32 -07:00
Mahmood Ali 059e87c862
Merge pull request #9142 from hashicorp/f-hclv2-2.3
Support HCLv2 for Nomad jobs
2020-10-22 12:26:28 -05:00
Drew Bailey f3dcefe5a9
remove event durability (#9147)
* remove event durability

temporarily removing go-memdb event durability until a new strategy is developed on how to best handled increased durability needs

* drop events table schema and state store methods

* fix neweventbuffer invocations
2020-10-22 12:21:03 -04:00
Mahmood Ali f52bda4c30 api: update /render api to parse hclv2 2020-10-21 15:46:57 -04:00
Drew Bailey 6c788fdccd
Events/msgtype cleanup (#9117)
* use msgtype in upsert node

adds message type to signature for upsert node, update tests, remove placeholder method

* UpsertAllocs msg type test setup

* use upsertallocs with msg type in signature

update test usage of delete node

delete placeholder msgtype method

* add msgtype to upsert evals signature, update test call sites with test setup msg type

handle snapshot upsert eval outside of FSM and ignore eval event

remove placeholder upsertevalsmsgtype

handle job plan rpc and prevent event creation for plan

msgtype cleanup upsertnodeevents

updatenodedrain msgtype

msg type 0 is a node registration event, so set the default  to the ignore type

* fix named import

* fix signature ordering on upsertnode to match
2020-10-19 09:30:15 -04:00
Drew Bailey fba0d6dc6a
event buffer size and durable count must be non negative 2020-10-15 16:34:33 -04:00
Nick Ethier 4903e5b114
Consul with CNI and host_network addresses (#9095)
* consul: advertise cni and multi host interface addresses

* structs: add service/check address_mode validation

* ar/groupservices: fetch networkstatus at hook runtime

* ar/groupservice: nil check network status getter before calling

* consul: comment network status can be nil
2020-10-15 15:32:21 -04:00
Michael Schurter ea55c497b7
Merge pull request #9094 from hashicorp/f-1.0
s/0.13/1.0/g
2020-10-15 08:53:33 -07:00
James Rasell 42a6e7140f
Merge pull request #9083 from hashicorp/b-fix-enterprise-config-merge
agent: fix enterprise config overlay merging.
2020-10-15 08:40:49 +02:00
Michael Schurter 9c3972937b s/0.13/1.0/g
1.0 here we come!
2020-10-14 15:17:47 -07:00
Michael Schurter dd09fa1a4a
Merge pull request #9055 from hashicorp/f-9017-resources
api: add field filters to /v1/{allocations,nodes}
2020-10-14 14:49:39 -07:00
Michael Schurter 6890cffd7a unify boolean parameter parsing 2020-10-14 12:23:25 -07:00
Drew Bailey c463479848
filter on additional filter keys, remove switch statement duplication
properly wire up durable event count

move newline responsibility

moves newline creation from NDJson to the http handler, json stream only encodes and sends now

ignore snapshot restore if broker is disabled

enable dev mode to access event steam without acl

use mapping instead of switch

use pointers for config sizes, remove unused ttl, simplify closed conn logic
2020-10-14 14:14:33 -04:00
Michael Schurter 8ccbd92cb6 api: add field filters to /v1/{allocations,nodes}
Fixes #9017

The ?resources=true query parameter includes resources in the object
stub listings. Specifically:

- For `/v1/nodes?resources=true` both the `NodeResources` and
  `ReservedResources` field are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
  included.

The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
2020-10-14 10:35:22 -07:00
Drew Bailey 684807bddb
namespace filtering 2020-10-14 12:44:43 -04:00
Drew Bailey df96b89958
Add EvictCallbackFn to handle removing entries from go-memdb when they
are removed from the event buffer.

Wire up event buffer size config, use pointers for structs.Events
instead of copying.
2020-10-14 12:44:42 -04:00
Drew Bailey 315f77a301
rehydrate event publisher on snapshot restore
address pr feedback
2020-10-14 12:44:41 -04:00
Drew Bailey d793529d61
event durability count and cfg 2020-10-14 12:44:40 -04:00
Drew Bailey b4c135358d
use Events to wrap index and events, store in events table 2020-10-14 12:44:39 -04:00
Drew Bailey 559517455a
wire up enable_event_publisher 2020-10-14 12:44:38 -04:00
Drew Bailey 4793bb4e01
Events/deployment events (#9004)
* Node Drain events and Node Events (#8980)

Deployment status updates

handle deployment status updates (paused, failed, resume)

deployment alloc health

generate events from apply plan result

txn err check, slim down deployment event

one ndjson line per index

* consolidate down to node event + type

* fix UpdateDeploymentAllocHealth test invocations

* fix test
2020-10-14 12:44:37 -04:00
Drew Bailey a4a2975edf
Event Stream API/RPC (#8947)
This Commit adds an /v1/events/stream endpoint to stream events from.

The stream framer has been updated to include a SendFull method which
does not fragment the data between multiple frames. This essentially
treats the stream framer as a envelope to adhere to the stream framer
interface in the UI.

If the `encode` query parameter is omitted events will be streamed as
newline delimted JSON.
2020-10-14 12:44:36 -04:00
James Rasell e0734bed77
agent: fix enterprise config overlay merging. 2020-10-14 09:35:16 +02:00
Chris Baker 1d35578bed removed backwards-compatible/untagged metrics deprecated in 0.7 2020-10-13 20:18:39 +00:00
Seth Hoenig ed13e5723f consul/connect: dynamically select envoy sidecar at runtime
As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.

This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.

Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.

Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.

`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.

Addresses #8585 #7665
2020-10-13 09:14:12 -05:00
Yoan Blanc 891accb89a
use allow/deny instead of the colored alternatives (#9019)
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-10-12 08:47:05 -04:00
Seth Hoenig 6cffbecb3a
Merge pull request #9033 from pierreca/verify-remove-checks
Do not double-remove checks removed by Consul
2020-10-06 10:16:13 -05:00
James Rasell ffe6533ad1
Merge pull request #9027 from hashicorp/f-gh-9026
cli: move tests to use NewMockUi func.
2020-10-06 08:28:18 +02:00
Pierre Cauchois 1efe05f516 Do not double-remove checks removed by Consul
When deregistering a service, consul also deregisters the associated
checks. The current state keeps track of all services and all checks
separately and deregisters them in sequence, which leads, whether during
syncs or shutdowns, to check deregistrations happening twice and failing
the second time (generating errors in logs)

This fix includes:
- a fix to the sync logic that just pulls the checks *after* the
services have been synced
- a fix to the shutdown mechanism that gets an updated list of checks
after deregistering the services, so that we get a cleaner check
deregistration process.
2020-10-06 00:30:29 +00:00
Chris Baker 7f701fddd0 updated docs and validation to further prohibit null chars in region, datacenter, and job name 2020-10-05 18:01:50 +00:00
James Rasell 2ed78b8a7e
cli: move tests to use NewMockUi func. 2020-10-05 16:07:41 +02:00
Kent 'picat' Gruber 5e1c716835
Merge pull request #8998 from hashicorp/keygen-32-bytes
Use 32-byte key for gossip encryption to enable AES-256
2020-10-02 17:17:55 -04:00
Kent 'picat' Gruber b03f79700c Fix panic in test due to the agent's logger not being initialized yet
So a null logger is used to avoid the problem.
2020-10-02 11:10:27 -04:00
Fredrik Hoem Grelland 953d4de8dd
update consul-template to v0.25.1 (#8988) 2020-10-01 14:08:49 -04:00
Kent 'picat' Gruber 90e85f9add Fix other usages of initKeyring func to use logger as third argument 2020-10-01 11:13:06 -04:00
Kent 'picat' Gruber b98bb99dfe Log AES-128 and AES-192 key sizes during keyring initialization 2020-10-01 11:12:14 -04:00
Michael Schurter 765473e8b0 jobspec: lower min cpu resources from 10->1
Since CPU resources are usually a soft limit it is desirable to allow
setting it as low as possible to allow tasks to run only in "idle" time.

Setting it to 0 is still not allowed to avoid potential unintentional
side effects with allowing a zero value. While there may not be any side
effects this commit attempts to minimize risk by avoiding the issue.

This does *not* change the defaults.
2020-09-30 12:15:13 -07:00
Michael Schurter 1544341f09
Merge pull request #8862 from hashicorp/release-0.12.4
Prepare for 0.13 development cycle
2020-09-10 09:14:44 -07:00
Mahmood Ali d4f385d6e1
Upgrade to golang 1.15 (#8858)
Upgrade to golang 1.15

Starting with golang 1.5, setting Ctty value result in `Setctty set but Ctty not valid in child` error, as part of https://github.com/golang/go/issues/29458 .
This commit lifts the fix in https://github.com/creack/pty/pull/97 .
2020-09-09 15:59:29 -04:00
Nomad Release bot 3b8a2f22dc Generate files for 0.12.4-rc1 release 2020-09-03 02:59:23 +00:00
Tim Gross b77fe023b5
MRD: move 'job stop -global' handling into RPC (#8776)
The initial implementation of global job stop for MRD looped over all the
regions in the CLI for expedience. This changeset includes the OSS parts of
moving this into the RPC layer so that API consumers don't have to implement
this logic themselves.
2020-08-28 14:28:13 -04:00
Lang Martin 7d483f93c0
csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699)
* nomad/structs/csi: add explicit job support
* nomad/state/state_store: capture job updates directly
* api/nodes: CSIInfo needs the AllocID
* command/agent/csi_endpoint: AllocID was missing
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-08-27 17:20:00 -04:00
Seth Hoenig 9f1f2a5673 Merge branch 'master' into f-cc-ingress 2020-08-26 15:31:05 -05:00
Seth Hoenig dfe179abc5 consul/connect: fixup some comments and context timeout 2020-08-26 13:17:16 -05:00
Tim Gross f9b6c8153c
csi: fix panic in serializing nil allocs in volume API (#8735)
- fix panic in serializing nil allocs in volume API
- prevent potential panic in serializing plugin allocs
2020-08-25 10:13:05 -04:00
Seth Hoenig 26e77623e5 consul/connect: fixup tests to use new consul sdk 2020-08-24 12:02:41 -05:00
Seth Hoenig c4fa644315 consul/connect: remove envoy dns option from gateway proxy config 2020-08-24 09:11:55 -05:00
Yoan Blanc 327d17e0dc
fixup! vendor: consul/api, consul/sdk v1.6.0
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-08-24 08:59:03 +02:00
Seth Hoenig 5b072029f2 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
Nick Ethier 3cd5f46613
Update UI to use new allocated ports fields (#8631)
* nomad: canonicalize alloc shared resources to populate ports

* ui: network ports

* ui: remove unused task network references and update tests with new shared ports model

* ui: lint

* ui: revert auto formatting

* ui: remove unused page objects

* structs: remove unrelated test from bad conflict resolution

* ui: formatting
2020-08-20 11:07:13 -04:00
Tim Gross 22e77bb03c
mrd: remove redundant validation in HTTP endpoint (#8685)
The `regionForJob` function in the HTTP job endpoint overrides the region for
multiregion jobs to `global`, which is used as a sentinel value in the
server's job endpoint to avoid re-registration loops. This changeset removes
an extraneous check that results in errors in the web UI and makes
round-tripping through the HTTP API cumbersome for all consumers.
2020-08-18 16:48:09 -04:00
Lang Martin 6d8165c410
command/agent/csi_endpoint: explicit allocations (#8669) 2020-08-13 15:48:08 -04:00
Tim Gross 7dca72acbe
csi: fix panic from assignment to nil map in plugin API (#8666) 2020-08-13 11:36:41 -04:00
Tim Gross 3faa138732
fix panic converting structs to API in CSI endpoint (#8659) 2020-08-12 15:59:10 -04:00
Nomad Release bot 1ea9d4eb22 Generate files for 0.12.2 release 2020-08-12 00:50:49 +00:00
Lang Martin 07ea822c6a
nomad debug renamed to nomad operator debug (#8602)
* renamed: command/debug.go -> command/operator_debug.go
* website: rename debug -> operator debug
* website/pages/api-docs/agent: name in api docs
2020-08-11 15:39:44 -04:00
Lang Martin c82b2a2454
CSI: volume and plugin allocations in the API (#8590)
* command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints
2020-08-11 12:24:41 -04:00
Tim Gross 443fdaa86b
csi: nomad volume detach command (#8584)
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.

The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
2020-08-11 10:18:54 -04:00
Seth Hoenig fd4804bf26 consul: able to set pass/fail thresholds on consul service checks
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. This is a
feature added to Consul v1.7.0 and later.
  https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical

Nomad doesn't do much besides pass the fields through to Consul.

Fixes #6913
2020-08-10 14:08:09 -05:00
Drew Bailey b296558b8e
oss compoments for multi-vault namespaces
adds in oss components to support enterprise multi-vault namespace feature

upgrade specific doc on vault multi-namespaces

vault docs

update test to reflect new error
2020-07-24 10:14:59 -04:00
James Rasell 95db43eaf0
Merge pull request #8491 from hashicorp/b-gh-8481
api: task groups in system jobs do not support scaling stanzas.
2020-07-24 14:20:26 +02:00
Nomad Release bot f2f50bf48e Generate files for 0.12.1 release 2020-07-23 13:17:59 +00:00
Lars Lehtonen e26ea30b7e
command/agent: fix dropped test error (#8504) 2020-07-22 15:06:35 -04:00
James Rasell 2da8bd8f58
agent: task groups in system jobs do not support scaling stanzas. 2020-07-22 11:10:59 +02:00
Mahmood Ali 72ac33e4e7 Refactor setupLoggers 2020-07-17 11:05:57 -04:00
Mahmood Ali ad2d484974 Set AgentShutdown 2020-07-17 11:04:57 -04:00
Chris Baker f8478b6f82 Merge branch 'master' of github.com:hashicorp/nomad into release-0.12.0 2020-07-08 21:16:31 +00:00
Nick Ethier 119ece09a0
docs: add CNI and host_network docs (#8391)
Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>
2020-07-08 15:45:04 -04:00
Nomad Release bot 549e766eab Generate files for 0.12.0-rc1 release 2020-07-07 03:17:05 +00:00
Nick Ethier e0fb634309
ar: support opting into binding host ports to default network IP (#8321)
* ar: support opting into binding host ports to default network IP

* fix config plumbing

* plumb node address into network resource

* struct: only handle network resource upgrade path once
2020-07-06 18:51:46 -04:00
Tim Gross 18250f71fd
fix region flag vs job region handling in plan/submit (#8347) 2020-07-06 15:46:09 -04:00
Chris Baker 9100b6b7c0 changes to make sure that Max is present and valid, to improve error messages
* made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected
* added checks to HCL parsing that it is present
* when Scaling.Max is absent/invalid, don't return extraneous error messages during validation
* tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs

resolves #8355
2020-07-04 19:05:50 +00:00
Mahmood Ali 329969b97e tests: make testagent shutdown idempotent
Avoid double freeing ports if an agent.Shutdown() is called multiple
times.
2020-07-03 09:16:01 -04:00
Lang Martin 6c22cd587d
api: nomad debug new /agent/host (#8325)
* command/agent/host: collect host data, multi platform

* nomad/structs/structs: new HostDataRequest/Response

* client/agent_endpoint: add RPC endpoint

* command/agent/agent_endpoint: add Host

* api/agent: add the Host endpoint

* nomad/client_agent_endpoint: add Agent Host with forwarding

* nomad/client_agent_endpoint: use findClientConn

This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.

* command/debug: call agent hosts

* command/agent/host: eliminate calling external programs
2020-07-02 09:51:25 -04:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Tim Gross e52f76ed53 update compiled static assets 2020-06-24 16:37:13 -04:00
Tim Gross a449009e9f
multiregion validation fixes (#8265)
Multi-region jobs need to bypass validating counts otherwise we get spurious
warnings in Job.Plan.
2020-06-24 12:18:51 -04:00
Seth Hoenig e79b79034d connect/native: fixup command/agent/consul/connect test cases 2020-06-24 09:05:56 -05:00
Seth Hoenig 6c5ab7f45e consul/connect: split connect native flag and task in service 2020-06-23 10:22:22 -05:00
Seth Hoenig 4d71f22a11 consul/connect: add support for running connect native tasks
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
2020-06-22 14:07:44 -05:00
Michael Schurter 562704124d
Merge pull request #8208 from hashicorp/f-multi-network
multi-interface network support
2020-06-19 15:46:48 -07:00
Mahmood Ali 963b1251ff
Merge pull request #8082 from hashicorp/f-raft-multipler
Implement raft multipler flag
2020-06-19 10:04:59 -04:00
Nick Ethier f0559a8162
multi-interface network support 2020-06-19 09:42:10 -04:00
Mahmood Ali 38a01c050e
Merge pull request #8192 from hashicorp/f-status-allnamespaces-2
CLI Allow querying all namespaces for jobs and allocations - Try 2
2020-06-18 20:16:52 -04:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Mahmood Ali e784fe331a use '*' to indicate all namespaces
This reverts the introduction of AllNamespaces parameter that was merged
earlier but never got released.
2020-06-17 16:27:43 -04:00
Tim Gross 7b12445f29 multiregion: change AutoRevert to OnFailure 2020-06-17 11:05:45 -04:00
Tim Gross b09b7a2475 Multiregion job registration
Integration points for multiregion jobs to be registered in the enterprise
version of Nomad:
* hook in `Job.Register` for enterprise to send job to peer regions
* remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs
2020-06-17 11:04:58 -04:00
Tim Gross 161bcd9479 use constants from http package 2020-06-17 11:04:02 -04:00
Tim Gross b93efc16d5 multiregion CLI: nomad deployment unblock 2020-06-17 11:03:44 -04:00
Drew Bailey 9263fcb0d3 Multiregion deploy status and job status CLI 2020-06-17 11:03:34 -04:00
Tim Gross 6851024925 Multiregion structs
Initial struct definitions, jobspec parsing, validation, and conversion
between Nomad structs and API structs for multi-region deployments.
2020-06-17 11:00:14 -04:00
Chris Baker 1e3563e08c wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register 2020-06-16 18:45:17 +00:00
Mahmood Ali 69bb42acf8 tests: prefix agent logs to identify agent sources 2020-06-07 16:38:11 -04:00
Mahmood Ali 9eb13ae144 basic snapshot restore 2020-06-07 15:46:23 -04:00
Mahmood Ali de44d9641b
Merge pull request #8047 from hashicorp/f-snapshot-save
API for atomic snapshot backups
2020-06-01 07:55:16 -04:00
Mahmood Ali a73cd01a00
Merge pull request #8001 from hashicorp/f-jobs-list-across-nses
endpoint to expose all jobs across all namespaces
2020-05-31 21:28:03 -04:00
Mahmood Ali 0e8fafd739 implement raft multiplier 2020-05-31 12:24:27 -04:00
Drew Bailey 23d24c7a7f
removes pro tags (#8014) 2020-05-28 15:40:17 -04:00
Drew Bailey 34871f89be
Oss license support for ent builds (#8054)
* changes necessary to support oss licesning shims

revert nomad fmt changes

update test to work with enterprise changes

update tests to work with new ent enforcements

make check

update cas test to use scheduler algorithm

back out preemption changes

add comments

* remove unused method
2020-05-27 13:46:52 -04:00
Mahmood Ali 2108681c1d Endpoint for snapshotting server state 2020-05-21 20:04:38 -04:00
James Rasell ae0fb98c6b
api: return custom error if API attempts to decode empty body. 2020-05-19 15:46:31 +02:00
Mahmood Ali 5ab2d52e27 endpoint to expose all jobs across all namespaces
Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all
namespaces.  The returned list is to contain a `Namespace` field
indicating the job namespace.

If ACL is enabled, the request token needs to be a management token or
have `namespace:list-jobs` capability on all existing namespaces.
2020-05-18 13:50:46 -04:00
Nomad Release bot 189a378549 Generate files for 0.11.2 release 2020-05-14 20:49:42 +00:00
Mahmood Ali 9366181be6 always check default_scheduler_config config
Also, avoid early return on validation to avoid masking some validation
bugs in dev setup.
2020-05-14 14:16:12 -04:00
Lang Martin d3c4700cd3
server: stop after client disconnect (#7939)
* jobspec, api: add stop_after_client_disconnect

* nomad/state/state_store: error message typo

* structs: alloc methods to support stop_after_client_disconnect

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

* scheduler/reconcile_util: delayByStopAfterClientDisconnect

* scheduler/reconcile: use delayByStopAfterClientDisconnect

* scheduler/util: updateNonTerminalAllocsToLost comments

This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now

* scheduler/testing: AssertEvalStatus just use require

* scheduler/generic_sched: don't create a blocked eval if delayed

* scheduler/generic_sched_test: several scheduling cases
2020-05-13 16:39:04 -04:00
Tim Gross 4374c1a837
csi: support Secrets parameter in CSI RPCs (#7923)
CSI plugins can require credentials for some publishing and
unpublishing workflow RPCs. Secrets are configured at the time of
volume registration, stored in the volume struct, and then passed
around as an opaque map by Nomad to the plugins.
2020-05-11 17:12:51 -04:00
Mahmood Ali 061a439f2c
Merge pull request #7912 from hashicorp/f-scheduler-algorithm-followup
Scheduler Algorithm Defaults handling and docs
2020-05-11 09:30:58 -04:00
Tim Gross 3aa761b151
Periodic GC for volume claims (#7881)
This changeset implements a periodic garbage collection of CSI volumes
with missing allocations. This can happen in a scenario where a node
update fails partially and the allocation updates are written to raft
but the evaluations to GC the volumes are dropped. This feature will
cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1
get any stray claims cleaned up.
2020-05-11 08:20:50 -04:00
Mahmood Ali 2c963885b0 handle upgrade path and defaults
Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on
upgrades or on API handling when user misses the value.

The scheduler already treats `""` value as binpack.  This PR merely
ensures that the operator API returns the effective value.
2020-05-09 12:34:08 -04:00
Tim Gross 801ebcfe8d
periodic GC for CSI plugins (#7878)
This changeset implements a periodic garbage collection of unused CSI
plugins. Plugins are self-cleaning when the last allocation for a
plugin is stopped, but this feature will cover any missing edge cases
and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins
cleaned up.
2020-05-06 16:49:12 -04:00
Mahmood Ali b9e3cde865 tests and some clean up 2020-05-01 13:13:30 -04:00
Charlie Voiselle 663fb677cf Add SchedulerAlgorithm to SchedulerConfig 2020-05-01 13:13:29 -04:00
Drew Bailey 42075ef30e
allow test to check if server is enterprise 2020-04-30 14:46:21 -04:00
Drew Bailey 59b76f90e8
hcl fmt from editor
license cli formatting, license endpoints ent only

test oss error

type assertions
2020-04-30 14:46:18 -04:00
Mahmood Ali b8fb32f5d2 http: adjust log level for request failure
Failed requests due to API client errors are to be marked as DEBUG.

The Error log level should be reserved to signal problems with the
cluster and are actionable for nomad system operators.  Logs due to
misbehaving API clients don't represent a system level problem and seem
spurius to nomad maintainers at best.  These log messages can also be
attack vectors for deniel of service attacks by filling servers disk
space with spurious log messages.
2020-04-22 16:19:59 -04:00
Mahmood Ali 5b42796f1e
Merge pull request #7704 from hashicorp/b-agent-shutdown-order
agent: shutdown agent http server last
2020-04-20 10:37:26 -04:00
Mahmood Ali 4e1366f285 agent: route http logs through hclog
Pipe http server log to hclog, so that it uses the same logging format
as rest of nomad logs.  Also, supports emitting them as json logs, when
json formatting is set.

The http server logs are emitted as Trace level, as they are typically
repsent HTTP client errors (e.g. failed tls handshakes, invalid headers,
etc).

Though, Panic logs represent server errors and are relayed as Error
level.
2020-04-20 10:33:40 -04:00
Mahmood Ali b78680eee7 agent: shutdown agent http server last
Shutdown http server last, after nomad client/server components
terminate.

Before this change, if the agent is taking an unexpectedly long time to
shutdown, the operator cannot query the http server directly: they
cannot access agent specific http endpoints and need to query another
agent about the troublesome agent.

Unexpectedly long shutdown can happen in normal cases, e.g. a client
might hung is if one of the allocs it is running has a long
shutdown_delay.

Here, we switch to ensuring that the http server is shutdown last.

I believe this doesn't require extra care in agent shutting down logic
while operators may be able to submit write http requests.  We already
need to cope with operators submiting these http requests to another
agent or by servers updating the client allocations.
2020-04-13 10:50:07 -04:00
Mahmood Ali 14d6fec05a tests: deflake some SetServer related tests
Some tests assert on numbers on numbers of servers, e.g.
TestHTTP_AgentSetServers and TestHTTP_AgentListServers_ACL . Though, in dev and
test modes, the agent starts with servers having duplicate entries for
advertised and normalized RPC values, then settles with one unique value after
Raft/Serf re-sets servers with one single unique value.

This leads to flakiness, as the test will fail if assertion runs before Serf
update takes effect.

Here, we update the inital dev handling so it only adds a unique value if the
advertised and normalized values are the same.

Sample log lines illustrating the problem:

```
=== CONT  TestHTTP_AgentSetServers
    TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:9008 Address:127.0.0.1:9008}]"
    TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO]  nomad: serf: EventMemberJoin: TestHTTP_AgentSetServers.global 127.0.0.1
    TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.035Z [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:9008, 127.0.0.1:9008] old_servers=[]
...
    TestHTTP_AgentSetServers: agent_endpoint_test.go:759:
                Error Trace:    agent_endpoint_test.go:759
                                                        http_test.go:1089
                                                        agent_endpoint_test.go:705
                Error:          "[127.0.0.1:9008 127.0.0.1:9008]" should have 1 item(s), but has 2
                Test:           TestHTTP_AgentSetServers
```
2020-04-07 09:27:48 -04:00
Mahmood Ali ed4c4d13a4 fixup! backend: support WS authentication handshake in alloc/exec 2020-04-03 14:20:31 -04:00
Mahmood Ali e63e096136 backend: support WS authentication handshake in alloc/exec
The javascript Websocket API doesn't support setting custom headers
(e.g. `X-Nomad-Token`).  This change adds support for having an
authentication handshake message: clients can set `ws_handshake` URL
query parameter to true and send a single handshake message with auth
token first before any other mssage.

This is a backward compatible change: it does not affect nomad CLI path, as it
doesn't set `ws_handshake` parameter.
2020-04-03 11:18:54 -04:00
Mahmood Ali 990cfb6fef agent config parsing tests for scheduler config 2020-04-03 07:54:32 -04:00
Chris Baker 277d29c6e7
Merge pull request #7572 from hashicorp/f-7422-scaling-events
finalizing scaling API work
2020-04-01 13:49:22 -05:00
Seth Hoenig 9aa9721143 connect: fix bug where absent connect.proxy stanza needs default config
In some refactoring, a bug was introduced where if the connect.proxy
stanza in a submitted job was nil, the default proxy configuration
would not be initialized with default values, effectively breaking
Connect.

      connect {
        sidecar_service {} # should work
      }

In contrast, by setting an empty proxy stanza, the config values would
be inserted correctly.

      connect {
        sidecar_service {
	  proxy {} # workaround
	}
      }

This commit restores the original behavior, where having a proxy
stanza present is not required.

The unit test for this case has also been corrected.
2020-04-01 11:19:32 -06:00
Chris Baker 40d6b3bbd1 adding raft and state_store support to track job scaling events
updated ScalingEvent API to record "message string,error bool" instead
of confusing "reason,error *string"
2020-04-01 16:15:14 +00:00
Seth Hoenig 14c7cebdea connect: enable automatic expose paths for individual group service checks
Part of #6120

Building on the support for enabling connect proxy paths in #7323, this change
adds the ability to configure the 'service.check.expose' flag on group-level
service check definitions for services that are connect-enabled. This is a slight
deviation from the "magic" that Consul provides. With Consul, the 'expose' flag
exists on the connect.proxy stanza, which will then auto-generate expose paths
for every HTTP and gRPC service check associated with that connect-enabled
service.

A first attempt at providing similar magic for Nomad's Consul Connect integration
followed that pattern exactly, as seen in #7396. However, on reviewing the PR
we realized having the `expose` flag on the proxy stanza inseperably ties together
the automatic path generation with every HTTP/gRPC defined on the service. This
makes sense in Consul's context, because a service definition is reasonably
associated with a single "task". With Nomad's group level service definitions
however, there is a reasonable expectation that a service definition is more
abstractly representative of multiple services within the task group. In this
case, one would want to define checks of that service which concretely make HTTP
or gRPC requests to different underlying tasks. Such a model is not possible
with the course `proxy.expose` flag.

Instead, we now have the flag made available within the check definitions themselves.
By making the expose feature resolute to each check, it is possible to have
some HTTP/gRPC checks which make use of the envoy exposed paths, as well as
some HTTP/gRPC checks which make use of some orthongonal port-mapping to do
checks on some other task (or even some other bound port of the same task)
within the task group.

Given this example,

group "server-group" {
  network {
    mode = "bridge"
    port "forchecks" {
      to = -1
    }
  }

  service {
    name = "myserver"
    port = 2000

    connect {
      sidecar_service {
      }
    }

    check {
      name     = "mycheck-myserver"
      type     = "http"
      port     = "forchecks"
      interval = "3s"
      timeout  = "2s"
      method   = "GET"
      path     = "/classic/responder/health"
      expose   = true
    }
  }
}

Nomad will automatically inject (via job endpoint mutator) the
extrapolated expose path configuration, i.e.

expose {
  path {
    path            = "/classic/responder/health"
    protocol        = "http"
    local_path_port = 2000
    listener_port   = "forchecks"
  }
}

Documentation is coming in #7440 (needs updating, doing next)

Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6
which will make the examples in the documentation actually runnable.

Will add some e2e tests based on the above when it becomes available.
2020-03-31 17:15:50 -06:00
Seth Hoenig 41244c5857 jobspec: parse multi expose.path instead of explicit slice 2020-03-31 17:15:27 -06:00
Seth Hoenig 0266f056b8 connect: enable proxy.passthrough configuration
Enable configuration of HTTP and gRPC endpoints which should be exposed by
the Connect sidecar proxy. This changeset is the first "non-magical" pass
that lays the groundwork for enabling Consul service checks for tasks
running in a network namespace because they are Connect-enabled. The changes
here provide for full configuration of the

  connect {
    sidecar_service {
      proxy {
        expose {
          paths = [{
		path = <exposed endpoint>
                protocol = <http or grpc>
                local_path_port = <local endpoint port>
                listener_port = <inbound mesh port>
	  }, ... ]
       }
    }
  }

stanza. Everything from `expose` and below is new, and partially implements
the precedent set by Consul:
  https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference

Combined with a task-group level network port-mapping in the form:

  port "exposeExample" { to = -1 }

it is now possible to "punch a hole" through the network namespace
to a specific HTTP or gRPC path, with the anticipated use case of creating
Consul checks on Connect enabled services.

A future PR may introduce more automagic behavior, where we can do things like

1) auto-fill the 'expose.path.local_path_port' with the default value of the
   'service.port' value for task-group level connect-enabled services.

2) automatically generate a port-mapping

3) enable an 'expose.checks' flag which automatically creates exposed endpoints
   for every compatible consul service check (http/grpc checks on connect
   enabled services).
2020-03-31 17:15:27 -06:00
Seth Hoenig 1ce4eb17fa client: use consistent name for struct receiver parameter
This helps reduce the number of squiggly lines in Goland.
2020-03-31 17:15:27 -06:00
Yoan Blanc 225c9c1215 fixup! vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:48:07 -04:00
Yoan Blanc 761d014071 vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:45:21 -04:00
Seth Hoenig b3664c628c
Merge pull request #7524 from hashicorp/docs-consul-acl-minimums
consul: annotate Consul interfaces with ACLs
2020-03-30 13:27:27 -06:00
Mahmood Ali 7df337e4c4
Merge pull request #7534 from hashicorp/b-windows-dev-network
windows: support -dev mode
2020-03-30 14:35:28 -04:00
Seth Hoenig 0a812ab689 consul: annotate Consul interfaces with ACLs 2020-03-30 10:17:28 -06:00
Drew Bailey a98dc8c768
update audit examples to an endpoint that is audited 2020-03-30 10:03:11 -04:00
Mahmood Ali dedf1cd3d7 tests: remove TestHTTP_NodeDrain_Compat
Nomad 0.11 servers no longer support having pre-0.8 clients.
2020-03-30 07:06:52 -04:00
Mahmood Ali 8b2b3f99d3 tests: deflake TestHTTP_NodeDrain
A node may be recognized as not running any allocs and have its drain
flag reset before the test queries it.
2020-03-30 07:06:52 -04:00
Mahmood Ali b0cc23ae63 tests: deflake TestConsul_PeriodicSync 2020-03-30 07:06:47 -04:00
Mahmood Ali ec6afa5795 windows: support -dev mode
Support running `nomad agent -dev` in Windows, by setting proper network
interface.

Prior to this change, `nomad` uses `lo` interface but Windows uses
"Loopback Pseudo-Interface 1" to refer to loopback device interface:
https://github.com/golang/go/blob/go1.14.1/src/net/net_windows_test.go#L304-L318
.
2020-03-28 12:01:51 -04:00
Drew Bailey a66b4be0f3
remove auditing for /ui/ 2020-03-27 10:12:42 -04:00
Drew Bailey de687edb2e
wrap http.Handlers
better comments
2020-03-27 09:35:10 -04:00
Drew Bailey b96a4da6fc
sync changes made to oss files from ent 2020-03-25 10:57:44 -04:00
Drew Bailey 218bfff6dd
add in change missed from ent 2020-03-25 10:53:38 -04:00
Drew Bailey 97cc19276d
add auditor 2020-03-25 10:48:23 -04:00
Drew Bailey 7329a88758
allow all build contexts to use noOpAuditor 2020-03-25 10:38:40 -04:00
Mahmood Ali 1c1186b344
Merge pull request #7487 from hashicorp/b-xss-oss
agent: prevent XSS by controlling Content-Type
2020-03-25 09:56:11 -04:00
Michael Schurter 29622013fa remove double negative from comment
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-03-25 09:45:43 -04:00
Michael Schurter 1a27b8a07d test: assert monitor endpoint sets proper headers 2020-03-25 09:45:43 -04:00
Michael Schurter d6d44a8214 test: assert fs endpoints are xss safe 2020-03-25 09:45:43 -04:00
Michael Schurter 5ff458e840 agent: prevent XSS by controlling Content-Type 2020-03-25 09:45:43 -04:00
Mahmood Ali c7cf60c837 tests: test agent to use a noop auditor 2020-03-25 08:45:44 -04:00
Mahmood Ali ceed57b48f per-task restart policy 2020-03-24 17:00:41 -04:00
Lang Martin 8bd0405f33
csi: return an empty result list from plugins & volumes without type, not an error (#7471) 2020-03-24 14:28:28 -04:00
Chris Baker bc13bfb433 bad conversion between api.ScalingPolicy and structs.ScalingPolicy meant
that we were throwing away .Min if provided
2020-03-24 14:39:06 +00:00
Chris Baker f6ec5f9624 made count optional during job scaling actions
added ACL protection in Job.Scale
in Job.Scale, only perform a Job.Register if the Count was non-nil
2020-03-24 14:39:05 +00:00
Chris Baker 233db5258a changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults.
- tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent)
- Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max

modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and
2020-03-24 13:57:17 +00:00
Chris Baker 00092a6c29 fixed http endpoints for job.register and job.scalestatus 2020-03-24 13:57:16 +00:00
Chris Baker 925b59e1d2 wip: scaling status return, almost done 2020-03-24 13:57:15 +00:00
Chris Baker 42270d862c wip: some tests still failing
updating job scaling endpoints to match RFC, cleaning up the API object as well
2020-03-24 13:57:14 +00:00
Chris Baker abc7a52f56 finished refactoring state store, schema, etc 2020-03-24 13:57:14 +00:00
Chris Baker 3d54f1feba wip: added Enabled to ScalingPolicyListStub, removed JobID from body of scaling request 2020-03-24 13:57:12 +00:00
Chris Baker 024d203267 wip: added tests for client methods around group scaling 2020-03-24 13:57:11 +00:00
Chris Baker 1c5c2eb71b wip: add GET endpoint for job group scaling target 2020-03-24 13:57:10 +00:00
Chris Baker 179ab68258 wip: added job.scale rpc endpoint, needs explicit test (tested via http now) 2020-03-24 13:57:09 +00:00
Chris Baker 8453e667c2 wip: working on job group scaling endpoint 2020-03-24 13:55:20 +00:00
Chris Baker 6665d0bfb0 wip: added policy get endpoint, added UUID to policy 2020-03-24 13:55:20 +00:00
Chris Baker 9c2560ceeb wip: upsert/delete scaling policies on job upsert/delete 2020-03-24 13:55:18 +00:00
Chris Baker 65d92f1fbf WIP: adding ScalingPolicy to api/structs and state store 2020-03-24 13:55:18 +00:00
Drew Bailey 10f3b6899b
rename struct field to auditor 2020-03-23 20:09:01 -04:00
Drew Bailey cf5fcf3748
make auditor interface more explicit 2020-03-23 19:32:58 -04:00
Drew Bailey d0d32d8f06
fix compilation with correct func 2020-03-23 14:32:11 -04:00
Tim Gross 076fbbf08f
Merge pull request #7012 from hashicorp/f-csi-volumes
Container Storage Interface Support
2020-03-23 14:19:46 -04:00
Lang Martin e100444740 csi: add mount_options to volumes and volume requests (#7398)
Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to the NodePublishVolume, which causes the node plugin to actually mount the volume on the host.

Individual tasks just mount bind into the host mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context.

closes #7007

* nomad/structs/volumes: add MountOptions to volume request

* jobspec/test-fixtures/basic.hcl: add mount_options to volume block

* jobspec/parse_test: add expected MountOptions

* api/tasks: add mount_options

* jobspec/parse_group: use hcl decode not mapstructure, mount_options

* client/allocrunner/csi_hook: pass MountOptions through

client/allocrunner/csi_hook: add a VolumeMountOptions

client/allocrunner/csi_hook: drop Options

client/allocrunner/csi_hook: use the structs options

* client/pluginmanager/csimanager/interface: UsageOptions.MountOptions

* client/pluginmanager/csimanager/volume: pass MountOptions in capabilities

* plugins/csi/plugin: remove todo 7007 comment

* nomad/structs/csi: MountOptions

* api/csi: add options to the api for parsing, match structs

* plugins/csi/plugin: move VolumeMountOptions to structs

* api/csi: use specific type for mount_options

* client/allocrunner/csi_hook: merge MountOptions here

* rename CSIOptions to CSIMountOptions

* client/allocrunner/csi_hook

* client/pluginmanager/csimanager/volume

* nomad/structs/csi

* plugins/csi/fake/client: add PrevVolumeCapability

* plugins/csi/plugin

* client/pluginmanager/csimanager/volume_test: remove debugging

* client/pluginmanager/csimanager/volume: fix odd merging logic

* api: rename CSIOptions -> CSIMountOptions

* nomad/csi_endpoint: remove a 7007 comment

* command/alloc_status: show mount options in the volume list

* nomad/structs/csi: include MountOptions in the volume stub

* api/csi: add MountOptions to stub

* command/volume_status_csi: clean up csiVolMountOption, add it

* command/alloc_status: csiVolMountOption lives in volume_csi_status

* command/node_status: display mount flags

* nomad/structs/volumes: npe

* plugins/csi/plugin: npe in ToCSIRepresentation

* jobspec/parse_test: expand volume parse test cases

* command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions

* command/volume_status_csi: copy paste error

* jobspec/test-fixtures/basic: hclfmt

* command/volume_status_csi: clean up csiVolMountOption
2020-03-23 13:59:25 -04:00
Lang Martin 99841222ed csi: change the API paths to match CLI command layout (#7325)
* command/agent/csi_endpoint: support type filter in volumes & plugins

* command/agent/http: use /v1/volume/csi & /v1/plugin/csi

* api/csi: use /v1/volume/csi & /v1/plugin/csi

* api/nodes: use /v1/volume/csi & /v1/plugin/csi

* api/nodes: not /volumes/csi, just /volumes

* command/agent/csi_endpoint: fix ot parameter parsing
2020-03-23 13:58:30 -04:00
Lang Martin 80619137ab csi: volumes listed in nomad node status (#7318)
* api/allocations: GetTaskGroup finds the taskgroup struct

* command/node_status: display CSI volume names

* nomad/state/state_store: new CSIVolumesByNodeID

* nomad/state/iterator: new SliceIterator type implements memdb.ResultIterator

* nomad/csi_endpoint: deal with a slice of volumes

* nomad/state/state_store: CSIVolumesByNodeID return a SliceIterator

* nomad/structs/csi: CSIVolumeListRequest takes a NodeID

* nomad/csi_endpoint: use the return iterator

* command/agent/csi_endpoint: parse query params for CSIVolumes.List

* api/nodes: new CSIVolumes to list volumes by node

* command/node_status: use the new list endpoint to print volumes

* nomad/state/state_store: error messages consider the operator

* command/node_status: include the Provider
2020-03-23 13:58:30 -04:00
Lang Martin 887e1f28c9 csi: CLI for volume status, registration/deregistration and plugin status (#7193)
* command/csi: csi, csi_plugin, csi_volume

* helper/funcs: move ExtraKeys from parse_config to UnusedKeys

* command/agent/config_parse: use helper.UnusedKeys

* api/csi: annotate CSIVolumes with hcl fields

* command/csi_plugin: add Synopsis

* command/csi_volume_register: use hcl.Decode style parsing

* command/csi_volume_list

* command/csi_volume_status: list format, cleanup

* command/csi_plugin_list

* command/csi_plugin_status

* command/csi_volume_deregister

* command/csi_volume: add Synopsis

* api/contexts/contexts: add csi search contexts to the constants

* command/commands: register csi commands

* api/csi: fix struct tag for linter

* command/csi_plugin_list: unused struct vars

* command/csi_plugin_status: unused struct vars

* command/csi_volume_list: unused struct vars

* api/csi: add allocs to CSIPlugin

* command/csi_plugin_status: format the allocs

* api/allocations: copy Allocation.Stub in from structs

* nomad/client_rpc: add some error context with Errorf

* api/csi: collapse read & write alloc maps to a stub list

* command/csi_volume_status: cleanup allocation display

* command/csi_volume_list: use Schedulable instead of Healthy

* command/csi_volume_status: use Schedulable instead of Healthy

* command/csi_volume_list: sprintf string

* command/csi: delete csi.go, csi_plugin.go

* command/plugin: refactor csi components to sub-command plugin status

* command/plugin: remove csi

* command/plugin_status: remove csi

* command/volume: remove csi

* command/volume_status: split out csi specific

* helper/funcs: add RemoveEqualFold

* command/agent/config_parse: use helper.RemoveEqualFold

* api/csi: do ,unusedKeys right

* command/volume: refactor csi components to `nomad volume`

* command/volume_register: split out csi specific

* command/commands: use the new top level commands

* command/volume_deregister: hardwired type csi for now

* command/volume_status: csiFormatVolumes rescued from volume_list

* command/plugin_status: avoid a panic on no args

* command/volume_status: avoid a panic on no args

* command/plugin_status: predictVolumeType

* command/volume_status: predictVolumeType

* nomad/csi_endpoint_test: move CreateTestPlugin to testing

* command/plugin_status_test: use CreateTestCSIPlugin

* nomad/structs/structs: add CSIPlugins and CSIVolumes search consts

* nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix

* nomad/search_endpoint: add CSIPlugins and CSIVolumes

* command/plugin_status: move the header to the csi specific

* command/volume_status: move the header to the csi specific

* nomad/state/state_store: CSIPluginByID prefix

* command/status: rename the search context to just Plugins/Volumes

* command/plugin,volume_status: test return ids now

* command/status: rename the search context to just Plugins/Volumes

* command/plugin_status: support -json and -t

* command/volume_status: support -json and -t

* command/plugin_status_csi: comments

* command/*_status: clean up text

* api/csi: fix stale comments

* command/volume: make deregister sound less fearsome

* command/plugin_status: set the id length

* command/plugin_status_csi: more compact plugin health

* command/volume: better error message, comment
2020-03-23 13:58:30 -04:00
Danielle Lancashire 15c6c05ccf api: Parse CSI Volumes
Previously when deserializing volumes we skipped over volumes that were
not of type `host`. This commit ensures that we parse both host and csi
volumes correctly.
2020-03-23 13:58:30 -04:00
Lang Martin 88316208a0 csi: server-side plugin state tracking and api (#6966)
* structs: CSIPlugin indexes jobs acting as plugins and node updates

* schema: csi_plugins table for CSIPlugin

* nomad: csi_endpoint use vol.Denormalize, plugin requests

* nomad: csi_volume_endpoint: rename to csi_endpoint

* agent: add CSI plugin endpoints

* state_store_test: use generated ids to avoid t.Parallel conflicts

* contributing: add note about registering new RPC structs

* command: agent http register plugin lists

* api: CSI plugin queries, ControllerHealthy -> ControllersHealthy

* state_store: copy on write for volumes and plugins

* structs: copy on write for volumes and plugins

* state_store: CSIVolumeByID returns an unhealthy volume, denormalize

* nomad: csi_endpoint use CSIVolumeDenormalizePlugins

* structs: remove struct errors for missing objects

* nomad: csi_endpoint return nil for missing objects, not errors

* api: return meta from Register to avoid EOF error

* state_store: CSIVolumeDenormalize keep allocs in their own maps

* state_store: CSIVolumeDeregister error on missing volume

* state_store: CSIVolumeRegister set indexes

* nomad: csi_endpoint use CSIVolumeDenormalizePlugins tests
2020-03-23 13:58:29 -04:00
Lang Martin 2f646fa5e9 agent: csi endpoint 2020-03-23 13:58:29 -04:00
Danielle Lancashire 426c26d7c0 CSI Plugin Registration (#6555)
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:

* A `csi_plugin` stanza as part of a Nomad task configuration, to
  allow a task to expose that it is a plugin.

* A new task runner hook: `csi_plugin_supervisor`. This hook does two
  things. When the `csi_plugin` stanza is detected, it will
  automatically configure the plugin task to receive bidirectional
  mounts to the CSI intermediary directory. At runtime, it will then
  perform an initial heartbeat of the plugin and handle submitting it to
  the new `dynamicplugins.Registry` for further use by the client, and
  then run a lightweight heartbeat loop that will emit task events
  when health changes.

* The `dynamicplugins.Registry` for handling plugins that run
  as Nomad tasks, in contrast to the existing catalog that requires
  `go-plugin` type plugins and to know the plugin configuration in
  advance.

* The `csimanager` which fingerprints CSI plugins, in a similar way to
  `drivermanager` and `devicemanager`. It currently only fingerprints
  the NodeID from the plugin, and assumes that all plugins are
  monolithic.

Missing features

* We do not use the live updates of the `dynamicplugin` registry in
  the `csimanager` yet.

* We do not deregister the plugins from the client when they shutdown
  yet, they just become indefinitely marked as unhealthy. This is
  deliberate until we figure out how we should manage deploying new
  versions of plugins/transitioning them.
2020-03-23 13:58:28 -04:00
Drew Bailey b09abef332
Audit config, seams for enterprise audit features
allow oss to parse sink duration

clean up audit sink parsing

ent eventer config reload

fix typo

SetEnabled to eventer interface

client acl test

rm dead code

fix failing test
2020-03-23 13:47:42 -04:00
Jasmine Dahilig 73a64e4397 change jobspec lifecycle stanza to use sidecar attribute instead of
block_until status
2020-03-21 17:52:57 -04:00
Jasmine Dahilig 1485b342e2 remove deadline code for now 2020-03-21 17:52:56 -04:00
Jasmine Dahilig 7b3f3497ed mock task hook coordinator in consul integration test 2020-03-21 17:52:55 -04:00
Jasmine Dahilig fc13fa9739 change TaskLifecycle RunLevel to Hook and add Deadline time duration 2020-03-21 17:52:37 -04:00
Mahmood Ali 4ebeac721a update structs with lifecycle 2020-03-21 17:52:36 -04:00
James Rasell e3d14cc634
cli: fix indentation issue with -dev-connect agent help output. 2020-03-18 12:25:20 +01:00
Michael Schurter b72b3e765c
Merge pull request #7170 from fredrikhgrelland/consul_template_upgrade
Update consul-template to v0.24.1 and remove deprecated vault grace
2020-03-10 14:15:47 -07:00
Mahmood Ali 19f25f588f
Merge pull request #7252 from hashicorp/b-test-cluster-forming
Simplify Bootstrap logic in tests
2020-03-03 16:56:08 -05:00
Mahmood Ali acbfeb5815 Simplify Bootstrap logic in tests
This change updates tests to honor `BootstrapExpect` exclusively when
forming test clusters and removes test only knobs, e.g.
`config.DevDisableBootstrap`.

Background:

Test cluster creation is fragile.  Test servers don't follow the
BootstapExpected route like production clusters.  Instead they start as
single node clusters and then get rejoin and may risk causing brain
split or other test flakiness.

The test framework expose few knobs to control those (e.g.
`config.DevDisableBootstrap` and `config.Bootstrap`) that control
whether a server should bootstrap the cluster.  These flags are
confusing and it's unclear when to use: their usage in multi-node
cluster isn't properly documented.  Furthermore, they have some bad
side-effects as they don't control Raft library: If
`config.DevDisableBootstrap` is true, the test server may not
immediately attempt to bootstrap a cluster, but after an election
timeout (~50ms), Raft may force a leadership election and win it (with
only one vote) and cause a split brain.

The knobs are also confusing as Bootstrap is an overloaded term.  In
BootstrapExpect, we refer to bootstrapping the cluster only after N
servers are connected.  But in tests and the knobs above, it refers to
whether the server is a single node cluster and shouldn't wait for any
other server.

Changes:

This commit makes two changes:

First, it relies on `BootstrapExpected` instead of `Bootstrap` and/or
`DevMode` flags.  This change is relatively trivial.

Introduce a `Bootstrapped` flag to track if the cluster is bootstrapped.
This allows us to keep `BootstrapExpected` immutable.  Previously, the
flag was a config value but it gets set to 0 after cluster bootstrap
completes.
2020-03-02 13:47:43 -05:00
Mahmood Ali 386f20099b Honor CNI and bridge related fields
Nomad agent may silently ignore cni_path and bridge setting, when it
merges configs from multiple files (or against default/dev config).

This PR ensures that the values are merged properly.
2020-02-28 14:23:13 -05:00
Mahmood Ali 437d03779c tests: add tests for parsing cni fields 2020-02-28 14:18:45 -05:00
Fredrik Hoem Grelland edb3bd0f3f Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170) 2020-02-23 16:24:53 +01:00
Seth Hoenig 0f99cdd0d9
Merge pull request #7192 from hashicorp/b-connect-stanza-ignore
consul/connect: in-place update sidecar service registrations on changes
2020-02-21 09:24:53 -06:00
Seth Hoenig 54b5173eca consul/connect: in-place update sidecar service registrations on changes
Fix a bug where consul service definitions would not be updated if changes
were made to the service in the Nomad job. Currently this only fixes the
bug for cases where the fix is a matter of updating consul agent's service
registration. There is related bug where destructive changes are required
(see #6877) which will be fixed in another PR.

The enable_tag_override configuration setting for the parent service is
applied to the sidecar service.

Fixes #6459
2020-02-19 13:07:04 -06:00
Mahmood Ali f4d8e1296f
Merge pull request #7171 from hashicorp/update-autopilot-20200214
Update consul vendor and add MinQuorum flag
2020-02-19 10:45:20 -06:00
Drew Bailey 3c0719274c
inlude pro in http_oss.go 2020-02-18 10:29:28 -05:00
Mahmood Ali 98ad59b1de update rest of consul packages 2020-02-16 16:25:04 -06:00
Mahmood Ali f492ab6d9e implement MinQuorum 2020-02-16 16:04:59 -06:00
Mahmood Ali fd51982018 tests: Avoid StartAsLeader raft config flag
It's being deprecated
2020-02-13 18:56:53 -05:00
Seth Hoenig 543354aabe
Merge pull request #7106 from hashicorp/f-ctag-override
client: enable configuring enable_tag_override for services
2020-02-13 12:34:48 -06:00
Seth Hoenig 0e44094d1a client: enable configuring enable_tag_override for services
Consul provides a feature of Service Definitions where the tags
associated with a service can be modified through the Catalog API,
overriding the value(s) configured in the agent's service configuration.

To enable this feature, the flag enable_tag_override must be configured
in the service definition.

Previously, Nomad did not allow configuring this flag, and thus the default
value of false was used. Now, it is configurable.

Because Nomad itself acts as a state machine around the the service definitions
of the tasks it manages, it's worth describing what happens when this feature
is enabled and why.

Consider the basic case where there is no Nomad, and your service is provided
to consul as a boring JSON file. The ultimate source of truth for the definition
of that service is the file, and is stored in the agent. Later, Consul performs
"anti-entropy" which synchronizes the Catalog (stored only the leaders). Then
with enable_tag_override=true, the tags field is available for "external"
modification through the Catalog API (rather than directly configuring the
service definition file, or using the Agent API). The important observation
is that if the service definition ever changes (i.e. the file is changed &
config reloaded OR the Agent API is used to modify the service), those
"external" tag values are thrown away, and the new service definition is
once again the source of truth.

In the Nomad case, Nomad itself is the source of truth over the Agent in
the same way the JSON file was the source of truth in the example above.
That means any time Nomad sets a new service definition, any externally
configured tags are going to be replaced. When does this happen? Only on
major lifecycle events, for example when a task is modified because of an
updated job spec from the 'nomad job run <existing>' command. Otherwise,
Nomad's periodic re-sync's with Consul will now no longer try to restore
the externally modified tag values (as long as enable_tag_override=true).

Fixes #2057
2020-02-10 08:00:55 -06:00
Michael Schurter 65d38d9255 test: fix flaky TestHTTP_FreshClientAllocMetrics 2020-02-07 15:50:53 -08:00
Michael Schurter 9d3093fa31 test: fix missing agent shutdowns 2020-02-07 15:50:53 -08:00
Michael Schurter d96ceee8c5 testagent: fix case where agent would retry forever 2020-02-07 15:50:53 -08:00
Michael Schurter e903501e65 test: improve error messages when failing 2020-02-07 15:50:53 -08:00
Michael Schurter 63032917fc test: allow goroutine to exit even if test blocks 2020-02-07 15:50:53 -08:00
Michael Schurter 9905dec6a3 test: workaround limits race 2020-02-07 15:50:53 -08:00
Michael Schurter 19a1932bbb test: wait longer than timeout
The 1s timeout raced with the 1s deadline it was trying to detect.
2020-02-07 15:50:53 -08:00
Michael Schurter fd81208db7 test: fix flaky health test
Test set Agent.client=nil which prevented the client from being
shutdown. This leaked goroutines and could cause panics due to the
leaked client goroutines logging after their parent test had finished.

Removed ACLs from the server test because I couldn't get it to work with
the test agent, and it tested very little.
2020-02-07 15:50:53 -08:00
Michael Schurter 2896f78f77 client: fix race accessing Node.status
* Call Node.Canonicalize once when Node is created.
 * Lock when accessing fields mutated by node update goroutine
2020-02-07 15:50:47 -08:00
Drew Bailey d830998572
agent Profile req nil check s.agent.Server()
clean up logic and tests
2020-02-03 13:20:05 -05:00
Drew Bailey c4f45f9bde
Fix panic when monitoring a local client node
Fixes a panic when accessing a.agent.Server() when agent is a client
instead. This pr removes a redundant ACL check since ACLs are validated
at the RPC layer. It also nil checks the agent server and uses Client()
when appropriate.
2020-02-03 13:20:04 -05:00
Seth Hoenig 78a7d1e426 comments: cleanup some leftover debug comments and such 2020-01-31 19:04:35 -06:00
Seth Hoenig 076cb4754e agent: re-enable the server in dev mode 2020-01-31 19:04:19 -06:00
Seth Hoenig 8219c78667 nomad: handle SI token revocations concurrently
Be able to revoke SI token accessors concurrently, and also
ratelimit the requests being made to Consul for the various
ACL API uses.
2020-01-31 19:04:14 -06:00
Seth Hoenig 2c7ac9a80d nomad: fixup token policy validation 2020-01-31 19:04:08 -06:00
Seth Hoenig 9df33f622f nomad: proxy requests for Service Identity tokens between Clients and Consul
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
2020-01-31 19:03:53 -06:00
Seth Hoenig f030a22c7c command, docs: create and document consul token configuration for connect acls (gh-6716)
This change provides an initial pass at setting up the configuration necessary to
enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul
Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert`
commands (similar to Vault tokens).

These values are not actually used yet in this changeset.
2020-01-31 19:02:53 -06:00
Michael Schurter c82b14b0c4 core: add limits to unauthorized connections
Introduce limits to prevent unauthorized users from exhausting all
ephemeral ports on agents:

 * `{https,rpc}_handshake_timeout`
 * `{http,rpc}_max_conns_per_client`

The handshake timeout closes connections that have not completed the TLS
handshake by the deadline (5s by default). For RPC connections this
timeout also separately applies to first byte being read so RPC
connections with TLS enabled have `rpc_handshake_time * 2` as their
deadline.

The connection limit per client prevents a single remote TCP peer from
exhausting all ephemeral ports. The default is 100, but can be lowered
to a minimum of 26. Since streaming RPC connections create a new TCP
connection (until MultiplexV2 is used), 20 connections are reserved for
Raft and non-streaming RPCs to prevent connection exhaustion due to
streaming RPCs.

All limits are configurable and may be disabled by setting them to `0`.

This also includes a fix that closes connections that attempt to create
TLS RPC connections recursively. While only users with valid mTLS
certificates could perform such an operation, it was added as a
safeguard to prevent programming errors before they could cause resource
exhaustion.
2020-01-30 10:38:25 -08:00
Mahmood Ali 90cae566e5
Merge pull request #6935 from hashicorp/b-default-preemption-flag
scheduler: allow configuring default preemption for system scheduler
2020-01-28 15:11:06 -05:00
Mahmood Ali af17b4afc7 Support customizing full scheduler config 2020-01-28 14:51:42 -05:00
Nick Ethier 5636203d4e consul: fix var name from rebase 2020-01-27 14:00:19 -05:00
Nick Ethier 0ae99b3c9c consul: fix var name from rebase 2020-01-27 12:55:52 -05:00
Nick Ethier 5cbb94e16e consul: add support for canary meta 2020-01-27 09:53:30 -05:00
Mahmood Ali 1ab682f622 scheduler: allow configuring default preemption for system scheduler
Some operators want a greater control over when preemption is enabled,
especially during an upgrade to limit potential side-effects.
2020-01-13 08:30:49 -05:00
Drew Bailey f97d2e96c1
refactor api profile methods
comment why we ignore errors parsing params
2020-01-09 15:15:12 -05:00
Drew Bailey b702dede49
adds qc param, address pr feedback 2020-01-09 15:15:11 -05:00
Drew Bailey 085659f6ff
condense table test 2020-01-09 15:15:10 -05:00
Drew Bailey 45210ed901
Rename profile package to pprof
Address pr feedback, rename profile package to pprof to more accurately
describe its purpose. Adds gc param for heap lookup profiles.
2020-01-09 15:15:10 -05:00
Drew Bailey 1b8af920f3
address pr feedback 2020-01-09 15:15:09 -05:00
Drew Bailey 4ced73875b
leave acl checking to rpc endpoints
fix test expectation

test wrapNonJSON
2020-01-09 15:15:08 -05:00
Drew Bailey 279512c7f8
provide helpful error, cleanup logic 2020-01-09 15:15:08 -05:00
Drew Bailey 7bbba613a5
prevent doubly wrapping with rpc error 2020-01-09 15:15:07 -05:00
Drew Bailey fd42020ad6
RPC server EnableDebug option
Passes in agent enable_debug config to nomad server and client configs.
This allows for rpc endpoints to have more granular control if they
should be enabled or not in combination with ACLs.

enable debug on client test
2020-01-09 15:15:07 -05:00
Drew Bailey 9a80938fb1
region forwarding; prevent recursive forwards for impossible requests
prevent region forwarding loop, backfill tests

fix failing test
2020-01-09 15:15:06 -05:00
Drew Bailey 46121fe3fd
move shared structs out of client and into nomad 2020-01-09 15:15:05 -05:00
Drew Bailey 3672414888
test pprof headers and profile methods
tidy up, add comments

clean up seconds param assignment
2020-01-09 15:15:04 -05:00
Drew Bailey fc37448683
warn when enabled debug is on when registering
m -> a receiver name

return codederrors, fix query
2020-01-09 15:15:04 -05:00
Drew Bailey 62eb2d76a6
acl and debug test table
rename implementation method
2020-01-09 15:15:03 -05:00
Drew Bailey 50288461c9
Server request forwarding for Agent.Profile
Return rpc errors for profile requests, set up remote forwarding to
target leader or server id for profile requests.

server forwarding, endpoint tests
2020-01-09 15:15:03 -05:00
Drew Bailey 901f362858
test for known pprof endpoints 2020-01-09 15:15:02 -05:00
Drew Bailey 49ad5fbc85
agent pprof endpoints
wip, agent endpoint and client endpoint for pprof profiles

agent endpoint test
2020-01-09 15:15:02 -05:00
Drew Bailey d9e41d2880
docs for shutdown delay
update docs, address pr comments

ensure pointer is not nil

use pointer for diff tests, set vs unset
2019-12-16 11:38:35 -05:00
Drew Bailey 24929776a2
shutdown delay for task groups
copy struct values

ensure groupserviceHook implements RunnerPreKillhook

run deregister first

test that shutdown times are delayed

move magic number into variable
2019-12-16 11:38:16 -05:00
Seth Hoenig f0c3dca49c tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Michael Schurter 3008473f9b
Merge branch 'master' into release-0102 2019-12-04 14:13:34 -08:00
Mahmood Ali 7b8cfee162 tests: deflake TestHTTP_FreshClientAllocMetrics
The test asserts that alloc counts get reported accurately in metrics by
inspecting the metrics endpoint directly.  Sadly, the metrics as
collected by `armon/go-metrics` seem to be stateful and may contain info
from other tests.

This means that the test can fail depending on the order of returned
metrics.

Inspecting the metrics output of one failing run, you can see the
duplicate guage entries but for different node_ids:

```
    {
      "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
      "Value": 10,
      "Labels": {
        "datacenter": "dc1",
        "node_class": "none",
        "node_id": "67402bf4-00f3-bd8d-9fa8-f4d1924a892a"
      }
    },
    {
      "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
      "Value": 0,
      "Labels": {
        "datacenter": "dc1",
        "node_class": "none",
        "node_id": "a2945b48-7e66-68e2-c922-49b20dd4e20c"
      }
    },
```
2019-11-22 18:41:21 -05:00
Nomad Release bot db6420367d Generate files for 0.10.2-rc1 release 2019-11-22 18:42:49 +00:00
Mahmood Ali be6c60455e
Merge pull request #6669 from hashicorp/b-cors-allow-credentials
Allow UI to query client directly for task logs/state
2019-11-20 15:14:01 -05:00
Preetha 42c1c85285
Merge pull request #6421 from hashicorp/b-acl-bootstrap-codes
api: acl bootstrap errors aren't 500
2019-11-20 10:36:08 -06:00
Preetha be4a51d5b8
Merge pull request #6349 from hashicorp/b-host-stats
client: Return empty values when host stats fail
2019-11-20 10:13:02 -06:00
Mahmood Ali 6f8bb5e90b api: acl bootstrap errors aren't 500
Noticed that ACL endpoints return 500 status code for user errors.  This
is confusing and can lead to false monitoring alerts.

Here, I introduce a concept of RPCCoded errors to be returned by RPC
that signal a code in addition to error message.  Codes for now match
HTTP codes to ease reasoning.

```
$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9))

$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9))
```
2019-11-19 15:51:57 -05:00
Nick Ethier bd454a4c6f
client: improve group service stanza interpolation and check_re… (#6586)
* client: improve group service stanza interpolation and check_restart support

Interpolation can now be done on group service stanzas. Note that some task runtime specific information
that was previously available when the service was registered poststart of a task is no longer available.

The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
2019-11-18 13:04:01 -05:00
Drew Bailey 9b63828658
serverID to target remote leader or server
handle the case where we request a server-id which is this current server

update docs, error on node and server id params

more accurate names for tests

use shared no leader err, formatting

rm bad comment

remove redundant variable
2019-11-14 10:07:35 -05:00
Drew Bailey b644e1f47d
add server-id to monitor specific server 2019-11-14 09:53:41 -05:00
Drew Bailey acd97d0731
Merge pull request #6670 from hashicorp/api/fallthrough-test
test rootfallthrough handler
2019-11-13 10:51:31 -05:00
Lars Lehtonen 1dbf44bc40 command/agent: Prune Dead Code (#6682)
* remove unused MockPeriodicJob() from tests
* remove unused getIndex() from tests
* remove unused checkIndex() from tests
* remove unused assertIndex() from tests
* remove unused Agent.findLoopbackDevice()
2019-11-13 08:20:01 -05:00
Drew Bailey f5310ff63f
fix so assertions are test case driven 2019-11-12 14:28:21 -05:00
Charlie Voiselle 835831a3d8 Added service wrapper code (#6220)
This is the basic code to add the Windows Service Manager hooks to Nomad.

Includes vendoring golang.org/x/sys/windows/svc and added Docs:
* guide for installing as a windows service.
* configuration for logging to file from PR #6429
2019-11-11 15:16:07 -05:00
Drew Bailey f989f38594
test /ui/ path 2019-11-11 12:12:42 -05:00
Drew Bailey a0548824f3
test rootfallthrough handler 2019-11-11 12:08:44 -05:00
Mahmood Ali b2145f2d02 Allow UI to query client directly
Nomad web UI currently fails when querying client nodes for allocation
state end endpoints, due to CORS policy.

The issue is that CORS requests that are marked `withCredentials` need
the http server to include a `Access-Control-Allow-Credentials` [1].

But Nomad Task Logs and filesystem requests include authenticating
information and thus marked with `credentials=true`[2][3].

It's worth noting that the browser currently sends credentials and
authentication token to servers anyway; it's just that the response is
not made available to caller nomad ui javascript.  For task logs
specifically, nomad ui retries again by querying the web ui address
(typically pointing to a nomad server) which will forward the request
to the nomad client agent appropriately.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Credentials
[2] 101d0373ee/ui/app/components/task-log.js (L50)
[3] 101d0373ee/ui/app/services/token.js (L25-L39)
2019-11-11 15:13:30 +00:00
Lars Lehtonen 08d5342812 command/agent: TestAgent_ServerConfig() fix dropped errors (#6659) 2019-11-11 09:46:46 -05:00
Drew Bailey 04439a5a78
better func name, swap conditional 2019-11-11 08:35:56 -05:00
Drew Bailey c85df2dac7
returns a 404 if not found instead of redirect to ui 2019-11-08 15:34:35 -05:00
Drew Bailey 7b2ad28ef6
unlock before returning, no need for label
comment, trigger build

return length written
2019-11-05 11:44:29 -05:00
Drew Bailey d3b48a3e45
simplify logch goroutine 2019-11-05 11:44:28 -05:00
Drew Bailey df57f70a68
wireup plain=true|false query param 2019-11-05 11:44:28 -05:00
Drew Bailey f4a7e3dc75
coordinate closing of doneCh, use interface to simplify callers
comments
2019-11-05 11:44:26 -05:00
Drew Bailey fe542680dc
log-json -> json
fix typo command/agent/monitor/monitor.go

Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>

Update command/agent/monitor/monitor.go

Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>

address feedback, lock to prevent send on closed channel

fix lock/unlock for dropped messages
2019-11-05 09:51:59 -05:00
Drew Bailey 84c8e79f90
simplify assert message 2019-11-05 09:51:56 -05:00
Drew Bailey 8726b685de
address feedback 2019-11-05 09:51:56 -05:00
Drew Bailey e4b3e1d7d4
allow more time for streaming message
remove unused struct
2019-11-05 09:51:55 -05:00
Drew Bailey 318b6c91bf
monitor command takes no args
rm extra new line

fix lint errors

return after close

fix, simplify test
2019-11-05 09:51:55 -05:00
Drew Bailey 0e759c401c
moving endpoints over to frames 2019-11-05 09:51:54 -05:00
Drew Bailey fb23c1325d
fix deadlock issue, switch to frames envelope 2019-11-05 09:51:54 -05:00
Drew Bailey 32f62edbb0
return 400 if invalid log_json param is given
Addresses feedback around monitor implementation

subselect on stopCh to prevent blocking forever.

Set up a separate goroutine to check every 3 seconds for dropped
messages.

rename returned ch to avoid confusion
2019-11-05 09:51:53 -05:00
Drew Bailey 17d876d5ef
rename function, initialize log level better
underscores instead of dashes for query params
2019-11-05 09:51:53 -05:00
Drew Bailey da6229d704
update go-hclog dep
remove duplicate lock
2019-11-05 09:51:52 -05:00
Drew Bailey db65b1f4a5
agent:read acl policy for monitor 2019-11-05 09:51:52 -05:00
Drew Bailey f46fd5b3e1
only look up rpchandler for node if we have nodeid
fix some comments and nomad monitor -h output
2019-11-05 09:51:51 -05:00
Drew Bailey 3b9c33a5f0
new hclog with standardlogger intercept 2019-11-05 09:51:49 -05:00
Drew Bailey a45ae1cd58
enable json formatting, use queryoptions 2019-11-05 09:51:49 -05:00
Drew Bailey 786989dbe3
New monitor pkg for shared monitor functionality
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger

clean up old code

small comment about write

rm old comment about minsize

rename to Monitor

Removes connection logic from monitor command

Keep connection logic in endpoints, use a channel to send results from
monitoring

use new multisink logger and interfaces

small test for dropped messages

update go-hclogger and update sink/intercept logger interfaces
2019-11-05 09:51:49 -05:00
Drew Bailey e076204820
get local rpc endpoint working 2019-11-05 09:51:48 -05:00
Drew Bailey 976c43157c
remove log_writer
prefix output with proper spacing

update gzip handler, adjust first byte flow to allow gzip handler bypass

wip, first stab at wiring up rpc endpoint
2019-11-05 09:51:48 -05:00
Drew Bailey 0de94466b2
Display error when remote side ended monitor
multisink logger

remove usage of logwriter
2019-11-05 09:51:48 -05:00
Drew Bailey f60e44afc7
Adds nomad monitor command
Adds nomad monitor command. Like consul monitor, this command allows you
to stream logs from a nomad agent in real time with a a specified log
level

add endpoint tests

Upgrade go-hclog to latest version

The current version of go-hclog pads log prefixes to equal lengths
so info becomes [INFO ] and debug becomes [DEBUG]. This breaks
hashicorp/logutils/level.go Check function. Upgrading to the latest
version removes this padding and fixes log filtering that uses logutils
Check
2019-11-05 09:51:47 -05:00
Drew Bailey b386119d15
Add Agent Monitor to receive streaming logs
Queries /v1/agent/monitor and receives streaming logs from client
2019-11-05 09:51:47 -05:00
Drew Bailey b0184e2032
Adds AgentMonitor Endpoint
AgentMonitor is an endpoint to stream logs for a given agent. It allows
callers to pass in a supplied log level, which may be different than the
agents config allowing for temporary debugging with lower log levels.

Pass in logWriter when setting up Agent
2019-11-05 09:51:46 -05:00
Michael Schurter 9fed8d1bed client: fix panic from 0.8 -> 0.10 upgrade
makeAllocTaskServices did not do a nil check on AllocatedResources
which causes a panic when upgrading directly from 0.8 to 0.10. While
skipping 0.9 is not supported we intend to fix serious crashers caused
by such upgrades to prevent cluster outages.

I did a quick audit of the client package and everywhere else that
accesses AllocatedResources appears to be properly guarded by a nil
check.
2019-11-01 07:47:03 -07:00
Mahmood Ali 3f6e50617a
Merge pull request #6047 from hashicorp/b-ignore-server-if-disabled
Only warn against BootstrapExpect set in CLI flag
2019-10-29 10:55:44 -04:00
Michael Schurter 39437a5c5b
Merge branch 'master' into release-0100 2019-10-22 08:17:57 -07:00
Nomad Release bot 3e6c9dd40e Generate files for 0.10.0 release 2019-10-22 12:34:56 +00:00
Seth Hoenig 8b03477f46
Merge pull request #6448 from hashicorp/f-set-connect-sidecar-tags
connect: enable setting tags on consul connect sidecar service in job…
2019-10-17 15:14:09 -05:00
Seth Hoenig 039fbd3f3b connect: enable setting tags on consul connect sidecar service in jobspec (#6415) 2019-10-17 19:25:20 +00:00
Mahmood Ali 61e66cb077
Merge pull request #6427 from hashicorp/b-fs-endpoint-errors
agent: report fs log errors as http errors
2019-10-15 20:12:59 -04:00
Mahmood Ali 88f8127820 tests: avoid using unnecessary pipe 2019-10-15 17:22:03 -04:00
Danielle fee482ae6c
Merge pull request #6331 from hashicorp/dani/f-volume-mount-propagation
volumes: Add support for mount propagation
2019-10-14 14:29:40 +02:00
Danielle Lancashire 4fbcc668d0
volumes: Add support for mount propagation
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.

Similar to Kubernetes, we expose 3 options for configuring mount
propagation:

- private, which is equivalent to `rprivate` on Linux, which does not allow the
           container to see any new nested mounts after the chroot was created.

- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
                that have been created _outside of the container_ to be visible
                inside the container after the chroot is created.

- bidirectional, which is equivalent to `rshared` on Linux, which allows both
                 the container to see new mounts created on the host, but
                 importantly _allows the container to create mounts that are
                 visible in other containers an don the host_

private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.

To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
2019-10-14 14:09:58 +02:00
Danielle 2640155ae5
Merge pull request #6429 from hashicorp/f-log-to-file
Add support for logging to a file
2019-10-11 13:35:39 +02:00
Nomad Release bot 3007f1662e Generate files for 0.10.0-rc1 release 2019-10-10 19:08:23 +00:00
Danielle Lancashire 5cedf6d024
logging: Correctly track number of written bytes
Currently this assumes that a short write will never happen. While these
are improbable in a case where rotation being off a few bytes would
matter, this now correctly tracks the number of written bytes.
2019-10-10 14:02:14 +02:00
Danielle Lancashire b67215d4f8
logging: Sort files when pruning old logs
Currently this logging implementation is dependent on the order of files
as returned by filepath.Glob, which although internal methods are
documented to be lexographical, does not publicly document this. Here we
defensively resort.
2019-10-10 13:51:16 +02:00
Mahmood Ali 4b2ba62e35 acl: check ACL against object namespace
Fix a bug where a millicious user can access or manipulate an alloc in a
namespace they don't have access to.  The allocation endpoints perform
ACL checks against the request namespace, not the allocation namespace,
and performs the allocation lookup independently from namespaces.

Here, we check that the requested can access the alloc namespace
regardless of the declared request namespace.

Ideally, we'd enforce that the declared request namespace matches
the actual allocation namespace.  Unfortunately, we haven't documented
alloc endpoints as namespaced functions; we suspect starting to enforce
this will be very disruptive and inappropriate for a nomad point
release.  As such, we maintain current behavior that doesn't require
passing the proper namespace in request.  A future major release may
start enforcing checking declared namespace.
2019-10-08 12:59:22 -04:00
Mahmood Ali 3c0d8c7611
Merge pull request #6441 from hashicorp/b-agent-token
Redact replication tokens in /agent/self
2019-10-08 12:55:45 -04:00
Danielle Lancashire 9eaac48f25
agent: Refactor log setup to support log-to-file 2019-10-07 14:42:32 +02:00
Danielle Lancashire 442f4888b3
agent: Introduce File Logger
This commit introduces a rotating file logger for Nomad Agent Logs. The
logger implementation itself is a lift and shift from Consul, with tests
updated to fit with the Nomad pattern of using require, and not having a
testutil for creating tempdirs cleanly.
2019-10-07 14:37:31 +02:00
Danielle Lancashire d3614ea0a8
config: Add required configuration for logging to a file 2019-10-07 14:16:59 +02:00
Mahmood Ali 317e0f9e44 agent: report fs log errors as http errors
This fixes two bugs:

First, FS Logs API endpoint only propagated error back to user if it was
encoded with code, which isn't common.  Other errors get suppressed and
callers get an empty response with 200 error code.  Now, these endpoints
return  a 500 status code along with the error message.

Before
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout"; echo
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:47:21 GMT
< Content-Length: 0
<
* Connection #0 to host 127.0.0.1 left intact
```

After
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout"; echo
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:48:12 GMT
< Content-Length: 60
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
alloc lookup failed: index error: UUID must be 36 characters
```

Second, we return 400 status code for request validation errors.

Before
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:47:29 GMT
< Content-Length: 22
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
must provide task name
```

After
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:49:18 GMT
< Content-Length: 22
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
must provide task name
```
2019-10-04 16:33:58 -04:00
Lang Martin fb41dd86ba default raft protocol v2 2019-09-24 14:37:55 -04:00
Tim Gross cd9c23617f
client/connect: ConsulProxy LocalServicePort/Address (#6358)
Without a `LocalServicePort`, Connect services will try to use the
mapped port even when delivering traffic locally. A user can override
this behavior by pinning the port value in the `service` stanza but
this prevents us from using the Consul service name to reach the
service.

This commits configures the Consul proxy with its `LocalServicePort`
and `LocalServiceAddress` fields.
2019-09-23 14:30:48 -04:00
Danielle Lancashire 39fe07f66b
api: Redact tokens in /agent/self 2019-09-23 19:07:27 +02:00
Danielle Lancashire 8b44369073
api: Redact ACL Replication Token
Currently when hitting the /v1/agent/self API with ACL Replication
enabled results in the token being returned in the API. This commit
redacts that information, as it should be treated as a shared secret.
2019-09-22 14:35:53 +02:00
Danielle Lancashire e81d113e3f
command: Improve metrics fail logging 2019-09-19 04:17:42 +02:00
Tim Gross e3e30c15a9
remove resolved TODO from UpdateTTL docstring (#6336) 2019-09-16 16:26:06 -04:00
Danielle Lancashire 78b61de45f
config: Hoist volume.config.source into volume
Currently, using a Volume in a job uses the following configuration:

```
volume "alias-name" {
  type = "volume-type"
  read_only = true

  config {
    source = "host_volume_name"
  }
}
```

This commit migrates to the following:

```
volume "alias-name" {
  type = "volume-type"
  source = "host_volume_name"
  read_only = true
}
```

The original design was based due to being uncertain about the future of storage
plugins, and to allow maxium flexibility.

However, this causes a few issues, namely:
- We frequently need to parse this configuration during submission,
scheduling, and mounting
- It complicates the configuration from and end users perspective
- It complicates the ability to do validation

As we understand the problem space of CSI a little more, it has become
clear that we won't need the `source` to be in config, as it will be
used in the majority of cases:

- Host Volumes: Always need a source
- Preallocated CSI Volumes: Always needs a source from a volume or claim name
- Dynamic Persistent CSI Volumes*: Always needs a source to attach the volumes
                                   to for managing upgrades and to avoid dangling.
- Dynamic Ephemeral CSI Volumes*: Less thought out, but `source` will probably point
                                  to the plugin name, and a `config` block will
                                  allow you to pass meta to the plugin. Or will
                                  point to a pre-configured ephemeral config.
*If implemented

The new design simplifies this by merging the source into the volume
stanza to solve the above issues with usability, performance, and error
handling.
2019-09-13 04:37:59 +02:00
Nomad Release bot dc7d728a82 Generate files for 0.10.0-beta1 release 2019-09-06 18:47:09 +00:00
Michael Schurter 31eb8375e5
Merge pull request #6282 from hashicorp/f-connect-dev-path
connect: check if consul is on PATH
2019-09-05 12:25:23 -07:00
Michael Schurter 457684e34e connect: check if consul is on PATH
Only in -dev-connect mode for now since its valid to install Consul
after Nomad has started in production.
2019-09-05 12:05:42 -07:00
Jasmine Dahilig e1c73cdab5
add validation for job_gc_interval (#6277) 2019-09-05 11:20:46 -07:00
Mahmood Ali 6d73ca0cfb
Merge pull request #6250 from hashicorp/f-raft-protocol-v3
Update default raft protocol to version 3
2019-09-04 09:34:41 -04:00
Tim Gross 0f29dcc935
support script checks for task group services (#6197)
In Nomad prior to Consul Connect, all Consul checks work the same
except for Script checks. Because the Task being checked is running in
its own container namespaces, the check is executed by Nomad in the
Task's context. If the Script check passes, Nomad uses the TTL check
feature of Consul to update the check status. This means in order to
run a Script check, we need to know what Task to execute it in.

To support Consul Connect, we need Group Services, and these need to
be registered in Consul along with their checks. We could push the
Service down into the Task, but this doesn't work if someone wants to
associate a service with a task's ports, but do script checks in
another task in the allocation.

Because Nomad is handling the Script check and not Consul anyways,
this moves the script check handling into the task runner so that the
task runner can own the script check's configuration and
lifecycle. This will allow us to pass the group service check
configuration down into a task without associating the service itself
with the task.

When tasks are checked for script checks, we walk back through their
task group to see if there are script checks associated with the
task. If so, we'll spin off script check tasklets for them. The
group-level service and any restart behaviors it needs are entirely
encapsulated within the group service hook.
2019-09-03 15:09:04 -04:00
Jasmine Dahilig 4edebe389a
add default update stanza and max_parallel=0 disables deployments (#6191) 2019-09-02 10:30:09 -07:00
Evan Ercolano fcf66918d0 Remove unused canary param from MakeTaskServiceID 2019-08-31 16:53:23 -04:00
Michael Schurter 4bd53deba9
Merge pull request #6236 from hashicorp/b-ignore-connect-services
consul: ignore connect services when syncing
2019-08-30 13:11:09 -07:00
Michael Schurter 67b7bc1e90 consul: ignore connect services when syncing
Consul registers Connect services automatically, however Nomad thinks it
owns them due to the _nomad prefix. Since the services are managed by
Consul, Nomad needs to explicitly ignore them or otherwies they will be
removed.
2019-08-30 11:53:41 -07:00
Tim Gross b79021adfd cli: split -dev and -dev-connect flags 2019-08-30 09:33:30 -04:00
Mahmood Ali 6eabf53b91 Default raft protocol to version 3 2019-08-28 15:56:59 -04:00
Tim Gross 4d4461d1f5 agent: -dev=connect mode bind to 0.0.0.0
The dev mode flag for connect was binding to the default interface's
IP, but this makes for a bad user experience for the CLI which will
default to 127.0.0.1. If we bind to 0.0.0.0 instead the CLI will work
without further configuration by the user.
2019-08-23 13:51:16 -04:00
Jerome Gravel-Niquet cbdc1978bf Consul service meta (#6193)
* adds meta object to service in job spec, sends it to consul

* adds tests for service meta

* fix tests

* adds docs

* better hashing for service meta, use helper for copying meta when registering service

* tried to be DRY, but looks like it would be more work to use the
helper function
2019-08-23 12:49:02 -04:00
Michael Schurter 95b8048553
Merge pull request #6121 from hashicorp/f-connect-bootstrap
connect: task hook for bootstrapping envoy sidecar
2019-08-22 10:58:31 -07:00
Michael Schurter 59e0b67c7f connect: task hook for bootstrapping envoy sidecar
Fixes #6041

Unlike all other Consul operations, boostrapping requires Consul be
available. This PR tries Consul 3 times with a backoff to account for
the group services being asynchronously registered with Consul.
2019-08-22 08:15:32 -07:00
Danielle Lancashire 2e5f28029f
remove hidden field from host volumes
We're not shipping support for "hidden" volumes in 0.10 any more, I'll
convert this to an issue+mini RFC for future enhancement.
2019-08-22 08:48:05 +02:00
Danielle Lancashire 9df7e0eb72
clientconfig: Fix parsing multiple host volumes 2019-08-21 22:19:58 +02:00
Michael Schurter 050cc32fde
Merge pull request #6157 from hashicorp/f-connect-register
Register connect enabled group services with Consul
2019-08-20 14:45:38 -07:00