Commit Graph

152 Commits

Author SHA1 Message Date
Drew Bailey 630babb886
prevent double job status update (#9768)
* Prevent Job Statuses from being calculated twice

https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard
2021-01-22 09:18:17 -05:00
Kris Hicks d71a90c8a4
Fix some errcheck errors (#9811)
* Throw away result of multierror.Append

When given a *multierror.Error, it is mutated, therefore the return
value is not needed.

* Simplify MergeMultierrorWarnings, use StringBuilder

* Hash.Write() never returns an error

* Remove error that was always nil

* Remove error from Resources.Add signature

When this was originally written it could return an error, but that was
refactored away, and callers of it as of today never handle the error.

* Throw away results of io.Copy during Bridge

* Handle errors when computing node class in test
2021-01-14 12:46:35 -08:00
Drew Bailey 03a9541822
ignore poststop task in alloc health tracker (#9548), fixes #9361
* investigating where to ignore poststop task in alloc health tracker

* ignore poststop when setting latest start time for allocation

* clean up logic

* lifecycle: isolate mocks for poststop deployment test

* lifecycle: update comments in tracker

Co-authored-by: Jasmine Dahilig <jasmine@dahilig.com>
2021-01-12 10:03:48 -08:00
Seth Hoenig beaa6359d5 consul/connect: fix regression where client connect images ignored
Nomad v1.0.0 introduced a regression where the client configurations
for `connect.sidecar_image` and `connect.gateway_image` would be
ignored despite being set. This PR restores that functionality.

There was a missing layer of interpolation that needs to occur for
these parameters. Since Nomad 1.0 now supports dynamic envoy versioning
through the ${NOMAD_envoy_version} psuedo variable, we basically need
to first interpolate

  ${connect.sidecar_image} => envoyproxy/envoy:v${NOMAD_envoy_version}

then use Consul at runtime to resolve to a real image, e.g.

  envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.16.0

Of course, if the version of Consul is too old to provide an envoy
version preference, we then need to know to fallback to the old
version of envoy that we used before.

  envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09

Beyond that, we also need to continue to support jobs that set the
sidecar task themselves, e.g.

  sidecar_task { config { image: "custom/envoy" } }

which itself could include teh pseudo envoy version variable.
2020-12-14 09:47:55 -06:00
Drew Bailey a0b7f05a7b
Remove Managed Sinks from Nomad (#9470)
* Remove Managed Sinks from Nomad

Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.

* update comment

* changelog
2020-11-30 14:00:31 -05:00
Nick Ethier 9471892df4 mock: add default host network 2020-11-23 10:11:00 -06:00
Nick Ethier 7266376ae6 nomad: update validate to check group networks for task port usage 2020-11-23 10:11:00 -06:00
Chris Baker 719077a26d added new policy capabilities for recommendations API
state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling polices in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type
2020-10-28 14:32:16 +00:00
Drew Bailey 86080e25a9
Send events to EventSinks (#9171)
* Process to send events to configured sinks

This PR adds a SinkManager to a server which is responsible for managing
managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the eventstore are
made the sinkmanager and managed sink are responsible for reloading or
starting a new managed sink.

* periodically check in sink progress to raft

Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change

dereference eventsink so we can accurately use the watchch

When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger
2020-10-26 17:27:54 -04:00
Drew Bailey 1ae39a9ed9
event sink crud operation api (#9155)
* network sink rpc/api plumbing

state store methods and restore

upsert sink test

get sink

delete sink

event sink list and tests

go generate new msg types

validate sink on upsert

* go generate
2020-10-23 14:23:00 -04:00
Michael Schurter c2dd9bc996 core: open source namespaces 2020-10-22 15:26:32 -07:00
Alexander Shtuchkin 90fd8bb85f
Implement 'batch mode' for persisting allocations on the client. (#9093)
Fixes #9047, see problem details there.

As a solution, we use BoltDB's 'Batch' mode that combines multiple
parallel writes into small number of transactions. See
https://github.com/boltdb/bolt#batch-read-write-transactions for
more information.
2020-10-20 16:15:37 -04:00
Drew Bailey b4c135358d
use Events to wrap index and events, store in events table 2020-10-14 12:44:39 -04:00
Seth Hoenig ed13e5723f consul/connect: dynamically select envoy sidecar at runtime
As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.

This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.

Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.

Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.

`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.

Addresses #8585 #7665
2020-10-13 09:14:12 -05:00
Luiz Aoqui 88d4eecfd0
add scaling policy type 2020-09-29 17:57:46 -04:00
Jasmine Dahilig 71a694f39c
Merge pull request #8390 from hashicorp/lifecycle-poststart-hook
task lifecycle poststart hook
2020-08-31 13:53:24 -07:00
Seth Hoenig 5b072029f2 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
Michael Schurter de08ae8083 test: add allocrunner test for poststart hooks 2020-08-12 09:54:14 -07:00
Seth Hoenig 2511f48351 consul/connect: add support for bridge networks with connect native tasks
Before, Connect Native Tasks needed one of these to work:

- To be run in host networking mode
- To have the Consul agent configured to listen to a unix socket
- To have the Consul agent configured to listen to a public interface

None of these are a great experience, though running in host networking is
still the best solution for non-Linux hosts. This PR establishes a connection
proxy between the Consul HTTP listener and a unix socket inside the alloc fs,
bypassing the network namespace for any Connect Native task. Similar to and
re-uses a bunch of code from the gRPC listener version for envoy sidecar proxies.

Proxy is established only if the alloc is configured for bridge networking and
there is at least one Connect Native task in the Task Group.

Fixes #8290
2020-07-29 09:26:01 -05:00
Tim Gross bd457343de
MRD: all regions should start pending (#8433)
Deployments should wait until kicked off by `Job.Register` so that we can
assert that all regions have a scheduled deployment before starting any
region. This changeset includes the OSS fixes to support the ENT work.

`IsMultiregionStarter` has no more callers in OSS, so remove it here.
2020-07-14 10:57:37 -04:00
Nick Ethier fb9c458df1
nomad/mock: add NodeNetworkResources to mock Node 2020-06-19 14:22:24 -04:00
Nick Ethier 4a44deaa5c CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Tim Gross 7b12445f29 multiregion: change AutoRevert to OnFailure 2020-06-17 11:05:45 -04:00
Tim Gross b09b7a2475 Multiregion job registration
Integration points for multiregion jobs to be registered in the enterprise
version of Nomad:
* hook in `Job.Register` for enterprise to send job to peer regions
* remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs
2020-06-17 11:04:58 -04:00
Tim Gross 2082cf738a
csi: support for VolumeContext and VolumeParameters (#7957)
The MVP for CSI in the 0.11.0 release of Nomad did not include support
for opaque volume parameters or volume context. This changeset adds
support for both.

This also moves args for ControllerValidateCapabilities into a struct.
The CSI plugin `ControllerValidateCapabilities` struct that we turn
into a CSI RPC is accumulating arguments, so moving it into a request
struct will reduce the churn of this internal API, make the plugin
code more readable, and make this method consistent with the other
plugin methods in that package.
2020-05-15 08:16:01 -04:00
Tim Gross 4374c1a837
csi: support Secrets parameter in CSI RPCs (#7923)
CSI plugins can require credentials for some publishing and
unpublishing workflow RPCs. Secrets are configured at the time of
volume registration, stored in the volume struct, and then passed
around as an opaque map by Nomad to the plugins.
2020-05-11 17:12:51 -04:00
Tim Gross 083b35d651
csi: checkpoint volume claim garbage collection (#7782)
Adds a `CSIVolumeClaim` type to be tracked as current and past claims
on a volume. Allows for a client RPC failure during node or controller
detachment without having to keep the allocation around after the
first garbage collection eval.

This changeset lays groundwork for moving the actual detachment RPCs
into a volume watching loop outside the GC eval.
2020-04-23 11:06:23 -04:00
Tim Gross 54b3573fc9
state: support snapshot of CSI plugin and volume tables (#7546)
The `csi_plugins` and `csi_volumes` tables were missing support for
snapshot persist and restore. This means restoring a snapshot would
result in missing information for CSI.
2020-03-30 11:17:16 -04:00
Chris Baker 5979d6a81e more testing for ScalingPolicy, mainly around parsing and canonicalization for Min/Max 2020-03-24 19:43:50 +00:00
Chris Baker 233db5258a changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults.
- tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent)
- Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max

modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and
2020-03-24 13:57:17 +00:00
Chris Baker abc7a52f56 finished refactoring state store, schema, etc 2020-03-24 13:57:14 +00:00
Chris Baker 6665d0bfb0 wip: added policy get endpoint, added UUID to policy 2020-03-24 13:55:20 +00:00
Chris Baker 65d92f1fbf WIP: adding ScalingPolicy to api/structs and state store 2020-03-24 13:55:18 +00:00
Jasmine Dahilig 73a64e4397 change jobspec lifecycle stanza to use sidecar attribute instead of
block_until status
2020-03-21 17:52:57 -04:00
Jasmine Dahilig d54a83afee fix linting errors 2020-03-21 17:52:53 -04:00
Jasmine Dahilig a0fe570317 clean up restore test 2020-03-21 17:52:52 -04:00
Jasmine Dahilig 7ed08eb75a partial test for restore functionality 2020-03-21 17:52:52 -04:00
Jasmine Dahilig 81d051d7e8 fix bug in lifecycle scheduler test mocks 2020-03-21 17:52:51 -04:00
Jasmine Dahilig 0cc9212a54 add test cases for scheduler alloc placement with lifecycle resources 2020-03-21 17:52:47 -04:00
Jasmine Dahilig 0d2988652c add lifecycle job mock 2020-03-21 17:52:47 -04:00
Jasmine Dahilig c27223207c update task hook coordinator tests 2020-03-21 17:52:46 -04:00
Jasmine Dahilig 12393f90e7 add test for lifecycle coordinator 2020-03-21 17:52:42 -04:00
Seth Hoenig 9df33f622f nomad: proxy requests for Service Identity tokens between Clients and Consul
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
2020-01-31 19:03:53 -06:00
Michael Schurter 95fd2643d7 connect: canonicalize before adding sidecar
Fixes #6853

Canonicalize jobs first before adding any sidecars. This fixes a bug
where sidecar tasks were added without interpolated names and broke
validation. Sidecar tasks must be canonicalized independently.

Also adds a group network to the mock connect job because it wasn't a
valid connect job before!
2019-12-12 20:55:56 -08:00
Michael Schurter 5957030d18
connect: add unix socket to proxy grpc for envoy (#6232)
* connect: add unix socket to proxy grpc for envoy

Fixes #6124

Implement a L4 proxy from a unix socket inside a network namespace to
Consul's gRPC endpoint on the host. This allows Envoy to connect to
Consul's xDS configuration API.

* connect: pointer receiver on structs with mutexes

* connect: warn on all proxy errors
2019-09-03 08:43:38 -07:00
Jasmine Dahilig 4edebe389a
add default update stanza and max_parallel=0 disables deployments (#6191) 2019-09-02 10:30:09 -07:00
Mahmood Ali 3da10b5cb3 scheduler: tests for multiple drivers in TG 2019-08-29 09:03:31 -04:00
Jasmine Dahilig 8d980edd2e
add create and modify timestamps to evaluations (#5881) 2019-08-07 09:50:35 -07:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Alex Dadgar 4ee603c382 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00