Commit Graph

305 Commits

Author SHA1 Message Date
Derek Strickland 0a8e03f0f7
Expose Consul template configuration parameters (#11606)
This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.

- `wait`

It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza.

- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry` 
- `wait` 

Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.

- `wait_bounds`

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-01-10 10:19:07 -05:00
Michael Schurter 88200f4eb9 core: fix DNS and CPU Core copying 2021-12-23 12:28:19 -08:00
Mahmood Ali 4d90afb425 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
James Rasell 0e926ef3fd
allow configuration of Docker hostnames in bridge mode (#11173)
Add a new hostname string parameter to the network block which
allows operators to specify the hostname of the network namespace.
Changing this causes a destructive update to the allocation and it
is omitted if empty from API responses. This parameter also supports
interpolation.

In order to have a hostname passed as a configuration param when
creating an allocation network, the CreateNetwork func of the
DriverNetworkManager interface needs to be updated. In order to
minimize the disruption of future changes, rather than add another
string func arg, the function now accepts a request struct along with
the allocID param. The struct has the hostname as a field.

The in-tree implementations of DriverNetworkManager.CreateNetwork
have been modified to account for the function signature change.
In updating for the change, the enhancement of adding hostnames to
network namespaces has also been added to the Docker driver, whilst
the default Linux manager does not current implement it.
2021-09-16 08:13:09 +02:00
Mahmood Ali bfc766357e
deployments: canary=0 is implicitly autopromote (#11013)
In a multi-task-group job, treat 0 canary groups as auto-promote.

This change fixes an edge case where Nomad requires a manual promotion,
if the job had any group with canary=0 and rest of groups having
auto_promote set.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-08-10 17:06:40 -04:00
Seth Hoenig ac5c83cafd core: remove internalization of affinity strings
Basically the same as #10896 but with the Affinity struct.
Since we use reflect.DeepEquals for job comparison, there is
risk of false positives for changes due to a job struct with
memoized vs non-memoized strings.

Closes #10897
2021-07-15 15:15:39 -05:00
Seth Hoenig bea8066187 core: add spec changed test with constriants 2021-07-14 10:44:09 -05:00
Seth Hoenig 52cf03df4a core: fix constraint tests 2021-07-14 10:39:38 -05:00
Tim Gross c01d661c98 csi: validate `volume` block has `attachment_mode` and `access_mode`
The `attachment_mode` and `access_mode` fields are required for CSI
volumes. The `mount_options` block is only allowed for CSI volumes.
2021-06-03 16:07:19 -04:00
Tim Gross 276633673d CSI: use AccessMode/AttachmentMode from CSIVolumeClaim
Registration of Nomad volumes previously allowed for a single volume
capability (access mode + attachment mode pair). The recent `volume create`
command requires that we pass a list of requested capabilities, but the
existing workflow for claiming volumes and attaching them on the client
assumed that the volume's single capability was correct and unchanging.

Add `AccessMode` and `AttachmentMode` to `CSIVolumeClaim`, use these fields to
set the initial claim value, and add backwards compatibility logic to handle
the existing volumes that already have claims without these fields.
2021-04-07 11:24:09 -04:00
Tim Gross dbcc2694b0 refactor: move VolumeRequest validation to Validate method 2021-04-07 11:24:09 -04:00
Chris Baker 436d46bd19
Merge branch 'main' into f-node-drain-api 2021-04-01 15:22:57 -05:00
Mahmood Ali 0c2551270a oversubscription: Add MemoryMaxMB to internal structs
Start tracking a new MemoryMaxMB field that represents the maximum memory a task
may use in the client. This allows tasks to specify a memory reservation (to be
used by scheduler when placing the task) but use excess memory used on the
client if the client has any.

This commit adds the server tracking for the value, and ensures that allocations
AllocatedResource fields include the value.
2021-03-30 16:55:58 -04:00
Nick Ethier daecfa61e6
Merge pull request #10203 from hashicorp/f-cpu-cores
Reserved Cores [1/4]: Structs and scheduler implementation
2021-03-29 14:05:54 -04:00
Chris Baker 770c9cecb5 restored Node.Sanitize() for RPC endpoints
multiple other updates from code review
2021-03-26 17:03:15 +00:00
Chris Baker dd291e69f4 removed deprecated fields from Drain structs and API
node drain: use msgtype on txn so that events are emitted
wip: encoding extension to add Node.Drain field back to API responses

new approach for hiding Node.SecretID in the API, using `json` tag
documented this approach in the contributing guide
refactored the JSON handlers with extensions
modified event stream encoding to use the go-msgpack encoders with the extensions
2021-03-21 15:30:11 +00:00
Nick Ethier 648ade63ad scheduler: implement scheduling of reserved cores 2021-03-19 00:29:07 -04:00
Nick Ethier 4b2912d343 structs: add struct fields and funcs for reservable cpu cores 2021-03-18 22:49:06 -04:00
Tim Gross fa25e048b2
CSI: unique volume per allocation
Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.

Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
2021-03-18 15:35:11 -04:00
Nick Ethier 43a4d72fda structs: namespace port validation by host_network 2021-02-02 14:56:52 -05:00
Seth Hoenig 8b05efcf88 consul/connect: Add support for Connect terminating gateways
This PR implements Nomad built-in support for running Consul Connect
terminating gateways. Such a gateway can be used by services running
inside the service mesh to access "legacy" services running outside
the service mesh while still making use of Consul's service identity
based networking and ACL policies.

https://www.consul.io/docs/connect/gateways/terminating-gateway

These gateways are declared as part of a task group level service
definition within the connect stanza.

service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      terminating {
        // terminating-gateway configuration entry
      }
    }
  }
}

Currently Envoy is the only supported gateway implementation in
Consul. The gateay task can be customized by configuring the
connect.sidecar_task block.

When the gateway.terminating field is set, Nomad will write/update
the Configuration Entry into Consul on job submission. Because CEs
are global in scope and there may be more than one Nomad cluster
communicating with Consul, there is an assumption that any terminating
gateway defined in Nomad for a particular service will be the same
among Nomad clusters.

Gateways require Consul 1.8.0+, checked by a node constraint.

Closes #9445
2021-01-25 10:36:04 -06:00
Kris Hicks 8f9e47a8e7
Clean up Task Validation tests (#9833)
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2021-01-21 11:53:02 -08:00
Dennis Schön 3eaf1432aa
validate connect block allowed only within group.service 2021-01-20 14:34:23 -05:00
Kris Hicks d71a90c8a4
Fix some errcheck errors (#9811)
* Throw away result of multierror.Append

When given a *multierror.Error, it is mutated, therefore the return
value is not needed.

* Simplify MergeMultierrorWarnings, use StringBuilder

* Hash.Write() never returns an error

* Remove error that was always nil

* Remove error from Resources.Add signature

When this was originally written it could return an error, but that was
refactored away, and callers of it as of today never handle the error.

* Throw away results of io.Copy during Bridge

* Handle errors when computing node class in test
2021-01-14 12:46:35 -08:00
Nick Ethier d21cbeb30f command: remove task network usage from init examples 2020-11-23 10:25:11 -06:00
Nick Ethier 7266376ae6 nomad: update validate to check group networks for task port usage 2020-11-23 10:11:00 -06:00
Nick Ethier 5e1634eda1
structs: canonicalize allocatedtaskresources to populate shared ports (#9309) 2020-11-11 16:21:47 -05:00
Seth Hoenig 0c5ae5769f
Merge pull request #9029 from hashicorp/b-tgs-updates
consul/connect: trigger update as necessary on connect changes
2020-10-05 16:48:04 -05:00
Seth Hoenig f44a4f68ee consul/connect: trigger update as necessary on connect changes
This PR fixes a long standing bug where submitting jobs with changes
to connect services would not trigger updates as expected. Previously,
service blocks were not considered as sources of destructive updates
since they could be synced with consul non-destructively. With Connect,
task group services that have changes to their connect block or to
the service port should be destructive, since the network plumbing of
the alloc is going to need updating.

Fixes #8596 #7991

Non-destructive half in #7192
2020-10-05 14:53:00 -05:00
Chris Baker 7f701fddd0 updated docs and validation to further prohibit null chars in region, datacenter, and job name 2020-10-05 18:01:50 +00:00
Chris Baker 23ea7cd27c updated job validate to refute job/group/task IDs containing null characters
updated CHANGELOG and upgrade guide
2020-10-05 18:01:49 +00:00
Chris Baker c8fd9428d4 documenting tests around null characters in job id, task group name, and task name 2020-10-05 18:01:49 +00:00
Michael Schurter 765473e8b0 jobspec: lower min cpu resources from 10->1
Since CPU resources are usually a soft limit it is desirable to allow
setting it as low as possible to allow tasks to run only in "idle" time.

Setting it to 0 is still not allowed to avoid potential unintentional
side effects with allowing a zero value. While there may not be any side
effects this commit attempts to minimize risk by avoiding the issue.

This does *not* change the defaults.
2020-09-30 12:15:13 -07:00
Luiz Aoqui 88d4eecfd0
add scaling policy type 2020-09-29 17:57:46 -04:00
Seth Hoenig af9543c997 consul: fix validation of task in group-level script-checks
When defining a script-check in a group-level service, Nomad needs to
know which task is associated with the check so that it can use the
correct task driver to execute the check.

This PR fixes two bugs:
1) validate service.task or service.check.task is configured
2) make service.check.task inherit service.task if it is itself unset

Fixes #8952
2020-09-28 15:02:59 -05:00
Seth Hoenig 84176c9a41 consul/connect: make use of task kind to determine service name in consul token checks
When consul.allow_unauthenticated is set to false, the job_endpoint hook validates
that a `-consul-token` is provided and validates the token against the privileges
inherent to a Consul Service Identity policy for all the Connect enabled services
defined in the job.

Before, the check was assuming the service was of type sidecar-proxy. This fixes the
check to use the type of the task so we can distinguish between the different connect
types.
2020-08-27 12:14:40 -05:00
Seth Hoenig 5b072029f2 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
Nick Ethier 3cd5f46613
Update UI to use new allocated ports fields (#8631)
* nomad: canonicalize alloc shared resources to populate ports

* ui: network ports

* ui: remove unused task network references and update tests with new shared ports model

* ui: lint

* ui: revert auto formatting

* ui: remove unused page objects

* structs: remove unrelated test from bad conflict resolution

* ui: formatting
2020-08-20 11:07:13 -04:00
Tim Gross 38ec70eb8d
multiregion: validation should always return error for OSS (#8687) 2020-08-18 15:35:38 -04:00
Lang Martin a3bfd8c209
structs: Job.Validate only allows stop_after_client_disconnected on batch and service jobs (#8444)
* nomad/structs/structs: add to Job.Validate

* Update nomad/structs/structs.go

Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>

* nomad/structs/structs: match error strings to the config file

* nomad/structs/structs_test: clarify the test a bit

* nomad/structs/structs_test: typo in the test error comparison

Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2020-07-20 10:27:25 -04:00
Tim Gross bd457343de
MRD: all regions should start pending (#8433)
Deployments should wait until kicked off by `Job.Register` so that we can
assert that all regions have a scheduled deployment before starting any
region. This changeset includes the OSS fixes to support the ENT work.

`IsMultiregionStarter` has no more callers in OSS, so remove it here.
2020-07-14 10:57:37 -04:00
Tim Gross 0ce3c1e942
multiregion: allow empty region DCs (#8426)
It's supposed to be possible for a region not to have `datacenters` set so
that it can use the job's `datacenters` field. This requires that operators
use the same DC name across multiple regions, but that's the default client
configuration.
2020-07-13 13:34:19 -04:00
Mahmood Ali 6605ebd314
Merge pull request #8223 from hashicorp/f-multi-network-validate-ports
core: validate port numbers are < 65535
2020-06-26 08:31:01 -04:00
Michael Schurter 13ed710a04 core: validate port numbers are <= 65535
The scheduler returns a very strange error if it detects a port number
out of range. If these would somehow make it to the client they would
overflow when converted to an int32 and could cause conflicts.
2020-06-23 11:31:49 -07:00
Seth Hoenig 6c5ab7f45e consul/connect: split connect native flag and task in service 2020-06-23 10:22:22 -05:00
Seth Hoenig 4d71f22a11 consul/connect: add support for running connect native tasks
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
2020-06-22 14:07:44 -05:00
Michael Schurter 562704124d
Merge pull request #8208 from hashicorp/f-multi-network
multi-interface network support
2020-06-19 15:46:48 -07:00
Nick Ethier a87e91e971
test: fix up testing around host networks 2020-06-19 13:53:31 -04:00
Tim Gross b654e1b8a4
multiregion: all regions start in running if no max_parallel (#8209)
If `max_parallel` is not set, all regions should begin in a `running` state
rather than a `pending` state. Otherwise the first region is set to `running`
and then all the remaining regions once it enters `blocked. That behavior is
technically correct in that we have at most `max_parallel` regions running,
but definitely not what a user expects.
2020-06-19 11:17:09 -04:00
Nick Ethier 4a44deaa5c CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00