This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.
```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```
A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.
Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.
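As a rough illustration of the stanza above, a minimal bridge-mode job might look like the following sketch. The job, group, and service names, the port number, and the upstream service `uuid-api` are all hypothetical placeholders, not values taken from this PR.

```hcl
job "ingress-demo" {
  datacenters = ["dc1"]

  group "ingress-group" {
    network {
      mode = "bridge"

      # Hypothetical inbound port that the gateway listener binds to.
      port "inbound" {
        static = 8080
        to     = 8080
      }
    }

    service {
      name = "my-ingress-service" # hypothetical gateway service name
      port = "8080"

      connect {
        gateway {
          # Envoy proxy configuration; left empty to use the defaults.
          proxy {}

          # Ingress-gateway configuration entry.
          ingress {
            listener {
              port     = 8080
              protocol = "tcp"

              service {
                name = "uuid-api" # hypothetical Connect service to expose
              }
            }
          }
        }
      }
    }
  }
}
```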
Aims to address #8294 and tangentially #8647
* docker: support group allocated ports
* docker: add new ports driver config to specify which group ports are mapped
* docker: update port mapping docs
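A minimal sketch of how the new `ports` driver option could be used with a group-level port. The group, task, port label, and image name are hypothetical.

```hcl
group "web" {
  network {
    # Port allocated at the group level.
    port "http" {
      to = 8080
    }
  }

  task "server" {
    driver = "docker"

    config {
      image = "example/web:1.0" # hypothetical image

      # Select which group ports are mapped into the container.
      ports = ["http"]
    }
  }
}
```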
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.
The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. These fields
are available in Consul v1.7.0 and later.
https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical
Nomad doesn't do much besides pass the fields through to Consul.
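For illustration, a check using both fields might look like this sketch. The service name, port label, path, and threshold values are hypothetical.

```hcl
service {
  name = "web"  # hypothetical service name
  port = "http" # hypothetical port label

  check {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"

    # Passed through to Consul: require 3 consecutive successes before
    # the check is marked passing, and 2 consecutive failures before it
    # is marked critical.
    success_before_passing   = 3
    failures_before_critical = 2
  }
}
```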
Fixes #6913
* update vault integration docs
docs/integrations/vault-integration was a copy of the learn guide. Remove that and move /docs/vault-integration to this location instead
fix link
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
* revert accidental deletion
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
In order to prevent staleness, changed driver links to point to releases page rather than a specific version.
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Postrun hooks for allocation runners don't currently block the registration of
terminal health with the servers, which is what allows system jobs to be
drained. So draining nodes with jobs that claim CSI volumes requires the
`-ignore-system` flag to ensure that the postrun hook for service jobs gets a
chance to execute.
The Nomad binary size is stated inconsistently in several places
and changes almost daily. We should therefore remove it to avoid
confusion and misrepresentation.
adds in oss components to support enterprise multi-vault namespace feature
upgrade specific doc on vault multi-namespaces
vault docs
update test to reflect new error
Also fixed the same typo in a test. Fixing the typo fixes the link, but
the link was still broken when running the website locally due to the
trailing slash. It would have worked in prod thanks to redirects, but
using the canonical URL seems ideal.
Not sure if this was meant to imply adding more schedulers to Nomad is
easy, or that we plan on adding pluggable schedulers. Either way,
neither of those statements is really true unless you really stretch the
definitions of "easy" and "plan".
So remove this sentence as I can't imagine it does anything other than
confuse people.
Before docker, the only default for `kill_signal` was `SIGINT`. The
docker driver, however, defaults to `SIGTERM`, and we should document
it as such.
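A task that needs a specific signal regardless of the driver default can set it explicitly, as in this sketch. The task and image names are hypothetical.

```hcl
task "server" {
  driver = "docker"

  # Explicitly override the driver's default kill signal.
  kill_signal = "SIGINT"

  config {
    image = "example/server:1.0" # hypothetical image
  }
}
```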
Fixes #7140
This change fixes a syntax error in the autoscaling apm plugin
docs and updates the scaling stanza doc. The stanza wording
implied it was only used by external autoscalers, whereas it is
also used by the UI.
Before, the service definition for a Connect Native service would always
require setting the `service.task` parameter. Now, that parameter is
automatically inferred when there is only one task in the task group.
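A sketch of the single-task case, where `service.task` can now be omitted. The names are hypothetical, and the `native = true` form is an assumption about the jobspec syntax in effect at the time of this change.

```hcl
group "api" {
  network {
    mode = "host"

    port "http" {}
  }

  service {
    name = "api" # hypothetical service name
    port = "http"

    # No `task` parameter here: with only one task in the group,
    # the backing task is inferred. With several tasks, set
    # task = "<task name>" explicitly.
    connect {
      native = true # assumed syntax, see note above
    }
  }

  task "api-server" {
    driver = "docker"

    config {
      image = "example/api:1.0" # hypothetical image
    }
  }
}
```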
Fixes #8274
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.
If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
The suggested plugin configuration to re-enable Docker volumes was erroneously
using the singular `volume` instead of the correct `volumes`, making the
client fail to parse the configuration and causing it not to start.
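For reference, the corrected client plugin configuration uses the plural block name:

```hcl
plugin "docker" {
  config {
    # Note the plural `volumes`; the singular form fails to parse.
    volumes {
      enabled = true
    }
  }
}
```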
Adds a `-global` flag for stopping multiregion jobs in all regions at
once. Warn the user if they attempt to stop a multiregion job in a single
region.
Per Slack conversation on April 30 with Schmichael and Gale. Aligning navigation verbiage with other .io sites (Community instead of Resources). Page updates remove Gitter and Mailing List links to direct users to the Forum. Also changed some header content and positioning. If this all doesn't work based on full content of page, happy to brainstorm a better approach. Thank you for reviewing!
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.
The `connect` stanza now includes a `native` parameter, which can be
set to the name of the task that backs the Connect Native Consul service.
There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated`, it defaults to true,
but disabling it is recommended in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.
If Consul ACLs are enabled, Service Identity Tokens are automatically
injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.
Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.
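A sketch of a Client agent configuration that disables sharing, as recommended for production. The address and certificate paths are hypothetical.

```hcl
consul {
  address   = "127.0.0.1:8501" # hypothetical Consul HTTPS address
  ssl       = true
  ca_file   = "/etc/consul.d/tls/ca.pem"     # hypothetical path
  cert_file = "/etc/consul.d/tls/client.pem" # hypothetical path
  key_file  = "/etc/consul.d/tls/client.key" # hypothetical path

  # Do not expose the Client's Consul TLS settings to Connect Native tasks.
  share_ssl = false
}
```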
Fixes #6083
- Changed boilerplate intro copy to match messaging in approved 0.12 announcement copy launching next Monday
- Added Virtual Talks section with each of their YouTube links and year timestamps from this year
- Updated the Who Uses Nomad section in alignment with Nomad GitHub README in ordering
- Added new customer talks such as Cloudflare and yearly timestamps to each of them
- Removed outdated Community Tools and Integrations section
> If you do not run Nomad as root, make sure you add the Nomad user to the Docker group so Nomad can communicate with the Docker daemon.
Changing the username in the example from `vagrant` to `nomad`. Vagrant isn't addressed in the entire document, so I guess that this was a mistake.
- Guides now point to HashiCorp Learn, rather than old website
- Condensed the documentation & guides section for brevity
- Updated "Who Uses Nomad" page and section in README with new names collected from past 6 months
- Added yearly publication dates to each of the public talks
The tasklet passes the timeout for the script check into the task
driver's `Exec`, and it's up to the task driver to enforce that via a
golang `context.WithDeadline`. In practice, this deadline is started
before the task driver starts setting up the execution
environment (because we need it to do things like time out Docker API
calls).
Under even moderate load, the time it takes to set up the execution
context for the script check regularly exceeds a full second or
two. This can cause script checks to unexpectedly time out or even never
execute if the context expires before the task driver ever gets a
chance to `execve`.
This changeset adds a notice to operators about setting script check
timeouts with plenty of padding and what to monitor for problems.
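As an illustration of padding the timeout, a script check might be declared like this sketch. The task, image, service name, command, and durations are hypothetical. The timeout is set well above the expected runtime of the script itself to leave room for execution setup.

```hcl
task "db" {
  driver = "docker"

  config {
    image = "example/db:1.0" # hypothetical image
  }

  service {
    name = "db" # hypothetical service name

    check {
      type     = "script"
      name     = "db-ready"
      command  = "/bin/sh"
      args     = ["-c", "/usr/local/bin/ready-check"] # hypothetical script
      interval = "30s"

      # Padded generously: the deadline starts before the driver has
      # finished setting up the execution environment.
      timeout = "10s"
    }
  }
}
```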
Creating a FAQ question to provide a home for additional context around
bootstrapping. Linking from API page to `default_server_config`
attribute. Added sample API response to discuss "Updated: false"
Allow `/v1/jobs?all_namespaces=true` to list all jobs across all
namespaces. Each entry in the returned list contains a `Namespace`
field indicating the job's namespace.
If ACL is enabled, the request token needs to be a management token or
have `namespace:list-jobs` capability on all existing namespaces.
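For a non-management token, a policy along these lines would be needed, with one block per existing namespace. The namespace names `default` and `qa` are hypothetical.

```hcl
# Grant list-jobs on every namespace that exists in the cluster.
namespace "default" {
  capabilities = ["list-jobs"]
}

namespace "qa" {
  capabilities = ["list-jobs"]
}
```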
The MVP for CSI in the 0.11.0 release of Nomad did not include support
for opaque volume parameters or volume context. This changeset adds
support for both.
This also moves args for ControllerValidateCapabilities into a struct.
The CSI plugin `ControllerValidateCapabilities` method that we turn
into a CSI RPC is accumulating arguments, so moving them into a request
struct will reduce the churn of this internal API, make the plugin
code more readable, and make this method consistent with the other
plugin methods in that package.
CSI plugins can require credentials for some publishing and
unpublishing workflow RPCs. Secrets are configured at the time of
volume registration, stored in the volume struct, and then passed
around as an opaque map by Nomad to the plugins.
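A sketch of a volume registration file that carries secrets, alongside the opaque parameters and context maps. Every ID, key, and value below is a hypothetical placeholder; the keys a real plugin expects are defined by that plugin.

```hcl
id              = "ebs_prod_db1" # hypothetical volume ID
name            = "database"
type            = "csi"
external_id     = "vol-23452345" # hypothetical storage provider ID
plugin_id       = "ebs-prod"     # hypothetical plugin ID
access_mode     = "single-node-writer"
attachment_mode = "file-system"

# Stored with the volume and passed to the plugin as an opaque map.
secrets {
  example_secret = "xyzzy"
}

# Opaque, plugin-specific volume parameters.
parameters {
  skuname = "Premium_LRS"
}

# Opaque, plugin-specific volume context.
context {
  endpoint = "http://192.168.1.101:9425"
}
```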
This changeset implements a periodic garbage collection of CSI volumes
with missing allocations. This can happen in a scenario where a node
update fails partially and the allocation updates are written to raft
but the evaluations to GC the volumes are dropped. This feature will
cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1
get any stray claims cleaned up.
A few connect examples reference a fake 'test/test' image.
By replacing those with `hashicorpnomad/counter-api` we can
turn them into runnable examples.
Promote the Connect ACLs guide on the jobspec connect stanza docs
page. This was suggested in a ticket after someone got stuck not
realizing they needed to enable Consul Intentions for their connect
enabled services, which is covered in the guide.
Replace the existing top example with something that is directly
runnable on a `-dev-connect` nomad setup.
Add the _complete_ `countdash` example at the bottom in the
examples section, so that people do not need to go guide-hunting
to find a complete example. The hope is people will see more
runnable examples and be less afraid of connect.
This change replaces the top example for expose path configuration with
two new runnable examples. Users should be able to copy and paste those
jobs into a job file and run them against a basic connect enabled nomad
setup.
The example presented first demonstrates use of the service check expose
parameter with no dynamic port explicitly defined (new to 0.11.2).
This is expected to be the "90%" use case of users, and so we should
try to emphasise this pattern as best practice.
The example presented second demonstrates achieving the same goal as the
first example, but utilizing the full plumbing available through the
`connect.proxy.expose` stanza. This should help readers comprehend what
is happening "under the hood".
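The first pattern, using the check-level `expose` parameter, can be sketched roughly as follows. The service name, port, and check path are hypothetical.

```hcl
service {
  name = "count-api" # hypothetical service name
  port = "9001"

  connect {
    sidecar_service {}
  }

  check {
    # Ask Nomad to generate the expose path and its listener port
    # for this check, with no dynamic port declared explicitly.
    expose   = true
    name     = "api-health"
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "3s"
  }
}
```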
This changeset implements a periodic garbage collection of unused CSI
plugins. Plugins are self-cleaning when the last allocation for a
plugin is stopped, but this feature will cover any missing edge cases
and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins
cleaned up.
The Datadog agent was rewritten in Go as of version 6. This means
the codebase resides in a new GitHub repository. This change
updates the Nomad telemetry configuration documentation to point
to the latest repo.
This page has not been updated (yet) to reflect support for all 3 job types (service, batch, system), which shipped in 0.9.2.
The current page implies that preemption is only available for system jobs.
This is early preparation for Nomad 0.12, where we plan to move Preemption from the Enterprise feature suite to OSS for all.
* documents the scaling block in the JSON Job docs
resolves #7656
* add task-specific restart to JSON Job docs
companion to #7603
* [docs] improved and corrected scaling docs
* Update website/pages/api-docs/json-jobs.mdx
Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
* update `restart` documentation
#7288 added support for task-specific `restart` policy. This PR updates the docs to reflect that.
* added an explicit example of task-specific restart policy
* Update website/pages/docs/job-specification/restart.mdx
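A sketch of a task-specific restart policy overriding the group-level one. The group, task, and image names and all values are hypothetical.

```hcl
group "web" {
  # Group-level policy, applied to tasks without their own restart block.
  restart {
    attempts = 2
    interval = "30m"
    delay    = "15s"
    mode     = "fail"
  }

  task "server" {
    driver = "docker"

    config {
      image = "example/web:1.0" # hypothetical image
    }

    # Task-specific policy for this task only.
    restart {
      attempts = 5
      interval = "10m"
      delay    = "30s"
      mode     = "delay"
    }
  }
}
```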