Commit graph

75 commits

Author SHA1 Message Date
Derek Strickland 0a8e03f0f7
Expose Consul template configuration parameters (#11606)
This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.

- `wait`

It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza.

- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry` 
- `wait` 

Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.

- `wait_bounds`

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-01-10 10:19:07 -05:00
Tim Gross b0c3b99b03
scheduler: fix quadratic performance with spread blocks (#11712)
When the scheduler picks a node for each evaluation, the
`LimitIterator` provides at most 2 eligible nodes for the
`MaxScoreIterator` to choose from. This keeps scheduling fast while
producing acceptable results because the results are binpacked.

Jobs with a `spread` block (or node affinity) remove this limit in
order to produce correct spread scoring. This means that every
allocation within a job with a `spread` block is evaluated against
_all_ eligible nodes. Operators of large clusters have reported that
jobs with `spread` blocks that are eligible on a large number of nodes
can take longer than the nack timeout to evaluate (60s). Typical
evaluations are processed in milliseconds.

In practice, it's not necessary to evaluate every eligible node for
every allocation on large clusters, because the `RandomIterator` at
the base of the scheduler stack produces enough variation in each pass
that the likelihood of an uneven spread is negligible. Note that
feasibility is checked before the limit, so this only impacts the
number of _eligible_ nodes available for scoring, not the total number
of nodes.

This changeset sets the iterator limit for "large" `spread` block and
node affinity jobs to be equal to the number of desired
allocations. This brings an example problematic job evaluation down
from ~3min to ~10s. The included tests ensure that we have acceptable
spread results across a variety of large cluster topologies.
2021-12-21 10:10:01 -05:00
Andy Assareh 8ba4e063e2
Mesh Gateway doc enhancements (#11354)
* Mesh Gateway doc enhancements

1. I believe this line should be corrected to add mesh as one of the choices
2. I found that we are not setting this meta, and it is a required element for wan federation. I believe it would be helpful and potentially time saving to note that right here.
2021-12-20 17:10:44 -05:00
Luiz Aoqui 4b39494cd1
docs: add more references and examples to the template block (#11691) 2021-12-16 14:14:01 -05:00
Kevin Wang 3e6757f211
feat(website): extract /plugins /tools docs (#11584)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Mike Nomitch <mnomitch@hashicorp.com>
2021-12-09 14:25:18 -05:00
Shantanu Gadgil 0838678609
mention sysbatch in addition to batch (#11587) 2021-12-06 19:12:03 -05:00
Tim Gross ba038a1ebc
docs: mount_flags takes a slice of strings (#11583)
The `mount_flags` option takes a slice of strings, not a
comma-separated string like the flags passed to `mount(8)`.
2021-11-29 10:07:34 -05:00
Andy Assareh 8c638217ac
exactly one of ingress, terminating, or mesh must be configured
i believe mesh should be included in this statement was omitted.
2021-10-15 14:15:02 -07:00
Luiz Aoqui 63d1ac8939
docs: document that network mode is only supported on Linux (#11192)
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-10-01 23:17:20 -04:00
Luiz Aoqui a7872f0ba5
docs: add Nomad version requirement note for sysbatch (#11231) 2021-09-29 15:14:51 -04:00
Luiz Aoqui 8d19831247
docs: add some extra documentation around client host environment variables (#11208)
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-09-21 17:23:30 -04:00
James Rasell b5039c96a4
docs: add network.hostname job specification website entry. 2021-09-15 11:43:47 +02:00
James Rasell 4dd5c47a47
Merge pull request #11091 from hashicorp/consolidate-cni-plugins-to-1.0.0
cni: consolidate cni plugins within test install and docs to use v1.0.0
2021-08-30 09:39:39 +02:00
Mahmood Ali 53f11e0080
docs: note env and meta map assignment syntax (#11095) 2021-08-29 14:35:09 -04:00
James Rasell ec221ab792
docs: update website to detail cni plugins v1.0.0 2021-08-27 11:15:25 +02:00
Luiz Aoqui be2389c8ad
Update scaling and policy blocks documentation (#11071)
* website: update `scaling` and `policy` blocks documentation

* website: hclfmt examples in scaling block docs
2021-08-25 09:10:18 -04:00
James Rasell 75be5acb08
Merge pull request #11042 from hashicorp/docs-remove-ingress-host-port-callout
docs: Remove note on ingress gateway hosts field needing a port number
2021-08-25 12:29:59 +02:00
Mahmood Ali c37339a8c8
Merge pull request #9160 from hashicorp/f-sysbatch
core: implement system batch scheduler
2021-08-16 09:30:24 -04:00
Blake Covarrubias 0778ffab8c docs: Remove note on ingress gateway hosts field needing a port number
Update the ingress gateway documentation to remove the note stating
that a port must be specified for values in the `hosts` field when
the ingress gateway is listening on a non-standard HTTP port.

Specifying a port was required in Consul 1.8.0, but that requirement
was removed in 1.8.1 with hashicorp/consul#8190 which made Consul
include the port number when constructing the Envoy configuration.

Related Consul docs PR: hashicorp/consul#10827
2021-08-11 14:55:05 -07:00
Tim Gross de957b48ff docs: note CNI requirement for bridge networking
Using `bridge` networking requires that you have CNI plugins installed
on the client, but this isn't in the jobspec `network` docs which are
the first place someone will look when trying to configure task
networking.
2021-08-11 10:18:35 -04:00
Seth Hoenig 3371214431 core: implement system batch scheduler
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.

Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.

As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).

Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is sill
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.

Closes #2527
2021-08-03 10:30:47 -04:00
Tim Gross 5e6aca18e4
docs: unset port to field maps to dynamic port (#10828) 2021-06-28 15:55:24 -04:00
Boris Shomodjvarac 64b1cafa57
docs: update csi_plugin example (#10821)
Current efs driver does not support telling it if its a `node` or a `controller`, and it will not print any error it will just ignore all other parameters then:(
So this will result in endpoint being `/tmp/csi.sock` and not `/csi/csi.sock` which will in turn break nomad/csi integration.

Also I changed the latest image tag to v1.3.2 to make sure anybody copy pasting this example is sure that it will work.

Tested on nomad 1.1.2
2021-06-28 08:28:03 -04:00
Tim Gross ad8eb33cd7
docs: improve CSI deployment recommendations (#10798)
* add some more context to the recommendations
* add recommendations around per-AZ `plugin_id`
2021-06-22 10:23:09 -04:00
Tim Gross 74947d6591
docs: host_network does support Docker task port mapping (#10774) 2021-06-17 09:11:10 -04:00
Seth Hoenig 839c0cc360 consul/connect: fix upstream mesh gateway default mode setting
This PR fixes the API to _not_ set the default mesh gateway mode. Before,
the mode would be set to "none" in Canonicalize, which is incorrect. We
should pass through the empty string so that folks can make use of Consul
service-defaults Config entries to configure the default mode.
2021-06-04 08:53:12 -05:00
Seth Hoenig d026ff1f66 consul/connect: add support for connect mesh gateways
This PR implements first-class support for Nomad running Consul
Connect Mesh Gateways. Mesh gateways enable services in the Connect
mesh to make cross-DC connections via gateways, where each datacenter
may not have full node interconnectivity.

Consul docs with more information:
https://www.consul.io/docs/connect/gateways/mesh-gateway

The following group level service block can be used to establish
a Connect mesh gateway.

service {
  connect {
    gateway {
      mesh {
        // no configuration
      }
    }
  }
}

Services can make use of a mesh gateway by configuring so in their
upstream blocks, e.g.

service {
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "<service>"
          local_bind_port  = <port>
          datacenter       = "<datacenter>"
          mesh_gateway {
            mode = "<mode>"
          }
        }
      }
    }
  }
}

Typical use of a mesh gateway is to create a bridge between datacenters.
A mesh gateway should then be configured with a service port that is
mapped from a host_network configured on a WAN interface in Nomad agent
config, e.g.

client {
  host_network "public" {
    interface = "eth1"
  }
}

Create a port mapping in the group.network block for use by the mesh
gateway service from the public host_network, e.g.

network {
  mode = "bridge"
  port "mesh_wan" {
    host_network = "public"
  }
}

Use this port label for the service.port of the mesh gateway, e.g.

service {
  name = "mesh-gateway"
  port = "mesh_wan"
  connect {
    gateway {
      mesh {}
    }
  }
}

Currently Envoy is the only supported gateway implementation in Consul.
By default Nomad client will run the latest official Envoy docker image
supported by the local Consul agent. The Envoy task can be customized
by setting `meta.connect.gateway_image` in agent config or by setting
the `connect.sidecar_task` block.

Gateways require Consul 1.8.0+, enforced by the Nomad scheduler.

Closes #9446
2021-06-04 08:24:49 -05:00
Tim Gross 99380aa3f0 docs: clarify default check.initial_status behavior 2021-06-03 10:02:25 -04:00
James Rasell 99128e8601
docs: fix jobspec hcl2 locals example. 2021-05-21 15:20:46 +02:00
Ahmed 8d41e22405 Update service.mdx 2021-05-17 15:41:50 -04:00
Mahmood Ali abf6418976
add a section about memory oversubscription (#10573)
add a section about memory oversubscription

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-05-13 13:35:51 -04:00
Mike Noordermeer 2445bece66
docs: clarify that a default update strategy is used when update strategy is omitted 2021-05-10 08:27:22 -04:00
Tim Gross 1fdb4c1511 documentation for disable_default_tcp_check 2021-05-07 13:16:39 -04:00
Mahmood Ali 488cd1e336 annotate 1.1 beta fields 2021-05-07 10:21:16 -04:00
Nick Ethier 42bbe6978d website: reserved cores docs 2021-05-05 08:11:41 -04:00
Andy Assareh 1616f80211 git example - suggest providing real repo
when troubleshooting it is better if this command will actually work (pointing to a real repository)
2021-05-03 08:12:10 -04:00
Mahmood Ali ba49661198
Docs memory oversubscription (#10478)
* update docs

* document memory_oversubscription_enabled scheduler config
2021-04-30 14:07:56 -04:00
Nick Spain 04e3132b4b Document usage of 'body' field 2021-04-13 09:15:35 -04:00
Tim Gross fd3d443863 docs: clean up explanation of volume per_alloc 2021-04-09 11:32:00 -04:00
Tim Gross 0892d34ff9 CSI: capability block is required for volume registration 2021-04-08 13:02:24 -04:00
Tim Gross 70f5363a89 docs: update CSI create/register fields
Add new `access_mode`/`attachment_mode` fields. Make it more clear which set
of fields belong to create vs register. Update the example spec that's
generated by `volume init`.
2021-04-07 11:24:09 -04:00
Seth Hoenig f17ba33f61 consul: plubming for specifying consul namespace in job/group
This PR adds the common OSS changes for adding support for Consul Namespaces,
which is going to be a Nomad Enterprise feature. There is no new functionality
provided by this changeset and hopefully no new bugs.
2021-04-05 10:03:19 -06:00
Bryce Kalow a6ca40fa4e
feat(website): migrates to new nav data format (#10264) 2021-03-31 08:43:17 -05:00
Tim Gross fa25e048b2
CSI: unique volume per allocation
Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.

Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
2021-03-18 15:35:11 -04:00
Tim Gross 0b467a1e44 docs: clarify HCL is parsed in CLI 2021-03-15 15:41:43 -04:00
Luiz Aoqui 944887a684
docs: add ports to jobspec overview example 2021-03-12 13:36:39 -05:00
Tim Gross 4c4ea23e6f docs/artifact: clarify git:: prefix usage for private repos 2021-03-08 10:55:32 -05:00
Andre Ilhicas f45fc6c899
consul/connect: enable setting local_bind_address in upstream 2021-02-26 11:37:31 +00:00
Drew Bailey 86d9e1ff90
Merge pull request #9955 from hashicorp/on-update-services
Service and Check on_update configuration option (readiness checks)
2021-02-24 10:11:05 -05:00
Tim Gross b764f52ab9
deploymentwatcher: reset progress deadline on promotion (#10042)
In a deployment with two groups (ex. A and B), if group A's canary becomes
healthy before group B's, the deadline for the overall deployment will be set
to that of group A. When the deployment is promoted, if group A is done it
will not contribute to the next deadline cutoff. Group B's old deadline will
be used instead, which will be in the past and immediately trigger a
deployment progress failure. Reset the progress deadline when the job is
promotion to avoid this bug, and to better conform with implicit user
expectations around how the progress deadline should interact with promotions.
2021-02-22 16:44:03 -05:00