Commit graph

174 commits

Author SHA1 Message Date
Seth Hoenig 3c17dc2ecc api: safely access legacy MBits field 2020-11-23 10:36:10 -06:00
Seth Hoenig bb8a5816a0 jobspec: add support for headers in artifact stanza
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.

The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.

Closes #9306
2020-11-13 12:03:54 -06:00
Mahmood Ali 69849a42a5
Merge pull request #9298 from hashicorp/f-hcl2-localsvars
HCL2: Variables and Locals
2020-11-09 16:44:37 -05:00
Mahmood Ali 1ae3e8a8eb Start using the new jobspec2 API 2020-11-09 15:01:31 -05:00
Luiz Aoqui c536286c7a
remove Namespace field from JobScaleRequest 2020-11-09 13:02:05 -05:00
Chris Baker 719077a26d added new policy capabilities for recommendations API
state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling polices in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type
2020-10-28 14:32:16 +00:00
Mahmood Ali f52bda4c30 api: update /render api to parse hclv2 2020-10-21 15:46:57 -04:00
Tim Gross b77fe023b5
MRD: move 'job stop -global' handling into RPC (#8776)
The initial implementation of global job stop for MRD looped over all the
regions in the CLI for expedience. This changeset includes the OSS parts of
moving this into the RPC layer so that API consumers don't have to implement
this logic themselves.
2020-08-28 14:28:13 -04:00
Seth Hoenig dfe179abc5 consul/connect: fixup some comments and context timeout 2020-08-26 13:17:16 -05:00
Seth Hoenig c4fa644315 consul/connect: remove envoy dns option from gateway proxy config 2020-08-24 09:11:55 -05:00
Seth Hoenig 5b072029f2 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
Tim Gross 22e77bb03c
mrd: remove redundant validation in HTTP endpoint (#8685)
The `regionForJob` function in the HTTP job endpoint overrides the region for
multiregion jobs to `global`, which is used as a sentinel value in the
server's job endpoint to avoid re-registration loops. This changeset removes
an extraneous check that results in errors in the web UI and makes
round-tripping through the HTTP API cumbersome for all consumers.
2020-08-18 16:48:09 -04:00
Seth Hoenig fd4804bf26 consul: able to set pass/fail thresholds on consul service checks
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. This is a
feature added to Consul v1.7.0 and later.
  https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical

Nomad doesn't do much besides pass the fields through to Consul.

Fixes #6913
2020-08-10 14:08:09 -05:00
Drew Bailey b296558b8e
oss compoments for multi-vault namespaces
adds in oss components to support enterprise multi-vault namespace feature

upgrade specific doc on vault multi-namespaces

vault docs

update test to reflect new error
2020-07-24 10:14:59 -04:00
James Rasell 2da8bd8f58
agent: task groups in system jobs do not support scaling stanzas. 2020-07-22 11:10:59 +02:00
Tim Gross 18250f71fd
fix region flag vs job region handling in plan/submit (#8347) 2020-07-06 15:46:09 -04:00
Tim Gross a449009e9f
multiregion validation fixes (#8265)
Multi-region jobs need to bypass validating counts otherwise we get spurious
warnings in Job.Plan.
2020-06-24 12:18:51 -04:00
Seth Hoenig 6c5ab7f45e consul/connect: split connect native flag and task in service 2020-06-23 10:22:22 -05:00
Nick Ethier f0559a8162
multi-interface network support 2020-06-19 09:42:10 -04:00
Mahmood Ali 38a01c050e
Merge pull request #8192 from hashicorp/f-status-allnamespaces-2
CLI Allow querying all namespaces for jobs and allocations - Try 2
2020-06-18 20:16:52 -04:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Mahmood Ali e784fe331a use '*' to indicate all namespaces
This reverts the introduction of AllNamespaces parameter that was merged
earlier but never got released.
2020-06-17 16:27:43 -04:00
Tim Gross 7b12445f29 multiregion: change AutoRevert to OnFailure 2020-06-17 11:05:45 -04:00
Tim Gross b09b7a2475 Multiregion job registration
Integration points for multiregion jobs to be registered in the enterprise
version of Nomad:
* hook in `Job.Register` for enterprise to send job to peer regions
* remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs
2020-06-17 11:04:58 -04:00
Drew Bailey 9263fcb0d3 Multiregion deploy status and job status CLI 2020-06-17 11:03:34 -04:00
Tim Gross 6851024925 Multiregion structs
Initial struct definitions, jobspec parsing, validation, and conversion
between Nomad structs and API structs for multi-region deployments.
2020-06-17 11:00:14 -04:00
Chris Baker 1e3563e08c wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register 2020-06-16 18:45:17 +00:00
Mahmood Ali 5ab2d52e27 endpoint to expose all jobs across all namespaces
Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all
namespaces.  The returned list is to contain a `Namespace` field
indicating the job namespace.

If ACL is enabled, the request token needs to be a management token or
have `namespace:list-jobs` capability on all existing namespaces.
2020-05-18 13:50:46 -04:00
Lang Martin d3c4700cd3
server: stop after client disconnect (#7939)
* jobspec, api: add stop_after_client_disconnect

* nomad/state/state_store: error message typo

* structs: alloc methods to support stop_after_client_disconnect

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

* scheduler/reconcile_util: delayByStopAfterClientDisconnect

* scheduler/reconcile: use delayByStopAfterClientDisconnect

* scheduler/util: updateNonTerminalAllocsToLost comments

This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now

* scheduler/testing: AssertEvalStatus just use require

* scheduler/generic_sched: don't create a blocked eval if delayed

* scheduler/generic_sched_test: several scheduling cases
2020-05-13 16:39:04 -04:00
Chris Baker 40d6b3bbd1 adding raft and state_store support to track job scaling events
updated ScalingEvent API to record "message string,error bool" instead
of confusing "reason,error *string"
2020-04-01 16:15:14 +00:00
Seth Hoenig 14c7cebdea connect: enable automatic expose paths for individual group service checks
Part of #6120

Building on the support for enabling connect proxy paths in #7323, this change
adds the ability to configure the 'service.check.expose' flag on group-level
service check definitions for services that are connect-enabled. This is a slight
deviation from the "magic" that Consul provides. With Consul, the 'expose' flag
exists on the connect.proxy stanza, which will then auto-generate expose paths
for every HTTP and gRPC service check associated with that connect-enabled
service.

A first attempt at providing similar magic for Nomad's Consul Connect integration
followed that pattern exactly, as seen in #7396. However, on reviewing the PR
we realized having the `expose` flag on the proxy stanza inseperably ties together
the automatic path generation with every HTTP/gRPC defined on the service. This
makes sense in Consul's context, because a service definition is reasonably
associated with a single "task". With Nomad's group level service definitions
however, there is a reasonable expectation that a service definition is more
abstractly representative of multiple services within the task group. In this
case, one would want to define checks of that service which concretely make HTTP
or gRPC requests to different underlying tasks. Such a model is not possible
with the course `proxy.expose` flag.

Instead, we now have the flag made available within the check definitions themselves.
By making the expose feature resolute to each check, it is possible to have
some HTTP/gRPC checks which make use of the envoy exposed paths, as well as
some HTTP/gRPC checks which make use of some orthongonal port-mapping to do
checks on some other task (or even some other bound port of the same task)
within the task group.

Given this example,

group "server-group" {
  network {
    mode = "bridge"
    port "forchecks" {
      to = -1
    }
  }

  service {
    name = "myserver"
    port = 2000

    connect {
      sidecar_service {
      }
    }

    check {
      name     = "mycheck-myserver"
      type     = "http"
      port     = "forchecks"
      interval = "3s"
      timeout  = "2s"
      method   = "GET"
      path     = "/classic/responder/health"
      expose   = true
    }
  }
}

Nomad will automatically inject (via job endpoint mutator) the
extrapolated expose path configuration, i.e.

expose {
  path {
    path            = "/classic/responder/health"
    protocol        = "http"
    local_path_port = 2000
    listener_port   = "forchecks"
  }
}

Documentation is coming in #7440 (needs updating, doing next)

Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6
which will make the examples in the documentation actually runnable.

Will add some e2e tests based on the above when it becomes available.
2020-03-31 17:15:50 -06:00
Seth Hoenig 41244c5857 jobspec: parse multi expose.path instead of explicit slice 2020-03-31 17:15:27 -06:00
Seth Hoenig 0266f056b8 connect: enable proxy.passthrough configuration
Enable configuration of HTTP and gRPC endpoints which should be exposed by
the Connect sidecar proxy. This changeset is the first "non-magical" pass
that lays the groundwork for enabling Consul service checks for tasks
running in a network namespace because they are Connect-enabled. The changes
here provide for full configuration of the

  connect {
    sidecar_service {
      proxy {
        expose {
          paths = [{
		path = <exposed endpoint>
                protocol = <http or grpc>
                local_path_port = <local endpoint port>
                listener_port = <inbound mesh port>
	  }, ... ]
       }
    }
  }

stanza. Everything from `expose` and below is new, and partially implements
the precedent set by Consul:
  https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference

Combined with a task-group level network port-mapping in the form:

  port "exposeExample" { to = -1 }

it is now possible to "punch a hole" through the network namespace
to a specific HTTP or gRPC path, with the anticipated use case of creating
Consul checks on Connect enabled services.

A future PR may introduce more automagic behavior, where we can do things like

1) auto-fill the 'expose.path.local_path_port' with the default value of the
   'service.port' value for task-group level connect-enabled services.

2) automatically generate a port-mapping

3) enable an 'expose.checks' flag which automatically creates exposed endpoints
   for every compatible consul service check (http/grpc checks on connect
   enabled services).
2020-03-31 17:15:27 -06:00
Mahmood Ali ceed57b48f per-task restart policy 2020-03-24 17:00:41 -04:00
Chris Baker 233db5258a changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults.
- tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent)
- Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max

modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and
2020-03-24 13:57:17 +00:00
Chris Baker 00092a6c29 fixed http endpoints for job.register and job.scalestatus 2020-03-24 13:57:16 +00:00
Chris Baker 925b59e1d2 wip: scaling status return, almost done 2020-03-24 13:57:15 +00:00
Chris Baker 42270d862c wip: some tests still failing
updating job scaling endpoints to match RFC, cleaning up the API object as well
2020-03-24 13:57:14 +00:00
Chris Baker abc7a52f56 finished refactoring state store, schema, etc 2020-03-24 13:57:14 +00:00
Chris Baker 3d54f1feba wip: added Enabled to ScalingPolicyListStub, removed JobID from body of scaling request 2020-03-24 13:57:12 +00:00
Chris Baker 024d203267 wip: added tests for client methods around group scaling 2020-03-24 13:57:11 +00:00
Chris Baker 1c5c2eb71b wip: add GET endpoint for job group scaling target 2020-03-24 13:57:10 +00:00
Chris Baker 179ab68258 wip: added job.scale rpc endpoint, needs explicit test (tested via http now) 2020-03-24 13:57:09 +00:00
Chris Baker 8453e667c2 wip: working on job group scaling endpoint 2020-03-24 13:55:20 +00:00
Chris Baker 6665d0bfb0 wip: added policy get endpoint, added UUID to policy 2020-03-24 13:55:20 +00:00
Chris Baker 9c2560ceeb wip: upsert/delete scaling policies on job upsert/delete 2020-03-24 13:55:18 +00:00
Chris Baker 65d92f1fbf WIP: adding ScalingPolicy to api/structs and state store 2020-03-24 13:55:18 +00:00
Lang Martin e100444740 csi: add mount_options to volumes and volume requests (#7398)
Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to the NodePublishVolume, which causes the node plugin to actually mount the volume on the host.

Individual tasks just mount bind into the host mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context.

closes #7007

* nomad/structs/volumes: add MountOptions to volume request

* jobspec/test-fixtures/basic.hcl: add mount_options to volume block

* jobspec/parse_test: add expected MountOptions

* api/tasks: add mount_options

* jobspec/parse_group: use hcl decode not mapstructure, mount_options

* client/allocrunner/csi_hook: pass MountOptions through

client/allocrunner/csi_hook: add a VolumeMountOptions

client/allocrunner/csi_hook: drop Options

client/allocrunner/csi_hook: use the structs options

* client/pluginmanager/csimanager/interface: UsageOptions.MountOptions

* client/pluginmanager/csimanager/volume: pass MountOptions in capabilities

* plugins/csi/plugin: remove todo 7007 comment

* nomad/structs/csi: MountOptions

* api/csi: add options to the api for parsing, match structs

* plugins/csi/plugin: move VolumeMountOptions to structs

* api/csi: use specific type for mount_options

* client/allocrunner/csi_hook: merge MountOptions here

* rename CSIOptions to CSIMountOptions

* client/allocrunner/csi_hook

* client/pluginmanager/csimanager/volume

* nomad/structs/csi

* plugins/csi/fake/client: add PrevVolumeCapability

* plugins/csi/plugin

* client/pluginmanager/csimanager/volume_test: remove debugging

* client/pluginmanager/csimanager/volume: fix odd merging logic

* api: rename CSIOptions -> CSIMountOptions

* nomad/csi_endpoint: remove a 7007 comment

* command/alloc_status: show mount options in the volume list

* nomad/structs/csi: include MountOptions in the volume stub

* api/csi: add MountOptions to stub

* command/volume_status_csi: clean up csiVolMountOption, add it

* command/alloc_status: csiVolMountOption lives in volume_csi_status

* command/node_status: display mount flags

* nomad/structs/volumes: npe

* plugins/csi/plugin: npe in ToCSIRepresentation

* jobspec/parse_test: expand volume parse test cases

* command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions

* command/volume_status_csi: copy paste error

* jobspec/test-fixtures/basic: hclfmt

* command/volume_status_csi: clean up csiVolMountOption
2020-03-23 13:59:25 -04:00
Danielle Lancashire 15c6c05ccf api: Parse CSI Volumes
Previously when deserializing volumes we skipped over volumes that were
not of type `host`. This commit ensures that we parse both host and csi
volumes correctly.
2020-03-23 13:58:30 -04:00
Danielle Lancashire 426c26d7c0 CSI Plugin Registration (#6555)
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:

* A `csi_plugin` stanza as part of a Nomad task configuration, to
  allow a task to expose that it is a plugin.

* A new task runner hook: `csi_plugin_supervisor`. This hook does two
  things. When the `csi_plugin` stanza is detected, it will
  automatically configure the plugin task to receive bidirectional
  mounts to the CSI intermediary directory. At runtime, it will then
  perform an initial heartbeat of the plugin and handle submitting it to
  the new `dynamicplugins.Registry` for further use by the client, and
  then run a lightweight heartbeat loop that will emit task events
  when health changes.

* The `dynamicplugins.Registry` for handling plugins that run
  as Nomad tasks, in contrast to the existing catalog that requires
  `go-plugin` type plugins and to know the plugin configuration in
  advance.

* The `csimanager` which fingerprints CSI plugins, in a similar way to
  `drivermanager` and `devicemanager`. It currently only fingerprints
  the NodeID from the plugin, and assumes that all plugins are
  monolithic.

Missing features

* We do not use the live updates of the `dynamicplugin` registry in
  the `csimanager` yet.

* We do not deregister the plugins from the client when they shutdown
  yet, they just become indefinitely marked as unhealthy. This is
  deliberate until we figure out how we should manage deploying new
  versions of plugins/transitioning them.
2020-03-23 13:58:28 -04:00