Commit Graph

3202 Commits

Author SHA1 Message Date
Mahmood Ali 37c0dbcfe6 fix codegen for ugorji/go
When generating ugorji/go package, we should use
github.com/hashicorp/go-msgpack/codec instead.

Also fix the reference for codegen_generated
2020-03-31 21:30:21 -04:00
Seth Hoenig 9880e798bf docs: note why check.Expose is not part of check.Hash 2020-03-31 17:15:50 -06:00
Seth Hoenig 14c7cebdea connect: enable automatic expose paths for individual group service checks
Part of #6120

Building on the support for enabling connect proxy paths in #7323, this change
adds the ability to configure the 'service.check.expose' flag on group-level
service check definitions for services that are connect-enabled. This is a slight
deviation from the "magic" that Consul provides. With Consul, the 'expose' flag
exists on the connect.proxy stanza, which will then auto-generate expose paths
for every HTTP and gRPC service check associated with that connect-enabled
service.

A first attempt at providing similar magic for Nomad's Consul Connect integration
followed that pattern exactly, as seen in #7396. However, on reviewing the PR
we realized having the `expose` flag on the proxy stanza inseparably ties together
the automatic path generation with every HTTP/gRPC check defined on the service. This
makes sense in Consul's context, because a service definition is reasonably
associated with a single "task". With Nomad's group level service definitions
however, there is a reasonable expectation that a service definition is more
abstractly representative of multiple services within the task group. In this
case, one would want to define checks of that service which concretely make HTTP
or gRPC requests to different underlying tasks. Such a model is not possible
with the coarse `proxy.expose` flag.

Instead, we now have the flag made available within the check definitions themselves.
By scoping the expose feature to each individual check, it is possible to have
some HTTP/gRPC checks which make use of the envoy exposed paths, as well as
some HTTP/gRPC checks which make use of some orthogonal port-mapping to do
checks on some other task (or even some other bound port of the same task)
within the task group.

Given this example,

group "server-group" {
  network {
    mode = "bridge"
    port "forchecks" {
      to = -1
    }
  }

  service {
    name = "myserver"
    port = 2000

    connect {
      sidecar_service {
      }
    }

    check {
      name     = "mycheck-myserver"
      type     = "http"
      port     = "forchecks"
      interval = "3s"
      timeout  = "2s"
      method   = "GET"
      path     = "/classic/responder/health"
      expose   = true
    }
  }
}

Nomad will automatically inject (via job endpoint mutator) the
extrapolated expose path configuration, i.e.

expose {
  path {
    path            = "/classic/responder/health"
    protocol        = "http"
    local_path_port = 2000
    listener_port   = "forchecks"
  }
}
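The extrapolation from a check to an expose path can be sketched roughly as follows. This is a minimal illustration, not Nomad's mutator: the `Check`, `Service`, and `ExposePath` types and the `exposePathsFor` function are hypothetical, and the field mapping (check path to `path`, check type to `protocol`, service port to `local_path_port`, check port label to `listener_port`) is inferred from the example above.

```go
package main

import "fmt"

// Check is a hypothetical, trimmed-down group service check.
type Check struct {
	Name      string
	Type      string // only "http" and "grpc" are eligible for expose
	PortLabel string // port label such as "forchecks"
	Path      string
	Expose    bool
}

// Service is a hypothetical, trimmed-down connect-enabled group service.
type Service struct {
	Name   string
	Port   int
	Checks []Check
}

// ExposePath mirrors the fields of the expose.path block shown above.
type ExposePath struct {
	Path          string
	Protocol      string
	LocalPathPort int
	ListenerPort  string
}

// exposePathsFor derives one expose path per check that sets expose = true.
// Checks of other types are skipped here; a real implementation might
// reject them during validation instead.
func exposePathsFor(s Service) []ExposePath {
	var paths []ExposePath
	for _, c := range s.Checks {
		if !c.Expose || (c.Type != "http" && c.Type != "grpc") {
			continue
		}
		paths = append(paths, ExposePath{
			Path:          c.Path,
			Protocol:      c.Type,
			LocalPathPort: s.Port,
			ListenerPort:  c.PortLabel,
		})
	}
	return paths
}

func main() {
	svc := Service{
		Name: "myserver",
		Port: 2000,
		Checks: []Check{{
			Name:      "mycheck-myserver",
			Type:      "http",
			PortLabel: "forchecks",
			Path:      "/classic/responder/health",
			Expose:    true,
		}},
	}
	for _, p := range exposePathsFor(svc) {
		// {Path:/classic/responder/health Protocol:http LocalPathPort:2000 ListenerPort:forchecks}
		fmt.Printf("%+v\n", p)
	}
}
```

Keeping the flag per check means a service can mix exposed and non-exposed checks, which is the motivation given above.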

Documentation is coming in #7440 (needs updating, doing next)

Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6
will make the examples in the documentation actually runnable.

Will add some e2e tests based on the above when it becomes available.
2020-03-31 17:15:50 -06:00
Seth Hoenig 0266f056b8 connect: enable proxy.passthrough configuration
Enable configuration of HTTP and gRPC endpoints which should be exposed by
the Connect sidecar proxy. This changeset is the first "non-magical" pass
that lays the groundwork for enabling Consul service checks for tasks
running in a network namespace because they are Connect-enabled. The changes
here provide for full configuration of the

  connect {
    sidecar_service {
      proxy {
        expose {
          paths = [{
            path            = <exposed endpoint>
            protocol        = <http or grpc>
            local_path_port = <local endpoint port>
            listener_port   = <inbound mesh port>
          }, ... ]
        }
      }
    }
  }

stanza. Everything from `expose` and below is new, and partially implements
the precedent set by Consul:
  https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference

Combined with a task-group level network port-mapping in the form:

  port "exposeExample" { to = -1 }

it is now possible to "punch a hole" through the network namespace
to a specific HTTP or gRPC path, with the anticipated use case of creating
Consul checks on Connect enabled services.

A future PR may introduce more automagic behavior, where we can do things like

1) auto-fill the 'expose.path.local_path_port' with the default value of the
   'service.port' value for task-group level connect-enabled services.

2) automatically generate a port-mapping

3) enable an 'expose.checks' flag which automatically creates exposed endpoints
   for every compatible consul service check (http/grpc checks on connect
   enabled services).
2020-03-31 17:15:27 -06:00
Lang Martin e03c328792
csi: use node MaxVolumes during scheduling (#7565)
* nomad/state/state_store: CSIVolumesByNodeID ignores namespace

* scheduler/scheduler: add CSIVolumesByNodeID to the state interface

* scheduler/feasible: check node MaxVolumes

* nomad/csi_endpoint: no namespace in CSIVolumesByNodeID anymore

* nomad/state/state_store: avoid DenormalizeAllocationSlice

* nomad/state/iterator: clean up SliceIterator Next

* scheduler/feasible_test: block with MaxVolumes

* nomad/state/state_store_test: fix args to CSIVolumesByNodeID
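The `scheduler/feasible` check amounts to comparing the volumes already claimed on a node against the maximum its node plugin reports. A minimal sketch with hypothetical names (`nodeCSI`, `feasible`); it assumes the limit is tracked per plugin and that the caller has already counted in-use volumes per plugin, which is the role `CSIVolumesByNodeID` plays above.

```go
package main

import "fmt"

// nodeCSI is a hypothetical per-plugin view of one client node.
type nodeCSI struct {
	MaxVolumes map[string]int // plugin ID -> max volumes reported by the node plugin
}

// feasible reports whether placing the requested volumes keeps the node within
// each plugin's MaxVolumes. inUse holds the volumes already claimed on the node
// per plugin.
func feasible(n nodeCSI, inUse, requested map[string]int) (bool, string) {
	for plugin, want := range requested {
		limit, ok := n.MaxVolumes[plugin]
		if !ok {
			return false, fmt.Sprintf("CSI plugin %q is not running on node", plugin)
		}
		if inUse[plugin]+want > limit {
			return false, fmt.Sprintf("CSI plugin %q has reached MaxVolumes (%d)", plugin, limit)
		}
	}
	return true, ""
}

func main() {
	node := nodeCSI{MaxVolumes: map[string]int{"ebs": 2}}
	ok, _ := feasible(node, map[string]int{"ebs": 1}, map[string]int{"ebs": 1})
	fmt.Println(ok) // true
	ok, reason := feasible(node, map[string]int{"ebs": 2}, map[string]int{"ebs": 1})
	fmt.Println(ok, reason) // false CSI plugin "ebs" has reached MaxVolumes (2)
}
```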
2020-03-31 17:16:47 -04:00
Lang Martin 8d4f39fba1
csi: add node events to report progress mounting and unmounting volumes (#7547)
* nomad/structs/structs: new NodeEventSubsystemCSI

* client/client: pass triggerNodeEvent in the CSIConfig

* client/pluginmanager/csimanager/instance: add eventer to instanceManager

* client/pluginmanager/csimanager/manager: pass triggerNodeEvent

* client/pluginmanager/csimanager/volume: node event on [un]mount

* nomad/structs/structs: use storage, not CSI

* client/pluginmanager/csimanager/volume: use storage, not CSI

* client/pluginmanager/csimanager/volume_test: eventer

* client/pluginmanager/csimanager/volume: event on error

* client/pluginmanager/csimanager/volume_test: check event on error

* command/node_status: remove an extra space in event detail format

* client/pluginmanager/csimanager/volume: use snake_case for details

* client/pluginmanager/csimanager/volume_test: snake_case details
2020-03-31 17:13:52 -04:00
Yoan Blanc 225c9c1215 fixup! vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:48:07 -04:00
Yoan Blanc 761d014071 vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:45:21 -04:00
Michael Schurter 464dae514c test: assert HostVolumes included in ListNodes 2020-03-30 17:34:44 -07:00
Michael Lange 4707a625d6 Add HostVolumes to the NodeListStub 2020-03-30 17:33:43 -07:00
Seth Hoenig b3664c628c
Merge pull request #7524 from hashicorp/docs-consul-acl-minimums
consul: annotate Consul interfaces with ACLs
2020-03-30 13:27:27 -06:00
Seth Hoenig 0a812ab689 consul: annotate Consul interfaces with ACLs 2020-03-30 10:17:28 -06:00
Tim Gross 54b3573fc9
state: support snapshot of CSI plugin and volume tables (#7546)
The `csi_plugins` and `csi_volumes` tables were missing support for
snapshot persist and restore. This means restoring a snapshot would
result in missing information for CSI.
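The failure mode is that any table not written during persist is silently absent after restore. A minimal round-trip sketch of the idea; the `persist`/`restore` functions and the use of `encoding/json` are assumptions chosen to keep the example self-contained, since Nomad's FSM snapshots use their own encoding and per-table logic.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// record is one snapshot entry tagged with the table it came from.
type record struct {
	Table string          `json:"table"`
	Row   json.RawMessage `json:"row"`
}

// persist writes every row of every table as a stream of JSON records.
func persist(tables map[string][]any) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for name, rows := range tables {
		for _, r := range rows {
			raw, err := json.Marshal(r)
			if err != nil {
				return nil, err
			}
			if err := enc.Encode(record{Table: name, Row: raw}); err != nil {
				return nil, err
			}
		}
	}
	return buf.Bytes(), nil
}

// restore reads the stream back, regrouping rows by table name.
func restore(data []byte) (map[string][]any, error) {
	out := map[string][]any{}
	dec := json.NewDecoder(bytes.NewReader(data))
	for {
		var rec record
		err := dec.Decode(&rec)
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		var row any
		if err := json.Unmarshal(rec.Row, &row); err != nil {
			return nil, err
		}
		out[rec.Table] = append(out[rec.Table], row)
	}
}

func main() {
	state := map[string][]any{
		"csi_plugins": {map[string]any{"ID": "ebs"}},
		"csi_volumes": {map[string]any{"ID": "vol-1"}, map[string]any{"ID": "vol-2"}},
	}
	data, err := persist(state)
	if err != nil {
		panic(err)
	}
	restored, err := restore(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(restored["csi_plugins"]), len(restored["csi_volumes"])) // 1 2
}
```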
2020-03-30 11:17:16 -04:00
Drew Bailey a98dc8c768
update audit examples to an endpoint that is audited 2020-03-30 10:03:11 -04:00
Mahmood Ali e76ff9f679
Merge pull request #7543 from hashicorp/test-flakiness-20200330_1
Test flakiness fixes - 2020-03-30 Edition
2020-03-30 09:26:26 -04:00
Mahmood Ali 57bebfdb5c tests: avoid logging after test completion 2020-03-30 09:08:34 -04:00
Mahmood Ali 13381448e0 avoid logging in draining job watcher
In tests where the logger is a test logger, emitting a trace log in a
background thread while it's shutting down may trigger a panic. Thus,
avoid logging Trace if err != nil. Note that we already log an error
when err isn't a trace.

This fixes cases where tests panic with a trace like:

```
panic: Log in goroutine after TestAllocGarbageCollector_MakeRoomFor_MaxAllocs has completed

goroutine 30 [running]:
testing.(*common).logDepth(0xc000aa9e60, 0xc000c4a000, 0xab, 0x3)
        /usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:680 +0x4d3
testing.(*common).log(...)
        /usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:662
testing.(*common).Logf(0xc000aa9e60, 0x690b941, 0x4, 0xc001366c00, 0x2, 0x2)
        /usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:701 +0x7e
github.com/hashicorp/nomad/helper/testlog.(*writer).Write(0xc000a82a60, 0xc0000b48c0, 0xab, 0x13f, 0x0, 0x0, 0x0)
        /Users/notnoop/go/src/github.com/hashicorp/nomad/helper/testlog/testlog.go:34 +0x106
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*writer).Flush(0xc000a80900, 0xbf9870f000000001, 0x20a87556e, 0x8b12bc0)
        /Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/writer.go:29 +0x14f
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*intLogger).log(0xc000e2c180, 0xc0003b6880, 0x17, 0x1, 0x6974edc, 0x22, 0xc000db57a0, 0x6, 0x6)
        /Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/intlogger.go:139 +0x15d
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*intLogger).Trace(0xc000e2c180, 0x6974edc, 0x22, 0xc000db57a0, 0x6, 0x6)
        /Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/intlogger.go:446 +0x7a
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*interceptLogger).Trace(0xc0002f1ad0, 0x6974edc, 0x22, 0xc000db57a0, 0x6, 0x6)
        /Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/interceptlogger.go:48 +0x9c
github.com/hashicorp/nomad/nomad/drainer.(*drainingJobWatcher).watch(0xc0002f2380)
        /Users/notnoop/go/src/github.com/hashicorp/nomad/nomad/drainer/watch_jobs.go:147 +0x1125
created by github.com/hashicorp/nomad/nomad/drainer.NewDrainingJobWatcher
        /Users/notnoop/go/src/github.com/hashicorp/nomad/nomad/drainer/watch_jobs.go:89 +0x1e3
FAIL    github.com/hashicorp/nomad/client       10.605s
FAIL
```
2020-03-30 07:06:53 -04:00
Mahmood Ali 36ad8ee2e0 tests: add debugging for TestAutopilot_RollingUpdate 2020-03-30 07:06:53 -04:00
Chris Baker d6287c43b9 clean up some tests 2020-03-29 23:38:36 +00:00
Chris Baker 5e3c38be2f state_store:
* added method to retrieve all scaling policies for use in snapshotting, plus test
* better testing for ScalingPoliciesByNamespace
* added scaling policy snapshot persist and restore (and test of restore)

manually tested snapshot restore.

resolves #7539
2020-03-29 13:32:44 +00:00
Lang Martin 50ff9ccd44
csi: plugin deregistration on plugin job GC (#7502)
* nomad/structs/csi: delete just one plugin type from a node

* nomad/structs/csi: add DeleteAlloc

* nomad/state/state_store: add deleteJobFromPlugin

* nomad/state/state_store: use DeleteAlloc not DeleteNodeType

* move CreateTestCSIPlugin to state to avoid an import cycle

* nomad/state/state_store_test: delete a plugin by deleting its jobs

* nomad/*_test: move CreateTestCSIPlugin to state

* nomad/state/state_store: update one plugin per transaction

* command/plugin_status_test: move CreateTestCSIPlugin

* nomad: csi: handle nils CSIPlugin methods, clarity
2020-03-26 17:07:18 -04:00
Lang Martin 3375c92aa0
csi: make volume registration idempotent (#7490)
If not in use and not changing external ids, it should not be an error to register a volume again.

* nomad/state/state_store: make volume registration idempotent
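The rule stated above (re-registering is not an error when the volume is not in use and its external ID is unchanged) can be sketched as follows. The `volume` type, its `Claims` counter, and the `register` function over an in-memory map are hypothetical stand-ins for the state store.

```go
package main

import "fmt"

// volume is a hypothetical, trimmed-down CSI volume record.
type volume struct {
	ID         string
	ExternalID string
	Claims     int // allocations currently claiming the volume
}

func (v *volume) inUse() bool { return v.Claims > 0 }

// register upserts v, refusing only re-registrations that are unsafe:
// the existing volume is in use, or the external ID would change.
func register(vols map[string]*volume, v *volume) error {
	if old, ok := vols[v.ID]; ok {
		if old.inUse() {
			return fmt.Errorf("volume %s exists and is in use", v.ID)
		}
		if old.ExternalID != v.ExternalID {
			return fmt.Errorf("volume %s exists with a different external ID", v.ID)
		}
	}
	vols[v.ID] = v
	return nil
}

func main() {
	vols := map[string]*volume{}
	fmt.Println(register(vols, &volume{ID: "vol-1", ExternalID: "ext-1"})) // <nil>
	fmt.Println(register(vols, &volume{ID: "vol-1", ExternalID: "ext-1"})) // <nil>: idempotent
	fmt.Println(register(vols, &volume{ID: "vol-1", ExternalID: "ext-2"}))
}
```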
2020-03-26 12:27:19 -04:00
Lang Martin ea80330aaa
csi: nomad/structs: test volume denormalize without plugin (#7472) 2020-03-26 09:43:59 -04:00
Mahmood Ali b33dbe539b tests: TestCSIPluginEndpoint_ACLNamespaceAlloc is ent
TestCSIPluginEndpoint_ACLNamespaceAlloc uses namespace features not
present in OSS.
2020-03-25 08:45:44 -04:00
Mahmood Ali 281fc9837c tests: relax index checks
TestStateStore_Indexes specifically tests for `nodes` index, but asserts
on the exact number of indexes present in the state. This is fragile
and will break almost every time we add a state index.
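The relaxed form looks up the one index the test cares about instead of asserting on the total count, so adding tables no longer breaks it. A minimal sketch; `indexEntry` and `findIndex` are hypothetical stand-ins for the state store's index table and the test's lookup.

```go
package main

import "fmt"

// indexEntry is a hypothetical row of the state store's index table.
type indexEntry struct {
	Key   string
	Value uint64
}

// findIndex returns the entry for key, ignoring any other indexes present.
func findIndex(entries []indexEntry, key string) (indexEntry, bool) {
	for _, e := range entries {
		if e.Key == key {
			return e, true
		}
	}
	return indexEntry{}, false
}

func main() {
	entries := []indexEntry{{"nodes", 1000}, {"csi_volumes", 12}}
	e, ok := findIndex(entries, "nodes")
	fmt.Println(ok, e.Value) // true 1000
}
```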
2020-03-25 08:45:38 -04:00
Mahmood Ali ceed57b48f per-task restart policy 2020-03-24 17:00:41 -04:00
Chris Baker ffd79583f6
Merge pull request #7474 from hashicorp/f-scaling-changes-from-review
more testing for scaling API
2020-03-24 15:32:10 -05:00
Chris Baker c638c2c352 update RPC scaling endpoint tests to use renamed 'scale' policy disposition 2020-03-24 20:18:12 +00:00
Chris Baker 5979d6a81e more testing for ScalingPolicy, mainly around parsing and canonicalization for Min/Max 2020-03-24 19:43:50 +00:00
Chris Baker aa5beafe64 Job.Scale should not result in job update or eval create if args.Count == nil
plus tests
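The guard can be sketched as an early return when no count is supplied. This assumes a hypothetical `scaleRequest` whose `Count` is a pointer (nil meaning "not supplied") and a hypothetical `register` callback standing in for the job update and evaluation creation.

```go
package main

import "fmt"

// scaleRequest is a hypothetical, trimmed-down scaling request.
type scaleRequest struct {
	JobID string
	Group string
	Count *int64 // nil: caller did not request a count change
}

type scaleResult struct {
	Registered bool // a new job version was registered (and an eval created)
}

// scale only registers a new job version when a count was supplied.
func scale(req scaleRequest, register func(group string, count int64) error) (scaleResult, error) {
	if req.Count == nil {
		// Nothing to change: no job update, no evaluation.
		return scaleResult{}, nil
	}
	if err := register(req.Group, *req.Count); err != nil {
		return scaleResult{}, err
	}
	return scaleResult{Registered: true}, nil
}

func main() {
	reg := func(group string, count int64) error {
		fmt.Printf("registering %s with count %d\n", group, count)
		return nil
	}
	r, _ := scale(scaleRequest{JobID: "example", Group: "cache"}, reg)
	fmt.Println(r.Registered) // false
	three := int64(3)
	r, _ = scale(scaleRequest{JobID: "example", Group: "cache", Count: &three}, reg)
	fmt.Println(r.Registered) // true
}
```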
2020-03-24 17:36:06 +00:00
Tim Gross 913da68296
csi: remove client from plugin on client node update (#7462)
Plugins track the client nodes where they are placed. On client
updates, remove the client from the plugin tracking if the client is
no longer running an instance of that controller/node plugin.

Extends the state store tests to ensure deregistration works as
expected and that controllers and nodes are being tracked
independently.
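Tracking controllers and nodes independently means a client update can remove a node from one set without touching the other. A minimal sketch with a hypothetical `plugin` type and `nodeUpdate` method; Nomad's `CSIPlugin` has its own fields and methods.

```go
package main

import "fmt"

// plugin is a hypothetical record of where a CSI plugin's instances run.
type plugin struct {
	ID          string
	Controllers map[string]bool // client node IDs running the controller plugin
	Nodes       map[string]bool // client node IDs running the node plugin
}

// nodeUpdate reconciles the plugin with what one client reports it runs,
// adding or removing the client from each set independently.
func (p *plugin) nodeUpdate(nodeID string, runsController, runsNode bool) {
	if runsController {
		p.Controllers[nodeID] = true
	} else {
		delete(p.Controllers, nodeID)
	}
	if runsNode {
		p.Nodes[nodeID] = true
	} else {
		delete(p.Nodes, nodeID)
	}
}

// empty reports whether no client runs any part of the plugin.
func (p *plugin) empty() bool { return len(p.Controllers) == 0 && len(p.Nodes) == 0 }

func main() {
	p := &plugin{ID: "ebs", Controllers: map[string]bool{}, Nodes: map[string]bool{}}
	p.nodeUpdate("node-a", true, true)
	p.nodeUpdate("node-a", false, true) // controller instance stopped
	fmt.Println(len(p.Controllers), len(p.Nodes)) // 0 1
}
```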
2020-03-24 13:26:31 -04:00
Chris Baker 9e530e167d
Merge pull request #7409 from hashicorp/scaling-api
Scaling API changes
2020-03-24 11:02:09 -05:00
Chris Baker 606c79b320 add acl validation to Scaling.ListPolicies and Scaling.GetPolicy 2020-03-24 14:39:05 +00:00
Chris Baker f6ec5f9624 made count optional during job scaling actions
added ACL protection in Job.Scale
in Job.Scale, only perform a Job.Register if the Count was non-nil
2020-03-24 14:39:05 +00:00
Chris Baker 41b002eecc wip: ACL checking for RPC Job.ScaleStatus 2020-03-24 14:39:05 +00:00
Lang Martin bd22afd003
csi: volume deregister fails for volumes actively in use (#7445)
* nomad/structs/csi: add InUse to CSIVolume

* nomad/state/state_store: block volume deregistration for in use vols
2020-03-24 10:10:44 -04:00
Chris Baker 233db5258a changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults.
- tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent)
- Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max
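The two rules can be sketched as a canonicalize step followed by a validate step. The `taskGroup` and `scaling` types are hypothetical simplifications (Nomad's real structs and pointer usage differ); the fallback value of 1 comes from the commit message.

```go
package main

import "fmt"

type scaling struct {
	Min int64
	Max int64
}

// taskGroup is a hypothetical, trimmed-down task group.
type taskGroup struct {
	Count   *int64   // nil when the job file omits count
	Scaling *scaling // nil when no scaling block is present
}

// canonicalize fills in Count: Scaling.Min if present, otherwise 1.
func (tg *taskGroup) canonicalize() {
	if tg.Count != nil {
		return
	}
	def := int64(1)
	if tg.Scaling != nil {
		def = tg.Scaling.Min
	}
	tg.Count = &def
}

// validate enforces Scaling.Min <= Count <= Scaling.Max.
func (tg *taskGroup) validate() error {
	if tg.Scaling == nil || tg.Count == nil {
		return nil
	}
	c := *tg.Count
	if c < tg.Scaling.Min || c > tg.Scaling.Max {
		return fmt.Errorf("count %d outside scaling range [%d, %d]", c, tg.Scaling.Min, tg.Scaling.Max)
	}
	return nil
}

func main() {
	tg := &taskGroup{Scaling: &scaling{Min: 2, Max: 5}}
	tg.canonicalize()
	fmt.Println(*tg.Count, tg.validate()) // 2 <nil>
}
```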

modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and
2020-03-24 13:57:17 +00:00
Chris Baker f9876a487e finished Job.ScaleStatus RPC, need to work on http endpoint 2020-03-24 13:57:16 +00:00
Chris Baker 925b59e1d2 wip: scaling status return, almost done 2020-03-24 13:57:15 +00:00
James Rasell f125b5fb2d scaling: ensure min and max int64s are in toplevel of block. 2020-03-24 13:57:15 +00:00
Chris Baker 42270d862c wip: some tests still failing
updating job scaling endpoints to match RFC, cleaning up the API object as well
2020-03-24 13:57:14 +00:00
Chris Baker abc7a52f56 finished refactoring state store, schema, etc 2020-03-24 13:57:14 +00:00
Chris Baker 116aa98ed7 wip: removed some commented junk from scaling poc 2020-03-24 13:57:13 +00:00
Chris Baker 3d54f1feba wip: added Enabled to ScalingPolicyListStub, removed JobID from body of scaling request 2020-03-24 13:57:12 +00:00
Chris Baker 024d203267 wip: added tests for client methods around group scaling 2020-03-24 13:57:11 +00:00
Chris Baker 179ab68258 wip: added job.scale rpc endpoint, needs explicit test (tested via http now) 2020-03-24 13:57:09 +00:00
Chris Baker 8453e667c2 wip: working on job group scaling endpoint 2020-03-24 13:55:20 +00:00
Chris Baker 6665d0bfb0 wip: added policy get endpoint, added UUID to policy 2020-03-24 13:55:20 +00:00
Chris Baker 9c2560ceeb wip: upsert/delete scaling policies on job upsert/delete 2020-03-24 13:55:18 +00:00
Chris Baker 65d92f1fbf WIP: adding ScalingPolicy to api/structs and state store 2020-03-24 13:55:18 +00:00