Commit Graph

242 Commits

Author SHA1 Message Date
Mahmood Ali 2c73552b4d
pool: track usage of incoming streams (#10710)
Track usage of incoming streams on a connection. Connections without
reference counts get marked as unused and reaped in a periodic job.

This fixes a bug where `alloc exec` and `alloc fs` sessions get terminated
unexpectedly. Previously, when a client heartbeats switches between
servers, the pool connection reaper eventually identifies the connection
as unused and closes it even if it has an active exec/fs sessions.

Fixes #10579
2021-06-07 10:22:37 -04:00
Seth Hoenig d026ff1f66 consul/connect: add support for connect mesh gateways
This PR implements first-class support for Nomad running Consul
Connect Mesh Gateways. Mesh gateways enable services in the Connect
mesh to make cross-DC connections via gateways, where each datacenter
may not have full node interconnectivity.

Consul docs with more information:
https://www.consul.io/docs/connect/gateways/mesh-gateway

The following group level service block can be used to establish
a Connect mesh gateway.

service {
  connect {
    gateway {
      mesh {
        // no configuration
      }
    }
  }
}

Services can make use of a mesh gateway by configuring so in their
upstream blocks, e.g.

service {
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "<service>"
          local_bind_port  = <port>
          datacenter       = "<datacenter>"
          mesh_gateway {
            mode = "<mode>"
          }
        }
      }
    }
  }
}

Typical use of a mesh gateway is to create a bridge between datacenters.
A mesh gateway should then be configured with a service port that is
mapped from a host_network configured on a WAN interface in Nomad agent
config, e.g.

client {
  host_network "public" {
    interface = "eth1"
  }
}

Create a port mapping in the group.network block for use by the mesh
gateway service from the public host_network, e.g.

network {
  mode = "bridge"
  port "mesh_wan" {
    host_network = "public"
  }
}

Use this port label for the service.port of the mesh gateway, e.g.

service {
  name = "mesh-gateway"
  port = "mesh_wan"
  connect {
    gateway {
      mesh {}
    }
  }
}

Currently Envoy is the only supported gateway implementation in Consul.
By default Nomad client will run the latest official Envoy docker image
supported by the local Consul agent. The Envoy task can be customized
by setting `meta.connect.gateway_image` in agent config or by setting
the `connect.sidecar_task` block.

Gateways require Consul 1.8.0+, enforced by the Nomad scheduler.

Closes #9446
2021-06-04 08:24:49 -05:00
Seth Hoenig 238ac718f2 connect: use exp backoff when waiting on consul envoy bootstrap
This PR wraps the use of the consul envoy bootstrap command in
an expoenential backoff closure, configured to timeout after 60
seconds. This is an increase over the current behavior of making
3 attempts over 6 seconds.

Should help with #10451
2021-04-27 09:21:50 -06:00
Chris Baker 21bc48ca29 json handles were moved to a new package in #10202
this was unecessary after refactoring, so this moves them back to their
original location in package structs
2021-04-02 13:31:10 +00:00
Chris Baker 436d46bd19
Merge branch 'main' into f-node-drain-api 2021-04-01 15:22:57 -05:00
Tim Gross f820021f9e deps: bump gopsutil to v3.21.2 2021-03-30 16:02:51 -04:00
Chris Baker 770c9cecb5 restored Node.Sanitize() for RPC endpoints
multiple other updates from code review
2021-03-26 17:03:15 +00:00
Chris Baker a186badf35 moved JSON handlers and extension code around a bit for proper order of
initialization
2021-03-22 14:12:42 +00:00
Charlie Voiselle 0473f35003
Fixup uses of `sanity` (#10187)
* Fixup uses of `sanity`
* Remove unnecessary comments.

These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.

* Update nomad/fsm_test.go
2021-03-16 18:05:08 -04:00
Tim Gross 97b0e26d1f RPC endpoints to support 'nomad ui -login'
RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and
`ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`).
Includes adding expiration to the periodic core GC job.
2021-03-10 08:17:56 -05:00
Kris Hicks d71a90c8a4
Fix some errcheck errors (#9811)
* Throw away result of multierror.Append

When given a *multierror.Error, it is mutated, therefore the return
value is not needed.

* Simplify MergeMultierrorWarnings, use StringBuilder

* Hash.Write() never returns an error

* Remove error that was always nil

* Remove error from Resources.Add signature

When this was originally written it could return an error, but that was
refactored away, and callers of it as of today never handle the error.

* Throw away results of io.Copy during Bridge

* Handle errors when computing node class in test
2021-01-14 12:46:35 -08:00
Kris Hicks 438717500d
gatedwriter: Fix race condition (#9791)
If one thread calls `Flush()` on a gatedwriter while another thread attempts to
`Write()` new data to it, strange things will happen.

The test I wrote shows that at the very least you can write _while_ flushing,
and the call to `Write()` will happen during the internal writes of the
buffered data, which is maybe not what is expected. (i.e. the `Write()`'d data
will be inserted somewhere in the middle of the data being `Flush()'d`)

It's also the case that, because `Write()` only has a read lock, if you had
multiple threads trying to write ("read") at the same time you might have data
loss because the `w.buf` that was read would not necessarily be up-to-date by
the time `p2` was appended to it and it was re-assigned to `w.buf`. You can see
this if you run the new gatedwriter tests with `-race` against the old implementation:

```
WARNING: DATA RACE
Read at 0x00c0000c0420 by goroutine 11:
  runtime.growslice()
      /usr/lib/go/src/runtime/slice.go:125 +0x0
  github.com/hashicorp/nomad/helper/gated-writer.(*Writer).Write()
      /home/hicks/workspace/nomad/helper/gated-writer/writer.go:41 +0x2b6
  github.com/hashicorp/nomad/helper/gated-writer.TestWriter_WithMultipleWriters.func1()
      /home/hicks/workspace/nomad/helper/gated-writer/writer_test.go:90 +0xea
```

This race condition is fixed in this change.
2021-01-14 12:43:14 -08:00
Seth Hoenig 59f230714f e2e: add e2e test for service registration 2021-01-05 08:48:12 -06:00
Mahmood Ali de954da350
docker: introduce a new hcl2-friendly `mount` syntax (#9635)
Introduce a new more-block friendly syntax for specifying mounts with a new `mount` block type with the target as label:

```hcl
config {
  image = "..."

  mount {
    type = "..."
    target = "target-path"
    volume_options { ... }
  }
}
```

The main benefit here is that by `mount` being a block, it can nest blocks and avoids the compatibility problems noted in https://github.com/hashicorp/nomad/pull/9634/files#diff-2161d829655a3a36ba2d916023e4eec125b9bd22873493c1c2e5e3f7ba92c691R128-R155 .

The intention is for us to promote this `mount` blocks and quietly deprecate the `mounts` type, while still honoring to preserve compatibility as much as we could.

This addresses the issue in https://github.com/hashicorp/nomad/issues/9604 .
2020-12-15 14:13:50 -05:00
Seth Hoenig 79e6b5d399
Merge pull request #9624 from hashicorp/b-connect-meta-regression
consul/connect: fix regression where client connect images ignored
2020-12-14 11:03:09 -06:00
Seth Hoenig 0091325721 command: give flag-helpers a better name 2020-12-14 10:07:27 -06:00
Seth Hoenig beaa6359d5 consul/connect: fix regression where client connect images ignored
Nomad v1.0.0 introduced a regression where the client configurations
for `connect.sidecar_image` and `connect.gateway_image` would be
ignored despite being set. This PR restores that functionality.

There was a missing layer of interpolation that needs to occur for
these parameters. Since Nomad 1.0 now supports dynamic envoy versioning
through the ${NOMAD_envoy_version} psuedo variable, we basically need
to first interpolate

  ${connect.sidecar_image} => envoyproxy/envoy:v${NOMAD_envoy_version}

then use Consul at runtime to resolve to a real image, e.g.

  envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.16.0

Of course, if the version of Consul is too old to provide an envoy
version preference, we then need to know to fallback to the old
version of envoy that we used before.

  envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09

Beyond that, we also need to continue to support jobs that set the
sidecar task themselves, e.g.

  sidecar_task { config { image: "custom/envoy" } }

which itself could include teh pseudo envoy version variable.
2020-12-14 09:47:55 -06:00
Seth Hoenig 9ec1af5310 command: remove use of flag impls from consul
In a few places Nomad was using flag implementations directly
from Consul, lending to Nomad's need to import consul. Replace
those uses with helpers already in Nomad, and copy over the bare
minimum needed to make the autopilot flags behave as they have.
2020-12-11 07:58:20 -06:00
Kris Hicks 0cf9cae656
Apply some suggested fixes from staticcheck (#9598) 2020-12-10 07:29:18 -08:00
Kris Hicks 0a3a748053
Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Kris Hicks 93155ba3da
Add gocritic to golangci-lint config (#9556) 2020-12-08 12:47:04 -08:00
Dave May e045bd3a5e
nomad operator debug - add pprof duration / csi details (#9346)
* debug: add pprof duration CLI argument
* debug: add CSI plugin details
* update help text with ACL requirements
* debug: provide ACL hints upon permission failures
* debug: only write file when pprof retrieve is successful
* debug: add helper function to clean bad characters from dynamic filenames
* debug: ensure files are unable to escape the capture directory
2020-12-01 12:36:05 -05:00
Drew Bailey 86080e25a9
Send events to EventSinks (#9171)
* Process to send events to configured sinks

This PR adds a SinkManager to a server which is responsible for managing
managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the eventstore are
made the sinkmanager and managed sink are responsible for reloading or
starting a new managed sink.

* periodically check in sink progress to raft

Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change

dereference eventsink so we can accurately use the watchch

When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger
2020-10-26 17:27:54 -04:00
Drew Bailey 1ae39a9ed9
event sink crud operation api (#9155)
* network sink rpc/api plumbing

state store methods and restore

upsert sink test

get sink

delete sink

event sink list and tests

go generate new msg types

validate sink on upsert

* go generate
2020-10-23 14:23:00 -04:00
Michael Schurter 6eb8520367 Update generate_msgtypes.sh now that iota is gone 2020-10-22 15:26:32 -07:00
Tim Gross 1fb1c9c5d4
artifact/template: make destination path absolute inside taskdir (#9149)
Prior to Nomad 0.12.5, you could use `${NOMAD_SECRETS_DIR}/mysecret.txt` as
the `artifact.destination` and `template.destination` because we would always
append the destination to the task working directory. In the recent security
patch we treated the `destination` absolute path as valid if it didn't escape
the working directory, but this breaks backwards compatibility and
interpolation of `destination` fields.

This changeset partially reverts the behavior so that we always append the
destination, but we also perform the escape check on that new destination
after interpolation so the security hole is closed.

Also, ConsulTemplate test should exercise interpolation
2020-10-22 15:47:49 -04:00
Tim Gross 6df36e4cdb artifact/template: prevent file sandbox escapes
Ensure that the client honors the client configuration for the
`template.disable_file_sandbox` field when validating the jobspec's
`template.source` parameter, and not just with consul-template's own `file`
function.

Prevent interpolated `template.source`, `template.destination`, and
`artifact.destination` fields from escaping file sandbox.
2020-10-21 14:34:12 -04:00
Alexander Shtuchkin 90fd8bb85f
Implement 'batch mode' for persisting allocations on the client. (#9093)
Fixes #9047, see problem details there.

As a solution, we use BoltDB's 'Batch' mode that combines multiple
parallel writes into small number of transactions. See
https://github.com/boltdb/bolt#batch-read-write-transactions for
more information.
2020-10-20 16:15:37 -04:00
davemay99 a7fc6e9b30 return explicit error if not found/empty path falls through 2020-09-29 14:55:28 -04:00
davemay99 a592186947 satisfy the linter 2020-09-29 01:37:15 -04:00
davemay99 bb555b249f reverting export of fixTime 2020-09-29 01:21:03 -04:00
davemay99 61c12e3e18 finish refactoring walk to search for any file 2020-09-29 01:17:10 -04:00
davemay99 f2b3536da2 refactor functions to find raft.db 2020-09-24 19:00:53 -04:00
davemay99 4fcdbb9dee logging tweaks 2020-09-24 17:30:50 -04:00
davemay99 48f8ee1d11 export FixTime to allow external use 2020-09-24 16:47:58 -04:00
davemay99 991b814377 add missing import 2020-09-21 11:20:08 -04:00
davemay99 5a159f1108 Raftutil cleanup, plus helper function to find raft.db 2020-09-17 21:35:17 -04:00
davemay99 d330da9434 Verify iter to avoid panic 2020-09-08 14:10:38 -04:00
Mahmood Ali 55e7b94f62 Handle when a new line follows an escaping char 2020-08-31 20:31:44 -04:00
Mahmood Ali 79279b37cb Add a failing \n~\n~. case! 2020-08-31 20:29:04 -04:00
Mahmood Ali d588f91575 add helper commands for debugging state 2020-08-31 08:45:59 -04:00
Seth Hoenig 5b072029f2 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
James Rasell dab8282be5
Merge pull request #8589 from hashicorp/f-gh-5718
driver/docker: allow configurable pull context timeout setting.
2020-08-14 16:07:59 +02:00
James Rasell bc42cd2e5e
driver/docker: allow configurable pull context timeout setting.
Pulling large docker containers can take longer than the default
context timeout. Without a way to change this it is very hard for
users to utilise Nomad properly without hacky work arounds.

This change adds an optional pull_timeout config parameter which
gives operators the possibility to account for increase pull times
where needed. The infra docker image also has the option to set a
custom timeout to keep consistency.
2020-08-12 08:58:07 +01:00
Tim Gross fb27082e5c
RPC errors must be wrapped in order to wrap internal errors (#8632)
The CSI client RPC uses error wrapping to detect the type of error bubbling up
from plugins, but if the errors we get aren't wrapped at each layer, we can't
unwrap the inner error.

Also eliminates some unused args.
2020-08-11 09:13:52 -04:00
Seth Hoenig 6ab3d21d2c consul: validate script type when ussing check thresholds 2020-08-10 14:08:09 -05:00
Drew Bailey b296558b8e
oss compoments for multi-vault namespaces
adds in oss components to support enterprise multi-vault namespace feature

upgrade specific doc on vault multi-namespaces

vault docs

update test to reflect new error
2020-07-24 10:14:59 -04:00
Mahmood Ali ab25616add
Merge pull request #7234 from derekmarcotte/dm-freebsd
Fix undefined: getEphemeralPortRange error on FreeBSD.
2020-07-24 10:01:41 -04:00
Mahmood Ali e226ca9c4f revert changes from earlier change 2020-06-12 14:02:43 -04:00
Mahmood Ali 69bb42acf8 tests: prefix agent logs to identify agent sources 2020-06-07 16:38:11 -04:00