Commit Graph

313 Commits

Author SHA1 Message Date
Jorge Marey 584ddfe859
Add Namespace, Job and Group to envoy stats (#14311) 2022-09-22 10:38:21 -04:00
Seth Hoenig 2088ca3345
cleanup more helper updates (#14638)
* cleanup: refactor MapStringStringSliceValueSet to be cleaner

* cleanup: replace SliceStringToSet with actual set

* cleanup: replace SliceStringSubset with real set

* cleanup: replace SliceStringContains with slices.Contains

* cleanup: remove unused function SliceStringHasPrefix

* cleanup: fixup StringHasPrefixInSlice doc string

* cleanup: refactor SliceSetDisjoint to use real set

* cleanup: replace CompareSliceSetString with SliceSetEq

* cleanup: replace CompareMapStringString with maps.Equal

* cleanup: replace CopyMapStringString with CopyMap

* cleanup: replace CopyMapStringInterface with CopyMap

* cleanup: fixup more CopyMapStringString and CopyMapStringInt

* cleanup: replace CopySliceString with slices.Clone

* cleanup: remove unused CopySliceInt

* cleanup: refactor CopyMapStringSliceString to be generic as CopyMapOfSlice

* cleanup: replace CopyMap with maps.Clone

* cleanup: run go mod tidy
2022-09-21 14:53:25 -05:00
Seth Hoenig 5187f92c5e
cleanup: create interface for check watcher and mock it in nsd tests (#14577)
* cleanup: create interface for check watcher and mock it in nsd tests

* cleanup: add comments for check watcher interface
2022-09-14 08:25:20 -05:00
Seth Hoenig 9a943107c7 servicedisco: implement check_restart for nomad service checks
This PR implements support for check_restart for checks registered
in the Nomad service provider.

Unlike Consul, Nomad service checks never report a "warning" status,
and so the check_restart.ignore_warnings configuration is not valid
for Nomad service checks.
2022-09-13 08:59:23 -05:00
Seth Hoenig feff36f3f7 client: refactor check watcher to be reusable
This PR refactors agent/consul/check_watcher into client/serviceregistration,
and abstracts away the Consul-specific check lookups.

In doing so we should be able to reuse the existing check watcher logic for
also watching NSD checks in a followup PR.

A chunk of consul/unit_test.go is removed - we'll cover that in e2e tests
in a follow PR if needed. In the long run I'd like to remove this whole file.
2022-09-12 10:13:31 -05:00
Seth Hoenig 31234d6a62 cleanup: consolidate interfaces for workload restarting
This PR combines two of the same interface definitions around workload restarting
2022-09-09 08:59:04 -05:00
Charlie Voiselle 5c0e34dd33
Vars: Update CT dependency to support variables. (#14399)
* Update Consul Template dep to support Nomad vars

* Remove `Peering` config for Consul Testservers
Upgrading to the 1.14 Consul SDK introduces and additional default
configuration—`Peering`—that is not compatible with versions of Consul
before v1.13.0. because Nomad tests against Consul v1.11.1, this
configuration has to be nil'ed out before passing it to the Consul
binary.
2022-08-30 15:26:01 -04:00
Michael Schurter dbffe22465
consul: allow stale namespace results (#12953)
Nomad reconciles services it expects to be registered in Consul with
what is actually registered in the local Consul agent. This is necessary
to prevent leaking service registrations if Nomad crashes at certain
points (or if there are bugs).

When Consul has namespaces enabled, we must iterate over each available
namespace to be sure no services were leaked into non-default
namespaces.

Since this reconciliation happens often, there's no need to require
results from the Consul leader server. In large clusters this creates
far more load than the "freshness" of the response is worth.

Therefore this patch switches the request to AllowStale=true
2022-08-26 16:05:12 -07:00
Luiz Aoqui 7a8cacc9ec
allocrunner: refactor task coordinator (#14009)
The current implementation for the task coordinator unblocks tasks by
performing destructive operations over its internal state (like closing
channels and deleting maps from keys).

This presents a problem in situations where we would like to revert the
state of a task, such as when restarting an allocation with tasks that
have already exited.

With this new implementation the task coordinator behaves more like a
finite state machine where task may be blocked/unblocked multiple times
by performing a state transition.

This initial part of the work only refactors the task coordinator and
is functionally equivalent to the previous implementation. Future work
will build upon this to provide bug fixes and enhancements.
2022-08-22 18:38:49 -04:00
Piotr Kazmierczak b63944b5c1
cleanup: replace TypeToPtr helper methods with pointer.Of (#14151)
Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.
2022-08-17 18:26:34 +02:00
Seth Hoenig 7728cf5a9a
Merge pull request #14132 from hashicorp/build-update-go1.19
build: update to go1.19
2022-08-16 11:20:27 -05:00
Seth Hoenig b3ea68948b build: run gofmt on all go source files
Go 1.19 will forecefully format all your doc strings. To get this
out of the way, here is one big commit with all the changes gofmt
wants to make.
2022-08-16 11:14:11 -05:00
Seth Hoenig f9355c29fb cleanup: consul mesh gateway type need not be pointer
This PR changes the use of structs.ConsulMeshGateway to value types
instead of via pointers. This will help in a follow up PR where we
cleanup a lot of custom comparison code with helper functions instead.
2022-08-13 11:26:58 -05:00
Seth Hoenig 54efec5dfe docs: add docs and tests for tagged_addresses 2022-05-31 13:02:48 -05:00
Jorge Marey f966614602 Allow setting tagged addresses on services 2022-05-31 10:06:55 -05:00
Seth Hoenig 4631045d83 connect: enable setting connect upstream destination namespace 2022-05-26 09:39:36 -05:00
Eng Zer Jun 97d1bc735c
test: use `T.TempDir` to create temporary test directory (#12853)
* test: use `T.TempDir` to create temporary test directory

This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, temporary directory created using `ioutil.TempDir`
needs to be removed manually by calling `os.RemoveAll`, which is omitted
in some tests. The error handling boilerplate e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix TestLogmon_Start_restart on Windows

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestConsul_Integration

t.TempDir fails to perform the cleanup properly because the folder is
still in use

testing.go:967: TempDir RemoveAll cleanup: unlinkat /tmp/TestConsul_Integration2837567823/002/191a6f1a-5371-cf7c-da38-220fe85d10e5/web/secrets: device or resource busy

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-05-12 11:42:40 -04:00
Seth Hoenig 3fcac242c6 services: enable setting arbitrary address value in service registrations
This PR introduces the `address` field in the `service` block so that Nomad
or Consul services can be registered with a custom `.Address.` to advertise.

The address can be an IP address or domain name. If the `address` field is
set, the `service.address_mode` must be set in `auto` mode.
2022-04-22 09:14:29 -05:00
Seth Hoenig a1c4f16cf1 connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs
This PR expands on the work done in #12543 to
- prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags
- merge into pre-existing envoy_stats_tags fields
- update the upgrade guide docs
- update changelog
2022-04-14 12:52:52 -05:00
Ian Drennan 70bd32df83 Add alloc_id to sidecar bootstrap 2022-04-14 11:46:06 -05:00
Seth Hoenig 9670adb6c6 cleanup: purge github.com/pkg/errors 2022-04-01 19:24:02 -05:00
James Rasell a646333263
Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
James Rasell 042bf0fa57
client: hookup service wrapper for use within client hooks. 2022-03-21 10:29:57 +01:00
Seth Hoenig 2631659551 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
James Rasell 7cd28a6fb6
client: refactor common service registration objects from Consul.
This commit performs refactoring to pull out common service
registration objects into a new `client/serviceregistration`
package. This new package will form the base point for all
client specific service registration functionality.

The Consul specific implementation is not moved as it also
includes non-service registration implementations; this reduces
the blast radius of the changes as well.
2022-03-15 09:38:30 +01:00
Seth Hoenig db2347a86c cleanup: prevent leaks from time.After
This PR replaces use of time.After with a safe helper function
that creates a time.Timer to use instead. The new function returns
both a time.Timer and a Stop function that the caller must handle.

Unlike time.NewTimer, the helper function does not panic if the duration
set is <= 0.
2022-02-02 14:32:26 -06:00
Michael Schurter 10c3bad652 client: never embed alloc_dir in chroot
Fixes #2522

Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).
While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.

When running tests as root in a vm without the fix, the following error
occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
                Error Trace:    alloc_dir_test.go:520
                Error:          Received unexpected error:
                                Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
                Test:           TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
2021-10-18 09:22:01 -07:00
Luiz Aoqui 0a62bdc3c5
fix panic when Connect mesh gateway doesn't have a proxy block (#11257)
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-10-04 15:52:07 -04:00
Mahmood Ali 4d90afb425 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
Seth Hoenig 7c3db812fd consul/connect: remove sidecar proxy before removing parent service
This PR will have Nomad de-register a sidecar proxy service before
attempting to de-register the parent service. Otherwise, Consul will
emit a warning and an error.

Fixes #10845
2021-07-08 13:30:19 -05:00
Seth Hoenig 284cd214ec consul/connect: improve regex from CR suggestions 2021-07-08 13:05:05 -05:00
Seth Hoenig 868b246128 consul/connect: Avoid assumption of parent service when filtering connect proxies
This PR uses regex-based matching for sidecar proxy services and checks when syncing
with Consul. Previously we would check if the parent of the sidecar was still being
tracked in Nomad. This is a false invariant - one which we must not depend when we
make #10845 work.

Fixes #10843
2021-07-08 09:43:41 -05:00
Seth Hoenig 56a6a1b1df consul: avoid extra sync operations when no action required
This PR makes it so the Consul sync logic will ignore operations that
do not specify an action to take (i.e. [de-]register [services|checks]).

Ideally such noops would be discarded at the callsites (i.e. users
of [Create|Update|Remove]Workload], but we can also be defensive
at the commit point.

Also adds 2 trace logging statements which are helpful for diagnosing
sync operations with Consul - when they happen and why.

Fixes #10797
2021-07-07 11:24:56 -05:00
Seth Hoenig 532b898b07 consul/connect: in-place update service definition when connect upstreams are modified
This PR fixes a bug where modifying the upstreams of a Connect sidecar proxy
would not result Consul applying the changes, unless an additional change to
the job would trigger a task replacement (thus replacing the service definition).

The fix is to check if upstreams have been modified between Nomad's view of the
sidecar service definition, and the service definition for the sidecar that is
actually registered in Consul.

Fixes #8754
2021-06-16 16:48:26 -05:00
James Rasell 939b23936a
Merge pull request #10744 from hashicorp/b-remove-duplicate-imports
chore: remove duplicate import statements
2021-06-11 16:42:34 +02:00
James Rasell 492e308846
tests: remove duplicate import statements. 2021-06-11 09:39:22 +02:00
Seth Hoenig dbdc479970 consul: move consul acl tests into ent files
(cherry-pick ent back to oss)

This PR moves a lot of Consul ACL token validation tests into ent files,
so that we can verify correct behavior difference between OSS and ENT
Nomad versions.
2021-06-09 08:38:42 -05:00
Seth Hoenig d656777dd7
Merge pull request #10720 from hashicorp/f-cns-acl-check
consul: correctly check consul acl token namespace when using consul oss
2021-06-08 15:43:42 -05:00
Seth Hoenig 87be8c4c4b consul: correctly check consul acl token namespace when using consul oss
This PR fixes the Nomad Object Namespace <-> Consul ACL Token relationship
check when using Consul OSS (or Consul ENT without namespace support).

Nomad v1.1.0 introduced a regression where Nomad would fail the validation
when submitting Connect jobs and allow_unauthenticated set to true, with
Consul OSS - because it would do the namespace check against the Consul ACL
token assuming the "default" namespace, which does not work because Consul OSS
does not have namespaces.

Instead of making the bad assumption, expand the namespace check to handle
each special case explicitly.

Fixes #10718
2021-06-08 13:55:57 -05:00
Seth Hoenig 209e2d6d81 consul: pr cleanup namespace probe function signatures 2021-06-07 15:41:01 -05:00
Seth Hoenig 519429a2de consul: probe consul namespace feature before using namespace api
This PR changes Nomad's wrapper around the Consul NamespaceAPI so that
it will detect if the Consul Namespaces feature is enabled before making
a request to the Namespaces API. Namespaces are not enabled in Consul OSS,
and require a suitable license to be used with Consul ENT.

Previously Nomad would check for a 404 status code when makeing a request
to the Namespaces API to "detect" if Consul OSS was being used. This does
not work for Consul ENT with Namespaces disabled, which returns a 500.

Now we avoid requesting the namespace API altogether if Consul is detected
to be the OSS sku, or if the Namespaces feature is not licensed. Since
Consul can be upgraded from OSS to ENT, or a new license applied, we cache
the value for 1 minute, refreshing on demand if expired.

Fixes https://github.com/hashicorp/nomad-enterprise/issues/575

Note that the ticket originally describes using attributes from https://github.com/hashicorp/nomad/issues/10688.
This turns out not to be possible due to a chicken-egg situation between
bootstrapping the agent and setting up the consul client. Also fun: the
Consul fingerprinter creates its own Consul client, because there is no
[currently] no way to pass the agent's client through the fingerprint factory.
2021-06-07 12:19:25 -05:00
Seth Hoenig d026ff1f66 consul/connect: add support for connect mesh gateways
This PR implements first-class support for Nomad running Consul
Connect Mesh Gateways. Mesh gateways enable services in the Connect
mesh to make cross-DC connections via gateways, where each datacenter
may not have full node interconnectivity.

Consul docs with more information:
https://www.consul.io/docs/connect/gateways/mesh-gateway

The following group level service block can be used to establish
a Connect mesh gateway.

service {
  connect {
    gateway {
      mesh {
        // no configuration
      }
    }
  }
}

Services can make use of a mesh gateway by configuring so in their
upstream blocks, e.g.

service {
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "<service>"
          local_bind_port  = <port>
          datacenter       = "<datacenter>"
          mesh_gateway {
            mode = "<mode>"
          }
        }
      }
    }
  }
}

Typical use of a mesh gateway is to create a bridge between datacenters.
A mesh gateway should then be configured with a service port that is
mapped from a host_network configured on a WAN interface in Nomad agent
config, e.g.

client {
  host_network "public" {
    interface = "eth1"
  }
}

Create a port mapping in the group.network block for use by the mesh
gateway service from the public host_network, e.g.

network {
  mode = "bridge"
  port "mesh_wan" {
    host_network = "public"
  }
}

Use this port label for the service.port of the mesh gateway, e.g.

service {
  name = "mesh-gateway"
  port = "mesh_wan"
  connect {
    gateway {
      mesh {}
    }
  }
}

Currently Envoy is the only supported gateway implementation in Consul.
By default Nomad client will run the latest official Envoy docker image
supported by the local Consul agent. The Envoy task can be customized
by setting `meta.connect.gateway_image` in agent config or by setting
the `connect.sidecar_task` block.

Gateways require Consul 1.8.0+, enforced by the Nomad scheduler.

Closes #9446
2021-06-04 08:24:49 -05:00
Mahmood Ali 102763c979
Support disabling TCP checks for connect sidecar services 2021-05-07 12:10:26 -04:00
Seth Hoenig 09cd01a5f3 e2e: add e2e tests for consul namespaces on ent with acls
This PR adds e2e tests for Consul Namespaces for Nomad Enterprise
with Consul ACLs enabled.

Needed to add support for Consul ACL tokens with `namespace` and
`namespace_prefix` blocks, which Nomad parses and validates before
tossing the token. These bits will need to be picked back to OSS.
2021-04-27 14:45:54 -06:00
Seth Hoenig 509490e5d2 e2e: consul namespace tests from nomad ent
(cherry-picked from ent without _ent things)

This is part 2/4 of e2e tests for Consul Namespaces. Took a
first pass at what the parameterized tests can look like, but
only on the ENT side for this PR. Will continue to refactor
in the next PRs.

Also fixes 2 bugs:
 - Config Entries registered by Nomad Server on job registration
   were not getting Namespace set
 - Group level script checks were not getting Namespace set

Those changes will need to be copied back to Nomad OSS.

Nomad OSS + no ACLs (previously, needs refactor)
Nomad ENT + no ACLs (this)
Nomad OSS + ACLs (todo)
Nomad ENT + ALCs (todo)
2021-04-19 15:35:31 -06:00
Nick Spain 653d84ef68 Add a 'body' field to the check stanza
Consul allows specifying the HTTP body to send in a health check. Nomad
uses Consul for health checking so this just plumbs the value through to
where the Consul API is called.

There is no validation that `body` is not used with an incompatible
check method like GET.
2021-04-13 09:15:35 -04:00
Seth Hoenig fe8fce00d9 consul: minor CR cleanup 2021-04-05 10:10:16 -06:00
Seth Hoenig f17ba33f61 consul: plubming for specifying consul namespace in job/group
This PR adds the common OSS changes for adding support for Consul Namespaces,
which is going to be a Nomad Enterprise feature. There is no new functionality
provided by this changeset and hopefully no new bugs.
2021-04-05 10:03:19 -06:00
Charlie Voiselle 0473f35003
Fixup uses of `sanity` (#10187)
* Fixup uses of `sanity`
* Remove unnecessary comments.

These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.

* Update nomad/fsm_test.go
2021-03-16 18:05:08 -04:00
Andre Ilhicas f45fc6c899
consul/connect: enable setting local_bind_address in upstream 2021-02-26 11:37:31 +00:00