
428 commits

Author SHA1 Message Date
Charlie Voiselle 0473f35003
Fixup uses of sanity (#10187)
* Fixup uses of `sanity`
* Remove unnecessary comments.

These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.

* Update nomad/fsm_test.go
2021-03-16 18:05:08 -04:00
Tim Gross 2a2e36690a docs: swap master for main in Nomad repo 2021-03-08 14:26:31 -05:00
Mahmood Ali ff8d67fae2
Merge pull request #9935 from hashicorp/e2e-segment-e2e-clusters
e2e: segment e2e clusters
2021-03-01 09:23:21 -05:00
Drew Bailey 86d9e1ff90
Merge pull request #9955 from hashicorp/on-update-services
Service and Check on_update configuration option (readiness checks)
2021-02-24 10:11:05 -05:00
Seth Hoenig d2cd605995 dist: place systemd unit options correctly
This PR places StartLimitIntervalSec and StartLimitBurst in the
Unit section of systemd unit files, rather than the Service section.

https://www.freedesktop.org/software/systemd/man/systemd.unit.html

Fixes #10065
2021-02-22 19:23:00 -06:00
Drew Bailey c152757d38
E2e/fix periodic (#10047)
* fix periodic

* update periodic to not use template

`nomad job inspect` no longer returns an API list stub, so the fields required to query the job summary are no longer there; parse the CLI output instead

* rm tmp makefile entry

* fix typo

* revert makefile change
2021-02-18 12:21:53 -05:00
James Rasell f95e45b80c
e2e: account for race condition in periodic dispatch test. 2021-02-11 11:08:48 +01:00
Seth Hoenig 7d6e81e9e4
Merge pull request #9990 from hashicorp/f-nsiso-task
drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior
2021-02-09 13:29:14 -06:00
Seth Hoenig 45e0e70a50 consul/connect: enable custom sidecars to use expose checks
This PR enables jobs configured with a custom sidecar_task to make
use of the `service.expose` feature for creating checks on services
in the service mesh. Previously, we checked that sidecar_task had not
been set (indicating that something other than Envoy may be in use,
which would not support Envoy's expose feature). However, Consul has
not added support for anything other than Envoy and probably never
will, so the restriction seems like an unnecessary hindrance. If
Consul ever does support something other than Envoy, it will likely
find a way to provide the expose feature anyway.

Fixes #9854
2021-02-09 10:49:37 -06:00
Seth Hoenig 8ee9835923 drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior
This PR adds pid_mode and ipc_mode options to the exec and java task
driver config options. By default these will defer to the default_pid_mode
and default_ipc_mode agent plugin options created in #9969. Setting
these values to "host" mode disables isolation for the task. Doing so
is not recommended, but may be necessary to support legacy job configurations.

Closes #9970
2021-02-08 14:26:35 -06:00
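The precedence described above (task-level pid_mode/ipc_mode first, agent default_pid_mode/default_ipc_mode otherwise, "host" meaning no isolation) can be sketched in Go as follows. This is a minimal illustration, not the driver's code: the helper names are hypothetical, and "private" as the isolated value is an assumption of the sketch.

```go
package main

import "fmt"

// Mode values for this sketch. "host" disables isolation, as the commit
// message says; "private" as the isolated value is an assumption here.
const (
	modeHost    = "host"
	modePrivate = "private"
)

// resolveMode returns the task-level mode when it is set, otherwise the
// agent plugin default. Hypothetical helper, not the exec/java driver code.
func resolveMode(taskMode, agentDefault string) string {
	if taskMode != "" {
		return taskMode
	}
	return agentDefault
}

// isolated reports whether a resolved mode keeps the task in its own
// PID/IPC namespace.
func isolated(mode string) bool {
	return mode != modeHost
}

func main() {
	pid := resolveMode("", modePrivate)       // pid_mode unset: agent default applies
	ipc := resolveMode(modeHost, modePrivate) // ipc_mode = "host": task override applies
	fmt.Println(pid, isolated(pid))           // private true
	fmt.Println(ipc, isolated(ipc))           // host false
}
```

Keeping the fallback in one small function makes the "defer to the agent default" rule easy to test in isolation.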
Drew Bailey b5585882e4
address pr comments 2021-02-08 13:43:05 -05:00
Drew Bailey b0cf3ffa54
on_update check_restart e2e 2021-02-08 10:49:25 -05:00
Drew Bailey 8507d54e3b
e2e test for on_update service checks
check_restart not compatible with on_update=ignore

reword caveat
2021-02-08 08:32:40 -05:00
Chris Baker b1bb8a760e e2e packer build: upgrade jdk to java 14 2021-02-02 17:33:48 +00:00
Mahmood Ali 45889f9f55 e2e: segment e2e clusters
Ensure that the e2e clusters are isolated and never attempt to autojoin
with another e2e cluster.

This ensures that each cluster's servers have a unique `ConsulAutoJoin`,
to be used for discovery.
2021-02-01 08:04:21 -05:00
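The isolation idea can be sketched in Go: give each cluster its own value for the `ConsulAutoJoin` tag and build a go-discover style AWS join string that only matches that value. This is an illustration under assumptions: the `auto-join-` prefix and the random suffix are hypothetical naming choices, and the actual change lives in the e2e cluster provisioning, not in Go code like this.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// retryJoin builds a go-discover style AWS auto-join string that only
// matches instances whose ConsulAutoJoin tag carries this cluster's value.
func retryJoin(tagValue string) string {
	return fmt.Sprintf("provider=aws tag_key=ConsulAutoJoin tag_value=%s", tagValue)
}

// clusterTagValue returns a random per-cluster tag value such as
// "auto-join-1a2b3c4d". The prefix is a hypothetical naming choice.
func clusterTagValue() (string, error) {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return "auto-join-" + hex.EncodeToString(b), nil
}

func main() {
	v, err := clusterTagValue()
	if err != nil {
		panic(err)
	}
	// Two clusters created this way would never discover each other.
	fmt.Println(retryJoin(v))
}
```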
Chris Baker ce68ee164b Version 1.0.3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJgEuOKAAoJEFGFLYc0j/xMxF8H/3TTU6Tu+Xm0YvcsDaYDphZ/
 X7KQBV0aFiuL5VkTw4PzKEsgryIy9/sqEPyxxyKRowAmos9qhiusjNAIfqdP4TF8
 tdZmTedkfWir9uPD+hyv/LXpwbQ2T8kTwS3xHTYvaOmaCxZr710FEn+imnMk1AUn
 Xs5itkd/CYGr0nBLm+I5GutWSDPmL7Uw8J5Z30fFyoaxoCPAbCWQQNk793SCRUc5
 f/uo18V2tFInmQ+3sAdnM4gPewyStK/a5VvzWavL9fVDtYK83wlqWSchTXY5jpVz
 zNEzt/rYhbBzakPQQKb5zieblh2iGI8aHWpD5w4WduqO2Sg6B/5lAeNZIlW0UJg=
 =2g3c
 -----END PGP SIGNATURE-----

Merge tag 'v1.0.3' into post-release-1.0.3

Version 1.0.3
2021-01-29 19:30:08 +00:00
Chris Baker 2632b81124 lint some nomad HCL job specs 2021-01-28 12:03:19 +00:00
Chris Baker 2adf0f12d6 e2e: java driver isolation tests 2021-01-28 12:03:19 +00:00
Chris Baker aa55df0413 additional e2e utils for multi-task allocs 2021-01-28 12:03:19 +00:00
Kris Hicks d67b77f38e Add a little comment 2021-01-28 12:03:19 +00:00
Kris Hicks 5cf972d2e7 Add test for alloc exec 2021-01-28 12:03:19 +00:00
Kris Hicks 2db8aa2a52 Add e2e test for raw exec 2021-01-28 12:03:19 +00:00
Kris Hicks 87188f04de Add PID namespacing and e2e test 2021-01-28 12:03:19 +00:00
Mahmood Ali c92bb342e1 e2e: skip node drain deadline/force tests 2021-01-27 08:42:16 -05:00
Mahmood Ali b12e8912a9 e2e: use f.NoError instead of requires 2021-01-27 08:36:23 -05:00
Mahmood Ali 1ac8b32e08 e2e: Disable Connect tests
The Connect tests are very disruptive: they restart the Consul/Nomad agents with new
tokens. The tests also seem particularly flaky, failing 32 times out of 73 in my
sample.

The tests are particularly problematic because their disruption affects
other tests. On failure, the Nomad or Consul agent on the client can get into a
wedged state, so health/deployment info in subsequent tests may be wrong. In
some cases, the node will be marked as failed, and subsequent tests may then
fail when the node is deemed lost and the test allocations get migrated unexpectedly.
2021-01-26 10:01:14 -05:00
Mahmood Ali 36ce1e73eb e2e: deflake nodedrain test
The nodedrain deadline test asserts that all allocations are migrated by the
deadline. However, when the deadline is short (e.g. 10s), the test may fail
because of scheduler/client-propagation delays.

In one failing test, it took ~15s from the RPC call to the moment
the scheduler issued the migration update, and then 3 seconds for the alloc to be
stopped.

Here, I increase the timeouts to avoid such false positives.
2021-01-26 10:01:14 -05:00
Mahmood Ali cf8f6f07d7 e2e: vault increase timeout
Increase the timeout for vaultsecrets. As the default interval is 0.1s, 10
retries mean it only retries for one second, a very short time for some waiting
scenarios in the test (e.g. starting allocs).
2021-01-26 10:01:14 -05:00
Mahmood Ali 94ad40907c e2e: prefer testutil.WaitForResultRetries
Prefer testutil.WaitForResultRetries, which emits more descriptive errors on
failure. `require.Eventually` fails with an opaque "Condition never satisfied"
error message.
2021-01-26 10:01:14 -05:00
Mahmood Ali f3f8f15b7b e2e: special case "Unexpected EOF" errors
This is an attempt at deflaking the e2e exec tests, and a way to improve
failure messages.

The e2e tests occasionally fail with "unexpected EOF" even though the exec output matches
expectations. I suspect there is a race in handling EOF in the server/HTTP handling.

Here, we special-case this error and ensure we report all failures,
to help debug the case better.
2021-01-26 10:01:14 -05:00
Mahmood Ali 925d9ce952 e2e: tweak failure messages
Tweak the error messages for the flakiest tests, so that on test failure we get
more output.
2021-01-26 09:16:48 -05:00
Mahmood Ali 6aa3dec6cc e2e: use testify requires instead of t.Fatal
testify's require functions offer better error messages that are easier to notice
when scanning a wall of text in the build output.
2021-01-26 09:14:47 -05:00
Mahmood Ali 236b4055a7 e2e: deflake consul/CheckRestart test
Ensure we pass the alloc ID to status. Otherwise, the test may fail if there is
another spurious allocation running from another test.
2021-01-26 09:12:20 -05:00
Mahmood Ali 0aafd9af64 e2e: Fix build script and pass shellcheck 2021-01-26 09:11:37 -05:00
Mahmood Ali 4397eda209
Merge pull request #9798 from hashicorp/e2e-terraform-tweaks-20200113
This PR makes two ergonomics changes, meant to get e2e builds more reproducible and ease changes.

### AMI Management

First, we pin the server AMIs to the commits associated with the build. No more using the latest AMI a developer built in a test branch, or accidentally using a stale AMI because we forgot to build one! Packer tags the AMI images with the commit SHA used to generate them, and Terraform then looks up only the AMIs associated with that SHA. To minimize churn, we use the SHA associated with the latest Packer configuration changes, rather than the SHA of the latest commit overall.

This has a few benefits: reproducibility, and avoiding both accidental AMI changes and contamination of changes across branches. Also, the change is a stepping stone to an e2e pipeline that builds new AMIs automatically if Packer files change.

The downside is that new AMIs will be generated even for irrelevant changes (e.g. spelling fixes, comments), but I suspect that's OK. Also, an engineer will be forced to build an AMI whenever they change Packer files while iterating on e2e scripts; this hasn't been an issue for me yet, and I'm open to iterating on that later if it proves to be a problem.

### Config Files and Packer

Second, this PR moves e2e config HCL management from Packer to Terraform. Currently, the config files live in `./terraform/config`, but they are baked into the servers by Packer, so local changes are ignored. This behavior surprised me, as I spent a bit of time debugging why my config changes weren't applied. Having Terraform manage them eases engineers' iteration. It also makes Packer management more consistent (Packer only works in `e2e/terraform/packer`) and simplifies the logic for AMI change detection.

The config directory is very small (100KB), and having it as an upload step adds negligible time to `terraform apply`.
2021-01-25 13:20:28 -05:00
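In the PR, the lookup itself is done by Terraform; the Go sketch below only illustrates the selection rule: pick the image whose tag matches the Packer-config SHA and fail loudly rather than fall back to "latest". The `image` type and the `sha` tag key are assumptions of the sketch. The SHA could be obtained with something like `git log -n 1 --format=%H -- e2e/terraform/packer`, again only as an illustration.

```go
package main

import "fmt"

// image is a minimal stand-in for an AMI description; the "sha" tag key is
// an assumption for this sketch.
type image struct {
	ID   string
	Tags map[string]string
}

// amiForSHA returns the image built from the given Packer-config commit and
// fails instead of silently falling back to the newest image.
func amiForSHA(images []image, sha string) (string, error) {
	for _, img := range images {
		if img.Tags["sha"] == sha {
			return img.ID, nil
		}
	}
	return "", fmt.Errorf("no AMI tagged sha=%s; build one with Packer first", sha)
}

func main() {
	images := []image{
		{ID: "ami-0aaa", Tags: map[string]string{"sha": "1111111"}},
		{ID: "ami-0bbb", Tags: map[string]string{"sha": "2222222"}},
	}
	id, err := amiForSHA(images, "2222222")
	fmt.Println(id, err) // ami-0bbb <nil>
	_, err = amiForSHA(images, "3333333")
	fmt.Println(err)
}
```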
Mahmood Ali 39da228964 update readme about profiles and packer build 2021-01-25 11:40:26 -05:00
Seth Hoenig 8b05efcf88 consul/connect: Add support for Connect terminating gateways
This PR implements Nomad built-in support for running Consul Connect
terminating gateways. Such a gateway can be used by services running
inside the service mesh to access "legacy" services running outside
the service mesh while still making use of Consul's service identity
based networking and ACL policies.

https://www.consul.io/docs/connect/gateways/terminating-gateway

These gateways are declared as part of a task group level service
definition within the connect stanza.

service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      terminating {
        // terminating-gateway configuration entry
      }
    }
  }
}

Currently Envoy is the only supported gateway implementation in
Consul. The gateway task can be customized by configuring the
connect.sidecar_task block.

When the gateway.terminating field is set, Nomad will write/update
the Configuration Entry into Consul on job submission. Because CEs
are global in scope and there may be more than one Nomad cluster
communicating with Consul, there is an assumption that any terminating
gateway defined in Nomad for a particular service will be the same
among Nomad clusters.

Gateways require Consul 1.8.0+, checked by a node constraint.

Closes #9445
2021-01-25 10:36:04 -06:00
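Nomad expresses the Consul 1.8.0+ requirement as a node constraint; the Go sketch below only illustrates the kind of version comparison such a constraint performs, using the standard library. It is a simplification: `atLeast` is hypothetical, and it drops pre-release suffixes, so "1.8.0-beta1" counts as 1.8.0 even though strict semver orders it lower.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether a "major.minor.patch" version is >= minVer.
// Simplification: anything after "-" or "+" is dropped before comparing.
func atLeast(version string, minVer [3]int) (bool, error) {
	if i := strings.IndexAny(version, "-+"); i >= 0 {
		version = version[:i]
	}
	parts := strings.Split(version, ".")
	if len(parts) != 3 {
		return false, fmt.Errorf("unexpected version format %q", version)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return false, fmt.Errorf("bad version component %q: %w", p, err)
		}
		if n != minVer[i] {
			return n > minVer[i], nil
		}
	}
	return true, nil // all components equal
}

func main() {
	minConsul := [3]int{1, 8, 0} // gateways require Consul 1.8.0+
	for _, v := range []string{"1.7.4", "1.8.0", "1.10.1"} {
		ok, err := atLeast(v, minConsul)
		fmt.Println(v, ok, err)
	}
}
```

Comparing numerically per component matters: a plain string comparison would rank "1.10.1" below "1.8.0".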
Tim Gross 0b49e3da12 e2e: added tests for check restart behavior 2021-01-22 10:55:40 -05:00
Drew Bailey 630babb886
prevent double job status update (#9768)
* Prevent Job Statuses from being calculated twice

https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval
insertion with job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job registration eval insertion is
atomic with the registration, this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard
2021-01-22 09:18:17 -05:00
Mahmood Ali 9dcdafe4cf e2e: show command output on failure
When a command fails, it's nice to have the full output, as it contains
diagnostic information. The status code isn't sufficient for debugging.
2021-01-21 10:32:16 -05:00
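A minimal Go sketch of the idea, using `os/exec`: capture combined stdout/stderr and embed it in the returned error, so a failure shows the diagnostics and not just "exit status 1". The `run` helper is hypothetical, and the example assumes a POSIX `sh` is available.

```go
package main

import (
	"fmt"
	"os/exec"
)

// run executes a command and, on failure, returns an error that embeds the
// combined stdout/stderr, not just the exit status. Hypothetical helper.
func run(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err != nil {
		return string(out), fmt.Errorf("%s %v failed: %w\noutput:\n%s", name, args, err, out)
	}
	return string(out), nil
}

func main() {
	_, err := run("sh", "-c", "echo 'diagnostic detail'; exit 3")
	fmt.Println(err) // includes "exit status 3" and "diagnostic detail"
}
```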
Mahmood Ali 923725bf3d e2e: deflake TestVolumeMounts
After submitting an update, the test ought to wait until the new
allocations are placed. Previously, we used the original, to-be-stopped
allocations, and the test failed when attempting to exec into them.
2021-01-21 10:28:41 -05:00
Mahmood Ali 95b7fc80b8 e2e deflake namespaces: only check namespace jobs
Deflake the namespace e2e test by only asserting on jobs related to the
namespace tests. During our e2e tests, some leftover jobs (e.g.
prometheus) are still running while being shut down and cause the test to
fail.
2021-01-21 10:26:24 -05:00
Mahmood Ali 2e8bcac261 e2e: deflake events
Handle streamCh channel being closed.
2021-01-21 10:25:42 -05:00
Seth Hoenig 991884e715 consul/connect: Enable running multiple ingress gateways per Nomad agent
Connect ingress gateway services were being registered into Consul without
an explicit deterministic service ID. Consul would generate one automatically,
but then Nomad would have no way to register a second gateway on the same agent
as it would not supply 'proxy-id' during envoy bootstrap.

Set the ServiceID for gateways, and supply 'proxy-id' when doing envoy bootstrap.

Fixes #9834
2021-01-19 12:58:36 -06:00
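The fix pairs a deterministic service ID with the same ID passed as `-proxy-id` at Envoy bootstrap. The Go sketch below illustrates that pairing under stated assumptions: the `_nomad-task-...` ID layout is hypothetical and not necessarily Nomad's exact format, and the argument list is a partial `consul connect envoy` invocation with the other flags a real bootstrap needs omitted.

```go
package main

import "fmt"

// gatewayServiceID derives a deterministic Consul service ID from the
// allocation ID, task group, and service name. The layout is an assumption
// for this sketch.
func gatewayServiceID(allocID, group, service string) string {
	return fmt.Sprintf("_nomad-task-%s-group-%s-%s", allocID, group, service)
}

// bootstrapArgs returns a partial `consul connect envoy` argument list that
// passes the same ID as -proxy-id, so each gateway bootstraps against its
// own registration. Other required flags are omitted.
func bootstrapArgs(serviceID string) []string {
	return []string{"connect", "envoy", "-proxy-id", serviceID}
}

func main() {
	a := gatewayServiceID("alloc-1111", "ingress", "my-ingress")
	b := gatewayServiceID("alloc-2222", "ingress", "my-ingress")
	fmt.Println(a != b) // true: two gateways on one agent get distinct IDs
	fmt.Println(bootstrapArgs(a))
}
```

Because the ID is derived from stable inputs rather than generated by Consul, Nomad can compute it again at bootstrap time without querying Consul.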
Mahmood Ali 76ce6306a4 add helper for building ami 2021-01-15 10:49:13 -05:00
Mahmood Ali e51651c34a set sha 2021-01-15 10:49:13 -05:00
Mahmood Ali 82637715cf change ami naming 2021-01-15 10:49:12 -05:00
Mahmood Ali 0af1509a77 move config files to terraform 2021-01-15 10:49:12 -05:00
Seth Hoenig 536747f216 e2e: use jobspec2 Parse for parsing jobfile in e2e utils
We parse job files directly in e2eutil, currently using the jobspec
package. Instead, use the Parse method from the jobspec2 package so
we can parse job files with new features.
2021-01-13 14:00:40 -06:00
James Rasell d6cab8aa14
Merge pull request #9767 from hashicorp/f-e2e-job-scaling-suite
e2e: add job scaling test suite.
2021-01-11 18:35:07 +01:00