open-nomad/e2e
Michael Schurter 9cac60dbed
test: use port collision instead of cpu exhaustion (#14994)
Originally this test relied on Job 1 blocking Job 2 until Job 1 had a
terminal *ClientStatus.* Job 2 ensured it would get blocked using 2
mechanisms:

1. A constraint requiring it to be placed on the same node as Job 1.
2. Job 2 would require all unreserved CPU on the node to ensure it would
   be blocked until Job 1's resources were free.

That 2nd mechanism breaks if *any previous job is still running on the
target node!* That seems very likely to happen in the flaky world of our
e2e tests. In fact there may be some jobs we intentionally want running
throughout; in hindsight it was never safe to assume my test would be
the only thing scheduled when it ran.

*Ports to the rescue!* Reserving a static port means that Job 2 will now
block on Job 1 being terminal. It will only conflict with other
tests if those tests use that port *on every node.* I ensured no
existing tests were using the port I chose.
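
A minimal sketch of the idea (illustrative only, not the actual test jobspec; the group name, port label, and port number here are made up):

# Both jobs reserve the same static port in their group network stanza,
# so the second job cannot be placed until the first is terminal.
group "overlap" {
  network {
    port "blocker" {
      static = 7234   # hypothetical port; the real test picks its own
    }
  }
}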

Other changes:
- Gave job a bit more breathing room resource-wise.
- Tightened timings a bit since previous failure ran into the `go test`
  time limit.
- Cleaned up the DumpEvals output. It's quite nice and handy now!
2022-10-21 07:53:26 -07:00
acl e2e: do not assume clean cluster when checking return objects. (#14557) 2022-09-13 14:25:19 +02:00
affinities
bin
clientstate
connect e2e: use unique names for Connect ACL Consul policy names. (#14604) 2022-09-16 13:35:40 +02:00
consul cleanup more helper updates (#14638) 2022-09-21 14:53:25 -05:00
consulacls
consultemplate
csi
deployment
disconnectedclients
e2eutil test: use port collision instead of cpu exhaustion (#14994) 2022-10-21 07:53:26 -07:00
eval_priority
events
example
execagent
framework
isolation e2e: convert flaky exec download in chroot unit test into e2e test (#14949) 2022-10-19 08:22:32 -05:00
lifecycle
metrics
namespaces e2e: fix incorrect must function usage in namespace suite. (#14805) 2022-10-05 15:50:56 +02:00
networking
nodedrain
nomadexec
operator_scheduler
overlap test: use port collision instead of cpu exhaustion (#14994) 2022-10-21 07:53:26 -07:00
oversubscription
parameterized
periodic
podman
quotas
remotetasks
rescheduling
scaling cleanup: replace TypeToPtr helper methods with pointer.Of (#14151) 2022-08-17 18:26:34 +02:00
scalingpolicies
scheduler_sysbatch Allow specification of a custom job name/prefix for parameterized jobs (#14631) 2022-10-06 16:21:40 -04:00
scheduler_system
servicediscovery e2e: fixup service discovery and ACL expiration tests. (#14517) 2022-09-09 14:27:40 +02:00
spread e2e: fixes the ordering on greater than checks within spread test. (#14818) 2022-10-06 15:27:36 +02:00
taskevents
terraform e2e: add acl test for token expiration. (#14418) 2022-09-01 09:36:09 +02:00
ui
upgrades
vaultcompat cleanup: replace TypeToPtr helper methods with pointer.Of (#14151) 2022-08-17 18:26:34 +02:00
vaultsecrets
volumes
.gitignore
README.md
e2e_test.go

README.md

End to End Tests

This package contains integration tests. Unlike the tests alongside Nomad code, these tests expect a functional Nomad cluster to already be accessible (either on localhost or via the NOMAD_ADDR env var).

See framework/doc.go for how to write tests.

The NOMAD_E2E=1 environment variable must be set for these tests to run.

Provisioning Test Infrastructure on AWS

The terraform/ folder has provisioning code to spin up a Nomad cluster on AWS. You'll need both Terraform and AWS credentials to set up the AWS instances on which the e2e tests will run. See the README for details. The number of servers and clients is configurable, as is the specific build of Nomad to deploy and the configuration file for each client and server.
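
For reference, a first provisioning run looks roughly like the following (a sketch; variables such as the Nomad version or client counts come from your terraform.tfvars, as described in the Terraform README):

# from the ./terraform directory, with AWS credentials already configured
terraform init
terraform plan     # review the instances that will be created
terraform apply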

Provisioning Local Clusters

To run tests against a local cluster, you'll need to make sure the following environment variables are set (see the example after this list):

  • NOMAD_ADDR should point to one of the Nomad servers
  • CONSUL_HTTP_ADDR should point to one of the Consul servers
  • NOMAD_E2E=1
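
For example, for a cluster running locally on the default ports (addresses here are illustrative; adjust them for your setup):

export NOMAD_ADDR=http://127.0.0.1:4646
export CONSUL_HTTP_ADDR=http://127.0.0.1:8500
export NOMAD_E2E=1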

TODO: the scripts in ./bin currently work only with Terraform; it would be nice to have a way to deploy Nomad to Vagrant or other local clusters.

Running

After completing the provisioning step above, you can set the client environment for NOMAD_ADDR and run the tests as shown below:

# from the ./e2e/terraform directory, set your client environment
# if you haven't already
$(terraform output environment)

cd ..
go test -v ./...

If you want to run a specific suite, you can specify the -suite flag as shown below. Only the suite with a matching Framework.TestSuite.Component will be run, and all others will be skipped.

go test -v -suite=Consul .

If you want to run a specific test, you'll need to regex-escape parts of the test's name so that the test runner doesn't skip over the framework struct method names in the test's full name:

go test -v . -run 'TestE2E/Consul/\*consul\.ScriptChecksE2ETest/TestGroup'
                              ^       ^             ^               ^
                              |       |             |               |
                          Component   |             |           Test func
                                      |             |
                                  Go Package      Struct

We're also in the process of migrating to "stdlib-style" tests that use the standard Go testing package without a notion of "suite". You can run these with -run regexes the same way you would any other Go test:

go test -v . -run TestExample/TestExample_Simple

I Want To...

...SSH Into One Of The Test Machines

You can use the Terraform output to find the IP address. The keys will be in the ./terraform/keys/ directory.

ssh -i keys/nomad-e2e-*.pem ubuntu@${EC2_IP_ADDR}

Run terraform output for IP addresses and details.

...Deploy a Cluster of Mixed Nomad Versions

The variables.tf file describes the nomad_version and nomad_local_binary variables, which can be used for most circumstances. But if you want to deploy mixed Nomad versions, you can provide a list of versions in your terraform.tfvars file.

For example, if you want to provision 3 servers all using Nomad 0.12.1, and 2 Linux clients using 0.12.1 and 0.12.2, you can use the following variables:

# will be used for servers
nomad_version = "0.12.1"

# will override the nomad_version for Linux clients
nomad_version_client_linux = [
    "0.12.1",
    "0.12.2"
]

...Deploy Custom Configuration Files

Set the profile field to "custom" and put the configuration files in ./terraform/config/custom/ as described in the README.
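
For example, a terraform.tfvars along these lines (a sketch; see the Terraform README for the full set of variables):

# use the configuration files under ./terraform/config/custom/
profile = "custom"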

...Deploy More Than 4 Linux Clients

Use the "custom" profile as described above.

...Change the Nomad Version After Provisioning

You can update the nomad_version variable, or simply rebuild the binary at the nomad_local_binary path so that Terraform picks up the change. Then run terraform plan and terraform apply again. This will update Nomad in place, making the minimum number of changes necessary.
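
For example, to bump the version of an already-provisioned cluster (a sketch, assuming you originally set nomad_version in terraform.tfvars):

# edit nomad_version in terraform.tfvars, then from ./terraform:
terraform plan     # review the in-place changes
terraform apply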