IOPS has been modelled as a resource since Nomad 0.1 but has never
actually been detected, and there is no short-term plan to add
detection. This is because IOPS is too simplistic a unit to express
the performance requirements of the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources, since these are the two public uses of
the field. They should be considered deprecated and exist only so that
users can stop using them during the Nomad 0.9.x releases. Going
forward, there should be no expectation that the field will exist.
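As a rough sketch of what the deprecated surface looks like (field names
illustrative, not the exact api/ definitions), the field simply lingers
on the resources struct:

```go
// Resources mirrors the public jobspec resources block. This is a
// hedged sketch, not the exact api/ definition.
type Resources struct {
	CPU      *int `hcl:"cpu"`
	MemoryMB *int `hcl:"memory"`

	// Deprecated: IOPS is parsed for backward compatibility only and is
	// ignored by the scheduler; expect it to be removed after 0.9.x.
	IOPS *int `hcl:"iops"`
}
```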
Also, LXC requires bind-mount target paths to be relative. Container
paths in LXC binds should never be absolute, so we strip any leading
`/`, even if a user sets one.
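A minimal sketch of the normalization (helper name hypothetical):

```go
import "strings"

// containerBindTarget makes a user-supplied container path safe for an
// LXC bind mount. LXC interprets the target relative to the container
// rootfs, so any leading "/" is stripped.
func containerBindTarget(path string) string {
	return strings.TrimLeft(path, "/")
}
```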
The previous integration test was broken during the client refactor,
and it appears to be some sort of race with state updating.
I'm going to try to construct a replacement test as part of the
performance work, but for now, the underlying behaviour is still being
tested.
This PR fixes an edge case where we could GC an allocation that was in
a desired stop state but had not yet terminated. This can be hit if the
client hasn't shut down the allocation yet or if the allocation is still
shutting down (a long kill_timeout).
Fixes https://github.com/hashicorp/nomad/issues/4940
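A hedged sketch of the corrected GC check (function name and status
strings illustrative): eligibility must key off the client-observed
state, not the desired state alone.

```go
// gcEligible reports whether an allocation may be garbage collected.
// Note that desiredStatus no longer short-circuits to true.
func gcEligible(desiredStatus, clientStatus string) bool {
	// A desired "stop" is not sufficient: the alloc may still be
	// shutting down (e.g. inside a long kill_timeout) or may not have
	// been stopped by the client at all yet.
	switch clientStatus {
	case "complete", "failed", "lost":
		return true
	}
	return false
}
```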
* Add Nomad RA
* Add deployment guide and nav
* Deployment Guide update
* Minor typo fixes
* Update diagrams
* Fixes for review
* Link fixes and typo fix
* Edits following review
  - Update image text from "zone" to "datacenter" to match Nomad terminology
  - Clean up text based on Preetha's feedback
* Text updates based on feedback from Rob
* Update diagrams
* Fix spelling
* Add suggestions from Preetha and Omar
WaitForResult expects the test function to fail and retries it a few
times before giving up. Assertions inside the testfn body cause it to
terminate abruptly without retrying.
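The fix is to return errors from the body and let the helper retry,
reserving fatal assertions for the final error callback. A sketch of the
pattern, assuming testutil's `WaitForResult(func() (bool, error),
func(error))` shape; `getStatus` is a stand-in for the call under test:

```go
testutil.WaitForResult(func() (bool, error) {
	status, err := getStatus()
	if err != nil {
		return false, err // returned errors are retried
	}
	if status != "ready" {
		return false, fmt.Errorf("status is %q, want ready", status)
	}
	return true, nil
}, func(err error) {
	// Runs only after retries are exhausted; safe place to fail.
	t.Fatalf("timed out: %v", err)
})
```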
The `currentExpiration` field is accessed from multiple goroutines
(Stats and renewal), so it needs locking.
I don't anticipate high contention, so a simple mutex suffices.
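A minimal sketch of the locking (struct and method names illustrative):

```go
import (
	"sync"
	"time"
)

type vaultClient struct {
	currentExpirationLock sync.Mutex
	currentExpiration     time.Time
}

// extendExpiration is called from the renewal goroutine.
func (c *vaultClient) extendExpiration(ttlSeconds int) {
	c.currentExpirationLock.Lock()
	c.currentExpiration = time.Now().Add(time.Duration(ttlSeconds) * time.Second)
	c.currentExpirationLock.Unlock()
}

// getExpiration is called from Stats.
func (c *vaultClient) getExpiration() time.Time {
	c.currentExpirationLock.Lock()
	defer c.currentExpirationLock.Unlock()
	return c.currentExpiration
}
```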
Since d335a82859ca2177bc6deda0c2c85b559daf2db3, ScriptExecutors take a
timeout duration instead of a context. This broke the script check
removal code, which used context cancelation propagation to remove
script checks while they were executing.
This commit adds a wrapper around ScriptExecutors that obeys context
cancelation again. The only downside is that it leaks a goroutine until
the underlying Exec call completes or times out.
Since check removal is relatively rare, check timeouts are usually low,
and scripts are usually fast, the risk of leaking a goroutine seems very
small.
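A hedged sketch of the wrapper (the interface shape is assumed from the
description above): Exec runs in a goroutine so the caller can return on
context cancelation, at the cost of leaking that goroutine until Exec
finishes.

```go
import (
	"context"
	"time"
)

type ScriptExecutor interface {
	Exec(timeout time.Duration, cmd string, args []string) ([]byte, int, error)
}

type execResult struct {
	output []byte
	code   int
	err    error
}

// execWithContext layers ctx cancelation on a timeout-based executor.
func execWithContext(ctx context.Context, e ScriptExecutor, timeout time.Duration,
	cmd string, args []string) ([]byte, int, error) {

	resCh := make(chan execResult, 1) // buffered so the goroutine never blocks
	go func() {
		out, code, err := e.Exec(timeout, cmd, args)
		resCh <- execResult{out, code, err}
	}()

	select {
	case res := <-resCh:
		return res.output, res.code, res.err
	case <-ctx.Done():
		// The goroutine leaks until Exec completes or times out.
		return nil, 0, ctx.Err()
	}
}
```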
Some tests have containers that die almost immediately, and may die
and be cleaned up before `driver.WaitUntilStarted` runs.
The cause of the early death is specific to each test:
* TestDockerDriver_Cleanup: the `hello-world` image just emits a message and exits immediately
* TestDockerDriver_ForcePull_RepoDigest: the busybox image didn't support the `-p 0` argument
* TestDockerDriver_Entrypoint: with the entrypoint being `/bin/sh -c`, the command needs to be passed as one entire string; otherwise `sh -c` ignores everything after its first argument
Fixes a regression introduced in d335a82859ca2177bc6deda0c2c85b559daf2db3.
The removal of the inner context meant the remaining cancel calls
canceled the outer context, causing script checks to exit prematurely.
Currently, the libcontainer-based executor, upon shutdown, kills only
the container's initial process. The children of the killed process
remain running, and the executor is never marked as terminated until
they exit.
Also, fix a case where we treated processes as successful when
`proc.Wait()` fails. In some attempts I was getting "waitid: no child
processes" errors, and such an error shouldn't cause the process to be
considered successful.
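A hedged sketch of both fixes (types illustrative; assumes the
libcontainer API of that era, where `Container.Signal(sig, all)` can
target every process in the container's cgroup):

```go
import (
	"os"
	"os/exec"

	"github.com/opencontainers/runc/libcontainer"
)

// shutdown kills every process in the container, not just init, so
// children cannot outlive the initial process.
func shutdown(container libcontainer.Container) error {
	return container.Signal(os.Kill, true) // all=true signals the whole cgroup
}

type exitState struct {
	ExitCode int
	Err      error
}

// waitProcess no longer treats a Wait() failure as success.
func waitProcess(proc *libcontainer.Process) exitState {
	ps, err := proc.Wait()
	if err != nil {
		if ee, ok := err.(*exec.ExitError); ok && ee.ProcessState != nil {
			// The process ran but exited non-zero: recover the real code.
			return exitState{ExitCode: ee.ProcessState.ExitCode(), Err: err}
		}
		// e.g. "waitid: no child processes" -- report failure, not success.
		return exitState{ExitCode: 1, Err: err}
	}
	return exitState{ExitCode: ps.ExitCode()}
}
```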