After submitting an update, the test ought to wait until the new
allocations are placed. Previously, we'd use the original to-be-stopped
allocations and the test fails when attempting to exec.
Deflake namespace e2e test by only asserting on jobs related to the
namespace tests. During our e2e tests, some left over jobs (e.g.
prometheus) are left running while being shutdown and cause the test to
fail.
Connect ingress gateway services were being registered into Consul without
an explicit deterministic service ID. Consul would generate one automatically,
but then Nomad would have no way to register a second gateway on the same agent
as it would not supply 'proxy-id' during envoy bootstrap.
Set the ServiceID for gateways, and supply 'proxy-id' when doing envoy bootstrap.
Fixes#9834
If the connect.proxy stanza is left unset, the connection timeout
value is not set but is assumed to be, and may cause a non-fatal NPE
on job submission.
This has to have been unused because the HasPrefix operation is
backwards, meaning a Command.Env that includes PATH= never would have
worked; the default path was always used.
* Persist shared allocated ports for inplace update
Ports were not copied over when performing inplace updates in the
generic scheduler
* changelog
* drop spew
* Throw away result of multierror.Append
When given a *multierror.Error, it is mutated, therefore the return
value is not needed.
* Simplify MergeMultierrorWarnings, use StringBuilder
* Hash.Write() never returns an error
* Remove error that was always nil
* Remove error from Resources.Add signature
When this was originally written it could return an error, but that was
refactored away, and callers of it as of today never handle the error.
* Throw away results of io.Copy during Bridge
* Handle errors when computing node class in test
In this change we'll properly return the error in the
CSIPluginTypeMonolith case (which is the type given in DeleteNode()),
and also return the error when the given ID is not found.
This was found via errcheck.
If one thread calls `Flush()` on a gatedwriter while another thread attempts to
`Write()` new data to it, strange things will happen.
The test I wrote shows that at the very least you can write _while_ flushing,
and the call to `Write()` will happen during the internal writes of the
buffered data, which is maybe not what is expected. (i.e. the `Write()`'d data
will be inserted somewhere in the middle of the data being `Flush()'d`)
It's also the case that, because `Write()` only has a read lock, if you had
multiple threads trying to write ("read") at the same time you might have data
loss because the `w.buf` that was read would not necessarily be up-to-date by
the time `p2` was appended to it and it was re-assigned to `w.buf`. You can see
this if you run the new gatedwriter tests with `-race` against the old implementation:
```
WARNING: DATA RACE
Read at 0x00c0000c0420 by goroutine 11:
runtime.growslice()
/usr/lib/go/src/runtime/slice.go:125 +0x0
github.com/hashicorp/nomad/helper/gated-writer.(*Writer).Write()
/home/hicks/workspace/nomad/helper/gated-writer/writer.go:41 +0x2b6
github.com/hashicorp/nomad/helper/gated-writer.TestWriter_WithMultipleWriters.func1()
/home/hicks/workspace/nomad/helper/gated-writer/writer_test.go:90 +0xea
```
This race condition is fixed in this change.
We directly parse job files in e2eutil, but currently using jobspec
package. Instead, use the Parse method from the jobspec2 package so
we can parse job files with new features.
In 492d62d we prevented poststop tasks from contributing to allocation health
status, which fixed a bug where poststop tasks would prevent a deployment from
ever being marked successful. The patch introduced a regression where prestart
tasks that complete are causing the allocation to be marked unhealthy. This
changeset restores the previous behavior for prestart tasks.
Have the build-binary bundle the UI by default. This eases getting "alpha pre-releases" out for support without compiling locally, and engineer's experience with e2e test clusters.