Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932
While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchrnous. Nomad can shutdown before all
follower tasks have updated their state to dead thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*
The actual fix is a simple nil check to not assume non-terminal allocs
leader's have a TaskRunner.
Certain environments use WARN for serious logging; however, it's very
possible to have machines without some of the fingerprinted keys
(public-ipv4 and public-hostname specifcally). Setting log level to
INFO seems more consistent with this possibility.
* Allow server TLS configuration to be reloaded via SIGHUP
* dynamic tls reloading for nomad agents
* code cleanup and refactoring
* ensure keyloader is initialized, add comments
* allow downgrading from TLS
* initalize keyloader if necessary
* integration test for tls reload
* fix up test to assert success on reloaded TLS configuration
* failure in loading a new TLS config should remain at current
Reload only the config if agent is already using TLS
* reload agent configuration before specific server/client
lock keyloader before loading/caching a new certificate
* introduce a get-or-set method for keyloader
* fixups from code review
* fix up linting errors
* fixups from code review
* add lock for config updates; improve copy of tls config
* GetCertificate only reloads certificates dynamically for the server
* config updates/copies should be on agent
* improve http integration test
* simplify agent reloading storing a local copy of config
* reuse the same keyloader when reloading
* Test that server and client get reloaded but keep keyloader
* Keyloader exposes GetClientCertificate as well for outgoing connections
* Fix spelling
* correct changelog style
This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.
Fixes https://github.com/hashicorp/nomad/issues/3487
WARNing when someone has over 50 non-terminal allocs was just too
confusing.
Tested manually with `gc_max_allocs = 10` and bumping a job from `count
= 19` to `count = 21`:
```
2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations
...
2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations
```
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.
Also stops logging "marked for GC" twice.
Fixes https://github.com/hashicorp/nomad/issues/3454
Reliably reproduced the data race before by having a fingerprinter
change the nodes attributes every millisecond and syncing at the same
rate. With fix, did not ever panic.
Our Docker network plugin autodetection code was erroneously treating
Window's default network `nat` as a plugin and defaulting to it instead
of the host.
Fixes#3218
If the container dies before the network can be read, we now ignore the
error coming out of the network information polling loop. Nomad will
restart the task regardless, so we might be masking the actual error.
The polling loop for the rkt network information, inside the `Start`
method, was getting a bit unwieldy. It's been refactored out so it's not
a seperate function.
The rkt port mapping test currently starts redis with --version, which
obviously makes redis exit again almost immediately. This means that the
container exists before the network status can be queried, and so the
test fails.
The network status poll loop for the rkt drivers `Start` method was a
bit messy, and could not display the last encountered error. Here we
clean it up.
The changes introduces in #3256 require at least rkt 1.27.0 because of
a bug in the JSON output of `rkt status` in previous versions.
Here we upgrade all references to rkt's minimum version, and also make
travis and vagrant use this version when running tests.
Finally we add a CHANGELOG notice.