Registration and restoring allocs don't share state or depend on each
other in any way (syncing allocs with servers is done outside of
registration).
Since restoring is synchronous, start the registration goroutine first.
For nodes with lots of allocs to restore or close to their heartbeat
deadline, this could be the difference between becoming "lost" or not.
Refactoring of 104067bc2b2002a4e45ae7b667a476b89addc162
Switch the MarkLive method for a chan that is closed by the client.
Thanks to @notnoop for the idea!
The old approach called a method on most existing ARs and TRs on every
runAllocs call. The new approach does a once.Do call in runAllocs to
accomplish the same thing with less work. Able to remove the gate
abstraction that did much more than was needed.
Fixes#1795
Running restored allocations and pulling what allocations to run from
the server happen concurrently. This means that if a client is rebooted,
and has its allocations rescheduled, it may restart the dead allocations
before it contacts the server and determines they should be dead.
This commit makes tasks that fail to reattach on restore wait until the
server is contacted before restarting.
Related to #4280
This PR adds
`client.allocs.<job>.<group>.<alloc>.<task>.memory.allocated` as a gauge
in bytes to metrics to ease calculating how close a task is to OOMing.
```
'nomad.client.allocs.memory.allocated.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 268435456.000
'nomad.client.allocs.memory.cache.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 5677056.000
'nomad.client.allocs.memory.kernel_max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.kernel_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8908800.000
'nomad.client.allocs.memory.rss.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 876544.000
'nomad.client.allocs.memory.swap.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8208384.000
```
* client: was not using up-to-date client state in determining which alloc count towards allocated resources
* Update client/client.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
This removes an unnecessary shared lock between discovery and heartbeating
which was causing heartbeats to be missed upon retries when a single server
fails. Also made a drive by fix to call the periodic server shuffler goroutine.
This fixes a confusing UX where a previously successful deployment's
healthy/unhealthy count would get updated if any allocations failed after
the deployment was already marked as successful.
Fixes https://github.com/hashicorp/nomad/issues/5587
When a nomad 0.9 client is handling an alloc generated by a nomad 0.8
server, we should check the alloc.TaskResources for networking details
rather than task.Resources.
We check alloc.TaskResources for networking for other tasks in the task
group [1], so it's a bit odd that we used the task.Resources struct
here. TaskRunner also uses `alloc.TaskResources`[2].
The task.Resources struct in 0.8 was sparsly populated, resulting to
storing of 0 in port mapping env vars:
```
vagrant@nomad-server-01:~$ nomad version
Nomad v0.8.7 (21a2d93eecf018ad2209a5eab6aae6c359267933+CHANGES)
vagrant@nomad-server-01:~$ nomad server members
Name Address Port Status Leader Protocol Build Datacenter Region
nomad-server-01.global 10.199.0.11 4648 alive true 2 0.8.7 dc1 global
vagrant@nomad-server-01:~$ nomad alloc status -json 5b34649b | jq '.Job.TaskGroups[0].Tasks[0].Resources.Networks'
[
{
"CIDR": "",
"Device": "",
"DynamicPorts": [
{
"Label": "db",
"Value": 0
}
],
"IP": "",
"MBits": 10,
"ReservedPorts": null
}
]
vagrant@nomad-server-01:~$ nomad alloc status -json 5b34649b | jq '.TaskResources'
{
"redis": {
"CPU": 500,
"DiskMB": 0,
"IOPS": 0,
"MemoryMB": 256,
"Networks": [
{
"CIDR": "",
"Device": "eth1",
"DynamicPorts": [
{
"Label": "db",
"Value": 21722
}
],
"IP": "10.199.0.21",
"MBits": 10,
"ReservedPorts": null
}
]
}
}
```
Also, updated the test values to mimic how Nomad 0.8 structs are
represented, and made its result match the non compact values in
`TestEnvironment_AsList`.
[1] 24e9040b18/client/taskenv/env.go (L624-L639)
[2] https://github.com/hashicorp/nomad/blob/master/client/allocrunner/taskrunner/task_runner.go#L287-L303
This helper returns the token as well as the ACL policy, to be used in a later
commit for logging the token info associated with nomad exec invocation.
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.
Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
Noticed that `detected drivers` log line was misleading - when a driver
doesn't fingerprint before timeout, their health status is empty string
`""` which we would mark as detected.
Now, we log all drivers along with their state to ease driver
fingerprint debugging.
I noticed that `watchNodeUpdates()` almost immediately after
`registerAndHeartbeat()` calls `retryRegisterNode()`, well after 5
seconds.
This call is unnecessary and made debugging a bit harder. So here, we
ensure that we only re-register node for new node events, not for
initial registration.
Here we retain 0.8.7 behavior of waiting for driver fingerprints before
registering a node, with some timeout. This is needed for system jobs,
as system job scheduling for node occur at node registration, and the
race might mean that a system job may not get placed on the node because
of missing drivers.
The timeout isn't strictly necessary, but raising it to 1 minute as it's
closer to indefinitely blocked than 1 second. We need to keep the value
high enough to capture as much drivers/devices, but low enough that
doesn't risk blocking too long due to misbehaving plugin.
Fixes https://github.com/hashicorp/nomad/issues/5579
Currently, when logmon fails to reattach, we will retry reattachment to
the same pid until the task restart specification is exhausted.
Because we cannot clear hook state during error conditions, it is not
possible for us to signal to a future restart that it _shouldn't_
attempt to reattach to the plugin.
Here we revert to explicitly detecting reattachment seperately from a
launch of a new logmon, so we can recover from scenarios where a logmon
plugin has failed.
This is a net improvement over the current hard failure situation, as it
means in the most common case (the pid has gone away), we can recover.
Other reattachment failure modes where the plugin may still be running
could potentially cause a duplicate process, or a subsequent failure to launch
a new plugin.
If there was a duplicate process, it could potentially cause duplicate
logging. This is better than a production workload outage.
If there was a subsequent failure to launch a new plugin, it would fail
in the same (retry until restarts are exhausted) as the current failure
mode.
Renewal time was being calculated as 10s+Intn(lease-10s), so the renewal
time could be very rapid or within 1s of the deadline: [10s, lease)
This commit fixes the renewal time by calculating it as:
(lease/2) +/- 10s
For a lease of 60s this means the renewal will occur in [20s, 40s).