Fixes#5395
Alternative to #5957
Make task restarting asynchronous when handling check-based restarts.
This matches the pre-0.9 behavior where TaskRunner.Restart was an
asynchronous signal. The check-based restarting code was not designed
to handle blocking in TaskRunner.Restart. 0.9 made it reentrant and
could easily overwhelm the buffered update chan and deadlock.
Many thanks to @byronwolfman for his excellent debugging, PR, and
reproducer!
I created this alternative as changing the functionality of
TaskRunner.Restart has a much larger impact. This approach reverts to
old known-good behavior and minimizes the number of places changes are
made.
This uses ember-page-title to add dynamic page titles throughout the
route hierarchy. When there’s more than one region, the current
current region is added before the final entry of “- Nomad”.
This ensures that server-to-server streaming RPC calls use the tls
wrapped connections.
Prior to this, `streamingRpcImpl` function uses tls for setting header
and invoking the rpc method, but returns unwrapped tls connection.
Thus, streaming writes fail with tls errors.
This tls streaming bug existed since 0.8.0[1], but PR #5654[2]
exacerbated it in 0.9.2. Prior to PR #5654, nomad client used to
shuffle servers at every heartbeat -- `servers.Manager.setServers`[3]
always shuffled servers and was called by heartbeat code[4]. Shuffling
servers meant that a nomad client would heartbeat and establish a
connection against all nomad servers eventually. When handling
streaming RPC calls, nomad servers used these local connection to
communicate directly to the client. The server-to-server forwarding
logic was left mostly unexercised.
PR #5654 means that a nomad client may connect to a single server only
and caused the server-to-server forward streaming RPC code to get
exercised more and unearthed the problem.
[1] https://github.com/hashicorp/nomad/blob/v0.8.0/nomad/rpc.go#L501-L515
[2] https://github.com/hashicorp/nomad/pull/5654
[3] https://github.com/hashicorp/nomad/blob/v0.9.1/client/servers/manager.go#L198-L216
[4] https://github.com/hashicorp/nomad/blob/v0.9.1/client/client.go#L1603
Previously, if a channel is closed, we retry the Stats call. But, if that call
fails, we go in a backoff loop without calling Stats ever again.
Here, we use a utility function for calling driverHandle.Stats call that retries
as one expects.
I aimed to preserve the logging formats but made small improvements as I saw fit.
The driver plugin stub client must call `grpcutils.HandleGrpcErr` to handle plugin
shutdown similar to other functions. This ensures that TaskStats returns
`ErrPluginShutdown` when plugin shutdown.