open-nomad/client/pluginmanager
Tim Gross 27bb2da5ee
CSI: make gRPC client creation more robust (#12057)
Nomad communicates with CSI plugin tasks via gRPC. The plugin
supervisor hook uses this to ping the plugin for health checks, which
it emits as task events. After the first successful health check the
plugin supervisor registers the plugin in the client's dynamic plugin
registry, which in turn creates a CSI plugin manager instance that has
its own gRPC client for fingerprinting the plugin and sending mount
requests.

If the plugin manager instance fails to connect to the plugin on its
first attempt, it exits. The plugin supervisor hook is unaware that the
connection failed so long as its own pings continue to work. A
transient failure during plugin startup can therefore leave the plugin
supervisor hook believing the plugin is up (so it sees no need to
restart the allocation) even though no fingerprinter was ever started.

* Refactor the gRPC client to connect on first use. This lets the
  plugin manager instance retry the gRPC client connection until it
  succeeds.
* Add a 30s timeout to the plugin supervisor so that it doesn't poll
  forever waiting for a plugin that will never come back up.

Minor improvements:
* The plugin supervisor hook creates a new gRPC client for every probe
  and then throws it away. Instead, reuse the client as we do for the
  plugin manager.
* The gRPC client constructor has a 1 second timeout. Clarify that this
  timeout applies to the connection and not the rest of the client
  lifetime.
2022-02-15 16:57:29 -05:00
csimanager CSI: make gRPC client creation more robust (#12057) 2022-02-15 16:57:29 -05:00
drivermanager Log error if there are no event handlers registered 2021-10-11 19:44:52 +00:00
group.go chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
group_test.go pluginmanager: WaitForFirstFingerprint times out (#9597) 2020-12-10 07:27:15 -08:00
manager.go client: batch initial fingerprinting in plugin managers 2018-12-18 22:56:19 -05:00
testing.go pluginmanager: WaitForFirstFingerprint times out (#9597) 2020-12-10 07:27:15 -08:00