Commit Graph

7 Commits

Author SHA1 Message Date
Mahmood Ali 1bdfcdcab7 add timeouts for docker reconciler docker calls 2019-10-18 15:31:13 -04:00
Mahmood Ali 3aec7b56ea Only start reconciler once in main driver
driver.SetConfig is not appropriate for starting up reconciler
goroutine.  Some ephemeral driver instances are created for validating
config and we ought not to side-effecting goroutines for those.

We currently lack a lifecycle hook to inject these, so I picked the
`Fingerprinter` function for now, and reconciler should only run after
fingerprinter started.

Use `sync.Once` to ensure that we only start reconciler loop once.
2019-10-18 14:43:23 -04:00
Mahmood Ali ac3b555cc8 docker label refactoring and additional tests 2019-10-17 10:45:13 -04:00
Mahmood Ali 8739cc2a62 refactor reconciler code and address comments 2019-10-17 09:42:23 -04:00
Mahmood Ali c01c6de481 address code review comments 2019-10-17 08:36:02 -04:00
Mahmood Ali 2a63caafba docker: explicit grace period for initial container reconcilation
Ensure we wait for some grace period before killing docker containers
that may have launched in earlier nomad restore.
2019-10-17 08:36:02 -04:00
Mahmood Ali aa59280edc docker: periodically reconcile containers
When running at scale, it's possible that Docker Engine starts
containers successfully but gets wedged in a way where API call fails.
The Docker Engine may remain unavailable for arbitrary long time.

Here, we introduce a periodic reconcilation process that ensures that any
container started by nomad is tracked, and killed if is running
unexpectedly.

Basically, the periodic job inspects any container that isn't tracked in
its handlers.  A creation grace period is used to prevent killing newly
created containers that aren't registered yet.

Also, we aim to avoid killing unrelated containters started by host or
through raw_exec drivers.  The logic is to pattern against containers
environment variables and mounts to infer if they are an alloc docker
container.

Lastly, the periodic job can be disabled to avoid any interference if
need be.
2019-10-17 08:36:01 -04:00