agent: use the new lib/mutex for stateLock

Previously the ServiceManager had to run a separate goroutine so that it could block on a channel
send/receive instead of a lock. Using this mutex with TryLock allows us to cancel the lock when
the serviceConfigWatch is stopped.

Without this change removing the ServiceManager.Start goroutine would not be possible because
when AddService is called it acquires the stateLock. While that lock is held, if there are
existing watches for the service, the old watch will be stopped, and the goroutine holding the
lock will attempt to wait for that watcher goroutine to exit.

If the goroutine is handling an update (serviceConfigWatch.handleUpdate) then it can block on
acquiring the stateLock and deadlock the agent.  With this change the context is cancelled as
and the goroutine will exit instead of waiting on the stateLock.
This commit is contained in:
Daniel Nephin 2020-12-04 19:06:47 -05:00
parent 3685f39970
commit e1e94ce69c
2 changed files with 7 additions and 4 deletions

View File

@ -48,6 +48,7 @@ import (
"github.com/hashicorp/consul/ipaddr"
"github.com/hashicorp/consul/lib"
"github.com/hashicorp/consul/lib/file"
"github.com/hashicorp/consul/lib/mutex"
"github.com/hashicorp/consul/logging"
"github.com/hashicorp/consul/tlsutil"
"github.com/hashicorp/consul/types"
@ -222,7 +223,7 @@ type Agent struct {
exposedPorts map[string]int
// stateLock protects the agent state
stateLock sync.Mutex
stateLock *mutex.Mutex
// dockerClient is the client for performing docker health checks.
dockerClient *checks.DockerClient
@ -358,6 +359,7 @@ func New(bd BaseDeps) (*Agent, error) {
retryJoinCh: make(chan error),
shutdownCh: make(chan struct{}),
endpoints: make(map[string]string),
stateLock: mutex.New(),
baseDeps: bd,
tokens: bd.Tokens,

View File

@ -292,11 +292,12 @@ func (w *serviceConfigWatch) handleUpdate(ctx context.Context, event cache.Updat
persistServiceDefaults: serviceDefaults,
}
w.agent.stateLock.Lock()
if err := w.agent.stateLock.TryLock(ctx); err != nil {
return nil
}
defer w.agent.stateLock.Unlock()
// While we were waiting on the agent state lock we may have been shutdown.
// So avoid doing a registration in that case.
// The context may have been cancelled after the lock was acquired.
if err := ctx.Err(); err != nil {
return nil
}