open-nomad/client/driver_manager_test.go

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

194 lines
5.3 KiB
Go
Raw Normal View History

// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: MPL-2.0
package client
import (
"fmt"
"testing"
"time"
"github.com/hashicorp/nomad/ci"
"github.com/hashicorp/nomad/client/config"
"github.com/hashicorp/nomad/client/pluginmanager/drivermanager"
"github.com/hashicorp/nomad/helper/pluginutils/catalog"
nconfig "github.com/hashicorp/nomad/nomad/structs/config"
"github.com/hashicorp/nomad/testutil"
"github.com/stretchr/testify/require"
)
// TestDriverManager_Fingerprint_Run asserts that node is populated with
// driver fingerprints
func TestDriverManager_Fingerprint_Run(t *testing.T) {
ci.Parallel(t)
testClient, cleanup := TestClient(t, nil)
defer cleanup()
client: fix data races in config handling (#14139) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex *only protects the config pointer itself and not fields inside the Config struct:* all config mutation must be done on a *copy* of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an *internally backward incompatible API change:* our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"
2022-08-18 23:32:04 +00:00
conf := testClient.GetConfig()
dm := drivermanager.New(&drivermanager.Config{
Logger: testClient.logger,
client: fix data races in config handling (#14139) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex *only protects the config pointer itself and not fields inside the Config struct:* all config mutation must be done on a *copy* of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an *internally backward incompatible API change:* our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"
2022-08-18 23:32:04 +00:00
Loader: conf.PluginSingletonLoader,
PluginConfig: conf.NomadPluginConfig(),
Updater: testClient.updateNodeFromDriver,
EventHandlerFactory: testClient.GetTaskEventHandler,
State: testClient.stateDB,
})
go dm.Run()
defer dm.Shutdown()
testutil.WaitForResult(func() (bool, error) {
client: fix data races in config handling (#14139) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex *only protects the config pointer itself and not fields inside the Config struct:* all config mutation must be done on a *copy* of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an *internally backward incompatible API change:* our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"
2022-08-18 23:32:04 +00:00
node := testClient.Node()
d, ok := node.Drivers["mock_driver"]
if !ok {
return false, fmt.Errorf("mock_driver driver is not present: %+v", node.Drivers)
}
if !d.Detected || !d.Healthy {
return false, fmt.Errorf("mock_driver driver is not marked healthy: %+v", d)
}
return true, nil
}, func(err error) {
require.NoError(t, err)
})
}
// TestDriverManager_Fingerprint_Run asserts that node is populated with
// driver fingerprints and it's updated periodically
func TestDriverManager_Fingerprint_Periodic(t *testing.T) {
ci.Parallel(t)
testClient, cleanup := TestClient(t, func(c *config.Config) {
pluginConfig := []*nconfig.PluginConfig{
{
Name: "mock_driver",
Config: map[string]interface{}{
"shutdown_periodic_after": true,
"shutdown_periodic_duration": 2 * time.Second,
},
},
}
c.PluginLoader = catalog.TestPluginLoaderWithOptions(t, "", map[string]string{}, pluginConfig)
})
defer cleanup()
client: fix data races in config handling (#14139) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex *only protects the config pointer itself and not fields inside the Config struct:* all config mutation must be done on a *copy* of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an *internally backward incompatible API change:* our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"
2022-08-18 23:32:04 +00:00
conf := testClient.GetConfig()
dm := drivermanager.New(&drivermanager.Config{
Logger: testClient.logger,
client: fix data races in config handling (#14139) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex *only protects the config pointer itself and not fields inside the Config struct:* all config mutation must be done on a *copy* of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an *internally backward incompatible API change:* our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"
2022-08-18 23:32:04 +00:00
Loader: conf.PluginSingletonLoader,
PluginConfig: conf.NomadPluginConfig(),
Updater: testClient.updateNodeFromDriver,
EventHandlerFactory: testClient.GetTaskEventHandler,
State: testClient.stateDB,
})
go dm.Run()
defer dm.Shutdown()
// we get a healthy mock_driver first
testutil.WaitForResult(func() (bool, error) {
node := testClient.Node()
d, ok := node.Drivers["mock_driver"]
if !ok {
return false, fmt.Errorf("mock_driver driver is not present: %+v", node.Drivers)
}
if !d.Detected || !d.Healthy {
return false, fmt.Errorf("mock_driver driver is not marked healthy: %+v", d)
}
return true, nil
}, func(err error) {
require.NoError(t, err)
})
// eventually, the mock_driver is marked as unhealthy
testutil.WaitForResult(func() (bool, error) {
node := testClient.Node()
d, ok := node.Drivers["mock_driver"]
if !ok {
return false, fmt.Errorf("mock_driver driver is not present: %+v", node.Drivers)
}
if d.Detected || d.Healthy {
return false, fmt.Errorf("mock_driver driver is still marked as healthy: %+v", d)
}
return true, nil
}, func(err error) {
require.NoError(t, err)
})
}
// TestDriverManager_NodeAttributes_Run asserts that node attributes are populated
// in addition to node.Drivers until we fully deprecate it
func TestDriverManager_NodeAttributes_Run(t *testing.T) {
ci.Parallel(t)
testClient, cleanup := TestClient(t, func(c *config.Config) {
c.Options = map[string]string{
"driver.raw_exec.enable": "1",
}
})
defer cleanup()
client: fix data races in config handling (#14139) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex *only protects the config pointer itself and not fields inside the Config struct:* all config mutation must be done on a *copy* of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an *internally backward incompatible API change:* our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"
2022-08-18 23:32:04 +00:00
conf := testClient.GetConfig()
dm := drivermanager.New(&drivermanager.Config{
Logger: testClient.logger,
client: fix data races in config handling (#14139) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex *only protects the config pointer itself and not fields inside the Config struct:* all config mutation must be done on a *copy* of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an *internally backward incompatible API change:* our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"
2022-08-18 23:32:04 +00:00
Loader: conf.PluginSingletonLoader,
PluginConfig: conf.NomadPluginConfig(),
Updater: testClient.updateNodeFromDriver,
EventHandlerFactory: testClient.GetTaskEventHandler,
State: testClient.stateDB,
})
go dm.Run()
defer dm.Shutdown()
// we should have mock_driver as well as raw_exec in node attributes
testutil.WaitForResult(func() (bool, error) {
node := testClient.Node()
// check mock driver
if node.Attributes["driver.mock_driver"] == "" {
return false, fmt.Errorf("mock_driver is not present in attributes: %#v", node.Attributes)
}
d, ok := node.Drivers["mock_driver"]
if !ok {
return false, fmt.Errorf("mock_driver is not present in drivers: %#v", node.Drivers)
}
if !d.Detected || !d.Healthy {
return false, fmt.Errorf("mock_driver driver is not marked as healthy: %+v", d)
}
if d.Attributes["driver.mock_driver"] != "" {
return false, fmt.Errorf("mock driver driver attributes contain duplicate health info: %#v", d.Attributes)
}
// check raw_exec
if node.Attributes["driver.raw_exec"] == "" {
return false, fmt.Errorf("raw_exec is not present in attributes: %#v", node.Attributes)
}
d, ok = node.Drivers["raw_exec"]
if !ok {
return false, fmt.Errorf("raw_exec is not present in drivers: %#v", node.Drivers)
}
if !d.Detected || !d.Healthy {
return false, fmt.Errorf("raw_exec driver is not marked as healthy: %+v", d)
}
return true, nil
}, func(err error) {
require.NoError(t, err)
})
}