open-nomad/scheduler/util_test.go


package scheduler
import (
"fmt"
"reflect"
"testing"
"time"

"github.com/hashicorp/nomad/ci"
"github.com/hashicorp/nomad/helper/pointer"
"github.com/hashicorp/nomad/helper/testlog"
"github.com/hashicorp/nomad/helper/uuid"
"github.com/hashicorp/nomad/nomad/mock"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
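// materializeTaskGroups expands a job into one desired instance per task
// group member, keyed "<job>.<group>[<index>]". mock.Job() defines a single
// "web" task group with count 10, which the assertions below rely on.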
func TestMaterializeTaskGroups(t *testing.T) {
ci.Parallel(t)
job := mock.Job()
index := materializeTaskGroups(job)
require.Equal(t, 10, len(index))
for i := 0; i < 10; i++ {
name := fmt.Sprintf("my-job.web[%d]", i)
require.Contains(t, index, name)
require.Equal(t, job.TaskGroups[0], index[name])
}
}
func newNode(name string) *structs.Node {
n := mock.Node()
n.Name = name
return n
}
func TestDiffSystemAllocsForNode_Sysbatch_terminal(t *testing.T) {
ci.Parallel(t)
// For a sysbatch job, the scheduler should not re-place an allocation
// that has become terminal, unless the job has been updated.
job := mock.SystemBatchJob()
required := materializeTaskGroups(job)
eligible := map[string]*structs.Node{
"node1": newNode("node1"),
}
var live []*structs.Allocation // empty
tainted := map[string]*structs.Node(nil)
t.Run("current job", func(t *testing.T) {
terminal := structs.TerminalByNodeByName{
"node1": map[string]*structs.Allocation{
"my-sysbatch.pinger[0]": {
ID: uuid.Generate(),
NodeID: "node1",
Name: "my-sysbatch.pinger[0]",
Job: job,
ClientStatus: structs.AllocClientStatusComplete,
},
},
}
diff := diffSystemAllocsForNode(job, "node1", eligible, nil, tainted, required, live, terminal, true)
require.Empty(t, diff.place)
require.Empty(t, diff.update)
require.Empty(t, diff.stop)
require.Empty(t, diff.migrate)
require.Empty(t, diff.lost)
require.True(t, len(diff.ignore) == 1 && diff.ignore[0].Alloc == terminal["node1"]["my-sysbatch.pinger[0]"])
})
t.Run("outdated job", func(t *testing.T) {
previousJob := job.Copy()
previousJob.JobModifyIndex -= 1
terminal := structs.TerminalByNodeByName{
"node1": map[string]*structs.Allocation{
"my-sysbatch.pinger[0]": {
ID: uuid.Generate(),
NodeID: "node1",
Name: "my-sysbatch.pinger[0]",
Job: previousJob,
},
},
}
expAlloc := terminal["node1"]["my-sysbatch.pinger[0]"]
expAlloc.NodeID = "node1"
diff := diffSystemAllocsForNode(job, "node1", eligible, nil, tainted, required, live, terminal, true)
require.Empty(t, diff.place)
require.Len(t, diff.update, 1)
require.Empty(t, diff.stop)
require.Empty(t, diff.migrate)
require.Empty(t, diff.lost)
require.Empty(t, diff.ignore)
})
}
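// diffSystemAllocsForNode partitions a node's existing allocations into the
// scheduler's decision buckets: place, update, migrate, stop, lost, and
// ignore (plus disconnecting/reconnecting for disconnected clients). The
// allocs below are arranged to exercise each bucket.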
func TestDiffSystemAllocsForNode(t *testing.T) {
ci.Parallel(t)
job := mock.Job()
required := materializeTaskGroups(job)
// The "old" job has a previous modify index
oldJob := new(structs.Job)
*oldJob = *job
oldJob.JobModifyIndex -= 1
eligibleNode := mock.Node()
eligibleNode.ID = "zip"
drainNode := mock.DrainNode()
deadNode := mock.Node()
deadNode.Status = structs.NodeStatusDown
tainted := map[string]*structs.Node{
"dead": deadNode,
"drainNode": drainNode,
}
eligible := map[string]*structs.Node{
eligibleNode.ID: eligibleNode,
}
allocs := []*structs.Allocation{
// Update the 1st
{
ID: uuid.Generate(),
NodeID: "zip",
Name: "my-job.web[0]",
Job: oldJob,
},
// Ignore the 2nd
{
ID: uuid.Generate(),
NodeID: "zip",
Name: "my-job.web[1]",
Job: job,
},
// Evict 11th
{
ID: uuid.Generate(),
NodeID: "zip",
Name: "my-job.web[10]",
Job: oldJob,
},
// Migrate the 3rd
{
ID: uuid.Generate(),
NodeID: "drainNode",
Name: "my-job.web[2]",
Job: oldJob,
DesiredTransition: structs.DesiredTransition{
Migrate: pointer.Of(true),
},
},
// Mark the 4th lost
{
ID: uuid.Generate(),
NodeID: "dead",
Name: "my-job.web[3]",
Job: oldJob,
},
}
// Have three terminal allocs
terminal := structs.TerminalByNodeByName{
"zip": map[string]*structs.Allocation{
"my-job.web[4]": {
ID: uuid.Generate(),
NodeID: "zip",
Name: "my-job.web[4]",
Job: job,
},
"my-job.web[5]": {
ID: uuid.Generate(),
NodeID: "zip",
Name: "my-job.web[5]",
Job: job,
},
"my-job.web[6]": {
ID: uuid.Generate(),
NodeID: "zip",
Name: "my-job.web[6]",
Job: job,
},
},
}
diff := diffSystemAllocsForNode(job, "zip", eligible, nil, tainted, required, allocs, terminal, true)
// We should update the first alloc
require.Len(t, diff.update, 1)
require.Equal(t, allocs[0], diff.update[0].Alloc)
// We should ignore the second alloc
require.Len(t, diff.ignore, 1)
require.Equal(t, allocs[1], diff.ignore[0].Alloc)
// We should stop the 3rd alloc
require.Len(t, diff.stop, 1)
require.Equal(t, allocs[2], diff.stop[0].Alloc)
// We should migrate the 4th alloc
require.Len(t, diff.migrate, 1)
require.Equal(t, allocs[3], diff.migrate[0].Alloc)
// We should mark the 5th alloc as lost
require.Len(t, diff.lost, 1)
require.Equal(t, allocs[4], diff.lost[0].Alloc)
// We should place 6
require.Len(t, diff.place, 6)
// Ensure that the allocations which are replacements of terminal allocs are
// annotated.
for _, m := range terminal {
for _, alloc := range m {
for _, tuple := range diff.place {
if alloc.Name == tuple.Name {
require.Equal(t, alloc, tuple.Alloc)
}
}
}
}
}
// Test the desired diff for an updated system job running on an
// ineligible node
func TestDiffSystemAllocsForNode_ExistingAllocIneligibleNode(t *testing.T) {
ci.Parallel(t)
job := mock.Job()
job.TaskGroups[0].Count = 1
required := materializeTaskGroups(job)
// The "old" job has a previous modify index
oldJob := new(structs.Job)
*oldJob = *job
oldJob.JobModifyIndex -= 1
eligibleNode := mock.Node()
ineligibleNode := mock.Node()
ineligibleNode.SchedulingEligibility = structs.NodeSchedulingIneligible
tainted := map[string]*structs.Node{}
eligible := map[string]*structs.Node{
eligibleNode.ID: eligibleNode,
}
allocs := []*structs.Allocation{
// Update the TG alloc running on eligible node
{
ID: uuid.Generate(),
NodeID: eligibleNode.ID,
Name: "my-job.web[0]",
Job: oldJob,
},
// Ignore the TG alloc running on ineligible node
{
ID: uuid.Generate(),
NodeID: ineligibleNode.ID,
Name: "my-job.web[0]",
Job: job,
},
}
// No terminal allocs
terminal := make(structs.TerminalByNodeByName)
diff := diffSystemAllocsForNode(job, eligibleNode.ID, eligible, nil, tainted, required, allocs, terminal, true)
require.Len(t, diff.place, 0)
require.Len(t, diff.update, 1)
require.Len(t, diff.migrate, 0)
require.Len(t, diff.stop, 0)
require.Len(t, diff.ignore, 1)
require.Len(t, diff.lost, 0)
}
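// TestDiffSystemAllocsForNode_DisconnectedNode covers the reconnect logic:
// whether an alloc on a disconnected or freshly reconnected client is
// classified as disconnecting, reconnecting, lost, or ignored is derived
// from the node status, the alloc's ClientStatus, and its AllocState
// history of client-status transitions.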
func TestDiffSystemAllocsForNode_DisconnectedNode(t *testing.T) {
ci.Parallel(t)
// Create job.
job := mock.SystemJob()
job.TaskGroups[0].MaxClientDisconnect = pointer.Of(time.Hour)
// Create nodes.
readyNode := mock.Node()
readyNode.Status = structs.NodeStatusReady
disconnectedNode := mock.Node()
disconnectedNode.Status = structs.NodeStatusDisconnected
eligibleNodes := map[string]*structs.Node{
readyNode.ID: readyNode,
}
taintedNodes := map[string]*structs.Node{
disconnectedNode.ID: disconnectedNode,
}
// Create allocs.
required := materializeTaskGroups(job)
terminal := make(structs.TerminalByNodeByName)
type diffResultCount struct {
place, update, migrate, stop, ignore, lost, disconnecting, reconnecting int
}
testCases := []struct {
name string
node *structs.Node
allocFn func(*structs.Allocation)
expect diffResultCount
}{
{
name: "alloc in disconnected client is marked as unknown",
node: disconnectedNode,
allocFn: func(alloc *structs.Allocation) {
alloc.ClientStatus = structs.AllocClientStatusRunning
},
expect: diffResultCount{
disconnecting: 1,
},
},
{
name: "disconnected alloc reconnects",
node: readyNode,
allocFn: func(alloc *structs.Allocation) {
alloc.ClientStatus = structs.AllocClientStatusRunning
alloc.AllocStates = []*structs.AllocState{{
Field: structs.AllocStateFieldClientStatus,
Value: structs.AllocClientStatusUnknown,
Time: time.Now().Add(-time.Minute),
}}
},
expect: diffResultCount{
reconnecting: 1,
},
},
{
name: "alloc not reconnecting after it reconnects",
node: readyNode,
allocFn: func(alloc *structs.Allocation) {
alloc.ClientStatus = structs.AllocClientStatusRunning
alloc.AllocStates = []*structs.AllocState{
{
Field: structs.AllocStateFieldClientStatus,
Value: structs.AllocClientStatusUnknown,
Time: time.Now().Add(-time.Minute),
},
{
Field: structs.AllocStateFieldClientStatus,
Value: structs.AllocClientStatusRunning,
Time: time.Now(),
},
}
},
expect: diffResultCount{
ignore: 1,
},
},
{
name: "disconnected alloc is lost after it expires",
node: disconnectedNode,
allocFn: func(alloc *structs.Allocation) {
alloc.ClientStatus = structs.AllocClientStatusUnknown
alloc.AllocStates = []*structs.AllocState{{
Field: structs.AllocStateFieldClientStatus,
Value: structs.AllocClientStatusUnknown,
Time: time.Now().Add(-10 * time.Hour),
}}
},
expect: diffResultCount{
lost: 1,
},
},
{
name: "disconnected allocs are ignored",
node: disconnectedNode,
allocFn: func(alloc *structs.Allocation) {
alloc.ClientStatus = structs.AllocClientStatusUnknown
alloc.AllocStates = []*structs.AllocState{{
Field: structs.AllocStateFieldClientStatus,
Value: structs.AllocClientStatusUnknown,
Time: time.Now(),
}}
},
expect: diffResultCount{
ignore: 1,
},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
alloc := mock.AllocForNode(tc.node)
alloc.JobID = job.ID
alloc.Job = job
alloc.Name = fmt.Sprintf("%s.%s[0]", job.Name, job.TaskGroups[0].Name)
if tc.allocFn != nil {
tc.allocFn(alloc)
}
got := diffSystemAllocsForNode(
job, tc.node.ID, eligibleNodes, nil, taintedNodes,
required, []*structs.Allocation{alloc}, terminal, true,
)
assert.Len(t, got.place, tc.expect.place, "place")
assert.Len(t, got.update, tc.expect.update, "update")
assert.Len(t, got.migrate, tc.expect.migrate, "migrate")
assert.Len(t, got.stop, tc.expect.stop, "stop")
assert.Len(t, got.ignore, tc.expect.ignore, "ignore")
assert.Len(t, got.lost, tc.expect.lost, "lost")
assert.Len(t, got.disconnecting, tc.expect.disconnecting, "disconnecting")
assert.Len(t, got.reconnecting, tc.expect.reconnecting, "reconnecting")
})
}
}
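// diffSystemAllocs applies the per-node diff across every candidate node
// and merges the results, so each alloc is judged against the node it
// lives on.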
func TestDiffSystemAllocs(t *testing.T) {
ci.Parallel(t)
job := mock.SystemJob()
drainNode := mock.DrainNode()
deadNode := mock.Node()
deadNode.Status = structs.NodeStatusDown
tainted := map[string]*structs.Node{
deadNode.ID: deadNode,
drainNode.ID: drainNode,
}
// Create four ready nodes, plus entries for the draining and dead nodes.
nodes := []*structs.Node{{ID: "foo"}, {ID: "bar"}, {ID: "baz"},
{ID: "pipe"}, {ID: drainNode.ID}, {ID: deadNode.ID}}
// The "old" job has a previous modify index
oldJob := new(structs.Job)
*oldJob = *job
oldJob.JobModifyIndex -= 1
allocs := []*structs.Allocation{
// Update allocation on baz
{
ID: uuid.Generate(),
NodeID: "baz",
Name: "my-job.web[0]",
Job: oldJob,
},
// Ignore allocation on bar
{
ID: uuid.Generate(),
NodeID: "bar",
Name: "my-job.web[0]",
Job: job,
},
// Migrate the allocation on the draining node.
{
ID: uuid.Generate(),
NodeID: drainNode.ID,
Name: "my-job.web[0]",
Job: oldJob,
DesiredTransition: structs.DesiredTransition{
Migrate: pointer.Of(true),
},
},
// Mark as lost on a dead node
{
ID: uuid.Generate(),
NodeID: deadNode.ID,
Name: "my-job.web[0]",
Job: oldJob,
},
}
// Have one terminal alloc on node "pipe"
terminal := structs.TerminalByNodeByName{
"pipe": map[string]*structs.Allocation{
"my-job.web[0]": {
ID: uuid.Generate(),
NodeID: "pipe",
Name: "my-job.web[0]",
Job: job,
},
},
}
diff := diffSystemAllocs(job, nodes, nil, tainted, allocs, terminal, true)
// We should update the first alloc
require.Len(t, diff.update, 1)
require.Equal(t, allocs[0], diff.update[0].Alloc)
// We should ignore the second alloc
require.Len(t, diff.ignore, 1)
require.Equal(t, allocs[1], diff.ignore[0].Alloc)
// There should be no stops.
require.Empty(t, diff.stop)
// We should migrate the third alloc.
require.Len(t, diff.migrate, 1)
require.Equal(t, allocs[2], diff.migrate[0].Alloc)
// We should mark the 4th alloc as lost
require.Len(t, diff.lost, 1)
require.Equal(t, allocs[3], diff.lost[0].Alloc)
// We should place 2
require.Len(t, diff.place, 2)
// Ensure that the allocations which are replacements of terminal allocs are
// annotated.
for _, m := range terminal {
for _, alloc := range m {
for _, tuple := range diff.place {
if alloc.NodeID == tuple.Alloc.NodeID {
require.Equal(t, alloc, tuple.Alloc)
}
}
}
}
}
func TestReadyNodesInDCs(t *testing.T) {
ci.Parallel(t)
state := state.TestStateStore(t)
node1 := mock.Node()
node2 := mock.Node()
node2.Datacenter = "dc2"
node3 := mock.Node()
node3.Datacenter = "dc2"
node3.Status = structs.NodeStatusDown
node4 := mock.DrainNode()
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 1000, node1))
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 1001, node2))
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 1002, node3))
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 1003, node4))
nodes, notReady, dc, err := readyNodesInDCs(state, []string{"dc1", "dc2"})
require.NoError(t, err)
require.Equal(t, 2, len(nodes))
require.NotEqual(t, node3.ID, nodes[0].ID)
require.NotEqual(t, node3.ID, nodes[1].ID)
require.Contains(t, dc, "dc1")
require.Equal(t, 1, dc["dc1"])
require.Contains(t, dc, "dc2")
require.Equal(t, 1, dc["dc2"])
require.Contains(t, notReady, node3.ID)
require.Contains(t, notReady, node4.ID)
}
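// retryMax runs the callback up to max times; per the cases below, a reset
// callback that reports progress grants a fresh budget of max attempts.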
func TestRetryMax(t *testing.T) {
ci.Parallel(t)
calls := 0
bad := func() (bool, error) {
calls += 1
return false, nil
}
err := retryMax(3, bad, nil)
require.Error(t, err)
require.Equal(t, 3, calls, "mismatch")
calls = 0
first := true
reset := func() bool {
if calls == 3 && first {
first = false
return true
}
return false
}
err = retryMax(3, bad, reset)
require.Error(t, err)
require.Equal(t, 6, calls, "mismatch")
calls = 0
good := func() (bool, error) {
calls += 1
return true, nil
}
err = retryMax(3, good, nil)
require.NoError(t, err)
require.Equal(t, 1, calls, "mismatch")
}
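// taintedNodes maps node IDs to nodes for allocations whose node is down
// or draining; a node that cannot be found is present with a nil value.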
func TestTaintedNodes(t *testing.T) {
ci.Parallel(t)
state := state.TestStateStore(t)
node1 := mock.Node()
node2 := mock.Node()
node2.Datacenter = "dc2"
node3 := mock.Node()
node3.Datacenter = "dc2"
node3.Status = structs.NodeStatusDown
node4 := mock.DrainNode()
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 1000, node1))
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 1001, node2))
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 1002, node3))
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 1003, node4))
allocs := []*structs.Allocation{
{NodeID: node1.ID},
{NodeID: node2.ID},
{NodeID: node3.ID},
{NodeID: node4.ID},
{NodeID: "12345678-abcd-efab-cdef-123456789abc"},
}
tainted, err := taintedNodes(state, allocs)
require.NoError(t, err)
require.Equal(t, 3, len(tainted))
require.NotContains(t, tainted, node1.ID)
require.NotContains(t, tainted, node2.ID)
require.Contains(t, tainted, node3.ID)
require.NotNil(t, tainted[node3.ID])
require.Contains(t, tainted, node4.ID)
require.NotNil(t, tainted[node4.ID])
require.Contains(t, tainted, "12345678-abcd-efab-cdef-123456789abc")
require.Nil(t, tainted["12345678-abcd-efab-cdef-123456789abc"])
}
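// shuffleNodes seeds its permutation from the plan and index, so a repeat
// call with the same inputs must reproduce the same order while still
// differing from the original.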
func TestShuffleNodes(t *testing.T) {
ci.Parallel(t)
// Use a large number of nodes to make the probability of shuffling to the
// original order very low.
nodes := []*structs.Node{
mock.Node(),
mock.Node(),
mock.Node(),
mock.Node(),
mock.Node(),
mock.Node(),
mock.Node(),
mock.Node(),
mock.Node(),
mock.Node(),
}
orig := make([]*structs.Node, len(nodes))
copy(orig, nodes)
eval := mock.Eval() // will have random EvalID
plan := eval.MakePlan(mock.Job())
shuffleNodes(plan, 1000, nodes)
require.False(t, reflect.DeepEqual(nodes, orig))
nodes2 := make([]*structs.Node, len(nodes))
copy(nodes2, orig)
shuffleNodes(plan, 1000, nodes2)
require.True(t, reflect.DeepEqual(nodes, nodes2))
}
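// tasksUpdated reports whether a task group's changes require a destructive
// update. Note in the affinity and spread cases below that an identical
// rule attached at the job level versus the group level does not count as
// a change.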
func TestTaskUpdatedAffinity(t *testing.T) {
ci.Parallel(t)
j1 := mock.Job()
j2 := mock.Job()
name := j1.TaskGroups[0].Name
require.False(t, tasksUpdated(j1, j2, name))
// TaskGroup Affinity
j2.TaskGroups[0].Affinities = []*structs.Affinity{
{
LTarget: "node.datacenter",
RTarget: "dc1",
Operand: "=",
Weight: 100,
},
}
require.True(t, tasksUpdated(j1, j2, name))
// TaskGroup Task Affinity
j3 := mock.Job()
j3.TaskGroups[0].Tasks[0].Affinities = []*structs.Affinity{
{
LTarget: "node.datacenter",
RTarget: "dc1",
Operand: "=",
Weight: 100,
},
}
require.True(t, tasksUpdated(j1, j3, name))
j4 := mock.Job()
j4.TaskGroups[0].Tasks[0].Affinities = []*structs.Affinity{
{
LTarget: "node.datacenter",
RTarget: "dc1",
Operand: "=",
Weight: 100,
},
}
require.True(t, tasksUpdated(j1, j4, name))
// check different level of same affinity
j5 := mock.Job()
j5.Affinities = []*structs.Affinity{
{
LTarget: "node.datacenter",
RTarget: "dc1",
Operand: "=",
Weight: 100,
},
}
j6 := mock.Job()
j6.Affinities = make([]*structs.Affinity, 0)
j6.TaskGroups[0].Affinities = []*structs.Affinity{
{
LTarget: "node.datacenter",
RTarget: "dc1",
Operand: "=",
Weight: 100,
},
}
require.False(t, tasksUpdated(j5, j6, name))
}
func TestTaskUpdatedSpread(t *testing.T) {
ci.Parallel(t)
j1 := mock.Job()
j2 := mock.Job()
name := j1.TaskGroups[0].Name
require.False(t, tasksUpdated(j1, j2, name))
// TaskGroup Spread
j2.TaskGroups[0].Spreads = []*structs.Spread{
{
Attribute: "node.datacenter",
Weight: 100,
SpreadTarget: []*structs.SpreadTarget{
{
Value: "r1",
Percent: 50,
},
{
Value: "r2",
Percent: 50,
},
},
},
}
require.True(t, tasksUpdated(j1, j2, name))
// check different level of same constraint
j5 := mock.Job()
j5.Spreads = []*structs.Spread{
{
Attribute: "node.datacenter",
Weight: 100,
SpreadTarget: []*structs.SpreadTarget{
{
Value: "r1",
Percent: 50,
},
{
Value: "r2",
Percent: 50,
},
},
},
}
j6 := mock.Job()
j6.TaskGroups[0].Spreads = []*structs.Spread{
{
Attribute: "node.datacenter",
Weight: 100,
SpreadTarget: []*structs.SpreadTarget{
{
Value: "r1",
Percent: 50,
},
{
Value: "r2",
Percent: 50,
},
},
},
}
require.False(t, tasksUpdated(j5, j6, name))
}
func TestTasksUpdated(t *testing.T) {
ci.Parallel(t)
j1 := mock.Job()
j2 := mock.Job()
name := j1.TaskGroups[0].Name
require.False(t, tasksUpdated(j1, j2, name))
j2.TaskGroups[0].Tasks[0].Config["command"] = "/bin/other"
require.True(t, tasksUpdated(j1, j2, name))
j3 := mock.Job()
j3.TaskGroups[0].Tasks[0].Name = "foo"
require.True(t, tasksUpdated(j1, j3, name))
j4 := mock.Job()
j4.TaskGroups[0].Tasks[0].Driver = "foo"
require.True(t, tasksUpdated(j1, j4, name))
j5 := mock.Job()
j5.TaskGroups[0].Tasks = append(j5.TaskGroups[0].Tasks,
j5.TaskGroups[0].Tasks[0])
require.True(t, tasksUpdated(j1, j5, name))
j6 := mock.Job()
j6.TaskGroups[0].Networks[0].DynamicPorts = []structs.Port{
{Label: "http", Value: 0},
{Label: "https", Value: 0},
{Label: "admin", Value: 0},
}
require.True(t, tasksUpdated(j1, j6, name))
j7 := mock.Job()
j7.TaskGroups[0].Tasks[0].Env["NEW_ENV"] = "NEW_VALUE"
require.True(t, tasksUpdated(j1, j7, name))
j8 := mock.Job()
j8.TaskGroups[0].Tasks[0].User = "foo"
require.True(t, tasksUpdated(j1, j8, name))
j9 := mock.Job()
j9.TaskGroups[0].Tasks[0].Artifacts = []*structs.TaskArtifact{
{
GetterSource: "http://foo.com/bar",
},
}
require.True(t, tasksUpdated(j1, j9, name))
j10 := mock.Job()
j10.TaskGroups[0].Tasks[0].Meta["baz"] = "boom"
require.True(t, tasksUpdated(j1, j10, name))
j11 := mock.Job()
j11.TaskGroups[0].Tasks[0].Resources.CPU = 1337
require.True(t, tasksUpdated(j1, j11, name))
j11d1 := mock.Job()
j11d1.TaskGroups[0].Tasks[0].Resources.Devices = structs.ResourceDevices{
&structs.RequestedDevice{
Name: "gpu",
Count: 1,
},
}
j11d2 := mock.Job()
j11d2.TaskGroups[0].Tasks[0].Resources.Devices = structs.ResourceDevices{
&structs.RequestedDevice{
Name: "gpu",
Count: 2,
},
}
require.True(t, tasksUpdated(j11d1, j11d2, name))
j13 := mock.Job()
j13.TaskGroups[0].Networks[0].DynamicPorts[0].Label = "foobar"
require.True(t, tasksUpdated(j1, j13, name))
j14 := mock.Job()
j14.TaskGroups[0].Networks[0].ReservedPorts = []structs.Port{{Label: "foo", Value: 1312}}
require.True(t, tasksUpdated(j1, j14, name))
j15 := mock.Job()
j15.TaskGroups[0].Tasks[0].Vault = &structs.Vault{Policies: []string{"foo"}}
require.True(t, tasksUpdated(j1, j15, name))
j16 := mock.Job()
j16.TaskGroups[0].EphemeralDisk.Sticky = true
require.True(t, tasksUpdated(j1, j16, name))
// Change group meta
j17 := mock.Job()
j17.TaskGroups[0].Meta["j17_test"] = "roll_baby_roll"
require.True(t, tasksUpdated(j1, j17, name))
// Change job meta
j18 := mock.Job()
j18.Meta["j18_test"] = "roll_baby_roll"
require.True(t, tasksUpdated(j1, j18, name))
// Change network mode
j19 := mock.Job()
j19.TaskGroups[0].Networks[0].Mode = "bridge"
require.True(t, tasksUpdated(j1, j19, name))
// Change cores resource
j20 := mock.Job()
j20.TaskGroups[0].Tasks[0].Resources.CPU = 0
j20.TaskGroups[0].Tasks[0].Resources.Cores = 2
j21 := mock.Job()
j21.TaskGroups[0].Tasks[0].Resources.CPU = 0
j21.TaskGroups[0].Tasks[0].Resources.Cores = 4
require.True(t, tasksUpdated(j20, j21, name))
// Compare identical Template wait configs
j22 := mock.Job()
j22.TaskGroups[0].Tasks[0].Templates = []*structs.Template{
{
Wait: &structs.WaitConfig{
Min: pointer.Of(5 * time.Second),
Max: pointer.Of(5 * time.Second),
},
},
}
j23 := mock.Job()
j23.TaskGroups[0].Tasks[0].Templates = []*structs.Template{
{
Wait: &structs.WaitConfig{
Min: pointer.Of(5 * time.Second),
Max: pointer.Of(5 * time.Second),
},
},
}
require.False(t, tasksUpdated(j22, j23, name))
// Compare changed Template wait configs
j23.TaskGroups[0].Tasks[0].Templates[0].Wait.Max = pointer.Of(10 * time.Second)
require.True(t, tasksUpdated(j22, j23, name))
// Add a volume
j24 := mock.Job()
j25 := j24.Copy()
j25.TaskGroups[0].Volumes = map[string]*structs.VolumeRequest{
"myvolume": {
Name: "myvolume",
Type: "csi",
Source: "test-volume[0]",
}}
require.True(t, tasksUpdated(j24, j25, name))
// Alter a volume
j26 := j25.Copy()
j26.TaskGroups[0].Volumes["myvolume"].ReadOnly = true
require.True(t, tasksUpdated(j25, j26, name))
// Alter a CSI plugin
j27 := mock.Job()
j27.TaskGroups[0].Tasks[0].CSIPluginConfig = &structs.TaskCSIPluginConfig{
ID: "myplugin",
Type: "node",
}
j28 := j27.Copy()
j28.TaskGroups[0].Tasks[0].CSIPluginConfig.Type = "monolith"
require.True(t, tasksUpdated(j27, j28, name))
// Compare identical Template ErrMissingKey
j29 := mock.Job()
j29.TaskGroups[0].Tasks[0].Templates = []*structs.Template{
{
ErrMissingKey: false,
},
}
j30 := mock.Job()
j30.TaskGroups[0].Tasks[0].Templates = []*structs.Template{
{
ErrMissingKey: false,
},
}
require.False(t, tasksUpdated(j29, j30, name))
// Compare changed Template ErrMissingKey
j30.TaskGroups[0].Tasks[0].Templates[0].ErrMissingKey = true
require.True(t, tasksUpdated(j29, j30, name))
}
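// connectServiceUpdated separates Consul Connect changes that are safe
// in place (sidecar tags) from those that force a destructive update
// (sidecar port or service port label).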
func TestTasksUpdated_connectServiceUpdated(t *testing.T) {
ci.Parallel(t)
servicesA := []*structs.Service{{
Name: "service1",
PortLabel: "1111",
Connect: &structs.ConsulConnect{
SidecarService: &structs.ConsulSidecarService{
Tags: []string{"a"},
},
},
}}
t.Run("service not updated", func(t *testing.T) {
servicesB := []*structs.Service{{
Name: "service0",
}, {
Name: "service1",
PortLabel: "1111",
Connect: &structs.ConsulConnect{
SidecarService: &structs.ConsulSidecarService{
Tags: []string{"a"},
},
},
}, {
Name: "service2",
}}
updated := connectServiceUpdated(servicesA, servicesB)
require.False(t, updated)
})
t.Run("service connect tags updated", func(t *testing.T) {
servicesB := []*structs.Service{{
Name: "service0",
}, {
Name: "service1",
PortLabel: "1111",
Connect: &structs.ConsulConnect{
SidecarService: &structs.ConsulSidecarService{
Tags: []string{"b"}, // in-place update
},
},
}}
updated := connectServiceUpdated(servicesA, servicesB)
require.False(t, updated)
})
t.Run("service connect port updated", func(t *testing.T) {
servicesB := []*structs.Service{{
Name: "service0",
}, {
Name: "service1",
PortLabel: "1111",
Connect: &structs.ConsulConnect{
SidecarService: &structs.ConsulSidecarService{
Tags: []string{"a"},
Port: "2222", // destructive update
},
},
}}
updated := connectServiceUpdated(servicesA, servicesB)
require.True(t, updated)
})
t.Run("service port label updated", func(t *testing.T) {
servicesB := []*structs.Service{{
Name: "service0",
}, {
Name: "service1",
PortLabel: "1112", // destructive update
Connect: &structs.ConsulConnect{
SidecarService: &structs.ConsulSidecarService{
Tags: []string{"1"},
},
},
}}
updated := connectServiceUpdated(servicesA, servicesB)
require.True(t, updated)
})
}
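// networkUpdated reports whether two network resource sets differ in ways
// that require replacing the allocation: mode, host network, port mapping,
// or hostname changes.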
func TestNetworkUpdated(t *testing.T) {
ci.Parallel(t)
cases := []struct {
name string
a []*structs.NetworkResource
b []*structs.NetworkResource
updated bool
}{
{
name: "mode updated",
a: []*structs.NetworkResource{
{Mode: "host"},
},
b: []*structs.NetworkResource{
{Mode: "bridge"},
},
updated: true,
},
{
name: "host_network updated",
a: []*structs.NetworkResource{
{DynamicPorts: []structs.Port{
{Label: "http", To: 8080},
}},
},
b: []*structs.NetworkResource{
{DynamicPorts: []structs.Port{
{Label: "http", To: 8080, HostNetwork: "public"},
}},
},
updated: true,
},
{
name: "port.To updated",
a: []*structs.NetworkResource{
{DynamicPorts: []structs.Port{
{Label: "http", To: 8080},
}},
},
b: []*structs.NetworkResource{
{DynamicPorts: []structs.Port{
{Label: "http", To: 8088},
}},
},
updated: true,
},
{
name: "hostname updated",
a: []*structs.NetworkResource{
{Hostname: "foo"},
},
b: []*structs.NetworkResource{
{Hostname: "bar"},
},
updated: true,
},
}
for i := range cases {
c := cases[i]
t.Run(c.name, func(tc *testing.T) {
tc.Parallel()
require.Equal(tc, c.updated, networkUpdated(c.a, c.b), "unexpected network updated result")
})
}
}
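// evictAndPlace stops up to *limit allocations and queues replacements in
// diff.place, decrementing *limit as it goes; it returns true only when the
// limit runs out before every allocation has been processed.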
func TestEvictAndPlace_LimitLessThanAllocs(t *testing.T) {
ci.Parallel(t)
_, ctx := testContext(t)
allocs := []allocTuple{
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
}
diff := &diffResult{}
limit := 2
require.True(t, evictAndPlace(ctx, diff, allocs, "", &limit), "evictAndPlace() should have returned true")
require.Zero(t, limit, "evictAndPlace() should have decremented limit; got %v; want 0", limit)
require.Equal(t, 2, len(diff.place), "evictAndPlace() didn't insert into diffResult properly: %v", diff.place)
}
func TestEvictAndPlace_LimitEqualToAllocs(t *testing.T) {
ci.Parallel(t)
_, ctx := testContext(t)
allocs := []allocTuple{
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
}
diff := &diffResult{}
limit := 4
require.False(t, evictAndPlace(ctx, diff, allocs, "", &limit), "evictAndPlace() should have returned false")
require.Zero(t, limit, "evictAndPlace() should have decremented limit; got %v; want 0", limit)
require.Equal(t, 4, len(diff.place), "evictAndPlace() didn't insert into diffResult properly: %v", diff.place)
}
func TestSetStatus(t *testing.T) {
ci.Parallel(t)
h := NewHarness(t)
logger := testlog.HCLogger(t)
eval := mock.Eval()
status := "a"
desc := "b"
require.NoError(t, setStatus(logger, h, eval, nil, nil, nil, status, desc, nil, ""))
require.Equal(t, 1, len(h.Evals), "setStatus() didn't update plan: %v", h.Evals)
newEval := h.Evals[0]
require.True(t, newEval.ID == eval.ID && newEval.Status == status && newEval.StatusDescription == desc,
"setStatus() submited invalid eval: %v", newEval)
// Test next evals
h = NewHarness(t)
next := mock.Eval()
require.NoError(t, setStatus(logger, h, eval, next, nil, nil, status, desc, nil, ""))
require.Equal(t, 1, len(h.Evals), "setStatus() didn't update plan: %v", h.Evals)
newEval = h.Evals[0]
require.Equal(t, next.ID, newEval.NextEval, "setStatus() didn't set nextEval correctly: %v", newEval)
// Test blocked evals
h = NewHarness(t)
blocked := mock.Eval()
require.NoError(t, setStatus(logger, h, eval, nil, blocked, nil, status, desc, nil, ""))
require.Equal(t, 1, len(h.Evals), "setStatus() didn't update plan: %v", h.Evals)
newEval = h.Evals[0]
require.Equal(t, blocked.ID, newEval.BlockedEval, "setStatus() didn't set BlockedEval correctly: %v", newEval)
// Test metrics
h = NewHarness(t)
metrics := map[string]*structs.AllocMetric{"foo": nil}
require.NoError(t, setStatus(logger, h, eval, nil, nil, metrics, status, desc, nil, ""))
require.Equal(t, 1, len(h.Evals), "setStatus() didn't update plan: %v", h.Evals)
newEval = h.Evals[0]
require.True(t, reflect.DeepEqual(newEval.FailedTGAllocs, metrics),
"setStatus() didn't set failed task group metrics correctly: %v", newEval)
// Test queued allocations
h = NewHarness(t)
queuedAllocs := map[string]int{"web": 1}
require.NoError(t, setStatus(logger, h, eval, nil, nil, metrics, status, desc, queuedAllocs, ""))
require.Equal(t, 1, len(h.Evals), "setStatus() didn't update plan: %v", h.Evals)
newEval = h.Evals[0]
require.True(t, reflect.DeepEqual(newEval.QueuedAllocations, queuedAllocs), "setStatus() didn't set queued allocations correctly: %v", newEval)
h = NewHarness(t)
dID := uuid.Generate()
require.NoError(t, setStatus(logger, h, eval, nil, nil, metrics, status, desc, queuedAllocs, dID))
require.Equal(t, 1, len(h.Evals), "setStatus() didn't update plan: %v", h.Evals)
newEval = h.Evals[0]
require.Equal(t, dID, newEval.DeploymentID, "setStatus() didn't set deployment id correctly: %v", newEval)
}
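// inplaceUpdate tries to update each alloc in place against its new task
// group; allocs whose changes are destructive, or that no longer fit on
// their node, come back as unplaced instead of being added to the plan.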
func TestInplaceUpdate_ChangedTaskGroup(t *testing.T) {
ci.Parallel(t)
state, ctx := testContext(t)
eval := mock.Eval()
job := mock.Job()
node := mock.Node()
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 900, node))
// Register an alloc
alloc := &structs.Allocation{
Namespace: structs.DefaultNamespace,
ID: uuid.Generate(),
EvalID: eval.ID,
NodeID: node.ID,
JobID: job.ID,
Job: job,
AllocatedResources: &structs.AllocatedResources{
Tasks: map[string]*structs.AllocatedTaskResources{
"web": {
Cpu: structs.AllocatedCpuResources{
CpuShares: 2048,
},
Memory: structs.AllocatedMemoryResources{
MemoryMB: 2048,
},
},
},
},
DesiredStatus: structs.AllocDesiredStatusRun,
TaskGroup: "web",
}
alloc.TaskResources = map[string]*structs.Resources{"web": alloc.Resources}
require.NoError(t, state.UpsertJobSummary(1000, mock.JobSummary(alloc.JobID)))
require.NoError(t, state.UpsertAllocs(structs.MsgTypeTestSetup, 1001, []*structs.Allocation{alloc}))
// Create a new task group that prevents in-place updates.
tg := &structs.TaskGroup{}
*tg = *job.TaskGroups[0]
task := &structs.Task{
Name: "FOO",
Resources: &structs.Resources{},
}
tg.Tasks = nil
tg.Tasks = append(tg.Tasks, task)
updates := []allocTuple{{Alloc: alloc, TaskGroup: tg}}
stack := NewGenericStack(false, ctx)
// Do the inplace update.
unplaced, inplace := inplaceUpdate(ctx, eval, job, stack, updates)
require.True(t, len(unplaced) == 1 && len(inplace) == 0, "inplaceUpdate incorrectly did an inplace update")
require.Empty(t, ctx.plan.NodeAllocation, "inplaceUpdate incorrectly did an inplace update")
}
func TestInplaceUpdate_AllocatedResources(t *testing.T) {
ci.Parallel(t)
state, ctx := testContext(t)
eval := mock.Eval()
job := mock.Job()
node := mock.Node()
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 900, node))
// Register an alloc
alloc := &structs.Allocation{
Namespace: structs.DefaultNamespace,
ID: uuid.Generate(),
EvalID: eval.ID,
NodeID: node.ID,
JobID: job.ID,
Job: job,
AllocatedResources: &structs.AllocatedResources{
Shared: structs.AllocatedSharedResources{
Ports: structs.AllocatedPorts{
{
Label: "api-port",
Value: 19910,
To: 8080,
},
},
},
},
DesiredStatus: structs.AllocDesiredStatusRun,
TaskGroup: "web",
}
alloc.TaskResources = map[string]*structs.Resources{"web": alloc.Resources}
require.NoError(t, state.UpsertJobSummary(1000, mock.JobSummary(alloc.JobID)))
require.NoError(t, state.UpsertAllocs(structs.MsgTypeTestSetup, 1001, []*structs.Allocation{alloc}))
// Update TG to add a new service (inplace)
tg := job.TaskGroups[0]
tg.Services = []*structs.Service{
{
Name: "tg-service",
},
}
updates := []allocTuple{{Alloc: alloc, TaskGroup: tg}}
stack := NewGenericStack(false, ctx)
// Do the inplace update.
unplaced, inplace := inplaceUpdate(ctx, eval, job, stack, updates)
require.True(t, len(unplaced) == 0 && len(inplace) == 1, "inplaceUpdate incorrectly did not perform an inplace update")
require.NotEmpty(t, ctx.plan.NodeAllocation, "inplaceUpdate incorrectly did an inplace update")
require.NotEmpty(t, ctx.plan.NodeAllocation[node.ID][0].AllocatedResources.Shared.Ports)
port, ok := ctx.plan.NodeAllocation[node.ID][0].AllocatedResources.Shared.Ports.Get("api-port")
require.True(t, ok)
require.Equal(t, 19910, port.Value)
}
func TestInplaceUpdate_NoMatch(t *testing.T) {
ci.Parallel(t)
state, ctx := testContext(t)
eval := mock.Eval()
job := mock.Job()
node := mock.Node()
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 900, node))
// Register an alloc
alloc := &structs.Allocation{
Namespace: structs.DefaultNamespace,
ID: uuid.Generate(),
EvalID: eval.ID,
NodeID: node.ID,
JobID: job.ID,
Job: job,
AllocatedResources: &structs.AllocatedResources{
Tasks: map[string]*structs.AllocatedTaskResources{
"web": {
Cpu: structs.AllocatedCpuResources{
CpuShares: 2048,
},
Memory: structs.AllocatedMemoryResources{
MemoryMB: 2048,
},
},
},
},
DesiredStatus: structs.AllocDesiredStatusRun,
TaskGroup: "web",
}
alloc.TaskResources = map[string]*structs.Resources{"web": alloc.Resources}
require.NoError(t, state.UpsertJobSummary(1000, mock.JobSummary(alloc.JobID)))
require.NoError(t, state.UpsertAllocs(structs.MsgTypeTestSetup, 1001, []*structs.Allocation{alloc}))
// Create a new task group that requires too much resources.
tg := &structs.TaskGroup{}
*tg = *job.TaskGroups[0]
resource := &structs.Resources{CPU: 9999}
tg.Tasks[0].Resources = resource
updates := []allocTuple{{Alloc: alloc, TaskGroup: tg}}
stack := NewGenericStack(false, ctx)
// Do the inplace update.
unplaced, inplace := inplaceUpdate(ctx, eval, job, stack, updates)
require.True(t, len(unplaced) == 1 && len(inplace) == 0, "inplaceUpdate incorrectly did an inplace update")
require.Empty(t, ctx.plan.NodeAllocation, "inplaceUpdate incorrectly did an inplace update")
}
func TestInplaceUpdate_Success(t *testing.T) {
ci.Parallel(t)
state, ctx := testContext(t)
eval := mock.Eval()
job := mock.Job()
node := mock.Node()
require.NoError(t, state.UpsertNode(structs.MsgTypeTestSetup, 900, node))
// Register an alloc
alloc := &structs.Allocation{
Namespace: structs.DefaultNamespace,
ID: uuid.Generate(),
EvalID: eval.ID,
NodeID: node.ID,
JobID: job.ID,
Job: job,
TaskGroup: job.TaskGroups[0].Name,
AllocatedResources: &structs.AllocatedResources{
Tasks: map[string]*structs.AllocatedTaskResources{
"web": {
Cpu: structs.AllocatedCpuResources{
CpuShares: 2048,
},
Memory: structs.AllocatedMemoryResources{
MemoryMB: 2048,
},
},
},
},
DesiredStatus: structs.AllocDesiredStatusRun,
}
alloc.TaskResources = map[string]*structs.Resources{"web": alloc.Resources}
require.NoError(t, state.UpsertJobSummary(999, mock.JobSummary(alloc.JobID)))
require.NoError(t, state.UpsertAllocs(structs.MsgTypeTestSetup, 1001, []*structs.Allocation{alloc}))
// Create a new task group that updates the resources.
tg := &structs.TaskGroup{}
*tg = *job.TaskGroups[0]
resource := &structs.Resources{CPU: 737}
tg.Tasks[0].Resources = resource
newServices := []*structs.Service{
{
Name: "dummy-service",
PortLabel: "http",
},
{
Name: "dummy-service2",
PortLabel: "http",
},
}
// Delete service 2
tg.Tasks[0].Services = tg.Tasks[0].Services[:1]
// Add the new services
tg.Tasks[0].Services = append(tg.Tasks[0].Services, newServices...)
updates := []allocTuple{{Alloc: alloc, TaskGroup: tg}}
stack := NewGenericStack(false, ctx)
stack.SetJob(job)
// Do the inplace update.
unplaced, inplace := inplaceUpdate(ctx, eval, job, stack, updates)
require.True(t, len(unplaced) == 0 && len(inplace) == 1, "inplaceUpdate did not do an inplace update")
require.Equal(t, 1, len(ctx.plan.NodeAllocation), "inplaceUpdate did not do an inplace update")
require.Equal(t, alloc.ID, inplace[0].Alloc.ID, "inplaceUpdate returned the wrong, inplace updated alloc: %#v", inplace)
// Get the alloc we inserted.
a := inplace[0].Alloc // TODO(sean@): Verify this is correct vs: ctx.plan.NodeAllocation[alloc.NodeID][0]
require.NotNil(t, a.Job)
require.Equal(t, 1, len(a.Job.TaskGroups))
require.Equal(t, 1, len(a.Job.TaskGroups[0].Tasks))
require.Equal(t, 3, len(a.Job.TaskGroups[0].Tasks[0].Services),
"Expected number of services: %v, Actual: %v", 3, len(a.Job.TaskGroups[0].Tasks[0].Services))
serviceNames := make(map[string]struct{}, 3)
for _, consulService := range a.Job.TaskGroups[0].Tasks[0].Services {
serviceNames[consulService.Name] = struct{}{}
}
require.Equal(t, 3, len(serviceNames))
for _, name := range []string{"dummy-service", "dummy-service2", "web-frontend"} {
if _, found := serviceNames[name]; !found {
t.Errorf("Expected consul service name missing: %v", name)
}
}
}
func TestEvictAndPlace_LimitGreaterThanAllocs(t *testing.T) {
ci.Parallel(t)
_, ctx := testContext(t)
allocs := []allocTuple{
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
{Alloc: &structs.Allocation{ID: uuid.Generate()}},
}
diff := &diffResult{}
limit := 6
require.False(t, evictAndPlace(ctx, diff, allocs, "", &limit))
require.Equal(t, 2, limit, "evictAndPlace() should have decremented limit")
require.Equal(t, 4, len(diff.place), "evictAndPlace() didn't insert into diffResult properly: %v", diff.place)
}
func TestTaskGroupConstraints(t *testing.T) {
ci.Parallel(t)
constr := &structs.Constraint{RTarget: "bar"}
constr2 := &structs.Constraint{LTarget: "foo"}
constr3 := &structs.Constraint{Operand: "<"}
tg := &structs.TaskGroup{
Name: "web",
Count: 10,
Constraints: []*structs.Constraint{constr},
EphemeralDisk: &structs.EphemeralDisk{},
Tasks: []*structs.Task{
{
Driver: "exec",
Resources: &structs.Resources{
CPU: 500,
MemoryMB: 256,
},
Constraints: []*structs.Constraint{constr2},
},
{
Driver: "docker",
Resources: &structs.Resources{
CPU: 500,
MemoryMB: 256,
},
Constraints: []*structs.Constraint{constr3},
},
},
}
// Build the expected values.
expConstr := []*structs.Constraint{constr, constr2, constr3}
expDrivers := map[string]struct{}{"exec": {}, "docker": {}}
actConstraints := taskGroupConstraints(tg)
require.True(t, reflect.DeepEqual(actConstraints.constraints, expConstr),
"taskGroupConstraints(%v) returned %v; want %v", tg, actConstraints.constraints, expConstr)
require.True(t, reflect.DeepEqual(actConstraints.drivers, expDrivers),
"taskGroupConstraints(%v) returned %v; want %v", tg, actConstraints.drivers, expDrivers)
}
func TestProgressMade(t *testing.T) {
ci.Parallel(t)
noopPlan := &structs.PlanResult{}
require.False(t, progressMade(nil) || progressMade(noopPlan), "no progress plan marked as making progress")
m := map[string][]*structs.Allocation{
"foo": {mock.Alloc()},
}
both := &structs.PlanResult{
NodeAllocation: m,
NodeUpdate: m,
}
update := &structs.PlanResult{NodeUpdate: m}
alloc := &structs.PlanResult{NodeAllocation: m}
deployment := &structs.PlanResult{Deployment: mock.Deployment()}
deploymentUpdates := &structs.PlanResult{
DeploymentUpdates: []*structs.DeploymentStatusUpdate{
{DeploymentID: uuid.Generate()},
},
}
require.True(t, progressMade(both) && progressMade(update) && progressMade(alloc) &&
progressMade(deployment) && progressMade(deploymentUpdates))
}
func TestDesiredUpdates(t *testing.T) {
ci.Parallel(t)
tg1 := &structs.TaskGroup{Name: "foo"}
tg2 := &structs.TaskGroup{Name: "bar"}
a2 := &structs.Allocation{TaskGroup: "bar"}
place := []allocTuple{
{TaskGroup: tg1},
{TaskGroup: tg1},
{TaskGroup: tg1},
{TaskGroup: tg2},
}
stop := []allocTuple{
{TaskGroup: tg2, Alloc: a2},
{TaskGroup: tg2, Alloc: a2},
}
ignore := []allocTuple{
{TaskGroup: tg1},
}
migrate := []allocTuple{
{TaskGroup: tg2},
}
inplace := []allocTuple{
{TaskGroup: tg1},
{TaskGroup: tg1},
}
destructive := []allocTuple{
{TaskGroup: tg1},
{TaskGroup: tg2},
{TaskGroup: tg2},
}
diff := &diffResult{
place: place,
stop: stop,
ignore: ignore,
migrate: migrate,
}
expected := map[string]*structs.DesiredUpdates{
"foo": {
Place: 3,
Ignore: 1,
InPlaceUpdate: 2,
DestructiveUpdate: 1,
},
"bar": {
Place: 1,
Stop: 2,
Migrate: 1,
DestructiveUpdate: 2,
},
}
desired := desiredUpdates(diff, inplace, destructive)
require.True(t, reflect.DeepEqual(desired, expected), "desiredUpdates() returned %#v; want %#v", desired, expected)
}
func TestUtil_AdjustQueuedAllocations(t *testing.T) {
ci.Parallel(t)
logger := testlog.HCLogger(t)
alloc1 := mock.Alloc()
alloc2 := mock.Alloc()
alloc2.CreateIndex = 4
alloc2.ModifyIndex = 4
alloc3 := mock.Alloc()
alloc3.CreateIndex = 3
alloc3.ModifyIndex = 5
alloc4 := mock.Alloc()
alloc4.CreateIndex = 6
alloc4.ModifyIndex = 8
planResult := structs.PlanResult{
NodeUpdate: map[string][]*structs.Allocation{
"node-1": {alloc1},
},
NodeAllocation: map[string][]*structs.Allocation{
"node-1": {
alloc2,
},
"node-2": {
alloc3, alloc4,
},
},
RefreshIndex: 3,
AllocIndex: 16, // Should not be considered
}
queuedAllocs := map[string]int{"web": 2}
adjustQueuedAllocations(logger, &planResult, queuedAllocs)
require.Equal(t, 1, queuedAllocs["web"])
}
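// updateNonTerminalAllocsToLost marks an alloc lost only when its node is
// tainted and down, the alloc is desired stopped or evicted, and its client
// status is still non-terminal.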
func TestUtil_UpdateNonTerminalAllocsToLost(t *testing.T) {
ci.Parallel(t)
node := mock.Node()
node.Status = structs.NodeStatusDown
alloc1 := mock.Alloc()
alloc1.NodeID = node.ID
alloc1.DesiredStatus = structs.AllocDesiredStatusStop
alloc2 := mock.Alloc()
alloc2.NodeID = node.ID
alloc2.DesiredStatus = structs.AllocDesiredStatusStop
alloc2.ClientStatus = structs.AllocClientStatusRunning
alloc3 := mock.Alloc()
alloc3.NodeID = node.ID
alloc3.DesiredStatus = structs.AllocDesiredStatusStop
alloc3.ClientStatus = structs.AllocClientStatusComplete
alloc4 := mock.Alloc()
alloc4.NodeID = node.ID
alloc4.DesiredStatus = structs.AllocDesiredStatusStop
alloc4.ClientStatus = structs.AllocClientStatusFailed
allocs := []*structs.Allocation{alloc1, alloc2, alloc3, alloc4}
plan := structs.Plan{
NodeUpdate: make(map[string][]*structs.Allocation),
}
tainted := map[string]*structs.Node{node.ID: node}
updateNonTerminalAllocsToLost(&plan, tainted, allocs)
allocsLost := make([]string, 0, 2)
for _, alloc := range plan.NodeUpdate[node.ID] {
allocsLost = append(allocsLost, alloc.ID)
}
expected := []string{alloc1.ID, alloc2.ID}
require.True(t, reflect.DeepEqual(allocsLost, expected), "actual: %v, expected: %v", allocsLost, expected)
// Update the node status to ready and try again
plan = structs.Plan{
NodeUpdate: make(map[string][]*structs.Allocation),
}
node.Status = structs.NodeStatusReady
updateNonTerminalAllocsToLost(&plan, tainted, allocs)
allocsLost = make([]string, 0, 2)
for _, alloc := range plan.NodeUpdate[node.ID] {
allocsLost = append(allocsLost, alloc.ID)
}
expected = []string{}
require.True(t, reflect.DeepEqual(allocsLost, expected), "actual: %v, expected: %v", allocsLost, expected)
}
func TestUtil_connectUpdated(t *testing.T) {
ci.Parallel(t)
t.Run("both nil", func(t *testing.T) {
require.False(t, connectUpdated(nil, nil))
})
t.Run("one nil", func(t *testing.T) {
require.True(t, connectUpdated(nil, new(structs.ConsulConnect)))
})
t.Run("native differ", func(t *testing.T) {
a := &structs.ConsulConnect{Native: true}
b := &structs.ConsulConnect{Native: false}
require.True(t, connectUpdated(a, b))
})
t.Run("gateway differ", func(t *testing.T) {
a := &structs.ConsulConnect{Gateway: &structs.ConsulGateway{
Ingress: new(structs.ConsulIngressConfigEntry),
}}
b := &structs.ConsulConnect{Gateway: &structs.ConsulGateway{
Terminating: new(structs.ConsulTerminatingConfigEntry),
}}
require.True(t, connectUpdated(a, b))
})
t.Run("sidecar task differ", func(t *testing.T) {
a := &structs.ConsulConnect{SidecarTask: &structs.SidecarTask{
Driver: "exec",
}}
b := &structs.ConsulConnect{SidecarTask: &structs.SidecarTask{
Driver: "docker",
}}
require.True(t, connectUpdated(a, b))
})
t.Run("sidecar service differ", func(t *testing.T) {
a := &structs.ConsulConnect{SidecarService: &structs.ConsulSidecarService{
Port: "1111",
}}
b := &structs.ConsulConnect{SidecarService: &structs.ConsulSidecarService{
Port: "2222",
}}
require.True(t, connectUpdated(a, b))
})
t.Run("same", func(t *testing.T) {
a := new(structs.ConsulConnect)
b := new(structs.ConsulConnect)
require.False(t, connectUpdated(a, b))
})
}
func TestUtil_connectSidecarServiceUpdated(t *testing.T) {
ci.Parallel(t)
t.Run("both nil", func(t *testing.T) {
require.False(t, connectSidecarServiceUpdated(nil, nil))
})
t.Run("one nil", func(t *testing.T) {
require.True(t, connectSidecarServiceUpdated(nil, new(structs.ConsulSidecarService)))
})
t.Run("ports differ", func(t *testing.T) {
a := &structs.ConsulSidecarService{Port: "1111"}
b := &structs.ConsulSidecarService{Port: "2222"}
require.True(t, connectSidecarServiceUpdated(a, b))
})
t.Run("same", func(t *testing.T) {
a := &structs.ConsulSidecarService{Port: "1111"}
b := &structs.ConsulSidecarService{Port: "1111"}
require.False(t, connectSidecarServiceUpdated(a, b))
})
}