It is possible to provide multiple identically named services with
different port assignments in a Nomad configuration.
We introduced a regression when migrating to stable service identifiers where
multiple services with the same name would conflict, and the last definition
would take precedence.
This commit includes the port label in the stable service identifier to
allow the previous behaviour where this was supported, for example
providing:
```hcl
service {
name = "redis-cache"
tags = ["global", "cache"]
port = "db"
check {
name = "alive"
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
service {
name = "redis-cache"
tags = ["global", "foo"]
port = "foo"
check {
name = "alive"
type = "tcp"
port = "db"
interval = "10s"
timeout = "2s"
}
}
service {
name = "redis-cache"
tags = ["global", "bar"]
port = "bar"
check {
name = "alive"
type = "tcp"
port = "db"
interval = "10s"
timeout = "2s"
}
}
```
in a nomad task definition is now completely valid. Each service
definition with the same name must still have a unique port label however.
Currently when you submit a manual request to the alloc lifecycle API
with a version of Curl that will submit empty bodies, the alloc restart
api will fail with an EOF error.
This behaviour is undesired, as it is reasonable to not submit a body at
all when restarting an entire allocation rather than an individual task.
This fixes it by ignoring EOF (not unexpected EOF) errors and treating
them as entire task restarts.
Enterprise only.
Disable preemption for service and batch jobs by default.
Maintain backward compatibility in a x.y.Z release. Consider switching
the default for new clusters in the future.
This exposes a client flag to disable nomad remote exec support in
environments where access to tasks ought to be restricted.
I used `disable_remote_exec` client flag that defaults to allowing
remote exec. Opted for a client config that can be used to disable
remote exec globally, or to a subset of the cluster if necessary.
This adds a websocket endpoint for handling `nomad exec`.
The endpoint is a websocket interface, as we require a bi-directional
streaming (to handle both input and output), which is not very appropriate for
plain HTTP 1.0. Using websocket makes implementing the web ui a bit simpler. I
considered using golang http hijack capability to treat http request as a plain
connection, but the web interface would be too complicated potentially.
Furthermore, the API endpoint operates against the raw core nomad exec streaming
datastructures, defined in protobuf, with json serializer. Our APIs use json
interfaces in general, and protobuf generates json friendly golang structs.
Reusing the structs here simplify interface and reduce conversion overhead.
This commit causes sync to skip deregistering checks that are not
managed by nomad, such as service maintenance mode checks. This is
handled in the same way as service registrations - by doing a Nomad
specific prefix match.
The current implementation of Service Registration uses a hash of the
nomad-internal state of a service to register it with Consul, this means that
any update to the service invalidates this name and we then deregister, and
recreate the service in Consul.
While this behaviour slightly simplifies reasoning about service registration,
this becomes problematic when we add consul health checks to a service. When
the service is re-registered, so are the checks, which default to failing for
at least one check period.
This commit migrates us to using a stable identifier based on the
allocation, task, and service identifiers, and uses the difference
between the remote and local state to decide when to push updates.
It uses the existing hashing mechanic to decide when UpdateTask should
regenerate service registrations for providing to Sync, but this should
be removable as part of a future refactor.
It additionally introduces the _nomad-check- prefix for check
definitions, to allow for future allowing of consul features like
maintenance mode.
* client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy
* updated armon/go-metrics to address race condition in DisplayMetrics
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.
Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.
This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.
The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.