The system command includes gc and reconcile-summaries subcommands
which covers all currently available system API calls. The help
information is largely pulled from the current Nomad website API
documentation.
Passes in agent enable_debug config to nomad server and client configs.
This allows for rpc endpoints to have more granular control if they
should be enabled or not in combination with ACLs.
enable debug on client test
copy struct values
ensure groupserviceHook implements RunnerPreKillhook
run deregister first
test that shutdown times are delayed
move magic number into variable
Fixes a bug where if a command flag parsing errors, the resulting error
and help usage messages get interleaved in unexpected and non-user
friendly way.
The reason is that we have flag parsing library effectively writes to
ui.Error in a goroutine. This is problematic: first, we lose the sequencing between help
usage and error message; second, cli.Ui methods are not concurrent safe.
Here, we introduce a custom error writer that buffers result and calls
ui.Error() in the write method and in the same goroutine.
For context, we need to wrap ui.Error because it's line-oriented, while
flags library expects a io.Writer which is bytes oriented.
Currently `nomad monitor -node-id` will panic when a node-id does not
match any nodes, as there is no empty result bounds checking. Here we
return an error to the user when no nodes are found.
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.
Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.
This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
The test asserts that alloc counts get reported accurately in metrics by
inspecting the metrics endpoint directly. Sadly, the metrics as
collected by `armon/go-metrics` seem to be stateful and may contain info
from other tests.
This means that the test can fail depending on the order of returned
metrics.
Inspecting the metrics output of one failing run, you can see the
duplicate guage entries but for different node_ids:
```
{
"Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
"Value": 10,
"Labels": {
"datacenter": "dc1",
"node_class": "none",
"node_id": "67402bf4-00f3-bd8d-9fa8-f4d1924a892a"
}
},
{
"Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
"Value": 0,
"Labels": {
"datacenter": "dc1",
"node_class": "none",
"node_id": "a2945b48-7e66-68e2-c922-49b20dd4e20c"
}
},
```
Noticed that ACL endpoints return 500 status code for user errors. This
is confusing and can lead to false monitoring alerts.
Here, I introduce a concept of RPCCoded errors to be returned by RPC
that signal a code in addition to error message. Codes for now match
HTTP codes to ease reasoning.
```
$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9))
$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9))
```
* client: improve group service stanza interpolation and check_restart support
Interpolation can now be done on group service stanzas. Note that some task runtime specific information
that was previously available when the service was registered poststart of a task is no longer available.
The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
handle the case where we request a server-id which is this current server
update docs, error on node and server id params
more accurate names for tests
use shared no leader err, formatting
rm bad comment
remove redundant variable
When inferring whether to use TTY, check both stdin and stdout are
terminals.
Otherwise, we get failures like the following:
```
$ nomad alloc exec --job example echo hi
hi
$ echo | nomad alloc exec --job example echo hi
hi
$ nomad alloc exec --job example echo hi | head -n1
failed to exec into task: not a terminal
```
This is the basic code to add the Windows Service Manager hooks to Nomad.
Includes vendoring golang.org/x/sys/windows/svc and added Docs:
* guide for installing as a windows service.
* configuration for logging to file from PR #6429
Nomad web UI currently fails when querying client nodes for allocation
state end endpoints, due to CORS policy.
The issue is that CORS requests that are marked `withCredentials` need
the http server to include a `Access-Control-Allow-Credentials` [1].
But Nomad Task Logs and filesystem requests include authenticating
information and thus marked with `credentials=true`[2][3].
It's worth noting that the browser currently sends credentials and
authentication token to servers anyway; it's just that the response is
not made available to caller nomad ui javascript. For task logs
specifically, nomad ui retries again by querying the web ui address
(typically pointing to a nomad server) which will forward the request
to the nomad client agent appropriately.
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Credentials
[2] 101d0373ee/ui/app/components/task-log.js (L50)
[3] 101d0373ee/ui/app/services/token.js (L25-L39)
Addresses feedback around monitor implementation
subselect on stopCh to prevent blocking forever.
Set up a separate goroutine to check every 3 seconds for dropped
messages.
rename returned ch to avoid confusion
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger
clean up old code
small comment about write
rm old comment about minsize
rename to Monitor
Removes connection logic from monitor command
Keep connection logic in endpoints, use a channel to send results from
monitoring
use new multisink logger and interfaces
small test for dropped messages
update go-hclogger and update sink/intercept logger interfaces