* build: add BuildDate to version info
will be used in enterprise to compare to license expiration time
* cli: multi-line version output, add BuildDate
before:
$ nomad version
Nomad v1.4.3 (coolfakecommithashomgoshsuchacoolonewoww)
after:
$ nomad version
Nomad v1.5.0-dev
BuildDate 2023-02-17T19:29:26Z
Revision coolfakecommithashomgoshsuchacoolonewoww
compare consul:
$ consul version
Consul v1.14.4
Revision dae670fe
Build Date 2023-01-26T15:47:10Z
Protocol 2 spoken by default, blah blah blah...
and vault:
$ vault version
Vault v1.12.3 (209b3dd99fe8ca320340d08c70cff5f620261f9b), built 2023-02-02T09:07:27Z
* docs: update version command output
* taskapi: return Forbidden on bad credentials
Prior to this change a "Server error" would be returned when ACLs are
enabled which did not match when ACLs are disabled.
* e2e: love love love datacenter wildcard default
* e2e: skip windows nodes on linux only test
The Logfs are a bit weird because they're most useful when converted to
Printfs to make debugging the test much faster, but that makes CI noisy.
In a perfect world Go would expose how many tests are being run and we
could stream output live if there's only 1. For now I left these helpful
lines in as basically glorified comments.
The `nomad fmt -check` command incorrectly writes to file because we didn't
return before writing the file on a diff. Fix this bug and update the command
internals to differentiate between the write-to-file and write-to-stdout code
paths, which are activated by different combinations of options and flags.
The docstring for the `-list` and `-write` flags is also unclear and can be
easily misread to be the opposite of the actual behavior. Clarify this and fix
up the docs to match.
This changeset also refactors the tests quite a bit so as to make the test
outputs clear when something is incorrect.
* Warn when Items key isn't directly accessible
Go template requires that map keys are alphanumeric for direct access
using the dotted reference syntax. This warns users when they create
keys that run afoul of this requirement.
- cli: use regex to detect invalid indentifiers in var keys
- test: fix slash in escape test case
- api: share warning formatting function between API and CLI
- ui: warn if var key has characters other than _, letter, or number
---------
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* Convert assets from bindatafs to go embeds
* Add command/asset to "uninteresting" list for missing test check
* Remove generate-examples target
* Update paths in tests
When an auth method was not supplied and the OIDC type was given
in lower case, the CLI was not matching the default method due to
casing and responded with a confusing user message.
This change fixes the above problem, along with making use of the
santized type easier.
Fixes#14617
Dynamic Node Metadata allows Nomad users, and their jobs, to update Node metadata through an API. Currently Node metadata is only reloaded when a Client agent is restarted.
Includes new UI for editing metadata as well.
---------
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
* main: remove deprecated uses of rand.Seed
go1.20 deprecates rand.Seed, and seeds the rand package
automatically. Remove cases where we seed the random package,
and cleanup the one case where we intentionally create a
known random source.
* cl: update cl
* mod: update go mod
This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled.
This PR contains the core feature and tests but followup work is required for the following TODO items:
- Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink
- Unit tests for auth middleware
- Caching for auth middleware
- Rate limiting on negative lookups for auth middleware
---------
Co-authored-by: Seth Hoenig <shoenig@duck.com>
Add `identity` jobspec block to expose workload identity tokens to tasks.
---------
Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
If a deployment fails, the `deployment status` command can get a nil deployment
when it checks for a rollback deployment if there isn't one (or at least not one
at the time of the query). Fix the panic.
* Extend variables under the nomad path prefix to allow for job-templates (#15570)
* Extend variables under the nomad path prefix to allow for job-templates
* Add job-templates to error message hinting
* RadioCard component for Job Templates (#15582)
* chore: add
* test: component API
* ui: component template
* refact: remove bc naming collission
* styles: remove SASS var causing conflicts
* Disallow specific variable at nomad/job-templates (#15681)
* Disallows variables at exactly nomad/job-templates
* idiomatic refactor
* Expanding nomad job init to accept a template flag (#15571)
* Adding a string flag for templates on job init
* data-down actions-up version of a custom template editor within variable
* Dont force grid on job template editor
* list-templates flag started
* Correctly slice from end of path name
* Pre-review cleanup
* Variable form acceptance test for job template editing
* Some review cleanup
* List Job templates test
* Example from template test
* Using must.assertions instead of require etc
* ui: add choose template button (#15596)
* ui: add new routes
* chore: update file directory
* ui: add choose template button
* test: button and page navigation
* refact: update var name
* ui: use `Button` component from `HDS` (#15607)
* ui: integrate buttons
* refact: remove helper
* ui: remove icons on non-tertiary buttons
* refact: update normalize method for key/value pairs (#15612)
* `revert`: `onCancel` for `JobDefinition`
The `onCancel` method isn't included in the component API for `JobEditor` and the primary cancel behavior exists outside of the component. With the exception of the `JobDefinition` page where we include this button in the top right of the component instead of next to the `Plan` button.
* style: increase button size
* style: keep lime green
* ui: select template (#15613)
* ui: deprecate unused component
* ui: deprecate tests
* ui: jobs.run.templates.index
* ui: update logic to handle templates
* refact: revert key/value changes
* style: padding for cards + buttons
* temp: fixtures for mirage testing
* Revert "refact: revert key/value changes"
This reverts commit 124e95d12140be38fc921f7e15243034092c4063.
* ui: guard template for unsaved job
* ui: handle reading template variable
* Revert "refact: update normalize method for key/value pairs (#15612)"
This reverts commit 6f5ffc9b610702aee7c47fbff742cc81f819ab74.
* revert: remove test fixtures
* revert: prettier problems
* refact: test doesnt need filter expression
* styling: button sizes and responsive cards
* refact: remove route guarding
* ui: update variable adapter
* refact: remove model editing behavior
* refact: model should query variables to populate editor
* ui: clear qp on exit
* refact: cleanup deprecated API
* refact: query all namespaces
* refact: deprecate action
* ui: rely on collection
* refact: patch deprecate transition API
* refact: patch test to expect namespace qp
* styling: padding, conditionals
* ui: flashMessage on 404
* test: update for o(n+1) query
* ui: create new job template (#15744)
* refact: remove unused code
* refact: add type safety
* test: select template flow
* test: add data-test attrs
* chore: remove dead code
* test: create new job flow
* ui: add create button
* ui: create job template
* refact: no need for wildcard
* refact: record instead of delete
* styling: spacing
* ui: add error handling and form validation to job create template (#15767)
* ui: handle server side errors
* ui: show error to prevent duplicate
* refact: conditional namespace
* ui: save as template flow (#15787)
* bug: patches failing tests associated with `pretender` (#15812)
* refact: update assertion
* refact: test set-up
* ui: job templates manager view (#15815)
* ui: manager list view
* test: edit flow
* refact: deprecate column-helper
* ui: template edit and delete flow (#15823)
* ui: manager list view
* refact: update title
* refact: update permissions
* ui: template edit page
* bug: typo
* refact: update toast messages
* bug: clear selections on exit (#15827)
* bug: clear controllers on exit
* test: mirage config changes (#15828)
* refact: deprecate column-helper
* style: update z-index for HDS
* Revert "style: update z-index for HDS"
This reverts commit d3d87ceab6d083f7164941587448607838944fc1.
* refact: update delete button
* refact: edit redirect
* refact: patch reactivity issues
* styling: fixed width
* refact: override defaults
* styling: edit text causing overflow
* styling: add inline text
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
* bug: edit `text` to `template`
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
* test: delete flow job templates (#15896)
* refact: edit names
* bug: set correct ref to store
* chore: trim whitespace:
* test: delete flow
* bug: reactively update view (#15904)
* Initialized default jobs (#15856)
* Initialized default jobs
* More jobs scaffolded
* Better commenting on a couple example job specs
* Adapter doing the work
* fall back to epic config
* Label format helper and custom serialization logic
* Test updates to account for a never-empty state
* Test suite uses settled and maintain RecordArray in adapter return
* Updates to hello-world and variables example jobspecs
* Parameterized job gets optional payload output
* Formatting changes for param and service discovery job templates
* Multi-group service discovery job
* Basic test for default templates (#15965)
* Basic test for default templates
* Percy snapshot for manage page
* Some late-breaking design changes
* Some copy edits to the header paragraphs for job templates (#15967)
* Added some init options for job templates (#15994)
* Async method for populating default job templates from the variable adapter
---------
Co-authored-by: Jai <41024828+ChaiWithJai@users.noreply.github.com>
* Add `bridge_network_hairpin_mode` client config setting
* Add node attribute: `nomad.bridge.hairpin_mode`
* Changed format string to use `%q` to escape user provided data
* Add test to validate template JSON for developer safety
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
The ACL token decoding was not correctly handling time duration
syntax such as "1h" which forced people to use the nanosecond
representation via the HTTP API.
The change adds an unmarshal function which allows this syntax to
be used, along with other styles correctly.
Prior to 2409f72 the code compared the modification index of a job to itself. Afterwards, the code compared the creation index of the job to itself. In either case there should never be a case of re-parenting of allocs causing the evaluation to trivially always result in false, which leads to unreclaimable memory.
Prior to this change allocations and evaluations for batch jobs were never garbage collected until the batch job was explicitly stopped. The new `batch_eval_gc_threshold` server configuration controls how often they are collected. The default threshold is `24h`.
When registering a job with a service and 'consul.allow_unauthenticated=false',
we scan the given Consul token for an acceptable policy or role with an
acceptable policy, but did not scan for an acceptable service identity (which
is backed by an acceptable virtual policy). This PR updates our consul token
validation to also accept a matching service identity when registering a service
into Consul.
Fixes#15902
Implement a metric for RPC requests with labels on the identity, so that
administrators can monitor the source of requests within the cluster. This
changeset demonstrates the change with the new `ACL.WhoAmI` RPC, and we'll wire
up the remaining RPCs once we've threaded the new pre-forwarding authentication
through the all.
Note that metrics are measured after we forward but before we return any
authentication error. This ensures that we only emit metrics on the server that
actually serves the request. We'll perform rate limiting at the same place.
Includes telemetry configuration to omit identity labels.
In #15417 we added a new `Authenticate` method to the server that returns an
`AuthenticatedIdentity` struct. This changeset implements this method for a
small number of RPC endpoints that together represent all the various ways in
which RPCs are sent, so that we can validate that we're happy with this
approach.
* Add config elements
* Wire in snapshot configuration to raft
* Add hot reload of raft config
* Add documentation for new raft settings
* Add changelog
This PR adds support for configuring `proxy.upstreams[].config` for
Consul Connect upstreams. This is an opaque config value to Nomad -
the data is passed directly to Consul and is unknown to Nomad.
The command line flag parsing and the HTTP header parsing for CSI secrets
incorrectly split at more than one '=' rune, making it impossible to use secrets
that included that rune.
In #15605 we fixed the bug where the presense of "stale" query parameter
was mean to imply stale, even if the value of the parameter was "false"
or malformed. In parsing, we missed the case where the slice of values
would be nil which lead to a failing test case that was missed because
CI didn't run against the original PR.
This PR removes usages of `consul/sdk/testutil/retry`, as part of the
ongoing effort to remove use of any non-API module from Consul.
There is one remanining usage in the helper/freeport package, but that
will get removed as part of #15589
* Add changes to make stale querystring param boolean
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Make error message more consistent
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Changes from code review + Adding CHANGELOG file
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Changes from code review to use github.com/shoenig/test package
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Change must.Nil() to must.NoError()
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Minor fix on the import order
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Fix existing code format too
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Minor changes addressing code review feedbacks
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* swap must.EqOp() order of param provided
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* command: fixup job multi-stop test
This PR refactors the StopCommand test that runs 10 jobs and then
passes them all to one invokation of 'job stop'.
* test: swap use of assert for must
* test: cleanup job files we create
* command: fixup job stop failure tests
Now that JobStop works on concurrent jobs, the error messages are
different.
* cleanup: use multiple post scripts
This change add the RPC ACL binding rule handlers. These handlers
are responsible for the creation, updating, reading, and deletion
of binding rules.
The write handlers are feature gated so that they can only be used
when all federated servers are running the required version.
The HTTP API handlers and API SDK have also been added where
required. This allows the endpoints to be called from the API by users
and clients.
This PR is a continuation of #14917, where we missed the ipv6 cases.
Consul auto-inserts tagged_addresses for keys
- lan_ipv4
- wan_ipv4
- lan_ipv6
- wan_ipv6
even though the service registration coming from Nomad does not contain such
elements. When doing the differential between services Nomad expects to be
registered vs. the services actually registered into Consul, we must first
purge these automatically inserted tagged_addresses if they do not exist in
the Nomad view of the Consul service.
Currently CRUD code that operates on SSO auth methods does not return created or updated object upon creation/update. This is bad UX and inconsistent behavior compared to other ACL objects like roles, policies or tokens.
This PR fixes it.
Relates to #13120
This PR adds trace logging around the differential done between a Nomad service
registration and its corresponding Consul service registration, in an effort
to shed light on why a service registration request is being made.
During unusual outage recovery scenarios on large clusters, a backlog of
millions of evaluations can appear. In these cases, the `eval delete` command can
put excessive load on the cluster by listing large sets of evals to extract the
IDs and then sending larges batches of IDs. Although the command's batch size
was carefully tuned, we still need to be JSON deserialize, re-serialize to
MessagePack, send the log entries through raft, and get the FSM applied.
To improve performance of this recovery case, move the batching process into the
RPC handler and the state store. The design here is a little weird, so let's
look a the failed options first:
* A naive solution here would be to just send the filter as the raft request and
let the FSM apply delete the whole set in a single operation. Benchmarking with
1M evals on a 3 node cluster demonstrated this can block the FSM apply for
several minutes, which puts the cluster at risk if there's a leadership
failover (the barrier write can't be made while this apply is in-flight).
* A less naive but still bad solution would be to have the RPC handler filter
and paginate, and then hand a list of IDs to the existing raft log
entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and
took roughly an hour to complete.
Instead, we're filtering and paginating in the RPC handler to find a page token,
and then passing both the filter and page token in the raft log. The FSM apply
recreates the paginator using the filter and page token to get roughly the same
page of evaluations, which it then deletes. The pagination process is fairly
cheap (only abut 5% of the total FSM apply time), so counter-intuitively this
rework ends up being much faster. A benchmark of 1M evaluations showed this
blocked the FSM apply for 20-30ms at a time (typical for normal operations) and
completes in less than 4 minutes.
Note that, as with the existing design, this delete is not consistent: a new
evaluation inserted "behind" the cursor of the pagination will fail to be
deleted.
This PR solves a defect in the deserialization of api.Port structs when returning structs from theEventStream.
Previously, the api.Port struct's fields were decorated with both mapstructure and hcl tags to support the network.port stanza's use of the keyword static when posting a static port value. This works fine when posting a job and when retrieving any struct that has an embedded api.Port instance as long as the value is deserialized using JSON decoding. The EventStream, however, uses mapstructure to decode event payloads in the api package. mapstructure expects an underlying field named static which does not exist. The result was that the Port.Value field would always be set to 0.
Upon further inspection, a few things became apparent.
The struct already has hcl tags that support the indirection during job submission.
Serialization/deserialization with both the json and hcl packages produce the desired result.
The use of of the mapstructure tags provided no value as the Port struct contains only fields with primitive types.
This PR:
Removes the mapstructure tags from the api.Port structs
Updates the job parsing logic to use hcl instead of mapstructure when decoding Port instances.
Closes#11044
Co-authored-by: DerekStrickland <dstrickland@hashicorp.com>
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is
designed to support interactive use in the `nomad eval delete` command to get a
count of evals expected to be deleted before doing so.
The state store operations to do this sort of thing are somewhat expensive, but
it's cheaper than serializing a big list of evals to JSON. Note that although it
seems like this could be done as an extra parameter and response field on
`Eval.List`, having it as its own endpoint avoids having to change the response
body shape and lets us avoid handling the legacy filter params supported by
`Eval.List`.
* Adds meta to job list stub and displays a pack logo on the jobs index
* Changelog
* Modifying struct for optional meta param
* Explicitly ask for meta anytime I look up a job from index or job page
* Test case for the endpoint
* adding meta field to API struct and ommitting from response if empty
* passthru method added to api/jobs.list
* Meta param listed in docs for jobs list
* Update api/jobs.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
The configuration knobs for root keyring garbage collection are present in the
consumer and present in the user-facing config, but we missed the spot where we
copy from one to the other. Fix this so that users can set their own thresholds.
The root key is automatically rotated every ~30d, but the function that does
both rotation and key GC was wired up such that `nomad system gc` caused an
unexpected key rotation. Split this into two functions so that `nomad system gc`
cleans up old keys without forcing a rotation, which will be done periodially
or by the `nomad operator root keyring rotate` command.
The client ACL cache was not accounting for tokens which included
ACL role links. This change modifies the behaviour to resolve role
links to policies. It will also now store ACL roles within the
cache for quick lookup. The cache TTL is configurable in the same
manner as policies or tokens.
Another small fix is included that takes into account the ACL
token expiry time. This was not included, which meant tokens with
expiry could be used past the expiry time, until they were GC'd.
* consul: register checks along with service on initial registration
This PR updates Nomad's Consul service client to include checks in
an initial service registration, so that the checks associated with
the service are registered "atomically" with the service. Before, we
would only register the checks after the service registration, which
causes problems where the service is deemed healthy, even if one or
more checks are unhealthy - especially problematic in the case where
SuccessBeforePassing is configured.
Fixes#3935
* cr: followup to fix cause of extra consul logging
* cr: fix another bug
* cr: fixup changelog
This PR updates Nomad's Consul service client to do map comparisons
using maps.Equal instead of reflect.DeepEqual. The bug fix is in how
DeepEqual treats nil slices different from empty slices, when actually
they should be treated the same.