* api: enable support for setting original source alongside job
This PR adds support for setting job source material along with
the registration of a job.
This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.
The job source (if submitted, and doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.
The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.
* api: avoid writing var content to disk for parsing
* api: move submission validation into RPC layer
* api: return an error if updating a job submission without namespace or job id
* api: be exact about the job index we associate a submission with (modify)
* api: reword api docs scheduling
* api: prune all but the last 6 job submissions
* api: protect against nil job submission in job validation
* api: set max job source size in test server
* api: fixups from pr
Nomad's security model requires mTLS in order to secure client-to-server and
server-to-server communications. Configuring ACLs alone is not enough. Loudly
warn the user if mTLS is not configured in non-dev modes.
These endpoints all refer to JobID by the time you get to the RPC request
layer, but the HTTP handler functions call the field JobName, which is a different
field (... though often with the same value).
This update changes the behaviour when following logs from an
allocation, so that both stdout and stderr files streamed when the
operator supplies the follow flag. The previous behaviour is held
when all other flags and situations are provided.
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Implement the new `nomad job restart` command that allows operators to
restart allocations tasks or reschedule then entire allocation.
Restarts can be batched to target multiple allocations in parallel.
Between each batch the command can stop and hold for a predefined time
or until the user confirms that the process should proceed.
This implements the "Stateless Restarts" alternative from the original
RFC
(https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180).
The original concept is still worth implementing, as it allows this
functionality to be exposed over an API that can be consumed by the
Nomad UI and other clients. But the implementation turned out to be more
complex than we initially expected so we thought it would be better to
release a stateless CLI-based implementation first to gather feedback
and validate the restart behaviour.
Co-authored-by: Shishir Mahajan <smahajan@roblox.com>
* Generate files for 1.5.2 release
* Prepare for next release
* add 1.4.7 and 1.3.12 to the changelog
---------
Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
Fixes#16517
Given a 3 Server cluster with at least 1 Client connected to Follower 1:
If a NodeMeta.{Apply,Read} for the Client request is received by
Follower 1 with `AllowStale = false` the Follower will forward the
request to the Leader.
The Leader, not being connected to the target Client, will forward the
RPC to Follower 1.
Follower 1, seeing AllowStale=false, will forward the request to the
Leader.
The Leader, not being connected to... well hoppefully you get the
picture: an infinite loop occurs.
* Added and flag to command
* cli[style]: small refactor to avoid confussion with tmpl variable
* Update inspect.mdx
* cli: add changelog entry
* Update .changelog/16478.txt
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update command/quota_inspect.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* cli: add json and t flags to quota status command
* cli: add entry to changelog
* Update command/quota_status.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* cli: Add and flags to server members
* Update website/content/docs/commands/server/members.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update website/content/docs/commands/server/members.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* cli: update the server memebers tests to use must
* cli: add flags addition to changelog
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
nomad login command does not need to know ACL Auth Method's type, since all
method names are unique.
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* cli: Add and flag to namespace status command
* Update command/namespace_status.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* cli: update tests for namespace status command to use must
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Our auth token parsing code trims space around the `Authorization` header but
not around `X-Nomad-Token`. When using the UI, it's easy to accidentally
introduce a leading or trailing space, which results in spurious authentication
errors. Trim the space at the HTTP server.
The job evaluate endpoint creates a new evaluation for the job which is
a write operation. This change modifies the necessary capability from
`read-job` to `submit-job` to better reflect this.
* cli: add -json flag to alloc checks for completion
* CLI: Expand test to include testing the json flag for allocation checks
* Documentation: Add the checks command
* Documentation: Add example for alloc check command
* Update website/content/docs/commands/alloc/checks.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* CLI: Add template flag to alloc checks command
* Update website/content/docs/commands/alloc/checks.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* CLI: Extend test to include -t flag for alloc checks
* func: add changelog for added flags to alloc checks
* cli[doc]: Make usage section on alloc checks clearer
* Update website/content/docs/commands/alloc/checks.mdx
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Delete modd.conf
* cli[doc]: add -t flag to command description for alloc checks
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Co-authored-by: Juanita De La Cuesta Morales <juanita.delacuestamorales@juanita.delacuestamorales-LHQ7X0QG9X>
Most job subcommands allow for job ID prefix match as a convenience
functionality so users don't have to type the full job ID.
But this introduces a hard ACL requirement that the token used to run
these commands have the `list-jobs` permission, even if the token has
enough permission to execute the basic command action and the user
passed an exact job ID.
This change softens this requirement by not failing the prefix match in
case the request results in a permission denied error and instead using
the information passed by the user directly.
Several `nomad job` subcommands had duplicate or slightly similar logic
for resolving a job ID from a CLI argument prefix, while others did not
have this functionality at all.
This commit pulls the shared logic to the command Meta and updates all
`nomad job` subcommands to use it.
* build: add BuildDate to version info
will be used in enterprise to compare to license expiration time
* cli: multi-line version output, add BuildDate
before:
$ nomad version
Nomad v1.4.3 (coolfakecommithashomgoshsuchacoolonewoww)
after:
$ nomad version
Nomad v1.5.0-dev
BuildDate 2023-02-17T19:29:26Z
Revision coolfakecommithashomgoshsuchacoolonewoww
compare consul:
$ consul version
Consul v1.14.4
Revision dae670fe
Build Date 2023-01-26T15:47:10Z
Protocol 2 spoken by default, blah blah blah...
and vault:
$ vault version
Vault v1.12.3 (209b3dd99fe8ca320340d08c70cff5f620261f9b), built 2023-02-02T09:07:27Z
* docs: update version command output
* taskapi: return Forbidden on bad credentials
Prior to this change a "Server error" would be returned when ACLs are
enabled which did not match when ACLs are disabled.
* e2e: love love love datacenter wildcard default
* e2e: skip windows nodes on linux only test
The Logfs are a bit weird because they're most useful when converted to
Printfs to make debugging the test much faster, but that makes CI noisy.
In a perfect world Go would expose how many tests are being run and we
could stream output live if there's only 1. For now I left these helpful
lines in as basically glorified comments.
The `nomad fmt -check` command incorrectly writes to file because we didn't
return before writing the file on a diff. Fix this bug and update the command
internals to differentiate between the write-to-file and write-to-stdout code
paths, which are activated by different combinations of options and flags.
The docstring for the `-list` and `-write` flags is also unclear and can be
easily misread to be the opposite of the actual behavior. Clarify this and fix
up the docs to match.
This changeset also refactors the tests quite a bit so as to make the test
outputs clear when something is incorrect.
* Warn when Items key isn't directly accessible
Go template requires that map keys are alphanumeric for direct access
using the dotted reference syntax. This warns users when they create
keys that run afoul of this requirement.
- cli: use regex to detect invalid indentifiers in var keys
- test: fix slash in escape test case
- api: share warning formatting function between API and CLI
- ui: warn if var key has characters other than _, letter, or number
---------
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* Convert assets from bindatafs to go embeds
* Add command/asset to "uninteresting" list for missing test check
* Remove generate-examples target
* Update paths in tests
When an auth method was not supplied and the OIDC type was given
in lower case, the CLI was not matching the default method due to
casing and responded with a confusing user message.
This change fixes the above problem, along with making use of the
santized type easier.
Fixes#14617
Dynamic Node Metadata allows Nomad users, and their jobs, to update Node metadata through an API. Currently Node metadata is only reloaded when a Client agent is restarted.
Includes new UI for editing metadata as well.
---------
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
* main: remove deprecated uses of rand.Seed
go1.20 deprecates rand.Seed, and seeds the rand package
automatically. Remove cases where we seed the random package,
and cleanup the one case where we intentionally create a
known random source.
* cl: update cl
* mod: update go mod
This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled.
This PR contains the core feature and tests but followup work is required for the following TODO items:
- Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink
- Unit tests for auth middleware
- Caching for auth middleware
- Rate limiting on negative lookups for auth middleware
---------
Co-authored-by: Seth Hoenig <shoenig@duck.com>
Add `identity` jobspec block to expose workload identity tokens to tasks.
---------
Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
If a deployment fails, the `deployment status` command can get a nil deployment
when it checks for a rollback deployment if there isn't one (or at least not one
at the time of the query). Fix the panic.
* Extend variables under the nomad path prefix to allow for job-templates (#15570)
* Extend variables under the nomad path prefix to allow for job-templates
* Add job-templates to error message hinting
* RadioCard component for Job Templates (#15582)
* chore: add
* test: component API
* ui: component template
* refact: remove bc naming collission
* styles: remove SASS var causing conflicts
* Disallow specific variable at nomad/job-templates (#15681)
* Disallows variables at exactly nomad/job-templates
* idiomatic refactor
* Expanding nomad job init to accept a template flag (#15571)
* Adding a string flag for templates on job init
* data-down actions-up version of a custom template editor within variable
* Dont force grid on job template editor
* list-templates flag started
* Correctly slice from end of path name
* Pre-review cleanup
* Variable form acceptance test for job template editing
* Some review cleanup
* List Job templates test
* Example from template test
* Using must.assertions instead of require etc
* ui: add choose template button (#15596)
* ui: add new routes
* chore: update file directory
* ui: add choose template button
* test: button and page navigation
* refact: update var name
* ui: use `Button` component from `HDS` (#15607)
* ui: integrate buttons
* refact: remove helper
* ui: remove icons on non-tertiary buttons
* refact: update normalize method for key/value pairs (#15612)
* `revert`: `onCancel` for `JobDefinition`
The `onCancel` method isn't included in the component API for `JobEditor` and the primary cancel behavior exists outside of the component. With the exception of the `JobDefinition` page where we include this button in the top right of the component instead of next to the `Plan` button.
* style: increase button size
* style: keep lime green
* ui: select template (#15613)
* ui: deprecate unused component
* ui: deprecate tests
* ui: jobs.run.templates.index
* ui: update logic to handle templates
* refact: revert key/value changes
* style: padding for cards + buttons
* temp: fixtures for mirage testing
* Revert "refact: revert key/value changes"
This reverts commit 124e95d12140be38fc921f7e15243034092c4063.
* ui: guard template for unsaved job
* ui: handle reading template variable
* Revert "refact: update normalize method for key/value pairs (#15612)"
This reverts commit 6f5ffc9b610702aee7c47fbff742cc81f819ab74.
* revert: remove test fixtures
* revert: prettier problems
* refact: test doesnt need filter expression
* styling: button sizes and responsive cards
* refact: remove route guarding
* ui: update variable adapter
* refact: remove model editing behavior
* refact: model should query variables to populate editor
* ui: clear qp on exit
* refact: cleanup deprecated API
* refact: query all namespaces
* refact: deprecate action
* ui: rely on collection
* refact: patch deprecate transition API
* refact: patch test to expect namespace qp
* styling: padding, conditionals
* ui: flashMessage on 404
* test: update for o(n+1) query
* ui: create new job template (#15744)
* refact: remove unused code
* refact: add type safety
* test: select template flow
* test: add data-test attrs
* chore: remove dead code
* test: create new job flow
* ui: add create button
* ui: create job template
* refact: no need for wildcard
* refact: record instead of delete
* styling: spacing
* ui: add error handling and form validation to job create template (#15767)
* ui: handle server side errors
* ui: show error to prevent duplicate
* refact: conditional namespace
* ui: save as template flow (#15787)
* bug: patches failing tests associated with `pretender` (#15812)
* refact: update assertion
* refact: test set-up
* ui: job templates manager view (#15815)
* ui: manager list view
* test: edit flow
* refact: deprecate column-helper
* ui: template edit and delete flow (#15823)
* ui: manager list view
* refact: update title
* refact: update permissions
* ui: template edit page
* bug: typo
* refact: update toast messages
* bug: clear selections on exit (#15827)
* bug: clear controllers on exit
* test: mirage config changes (#15828)
* refact: deprecate column-helper
* style: update z-index for HDS
* Revert "style: update z-index for HDS"
This reverts commit d3d87ceab6d083f7164941587448607838944fc1.
* refact: update delete button
* refact: edit redirect
* refact: patch reactivity issues
* styling: fixed width
* refact: override defaults
* styling: edit text causing overflow
* styling: add inline text
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
* bug: edit `text` to `template`
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
* test: delete flow job templates (#15896)
* refact: edit names
* bug: set correct ref to store
* chore: trim whitespace:
* test: delete flow
* bug: reactively update view (#15904)
* Initialized default jobs (#15856)
* Initialized default jobs
* More jobs scaffolded
* Better commenting on a couple example job specs
* Adapter doing the work
* fall back to epic config
* Label format helper and custom serialization logic
* Test updates to account for a never-empty state
* Test suite uses settled and maintain RecordArray in adapter return
* Updates to hello-world and variables example jobspecs
* Parameterized job gets optional payload output
* Formatting changes for param and service discovery job templates
* Multi-group service discovery job
* Basic test for default templates (#15965)
* Basic test for default templates
* Percy snapshot for manage page
* Some late-breaking design changes
* Some copy edits to the header paragraphs for job templates (#15967)
* Added some init options for job templates (#15994)
* Async method for populating default job templates from the variable adapter
---------
Co-authored-by: Jai <41024828+ChaiWithJai@users.noreply.github.com>
* Add `bridge_network_hairpin_mode` client config setting
* Add node attribute: `nomad.bridge.hairpin_mode`
* Changed format string to use `%q` to escape user provided data
* Add test to validate template JSON for developer safety
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
The ACL token decoding was not correctly handling time duration
syntax such as "1h" which forced people to use the nanosecond
representation via the HTTP API.
The change adds an unmarshal function which allows this syntax to
be used, along with other styles correctly.
Prior to 2409f72 the code compared the modification index of a job to itself. Afterwards, the code compared the creation index of the job to itself. In either case there should never be a case of re-parenting of allocs causing the evaluation to trivially always result in false, which leads to unreclaimable memory.
Prior to this change allocations and evaluations for batch jobs were never garbage collected until the batch job was explicitly stopped. The new `batch_eval_gc_threshold` server configuration controls how often they are collected. The default threshold is `24h`.
When registering a job with a service and 'consul.allow_unauthenticated=false',
we scan the given Consul token for an acceptable policy or role with an
acceptable policy, but did not scan for an acceptable service identity (which
is backed by an acceptable virtual policy). This PR updates our consul token
validation to also accept a matching service identity when registering a service
into Consul.
Fixes#15902
Implement a metric for RPC requests with labels on the identity, so that
administrators can monitor the source of requests within the cluster. This
changeset demonstrates the change with the new `ACL.WhoAmI` RPC, and we'll wire
up the remaining RPCs once we've threaded the new pre-forwarding authentication
through the all.
Note that metrics are measured after we forward but before we return any
authentication error. This ensures that we only emit metrics on the server that
actually serves the request. We'll perform rate limiting at the same place.
Includes telemetry configuration to omit identity labels.
In #15417 we added a new `Authenticate` method to the server that returns an
`AuthenticatedIdentity` struct. This changeset implements this method for a
small number of RPC endpoints that together represent all the various ways in
which RPCs are sent, so that we can validate that we're happy with this
approach.
* Add config elements
* Wire in snapshot configuration to raft
* Add hot reload of raft config
* Add documentation for new raft settings
* Add changelog
This PR adds support for configuring `proxy.upstreams[].config` for
Consul Connect upstreams. This is an opaque config value to Nomad -
the data is passed directly to Consul and is unknown to Nomad.
The command line flag parsing and the HTTP header parsing for CSI secrets
incorrectly split at more than one '=' rune, making it impossible to use secrets
that included that rune.
In #15605 we fixed the bug where the presense of "stale" query parameter
was mean to imply stale, even if the value of the parameter was "false"
or malformed. In parsing, we missed the case where the slice of values
would be nil which lead to a failing test case that was missed because
CI didn't run against the original PR.
This PR removes usages of `consul/sdk/testutil/retry`, as part of the
ongoing effort to remove use of any non-API module from Consul.
There is one remanining usage in the helper/freeport package, but that
will get removed as part of #15589
* Add changes to make stale querystring param boolean
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Make error message more consistent
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Changes from code review + Adding CHANGELOG file
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Changes from code review to use github.com/shoenig/test package
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Change must.Nil() to must.NoError()
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Minor fix on the import order
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Fix existing code format too
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* Minor changes addressing code review feedbacks
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* swap must.EqOp() order of param provided
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
* command: fixup job multi-stop test
This PR refactors the StopCommand test that runs 10 jobs and then
passes them all to one invokation of 'job stop'.
* test: swap use of assert for must
* test: cleanup job files we create
* command: fixup job stop failure tests
Now that JobStop works on concurrent jobs, the error messages are
different.
* cleanup: use multiple post scripts