2083 commits

Author SHA1 Message Date
Dao Thanh Tung ca2f509e82
agent: Make agent syslog log level inherit from Nomad agent log (#15625) 2023-01-04 09:38:06 -05:00
Tim Gross 8859e1bff1
csi: Fix parsing of '=' in secrets at command line and HTTP (#15670)
The command line flag parsing and the HTTP header parsing for CSI secrets
incorrectly split at more than one '=' rune, making it impossible to use secrets
that included that rune.
2023-01-03 16:28:38 -05:00
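The fix described in the commit above comes down to splitting each `key=value` argument at the first '=' only. A minimal sketch of that idea follows; `parseSecret` is a hypothetical helper written for illustration, not the function Nomad actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSecret is a hypothetical helper (not Nomad's implementation).
// It splits "key=value" at the FIRST '=' only, so the value itself may
// contain any number of '=' runes.
func parseSecret(arg string) (string, string, error) {
	key, value, found := strings.Cut(arg, "=")
	if !found || key == "" {
		return "", "", fmt.Errorf("secret %q must be in the form key=value", arg)
	}
	return key, value, nil
}

func main() {
	key, value, err := parseSecret("token=abc=def==")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("key=%s value=%s\n", key, value)
}
```

Splitting on every '=' would instead produce more than two parts for this input, which is the failure mode the commit message describes.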
Seth Hoenig 7214e21402
ci: swap freeport for portal in packages (#15661) 2023-01-03 11:25:20 -06:00
Seth Hoenig 9eb2433871
command: fixup parsing of stale query parameter (#15631)
In #15605 we fixed the bug where the presence of the "stale" query parameter
was meant to imply stale, even if the value of the parameter was "false"
or malformed. In parsing, we missed the case where the slice of values
would be nil, which led to a failing test case that went unnoticed because
CI didn't run against the original PR.
2023-01-03 08:21:20 -06:00
Seth Hoenig 266ca25a81
cleanup: remove usage of consul/sdk/testutil/retry (#15609)
This PR removes usages of `consul/sdk/testutil/retry`, as part of the
ongoing effort to remove use of any non-API module from Consul.

There is one remaining usage in the helper/freeport package, but that
will get removed as part of #15589
2023-01-02 08:06:20 -06:00
Dao Thanh Tung 53cd1b4871
fix: stale querystring parameter value as boolean (#15605)
* Add changes to make stale querystring param boolean

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

* Make error message more consistent

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

* Changes from code review + Adding CHANGELOG file

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

* Changes from code review to use github.com/shoenig/test package

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

* Change must.Nil() to must.NoError()

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

* Minor fix on the import order

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

* Fix existing code format too

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

* Minor changes addressing code review feedback

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

* swap must.EqOp() order of param provided

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
2023-01-01 13:04:14 -06:00
Seth Hoenig 92dfa41286
command: fixup tests concerning multi job stop (#15606)
* command: fixup job multi-stop test

This PR refactors the StopCommand test that runs 10 jobs and then
passes them all to one invocation of 'job stop'.

* test: swap use of assert for must

* test: cleanup job files we create

* command: fixup job stop failure tests

Now that JobStop works on concurrent jobs, the error messages are
different.

* cleanup: use multiple post scripts
2022-12-21 16:21:48 -06:00
Seth Hoenig 83f9fc9db4
tests: do not return error from testagent shutdown (#15595) 2022-12-21 08:23:58 -06:00
James Rasell 95c9ffa505
ACL: add ACL binding rule RPC and HTTP API handlers. (#15529)
This change adds the RPC ACL binding rule handlers. These handlers
are responsible for the creation, updating, reading, and deletion
of binding rules.

The write handlers are feature gated so that they can only be used
when all federated servers are running the required version.

The HTTP API handlers and API SDK have also been added where
required. This allows the endpoints to be called from the API by users
and clients.
2022-12-15 09:18:55 +01:00
Seth Hoenig 119f7b1cd1
consul: fixup expected consul tagged_addresses when using ipv6 (#15411)
This PR is a continuation of #14917, where we missed the ipv6 cases.

Consul auto-inserts tagged_addresses for keys
- lan_ipv4
- wan_ipv4
- lan_ipv6
- wan_ipv6

even though the service registration coming from Nomad does not contain such
elements. When doing the differential between services Nomad expects to be
registered vs. the services actually registered into Consul, we must first
purge these automatically inserted tagged_addresses if they do not exist in
the Nomad view of the Consul service.
2022-12-01 07:38:30 -06:00
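The purge step described in the commit above can be sketched as below. This is a simplified illustration: `purgeAutoTagged` is a hypothetical name, and addresses are plain strings here rather than the Consul API's address type.

```go
package main

import "fmt"

// Keys that Consul inserts on its own, per the commit message above.
var autoTaggedKeys = []string{"lan_ipv4", "wan_ipv4", "lan_ipv6", "wan_ipv6"}

// purgeAutoTagged is a hypothetical helper. It returns a copy of Consul's
// tagged addresses with the auto-inserted keys removed whenever Nomad's own
// view of the service does not contain them.
func purgeAutoTagged(nomadView, consulView map[string]string) map[string]string {
	out := make(map[string]string, len(consulView))
	for k, v := range consulView {
		out[k] = v
	}
	for _, k := range autoTaggedKeys {
		if _, ok := nomadView[k]; !ok {
			delete(out, k)
		}
	}
	return out
}

func main() {
	nomadView := map[string]string{"custom": "10.0.0.5"}
	consulView := map[string]string{
		"custom":   "10.0.0.5",
		"lan_ipv6": "::1",
		"wan_ipv6": "::1",
	}
	fmt.Println(purgeAutoTagged(nomadView, consulView))
}
```

After the purge, the two views can be compared directly without the auto-inserted keys producing a spurious difference.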
Piotr Kazmierczak 0eccd3286c
acl: sso auth methods RPC/API/CLI should return created or updated objects (#15410)
Currently, the CRUD code that operates on SSO auth methods does not return the created or updated object upon creation/update. This is bad UX and inconsistent with other ACL objects like roles, policies or tokens.

This PR fixes it.

Relates to #13120
2022-11-29 07:36:36 +01:00
James Rasell 32dfa431f3
sso: add ACL auth-method HTTP API CRUD endpoints (#15338)
* core: remove custom auth-method TTLs and use ACL token TTLs.

* agent: add ACL auth-method HTTP endpoints for CRUD actions.

* api: add ACL auth-method client.
2022-11-23 09:38:02 +01:00
hc-github-team-nomad-core 031d75e158 Generate files for 1.4.3 release 2022-11-22 12:56:29 -05:00
Seth Hoenig bf4b5f9a8d
consul: add trace logging around service registrations (#15311)
This PR adds trace logging around the differential done between a Nomad service
registration and its corresponding Consul service registration, in an effort
to shed light on why a service registration request is being made.
2022-11-21 08:03:56 -06:00
James Rasell a7350853ae
api: ensure ACL role upsert decode error returns a 400 status code. (#15253) 2022-11-18 17:47:43 +01:00
James Rasell 3225cf77b6
api: ensure all request body decode error return a 400 status code. (#15252) 2022-11-18 17:04:33 +01:00
James Rasell 2e19e9639e
agent: ensure all HTTP Server methods are pointer receivers. (#15250) 2022-11-15 16:31:44 +01:00
Tim Gross 37134a4a37
eval delete: move batching of deletes into RPC handler and state (#15117)
During unusual outage recovery scenarios on large clusters, a backlog of
millions of evaluations can appear. In these cases, the `eval delete` command can
put excessive load on the cluster by listing large sets of evals to extract the
IDs and then sending large batches of IDs. Although the command's batch size
was carefully tuned, we still need to deserialize the JSON, re-serialize to
MessagePack, send the log entries through raft, and get the FSM applied.

To improve performance of this recovery case, move the batching process into the
RPC handler and the state store. The design here is a little weird, so let's
look at the failed options first:

* A naive solution here would be to just send the filter as the raft request and
  let the FSM apply delete the whole set in a single operation. Benchmarking with
  1M evals on a 3 node cluster demonstrated this can block the FSM apply for
  several minutes, which puts the cluster at risk if there's a leadership
  failover (the barrier write can't be made while this apply is in-flight).

* A less naive but still bad solution would be to have the RPC handler filter
  and paginate, and then hand a list of IDs to the existing raft log
  entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and
  took roughly an hour to complete.

Instead, we're filtering and paginating in the RPC handler to find a page token,
and then passing both the filter and page token in the raft log. The FSM apply
recreates the paginator using the filter and page token to get roughly the same
page of evaluations, which it then deletes. The pagination process is fairly
cheap (only about 5% of the total FSM apply time), so counter-intuitively this
rework ends up being much faster. A benchmark of 1M evaluations showed this
blocked the FSM apply for 20-30ms at a time (typical for normal operations) and
completes in less than 4 minutes.

Note that, as with the existing design, this delete is not consistent: a new
evaluation inserted "behind" the cursor of the pagination will fail to be
deleted.
2022-11-14 14:08:13 -05:00
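The filter-plus-page-token approach from the commit above can be illustrated with a toy model. Everything in this sketch is hypothetical: `store` stands in for the state store, the filter is a plain status string, and a single function both deletes a page and reports the next token, whereas the commit splits that work between the RPC handler and the FSM apply.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a toy stand-in for the state store: eval ID -> status.
type store map[string]string

// deletePage deletes at most perPage evals whose status matches the filter,
// starting at the given page token (an eval ID, "" for the beginning). It
// returns the token for the next page, or "" when nothing is left.
func (s store) deletePage(status, token string, perPage int) string {
	ids := make([]string, 0, len(s))
	for id := range s {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	deleted := 0
	for _, id := range ids {
		if id < token || s[id] != status {
			continue
		}
		if deleted == perPage {
			return id // first matching ID of the next page
		}
		delete(s, id)
		deleted++
	}
	return ""
}

func main() {
	s := store{"a": "complete", "b": "pending", "c": "complete", "d": "complete"}
	token := ""
	for {
		token = s.deletePage("complete", token, 2)
		fmt.Printf("next token: %q, remaining: %d\n", token, len(s))
		if token == "" {
			break
		}
	}
}
```

Each call touches only one bounded page, which mirrors why the reworked design keeps each FSM apply short. As with the real design, an eval inserted behind the current token would be missed.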
Derek Strickland 80b6f27efd
api: remove mapstructure tags from Port struct (#12916)
This PR solves a defect in the deserialization of api.Port structs when returning structs from the EventStream.

Previously, the api.Port struct's fields were decorated with both mapstructure and hcl tags to support the network.port stanza's use of the keyword static when posting a static port value. This works fine when posting a job and when retrieving any struct that has an embedded api.Port instance as long as the value is deserialized using JSON decoding. The EventStream, however, uses mapstructure to decode event payloads in the api package. mapstructure expects an underlying field named static which does not exist. The result was that the Port.Value field would always be set to 0.

Upon further inspection, a few things became apparent.

* The struct already has hcl tags that support the indirection during job submission.
* Serialization/deserialization with both the json and hcl packages produces the desired result.
* The use of the mapstructure tags provided no value as the Port struct contains only fields with primitive types.

This PR:

* Removes the mapstructure tags from the api.Port structs
* Updates the job parsing logic to use hcl instead of mapstructure when decoding Port instances.

Closes #11044

Co-authored-by: DerekStrickland <dstrickland@hashicorp.com>
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2022-11-08 11:26:28 +01:00
Drew Gonzales aac9404ee5
server: add git revision to serf tags (#9159) 2022-11-07 10:34:33 -05:00
Tim Gross 9e1c0b46d8
API for Eval.Count (#15147)
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is
designed to support interactive use in the `nomad eval delete` command to get a
count of evals expected to be deleted before doing so.

The state store operations to do this sort of thing are somewhat expensive, but
it's cheaper than serializing a big list of evals to JSON. Note that although it
seems like this could be done as an extra parameter and response field on
`Eval.List`, having it as its own endpoint avoids having to change the response
body shape and lets us avoid handling the legacy filter params supported by
`Eval.List`.
2022-11-07 08:53:19 -05:00
Charlie Voiselle 79c4478f5b
template: error on missing key (#15141)
* Support error_on_missing_value for templates
* Update docs for template stanza
2022-11-04 13:23:01 -04:00
Phil Renaud ffb4c63af7
[ui] Adds meta to job list stub and displays a pack logo on the jobs index (#14833)
* Adds meta to job list stub and displays a pack logo on the jobs index

* Changelog

* Modifying struct for optional meta param

* Explicitly ask for meta anytime I look up a job from index or job page

* Test case for the endpoint

* adding meta field to API struct and omitting from response if empty

* passthru method added to api/jobs.list

* Meta param listed in docs for jobs list

* Update api/jobs.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-11-02 16:58:24 -04:00
hc-github-team-nomad-core fbef8881cd Generate files for 1.4.2 release 2022-10-27 13:08:05 -04:00
Tim Gross b9922631bd
keyring: fix missing GC config, don't rotate on manual GC (#15009)
The configuration knobs for root keyring garbage collection are present in the
consumer and present in the user-facing config, but we missed the spot where we
copy from one to the other. Fix this so that users can set their own thresholds.

The root key is automatically rotated every ~30d, but the function that does
both rotation and key GC was wired up such that `nomad system gc` caused an
unexpected key rotation. Split this into two functions so that `nomad system gc`
cleans up old keys without forcing a rotation, which will be done periodically
or by the `nomad operator root keyring rotate` command.
2022-10-24 08:43:42 -04:00
Luiz Aoqui 0fddb4d7e8
Post 1.4.1 release (#14988)
* Generate files for 1.4.1 release

* Prepare for next release

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2022-10-20 13:09:41 -04:00
Seth Hoenig 756b71b7d2
deps: bump shoenig for str func bugfixes (#14974)
And fix the one place we use them.
2022-10-20 08:11:43 -05:00
James Rasell d7b311ce55
acl: correctly resolve ACL roles within client cache. (#14922)
The client ACL cache was not accounting for tokens which included
ACL role links. This change modifies the behaviour to resolve role
links to policies. It will also now store ACL roles within the
cache for quick lookup. The cache TTL is configurable in the same
manner as policies or tokens.

Another small fix is included that takes into account the ACL
token expiry time. This was not included, which meant tokens with
expiry could be used past the expiry time, until they were GC'd.
2022-10-20 09:37:32 +02:00
Seth Hoenig 57375566d4
consul: register checks along with service on initial registration (#14944)
* consul: register checks along with service on initial registration

This PR updates Nomad's Consul service client to include checks in
an initial service registration, so that the checks associated with
the service are registered "atomically" with the service. Before, we
would only register the checks after the service registration, which
causes problems where the service is deemed healthy, even if one or
more checks are unhealthy - especially problematic in the case where
SuccessBeforePassing is configured.

Fixes #3935

* cr: followup to fix cause of extra consul logging

* cr: fix another bug

* cr: fixup changelog
2022-10-19 12:40:56 -05:00
Seth Hoenig f1b902beac
consul: do not re-register already registered services (#14917)
This PR updates Nomad's Consul service client to do map comparisons
using maps.Equal instead of reflect.DeepEqual. The bug fix is in how
DeepEqual treats nil slices differently from empty slices, when actually
they should be treated the same.
2022-10-18 08:10:59 -05:00
Seth Hoenig 306b4dd38e
cleanup: remove another string-set helper function (#14902) 2022-10-17 14:14:52 -05:00
Michael Schurter 45ce8c13cf
client: remove unused LogOutput and LogLevel (#14867)
* client: remove unused LogOutput

* client: remove unused config.LogLevel
2022-10-11 09:24:40 -07:00
Seth Hoenig 5e38a0e82c
cleanup: rename Equals to Equal for consistency (#14759) 2022-10-10 09:28:46 -05:00
hc-github-team-nomad-core 4fdcd197c0 Generate files for 1.4.0 release 2022-10-06 09:16:00 -07:00
Seth Hoenig c68ed3b4c8
client: protect user lookups with global lock (#14742)
* client: protect user lookups with global lock

This PR updates Nomad client to always do user lookups while holding
a global process lock. This guards against concurrency-unsafe implementations
of NSS while still enabling NSS lookups of users (i.e. we cannot use osusergo).

* cl: add cl
2022-09-29 09:30:13 -05:00
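Serializing user lookups behind one process-wide lock, as the commit above describes, might look like the following minimal sketch. The names `userLookupLock` and `lookupUser` are hypothetical and chosen for illustration only.

```go
package main

import (
	"fmt"
	"os/user"
	"sync"
)

// userLookupLock is a hypothetical process-wide lock. Every user lookup takes
// it, so a concurrency-unsafe NSS implementation is never entered by two
// goroutines at once.
var userLookupLock sync.Mutex

// lookupUser wraps user.Lookup with the global lock held.
func lookupUser(name string) (*user.User, error) {
	userLookupLock.Lock()
	defer userLookupLock.Unlock()
	return user.Lookup(name)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := lookupUser("root"); err != nil {
				fmt.Println("lookup failed:", err)
			}
		}()
	}
	wg.Wait()
}
```

The lock only helps if every lookup in the process goes through the wrapper, so callers would need to avoid calling `user.Lookup` directly.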
hc-github-team-nomad-core 2fe5a962f3 Generate files for 1.4.0-rc.1 release 2022-09-27 17:33:32 -04:00
Jorge Marey 584ddfe859
Add Namespace, Job and Group to envoy stats (#14311) 2022-09-22 10:38:21 -04:00
Seth Hoenig 2088ca3345
cleanup more helper updates (#14638)
* cleanup: refactor MapStringStringSliceValueSet to be cleaner

* cleanup: replace SliceStringToSet with actual set

* cleanup: replace SliceStringSubset with real set

* cleanup: replace SliceStringContains with slices.Contains

* cleanup: remove unused function SliceStringHasPrefix

* cleanup: fixup StringHasPrefixInSlice doc string

* cleanup: refactor SliceSetDisjoint to use real set

* cleanup: replace CompareSliceSetString with SliceSetEq

* cleanup: replace CompareMapStringString with maps.Equal

* cleanup: replace CopyMapStringString with CopyMap

* cleanup: replace CopyMapStringInterface with CopyMap

* cleanup: fixup more CopyMapStringString and CopyMapStringInt

* cleanup: replace CopySliceString with slices.Clone

* cleanup: remove unused CopySliceInt

* cleanup: refactor CopyMapStringSliceString to be generic as CopyMapOfSlice

* cleanup: replace CopyMap with maps.Clone

* cleanup: run go mod tidy
2022-09-21 14:53:25 -05:00
Luiz Aoqui c3c8ae584f
api: provide more detail on ACL bootstrap request error (#14629) 2022-09-20 21:20:04 -04:00
hc-github-team-nomad-core a3a718e167 Generate files for 1.4.0-beta.1 release 2022-09-14 19:32:18 +00:00
hc-github-team-nomad-core b91437bb68 Generate files for 1.4.0-beta.1 release 2022-09-14 18:59:59 +00:00
Seth Hoenig 5187f92c5e
cleanup: create interface for check watcher and mock it in nsd tests (#14577)
* cleanup: create interface for check watcher and mock it in nsd tests

* cleanup: add comments for check watcher interface
2022-09-14 08:25:20 -05:00
Seth Hoenig 9a943107c7 servicedisco: implement check_restart for nomad service checks
This PR implements support for check_restart for checks registered
in the Nomad service provider.

Unlike Consul, Nomad service checks never report a "warning" status,
and so the check_restart.ignore_warnings configuration is not valid
for Nomad service checks.
2022-09-13 08:59:23 -05:00
Seth Hoenig feff36f3f7 client: refactor check watcher to be reusable
This PR refactors agent/consul/check_watcher into client/serviceregistration,
and abstracts away the Consul-specific check lookups.

In doing so we should be able to reuse the existing check watcher logic for
also watching NSD checks in a followup PR.

A chunk of consul/unit_test.go is removed - we'll cover that in e2e tests
in a follow-up PR if needed. In the long run I'd like to remove this whole file.
2022-09-12 10:13:31 -05:00
Seth Hoenig 31234d6a62 cleanup: consolidate interfaces for workload restarting
This PR combines two of the same interface definitions around workload restarting.
2022-09-09 08:59:04 -05:00
Tim Gross 9259a373cd
remove root keyring install API (#14514)
* keyring rotate API should require put/post method
* remove keyring install API
2022-09-09 08:50:35 -04:00
James Rasell 3fa8b0b270
client: fix RPC forwarding when querying checks for alloc. (#14498)
When querying the checks for an allocation, the request must be
forwarded to the agent that is running the allocation. If the
initial request is made to a server agent, the request can be made
directly to the client agent running the allocation. If the
request is made to a client agent not running the alloc, the
request needs to be forwarded to a server and then the correct
client.
2022-09-08 16:55:23 +02:00
Tim Gross 7921f044e5
migrate autopilot implementation to raft-autopilot (#14441)
Nomad's original autopilot was importing from a private package in Consul. It
has been moved out to a shared library. Switch Nomad to use this library so that
we can eliminate the import of Consul, which is necessary to build Nomad ENT
with the current version of the Consul SDK. This also will let us pick up
autopilot improvements shared with Consul more easily.
2022-09-01 14:27:10 -04:00
Derek Strickland 35e91ff376
Merge release 1.3.5 files (#14425)
* Merge release 1.3.5 files

* Generate files for 1.3.5 release

* Prepare for next release

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2022-08-31 18:31:56 -04:00
Charlie Voiselle 5c0e34dd33
Vars: Update CT dependency to support variables. (#14399)
* Update Consul Template dep to support Nomad vars

* Remove `Peering` config for Consul Testservers
Upgrading to the 1.14 Consul SDK introduces an additional default
configuration, `Peering`, that is not compatible with versions of Consul
before v1.13.0. Because Nomad tests against Consul v1.11.1, this
configuration has to be nil'ed out before passing it to the Consul
binary.
2022-08-30 15:26:01 -04:00