Commit graph

22713 commits

Author SHA1 Message Date
Michael Klein 37993d9fb8 fix: client-detail-test no default namespace param
Recent changes changed the behavior of not adding the `@default`
-namespace - we need to adapt the tests accordingly
2022-02-17 12:41:33 +01:00
Michael Klein b0a90b425e fix: allocations page tests regarding job links
Default namespaced jobs don't append the `@default`-id anymore due
to recent `jobs.job#serialize` changes.
2022-02-17 11:56:29 +01:00
Luiz Aoqui de91954582
initial base work for implementing sorting and filter across API endpoints (#12076) 2022-02-16 14:34:36 -05:00
Seth Hoenig 9fccc8f8bc
Merge pull request #12077 from hashicorp/b-makefile-use-gobin
build: respect GOBIN when using make targets
2022-02-16 13:25:03 -06:00
Seth Hoenig 98758d4287 build: respect GOBIN when using make targets
This PR updates GNUMakefile to respect $GOBIN if it is set in the
environment or via an $GOENV file. Previously we hard-coded the output
to $GOPATH/bin, which is not necessarily the desired behavior.
2022-02-16 12:05:55 -06:00
Michael Klein d8b89e64dc fix: some test fixes module-for-job
* less clever™ metaprogramming when checking for expectedURL
* clicking slices job-client-status-summary needs to change its
  behavior and not pass the namespace query-param anymore.
2022-02-16 17:47:29 +01:00
Michael Klein ecaf6d3299 refact: don't pass namespace as query-param in job-subnav
The new ID handling gives us this behavior for free and we don't need
to drill the namespace down through all the route-layers anymore.
2022-02-16 17:44:16 +01:00
Michael Klein 3202e5036d refact: use idWithNamespace in serialize hook jobs.job
This will give us 'correct' URLs for free when we only pass a `job`-model
to a `LinkTo` that links to the `jobs.job.*`-routes.
2022-02-16 17:42:26 +01:00
Luiz Aoqui 110dbeeb9d
Add go-bexpr filters to evals and deployment list endpoints (#12034) 2022-02-16 11:40:30 -05:00
Michael Klein 913eee982a fix: job-versions-test
We need to adapt the test due to recent namespace changes.
2022-02-16 15:31:10 +01:00
Michael Klein cbb72edf68 fix: job-dispatch tests after namespace changes 2022-02-16 15:22:41 +01:00
Michael Klein b75188d8ed feat: improve namespace handling job-route 2022-02-16 15:03:02 +01:00
Michael Klein dd410401bd refact: use idWithNamespace in job-row links 2022-02-16 15:02:39 +01:00
Michael Klein 6903ba33b3 refact: render jobs.index template again 2022-02-16 15:02:06 +01:00
Michael Klein 34cb277f9c feat: add idWithNamespace-getter job model 2022-02-16 15:01:25 +01:00
Tiernan c30b4617aa
interpolate network.dns block on client (#12021) 2022-02-16 08:39:44 -05:00
Tim Gross 27bb2da5ee
CSI: make gRPC client creation more robust (#12057)
Nomad communicates with CSI plugin tasks via gRPC. The plugin
supervisor hook uses this to ping the plugin for health checks which
it emits as task events. After the first successful health check the
plugin supervisor registers the plugin in the client's dynamic plugin
registry, which in turn creates a CSI plugin manager instance that has
its own gRPC client for fingerprinting the plugin and sending mount
requests.

If the plugin manager instance fails to connect to the plugin on its
first attempt, it exits. The plugin supervisor hook is unaware that
connection failed so long as its own pings continue to work. A
transient failure during plugin startup may mislead the plugin
supervisor hook into thinking the plugin is up (so there's no need to
restart the allocation) but no fingerprinter is started.

* Refactors the gRPC client to connect on first use. This provides the
  plugin manager instance the ability to retry the gRPC client
  connection until success.
* Add a 30s timeout to the plugin supervisor so that we don't poll
  forever waiting for a plugin that will never come back up.

Minor improvements:
* The plugin supervisor hook creates a new gRPC client for every probe
  and then throws it away. Instead, reuse the client as we do for the
  plugin manager.
* The gRPC client constructor has a 1 second timeout. Clarify that this
  timeout applies to the connection and not the rest of the client
  lifetime.
2022-02-15 16:57:29 -05:00
Seth Hoenig ac3cd73d00
Merge pull request #12054 from hashicorp/b-creation-indexes
api: return sorted results in certain list endpoints
2022-02-15 15:08:38 -06:00
Seth Hoenig 40c714a681 api: return sorted results in certain list endpoints
These API endpoints now return results in chronological order. They
can return results in reverse chronological order by setting the
query parameter ascending=true.

- Eval.List
- Deployment.List
2022-02-15 13:48:28 -06:00
Seth Hoenig f8f0d92469
Merge pull request #11955 from hashicorp/f-update-gopsutil
Update gopsutil to 3.21.12
2022-02-15 08:31:57 -06:00
Seth Hoenig d1ce07cbf7 cl: shorten changelog entry 2022-02-15 08:31:25 -06:00
Tim Gross bed9b3c248
changelog entry (#12072) 2022-02-15 09:00:30 -05:00
Seth Hoenig 3646bdd738
Merge pull request #12066 from hashicorp/f-make-golint-faster
build: allow golangci-lint to use more than 1 core
2022-02-15 08:00:07 -06:00
Alex Holyoake 3071c7d91b
config: merge ReservableCores in clientConfig (#12044) 2022-02-15 08:36:37 -05:00
Seth Hoenig 5e919ae95b
Merge pull request #12069 from alrs/scheduler-test-err
scheduler: fix dropped test error
2022-02-15 07:29:50 -06:00
Lars Lehtonen a07795b4a2
scheduler: fix dropped test error 2022-02-14 22:11:45 -08:00
Seth Hoenig 84abc16ec6 build: allow golangci-lint to use more than 1 core
Since switching to `golangci-lint` we have set the `-j 1` flag, which
restricts the tool to using 1 CPU thread.

This PR removes the flag so `make check` takes less time on good
computers.
2022-02-14 16:56:58 -06:00
Mike Nomitch 8377f5cfe3 Adding link to interview form 2022-02-14 12:38:26 -08:00
James Rasell 15205b5408
Merge pull request #12052 from hashicorp/b-taskrunner-track-deregistered-call
client: track service deregister call so it's only called once.
2022-02-14 09:01:26 +01:00
Tim Gross 2f79a260fe
csi: volume cli prefix matching should accept exact match (#12051)
The `volume detach`, `volume deregister`, and `volume status` commands
accept a prefix argument for the volume ID. Update the behavior on
exact matches so that if there is more than one volume that matches
the prefix, we should only return an error if one of the volume IDs is
not an exact match. Otherwise we won't be able to use these commands
at all on those volumes. This also makes the behavior of these commands
consistent with `job stop`.
2022-02-11 08:53:03 -05:00
Tim Gross 8ffe7aa76f
csi: provide CSI_ENDPOINT env var to plugins (#12050)
The CSI specification says:
> The CO SHALL provide the listen-address for the Plugin by way of the
`CSI_ENDPOINT` environment variable.

Note that plugins without filesystem isolation won't have the plugin
dir bind-mounted to their alloc dir, but we can provide a path to the
socket anyways.

Refactor to use opts struct for plugin supervisor hook config.
The parameter list for configuring the plugin supervisor hook has
grown enough where is makes sense to use an options struct similiar to
many of the other task runner hooks (ex. template).
2022-02-11 08:46:21 -05:00
James Rasell 926458c5b2
Merge pull request #12053 from marcaurele/fix-typo
doc(typo): technical typo in advertised example
2022-02-11 14:27:12 +01:00
James Rasell d7f90a4fbb
Merge pull request #12041 from hashicorp/b-gh-12040
changelog: add entry for #12040
2022-02-11 10:15:09 +01:00
James Rasell 222592a07e
client: track service deregister call so it's only called once.
In certain task lifecycles the taskrunner service deregister call
could be called three times for a task that is exiting. Whilst
each hook caller of deregister has its own purpose, we should try
and ensure it is only called once during the shutdown lifecycle of
a task.

This change therefore tracks when deregister has been called, so
that subsequent calls are noop. In the event the task is
restarting, the deregister value is reset to ensure proper
operation.
2022-02-11 09:29:38 +01:00
Derek Strickland e1f9c442e1
reconciler: refactor computeGroup (#12033)
The allocReconciler's computeGroup function contained a significant amount of inline logic that was difficult to understand the intent of. This commit extracts inline logic into the following intention revealing subroutines. It also includes updates to the function internals also aimed at improving maintainability and renames some existing functions for the same purpose. New or renamed functions include.

Renamed functions

- handleGroupCanaries -> cancelUnneededCanaries
- handleDelayedLost -> createLostLaterEvals
- handeDelayedReschedules -> createRescheduleLaterEvals

New functions

- filterAndStopAll
- initializeDeploymentState
- requiresCanaries
- computeCanaries
- computeUnderProvisionedBy
- computeReplacements
- computeDestructiveUpdates
- computeMigrations
- createDeployment
- isDeploymentComplete
2022-02-10 16:24:51 -05:00
Luiz Aoqui d976e4a19b
docs: add upgrade note and ACL requirements for the job submit endpoint (#12046) 2022-02-10 15:35:16 -05:00
Luiz Aoqui 1d5b96bdf7
update download to Nomad v1.2.6 (#12042) 2022-02-10 15:33:28 -05:00
Luiz Aoqui 77c718d227
Merge pull request #12045 from hashicorp/merge-release-1.2.6-branch
Merge release 1.2.6 branch
2022-02-10 15:12:40 -05:00
Luiz Aoqui e83ef0a008
prepare for next release 2022-02-10 14:56:11 -05:00
Luiz Aoqui 3bf6036487 Version 1.2.6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJiBIXqAAoJELC0QQl2hbZ2M8cP/A7LENJbFSph25M1aGItra5j
 BphSX//Sq/v9ZzO44rOGNYQGfTpFT8STJgj2GC50qR/ilF4KX4D0oZlDyu/6D0NG
 ouN9RUjnFd6IEDQrjqqqhr3F69Z95SWVfi1rfgn/pIgOYkVEXfi6DXaulVVyd2ZT
 J0G5w5ryl5d8PhuL7TWw4zbhZRQn0hVspZv/1s3/I9aG6Sew8SMweeOxbN9lBr7E
 H19Amdjh6ugRuPgU7YMpKDVrZQRv9Wt7BUP/uc0u3LiW9z3Ko8ZKnCRKErtL5Kc3
 HDZsWe+t3va4Uekzd0HULNcYU4kwjogdRYRzX5kRsOyXelrZkQIqYFiKrk1wVbq/
 cYM5DUak6eUQBGhgi3UY0fklBFq4GDGpiwEzn7rvQb0PRSuVyykgbZ12fzyIu8dp
 tWbR/WOEg9F+jva6HkR2kDIcr5mDmny3Pxi5aUT6lMk1111nCzOjDzhLkQVtfsex
 FDMByXxM4oWAK3ouq2OIdxDL2c742A2933C4/30KWE7Xy7twsvkGw52irw66VO3V
 4PHP880cDvEDaEh15mY/8FlaAE7t/gsCUuYLxGwl33TaXSRBLc9vVNrrp89q53TD
 ZcvXTBpHUOWa6ZlHF/4f8LW44rowM6bU0Wili7NaWOKx86dnUJMG4sqJifNgcpS/
 7lXogv98CYLbMy4X4if0
 =NY1Z
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElFaq1Z5DKdB91i+lKfRZwNnLtXMFAmIFbbkACgkQKfRZwNnL
 tXOr/g/+N2ZBMK8ohEvtdXLl7WXrVhgJfUSVbdD5Kfshul9CPn3yWRxJzqtEN2Pf
 55ozeWLpoziP9y9LviJ7rDidXcTmDFutbFdGJ3L+ZLdLILsNOq1A+lbuwO3fJngZ
 5aiPoJLsw4sqj6uHaM6Cls2f145O92nT7GXEHCxuvGHeSf3NkcR+zRY5nPrLTIrA
 uxYefCOzP6C2I+W7dL4Oj5R5EZd4UDi1WiL8pGzwm24LcagZN2ctctolAeF9OlJX
 M58UUv9b4GObe617u8MeH0LIlyZiNwn9JqrV33dKVTyrkBIYfYxkzdzMKf1csVYk
 kQb13KPdPTASBAGTl+sxeXXnw/bg09JXGcvREX5lLyQqY8xGwTv2FpTmybKWLiss
 Bg6BbejrgtCPBik0EAHWV0+kVzhi9bPfUYwTXLDCzMtrbyCyPoWchruel2sm41U1
 ezRDzlSvf6nrXf7sAv6umJICck4Bc5Gol+8W7fxvWqnY9rQ3ds2v7E5lXZMBbOmE
 JSi+EDWBJjBAXehE6pLxeVsvlHMRWN007Z2UeD4neGIgG7xFJLq6nKeUKoiNIpgk
 hKBL8iwHyuJfrBB/dcPzI9NV+jL6OZ/oI1RWxSj0MX/B4VXZp8HrqZA5JxzQolUg
 KIxqe4iX3WIkQv+UU4WiELvs4O7fujB4KWz3iQokhwDxqGUpffk=
 =5EG2
 -----END PGP SIGNATURE-----

Merge tag 'v1.2.6' into merge-release-1.2.6-branch

Version 1.2.6
2022-02-10 14:55:34 -05:00
Jai Bhagat 6a5fa75078 temp: bug in region selector causing failing test 2022-02-10 10:55:32 -05:00
Jai Bhagat fd5d766c41 temp: csi refactor 2022-02-10 09:14:32 -05:00
Jai Bhagat 4f802449dc temp: namespace model refact 2022-02-10 09:11:42 -05:00
Marc-Aurèle Brothier fb80dc57a1
small typo in advertised example 2022-02-10 13:53:05 +01:00
James Rasell d96f9684be
changelog: add entry for #12040 2022-02-10 08:36:32 +01:00
Nomad Release Bot 45b04fb53e
Release v1.2.6 2022-02-10 03:26:34 +00:00
Nomad Release bot 3b0e6ae029 Generate files for 1.2.6 release 2022-02-10 02:47:03 +00:00
Luiz Aoqui 9476c75c57
docs: add 1.2.6 to changelog 2022-02-09 19:59:37 -05:00
Tim Gross 74486d86fb
scheduler: prevent panic in spread iterator during alloc stop
The spread iterator can panic when processing an evaluation, resulting
in an unrecoverable state in the cluster. Whenever a panicked server
restarts and quorum is restored, the next server to dequeue the
evaluation will panic.

To trigger this state:
* The job must have `max_parallel = 0` and a `canary >= 1`.
* The job must not have a `spread` block.
* The job must have a previous version.
* The previous version must have a `spread` block and at least one
  failed allocation.

In this scenario, the desired changes include `(place 1+) (stop
1+), (ignore n) (canary 1)`. Before the scheduler can place the canary
allocation, it tries to find out which allocations can be
stopped. This passes back through the stack so that we can determine
previous-node penalties, etc. We call `SetJob` on the stack with the
previous version of the job, which will include assessing the `spread`
block (even though the results are unused). The task group spread info
state from that pass through the spread iterator is not reset when we
call `SetJob` again. When the new job version iterates over the
`groupPropertySets`, it will get an empty `spreadAttributeMap`,
resulting in an unexpected nil pointer dereference.

This changeset resets the spread iterator internal state when setting
the job, logging with a bypass around the bug in case we hit similar
cases, and a test that panics the scheduler without the patch.
2022-02-09 19:53:06 -05:00
Luiz Aoqui 15f9d54dea
api: prevent excessice CPU load on job parse
Add new namespace ACL requirement for the /v1/jobs/parse endpoint and
return early if HCLv2 parsing fails.

The endpoint now requires the new `parse-job` ACL capability or
`submit-job`.
2022-02-09 19:51:47 -05:00