Commit graph

21749 commits

Author SHA1 Message Date
Nick Ethier 2978c430e5 command: show number of reserved cores on alloc status output 2021-05-05 08:11:41 -04:00
Mahmood Ali 067fd86a8c
drivers: Capture exit code when task is killed (#10494)
This commit ensures Nomad captures the task code more reliably even when the task is killed. This issue affect to `raw_exec` driver, as noted in https://github.com/hashicorp/nomad/issues/10430 .

We fix this issue by ensuring that the TaskRunner only calls `driver.WaitTask` once. The TaskRunner monitors the completion of the task by calling `driver.WaitTask` which should return the task exit code on completion. However, it also could return a "context canceled" error if the agent/executor is shutdown.

Previously, when a task is to be stopped, the killTask path makes two WaitTask calls, and the second returns "context canceled" occasionally because of a "race" in task shutting down and depending on driver, and how fast it shuts down after task completes.

By having a single WaitTask call and consistently waiting for the task, we ensure we capture the exit code reliably before the executor is shutdown or the contexts expired.

I opted to change the TaskRunner implementation to avoid changing the driver interface or requiring 3rd party drivers to update.

Additionally, the PR ensures that attempts to kill the task terminate when the task "naturally" dies. Without this change, if the task dies at the right moment, the `killTask` call may retry to kill an already-dead task for up to 5 minutes before giving up.
2021-05-04 10:54:00 -04:00
Drew Bailey a86477021f
remove license put command references (#10501) 2021-05-04 08:39:56 -04:00
Kendall Strautman fe85162128
[Assembly]: Website Branding Refresh (#10188)
* style: update gray brand colors

* style: update brand colors

* chore: upgrade react-components deps

* chore: update text split cta link color

* style(home): update icons

* refactor(home): use learn-callout component

* style(downloads): temporary color override

* style(community): fix link color

* Update website/pages/community/style.css

Co-authored-by: Zachary Shilton <4624598+zchsh@users.noreply.github.com>

* update package-lock

* update deps

* add new downloads page

* remove extra husky script

* chore: upgrades nextjs-scripts dep

* chore: upgrades community page vertical text block list

* chore: test component pre-releases

* chore: upgrade deps

chore: upgrades nextjs-scripts

* chore: update home icon colors

* chore: update home logo grid

* chore(website): upgrade deps

* style: adjust features icons border radius

* style: home hero bg to secondary

* chore: upgrade deps for body copy colors

* chore: upgrades alert banner

* feat: updates favicon

* chore: updates deps

* content(home): updates assets

* content(use-cases:simple container orch): updates content

* content(use-cases:non-containerized-app) updates assets

* content(use-cases:auto networking with consul): updates assets

* style(home): remove use cases icons override

* style(home-hero): remove bg pattern on mobile

* content(use-cases): updates asset

* chore: update assets

* chore: updates product download page to alpha

* chore: updates product download page to stable

Co-authored-by: Zachary Shilton <4624598+zchsh@users.noreply.github.com>
Co-authored-by: Jeff Escalante <jescalan@users.noreply.github.com>
2021-05-03 11:06:55 -07:00
Brandon Romano 4f646bebc1
Merge pull request #10500 from hashicorp/br.11-banner-update
Updates website banner for Nomad 1.1
2021-05-03 10:17:14 -07:00
Brandon Romano c9862eebed Updates banner for Nomad 1.1 2021-05-03 10:11:11 -07:00
Buck Doyle 4e4a83039f
ui: Fix bug where switching topo viz allocation highlights doesn’t update charts (#10490)
This closes #10489. It adds `dependentKeyCompat` to the allocation getter so it works
as expected as a dependent key for the `tracker` computed property, as described here:
https://guides.emberjs.com/release/upgrading/current-edition/tracked-properties/#toc_backwards-compatibility
2021-05-03 10:36:18 -05:00
Seth Hoenig 0fe0b6832f
Merge pull request #10498 from hashicorp/b-hclfmt-ceph
demo: apply hclfmt to ceph files
2021-05-03 09:35:21 -06:00
Tim Gross cf838f49e1 docker: improve error message for auth helper
The error returned from the stdlib's `exec` package is always a message with
the exit code of the exec'd process, not any error message that process might
have given us. This results in opaque failures for the Nomad user. Cast to an
`ExitError` so that we can access the output from stderr.
2021-05-03 11:30:12 -04:00
Seth Hoenig 7b3136c4b2 demo: apply hclfmt to ceph files 2021-05-03 09:27:26 -06:00
Seth Hoenig 9f7c410087
Merge pull request #10492 from hashicorp/b-expose-diff
connect: use deterministic injected dynamic exposed port label
2021-05-03 09:00:34 -06:00
Tim Gross cb9ac29d8a demo: CSI Ceph
This changeset expands on the existing demonstration we had for Ceph by
showing volume creation. It includes a demo setup for Ceph on Vagrant so that
you don't need a whole Ceph cluster to try it out.
2021-05-03 10:49:47 -04:00
Charlie Voiselle 19b35833de Adding environment variables to Command overview page 2021-05-03 08:12:45 -04:00
Andy Assareh 1616f80211 git example - suggest providing real repo
when troubleshooting it is better if this command will actually work (pointing to a real repository)
2021-05-03 08:12:10 -04:00
Michael Schurter fdd7fc4817 docs: add 1.1.0-beta1 download link 2021-05-03 08:12:00 -04:00
Mahmood Ali 4b95f6ef42
api: actually set MemoryOversubscriptionEnabled (#10493) 2021-05-02 22:53:53 -04:00
Seth Hoenig b024d85f48 connect: use deterministic injected dynamic exposed port
This PR uses the checksum of the check for which a dynamic exposed
port is being generated (instead of a UUID prefix) so that the
generated port label is deterministic.

This fixes 2 bugs:
 - 'job plan' output is now idempotent for jobs making use of injected ports
 - tasks will no longer be destructively updated when jobs making use of
   injected ports are re-run without changing any user specified part of
   job config.

Closes: https://github.com/hashicorp/nomad/issues/10099
2021-04-30 15:18:22 -06:00
Michael Schurter edcd48bdfa
Merge pull request #10389 from hashicorp/docs-9895
docs: add #9895 to the changelog
2021-04-30 12:51:22 -07:00
Michael Schurter 68ca087bde docs: add #9895 to the changelog 2021-04-30 12:47:40 -07:00
Tim Gross bf2ab548b8 changelog: ensure all backports shown 2021-04-30 14:53:37 -04:00
Mahmood Ali ba49661198
Docs memory oversubscription (#10478)
* update docs

* document memory_oversubscription_enabled scheduler config
2021-04-30 14:07:56 -04:00
Mahmood Ali e17082b9cf
update golang to 1.16.3 (#10484) 2021-04-30 13:52:05 -04:00
Mahmood Ali 2e01d623b7 batch update changelog 2021-04-30 13:18:00 -04:00
Michael Schurter 547a718ef6
Merge pull request #10248 from hashicorp/f-remotetask-2021
core: propagate remote task handles
2021-04-30 08:57:26 -07:00
Michael Schurter 982c65c0c7 comment out unused consts to make linter happy 2021-04-30 08:31:31 -07:00
Michael Schurter b9f3d8e3c7 docs: make bootstrap installs buf now
No need to specify a version in the contributing docs. Let `make
bootstrap` handle that.
2021-04-30 08:31:31 -07:00
Michael Schurter 641eb1dc1a clarify docs from pr comments 2021-04-30 08:31:31 -07:00
Tim Gross 81afcdc435 docs: remove API doc for license PUT 2021-04-30 10:39:21 -04:00
Buck Doyle ef21c5f75b
Add guard against missing namespace in Mirage (#10474)
Similarly to 735f056, this won’t happen with real data,
but can happen in the current Mirage factory setup.
2021-04-30 09:18:23 -05:00
Mahmood Ali 98a9a9052f
Port OSS changes for Enterprise Quota accounting (#10481) 2021-04-30 09:48:03 -04:00
Mahmood Ali 52d881f567
Allow configuring memory oversubscription (#10466)
Cluster operators want to have better control over memory
oversubscription and may want to enable/disable it based on their
experience.

This PR adds a scheduler configuration field to control memory
oversubscription. It's additional field that can be set in the [API via Scheduler Config](https://www.nomadproject.io/api-docs/operator/scheduler), or [the agent server config](https://www.nomadproject.io/docs/configuration/server#configuring-scheduler-config).

I opted to have the memory oversubscription be an opt-in, but happy to change it.  To enable it, operators should call the API with:
```json
{
  "MemoryOversubscriptionEnabled": true
}
```

If memory oversubscription is disabled, submitting jobs specifying `memory_max` will get a "Memory oversubscription is not
enabled" warnings, but the jobs will be accepted without them accessing
the additional memory.

The warning message is like:
```
$ nomad job run /tmp/j
Job Warnings:
1 warning(s):

* Memory oversubscription is not enabled; Task cache.redis memory_max value will be ignored

==> Monitoring evaluation "7c444157"
    Evaluation triggered by job "example"
==> Monitoring evaluation "7c444157"
    Evaluation within deployment: "9d826f13"
    Allocation "aa5c3cad" created: node "9272088e", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "7c444157" finished with status "complete"

# then you can examine the Alloc AllocatedResources to validate whether the task is allowed to exceed memory:
$ nomad alloc status -json aa5c3cad | jq '.AllocatedResources.Tasks["redis"].Memory'
{
  "MemoryMB": 256,
  "MemoryMaxMB": 0
}
```
2021-04-29 22:09:56 -04:00
Luiz Aoqui 154b2105ac
docs: add FAQ for Docker Desktop for Windows and MacOS (#10390)
* docs: add FAQ for Docker Desktop for Windows and MacOS

* docs: add win

* docs: add docker desktop note to docker driver page
2021-04-29 19:53:12 -04:00
Michael Lange e8593ec1bb
ui: Update namespaces design (#10444)
This rethinks namespaces as a filter on list pages rather than a global setting.

The biggest net-new feature here is being able to select All (*) to list all jobs
or CSI volumes across namespaces.
2021-04-29 15:00:59 -05:00
Luiz Aoqui 2949a40ddf
changelog: add entry for blocked eval metrics (#10475) 2021-04-29 15:32:30 -04:00
Luiz Aoqui f1b9055d21
Add metrics for blocked eval resources (#10454)
* add metrics for blocked eval resources

* docs: add new blocked_evals metrics

* fix to call `pruneStats` instead of `stats.prune` directly
2021-04-29 15:03:45 -04:00
Michael Schurter 76e56254e1 docs: mention remote task drivers 2021-04-29 09:22:33 -07:00
Buck Doyle 1b1805e8d9
changelog: Add missed UI entries for 1.1 (#10467) 2021-04-29 09:11:06 -05:00
Buck Doyle b9f462fdc1
ui: Add optional memory max to task details ribbon (#10459)
This is the first step in #10268. If a maximum is not specified, the
task group sum uses the memory number instead. The maximum is only
shown when it’s higher than the memory sum.
2021-04-28 15:38:14 -05:00
Tim Gross 9e1d4981f0
docs: Enterprise licensing updates 2021-04-28 14:46:06 -04:00
Buck Doyle 6d037633da
ui: Change global search to use fuzzy search API (#10412)
This updates the UI to use the new fuzzy search API. It’s a drop-in
replacement so the / shortcut to jump to search is preserved, and
results can be cycled through and chosen via arrow keys and the
enter key.

It doesn’t use everything returned by the API:
* deployments and evaluations: these match by id, doesn’t seem like
  people would know those or benefit from quick navigation to them
* namespaces: doesn’t seem useful as they currently function
* scaling policies
* tasks: the response doesn’t include an allocation id, which means they
  can’t be navigated to in the UI without an additional query
* CSI volumes: aren’t actually returned by the API

Since there’s no API to check the server configuration and know whether
the feature has been disabled, this adds another query in
route:application#beforeModel that acts as feature detection: if the
attempt to query fails (500), the global search field is hidden.

Upon having added another query on load, I realised that beforeModel was
being triggered any time service:router#transitionTo was being called,
which happens upon navigating to a search result, for instance, because
of refreshModel being present on the region query parameter. This PR
adds a check for transition.queryParamsOnly and skips rerunning the
onload queries (token permissions check, license check, fuzzy search
feature detection).

Implementation notes:

* there are changes to unrelated tests to ignore the on-load feature
  detection query
* some lifecycle-related guards against undefined were required to
  address failures when navigating to an allocation
* the minimum search length of 2 characters is hard-coded as there’s
  currently no way to determine min_term_length in the UI
2021-04-28 13:31:05 -05:00
Tim Gross 7fdfbfc0f0 license: remove "Terminates At" from license get command
The `Terminates At` field can't be removed from the struct for backwards
compatibility reasons, but there's no purpose to it anymore so we shouldn't be
showing it to end users of the command.
2021-04-28 12:00:30 -04:00
Tim Gross 4f9c5c4bac license: update 'license get' command 2021-04-28 12:00:30 -04:00
Seth Hoenig d54a606819
Merge pull request #10439 from hashicorp/pick-ent-acls-changes
e2e: add e2e tests for consul namespaces on ent with acls
2021-04-28 08:30:08 -06:00
Tim Gross 79f81d617e licensing: remove raft storage and sync
This changeset is the OSS portion of the work to remove the raft storage and
sync for Nomad Enterprise.
2021-04-28 10:28:23 -04:00
catinthetap b84cd7d61d
docs: update filesystem.mdx to fix typo 2021-04-28 08:11:05 -04:00
Michael Schurter d8f50ca20d ignore local e2e files
- nomad-driver-ecs is an optional plugin to packer into ami
- ecs.vars is generated by tf
- *.auto.tfvars is just a style I use for local var overrides
2021-04-27 15:07:03 -07:00
Michael Schurter 0eb5d5136f e2e: use public_ip in packer 2021-04-27 15:07:03 -07:00
Michael Schurter a6636723ee vendor: update aws-sdk-go and deps 2021-04-27 15:07:03 -07:00
Michael Schurter e62795798d core: propagate remote task handles
Add a new driver capability: RemoteTasks.

When a task is run by a driver with RemoteTasks set, its TaskHandle will
be propagated to the server in its allocation's TaskState. If the task
is replaced due to a down node or draining, its TaskHandle will be
propagated to its replacement allocation.

This allows tasks to be scheduled in remote systems whose lifecycles are
disconnected from the Nomad node's lifecycle.

See https://github.com/hashicorp/nomad-driver-ecs for an example ECS
remote task driver.
2021-04-27 15:07:03 -07:00
Seth Hoenig 09cd01a5f3 e2e: add e2e tests for consul namespaces on ent with acls
This PR adds e2e tests for Consul Namespaces for Nomad Enterprise
with Consul ACLs enabled.

Needed to add support for Consul ACL tokens with `namespace` and
`namespace_prefix` blocks, which Nomad parses and validates before
tossing the token. These bits will need to be picked back to OSS.
2021-04-27 14:45:54 -06:00