Commit graph

21961 commits

Author SHA1 Message Date
Mahmood Ali daf20f9788
vault: set JobID in Vault metadata (#11397)
Closes: #11395 .
2021-10-27 07:20:29 -07:00
Mahmood Ali e06ff1d613
scheduler: stop allocs in unrelated nodes (#11391)
The system scheduler should leave allocs on draining nodes as-is, but
stop allocs on nodes that are no longer part of the job's
datacenters.

Previously, the scheduler did not make the distinction and left system
job allocs intact if they were already running.
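
As a rough illustration of the distinction described above (not the scheduler's actual code), the decision can be sketched as follows. The `Node` type, the `decide` function, and the choice to check the drain flag first are all assumptions made for this example.

```go
package main

import "fmt"

// Node is a hypothetical, stripped-down view of a client node.
type Node struct {
	Datacenter string
	Draining   bool
}

// decide returns "keep" or "stop" for a system job alloc already running on n.
// Assumption for this sketch: draining nodes are checked first and left alone.
func decide(n Node, jobDatacenters []string) string {
	if n.Draining {
		return "keep" // leave as-is on a draining node
	}
	for _, dc := range jobDatacenters {
		if n.Datacenter == dc {
			return "keep" // node is still part of the job's datacenters
		}
	}
	return "stop" // node is no longer in any of the job's datacenters
}

func main() {
	dcs := []string{"dc1"}
	fmt.Println(decide(Node{Datacenter: "dc1"}, dcs))
	fmt.Println(decide(Node{Datacenter: "dc2"}, dcs))
	fmt.Println(decide(Node{Datacenter: "dc2", Draining: true}, dcs))
}
```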

I've added a failing test first, which you can see in https://app.circleci.com/jobs/github/hashicorp/nomad/179661 .

Fixes https://github.com/hashicorp/nomad/issues/11373
2021-10-27 07:04:13 -07:00
Mahmood Ali f03d65062d
Fix arm64 panics by updating google/snappy library to latest, 0.0.4 (#11396)
Pick up https://github.com/golang/snappy/pull/56 to handle arm64 architectures to fix panics. tldr; Golang 1.16 changed `memmove` implementation for arm64 requiring additional cpu registers that snappy wasn't preserving in its assembly implementation.

Other projects have experienced this issue as well; searching for `encode_arm64.s:666` on your favorite search engine will reveal some. Vault updated the dependency earlier this August: https://github.com/hashicorp/vault/pull/12371 .

I believe this issue affects Nomad 1.2.x and 1.1.x. Nomad 1.0.x uses Golang 1.15 and isn't affected. However, backporting the change to 1.0.x should be harmless.

Fixed https://github.com/hashicorp/nomad/issues/11385 .
2021-10-27 06:39:16 -07:00
James Rasell e4f703b401
vagrantfile: expose Nomad and Consul APIs to local machine. 2021-10-27 12:15:37 +02:00
Luiz Aoqui b463715a98
prevent active log from being overwritten when agent starts (#11386) 2021-10-26 20:57:07 -04:00
Luiz Aoqui ecc7a288ec
docs: add note and example of storing nomad job plan index to disk (#11377) 2021-10-26 20:25:22 -04:00
Charlie Voiselle 7d02c8b605
DOCS: Update Consul Connect to Consul service mesh (#11362)
* Update Consul Connect to Consul service mesh
* Apply suggestions from code review
2021-10-26 15:10:21 -04:00
Noel Quiles f16ef7f6fb
website: Add Fathom analytics (#11276)
* Impl Fathom analytics

* Actually install fathom-client

* Use analytics package instead of direct impl

* Remove explicit fathom-client dep

* Upgrade platform analytics package
2021-10-25 15:23:38 -04:00
Luiz Aoqui 645a87f6b3
ui: update task group alloc summary chart to use new SummaryLegendItem component (#11375) 2021-10-25 11:14:01 -04:00
Luiz Aoqui 979faf41e5
fix test names (#11374) 2021-10-22 15:43:55 -04:00
Luiz Aoqui 3c22fc79a5
add dispatch idempotency token support in the CLI (#10930) 2021-10-22 12:39:05 -04:00
Luiz Aoqui 2c7bfb7000
ui: persist node drain settings (#11368) 2021-10-22 10:51:31 -04:00
Luiz Aoqui dc5222f6e5
ui: display Nomad version in the Clients and Servers table (#11366) 2021-10-22 10:33:06 -04:00
Luiz Aoqui a7eb72f7d1
ui: use get to access job meta value (#11370) 2021-10-22 10:05:48 -04:00
Luiz Aoqui b73ecf684b
ui: update favicon (#11371) 2021-10-22 09:40:38 -04:00
Luiz Aoqui 6853bf9632
cli: allow setting namespace and region in the nomad ui command (#11364) 2021-10-21 16:24:39 -04:00
Luiz Aoqui fce1a03897
ui: create tooltip component (#11363) 2021-10-21 13:12:33 -04:00
Luiz Aoqui 362c8c54f4
ui: set * as the default namespace selector (#11357) 2021-10-21 10:24:07 -04:00
Luiz Aoqui dceeccfc5d
ui: add client name tooltip when displaying client ID in tables (#11358) 2021-10-21 10:23:06 -04:00
James Rasell 6011411111
Merge pull request #11339 from hashicorp/b-website-fixup-interpolation-formatting
website: fixup link formatting within interpolation doc.
2021-10-21 09:15:36 +02:00
Mahmood Ali e992ebf58d
document GH-11346 fix (#11350) 2021-10-20 22:03:19 -04:00
Brandon Romano 8c863288ed
Merge pull request #11356 from hashicorp/update-alert-banner
Update HashiConf alert-banner expiration
2021-10-20 16:28:30 -07:00
Brandon Romano 5c4f4be3ca
Update HashiConf alert-banner expiration
Updates the HashiConf Alert Banner expiration to 10/20 @ 11pm (PT)
2021-10-20 16:02:45 -07:00
Michael Schurter 37a8f27a35
Merge pull request #11331 from shishir-a412ed/init
Add support for --init to docker driver.
2021-10-20 10:49:51 -07:00
Michael Schurter f95f966e8b
Merge pull request #11347 from shishir-a412ed/cleanup
Code cleanup: Remove extra if clause.
2021-10-20 09:37:10 -07:00
Mahmood Ali 1de395b42c
Fix preemption panic (#11346)
Fix a bug where the scheduler may panic when preemption is enabled. The conditions are a bit complicated:
a job with higher priority schedules multiple allocations that preempt multiple other allocations on the same node, due to port/network/device assignments.

The cause of the bug is incidental mutation of internal cached data. `RankedNode` computes and caches proposed allocations in https://github.com/hashicorp/nomad/blob/v1.1.6/scheduler/rank.go#L42-L53 . But the scheduler then mutates the list to remove pre-emptable allocs in https://github.com/hashicorp/nomad/blob/v1.1.6/scheduler/rank.go#L293-L294, and `RemoveAllocs` mutates the cached slice and sets its tail to `nil`s, triggering a nil-pointer dereference.

I fixed the issue by avoiding the mutation in `RemoveAllocs` - the micro-optimization there doesn't seem necessary.
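
The failure mode described above can be reproduced in isolation. The following is a simplified sketch, not the Nomad source: `Alloc`, `removeInPlace`, and `removeCopy` are names made up for this example. The in-place variant reuses the caller's backing array and nils out the tail, so any cached slice over the same array ends up holding `nil` entries; the copying variant leaves the input untouched.

```go
package main

import "fmt"

// Alloc is a hypothetical stand-in for an allocation.
type Alloc struct{ ID string }

// removeInPlace filters allocs by swapping removed entries with the last
// live entry and setting the vacated tail slot to nil. It mutates the
// backing array shared with the caller's slice.
func removeInPlace(allocs []*Alloc, remove map[string]bool) []*Alloc {
	n := len(allocs)
	for i := 0; i < n; i++ {
		if remove[allocs[i].ID] {
			allocs[i], allocs[n-1] = allocs[n-1], nil
			i--
			n--
		}
	}
	return allocs[:n]
}

// removeCopy builds a new slice and never writes to the input.
func removeCopy(allocs []*Alloc, remove map[string]bool) []*Alloc {
	out := make([]*Alloc, 0, len(allocs))
	for _, a := range allocs {
		if !remove[a.ID] {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	cached := []*Alloc{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	preempt := map[string]bool{"b": true}

	kept := removeCopy(cached, preempt)
	fmt.Println(len(kept), cached[2] != nil) // cached slice is intact

	removeInPlace(cached, preempt)
	fmt.Println(cached[2] == nil) // cached slice now ends in nil
}
```

Reading `cached[2].ID` after the in-place call would dereference a nil pointer, which is the kind of panic the commit describes.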

Fixes https://github.com/hashicorp/nomad/issues/11342
2021-10-19 20:22:03 -04:00
Shishir Mahajan dd93f72920 Code cleanup: Remove extra if clause.
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2021-10-19 16:52:11 -07:00
Michael Schurter 081cfb85d7 docs: add #11331 to changelog 2021-10-19 16:30:06 -07:00
Michael Schurter fd68bbc342 test: update tests to properly use AllocDir
Also use t.TempDir when possible.
2021-10-19 10:49:07 -07:00
Brandon Romano 4d3bdc0dbf
Merge pull request #11341 from hashicorp/nq.update-alert-banner-hcg2021-live
website: Update alert banner for HashiConf
2021-10-19 07:01:04 -07:00
Michael Schurter d25b60a82d docs: add #11334 to changelog 2021-10-18 09:22:01 -07:00
Michael Schurter 10c3bad652 client: never embed alloc_dir in chroot
Fixes #2522

Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).

While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.

When running tests as root in a vm without the fix, the following error
occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
                Error Trace:    alloc_dir_test.go:520
                Error:          Received unexpected error:
                                Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
                Test:           TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
2021-10-18 09:22:01 -07:00
Noel Quiles ef533b6e3b Update alert banner for HashiConf
Final cleanup/closer exp date
2021-10-18 11:52:29 -04:00
James Rasell 2f5f6e0fdd
website: fixup link formatting within interpolation doc. 2021-10-18 12:21:05 +02:00
Andy Assareh 8c638217ac
exactly one of ingress, terminating, or mesh must be configured
I believe mesh should be included in this statement but was omitted.
2021-10-15 14:15:02 -07:00
Shishir Mahajan d4daef7ebf Add support for --init to docker driver.
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2021-10-15 12:53:25 -07:00
Mahmood Ali 73351c35dd
ease building Linux binaries on macOS (#11329)
Meant for development purposes only, so one can compile a binary on a
macOS host, then start a Docker container or scp the binary to a Linux
host easily.

The resulting binary is statically linked and has very subtle
differences, e.g. static binaries use Go's native network stack, which
honors /etc/hosts and /etc/resolv.conf differently from the glibc
implementation. In a development environment, I don't expect these to
materially change our experience.
2021-10-15 11:12:59 -04:00
Florian Apolloner c762f64505
Follow up fixes for #11237 (#11260) 2021-10-14 17:23:38 -04:00
Luiz Aoqui 130970e12e
Merge missing commits from 1.2.0-beta1 release branch (#11319) 2021-10-14 16:10:05 -04:00
Luiz Aoqui 234aac14a8
Merge release branch (#11317) 2021-10-14 13:06:04 -04:00
Luiz Aoqui 9d48daed8c
fix nomad job allocs command name (#11314) 2021-10-14 12:44:59 -04:00
Luiz Aoqui f1fb0987ab
docs: update Nvidia device plugin as external (#11313) 2021-10-14 12:22:31 -04:00
Dave May 190716b4c6
Remove vendor folder during make clean (#11315)
* Remove vendor folder during make clean
* Add vendor warning to make dev build command
2021-10-14 11:32:19 -04:00
Luiz Aoqui 1bd9db3df0
changelog: add entry for #10796 (#11312) 2021-10-14 09:01:43 -04:00
James Rasell 444d25db07
Merge pull request #11280 from benbuzbee/log-err
Log error if there are no event handlers registered
2021-10-14 14:49:22 +02:00
Mahmood Ali d5e136b82b
executor: set CpuWeight in cgroup-v2 (#11287)
Cgroup-v2 uses `cpu.weight` property instead of cpu shares:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files
. And it uses a different range (i.e. `[1, 10000]`) from cpu.shares
(i.e. `[2, 262144]`) to make things more interesting.

Luckily, the libcontainer provides a helper function to perform the
conversion
[`ConvertCPUSharesToCgroupV2Value`](https://pkg.go.dev/github.com/opencontainers/runc@v1.0.2/libcontainer/cgroups#ConvertCPUSharesToCgroupV2Value).

I have confirmed that docker/libcontainer performs the conversion as
well in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/specconv/spec_linux.go#L536-L541
, and that CpuShares is ignored by libcontainer in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/cgroups/fs2/cpu.go#L24-L29
.
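
For reference, the range conversion described above is a linear mapping from `[2, 262144]` onto `[1, 10000]`. The sketch below writes that mapping out by hand rather than calling the libcontainer helper; `sharesToWeight` is a name invented for this example, and the clamp for values below 2 is an extra guard added here, not something taken from the helper.

```go
package main

import "fmt"

// sharesToWeight maps a cgroup-v1 cpu.shares value in [2, 262144] onto the
// cgroup-v2 cpu.weight range [1, 10000]. Zero means "unset" and is passed
// through unchanged.
func sharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	if shares < 2 {
		shares = 2 // guard added for this sketch to avoid unsigned underflow
	}
	return 1 + ((shares-2)*9999)/262142
}

func main() {
	for _, s := range []uint64{2, 1024, 262144} {
		fmt.Printf("cpu.shares=%d -> cpu.weight=%d\n", s, sharesToWeight(s))
	}
}
```

With integer division, the common default of 1024 shares works out to a weight of 39, and the two ends of the shares range map to 1 and 10000.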
2021-10-14 08:46:07 -04:00
Luiz Aoqui 536a5751ff
changelog: add entries for #9160 and #11078 (#11290) 2021-10-14 08:43:36 -04:00
Charlie Voiselle cb8e52b5df
Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799) 2021-10-13 21:23:13 -04:00
Michael Schurter 59fda1894e
Merge pull request #11167 from a-zagaevskiy/master
Support configurable dynamic port range
2021-10-13 16:47:38 -07:00
Michael Schurter e14cd34392 client: improve errors & tests for dynamic ports 2021-10-13 16:25:25 -07:00