20269 commits

Author SHA1 Message Date
Michael Schurter 5ec065b180 client: always wait 200ms before sending updates
Always wait 200ms before calling the Node.UpdateAlloc RPC to send
allocation updates to servers.

Prior to this change we only reset the update ticker when an error was
encountered. This meant the 200ms ticker was running while the RPC was
being performed. If the RPC was slow due to network latency or server
load and took >=200ms, the ticker would tick during the RPC.

Then on the next loop the select would randomly choose between the
two viable cases: receive an update or fire the RPC again.

If the RPC case won it would immediately loop again due to there being
no updates to send.

When the update chan receive is selected a single update is added to the
slice. The odds are then 50/50 that the subsequent loop will send the
single update instead of receiving any more updates.

This could cause a couple of problems:

1. Since only a small number of updates are sent, the chan buffer may
   fill, applying backpressure, and slowing down other client
   operations.
2. The small number of updates sent may already be stale and not
   represent the current state of the allocation locally.

A risk here is that it's hard to reason about how this will interact
with the 50ms batches on servers when the servers are under load.

A further improvement would be to completely remove the alloc update
chan and instead use a mutex to build a map of alloc updates. I wanted
to test the lowest risk possible change on loaded servers first before
making more drastic changes.
2020-11-25 11:36:51 -08:00
Mahmood Ali 8ca33b24f0
Merge pull request #9414 from hashicorp/b-tweak-buf-linter
Parameterize buf compatibility check
2020-11-25 12:19:10 -05:00
Tim Gross b2cd0da0a2
CSI: fix transaction handling in state store (#9438)
When making updates to CSI plugins, the state store methods that have open
write transactions were querying the state store using the same methods used
by the CSI RPC endpoint, but these methods create their own top-level read
transactions. During concurrent plugin updates (as happens when a plugin job
is stopped), this can cause write skew in the plugin counts.

* Refactor the CSIPlugin query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
* Refactor the CSIVolume query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
* CSI volumes need to be "denormalized" with their plugins and (optionally)
allocations. Read-only RPC endpoints should take a snapshot so that we can
make multiple state store method calls with a consistent view.
2020-11-25 11:15:57 -05:00
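The refactoring pattern named in b2cd0da0a2 (a public query method plus an implementation method that accepts a transaction) might look like the toy below. This is a sketch under loud assumptions: `store`, `txn`, `PluginCount`, `pluginCountTxn`, and `AddPluginNodes` are all hypothetical, and the map-copy "transactions" only imitate snapshot behavior; they are not Nomad's state store API.

```go
package main

import (
	"fmt"
	"sync"
)

// txn is a hypothetical transaction: a private view of the plugin counts.
type txn struct {
	counts map[string]int
}

// store is a toy state store; committed state is guarded by mu.
type store struct {
	mu     sync.Mutex
	counts map[string]int
}

func copyMap(m map[string]int) map[string]int {
	out := make(map[string]int, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// readTxn snapshots the committed state.
func (s *store) readTxn() *txn {
	s.mu.Lock()
	defer s.mu.Unlock()
	return &txn{counts: copyMap(s.counts)}
}

// PluginCount is the public query: it opens its own read transaction.
func (s *store) PluginCount(id string) int {
	return pluginCountTxn(s.readTxn(), id)
}

// pluginCountTxn is the implementation method; it works with whatever
// transaction the caller hands it, read or write.
func pluginCountTxn(t *txn, id string) int {
	return t.counts[id]
}

// AddPluginNodes increments the count n times inside one write transaction.
// It queries through pluginCountTxn so it sees its own uncommitted writes;
// going through PluginCount would read a separate snapshot instead (and in
// this toy would also block on the mutex).
func (s *store) AddPluginNodes(id string, n int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	w := &txn{counts: copyMap(s.counts)}
	for i := 0; i < n; i++ {
		w.counts[id] = pluginCountTxn(w, id) + 1
	}
	s.counts = w.counts // commit
}

func main() {
	s := &store{counts: map[string]int{}}
	s.AddPluginNodes("ebs", 3)
	fmt.Println("ebs nodes:", s.PluginCount("ebs"))
}
```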
Tim Gross b9842c32c1 docs: enumerate required cgroups for exec driver 2020-11-25 09:41:37 -05:00
Mahmood Ali b2a8752c5f
honor task user when execing into raw_exec task (#9439)
Fix #9210 .

This updates the executor so it honors the User when using nomad alloc exec. The bug was that the exec task didn't honor the init command when execing.
2020-11-25 09:34:10 -05:00
Tim Gross 481f91034c
E2E: CSI driver provisioning (#9443)
* e2e/csi: wait longer for plugins to become healthy

Plugins are Docker containers, and as such sometimes we get delays in startup
due to pulling from the registry and this is a source of test flakiness. Give
the plugins a little longer to start up.

* e2e/csi: version bump for AWS EBS plugins
2020-11-25 09:05:22 -05:00
Michael Schurter 9bd1f267d2 nomad: try to avoid slice resizing when batching 2020-11-24 09:14:00 -08:00
Seth Hoenig 74a34704c5
Merge pull request #8743 from hashicorp/f-task_network_warning
Validate and document 0.12 mbits/network deprecations
2020-11-23 15:36:18 -06:00
Drew Bailey c8b1a84d1e
Events/mv structs (#9430)
* move structs to structs/event.go to avoid import cycle
2020-11-23 14:01:10 -05:00
Seth Hoenig 3fe3259d32 docs: update changelog with group/task network labels fix 2020-11-23 12:56:54 -06:00
Seth Hoenig a35c0db6c7 nomad/structs: validate deprecated task.resources.network port labels
Enable users to submit jobs that still make use of the deprecated
task.resources.network stanza. Such jobs can be submitted, but
will emit a warning.
2020-11-23 12:40:40 -06:00
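The behavior described in a35c0db6c7 (accept the job, but emit a warning) separates warnings from hard validation errors. A minimal sketch of that split follows; the `task` type, the `LegacyPortLabels` field, `validateTask`, and the warning text are all hypothetical and do not mirror the real nomad/structs types.

```go
package main

import (
	"errors"
	"fmt"
)

// task is a hypothetical, heavily reduced task definition.
type task struct {
	Name string
	// LegacyPortLabels holds port labels declared through the deprecated
	// task.resources.network stanza (assumed field, for illustration only).
	LegacyPortLabels []string
}

// validateTask returns warnings for deprecated but accepted usage and an
// error only for input that must be rejected.
func validateTask(t task) (warnings []string, err error) {
	if t.Name == "" {
		return nil, errors.New("task name is required")
	}
	if len(t.LegacyPortLabels) > 0 {
		warnings = append(warnings, fmt.Sprintf(
			"task %q uses the deprecated task.resources.network stanza", t.Name))
	}
	return warnings, nil
}

func main() {
	warnings, err := validateTask(task{Name: "web", LegacyPortLabels: []string{"http"}})
	fmt.Println("error:", err)
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```

Returning warnings separately lets a caller submit the job and still surface the deprecation to the user.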
Tim Gross d686a51d60
e2e: prevent Ubuntu startup race conditions (#9428)
The cloud-init configuration runs on boot, which can result in a race
condition between that and service startup. This has caused provisioning
failures because Nomad expects the userdata to have configured a host volume
directory. Diagnosing this was also complicated by a warning being fired by
systemd for the Nomad unit file.

* Update the location of the `StartLimitIntervalSec` field to its
  post-systemd-230 location.
* Ensure that the weekly AMI build is up-to-date to reduce the risk of
  unexpected system software changes.
* Move the host volume to a directory we can set up at AMI build time rather
  than in userdata.
2020-11-23 12:29:08 -05:00
Seth Hoenig 3c17dc2ecc api: safely access legacy MBits field 2020-11-23 10:36:10 -06:00
Nick Ethier 3483f540b6 api: don't break public API 2020-11-23 10:36:10 -06:00
Nick Ethier 5cac8f45b7 vendor: sync api 2020-11-23 10:33:28 -06:00
Nick Ethier f1ea79f5a8 remove references to default mbits 2020-11-23 10:32:13 -06:00
Nick Ethier e8784c919f e2e: update jobs to use new network stanza format 2020-11-23 10:25:30 -06:00
Nick Ethier c9bd7e89ca command: use correct port mapping syntax in examples 2020-11-23 10:25:30 -06:00
Nick Ethier d21cbeb30f command: remove task network usage from init examples 2020-11-23 10:25:11 -06:00
Nick Ethier 9471892df4 mock: add default host network 2020-11-23 10:11:00 -06:00
Nick Ethier 7266376ae6 nomad: update validate to check group networks for task port usage 2020-11-23 10:11:00 -06:00
Nick Ethier 8efa3c355a website: add mbits field back to network docs with notice 2020-11-23 10:11:00 -06:00
Nick Ethier c4ddb0a43a website: add mbits and network deprecation notice 2020-11-23 10:09:36 -06:00
Luiz Aoqui 26913da7c0
Merge pull request #9427 from hashicorp/docs-fix-cpu-allocated-unit
docs: fix nomad.client.allocs.cpu.allocated metric unit
2020-11-23 10:31:05 -05:00
Luiz Aoqui f50740a66b
docs: fix nomad.client.allocs.cpu.allocated metric unit 2020-11-23 10:25:15 -05:00
Tim Gross b844aeabae docs: template signal change_mode not compatible with env
Only `change_mode = "restart"` will result in the template environment
variables being updated in the task. Clarify the behavior of the unsupported
options.
2020-11-23 10:11:03 -05:00
Chris Baker 6d7648670b
Merge pull request #9419 from hashicorp/api-event-stream-index
events: API event stream client should use the index flag
2020-11-23 07:58:16 -06:00
Tim Gross 38120123c5
docs: add missing command documentation (#9415)
* `nomad operator keyring` was missing the general options section
* `nomad operator metrics` was missing a page in the docs entirely

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2020-11-23 08:10:10 -05:00
Chris Baker a659091fd9 api: Event().Stream() should use the index parameter 2020-11-21 16:49:52 +00:00
Chris Baker 00841a8525 events: e2e test that API client honors the index flag 2020-11-21 16:38:24 +00:00
Tim Gross f1ad512986 docs: describe required ACLs for all commands 2020-11-20 13:38:29 -05:00
Tim Gross 6cc5c40cdb docs: clarify default signal for raw_exec on Windows 2020-11-20 13:25:48 -05:00
Michael Schurter d8e3adfad9
Merge pull request #9407 from hashicorp/docs-0129-backports
Add backports to changelog and 0.12.9 to website
2020-11-20 09:09:47 -08:00
Tim Gross 4331e86e8e docs: move telemetry under operations
Create a new "Operating Nomad" section of the docs where we can put reference
material for operators that doesn't quite fit in either configuration file /
command line documentation or a step-by-step Learn Guide. Pre-populate this
with the existing telemetry docs and some links out to the Learn Guide
sections.
2020-11-20 11:05:27 -05:00
Mahmood Ali 14fe89003b lint protobuf files 2020-11-20 11:00:24 -05:00
Mahmood Ali ecbae03cf3 Parameterize buf compatibility target
Parameterize it so we can arbitrarily target other versions, if we
are doing some manual checking, especially in the beginning when we may
want to validate compatibility for skip-release upgrades.

Also, introduce `checkbuf` target so we can run buf linter without the
rest.

use beta
2020-11-20 11:00:11 -05:00
Michael Schurter 876144302a docs: avoid the regression in 0.12.[78]
The suggested version, 0.12.7, includes regressions that are best avoided
so steer users to 0.12.9.
2020-11-19 14:32:59 -08:00
Michael Schurter 1fedddd814 docs: update website to 0.12.9 2020-11-19 14:26:06 -08:00
Michael Schurter 85b71e76f7 docs: add 0.12.9, 0.11.8, and 0.10.9 to changelog
upstream fix: #9383

backport PRs: #9391, #9402, and #9405
2020-11-19 14:23:42 -08:00
Tim Gross de6b023af2 command: remove -namespace from help options when not applicable 2020-11-19 16:28:39 -05:00
Michael Schurter 0cd73b44e9 docs: add task_token_ttl default
See
https://github.com/hashicorp/nomad/blob/v0.12.7/nomad/vault.go#L36-L39
2020-11-19 16:14:25 -05:00
Tim Gross 716451b793 docs: template behavior warnings
* vault secrets named with `-` characters cannot be read by `consul-template`
  due to limitations in golang's template rendering engine.
* environment variables are not modified in running tasks if
`change_mode.noop` is set.
2020-11-19 16:06:48 -05:00
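The template engine limitation mentioned in 716451b793 can be reproduced with Go's standard `text/template` alone. The sketch below is illustrative only: the `render` helper, the `Data` map, and the `db-password` key are made up, and it does not use consul-template or Vault. It shows that dotted field access fails to parse when the key contains `-`, while the built-in `index` function can read the same key in plain `text/template`. Whether that fits a given consul-template setup is not confirmed by the commit message.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render parses and executes a template against data.
// Hypothetical helper for this example only.
func render(src string, data interface{}) (string, error) {
	t, err := template.New("demo").Parse(src)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Made-up data shaped loosely like a secret with a hyphenated key.
	data := map[string]interface{}{
		"Data": map[string]string{"db-password": "hunter2"},
	}

	// Dotted access: the parser rejects '-' inside a field name.
	_, err := render(`{{ .Data.db-password }}`, data)
	fmt.Println("dotted access failed:", err != nil)

	// index takes the key as a string, so the hyphen is not a problem.
	out, err := render(`{{ index .Data "db-password" }}`, data)
	fmt.Println("index access:", out, err)
}
```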
Tim Gross c1a3496a55 docs: remove -namespace option from commands when not applicable 2020-11-19 16:06:28 -05:00
Tim Gross d67afa2e21 docs/help: -no-color does not apply to alloc logs content
The `nomad alloc logs` command does not remove terminal escape sequences for
color from the log outputs of a task. Clarify that the standard `-no-color`
flag, which does apply to Nomad's error responses from `nomad alloc logs`,
does not apply to the log output.
2020-11-19 15:29:12 -05:00
Tim Gross 9788a514a0 docs: fix some markdown escaping errors 2020-11-19 14:11:53 -05:00
Tim Gross 47ce5ff471 docs: expand artifact getter options
Adds an example of using HTTP Basic Auth, git options, and using HCL2 syntax
to encode an SSH key from file.
2020-11-19 12:07:02 -05:00
Tim Gross 2139e029ec docs: make dispatch payload size limit unambiguous 2020-11-19 12:06:49 -05:00
Michael Schurter bbfaa5e9ad
Merge pull request #9383 from hashicorp/b-template-escape
client: fix interpolation in template source
2020-11-18 12:21:29 -08:00
Michael Schurter 43b225b19d e2e: test template path interpolation 2020-11-18 10:48:58 -08:00
Michael Schurter 15f2b8fe7c client: skip broken test and fix assertion 2020-11-18 10:01:02 -08:00