Commit graph

15847 commits

Author SHA1 Message Date
Danielle b14436cd14
Merge pull request #6233 from hashicorp/chore/more-circle
ci: Migrate remaining jobs to CircleCI
2019-08-31 13:28:51 +02:00
Preetha eba025f35d
Merge pull request #6237 from hashicorp/f-rkt-deprecated
rkt deprecation notice
2019-08-30 16:45:40 -05:00
Preetha Appan 4ebe5e3daf
fix casing 2019-08-30 15:31:28 -05:00
Lang Martin ed51d37095 CHANGELOG go-getter upgrade 2019-08-30 16:22:34 -04:00
Michael Schurter 4bd53deba9
Merge pull request #6236 from hashicorp/b-ignore-connect-services
consul: ignore connect services when syncing
2019-08-30 13:11:09 -07:00
Preetha Appan 284ec935ea
Wording changes 2019-08-30 14:36:08 -05:00
Michael Schurter 67b7bc1e90 consul: ignore connect services when syncing
Consul registers Connect services automatically; however, Nomad thinks it
owns them because of the _nomad prefix. Since the services are managed by
Consul, Nomad needs to explicitly ignore them; otherwise they will be
removed.
2019-08-30 11:53:41 -07:00
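The filtering described in this commit can be sketched roughly as follows. This is an illustration only, not the code from the commit: the `-sidecar-proxy` suffix used to recognise Consul-managed Connect services, the helper names `ownedByNomad` and `staleServices`, and the sample service IDs are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	nomadPrefix   = "_nomad"         // prefix named in the commit message
	sidecarSuffix = "-sidecar-proxy" // assumed marker for Consul-managed Connect services
)

// ownedByNomad reports whether a service ID should be treated as managed by
// Nomad: it carries the Nomad prefix and is not a Connect sidecar.
func ownedByNomad(id string) bool {
	return strings.HasPrefix(id, nomadPrefix) && !strings.HasSuffix(id, sidecarSuffix)
}

// staleServices returns the service IDs a sync pass may deregister:
// Nomad-owned IDs found in Consul that are no longer desired.
func staleServices(inConsul []string, desired map[string]bool) []string {
	var stale []string
	for _, id := range inConsul {
		if ownedByNomad(id) && !desired[id] {
			stale = append(stale, id)
		}
	}
	return stale
}

func main() {
	inConsul := []string{
		"_nomad-task-web",
		"_nomad-task-web-sidecar-proxy", // registered by Consul, must be left alone
		"_nomad-task-old",
		"external-db",
	}
	desired := map[string]bool{"_nomad-task-web": true}
	fmt.Println(staleServices(inConsul, desired)) // prints [_nomad-task-old]
}
```

Without the suffix check, the sidecar entry would match the prefix, would not be in the desired set, and would be removed on the next sync.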
Tim Gross 3ac3ceb2cc test: add NOMAD_TEST_LOG_LEVEL env var to tune log levels 2019-08-30 13:25:36 -04:00
Tim Gross b79021adfd cli: split -dev and -dev-connect flags 2019-08-30 09:33:30 -04:00
Danielle Lancashire bc198d9328
chore: Remove unused travis scripts 2019-08-30 13:51:05 +02:00
Danielle Lancashire 67488e2b19
ci: Migrate remaining jobs to CircleCI 2019-08-30 13:44:23 +02:00
Danielle Lancashire f7b55bd965
chore: Update changelog 2019-08-30 13:31:01 +02:00
Danielle 7d3cc532d9
Merge pull request #6228 from hashicorp/chore/remove-go-travis
chore: Remove Go Tests from Travis
2019-08-30 09:09:20 +02:00
Mahmood Ali 11b9212673
Merge pull request #6226 from hashicorp/b-defensive-rawexec
raw_exec: be defensive when disabled
2019-08-29 21:19:05 -04:00
Mahmood Ali f98d4ee3f1 tests: enable raw_exec driver 2019-08-29 20:26:50 -04:00
Buck Doyle c1310a48c6 Update recent entries with consistent tenses 2019-08-29 17:36:21 -05:00
Preetha Appan c8f5130978
Deprecation notice for rkt 2019-08-29 13:38:12 -05:00
Tim Gross 2df7bac630 ci: require Consul 1.6.0 2019-08-29 14:15:56 -04:00
Tim Gross aa12b87ac2 dev: bump vagrant consul version to match CI 2019-08-29 14:15:56 -04:00
Grégoire Delattre c6ac788258 Fix the ExecTask function in DriverExecTaskNotSupported (#6145)
This fixes the ExecTask definition to match with the DriverPlugin
interface.
2019-08-29 11:36:29 -04:00
Mahmood Ali 32ab75c3f1
Merge pull request #6227 from hashicorp/b-drivers-check
schedulers: check all drivers on node
2019-08-29 09:48:07 -04:00
Danielle Lancashire b4ea277ecd
chore: Remove Go Tests from Travis
This commit removes the Travis tests that duplicate ones run in
CircleCI.
2019-08-29 15:43:09 +02:00
Mahmood Ali 28e473aaff raw_exec: be defensive when disabled
Ensure that no raw_exec task can run on a client where it's disabled,
even if a flaw led to the client being assigned a raw_exec task
unexpectedly.
2019-08-29 09:09:40 -04:00
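A minimal sketch of this kind of defensive check, assuming a driver with a simple enabled flag. The type `rawExecDriver`, its fields, and the error value are hypothetical and are not taken from the Nomad source.

```go
package main

import (
	"errors"
	"fmt"
)

// errRawExecDisabled is a hypothetical sentinel error for this sketch.
var errRawExecDisabled = errors.New("raw_exec: driver is disabled on this client")

// rawExecDriver is a stand-in for a task driver with an on/off switch.
type rawExecDriver struct {
	enabled bool
	started []string // names of tasks that were actually started
}

// StartTask re-checks the enabled flag itself rather than trusting that the
// scheduler never places raw_exec tasks on a client where the driver is off.
func (d *rawExecDriver) StartTask(taskName string) error {
	if !d.enabled {
		return fmt.Errorf("starting %q: %w", taskName, errRawExecDisabled)
	}
	d.started = append(d.started, taskName)
	return nil
}

func main() {
	d := &rawExecDriver{enabled: false}
	if err := d.StartTask("web"); err != nil {
		fmt.Println("refused:", err)
	}
}
```

The point of the check is redundancy: placement should already exclude such clients, but the driver refuses to run the task even if placement goes wrong.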
Mahmood Ali 3a1cb51539 schedulers: check all drivers on node
When checking driver feasibility for an alloc with multiple drivers, we
must check that all drivers are detected and healthy.

Nomad 0.9 and 0.8 have a bug where we may check only a single driver,
and which driver gets checked depends on map traversal order, which the
Go spec leaves unspecified.
2019-08-29 09:03:31 -04:00
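The difference between checking one driver and checking all of them can be illustrated as below. This is a sketch under assumptions: the `driverInfo` type and both function names are hypothetical, and `firstDriverOnly` is only one plausible shape such a bug can take, not the actual code from Nomad 0.8 or 0.9.

```go
package main

import "fmt"

// driverInfo is a hypothetical summary of a driver's fingerprint on a node.
type driverInfo struct {
	Detected bool
	Healthy  bool
}

// firstDriverOnly shows the buggy pattern: it returns from inside the loop,
// so only whichever driver the map yields first is ever examined.
func firstDriverOnly(required map[string]struct{}, node map[string]driverInfo) bool {
	for name := range required {
		info, ok := node[name]
		return ok && info.Detected && info.Healthy
	}
	return true
}

// allDriversFeasible rejects the node as soon as any required driver is
// missing, undetected, or unhealthy, and accepts only if all of them pass.
func allDriversFeasible(required map[string]struct{}, node map[string]driverInfo) bool {
	for name := range required {
		info, ok := node[name]
		if !ok || !info.Detected || !info.Healthy {
			return false
		}
	}
	return true
}

func main() {
	node := map[string]driverInfo{
		"docker": {Detected: true, Healthy: true},
		"exec":   {Detected: true, Healthy: false},
	}
	required := map[string]struct{}{"docker": {}, "exec": {}}

	// Always false: the unhealthy exec driver is found regardless of order.
	fmt.Println("all drivers:", allDriversFeasible(required, node))
	// May differ between runs, because Go does not specify map iteration order.
	fmt.Println("first driver only:", firstDriverOnly(required, node))
}
```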
Mahmood Ali 3da10b5cb3 scheduler: tests for multiple drivers in TG 2019-08-29 09:03:31 -04:00
Michael Schurter f5792635ca
Merge pull request #6218 from hashicorp/f-consul-defaults
consul: use Consul's defaults and env vars
2019-08-28 11:54:44 -07:00
Mahmood Ali 0bd2eee87f
Merge pull request #6216 from hashicorp/b-recognize-pending-allocs
alloc_runner: wait when starting suspicious allocs
2019-08-28 14:46:09 -04:00
Mahmood Ali e0da3c5d0e rename to hasLocalState, and ignore clientstate
The ClientState being pending isn't a good criterion, as an alloc may
have been updated in-place before it was completed.

Also, updated the logic so we only check for task states.  If an alloc
has deployment state but no persisted tasks at all, restore will still
fail.
2019-08-28 11:44:48 -04:00
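A check of the kind this commit describes might look like the sketch below. Only the name `hasLocalState` comes from the commit message; the `persistedAlloc` and `taskState` types and their fields are hypothetical stand-ins for whatever the client store actually returns.

```go
package main

import "fmt"

// taskState is a hypothetical persisted record for a single task.
type taskState struct {
	State string
}

// persistedAlloc is a hypothetical view of what the client store holds
// for one alloc.
type persistedAlloc struct {
	ClientStatus     string                // deliberately ignored by hasLocalState
	DeploymentStatus *string               // deliberately ignored by hasLocalState
	TaskStates       map[string]*taskState // task name -> persisted state
}

// hasLocalState reports whether any task state was persisted for the alloc.
// Client status and deployment state are not consulted.
func hasLocalState(a persistedAlloc) bool {
	for _, ts := range a.TaskStates {
		if ts != nil {
			return true
		}
	}
	return false
}

func main() {
	healthy := "healthy"
	deploymentOnly := persistedAlloc{ClientStatus: "running", DeploymentStatus: &healthy}
	withTasks := persistedAlloc{
		ClientStatus: "pending",
		TaskStates:   map[string]*taskState{"web": {State: "running"}},
	}
	fmt.Println(hasLocalState(deploymentOnly)) // false
	fmt.Println(hasLocalState(withTasks))      // true
}
```

The second case shows why the client status alone is a poor signal: an alloc can read as pending and still have task state on disk.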
Mahmood Ali 33673be4a6
Merge pull request #6219 from hashicorp/c-circleci-upgrade-machine-img
upgrade machine image for most jobs
2019-08-28 11:27:04 -04:00
Lang Martin a1936e3add
Merge pull request #6215 from hashicorp/f-upgrade-go-getter
upgrade go-getter, leave compiled protobuf at version 1.2
2019-08-28 11:01:31 -04:00
Nick Ethier cf014c7fd5
ar: ensure network forwarding is allowed for bridged allocs (#6196)
* ar: ensure network forwarding is allowed in iptables for bridged allocs

* ensure filter rule exists at setup time
2019-08-28 10:51:34 -04:00
Mahmood Ali acec5a751a upgrade machine image for most jobs
It looks like the host's unattended upgrades are interfering with chroot
creation.  Here, we upgrade the machine image to one without misconfigured
unattended upgrades, across the board except for the `test-docker`
job.

Docker seems to be misbehaving on that image, and we get some unexpected
cgroups errors, e.g. https://circleci.com/gh/hashicorp/nomad/3854 .

Sample recent failures of `test-exec`:

https://circleci.com/gh/hashicorp/nomad/3633
https://circleci.com/gh/hashicorp/nomad/3696
https://circleci.com/gh/hashicorp/nomad/3714
https://circleci.com/gh/hashicorp/nomad/3764
https://circleci.com/gh/hashicorp/nomad/3770
https://circleci.com/gh/hashicorp/nomad/3834
2019-08-28 09:50:56 -04:00
Nick Ethier 9e96971a75
cli: display group ports and address in alloc status command output (#6189)
* cli: display group ports and address in alloc status command output

* add assertions for port.To = -1 case and convert assertions to testify
2019-08-27 23:59:36 -04:00
Nick Ethier cbb27e74bc
Add environment variables for connect upstreams (#6171)
* taskenv: add connect upstream env vars + test

* set taskenv upstreams instead of appending

* Update client/taskenv/env.go

Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
2019-08-27 23:41:38 -04:00
Michael Schurter 3b0e1d8ef7 consul: use Consul's defaults and env vars
Use Consul's API package defaults and env vars as Nomad's defaults.
2019-08-27 14:56:52 -07:00
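The idea of taking defaults from the environment can be sketched with the standard library alone. This is not the commit's code, which per its message relies on Consul's API package; the `consulConfig` type and `defaultConsulConfig` function are hypothetical. `CONSUL_HTTP_ADDR`, `CONSUL_HTTP_TOKEN`, and the `127.0.0.1:8500` default address are the ones Consul documents.

```go
package main

import (
	"fmt"
	"os"
)

// consulConfig is a hypothetical, trimmed-down Consul client configuration.
type consulConfig struct {
	Addr  string
	Token string
}

// defaultConsulConfig starts from Consul's documented default address and
// lets environment variables override it. The lookup function is injected
// so the behaviour can be exercised without touching the real environment.
func defaultConsulConfig(getenv func(string) string) consulConfig {
	cfg := consulConfig{Addr: "127.0.0.1:8500"}
	if v := getenv("CONSUL_HTTP_ADDR"); v != "" {
		cfg.Addr = v
	}
	if v := getenv("CONSUL_HTTP_TOKEN"); v != "" {
		cfg.Token = v
	}
	return cfg
}

func main() {
	cfg := defaultConsulConfig(os.Getenv)
	fmt.Println("consul address:", cfg.Addr)
}
```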
Mahmood Ali 90c5eefbab Alternative approach: avoid restoring
This uses an alternative approach where we avoid restoring the alloc
runner in the first place, if we suspect that the alloc may have been
completed already.
2019-08-27 17:30:55 -04:00
Lang Martin 5ae153900f match pinned versions for sub-modules 2019-08-27 12:58:12 -04:00
Jasmine Dahilig 4078393bb6
expose nomad namespace as environment variable in allocation #5692 (#6192) 2019-08-27 08:38:07 -07:00
Jasmine Dahilig ffceab0879
remove network stanza from job init --short example jobspec (#6179) 2019-08-27 07:36:32 -07:00
Mahmood Ali 647c1457cb alloc_runner: wait when starting suspicious allocs
This commit aims to help users running clients susceptible to the
destroyed-alloc-being-restarted bug upgrade to the latest version.  Without
this, such users will have their tasks run unexpectedly on upgrade and only
see the bug resolved after a subsequent restart.

If, on restore, the client sees a pending alloc without any other
persisted info, then err on the side of assuming that it's a corrupt
persisted state of an alloc rather than the client happening to be killed
right when the alloc was assigned to it.

A few reasons motivate this behavior:

Statistically speaking, corruption being the cause is more likely.  A
long-running client will have a higher chance of having allocs persisted
incorrectly with pending state.  Being killed right when an alloc is
about to start is relatively unlikely.

Also, delaying the start of an alloc that hasn't started (hopefully by
only seconds) is not as severe as launching too many allocs, which may
bring the client down.

More importantly, this helps customers upgrade their clients without
risking taking their clients down and destabilizing their cluster. We
don't want to force existing users to trigger the bug while they upgrade
and restart their cluster.
2019-08-26 22:05:31 -04:00
Lang Martin c79eb24816 govendor fetch github.com/hashicorp/go-getter@f5101da, protobuf 1.2 2019-08-26 17:54:21 -04:00
Mahmood Ali dfdf0edd3b
Merge pull request #6207 from hashicorp/b-gc-destroyed-allocs-rerun
Don't persist allocs of destroyed alloc runners
2019-08-26 17:26:18 -04:00
Tim Gross 11030f7aa0 init: add generated assets into bindata 2019-08-26 14:24:15 -04:00
Mahmood Ali cc460d4804 Write to client store while holding lock
Protect against a race between the goroutines that destroy the alloc
runner and persist its state.

The downside is that the database I/O operation will run while holding
the lock and may run indefinitely.  The risk of the lock being held for
a long time is slow destruction, but a client with slow I/O has bigger
problems.
2019-08-26 13:45:58 -04:00
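The locking described here, combined with the destroyed-runner check from c132623ffc in the same pull request, can be sketched as follows. The `allocRunner` and `stateDB` types and their methods are hypothetical simplifications, not Nomad's actual types.

```go
package main

import (
	"fmt"
	"sync"
)

// stateDB is a stand-in for the client's on-disk state store.
type stateDB struct {
	mu     sync.Mutex
	allocs map[string]string // alloc ID -> last persisted status
}

func (db *stateDB) put(id, status string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.allocs[id] = status
}

func (db *stateDB) remove(id string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	delete(db.allocs, id)
}

func (db *stateDB) has(id string) bool {
	db.mu.Lock()
	defer db.mu.Unlock()
	_, ok := db.allocs[id]
	return ok
}

// allocRunner is a hypothetical, heavily simplified alloc runner.
type allocRunner struct {
	id        string
	db        *stateDB
	mu        sync.Mutex // guards destroyed and serialises writes for this alloc
	destroyed bool
}

// persist writes the alloc while holding the runner lock, so destroy cannot
// slip in between the destroyed check and the write. The cost is that the
// lock is held for the duration of the store I/O.
func (ar *allocRunner) persist(status string) bool {
	ar.mu.Lock()
	defer ar.mu.Unlock()
	if ar.destroyed {
		return false // a destroyed runner never writes its alloc back
	}
	ar.db.put(ar.id, status)
	return true
}

// destroy marks the runner destroyed and removes its alloc under the same lock.
func (ar *allocRunner) destroy() {
	ar.mu.Lock()
	defer ar.mu.Unlock()
	ar.destroyed = true
	ar.db.remove(ar.id)
}

func main() {
	db := &stateDB{allocs: map[string]string{}}
	ar := &allocRunner{id: "alloc-1", db: db}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); ar.persist("pending") }()
	go func() { defer wg.Done(); ar.destroy() }()
	wg.Wait()

	// Whichever goroutine wins the lock, the alloc is gone afterwards.
	fmt.Println("alloc still in DB:", db.has("alloc-1")) // prints false
}
```

If the check and the write were not covered by the same critical section, a persist call could pass the destroyed check, lose the CPU to destroy, and then write the alloc back after it had been removed.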
Danielle 329e195be8
Merge pull request #6181 from hashicorp/dani/scheduler-vol-ro
scheduler: Implicit constraint on readonly hostvol
2019-08-26 17:01:49 +02:00
Mahmood Ali 97a2905004
Merge pull request #6205 from hashicorp/b-no-golang-29119-workaround
logmon: revert workaround for Windows go1.11 bug
2019-08-26 10:52:51 -04:00
Nick Fagerlund bc30275c98 Update middleman-hashicorp container (#6185) 2019-08-26 09:29:08 -05:00
Mahmood Ali 1851820f20 logmon: log stat error to help debugging 2019-08-26 10:10:20 -04:00
Mahmood Ali e7085ca846
Merge pull request #6204 from hashicorp/c-circleci-tweaks-20190824
ci: use circleci/golang images directly
2019-08-26 10:08:14 -04:00
Mahmood Ali c132623ffc Don't persist allocs of destroyed alloc runners
This fixes a bug where allocs that have been GCed get re-run after the client
is restarted.  A heavily-used client may launch thousands of allocs on startup
and get killed.

The bug is that an alloc runner that gets destroyed due to GC remains in the
client's alloc runner set.  Periodically, such runners get persisted until the
alloc is GCed by the server.  During that time, the client DB will contain the
alloc but neither its individual task statuses nor its completed state.  On
client restart, the client assumes that the alloc is in pending state and
re-runs it.

Here, we fix it by ensuring that destroyed alloc runners don't persist any alloc
to the state DB.

This is a short-term fix, as we should consider revamping client state
management.  Storing alloc and task information non-transactionally and
non-atomically, concurrently while the alloc runner is running and potentially
changing state, is a recipe for bugs.

Fixes https://github.com/hashicorp/nomad/issues/5984
Related to https://github.com/hashicorp/nomad/pull/5890
2019-08-25 11:21:28 -04:00