Commit graph

15860 commits

Author SHA1 Message Date
Mahmood Ali dfdf0edd3b
Merge pull request #6207 from hashicorp/b-gc-destroyed-allocs-rerun
Don't persist allocs of destroyed alloc runners
2019-08-26 17:26:18 -04:00
Tim Gross 11030f7aa0 init: add generated assets into bindata 2019-08-26 14:24:15 -04:00
Mahmood Ali cc460d4804 Write to client store while holding lock
Protect against a race where destroying and persist state goroutines
race.

The downside is that the database io operation will run while holding
the lock and may run indefinitely.  The risk of lock being long held is
slow destruction, but slow io has bigger problems.
2019-08-26 13:45:58 -04:00
Danielle 329e195be8
Merge pull request #6181 from hashicorp/dani/scheduler-vol-ro
scheduler: Implicit constraint on readonly hostvol
2019-08-26 17:01:49 +02:00
Mahmood Ali 97a2905004
Merge pull request #6205 from hashicorp/b-no-golang-29119-workaround
logmon: revert workaround for Windows go1.11 bug
2019-08-26 10:52:51 -04:00
Nick Fagerlund bc30275c98 Update middleman-hashicorp container (#6185) 2019-08-26 09:29:08 -05:00
Mahmood Ali 1851820f20 logmon: log stat error to help debugging 2019-08-26 10:10:20 -04:00
Mahmood Ali e7085ca846
Merge pull request #6204 from hashicorp/c-circleci-tweaks-20190824
ci: use circleci/golang images directly
2019-08-26 10:08:14 -04:00
Danielle Lancashire 7d1f313091
docs: Sort keys in node response 2019-08-26 14:10:13 +02:00
Danielle Lancashire 9ab3fb0908
docs: Update /v1/node/{node-id} example response
The nodes api documentation is fairly out of date, here I've updated the
entire response based on a local dev agent, rather than explicitly
adding new fields to bring us up to the current api shape.
2019-08-26 12:27:16 +02:00
Trevor Sullivan a832c23c19
Update AWS logo to current version 2019-08-25 14:18:24 -07:00
Mahmood Ali c132623ffc Don't persist allocs of destroyed alloc runners
This fixes a bug where allocs that have been GCed get re-run again after client
is restarted.  A heavily-used client may launch thousands of allocs on startup
and get killed.

The bug is that an alloc runner that gets destroyed due to GC remains in
client alloc runner set.  Periodically, they get persisted until alloc is
gced by server.  During that  time, the client db will contain the alloc
but not its individual tasks status nor completed state.  On client restart,
client assumes that alloc is pending state and re-runs it.

Here, we fix it by ensuring that destroyed alloc runners don't persist any alloc
to the state DB.

This is a short-term fix, as we should consider revamping client state
management.  Storing alloc and task information in non-transaction non-atomic
concurrently while alloc runner is running and potentially changing state is a
recipe for bugs.

Fixes https://github.com/hashicorp/nomad/issues/5984
Related to https://github.com/hashicorp/nomad/pull/5890
2019-08-25 11:21:28 -04:00
Mahmood Ali 6301725002 logmon: revert workaround for Windows go1.11 bug
Revert e0126123ab1ba848f72458538bc6118c978245e6 now that we are running
with Golang 1.12, and https://github.com/golang/go/issues/29119 is no
longer relevant.
2019-08-24 08:19:44 -04:00
Mahmood Ali df1f3eb9ee
Merge pull request #6201 from hashicorp/b-device-stats-interval
initialize device manager stats interval
2019-08-24 08:16:03 -04:00
Mahmood Ali e17d7338fc use circleci/golang images directly
We currently use an container image for `test-devices` job only; while
all other jobs use machine executor.

This allows us to switch golang and protoc verions easily without
manually managing Docker images (which requires building them manually
on a dev machines, etc).  All that while, we install dependencies on
every build in all other jobs..

`test-devices` now is one of the fastest jobs and isn't a constraint or
a bottleneck, so increasing its overhead by few seconds doesn't hurt the
overall developer iteration.

If we split tests effectively later, we can revisit.
2019-08-23 21:59:49 -04:00
Mahmood Ali c832853436 use a new image with proper protoc dependency
Fixes `test-devices` job
2019-08-23 21:33:07 -04:00
Mahmood Ali 07b5f4c530
Merge pull request #6146 from hashicorp/b-config-template-copy
clientConfig.Copy() to copy template config too
2019-08-23 19:00:57 -04:00
Mahmood Ali b98568774b clientConfig.Copy() to copy template config too 2019-08-23 18:43:22 -04:00
Mahmood Ali 3791a70aa9
Merge pull request #5676 from hashicorp/f-b-upgrade-ugorji-dep-20190508
Update ugorji/go to latest
2019-08-23 18:29:49 -04:00
Lang Martin fc2e1c407e
Merge pull request #6203 from hashicorp/b-chroot-setuid-110
exec driver setuid go-getter update
2019-08-23 16:49:41 -04:00
Lang Martin 4f6493a301 taskrunner getter set Umask for go-getter, setuid test 2019-08-23 15:59:03 -04:00
Lang Martin f807d9208f govendor fetch github.com/hashicorp/go-getter@6be654f 2019-08-23 15:59:03 -04:00
Mahmood Ali 3890619100 initialize device manager stats interval
Fixes a bug where we cpu is pigged at 100% due to collecting devices
statistics.  The passed stats interval was ignored, and the default zero
value causes a very tight loop of stats collection.

FWIW, in my testing, it took 2.5-3ms to collect nvidia GPU stats, on a
`g2.2xlarge` ec2 instance.

The stats interval defaults to 1 second and is user configurable.  I
believe this is too frequent as a default, and I may advocate for
reducing it to a value closer to 5s or 10s, but keeping it as is for
now.

Fixes https://github.com/hashicorp/nomad/issues/6057 .
2019-08-23 14:58:34 -04:00
Mahmood Ali 2999d5e2b2
Merge pull request #6200 from hashicorp/r-golang-1.12.9
Update golang to 1.12.9
2019-08-23 14:37:21 -04:00
Tim Gross 4d4461d1f5 agent: -dev=connect mode bind to 0.0.0.0
The dev mode flag for connect was binding to the default interface's
IP, but this makes for a bad user experience for the CLI which will
default to 127.0.0.1. If we bind to 0.0.0.0 instead the CLI will work
without further configuration by the user.
2019-08-23 13:51:16 -04:00
Jerome Gravel-Niquet cbdc1978bf Consul service meta (#6193)
* adds meta object to service in job spec, sends it to consul

* adds tests for service meta

* fix tests

* adds docs

* better hashing for service meta, use helper for copying meta when registering service

* tried to be DRY, but looks like it would be more work to use the
helper function
2019-08-23 12:49:02 -04:00
Mahmood Ali 74bc5a2c0b update circleci builds to use golang 1.12.9 2019-08-23 12:26:47 -04:00
Mahmood Ali 6430fb5444 use golang 1.12 2019-08-23 09:44:40 -04:00
Nick Ethier 96d379071d
ar: fix bridge networking port mapping when port.To is unset (#6190) 2019-08-22 21:53:52 -04:00
Preetha 28740274d4
Bring 0.9.5 changes to changelog on master branch 2019-08-22 17:35:15 -05:00
Buck Doyle 6fc229a0f1
Remove most Netlify configuration (#6194)
This removes the in-repository Netlify configuration. There are now two
sites backed by the repository, so we must use the web UI to
control the build settings, as having the configuration in-repository
overrides the web UI settings.

The build settings for the two sites are below, as of this commit. See
the extra step in nomad-ui site’s build step that copies the _redirects
file to the correct destination so things are properly forwarded when
you visit the deployment.

nomad-ui:

base directory: ui
build command: ember build && mkdir -p ui-dist/ui && mv dist/* ui-dist/ui/ && cp ../.netlify/ui-redirects ui-dist/_redirects
publish directory: ui/ui-dist

nomad-website:

base directory: website
build command: bundle exec middleman build
publish directory: website/build
2019-08-22 15:54:23 -05:00
Michael Schurter 95b8048553
Merge pull request #6121 from hashicorp/f-connect-bootstrap
connect: task hook for bootstrapping envoy sidecar
2019-08-22 10:58:31 -07:00
Michael Schurter 59e0b67c7f connect: task hook for bootstrapping envoy sidecar
Fixes #6041

Unlike all other Consul operations, boostrapping requires Consul be
available. This PR tries Consul 3 times with a backoff to account for
the group services being asynchronously registered with Consul.
2019-08-22 08:15:32 -07:00
Mahmood Ali afbe967583
Merge pull request #6187 from hashicorp/c-circleci-tweak-20190822
ci: Use more recent base machine executor image for test-rkt
2019-08-22 11:10:05 -04:00
Mahmood Ali 4a94d4ec1d ci: Use more recent base machine executor image
This fixes a frequent failure in `test-rkt` jobs where dpkg installation
fails.

The image used currently, circleci/classic:201808-01, has unattended
upgrades enabled accidentally, which runs on every build.  This means
that tools get modified unexpectedly during builds, and apt-get commands
may fail as the unattended upgrade is holding package database lock.

This updates `test-rkt` job only because the new image breaks
`test-docker` job (e.g. https://circleci.com/gh/hashicorp/nomad/2641 ),
and I punted on investigating test-docker for another day.
2019-08-22 10:31:57 -04:00
Buck Doyle 49b9dd5b9b
UI: Add creation time to evaluations table (#6050) 2019-08-22 08:11:24 -05:00
Danielle 7b062b0eac
Merge pull request #6175 from hashicorp/dani/remove-hidden-vols
remove hidden field from host volumes
2019-08-22 08:49:54 +02:00
Danielle Lancashire 2e5f28029f
remove hidden field from host volumes
We're not shipping support for "hidden" volumes in 0.10 any more, I'll
convert this to an issue+mini RFC for future enhancement.
2019-08-22 08:48:05 +02:00
Danielle c280e97619
Merge pull request #6184 from hashicorp/dani/fix-api
api: Fix definition of HostVolumeInfo
2019-08-22 00:13:28 +02:00
Danielle bd22e0e534
Merge pull request #6183 from hashicorp/dani/fix-multiparse
clientconfig: Fix parsing multiple host volumes
2019-08-21 22:36:27 +02:00
Danielle Lancashire 112b986736
api: Fix definition of HostVolumeInfo 2019-08-21 22:34:41 +02:00
Danielle 0428284aee
Merge pull request #6180 from hashicorp/dani/readonly-acl
Fine grained ACLs for Host Volumes
2019-08-21 22:22:14 +02:00
Danielle Lancashire 9df7e0eb72
clientconfig: Fix parsing multiple host volumes 2019-08-21 22:19:58 +02:00
Danielle Lancashire 91bb67f713
acls: Break mount acl into mount-rw and mount-ro 2019-08-21 21:17:30 +02:00
Danielle Lancashire 3a5e48ad18
scheduler: Implicit constraint on readonly hostvol
When a Client declares a volume is ReadOnly, we should only schedule it
for requests for ReadOnly volumes. This change means that if a host
exposes a readonly volume, we then validate that the group level
requests for the volume are all read only for that host.
2019-08-21 20:57:05 +02:00
Nick Ethier c8556daf37
structs: validate no tcp checks for connect services (#6169) 2019-08-21 12:42:53 -04:00
Buck Doyle 2900659a3c
UI: Add CircleCI job (#6125)
This adds a job to test the UI on CircleCI, including the sort of branch
pattern-matching from #5839, so .-ui/ branches only have that job
and not the non-UI ones.

I considered having an entire workflow for UI, which could have separate
jobs for linting vs Ember tests, but the lint commands take so little time
that it didn’t seem worth it.

There’s no use of nvm to change the Node version as the Docker image
is what controls that. It’s annoying to have to update the version in multiple
places, but probably infrequent.
2019-08-21 08:56:37 -05:00
Michael Schurter 050cc32fde
Merge pull request #6157 from hashicorp/f-connect-register
Register connect enabled group services with Consul
2019-08-20 14:45:38 -07:00
Michael Schurter 85aaba14ed ci: require Consul 1.6.0-rc1 2019-08-20 13:23:23 -07:00
Tim Gross 7dc6ee2d27 structs: add taskgroup networks and services to plan diffs
Adds a check for differences in `job.Diff` so that task group networks
and services, including new Consul connect stanzas, show up in the job
plan outputs.
2019-08-20 16:18:30 -04:00