Commit Graph

3017 Commits

Author SHA1 Message Date
Jainin Shah 39e1fc06e5 adding comments to the change 2018-02-28 16:19:51 -08:00
Preetha Appan eaedffc7f7
Fix go vet errors 2018-02-28 12:21:27 -06:00
Chelsea Holland Komlo 355805db56 reset timer after updating node copy 2018-02-27 17:18:10 -05:00
Jainin Shah 6eb7da002f changing the formula to correctly pass the CPUQota to docker 2018-02-27 12:32:23 -08:00
Chelsea Holland Komlo a72aaaf47f add network resources equal method, use time ticker
remove impossible test case
2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo e736e31820 use time ticker, update how network resources are compared 2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo 5059065b52 improved testing; node networks comparison 2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo b7bcd0b59f fingerprinters accessing node information should be thread safe 2018-02-26 15:25:54 -05:00
Chelsea Holland Komlo 1f31b39fe8 code review fixups 2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo ed8c8afbcd edge trigger node update
test update config copy trigger
2018-02-26 12:36:04 -05:00
Alex Dadgar 49a47483d1 Registering back to initializing
Fix a bug in which if the node attributes/meta changed, we would
re-register the node in status initializing. This would incorrectly
trigger the client to log that it missed its heartbeat.

It would change the status of the Node to initializing until the next
heartbeat occured.
2018-02-16 17:49:31 -08:00
Alex Dadgar eff4455c68 Fix original client server list behavior 2018-02-15 16:04:53 -08:00
Alex Dadgar 0ebf7f3b7f remove tmp file 2018-02-15 15:51:27 -08:00
Alex Dadgar f9cf642436 Client tls 2018-02-15 15:22:57 -08:00
Alex Dadgar 0e85ae77b4 fix flaky gc tests 2018-02-15 13:59:03 -08:00
Alex Dadgar 38b695b69c feedback and rebasing 2018-02-15 13:59:03 -08:00
Alex Dadgar 9117ef4650 HTTP agent 2018-02-15 13:59:03 -08:00
Alex Dadgar d7029965ca Server side impl + touch ups 2018-02-15 13:59:02 -08:00
Alex Dadgar ce0caccad2 client implementation of alloc gc and stats 2018-02-15 13:59:02 -08:00
Alex Dadgar e685211892 Code review feedback 2018-02-15 13:59:02 -08:00
Alex Dadgar a9c4f8a4c8 clarify force 2018-02-15 13:59:02 -08:00
Alex Dadgar dc75501c69 Respond to comments 2018-02-15 13:59:02 -08:00
Alex Dadgar cea77df6a7 Add Streaming RPC ack
This PR introduces an ack allowing the receiving end of the streaming
RPC to return any error that may have occured during the establishment
of the streaming RPC.
2018-02-15 13:59:02 -08:00
Alex Dadgar 2f9d33f479 vet 2018-02-15 13:59:02 -08:00
Alex Dadgar f5f43218f5 HTTP and tests 2018-02-15 13:59:02 -08:00
Alex Dadgar 6546b43a17 Client implementation of stream 2018-02-15 13:59:02 -08:00
Alex Dadgar 9a5569678c Client Stat/List impl 2018-02-15 13:59:02 -08:00
Alex Dadgar 8854b35b34 Agent logs 2018-02-15 13:59:02 -08:00
Alex Dadgar 857b0ab6c7 client tests 2018-02-15 13:59:02 -08:00
Alex Dadgar 69def2ff22 Server tests of logs 2018-02-15 13:59:02 -08:00
Alex Dadgar 9479cb7f25 Remove logging 2018-02-15 13:59:01 -08:00
Alex Dadgar 14f57024b7 test stream framer 2018-02-15 13:59:01 -08:00
Alex Dadgar ddd67f5f11 Server streaming 2018-02-15 13:59:01 -08:00
Alex Dadgar ca9379be09 Logs over RPC w/ lots to touch up 2018-02-15 13:59:01 -08:00
Alex Dadgar 2c0ad26374 New RPC Modes and basic setup for streaming RPC handlers 2018-02-15 13:59:01 -08:00
Alex Dadgar fea0e69d4f wip fs endpoint 2018-02-15 13:59:01 -08:00
Alex Dadgar b5037f20db Remove circular dependency 2018-02-15 13:59:01 -08:00
Alex Dadgar 9bc75f0ad4 Fix manager tests and make testagent recover from port conflicts 2018-02-15 13:59:01 -08:00
Alex Dadgar feb943c873 Fix lint/comments 2018-02-15 13:59:01 -08:00
Alex Dadgar ac67da3b06 Unjankify the pkg 2018-02-15 13:59:01 -08:00
Alex Dadgar 3f1f8604bb initial round of comment review 2018-02-15 13:59:01 -08:00
Alex Dadgar e03b074650 Plumb config 2018-02-15 13:59:01 -08:00
Alex Dadgar 05c4fe8675 Change defaults for min use duration 2018-02-15 13:59:01 -08:00
Alex Dadgar c8c1284bc3 SetServer command actually returns an error if given an invalid server 2018-02-15 13:59:01 -08:00
Alex Dadgar 3f786b904b use server manager 2018-02-15 13:59:01 -08:00
Alex Dadgar b24b05e025 Remove testing 2018-02-15 13:59:01 -08:00
Alex Dadgar 4e1cb1d96e Test RPC from server 2018-02-15 13:59:00 -08:00
Alex Dadgar 6dd1c9f49d Refactor 2018-02-15 13:59:00 -08:00
Alex Dadgar a6dfffa4fa Add testing interfaces 2018-02-15 13:59:00 -08:00
Alex Dadgar d918f9bd5c RPC Listener 2018-02-15 13:59:00 -08:00
Alex Dadgar 1472b943d6 Stats Endpoint 2018-02-15 13:59:00 -08:00
Chelsea Komlo 0c0b56a1a4
Merge pull request #3807 from hashicorp/f-client-add-fingerprint-manager
Add fingerprint manager to manage fingerprinting node
2018-02-13 11:22:50 -05:00
Chelsea Holland Komlo b321287712 extract test helper
lock concurrent accesses to node

comment exported method
2018-02-12 18:30:10 -05:00
Michael Schurter 101e85f078
Merge pull request #3819 from schmichael/qemu-graceful-shutdown-alpine
Test QEMU graceful shutdown
2018-02-12 12:32:14 -08:00
Michael Schurter ed6bce2ccf Improve test logging 2018-02-12 11:25:52 -08:00
Michael Schurter 06397ba59d
Merge pull request #3825 from jaininshah9/master
add a flag for cpu_hard_limit
2018-02-08 20:40:38 -08:00
Michael Schurter 6e6915e7f5 Merge branch 'master' into f-cpu_hard_limit 2018-02-08 20:14:29 -08:00
Alan Scherger eee7144643 drivers: use ctx.TaskEnv for mount points 2018-02-08 12:59:20 -06:00
Jainin Shah a4516aa71a removing underscore in variable name 2018-02-07 16:28:43 -08:00
Chelsea Holland Komlo 4a26959825 code review feedback 2018-02-07 18:10:55 -05:00
Chelsea Holland Komlo d626d24488 remove dependency on client for fingerprint manager 2018-02-07 18:10:45 -05:00
Chelsea Holland Komlo e012e5ab8a add fingerprint manager 2018-02-07 18:10:33 -05:00
Jainin Shah 8149587abe clearing the confusion between microsecond,nanosecond and millisecond 2018-02-06 19:11:39 -08:00
Jainin Shah d3087d6069 using d.node.Resources.CPU as suggested 2018-02-06 14:52:15 -08:00
Michael Schurter 279a3b3f28
Merge pull request #3790 from 42wim/dockerv6
Service registration for IPv6 docker addresses (Fixes #3785)
2018-02-05 17:07:53 -08:00
Michael Schurter 25f0ad050f docker: Skip IPv6 test if IPv6 disabled 2018-02-05 16:24:30 -08:00
Chelsea Komlo 42d20234a3
Merge pull request #3781 from hashicorp/f-client-fingerprint-refactor
Refactor client fingerprinters to return a diff of node attributes
2018-02-01 20:13:44 -05:00
Chelsea Holland Komlo b21233fe23 update log message 2018-02-01 19:46:57 -05:00
Chelsea Holland Komlo 6f9c0ab361 req/resp should be within config locks; rename for detected fingerprints
changelog
2018-02-01 19:00:39 -05:00
Wim a1a2ca8e33 Add AdvertiseIPv6Address test 2018-02-01 23:21:47 +01:00
Jainin Shah 94d0ce6006 wrapping the line to less than 80 characters 2018-02-01 14:16:38 -08:00
Jainin Shah 0d99f256de changes after running go fmt 2018-02-01 12:07:05 -08:00
Jainin Shah 04c14b3cb2 add a flag for cpu_hard_limit 2018-02-01 10:09:12 -08:00
Chelsea Holland Komlo d889e471a2 fix up linting 2018-02-01 12:26:38 -05:00
Chelsea Holland Komlo b54203eddc add detected to more drivers where the driver is found but unusable 2018-02-01 11:28:17 -05:00
Michael Schurter 0ac43a7622 Skip QEMU graceful shutdown test except on Travis
Hopefully we can reuse the SkipSlow helper elsewhere.
2018-01-31 15:47:26 -08:00
Chelsea Holland Komlo b8e8064835 code review fixup 2018-01-31 18:34:03 -05:00
Michael Schurter 24d060bbb4 Test graceful shutdown
Uses an Alpine image which supports ACPI poweroff signal handling.
Handling is only enabled after the VM has booted, so this test blocks
until sshd starts before issuing the command.
2018-01-31 15:05:02 -08:00
Wim db3bdfe898 * Change use_ipv6_address to advertise_ipv6_address.
* Set autoadvertise to true.
* Update documentation.
2018-02-01 00:01:25 +01:00
Chelsea Holland Komlo 7b53474a6e add applicable boolean to fingerprint response
public fields and remove getter functions
2018-01-31 13:21:45 -05:00
Michael Schurter cc54e36f91
Merge pull request #3798 from simar7/qemu-graceful-shutdown-bug
[QEMU] Fixing an unintentional variable shadowing
2018-01-30 17:43:44 -08:00
Michael Schurter c662cc0172
Merge pull request #3773 from mikemccracken/2018-01-18/destroy-container-on-err
lxc: cleanup partially configured containers after errors in Start
2018-01-30 14:52:29 -08:00
Chelsea Holland Komlo 9482c322b7 locks for fingerprint reads/writes 2018-01-30 11:32:45 -05:00
Wim 76f09db067 Service registration for IPv6 docker addresses 2018-01-30 17:07:47 +01:00
Alex Dadgar 3ad5916f72
Merge pull request #3799 from mikemccracken/2018-01-25/lxc-log-outside-container
lxc: move lxc log file out of container-visible alloc dir
2018-01-29 14:32:22 -08:00
Chelsea Holland Komlo 14147c8496 remove attributes from periodic fingerprints when state changes
write test for client periodic fingerprinters
2018-01-29 13:48:54 -05:00
Alex Dadgar 3d28774f74
Merge pull request #3802 from filipochnik/docker-readonly-rootfs
Add ReadonlyRootfs option to the Docker driver
2018-01-29 09:47:27 -08:00
Indradhanush Gupta 7db4ee1122 rkt_test.go: Remove underscore from variable names 2018-01-29 11:39:50 +01:00
Filip Ochnik 80a17ee8dd Add ReadonlyRootfs option to the Docker driver 2018-01-27 14:38:29 +01:00
Chelsea Holland Komlo 7c19de797c create safe getters and setters for fingerprint response 2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo 896d6f8058 fixups from code review 2018-01-26 07:04:32 -05:00
Simarpreet Singh ac720b84f0
qemu: Make the driver debugging output more indicative
Signed-off-by: Simarpreet Singh <simar@linux.com>
2018-01-25 16:40:16 -08:00
Simarpreet Singh 8b058f7570
qemu: Fix unintentional shadowing of monitorPath variable
Signed-off-by: Simarpreet Singh <simar@linux.com>
2018-01-25 16:24:10 -08:00
Michael McCracken 09c9ca23f5 lxc: move lxc log file out of container-visible alloc dir
The LXC runtime's log file is currently written to TaskDir.LogDir,
which is mounted as alloc/logs inside the containers in the task
group.

This file is not intended to be visible to containers, and depending
on the log level, may have information about the host that a container
should not be allowed to see.

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-25 14:41:37 -08:00
Michael McCracken 88e3063717 fix speling in log
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-25 13:56:14 -08:00
Chelsea Holland Komlo 3d38868b88 add test case for available cgroups 2018-01-25 06:08:07 -05:00
Chelsea Holland Komlo 9a8344333b refactor Fingerprint to request/response construct 2018-01-24 11:54:02 -05:00
Michael McCracken f8fe2ea8cb review cleanup
don't export an internal function, and simplify some code

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-23 15:03:09 -08:00
Alex Dadgar a43e0a7b08 Allow overriding an image's entrypoint in Docker
Fixes https://github.com/hashicorp/nomad/issues/2219
2018-01-23 14:05:00 -08:00
Alex Dadgar 98a03ad689
Merge pull request #3754 from filipochnik/docker-caps
Add an option to add and drop capabilities in the Docker driver
2018-01-23 12:02:50 -08:00
Chelsea Komlo d09cc2a69f
Merge pull request #3492 from hashicorp/f-client-tls-reload
Client/Server TLS dynamic reload
2018-01-23 05:51:32 -05:00
Filip Ochnik 4abd269a68
Merge branch 'master' into docker-caps 2018-01-21 12:18:22 +01:00
Filip Ochnik 558812350d Finish implementation of the capabilities whitelist 2018-01-21 12:14:24 +01:00
Michael McCracken 00dcfa6db9 lxc: cleanup partially configured containers after errors in Start
If there are any errors in container setup after c.Create() in
Start(), the container will be left around, with no way to clean it up
because the handle will not be created or returned from Start.

Added a wrapper that checks for errors and performs appropriate
cleanup. Returning a cleanup function from a wrapped function instead
of just doing the cleanup before returning the error helps to ensure
that future changes that might add or change error exits can't forget
to consider a cleanup function.

Adds a check to the invalid config test case to check that a container
created with an invalid config doesn't get left behind.

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 16:03:03 -08:00
Michael Schurter 38182bebea Drop log level to TRACE
For people not using driver networks these log lines would just be
confusing.
2018-01-18 15:35:24 -08:00
Michael Schurter 9d410c88a7 Improve driver network logging 2018-01-18 15:35:24 -08:00
Michael Schurter 583e17fad5 Always advertise driver IP when in driver mode
Fixes #3681

When in drive address mode Nomad should always advertise the driver's IP
in Consul even when no network exists. This matches the 0.6 behavior.

When in host address mode Nomad advertises the alloc's network's IP if
one exists. Otherwise it lets Consul determine the IP.

I also added some much needed logging around Docker's network discovery.
2018-01-18 15:35:24 -08:00
Michael McCracken 70817f728c lxc_test: add test for contents of file in bind-mounted dir
Ensure that bind mounting via the volumes config really did work.

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 05:36:45 -08:00
Michael McCracken fd44bdee37 Simplify with gofmt -s
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken f176e02a64 lxc: add tests for volume support
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken c78c00a2d2 lxc: Add config flag to disable volume support
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken d694a8921f Add volumes config to LXC driver
Allow lxc driver to accept bind mount config similarly to the docker
driver.

Includes some static sanity checks in Validate step

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Chelsea Holland Komlo 649f86f094 refactor creating a new tls configuration 2018-01-16 08:02:39 -05:00
Chelsea Holland Komlo 6c9f9c8ac3 adding additional test assertions; differentiate reloading agent and http server 2018-01-16 07:34:39 -05:00
Filip Ochnik 4eeb552a4f Add a sketch of capabilities whitelist logic for the Docker driver 2018-01-14 20:01:47 +01:00
Filip Ochnik 8ee3ce7a26 Add an option to add and drop capabilities in the Docker driver 2018-01-14 19:56:57 +01:00
Alex Dadgar bec9a72eec Remove networking from basic resources 2018-01-12 14:33:42 -08:00
Charlie Voiselle 867bb6f7f9 Found more priviledge.
priviledge -> privilege
2018-01-12 09:44:53 -05:00
Alex Dadgar 9e1e04c6f1
Merge pull request #3727 from filipochnik/fix-gh-2832
Recognize renewing non-renewable Vault lease as fatal
2018-01-10 11:47:10 -08:00
Michael Schurter 189ce7f991
Merge pull request #3723 from hashicorp/b-3702-chown-dirs
chown dirs when migrating ephemeral_disk data
2018-01-09 09:27:26 -08:00
Michael Schurter e6c27256b7 Test streamed directory ownership 2018-01-08 16:00:07 -08:00
Michael Schurter 2c79ffb213 chown dirs when migrating ephemeral_disk data
Fixes #3702

Added missing chown call and made it conditional on running as root and
not on Windows as we do with files.
2018-01-08 15:31:12 -08:00
Charlie Voiselle 1bb1ab5069 fix typo
Priviledge -> privilege
2018-01-08 15:56:07 -05:00
Chelsea Holland Komlo 214d128eb9 reload raft transport layer
fix up linting
2018-01-08 14:52:28 -05:00
Filip Ochnik d265e11c36 Recognize renewing non-renewable Vault lease as fatal 2018-01-08 20:32:31 +01:00
Chelsea Holland Komlo 0708d34135 call reload on agent, client, and server separately 2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo 9741097406 reloading tls config should be atomic for clients/servers 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo ae7fc4695e fixups from code review
Revert "close raft long-lived connections"

This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.

reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo acd3d1b162 fix up downgrading client to plaintext
add locks around changing server configuration
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo c0ad9a4627 add ability to upgrade/downgrade nomad agents tls configurations via sighup 2018-01-08 09:21:06 -05:00
Michael Schurter ef76c65da1 Lookup euid outside of loop 2017-12-13 11:50:12 -08:00
Michael Schurter 5032bf4f5a Skip tests that require root when not root
Also skip Chown on allocdir migration on Windows and when non-root.
Windows doesn't support it, and it will always fail as a non-root user.
2017-12-12 16:58:27 -08:00
Alex Dadgar f0b0697b57 Keyify struct 2017-12-11 17:23:14 -08:00
Michael Schurter c4d4ead199 Fix test broken by mock updates 2017-12-08 16:45:25 -08:00
Michael Schurter 4b20441eef Validate port label for host address mode
Also skip getting an address for script checks which don't use them.

Fixed a weird invalid reserved port in a TaskRunner test helper as well
as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause
other tests to fail, but we were referencing an invalid PortLabel and
just not catching it before.
2017-12-08 12:03:43 -08:00
Michael Schurter 30dd570061 Fix interpolation bug with service/check updates
Previously if only an interpolated variable used in a service or check
was changed we interpolated the old and new services and checks with the
new variable, so nothing appeared to have changed.
2017-12-08 12:03:00 -08:00
Michael Schurter 4347026f83 Test Consul from TaskRunner thoroughly
Rely less on the mockConsulServiceClient because the real
consul.ServiceClient needs all the testing it can get!
2017-12-08 12:03:00 -08:00
Alex Dadgar a0d6b6a121
Merge pull request #3630 from hashicorp/b-periodic
Handle race between fingerprinters and registration
2017-12-07 16:11:13 -08:00
Alex Dadgar 91ffbbb517 Review feedback 2017-12-07 16:10:57 -08:00
Chelsea Komlo c8e0cb3044
Merge pull request #3591 from hashicorp/b-1755-stop
Allow controlling the stop signal for drivers
2017-12-07 17:06:43 -05:00
Alex Dadgar 02baa6c52b Handle race between fingerprinters and registration 2017-12-07 13:09:37 -08:00
Chelsea Holland Komlo 61fa8ad4ba code review fixes 2017-12-07 13:46:25 -05:00
Chelsea Holland Komlo 77ab41124b set default kill signal on executor shutdown 2017-12-07 11:40:15 -05:00
Chelsea Holland Komlo 6cae8fe6e6 extend configurable kill signal to java driver 2017-12-07 11:40:10 -05:00
Alex Dadgar 4409fdacc0 Drop trace logging 2017-12-06 18:02:24 -08:00
Alex Dadgar cd9a7f14b8 Add logging around heartbeats 2017-12-06 17:57:50 -08:00
Chelsea Holland Komlo 350319239c change location of default kill signal 2017-12-06 17:48:25 -05:00
Chelsea Holland Komlo 7dfb64f941 extract signal helper into utils 2017-12-06 14:36:44 -05:00
Chelsea Holland Komlo b08611cfac move kill_signal to task level, extend to docker 2017-12-06 14:36:39 -05:00
Chelsea Holland Komlo 80de7d5ebd allow controlling the stop signal in exec/raw_exec 2017-12-06 11:28:45 -05:00
Chelsea Komlo 9ae849e09c
Merge pull request #3612 from hashicorp/docker-rkt-user
Set user for rkt tasks
2017-12-05 17:45:08 -05:00
Michael Schurter b66aa5b7f6
Merge pull request #3563 from hashicorp/b-snapshot-atomic
Atomic Snapshotting / Sticky Volume Migration
2017-12-05 09:16:33 -08:00
Chelsea Holland Komlo 4463dc607e fix up test 2017-12-05 10:12:40 -05:00
Chelsea Holland Komlo 7284f2385a remove unused user option 2017-12-04 18:01:31 -05:00
Michael Schurter 6ccc4219d3
Merge pull request #3615 from hashicorp/b-rkt-host-ports
rkt: Don't require port_map with host networking
2017-12-04 14:49:42 -08:00
Chelsea Holland Komlo 7c74968452 add ability to specify user for rkt 2017-12-04 14:21:48 -05:00
Michael Schurter 2bf1d6d85e rkt: Don't require port_map with host networking
Also don't try to return a DriverNetwork with host networking. None will
ever exist as that's the point of host networking: rkt won't create a
network namespace.
2017-12-01 17:23:25 -08:00
Chelsea Holland Komlo 4ee2122536 get KillTimeout in seconds, not nanoseconds 2017-12-01 10:43:00 -05:00
Michael Schurter 5e975bbd0f Add comment and normalize err check ordering
as per PR comments
2017-11-29 17:26:11 -08:00
Michael Schurter d996c3a231 Check for error file when receiving snapshots 2017-11-29 17:26:11 -08:00
Michael Schurter ca946679f6 Destroy partially migrated alloc dirs
Test that snapshot errors don't return a valid tar currently fails.
2017-11-29 17:26:11 -08:00
Michael Schurter 23c66e37c5 Handle errors during snapshotting
If an alloc dir is being GC'd (removed) during snapshotting the walk
func will be passed an error. Previously we didn't check for an error so
a panic would occur when we'd try to use a nil `fileInfo`.
2017-11-29 17:26:11 -08:00
Chelsea Holland Komlo 2208964948 Support StopTimeout for Docker tasksw
Update github.com/fsouza/go-dockerclient
2017-11-29 14:33:05 -05:00
Preetha Appan 6ad65c51e6 Missed assert in one place 2017-11-20 13:04:38 -06:00
Preetha Appan 747bd59daa Better error validation, and added test case for invalid sysctl inputs 2017-11-20 12:07:18 -06:00
Preetha Appan c68973747b Address some review comments 2017-11-20 11:15:09 -06:00
Preetha Appan 39ef9ee76d Fix gofmt warnings 2017-11-18 09:23:09 -06:00
Preetha Appan e53dd15f58 Fix test compilation after rebase 2017-11-17 17:46:04 -06:00
Samuel BERTHE 0fca2e19c8 review(docker driver): sysctls -> sysctl + ulimits -> ulimit 2017-11-17 16:30:45 -06:00
Samuel BERTHE 6c93922cb7 Oops 2017-11-17 16:14:14 -06:00
Samuel BERTHE c8363bc44b 💄 2017-11-17 16:03:22 -06:00
Samuel BERTHE 281ab90484 test(docker driver): testing sysctls and ulimits 2017-11-17 16:03:22 -06:00
Samuel BERTHE b9a10ff7fa feat(docker driver): adds sysctls and ulimits configs 2017-11-17 16:03:22 -06:00
Alex Dadgar 69d3bf7392
Merge pull request #3559 from hashicorp/b-metrics
Don't emit metrics for non-running tasks
2017-11-17 10:33:23 -08:00
Michael Schurter 3845c8d200
Merge pull request #3562 from hashicorp/b-3561-rkt-rm
Remove rkt pods when exiting
2017-11-16 17:30:21 -08:00
Michael Schurter 737fb45640
Merge pull request #3551 from hashicorp/b-3419-docker-409-bug
Fix Docker name conflict bug by updating dockerclient
2017-11-16 16:38:54 -08:00
Michael Schurter 437fce9954 Improve rktRemove error message 2017-11-16 15:45:14 -08:00
Michael Schurter 3ceec0caab Remove rkt pods when exiting
Fixes #3561
2017-11-16 14:33:44 -08:00
Charlie Voiselle 7a231897a5
Merge pull request #3556 from angrycub/f-fingerprint-log-level
Dropped loglevel for AWS fingerprinter env read misses to DEBUG
2017-11-16 16:27:25 -05:00
Charlie Voiselle 969ddf9c2a Lowered to DEBUG from AD feedback 2017-11-16 14:13:03 -05:00
Alex Dadgar 05b1588cea Only publish metric when the task is running and dev mode publishes metrics 2017-11-15 13:21:06 -08:00
Alex Dadgar 07963f0b6d
Merge pull request #3546 from hashicorp/f-heuristic
Better interface selection heuristic
2017-11-15 12:51:21 -08:00
Alex Dadgar 97ec3974a9 Use interface attached to default route 2017-11-15 11:32:32 -08:00
Michael Schurter f86f0bd9ea Handle leader task being dead in RestoreState
Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932

While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchrnous. Nomad can shutdown before all
follower tasks have updated their state to dead thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*

The actual fix is a simple nil check to not assume non-terminal allocs
leader's have a TaskRunner.
2017-11-15 10:36:13 -08:00
Charlie Voiselle 1197637251 Dropped loglevel for AWS fingerprinter env reads
Certain environments use WARN for serious logging; however, it's very
possible to have machines without some of the fingerprinted keys
(public-ipv4 and public-hostname specifcally).  Setting log level to
INFO seems more consistent with this possibility.
2017-11-15 18:20:59 +00:00
Chelsea Komlo 2dfda33703 Nomad agent reload TLS configuration on SIGHUP (#3479)
* Allow server TLS configuration to be reloaded via SIGHUP

* dynamic tls reloading for nomad agents

* code cleanup and refactoring

* ensure keyloader is initialized, add comments

* allow downgrading from TLS

* initalize keyloader if necessary

* integration test for tls reload

* fix up test to assert success on reloaded TLS configuration

* failure in loading a new TLS config should remain at current

Reload only the config if agent is already using TLS

* reload agent configuration before specific server/client

lock keyloader before loading/caching a new certificate

* introduce a get-or-set method for keyloader

* fixups from code review

* fix up linting errors

* fixups from code review

* add lock for config updates; improve copy of tls config

* GetCertificate only reloads certificates dynamically for the server

* config updates/copies should be on agent

* improve http integration test

* simplify agent reloading storing a local copy of config

* reuse the same keyloader when reloading

* Test that server and client get reloaded but keep keyloader

* Keyloader exposes GetClientCertificate as well for outgoing connections

* Fix spelling

* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Schurter 3023336b39 Add a test demonstrating the bug
Fails on Docker 17.09, passes on Docker 17.06 and earlier
2017-11-14 15:25:52 -08:00
Alex Dadgar ee31e15f51 Better interface selection heuristic
This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.

Fixes https://github.com/hashicorp/nomad/issues/3487
2017-11-13 15:13:43 -08:00
Preetha Appan 926c9ed997 Make device mounting unit test verify configuration via docker inspect 2017-11-13 09:56:54 -06:00
Preetha Appan dc2d5fb5a4 Unit test (linux only) that tests mounting a device in the docker driver 2017-11-13 09:56:54 -06:00
Preetha Appan 4834710e45 Add default value for cgroup permissions for device if not set 2017-11-13 09:56:54 -06:00
Preetha Appan 9cdee6991c Remove unnecessary check since validate method already checks this 2017-11-13 09:56:54 -06:00
Preetha Appan 110c1fd4f0 Add support for passing device into docker driver 2017-11-13 09:56:54 -06:00
Alex Dadgar d1358ec1b6 alway load all templates 2017-11-10 12:35:51 -08:00
Alex Dadgar a3ea0c17a0 Handle multiple environment templates
Fixes https://github.com/hashicorp/nomad/issues/3498
2017-11-10 11:08:19 -08:00
Alex Dadgar b3edc12dd9
Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown
Qemu driver: graceful shutdown feature
2017-11-03 16:41:34 -07:00
Michael Schurter 690b8f4cfb Remove noisy log line
Didn't mean to commit this
2017-11-03 16:00:30 -07:00
Matt Mercer 11e2870875 Qemu driver: clean up logging; fail unsupported features on Windows 2017-11-03 15:40:20 -07:00
Alex Dadgar 6034916ad1 fix spelling mistake 2017-11-03 15:04:59 -07:00
Alex Dadgar a23033932a
Merge pull request #3459 from multani/docker-oom-notification
docker: log that a container has been killed by the OOM killer
2017-11-03 13:24:03 -07:00
Matt Mercer cef9ba9770 Qemu driver: tweaks in response to PR feedback
Remove attribute for long qemu monitor path; misc cleanup; update tests
2017-11-03 11:28:56 -07:00
Preetha Appan 0eaef09675 Remove event GenericSource, and address other code review comments. Also added deprecation info in comments. 2017-11-03 10:10:06 -05:00
Preetha Appan 5f09c968b3 Move logic for determinic event display message to task_runner, added two new fields DisplayMessage and Details. 2017-11-03 09:13:01 -05:00
Alex Dadgar b4af10edde Alloc Runner doesn't panic on restoration. 2017-11-02 16:14:13 -07:00
Alex Dadgar abd28cbd7d
Merge pull request #3493 from hashicorp/f-remove-atlas
Remove Atlas and Scada from codebase
2017-11-02 16:00:44 -07:00
Michael Schurter eedbe8efbb
Merge pull request #3490 from hashicorp/f-gc-logging
Make unable-to-gc log level adaptive
2017-11-02 14:32:40 -07:00
Diptanu Choudhury cb68889652 Added the node_id as a tag 2017-11-02 13:29:10 -07:00
Alex Dadgar 701f462d33 remove atlas 2017-11-02 11:27:21 -07:00
Michael Schurter fc33c945be Make unable-to-gc log level adaptive
WARNing when someone has over 50 non-terminal allocs was just too
confusing.

Tested manually with `gc_max_allocs = 10` and bumping a job from `count
= 19` to `count = 21`:

```
2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations
...
2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations
```
2017-11-02 10:57:42 -07:00
Diptanu Choudhury 8a9d0d40b1 Added support for tagged metrics 2017-11-02 10:07:57 -07:00
Diptanu Choudhury 5f522c6de3 Incrementing the start counter when we are actually starting a container 2017-11-02 09:51:20 -07:00
Diptanu Choudhury 44535e5d10 Recording counter for dead allocs properly 2017-11-02 09:51:20 -07:00
Diptanu Choudhury 0b34e811b7 Added metrics to track task/alloc start/restarts/dead events 2017-11-02 09:51:20 -07:00
Matt Mercer 00f90323c2 Qemu driver: defer cleanup sooner 2017-11-01 17:37:43 -07:00
Matt Mercer 43256af5f3 Qemu driver: clean up test logging; retry integration test for longer 2017-11-01 17:21:56 -07:00
Matt Mercer b1145705d3 Use strings.Replace() instead of custom function 2017-11-01 15:31:35 -07:00
Matt Mercer d51d174fa0 Qemu driver: basic testing of graceful shutdown feature 2017-11-01 15:31:30 -07:00
Matt Mercer c26013ea0b Qemu driver: include PIDs in log output 2017-11-01 15:31:24 -07:00
Matt Mercer 38d9a391aa Qemu driver: ensure proper cleanup of resources 2017-11-01 15:31:20 -07:00
Matt Mercer 46f7e2fa4c Qemu driver: minor logging fixes 2017-11-01 15:31:14 -07:00
Matt Mercer 4afb9dfa2d Standardize driver.qemu logging prefix 2017-11-01 15:30:44 -07:00
Matt Mercer 5127e75569 Qemu driver: add graceful shutdown feature 2017-11-01 15:30:36 -07:00
Michael Schurter 1769db98b7 Fix regression by returning error on unknown alloc 2017-11-01 15:16:38 -05:00
Michael Schurter 9f26b9a403 Fix race in test 2017-11-01 15:16:38 -05:00
Michael Schurter 73e9b57908 Trigger GCs after alloc changes
GC much more aggressively by triggering GCs when allocations become
terminal as well as after new allocations are added.
2017-11-01 15:16:38 -05:00
Michael Schurter 2a81160dcd Fix GC'd alloc tracking
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.

Also stops logging "marked for GC" twice.
2017-11-01 15:16:38 -05:00
Alex Dadgar c710550551 fix test 2017-10-30 12:35:31 -07:00
Alex Dadgar 4831380e57 Node access is done using locked Node copy
Fixes https://github.com/hashicorp/nomad/issues/3454

Reliably reproduced the data race before by having a fingerprinter
change the nodes attributes every millisecond and syncing at the same
rate. With fix, did not ever panic.
2017-10-27 13:27:24 -07:00
Jonathan Ballet 5429d1c656 docker: changed OOM killed error message 2017-10-27 20:30:52 +02:00
Jonathan Ballet 12615bde9c docker: log that a container has been killed by the OOM killer
Fix: #2203 (at least for Docker tasks)
2017-10-27 18:05:27 +02:00
Alex Dadgar f117eb28c7 go style vars 2017-10-25 10:49:34 -07:00
Alex Dadgar 3f8495dd0e fix two flaky tests 2017-10-23 18:15:52 -07:00
Alex Dadgar cb0d0ef009 move to consul freeport implementation 2017-10-23 16:51:40 -07:00
Alex Dadgar dbc014b360 Standardize retrieving a free port into a helper package 2017-10-23 16:48:20 -07:00
Alex Dadgar 4a69e1ad15 don't double parallel 2017-10-23 16:48:06 -07:00
Alex Dadgar 96ca2bbe4c respond to comments 2017-10-23 15:50:27 -07:00
Alex Dadgar 99c81b5848 Skip if no docker 2017-10-19 16:55:10 -07:00
Alex Dadgar 593536664e fix flaky java tests 2017-10-19 16:49:57 -07:00
Alex Dadgar 4bc452b479 Undo darwin user setting 2017-10-19 16:49:57 -07:00
Alex Dadgar c7c6964313 Run as user on mac 2017-10-19 16:49:57 -07:00
Alex Dadgar 55a1dffa2f sudo docker works 2017-10-19 16:49:57 -07:00
Alex Dadgar 805e7b3b62 docker tests 2017-10-19 16:49:57 -07:00
Michael Schurter 797f49702e Add logging around moby/moby#32648 bug 2017-10-18 10:44:03 -07:00
Michael Schurter 22ac450b2f Properly fail rkt fingerprinting on old vesions 2017-10-16 13:58:58 -07:00
Michael Schurter d7732c1a58 Squelch repeated rkt version warnings 2017-10-16 12:09:47 -07:00
Michael Schurter b5fd075d74 Test fixes from #3383 2017-10-13 15:45:35 -07:00
Michael Schurter b63eee17e9 Merge pull request #3383 from hashicorp/b-migrate-token
base64 migrate token
2017-10-13 13:46:54 -07:00
Michael Schurter dfd2967cdb Merge pull request #3376 from hashicorp/f-node-acls
Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc
2017-10-13 11:51:48 -07:00
Michael Schurter 15b991e039 base64 migrate token
HTTP header values must be ASCII.

Also constant time compare tokens and test the generate and compare
helper functions.
2017-10-13 10:59:13 -07:00
Alex Dadgar 85178d6048 rkt remove allocid 2017-10-13 10:07:50 -07:00