Commit Graph

833 Commits

Author SHA1 Message Date
Michael Schurter 5794e5ece7 Use int32 for atomic ops to avoid alignment issues
From https://golang.org/pkg/sync/atomic/#pkg-note-BUG :

On both ARM and x86-32, it is the caller's responsibility to arrange for
64-bit alignment of 64-bit words accessed atomically. The first word in
a global variable or in an allocated struct or slice can be relied upon
to be 64-bit aligned.
2017-08-04 10:14:16 -07:00
Chelsea Holland Komlo 111019642b resources are expected by state store to be plural 2017-08-04 15:07:34 +00:00
Chelsea Holland Komlo b2b4c5d7af add documentation
extract magic number into variable
2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo b6f459a7a9 refactor rpc endpoint and tests
add test for when no prefixes are matched

add test for no context at HTTP api
2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo 7b7a80d6ce resources list endpoint accepts http POST and PUT
set the index for a resources response
2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo 312bb19e1c refactor and add error handling for invalid context type 2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo ef5a62aaa1 use upsert as a test helper 2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo 0026a308ec add truncation boolean to response 2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo 06f10d2e5f adds evaluations
makes context singular
2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo 92cc40f1de change resources endpoint from http get to post 2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo 64a83edd9f remove resourceliststub, no need for another layer of abstraction 2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo 36285a4b23 test resources endpoint will return matching prefixes 2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo ecd0d85a4e adding test validation that received resources matches requested 2017-08-04 14:34:25 +00:00
Chelsea Holland Komlo 4dd6b46198 Retrieve job information for resources endpoint
requires further refactoring and logic for more contexts
2017-08-04 14:34:25 +00:00
Alex Dadgar 067a638478 Allow template to set Vault grace
This PR allows a template to specify the Vault grace duration.

Fixes https://github.com/hashicorp/nomad/issues/2922
2017-08-01 14:14:08 -07:00
James Nugent bf13a0cd90 meta: Fix goimports for command/agent/syslog.go 2017-07-30 08:56:40 -05:00
Michael Schurter d2f8fdcad5 Fix comment 2017-07-25 12:13:05 -07:00
Michael Schurter 3e6231842d Forgot to setcmdenv
This would leak a consul agent
2017-07-25 12:09:57 -07:00
Michael Schurter 4b83eba599 Use seen more conservatively 2017-07-24 16:48:40 -07:00
Michael Schurter cdf138eb27 Always increment failures...
...as it's used in calculating the backoff
2017-07-24 15:37:53 -07:00
Michael Schurter 809724ad8d Track whether Consul has ever been seen
Need a way to squelch Consul operation errors on shutdown. If it's never
been seen don't log errors about deregs failing.
2017-07-24 12:12:02 -07:00
Michael Schurter edbe62a879 Synchronously deregister agent on shutdown
Fixes #2891

Previously the agent services and checks were being asynchrously
deregistered on shutdown, so it was a race between the sync goroutine
deregistering them and Nomad shutting down.

This switches to synchronously deregister agent serivces and checks
which doesn't really have a downside since the sync goroutines retry
behavior doesn't help on shutdown anyway.
2017-07-24 11:40:37 -07:00
Alex Dadgar 553bc91725 Parallel client tests (#2890)
* alloc_runner

* Random tests

* parallel task_runner and no exec compatible check

* Parallel client

* Fail fast and use random ports

* Fix docker port mapping

* Make concurrent pull less timing dependant

* up parallel

* Fixes

* don't build chroots in parallel on travis

* Reduce parallelism on travis with lxc/rkt

* make java test app not run forever

* drop parallelism a little

* use docker ports that are out of the os's ephemeral port range

* Limit even more on travis

* rkt deadline
2017-07-22 19:04:36 -07:00
Alex Dadgar 4dd5d943c7 remove root requirement on consul integration check 2017-07-21 19:32:41 -07:00
Alex Dadgar 56f9cf86df Speed up client startup 2017-07-20 22:34:24 -07:00
Alex Dadgar c106df9215 Switch to in-process agent 2017-07-20 21:07:32 -07:00
Alex Dadgar d019eed363 Merge pull request #2874 from hashicorp/f-command-agent-tests
Parallelize the command/agent tests and add new test agent
2017-07-20 20:27:49 -07:00
Alex Dadgar 5df9be0ccb Fix bootstrapping and waiting 2017-07-20 20:15:37 -07:00
Alex Dadgar 4e90d56098 More parallel 2017-07-20 09:36:34 -07:00
Alex Dadgar 9037693436 New test agent 2017-07-19 22:14:36 -07:00
Alex Dadgar 9a2a5af608 Don't print atlas 2017-07-19 20:25:06 -07:00
Alex Dadgar 18c7043a30 Merge pull request #2866 from hashicorp/f-autocomplete-agent
Agent command autocompletes to hcl/json files
2017-07-19 13:18:32 -07:00
Michael Schurter c9e4c041b3 Too lazy to remember the right formatter for floats 2017-07-19 11:53:18 -07:00
Alex Dadgar b4b50b636f Fix predictor 2017-07-19 11:51:01 -07:00
Alex Dadgar 871cdcb932 Agent command autocompletes to hcl/json files 2017-07-19 11:28:16 -07:00
Michael Schurter 40c2d4e5eb Merge pull request #2858 from hashicorp/b-2849-deploy-json
Implement -json for job deployments
2017-07-19 10:15:01 -07:00
Michael Schurter 125a3fb2f9 Error -> Errof 2017-07-19 10:00:57 -07:00
Alex Dadgar 747d67eb3f Allow tuning of heartbeat ttls
This PR allows tuning of heartbeat TTLs. An example of very aggressive
settings is as follows:

```
server {
  heartbeat_grace = "1s"
  min_heartbeat_ttl = "1s"
  max_heartbeats_per_second = 200.0
}
```
2017-07-19 09:38:35 -07:00
Michael Schurter 99d1486f32 Never remove unknown agent services
Fixes #2827

This is a tradeoff. The pro is that you can run separate client and
server agents on the same node and advertise both. The con is that if a
Nomad agent crashes and isn't restarted on that node in the same mode
its entry will not be cleaned up.

That con scenario seems far less likely to occur than the scenario on
the pro side, and even if we do leak an agent entry the checks will be
failing so nothing should attempt to use it.
2017-07-18 13:23:01 -07:00
Alex Dadgar 45712c6ca3 test fixes 2017-07-07 14:11:27 -07:00
Alex Dadgar bf2dafb8e9 check id method name changed 2017-07-07 12:15:09 -07:00
Alex Dadgar 1cb877699a Disallow update stanza on batch jobs
This PR:
* disallows update stanzas on batch jobs
* undeprecates the stagger field
* changes the way warnings are returned
2017-07-07 12:11:39 -07:00
Alex Dadgar 5457bb7962 Job stability 2017-07-07 12:10:04 -07:00
Alex Dadgar 09dfa2fc10 Rename CreateDeployments and remove cancelling behavior in state_store 2017-07-07 12:10:04 -07:00
Alex Dadgar 067ed86a47 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Alex Dadgar abf34204cc JobVersions returns struct with optional diff 2017-07-07 12:05:57 -07:00
Alex Dadgar c643e6b0d1 Add config options 2017-07-07 12:05:56 -07:00
Alex Dadgar f233629a4f job deployment endpoint + api 2017-07-07 12:05:56 -07:00
Alex Dadgar 580eed5c88 HTTP Endpoints 2017-07-07 12:03:11 -07:00
Michael Schurter cab28b2963 Fix api endpoint test 2017-07-06 10:45:44 -07:00
Michael Schurter 0d3bdf7210 Add support for go-getter modes
Fixes #2678
2017-07-06 10:45:44 -07:00
Michael Schurter b9c9e6e557 Fix no_host_uuid parsing
Need to pointerify it to default to true since we can't tell false from
unset if it's not a pointer.
2017-07-03 17:41:20 -07:00
Michael Schurter 6e7cc3964e Merge pull request #2709 from hashicorp/f-advertise-docker-ips
Advertise driver-specific addresses
2017-07-03 14:04:12 -07:00
Michael Schurter d9e032aabf Merge pull request #2735 from hashicorp/f-no_host_uuid-true
Default no_host_uuid to true instead of false
2017-07-03 13:18:25 -07:00
Alexandre Dantas 100b51ac6a Fixing issue where use_node_name was always been set as false when merging telemetry configurations 2017-07-02 00:31:09 -03:00
Michael Schurter e9c357187c Properly normalize IPv6 addresses
A fix to #2739 instead of forcing IPv6 users to always specify a port as
well.

Prior to this commit IPv6 advertise addresses which lacked a port would
fail instead of having the default port added because
`net.SplitHostPort(someipv6)` returns a different error than
`net.SplitHostPort(someipv4)`.
2017-06-29 10:46:31 -07:00
Michael Schurter a863ead30e Fix test error formats 2017-06-26 12:53:43 -07:00
Michael Schurter e81252ba45 Default no_host_uuid to true instead of false
The host UUID isn't unique in many virtualized cases and of dubious
value even when it is univerally unique. Default to a random UUID.
2017-06-23 16:23:01 -07:00
Michael Schurter 5b59bea67b Move caonicalization from nomad/structs/ to api/ 2017-06-21 17:19:08 -07:00
Michael Schurter 9da78ae25f Remove debug logging 2017-06-21 17:19:08 -07:00
Michael Schurter c0eff81383 Fix Service.AddressMode changes during task updates 2017-06-21 17:19:08 -07:00
Michael Schurter 67d154a274 Test driver network advertisement and checks 2017-06-21 17:19:08 -07:00
Michael Schurter b9bfb84b53 Implement DriverNetwork and Service.AddressMode
Ideally DriverNetwork would be fully populated in Driver.Prestart, but
Docker doesn't assign the container's IP until you start the container.

However, it's important to setup the port env vars before calling
Driver.Start, so Prestart should populate that.
2017-06-21 17:19:08 -07:00
Michael Schurter 95a00cbef1 Fix path used by Nomad Server HTTP Check
Fixes #2701
2017-06-21 10:41:28 -07:00
Michael Schurter ffc2b36dc7 Merge pull request #2636 from hashicorp/f-gc-alloc-limit
Add new gc_max_allocs tuneable
2017-05-30 16:14:09 -07:00
Michael Schurter dd51aa1cb9 Merge pull request #2654 from hashicorp/f-env-consul
Add envconsul-like support and refactor environment handling
2017-05-30 14:40:14 -07:00
Michael Schurter a7e26e0a3e Don't autoadvertise private ip if bind=localhost
A slight improvement to #2399 - if bind is localhost, return an error
instead of advertising a private ip. The advertised ip isn't valid and
will just cause errors on use. It's better to fail with an error message
instructing users how to fix the problem.
2017-05-30 11:47:29 -07:00
Michael Schurter bbf299dde1 Fix config parsing test
Went overboard before I realized there's only one test case.
2017-05-30 11:39:26 -07:00
Michael Schurter 10b6610e56 Functional consul template env file support 2017-05-23 13:45:14 -07:00
Alex Dadgar 4503e2d1f7 Merge pull request #2634 from hashicorp/f-update-block
New Update block syntax
2017-05-18 13:29:17 -04:00
Michael Schurter 06f937bf28 Merge pull request #2591 from hashicorp/b-2180-script-updates
Properly interpolate services on updated tasks
2017-05-17 09:09:01 -07:00
Michael Schurter 49ce86ee0a Lower default gc_max_allocs to 50 2017-05-12 15:57:27 -07:00
Michael Schurter 5fd438661d Merge pull request #2399 from multani/sockaddr-template
Add support for late binding to IP addresses using go-sockaddr/template
2017-05-11 17:25:03 -07:00
Michael Schurter 0453c2709c Add new gc_max_allocs tuneable
More than gc_max_allocs may be running on a node, but terminal allocs
will be garbage collected to try to keep the total number below the
limit.
2017-05-11 17:18:02 -07:00
Alex Dadgar 919c50ba5c Merge branch 'master' into f-update-block 2017-05-11 13:08:31 -07:00
Alex Dadgar ac2afece53 Fix truncate test 2017-05-11 13:05:53 -07:00
Alex Dadgar 6232b66ea7 Thread through warnings about deprecations 2017-05-09 20:52:47 -07:00
Alex Dadgar ba70cc4f01 Merge branch 'master' into f-bolt-db 2017-05-09 11:11:55 -07:00
Alex Dadgar 10b040aea3 New update block; still need to handle the upgrade path 2017-05-08 17:44:26 -07:00
Michael Schurter 85210eb92f Update consul/api to support unix socket addrs
Fixes #2594
2017-05-08 11:57:04 -07:00
Michael Schurter f350c1f37e Merge pull request #2608 from hashicorp/f-test-verify_https_client
Test verify_https_client behavior and skip Consul HTTPS health checks when enabled
2017-05-04 17:36:13 -07:00
Michael Schurter 749406e50b Remove extra Travis logging 2017-05-04 17:35:54 -07:00
Michael Schurter 24c8434368 Adding logging for Travis 2017-05-03 15:18:48 -07:00
Alex Dadgar 2d54ee2925 Fix tests 2017-05-03 15:14:19 -07:00
Michael Schurter 4dc897a664 Don't reuse transport/client 2017-05-03 13:26:55 -07:00
Kate Taggart 2fb6301b37 responding to feedback on PR: remove Region from Node struct, some grammatical niceties. 2017-05-03 12:45:59 -07:00
Kate Taggart af22cb722e I think I did it. 2017-05-03 12:45:59 -07:00
Alex Dadgar 9faa98e13b Fix tests 2017-05-03 12:38:49 -07:00
Michael Schurter c4ab8211b6 Skip https health check if verify_https_client is true 2017-05-03 12:19:02 -07:00
Michael Schurter 66aea59650 Fix error check for consul skip tls verify support 2017-05-02 17:38:18 -07:00
Michael Schurter d42bad098a Extensively test verify_https_client behavior
verify_https_client support added in #2587
2017-05-02 16:48:16 -07:00
Michael Schurter b6e97d8523 Merge pull request #2587 from weargoggles/patch-1
Verification options for TLS
2017-05-02 10:36:41 -07:00
Alex Dadgar d779defe65 Use batching 2017-05-01 14:50:34 -07:00
Alex Dadgar bddedd7aba Don't deepcopy job when retrieving copy of Alloc
This PR removes deepcopying of the job attached to the allocation in the
alloc runner. This operation is called very often so removing reflect
from the code path and the potentially large number of mallocs need to
create a job reduced memory and cpu pressure.
2017-05-01 14:50:34 -07:00
Pete Wildsmith f64f37317e update test 2017-04-30 15:40:04 +01:00
Pete Wildsmith 642fbd2f56 address feedback 2017-04-29 08:26:12 +01:00
Pete Wildsmith 1b8a1614ca reduce to one configuration option
There should be just one option, verify_https_client, which
controls incoming and outgoing validation for the HTTPS wrapper
2017-04-28 10:45:09 +01:00
Pete Wildsmith 829d9e60b9 fix config parse test 2017-04-26 21:13:54 +01:00
Michael Schurter cafefa049b Properly interpolate services on updated tasks
Previously was interpolating the original task's services again.

Fixes #2180

Also fixes a slight memory leak in the new consul agent. Script check
handles weren't being deleted after cancellation.
2017-04-26 11:22:01 -07:00
Pete Wildsmith 1e6694c5c1 Verification options allowed in TLS config 2017-04-25 23:35:47 +01:00
Pete Wildsmith 3070d5ab9d Copy TLSConfig verification flags in server create 2017-04-25 23:33:12 +01:00
Alex Dadgar 65945a09fa Agent test 2017-04-20 11:14:06 -07:00
Alex Dadgar b3d7175e52 Agent revert 2017-04-20 11:14:06 -07:00
Michael Schurter 8926743106 Fix consul test build on Windows 2017-04-19 16:14:11 -07:00
Michael Schurter 83f9591d75 Thanks go vet! 2017-04-19 13:05:41 -07:00
Michael Schurter e997ae44a5 Skip checks with TLSSkipVerify if it's unsupported
Fixes #2218
2017-04-19 12:45:34 -07:00
Michael Schurter 45a8635ea2 Add TLSSkipVerify support to api and parser 2017-04-19 12:45:34 -07:00
Michael Schurter cdb7d2ebb6 Use go-version instead of manual version parsing 2017-04-19 12:42:48 -07:00
Michael Schurter 4910f867e7 Use spiffy new Go 1.8 subtest feature 2017-04-19 12:42:48 -07:00
Michael Schurter 3e8dd386ee Forgot an important word 2017-04-19 12:42:48 -07:00
Michael Schurter 947e31e9c2 Only register HTTPS agent check when Consul>=0.7.2
Support for TLSSkipVerify in other checks coming soon!
2017-04-19 12:42:48 -07:00
Michael Schurter 383f21559e Explain weird timer logic 2017-04-19 12:42:48 -07:00
Michael Schurter 4db840b54e Metricsify new Consul client 2017-04-19 12:42:48 -07:00
Michael Schurter 59c687c940 Always fail script checks when deadline exceeded 2017-04-19 12:42:48 -07:00
Michael Schurter ca29bb2cac Test script check exit codes 2017-04-19 12:42:47 -07:00
Michael Schurter 6f6431ce0b Follow _testing.go convention for testing tools 2017-04-19 12:42:47 -07:00
Michael Schurter 2a11508cc8 Rework to account for ports not being in IDs
Previous implementation assumed all struct fields were included in
service and check IDs. Service IDs never include port labels and check
IDs *optionally* include port labels, so lots of things had to change.

Added a really big test to exercise this.
2017-04-19 12:42:47 -07:00
Michael Schurter 6e0fd86361 Remove commits return value
...and still protect against leaking agent entries in Consul on
shutdown.
2017-04-19 12:42:47 -07:00
Michael Schurter fb52bbfa45 Explain PortLabel handling in RegisterAgent 2017-04-19 12:42:47 -07:00
Michael Schurter 2728216f6f Plumb alloc id + task name into script check logs 2017-04-19 12:42:47 -07:00
Michael Schurter 01b20cbb59 Stop being lazy and just type out struct{}{} 2017-04-19 12:42:47 -07:00
Michael Schurter 5e908dbf38 Use nifty testtask sleep command for xplat compat 2017-04-19 12:42:47 -07:00
Michael Schurter 8c433cfba1 Add comments, clarify names, fix PR comments 2017-04-19 12:42:47 -07:00
Michael Schurter 40b3987e90 Explain cleanup defer in test 2017-04-19 12:42:47 -07:00
Michael Schurter 7c8ad71da5 Fix comment to reflect reality 2017-04-19 12:42:47 -07:00
Michael Schurter 745ad9521f Move ScriptExecutor to driver 2017-04-19 12:42:47 -07:00
Michael Schurter 926d141532 Fix shutdown when consul is down 2017-04-19 12:42:47 -07:00
Michael Schurter 87b28cfb75 Switch ServiceClient to synchronizing state
Previously it applied a stream of operations. Reconciling state is less
complex and error prone at the cost of slightly higher CPU/memory usage.
2017-04-19 12:42:47 -07:00
Michael Schurter 244251490a Add UpdateTask method instead of Remove/Add 2017-04-19 12:42:47 -07:00
Michael Schurter 8118c41931 Remove some lies 2017-04-19 12:42:47 -07:00
Michael Schurter 0143bef8c7 Remove unused syncInterval 2017-04-19 12:42:47 -07:00
Michael Schurter e204a287ed Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers - other than qemu - now support an Exec method for executing
abritrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar e7b128271f Diff code fixes 2017-04-16 16:54:40 -07:00
Alex Dadgar 3145086a42 non-purge deregisters 2017-04-15 17:08:05 -07:00
Alex Dadgar 45ad95e862 Agent API + api package 2017-04-15 17:08:05 -07:00
Alex Dadgar f97664512b Upsert Job Histories 2017-04-15 17:08:05 -07:00
Adam Stankiewicz 4daf4cb8c9
Remove unnecessary parameter from NewHTTPServer 2017-04-10 16:24:49 +02:00
Alex Dadgar 81b78f77e1 Track task start/finish time & improve logs errors
This PR adds tracking to when a task starts and finishes and the logs
API takes advantage of this and returns better errors when asking for
logs that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar 40d0f3ce7d Flush logs on err 2017-03-14 14:48:39 -07:00
Alex Dadgar 177bd14718 rename cpu_total_compute and docs 2017-03-14 14:15:49 -07:00
Alex Dadgar a1a7941dec Various fixes
This PR:
* Uses Go 1.8 executable lookup
* Stores any err message from stats init method
* Allows overriding of Cpu Compute for hosts where it can't be detected
2017-03-14 12:56:31 -07:00
Jonathan Ballet cfaa2e9a1a Allow to advertise 127.0.0.1 in non-dev mode if explicitly configured 2017-03-13 23:05:06 +01:00
Jonathan Ballet 562deb6262 Default to private IP advertise address in non-dev mode 2017-03-13 23:01:11 +01:00
Jonathan Ballet 3c5c49bedb Parse template before splitting host/port
Ref: a33af1ca0b (r105444568)
2017-03-13 21:40:37 +01:00
Alex Dadgar 70e4feb045 Limit parallelism during garbage collection
This PR introduces a parallelism limit during garbage collection. This
is used to avoid large resource usage spikes if garbage collecting many
allocations at once.
2017-03-10 16:27:00 -08:00
Alex Dadgar 3b9bdfef1c Make validate work without a Nomad agent 2017-03-03 15:02:03 -08:00
Alex Dadgar 8827b4f4d0 Fix canonicalization of services 2017-03-01 15:30:01 -08:00
Alex Dadgar 5be806a3df Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the missed vet changes.

It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar 6910678c21 Allow random UUID 2017-02-27 13:42:37 -08:00
Alex Dadgar 6afcba9e22 Allow specification of eval/job gc threshold 2017-02-27 11:58:10 -08:00