Commit Graph

1676 Commits

Author SHA1 Message Date
Alex Dadgar e1d68f3e9b Docker host network test 2016-07-12 10:59:34 -06:00
Alex Dadgar 9085b5ca0d Merge pull request #1405 from novilabs/delay-on-startup-failure
do not fail for multiple startup failures, delay instead
2016-07-12 09:51:40 -06:00
Cameron Davison d7fe56ecf3 test policy delay for startup error 2016-07-11 20:54:36 -05:00
Cameron Davison dd314ea06e if policy mode is delay, do not fail for multiple startup failures, delay instead 2016-07-11 20:40:53 -05:00
Diptanu Choudhury ed3be34105 Introduced an env var for rkt tests 2016-07-11 15:48:16 -06:00
Diptanu Choudhury 4e0b621ffa Skipping travis tests and not installing rkt on travis 2016-07-11 15:10:09 -06:00
Sean Chittenden a9b3f5e552
Alpha-sort the build platforms 2016-07-11 12:23:46 -07:00
Sean Chittenden 267198742f
Merge branch 'master' into f-resource-isolation-fingerprinter 2016-07-11 12:23:09 -07:00
Sean Chittenden d309649ada
Darwin currently has allocdir support.
Pointed out by: @dadgar
2016-07-11 12:19:17 -07:00
Sean Chittenden 20d87f1782
Remove cgroup fingerprinter from non-linux systems.
If someone wants to extend or reuse Cgroup detenction in the future they
can move `cgroup_linux.go` to `cgroup.go` and add the relevant build
tags.

Requested by: @dadgar
2016-07-11 12:16:56 -07:00
Diptanu Choudhury e2909db9ef Merge pull request #1388 from novilabs/support-docker-syslog-unixformat-and-defaultformat
Support docker syslog unixformat and defaultformat
2016-07-11 11:17:30 -07:00
Alex Dadgar f11b1ce079 Get windows to build 2016-07-11 11:52:41 -06:00
Alex Dadgar e9ffadfdc6 initial comments 2016-07-11 10:58:18 -06:00
Sean Chittenden 9966169596
Merge branch 'f-resource-isolation-cleanup' into f-resource-isolation-fingerprinter 2016-07-11 00:10:21 -07:00
Sean Chittenden d20c5fc327 Merge pull request #1402 from hashicorp/f-resource-isolation-cleanup
Resource isolation cleanup
2016-07-11 02:09:35 -05:00
Sean Chittenden be272168c7
Rename resourceContainer{,Context} and resCon{,Ctx}. 2016-07-11 00:02:55 -07:00
Sean Chittenden 1c14e01ac0
Add a comment describing IsolationConfig 2016-07-10 23:45:44 -07:00
Sean Chittenden 5e2e5b6ccc
Merge branch 'b-exec-cleanup' into f-resource-isolation-cleanup 2016-07-10 23:41:04 -07:00
Sean Chittenden cf6b97ec6c Merge pull request #1400 from hashicorp/b-exec-cleanup
Initialize the list of available fingerprinters per platform.
2016-07-10 23:22:02 -07:00
Sean Chittenden f39e84b672
Improve readability: use of a switch vs two if's 2016-07-10 20:18:57 -07:00
Sean Chittenden 2ffbeee06c
Skip the network fingerprinter test when offline.
Conditionalize the network fingerprinter test so that it works when a
user is offline.  Similarly, when the network fingerprint test fails in
the future pass a HINT to the user to set an env var to allow the test
to be skipped in the future.
2016-07-10 20:16:06 -07:00
Sean Chittenden 2983bd6fce
Fix test for non-Linux platforms.
The following tests now check a whitelist for whether or not their
driver is present or not, or if the OS is supported or not.

* `TestAllocDir_MountSharedAlloc`
* `TestClient_Drivers_InWhitelist` (`exec` driver)
* `TestClient_Drivers` (`exec` driver)
* `TestJavaDriver_Fingerprint` (`java` driver)
2016-07-10 15:19:49 -07:00
Sean Chittenden 710173e9cb
Build the Cgroup fingerprinter on only Linux.
Change the logic from `!linux` to an empty build tag so that *if*
another platform picks up Cgroups support they can add themselves to the
necessary build tags for this fingerprinter and be on their way.
Because this technology isn't inherently Linux-specific and isn't
mutually exclusive of other resource isolation containers, resist the
urge to rename the Cgroup fingerprinter to something generic like the
ResourceContainerFingerprinter.
2016-07-10 13:55:06 -07:00
Alex Dadgar 51ae7ace25 initial tail impl 2016-07-10 13:57:04 -04:00
Sean Chittenden d4fe69ddf9
Update comments and pushdown a lock into the resource container 2016-07-10 00:12:59 -07:00
Sean Chittenden fc9cd8d4af
Push down the Linux-specific bits into resourceContainer 2016-07-10 00:06:53 -07:00
Sean Chittenden 6fc269d2a6
Move unit tests around into per-platform where appropriate. 2016-07-09 23:56:31 -07:00
Sean Chittenden a5dc6c2da9
Push the Client's cleanup of Cgroups down 2016-07-09 23:45:33 -07:00
Sean Chittenden 5dbc0bf382
Rename resourceContainer.cleanup() to executorCleanup()
Not to be confused with the imminent ClientCleanup().
2016-07-09 23:25:33 -07:00
Sean Chittenden 5f8f0a50ac
Begin cgroup pushdown into platform specific files 2016-07-09 23:01:14 -07:00
Sean Chittenden bdd7022fdc
Centralize the fingerprintrs.
Add platform specific fingerprinters per platform.

Requested by: @diptanu
2016-07-09 22:31:14 -07:00
Sean Chittenden 1e2e0ca050
Initialize the list of available fingerprinters per platform. 2016-07-09 00:22:42 -07:00
Diptanu Choudhury 5b39a5db40 Fixed a debug message 2016-07-09 00:12:53 -07:00
Diptanu Choudhury e5310b76a6 Merge pull request #1399 from hashicorp/b-exec-cleanup
WIP: Cleanup exec driver
2016-07-09 00:08:43 -07:00
Sean Chittenden 03c571c61b
Consolidate fingerprinters into a single `map`. 2016-07-08 23:37:14 -07:00
Sean Chittenden 7530b27014 Move all non-Linux Fingerprinter items to the default exec driver 2016-07-08 18:35:46 -07:00
Diptanu Choudhury 5d61fa01f1 Fixed tests 2016-07-08 18:27:51 -07:00
Diptanu Choudhury 3c4002c48b Fixed the client tests 2016-07-08 17:49:58 -07:00
Diptanu Choudhury 1784d98182 Fixed the host port environment variable 2016-07-08 15:37:44 -07:00
Cameron Davison 921a6c889c remove the expected leading space, after the colon in syslog 2016-07-06 11:08:24 -05:00
Cameron Davison 07a9e15560 get into the hour minute second part of the time before looking for spaces, and then looking for the : seperator 2016-07-06 11:08:24 -05:00
Alex Dadgar c35b1be845 Set running when restoring 2016-06-28 13:47:59 -07:00
Wojciech Bederski a73422b4ff Fix docker driver lockup during nomad boot
Unit mismatch caused docker driver to wait almost indefinitely during boot 
(when one or more containers were a bit uncooperative during StopContainer())
This should fix problems described in  #1202
2016-06-28 14:26:47 +02:00
Cameron Davison d1e7d9c50f
write state to temp file and then rename 2016-06-27 12:29:33 -05:00
Jake Champlin f094969c7b
Update failing tests 2016-06-23 11:28:17 -04:00
Alex Dadgar 14e950f882 Treat float as int 2016-06-22 15:09:39 -07:00
Alex Dadgar 4ff8edd2da Floor CPU MHz and total compute and mark hostname as unique 2016-06-22 15:01:36 -07:00
Diptanu Choudhury 0a10873aa6 Merge pull request #1335 from hashicorp/f-set-docker-timeout
Setting a timeout in the docker client
2016-06-21 17:00:14 -07:00
Diptanu Choudhury 2837d3395d Setting a timeout in the docker client 2016-06-21 16:58:21 -07:00
Diptanu Choudhury 1d5c5b18f3 Making SSL default 2016-06-21 16:41:14 -07:00
Sean Chittenden 8bdb38d016
Code golf
Pointed out by: @dadgar
2016-06-21 14:26:01 -07:00
Sean Chittenden df4fe2e502
Fix the shuffling of remote datacenters.
Pointed out by: @ryanuber
2016-06-21 13:37:22 -07:00
Diptanu Choudhury 88ac1b33a4 Not emitting per-pid stats and added the total ticks consumed by a Task 2016-06-20 17:30:25 -07:00
Alex Dadgar 19024f4da0 Merge pull request #1322 from hashicorp/b-docker-logs-splicing
Make line copy to avoid being overriden by subsequent scans
2016-06-20 13:17:49 -07:00
Alex Dadgar 661dc200f3 Make line copy to avoid being overriden by subsequent scans 2016-06-20 13:14:43 -07:00
Michal Wieczorek 67a04bb1cc Volume binds for windows containers 2016-06-20 21:46:33 +02:00
Alex Dadgar 3cd9c9590b guard against NaN 2016-06-20 10:29:46 -07:00
Alex Dadgar 7b83503596 finer grain locking 2016-06-20 10:19:06 -07:00
Alex Dadgar 744270590b Guard against bad restore 2016-06-17 14:58:53 -07:00
Alex Dadgar c9f7467ccb Driver tests use client default config 2016-06-17 14:24:49 -07:00
Sean Chittenden 7b9961f09b
Initialize the stats helpers before accessing them for the first time 2016-06-17 13:23:30 -07:00
Sean Chittenden 9cb649d247 Merge pull request #1307 from hashicorp/f-fingerprint-cpus
F fingerprint cpus
2016-06-17 12:23:40 -07:00
Sean Chittenden 9e287858de Merge pull request #1310 from hashicorp/b-logger
Create and pass only one `logger` object around per Agent
2016-06-17 12:16:35 -07:00
Sean Chittenden 21b84fc3e6
Memoize the CPU stats. Error if CPU fingerprinting fails. 2016-06-17 12:13:53 -07:00
Alex Dadgar 27c6398639 debug message when stopping container 2016-06-17 11:52:44 -07:00
Sean Chittenden 686c125fea
Record and use only the first Mhz from the CPU fingerprinter.
Assume all cores are the same speed.
2016-06-17 11:06:57 -07:00
Sean Chittenden 46e2d54acf
Provide `nomad.Config` with a default `LogOutput` of `os.StdErr` 2016-06-17 06:44:10 -07:00
Sean Chittenden 9a60999100
Pass a logger arg to `NewClient` and `NewServer` 2016-06-16 23:29:23 -07:00
Sean Chittenden 4cc90753f8
In the debug log, split the unit from the measurement
awk(1) friendly is UNIX(tm) friendly.
2016-06-16 23:07:13 -07:00
Sean Chittenden 2dcb591cd8
Warn when we're unable to fingerprint the CPU Mhz 2016-06-16 23:07:13 -07:00
Sean Chittenden b8e63411c0
Explicitly call `cpu.Counts()` to determine the CPU core count
Much safer than counting the number of InfoStat structs returned.
2016-06-16 23:07:13 -07:00
Sean Chittenden d17af396ca
Create config.DefaultConsulConfig() 2016-06-16 20:41:05 -07:00
Sean Chittenden fd18eb7fdb
Only register the Client services reaper when `consul.auto_advertise` is enabled 2016-06-16 18:24:58 -07:00
Sean Chittenden 1ce2cc6141
`conf` -> `config` 2016-06-16 17:05:29 -07:00
Sean Chittenden 015248a67f
Fix tests for NewServer() in client mode.
Pointy-hat: sean-

I'm positive I've done this before.
2016-06-16 16:34:22 -07:00
Sean Chittenden 952b6ce7b5
Only auto-join clients if `client_auto_join` is true 2016-06-16 14:47:21 -07:00
Sean Chittenden ec77a1869e
Test for errors 2016-06-16 14:43:46 -07:00
Sean Chittenden af55b74114 Merge pull request #1276 from hashicorp/f-consul-server-autojoin
Teach Nomad servers how to fall back to Consul.
2016-06-16 14:40:45 -07:00
Diptanu Choudhury ed67f1a347 Merge pull request #1285 from hashicorp/fix-selinux-options
Added a client options for setting selinux options
2016-06-16 22:45:24 +02:00
Diptanu Choudhury 266c417ac8 Changed the client options for docker volume selinux labels 2016-06-16 21:41:02 +01:00
Alex Dadgar fe588a2469 Guard against restoring a nil task in task_runner 2016-06-16 11:55:40 -07:00
Sean Chittenden 008d75184b
Use the `%+q` verb in log messages (vs `%q`). 2016-06-16 11:03:51 -07:00
Alex Dadgar 7375d828e1 remove trace 2016-06-15 15:47:59 -07:00
Sean Chittenden 5e0ced2ae7
Shuffle all datacenters vs only the nearest N datacenters.
Per discussion, we want to be aggressive about fanning out vs possibly
fixating on only local DCs.  With RPC forwarding in place, a random walk
may be less optimal from a network latency perspective, but it is guaranteed
to eventually result in a converged state because all DCs are candidates
during the bootstrapping process.
2016-06-15 12:40:51 -07:00
Sean Chittenden 2123460cf0
Bump various Consul search limits
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest.
Server: Search factor increased from 3 to 5 times the bootstrap_expect.

This should allow for faster convergence in large environments (e.g.
sub-5min for 10K Consul DCs).
2016-06-15 12:40:51 -07:00
Alex Dadgar cf99fc3173 Use Status.Peers instead of Status.Ping 2016-06-15 12:00:20 -07:00
Diptanu Choudhury fa216199ce Added documentation 2016-06-15 02:42:15 +02:00
Diptanu Choudhury e08083acfe Added a client options for setting selinux options 2016-06-15 02:33:09 +02:00
Sean Chittenden 6e22b680ce
Disambiguate `auto_join` from `auto_register`, rename reg to `auto_advertise`.
Provide an option that describes the value to the user vs the
operation performed by the software.  Momentarily introducing
`auto_join`
2016-06-14 12:11:38 -07:00
Alex Dadgar 4b04e503f3 address comments 2016-06-13 17:32:18 -07:00
Alex Dadgar 8bbf4a55e5 Fix IDs and domain scoping 2016-06-13 16:30:58 -07:00
Diptanu Choudhury d019d8ef8e implemented reconciliation of unwanted services 2016-06-13 14:52:26 +02:00
Alex Dadgar 232654cdee register checks 2016-06-12 21:28:56 -07:00
Alex Dadgar a82c2bb058 Do not reconcile in client and cleanup executor a bit 2016-06-12 18:22:07 -07:00
Alex Dadgar 8e231fa382 Rename ConsulService back to Service 2016-06-12 16:36:49 -07:00
Alex Dadgar e931b42473 unify cli output 2016-06-12 13:16:07 -07:00
Alex Dadgar 6ef4bfd6bf skip docker test if no docker found 2016-06-12 11:28:43 -07:00
Alex Dadgar 2628827b7c Merge pull request #1262 from hashicorp/remove-artifact-check
Removing artifact check for java and qemu drivers
2016-06-12 11:21:18 -07:00
Alex Dadgar c4a819528a Merge pull request #1260 from hashicorp/f-alloc-stats-struct
Allocation resources returned in a struct
2016-06-12 11:18:57 -07:00
Alex Dadgar b671cd4134 Test fixes 2016-06-12 11:14:17 -07:00
Alex Dadgar fdda90229f only support latest and remove ring buffer 2016-06-12 09:32:38 -07:00
Diptanu Choudhury 34f85baab0 Fix the calculation of total ticks for docker and exec 2016-06-12 18:08:35 +02:00
Diptanu Choudhury beb362e202 Setting a flag to indicate whether fs isolation is indeed happening 2016-06-12 15:43:24 +02:00
Diptanu Choudhury 641cf50682 Not converting the abs path relative to task dir for drivers which enforce FS isolation only in linux 2016-06-12 13:54:30 +02:00
Alex Dadgar e952540f6f Allocation resources returned in a struct 2016-06-11 21:04:10 -07:00
Diptanu Choudhury a5e81ebc3a Removing un-used code 2016-06-12 01:23:49 +02:00
Diptanu Choudhury 86e4f295da Fixed the calculation of the host node ticks 2016-06-12 01:14:51 +02:00
Sean Chittenden 2f036231e5 Merge pull request #1201 from hashicorp/f-dyn-server-list
Dynamic Server Lists/Client Bootstrapping via consul.
2016-06-11 18:58:25 -04:00
Sean Chittenden 92e2cfb0ad
Walk the DCs from nearest to most remote. 2016-06-11 18:52:21 -04:00
Sean Chittenden 2968545201
Walk the DCs from nearest to most remote, no limit on the search. 2016-06-11 18:23:06 -04:00
Sean Chittenden 917766a3df
Prefer `%+q` over `%q` in log messages. 2016-06-11 18:17:20 -04:00
Sean Chittenden 445783889b
Remove default values and use nil for the executor. Much better. 2016-06-11 17:52:09 -04:00
Diptanu Choudhury 0fad17a1a9 Merge pull request #1258 from hashicorp/fix-statsd-metric-type
Emitting client resource usage metrics as guages instead of k/v pairs
2016-06-11 13:18:52 -07:00
Diptanu Choudhury fd60cfd585 Emitting client resource usage metrics as guages instead of k/v pairs 2016-06-11 22:17:32 +02:00
Diptanu Choudhury 19f4adbcf1 Using a different client for collecting stats and waiting on containers 2016-06-11 20:37:29 +02:00
Diptanu Choudhury 7fb507e810 Moving the clkspeed code to helper 2016-06-11 17:31:49 +02:00
Sean Chittenden f891fa0ec8
Perform a nil-check for Executor's consulServices.
Executors can `Shutdown()` before calling `SyncServices()`.
2016-06-10 23:43:54 -04:00
Sean Chittenden bbd8dfa798
goling(1) compliance pass (e.g. Rpc* -> RPC) 2016-06-10 23:38:28 -04:00
Sean Chittenden bc771d35df
Query for the Nomad service across multiple Consul datacenters. 2016-06-10 23:05:14 -04:00
Sean Chittenden d433001f04
Expose rpcproxy's `ServerEndpoint()` constructor, `newServer()` as `NewServerEndpoint()` 2016-06-10 22:14:03 -04:00
Diptanu Choudhury 59540c3e93 Extracted a method for getting clock speed 2016-06-11 02:07:28 +02:00
Diptanu Choudhury f94f89b6d7 Pruning out pids which are no longer present 2016-06-11 01:40:52 +02:00
Diptanu Choudhury 0a9a3918d6 Not reset-ing the list of pids if they don't change 2016-06-11 01:19:50 +02:00
Diptanu Choudhury c38a6fb3c5 Implementing the total ticks per task for the docker driver 2016-06-10 23:33:25 +02:00
Diptanu Choudhury 01054db4fa Calculating total ticks consumed in the nomad client 2016-06-10 23:14:33 +02:00
Sean Chittenden 5b0c2bf282
Restore old behavior and have AddPrimaryServer() return a pointer to the existing server (vs nil when the server already exists). 2016-06-10 16:46:49 -04:00
Diptanu Choudhury 2d3798b076 Calculating the cpu ticks in nomad client 2016-06-10 22:22:32 +02:00
Sean Chittenden df896c6ef2
Prevent duplicate servers being added in AddPrimaryServer.
This logic was already present elsewhere and was missed in this one
place.
2016-06-10 15:55:27 -04:00
Sean Chittenden d99467ef5e
Always create a consul.Syncer. Use a default Consul Config if necessary. 2016-06-10 15:55:27 -04:00
Sean Chittenden f6a0459ae5
Always create a consul.Syncer. Use a default Consul Config if necessary. 2016-06-10 15:55:27 -04:00
Sean Chittenden 03846fb754
Rename listLock to activatedListLock 2016-06-10 15:54:39 -04:00
Sean Chittenden 4c9067310b
Nomad does not use Serf at the client level. Use a hard lock. 2016-06-10 15:54:39 -04:00
Sean Chittenden 26b1e826d7
golint(1) police 2016-06-10 15:54:39 -04:00
Sean Chittenden 974c9927c7
Formatting nit: remove brackets 2016-06-10 15:54:39 -04:00
Sean Chittenden 1371ef85f5
Prefix all log entries in client/rpcproxy with client.rpcproxy 2016-06-10 15:54:39 -04:00
Sean Chittenden 009495c18a
Style nit: remove `var` block 2016-06-10 15:54:39 -04:00
Sean Chittenden f139d0c68b
Properly guard consulPullHeartbeatDeadline behind heartbeatLock 2016-06-10 15:54:39 -04:00
Sean Chittenden 8dc16ad5e3
Move RPCProxy.New() adjacent to its struct definition 2016-06-10 15:54:39 -04:00
Sean Chittenden cc6e8792e0
Create a consulContext using a client's consul config.
This is wrong and should be the Agent's Consul Config.  This is a
step in the right direction, so committing to mark the necessary
future change.
2016-06-10 15:54:39 -04:00
Sean Chittenden ed29946f5e
Populate the RPC Proxy's server list if heartbeat did not include a leader.
It's possible that a Nomad Client is heartbeating with a Nomad server that
has become issolated from the quorum of Nomad Servers.  When 3x the
heartbeatTTL has been exceeded, append the Consul server list to the primary
primary server list.  When the next RPCProxy rebalance occurs, there is a
chance one of the servers discovered from Consul will be in the majority.
When client reattaches to a Nomad Server in the majority, it will include
a heartbeat and will reset the TTLs *AND* will clear the primary server list
to include only values from the heartbeat.
2016-06-10 15:54:39 -04:00
Sean Chittenden 9a223936bb
Generate and sync Consul ServiceIDs consistently 2016-06-10 15:54:39 -04:00
Sean Chittenden 3892a0433e
Move the start of the UniversalExecutor's consulSyncer to initialize once
This should be handled via a sync.Once primative, but I don't want to
unpack that atm.
2016-06-10 15:54:39 -04:00
Sean Chittenden 7956eb0c80
Rename structs.Task's `Service` attribute to `ConsulService` 2016-06-10 15:54:39 -04:00
Sean Chittenden 8c813630e6
Move package client/consul/sync to command/agent/consul.
This has been done to allow the Server and Client to reuse the same
Syncer because the Agent may be running Client, Server, or both
simultaneously and we only want one Syncer object alive in the agent.
2016-06-10 15:54:39 -04:00
Sean Chittenden d6ef97911b
Rename Syncer.SetServiceIdentifier to SetServiceRegPrefix()
This attribute isn't actually an identifier because it can represent
a collection of services.  Rename `serviceIdentifier` to
`serviceRegPrefix which more accurately conveys the intention of this
Syncer attribute.

While here, also rename `SetServiceIdentifier()` to `SetServiceRegPrefix()`
and `GenerateServiceIdentifier()` to `GenerateServicePrefix()`.
2016-06-10 15:54:39 -04:00
Sean Chittenden 6b126ce488
Change the API signature of Syncer.SyncServices().
SyncServices() immediately attempts to sync whatever information
the process has with Consul.  Previously this method would take an
argument of the exclusive list of services that should exist,
however this is not condusive to having a Nomad Client and Nomad
Server share the same consul.Syncer.
2016-06-10 15:54:39 -04:00
Sean Chittenden fda03c5c9e
Change the signature of the PeriodicCallback to return an error
I *KNEW* I should have done this when I wrote it, but didn't want to
go back and audit the handlers to include the appropriate return
handling, but now that the code is taking shape, make this change.
2016-06-10 15:54:39 -04:00
Sean Chittenden 4973ec32bb
Rename structs.Services to structs.ConsulServices 2016-06-10 15:54:39 -04:00
Sean Chittenden 6430190d6a
Rename createCheck() to createDelegatedCheck() for clarity 2016-06-10 15:54:39 -04:00
Sean Chittenden 555f4fe135
Change client/consul.NewSyncer() to accept a shutdown channel
In addition to the API changing, consul.Syncer can now be signaled
to shutdown via the Shutdown() method, which will call the Run()'ing
sync task to exit gracefully.
2016-06-10 15:54:39 -04:00