Alex Dadgar
8ceb7ead20
Do not use snapshot
2016-06-22 09:33:15 -07:00
Alex Dadgar
91f6976423
tighter index bound when creating GC evals
2016-06-22 09:11:25 -07:00
Alex Dadgar
25decca3ca
Worker waitForIndex uses StateStore index, not Raft Applied Index
2016-06-22 09:04:22 -07:00
Sean Chittenden
8bdb38d016
Code golf
...
Pointed out by: @dadgar
2016-06-21 14:26:01 -07:00
Sean Chittenden
df4fe2e502
Fix the shuffling of remote datacenters.
...
Pointed out by: @ryanuber
2016-06-21 13:37:22 -07:00
Sean Chittenden
9e287858de
Merge pull request #1310 from hashicorp/b-logger
...
Create and pass only one `logger` object around per Agent
2016-06-17 12:16:35 -07:00
Sean Chittenden
46e2d54acf
Provide nomad.Config
with a default LogOutput
of os.StdErr
2016-06-17 06:44:10 -07:00
Sean Chittenden
9a60999100
Pass a logger arg to NewClient
and NewServer
2016-06-16 23:29:23 -07:00
Sean Chittenden
871a31a8ec
Teach config.ConsulConfig how to construct a consulapi TLS client.
...
Said differently, centralize the creation of consul's client config
in one place and use it everywhere.
2016-06-16 22:51:06 -07:00
Sean Chittenden
d17af396ca
Create config.DefaultConsulConfig()
2016-06-16 20:41:05 -07:00
Sean Chittenden
a658299235
Misc typos
2016-06-16 16:17:17 -07:00
Sean Chittenden
ec77a1869e
Test for errors
2016-06-16 14:43:46 -07:00
Sean Chittenden
31313b68cf
Don't assign to an atomic w/o using atomic setter func
2016-06-16 14:43:46 -07:00
Sean Chittenden
af55b74114
Merge pull request #1276 from hashicorp/f-consul-server-autojoin
...
Teach Nomad servers how to fall back to Consul.
2016-06-16 14:40:45 -07:00
Sean Chittenden
7c24487850
Fix up various error handling
2016-06-16 14:40:09 -07:00
Sean Chittenden
71cd9984ae
Immediately query Consul upon initialization if we have no peers.
...
Also don't attempt to join the Server with itself.
2016-06-16 14:27:10 -07:00
Sean Chittenden
65319252b9
Rework server_auto_join
to use a timer instead of the peer count.
...
It is perfectly viable for an admin to downsize a Nomad Server cluster
down to 1, 2, or `num % 2 == 0` (however ill-advised such activities
may be). And instead of using `bootstrap_expect`, use a timeout-based
strategy. If the `bootstrapFn` hasn't observed a leader in 15s it will
fall back to Consul and will poll every ~60s until it sees a leader.
2016-06-16 12:14:03 -07:00
Sean Chittenden
b0fecbefc1
Define BootstrapExepct
as an int32
so it can be manipulated atomically.
2016-06-16 12:00:15 -07:00
Alex Dadgar
ea5d11e628
remove consul reference
2016-06-15 17:23:02 -07:00
Alex Dadgar
bf14fd355f
plan displays launch time of periodic jobs
2016-06-15 13:34:45 -07:00
Sean Chittenden
14f9d2a947
Use the config's log output
2016-06-15 12:40:51 -07:00
Sean Chittenden
5b0def194a
Namespace the log messages
2016-06-15 12:40:51 -07:00
Sean Chittenden
bffc82d668
Do not consider the number of Serf members when considering falling back to Consul.
2016-06-15 12:40:51 -07:00
Sean Chittenden
324af8d7f1
Guard the auto-join functionality behind its consul.server_auto_join
tunable
2016-06-15 12:40:51 -07:00
Sean Chittenden
5e0ced2ae7
Shuffle all datacenters vs only the nearest N datacenters.
...
Per discussion, we want to be aggressive about fanning out vs possibly
fixating on only local DCs. With RPC forwarding in place, a random walk
may be less optimal from a network latency perspective, but it is guaranteed
to eventually result in a converged state because all DCs are candidates
during the bootstrapping process.
2016-06-15 12:40:51 -07:00
Sean Chittenden
2123460cf0
Bump various Consul search limits
...
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest.
Server: Search factor increased from 3 to 5 times the bootstrap_expect.
This should allow for faster convergence in large environments (e.g.
sub-5min for 10K Consul DCs).
2016-06-15 12:40:51 -07:00
Sean Chittenden
e8d1264dbc
Short-circuit the bootstrapFn if we have a leader
2016-06-15 12:40:51 -07:00
Sean Chittenden
f05514335b
Teach Nomad servers how to fall back to Consul.
2016-06-15 12:40:51 -07:00
Alex Dadgar
aea21affdb
Document consul configuration
2016-06-14 15:21:57 -07:00
Sean Chittenden
6e22b680ce
Disambiguate auto_join
from auto_register
, rename reg to auto_advertise
.
...
Provide an option that describes the value to the user vs the
operation performed by the software. Momentarily introducing
`auto_join`
2016-06-14 12:11:38 -07:00
Sean Chittenden
4f14d51013
Fix up validation and allow existing unset timeouts to continue to be unset
2016-06-13 18:55:15 -07:00
Sean Chittenden
c3a3fdc230
Upon further review, the Timeout needs to be validate for more than script checks.
...
This value is used for Consul HTTP and TCP checks.
2016-06-13 18:28:27 -07:00
Sean Chittenden
baac19cad6
Remove diff check for ServiceID, may it R.I.P.
2016-06-13 18:22:53 -07:00
Sean Chittenden
79c675cf72
Guard against an interval and timeout being less than 1s
2016-06-13 18:19:40 -07:00
Sean Chittenden
af8db7ec18
Don't export ServiceCheck validate
2016-06-13 18:17:43 -07:00
Sean Chittenden
08c88102a7
There is no "docker" check type
2016-06-13 18:15:07 -07:00
Alex Dadgar
8bbf4a55e5
Fix IDs and domain scoping
2016-06-13 16:30:58 -07:00
Alex Dadgar
8e231fa382
Rename ConsulService back to Service
2016-06-12 16:36:49 -07:00
Diptanu Choudhury
3024c080e8
Removing artifact check for java and qemu drivers
2016-06-12 12:57:35 +02:00
Alex Dadgar
480a281031
Merge pull request #1243 from hashicorp/f-run-modify-index
...
Add check-index flag to nomad run
2016-06-11 16:12:53 -07:00
Sean Chittenden
2f036231e5
Merge pull request #1201 from hashicorp/f-dyn-server-list
...
Dynamic Server Lists/Client Bootstrapping via consul.
2016-06-11 18:58:25 -04:00
Alex Dadgar
59b0a7b3f6
Merge pull request #1256 from hashicorp/b-node-gc
...
Improve partial garbage collection of allocations
2016-06-11 15:41:00 -07:00
Sean Chittenden
bbd8dfa798
goling(1) compliance pass (e.g. Rpc* -> RPC)
2016-06-10 23:38:28 -04:00
Alex Dadgar
98bf249625
Partial GC allocations
2016-06-10 18:32:37 -07:00
Alex Dadgar
7ccc7d20a0
test
2016-06-10 15:48:59 -07:00
Alex Dadgar
b064b392fc
Only unblock if missed class was added after eval snapshot index
2016-06-10 15:24:06 -07:00
Sean Chittenden
948663c89a
Fix another unit test not expecting ServiceID
2016-06-10 16:50:35 -04:00
Sean Chittenden
d99467ef5e
Always create a consul.Syncer. Use a default Consul Config if necessary.
2016-06-10 15:55:27 -04:00
Sean Chittenden
3d64daafd9
Fold RaftPeers() into its only call site now
2016-06-10 15:54:39 -04:00
Sean Chittenden
0ba1da9c9c
Always pass in a snapshot before calling constructNodeServerInfoResponse()
2016-06-10 15:54:39 -04:00
Sean Chittenden
1df6fc253f
Rename updateNodeUpdateResponse
to constructNodeServerInfoResponse
2016-06-10 15:54:39 -04:00
Sean Chittenden
077203fe93
Update the structure of ConsulService to match reality.
...
ConsulService is the configuration for a Consul Service
2016-06-10 15:54:39 -04:00
Sean Chittenden
197feae679
Sync services with Consul by comparing the AgentServiceReg w/ ConsulService
...
The source of truth is the local Nomad Agent. Any services not local that
have a matching prefix are removed. Changed services are re-registered
and missing services are re-added.
2016-06-10 15:54:39 -04:00
Sean Chittenden
9a223936bb
Generate and sync Consul ServiceIDs consistently
2016-06-10 15:54:39 -04:00
Sean Chittenden
95c9d1a63e
Per-comment, remove structs.Allocation's Services attribute.
...
Nuke PopulateServiceIDs() now that it's also no longer needed.
2016-06-10 15:54:39 -04:00
Sean Chittenden
7956eb0c80
Rename structs.Task's Service
attribute to ConsulService
2016-06-10 15:54:39 -04:00
Sean Chittenden
fda03c5c9e
Change the signature of the PeriodicCallback to return an error
...
I *KNEW* I should have done this when I wrote it, but didn't want to
go back and audit the handlers to include the appropriate return
handling, but now that the code is taking shape, make this change.
2016-06-10 15:54:39 -04:00
Sean Chittenden
4973ec32bb
Rename structs.Services to structs.ConsulServices
2016-06-10 15:54:39 -04:00
Sean Chittenden
060300007e
Use a monotonically incrementing number to create unique node names.
...
Also remove the space from the "name" of the node
2016-06-10 15:50:11 -04:00
Sean Chittenden
1ec7d6c266
Push down the server list even on node registration and evaluation
...
Be mindful of the cost of taking a snapshot from the statestore and
reuse the snapshot if one has already been taken.
2016-06-10 15:50:11 -04:00
Sean Chittenden
bff57a0dce
Reconcile, clean up, and centralize API version numbers (major and minor).
...
Reduce future confusion by introducing a minor version that is gossiped out
via the `mvn` Serf tag (Minor Version Number, `vsn` is already being used for
to communicate `Major Version Number`).
Background: hashicorp/consul/issues/1346#issuecomment-151663152
2016-06-10 15:50:11 -04:00
Sean Chittenden
dde6a4074d
Nuke trace-level logging in heartbeats
2016-06-10 15:50:11 -04:00
Sean Chittenden
d76c042a13
Invert error handling logic
2016-06-10 15:50:11 -04:00
Sean Chittenden
1fe979a5e4
Remove types.ShutdownChannel and replace with chan struct{}
2016-06-10 15:50:11 -04:00
Sean Chittenden
438becb28b
Pass the datacenter name in the heartbeat
...
Servers that are part of a different datacenter are added as backup
servers instead of primary servers.
2016-06-10 15:50:11 -04:00
Sean Chittenden
89168b0c51
Invert check definition so the error is first
2016-06-10 15:50:11 -04:00
Sean Chittenden
dc78baedfd
Fix typo in the comment to reflect the actual function name.
2016-06-10 15:50:11 -04:00
Sean Chittenden
410d85cc78
Rename the package from client/rpc_proxy
to client/rpcproxy
...
Also rename `NewRpcProxy()` to just `New()` to avoid package stutter.
2016-06-10 15:50:11 -04:00
Sean Chittenden
1aefdb1e15
Use the correctly typed rand.Int*
variant
2016-06-10 15:50:11 -04:00
Sean Chittenden
3a1dc9a194
Use rand.Int*n()
where appropriate
2016-06-10 15:50:11 -04:00
Sean Chittenden
e727fd8c3c
Centralize the creation of a consul/api.Config struct.
...
While documented, the consul.timeout parameter wasn't ever set
except one-off in the Consul fingerprinter.
2016-06-10 15:50:11 -04:00
Sean Chittenden
f695d6d70d
Reconcile consul's address configuration section.
...
There were conflicting directives previously, both consul.addr and
consul.address were required to achieve the desired behavior. The
documentation said `consul.address` was the canonical name for the
parameter, so consolidate configuration parameters to `consul.address`.
2016-06-10 15:50:11 -04:00
Sean Chittenden
e60580b279
Define a type for the PeriodicCallback handlers and ShutdownChannel
2016-06-10 15:50:11 -04:00
Sean Chittenden
17116fc5a7
Rebalance Nomad client RPCs among different Nomad servers.
...
Implement client/rpc_proxy.RpcProxy.
2016-06-10 15:50:11 -04:00
Sean Chittenden
b509da2d0c
Create a nomad/structs/config
to break an import cycle.
...
Flattening and normalizing the various Consul config structures and
services has led to an import cycle. Break this by creating a new package
that is intended to be terminal in the import DAG.
2016-06-10 15:48:36 -04:00
Sean Chittenden
6d162e1e03
Fix copy pasta comment.
...
These parameters are used to bootstrap Nomad servers, not Consul servers.
2016-06-10 15:48:36 -04:00
Sean Chittenden
4e2835d5ff
Use the correctly typed rand.Int*
variant
2016-06-10 15:48:36 -04:00
Sean Chittenden
49deaae2ae
Seed random once in main
2016-06-10 15:48:36 -04:00
Sean Chittenden
db97a88f94
Fix small typo
2016-06-10 15:48:36 -04:00
Sean Chittenden
66b4b2a99f
Use rand.Int*n()
where appropriate
2016-06-10 15:48:36 -04:00
Sean Chittenden
e36686a17d
Use consul/lib's RandomStagger
...
Removes four redundant copies of the method in the process.
2016-06-10 15:48:36 -04:00
Sean Chittenden
e0e7d94450
Use consul/lib's RateScaledInterval
2016-06-10 15:48:36 -04:00
Alex Dadgar
527afa5119
Merge pull request #1244 from hashicorp/b-eval-reblock-test-hardening
...
Don't dequeue requeued evals in tests
2016-06-09 11:35:42 -07:00
Alex Dadgar
5d181d203c
Add check-index flag to nomad run
2016-06-08 17:56:32 -07:00
Alex Dadgar
b7e3a45fef
fix channel being nil on restore
2016-06-07 15:03:08 -07:00
Alex Dadgar
ecdce9a641
don't dequeue
2016-06-07 09:51:20 -07:00
Alex Dadgar
cc95d5d332
GC Nodes even if they have terminal allocations
2016-06-03 16:24:41 -07:00
Alex Dadgar
5f3e27ecd8
Fix case in periodic dispatch and blocked evals where lock was not released
2016-06-03 13:46:57 -07:00
Alex Dadgar
3100b4a086
Change eval_endpoint test to not retry but block longer
2016-06-03 12:02:49 -07:00
Alex Dadgar
299a0bb4b3
up timeout for dequeue in test
2016-06-03 11:36:50 -07:00
Alex Dadgar
0f84d8968b
Merge pull request #1221 from hashicorp/b-nil-wait
...
fix wait result being nil and some panics in the cli
2016-05-31 16:50:38 -07:00
Alex Dadgar
629542f64e
flaky test
2016-05-31 23:50:14 +00:00
Alex Dadgar
7196133f0a
Merge pull request #1220 from hashicorp/f-plan-failure-reasons
...
plan shows failure reasons and ordered annotations
2016-05-31 15:32:22 -07:00
Alex Dadgar
b1298bb658
plan shows failure reasons and ordered annotations
2016-05-31 21:51:23 +00:00
Alex Dadgar
13f0ff03c1
Merge pull request #1209 from hashicorp/b-blocked-eval-fixes
...
Fix race condition in which a reblocked evaluation could be dropped
2016-05-31 13:26:58 -07:00
Alex Dadgar
060318845f
Comments addressed
2016-05-31 11:39:03 -07:00
Alex Dadgar
75bd7a50f7
changelog
2016-05-27 17:43:20 -07:00
Alex Dadgar
cc00a66e38
validate that tasks don't contain slashes
2016-05-27 17:17:10 -07:00
Alex Dadgar
1f9f015c1b
Fix race condition in which a reblocked evaluation could be dropped
2016-05-27 16:53:10 -07:00
Alex Dadgar
6a236872b4
address comment
2016-05-25 10:30:47 -07:00