Commit Graph

732 Commits

Author SHA1 Message Date
James Phillips d7bac0e808 Turns down the coordinate sync rate a bit more. 2015-10-23 15:23:01 -07:00
James Phillips 9ba9a708f6 Scales coordinate sends to hit a fixed aggregate rate across the cluster. 2015-10-23 15:23:01 -07:00
James Phillips d8b8a3719f Simplifies the batching function and adds some comments. 2015-10-23 15:23:01 -07:00
James Phillips f71c79c53f Does some small cleanups based on PR feedback.
* Holds coordinate updates in map and gets rid of the update channel.
* Cleans up config variables a bit.
2015-10-23 15:23:01 -07:00
James Phillips 1222772452 Hardens Consul from bad coordinates from other nodes. 2015-10-23 15:23:01 -07:00
James Phillips acb0dce829 Moves batching down into the state store and changes it to fail-fast.
* A batch of updates is done all in a single transaction.
* We no longer need to get an update to kick things, there's a periodic flush.
* If incoming updates overwhelm the configured flush rate they will be dumped with an error.
2015-10-23 15:23:01 -07:00
James Phillips b6c31bdf2f Flips the sense of the coordinate enable option. 2015-10-23 15:23:01 -07:00
James Phillips 9c069c5031 Removes one more WAN leftover. 2015-10-23 15:23:01 -07:00
James Phillips edb9a119e2 Does a clean up pass on the Consul side. 2015-10-23 15:23:01 -07:00
James Phillips ac4185b888 Merges config changes after rebase. 2015-10-23 15:23:01 -07:00
Derek Chiang 139c9240ea Address comments 2015-10-23 15:23:01 -07:00
Derek Chiang 840474f170 Add a test case 2015-10-23 15:23:01 -07:00
Derek Chiang 23c08aeeb4 Use IndexedCoordinate instead 2015-10-23 15:23:01 -07:00
Derek Chiang 530e73212a Some fixes 2015-10-23 15:23:01 -07:00
Derek Chiang 2ef802b8b3 Fix a comment 2015-10-23 15:23:01 -07:00
Derek Chiang b2cff43bb5 Complete logic for sending coordinates 2015-10-23 15:23:01 -07:00
Derek Chiang 66b210afcb Some fixes 2015-10-23 15:23:01 -07:00
Derek Chiang b5bbe2bcfa Adding tests and stuff 2015-10-23 15:23:01 -07:00
Armon Dadgar d035dbd43b Merge pull request #1318 from daveadams/f-http-header-token
Allow specifying Consul token in an HTTP request header
2015-10-22 13:33:47 -07:00
Jeff Mitchell 9267f956a2 Update cleanhttp repo location 2015-10-22 14:14:22 -04:00
Jeff Mitchell 06bb9d5f36 Use cleanhttp to get rid of DefaultTransport 2015-10-22 10:47:50 -04:00
David Adams 5f175add40 Add HTTP request header X-Consul-Token
Add support for an X-Consul-Token HTTP request header to specify the
token with which this request should be fulfilled. The header would have
precedence over the responding Agent's default token, but would have
lower precedence than a token specified in the query string.
2015-10-19 11:26:01 -05:00
James Phillips 263c7e3fd3 Deletes the old state store and all its accoutrements. 2015-10-15 14:59:09 -07:00
James Phillips bcdabe4606 Knocks out the Raft indexes before doing compare. 2015-10-15 14:59:09 -07:00
James Phillips cbcd977a39 Gets new structs changes to compile, adds some corner case handling and extra unit tests. 2015-10-15 14:59:09 -07:00
Ryan Uber b46f878747 Merge pull request #1309 from hashicorp/f-remove-migrate
Removes consul-migrate for 0.6
2015-10-15 14:50:19 -07:00
Jeff Mitchell 9cddc187b5 Don't use http.DefaultClient
Two of the changes are in tests; the one of consequence is in the API.
As explained in #1308 this can cause conflicts with downstream programs.

Fixes #1308.
2015-10-15 17:49:35 -04:00
Ryan Uber aba1b26015 agent: consolidates data dir checker 2015-10-15 14:21:35 -07:00
Ryan Uber 8bc51eb237 agent: test mdb dir protection 2015-10-15 14:15:41 -07:00
Ryan Uber 2a7609d6bc agent: remove migrator, refuse to start if mdb dir found 2015-10-15 14:15:08 -07:00
Michael Puncel a94589ad67 Add http method to log output 2015-10-02 18:33:06 -07:00
James Phillips 26eadcd95c Merge pull request #1235 from wuub/master
fix conflict between handleReload and antiEntropy critical sections
2015-09-17 07:28:39 -07:00
Wojciech Bederski 9a1b52171f panic when unbalanced localState.Resume() is detected 2015-09-17 11:32:08 +02:00
Dale Wijnand c5168e1263 Fix a bunch of typos. 2015-09-15 13:22:08 +01:00
James Phillips b25797a808 Merge pull request #1187 from sfncook/enable_tag_drift_03
Enable tag drift 03
2015-09-11 15:35:32 -07:00
Anthony Scalisi 8d733b7fca remove various typos 2015-09-11 12:29:54 -07:00
Wojciech Bederski 4cd1b09ad7 make Pause()/Resume()/isPaused() behave more like a semaphore
see: https://github.com/hashicorp/consul/issues/1173 #1173

Reasoning: somewhere during consul development Pause()/Resume() and
PauseSync()/ResumeSync() were added to protect larger changes to
agent's localState.  A few of the places that it tries to protect are:

- (a *Agent) AddService(...)      # part of the method
- (c *Command) handleReload(...)  # almost the whole method
- (l *localState) antiEntropy(...)# isPaused() prevents syncChanges()

The main problem is, that in the middle of handleReload(...)'s
critical section it indirectly (loadServices()) calls  AddService(...).
AddService() in turn calls Pause() to protect itself against
syncChanges(). At the end of AddService() a defered call to Resume() is
made.

With the current implementation, this releases
isPaused() "lock" in the middle of handleReload() allowing antiEntropy
to kick in while configuration reload is still in progress.
Specifically almost all services and probably all check are unloaded
when syncChanges() is allowed to run.

This in turn can causes massive service/check de-/re-registration,
and since checks are by default registered in the critical state,
a majority of services on a node can be marked as failing.
It's made worse with automation, often calling `consul reload` in close
proximity on many nodes in the cluster.

This change basically turns Pause()/Resume() into P()/V() of
a garden-variety semaphore. Allowing Pause() to be called multiple times,
and releasing isPaused() only after all matching/defered Resumes() are
called as well.

TODO/NOTE: as with many semaphore implementations, it might be reasonable
to panic() if l.paused ever becomes negative.
2015-09-11 18:28:06 +02:00
Wojciech Bederski 24ac26b3c1 failing test showing that nested Pause()/Resume() release too early
see: #1173 / https://github.com/hashicorp/consul/issues/1173
2015-09-11 17:52:57 +02:00
Shawn Cook 99be758411 Rename EnableTagOverride and update formatting 2015-09-11 08:35:29 -07:00
Shawn Cook f448a62826 Remove debug lines 2015-09-11 08:32:59 -07:00
Shawn Cook 2f04917261 Merge remote-tracking branch 'hashicorp/master' into enable_tag_drift_03 2015-09-10 14:55:30 -07:00
Shawn Cook 8a86eee9fb Add test cases TestAgentAntiEntropy_EnableTagDrift 2015-09-10 14:08:16 -07:00
Ryan Uber 08d12e978f Merge pull request #1230 from hashicorp/f-maintfix
Respect tokens in maintenance mode
2015-09-10 12:30:07 -07:00
Ryan Uber 948bd57d6a agent: testing node/service maintenance using tokens 2015-09-10 12:08:08 -07:00
Ryan Uber e129a59316 agent: thread tokens through for maintenance mode 2015-09-10 11:43:59 -07:00
Wim 3d7c3725d8 Allow AAAA queries for nodeLookup 2015-09-08 16:54:36 +02:00
Wim 2336c6a4bd No NXDOMAIN when the answer is empty 2015-09-02 16:12:22 +02:00
Ryan Breen a013095f62 Merge pull request #1167 from railsguru/master
Add -http-port option to change the HTTP API port
2015-09-02 01:15:55 -04:00
Armon Dadgar 655666170a agent: Always enable the UI endpoints 2015-09-01 18:28:32 -07:00
Wim e97973c1e1 Limit the DNS responses after getting the NodeRecords 2015-09-01 23:23:05 +02:00