Commit graph

799 commits

Author SHA1 Message Date
Ryan Uber aba1b26015 agent: consolidates data dir checker 2015-10-15 14:21:35 -07:00
Ryan Uber 8bc51eb237 agent: test mdb dir protection 2015-10-15 14:15:41 -07:00
Ryan Uber 2a7609d6bc agent: remove migrator, refuse to start if mdb dir found 2015-10-15 14:15:08 -07:00
Michael Puncel a94589ad67 Add http method to log output 2015-10-02 18:33:06 -07:00
James Phillips 26eadcd95c Merge pull request #1235 from wuub/master
fix conflict between handleReload and antiEntropy critical sections
2015-09-17 07:28:39 -07:00
Wojciech Bederski 9a1b52171f panic when unbalanced localState.Resume() is detected 2015-09-17 11:32:08 +02:00
Dale Wijnand c5168e1263 Fix a bunch of typos. 2015-09-15 13:22:08 +01:00
James Phillips b25797a808 Merge pull request #1187 from sfncook/enable_tag_drift_03
Enable tag drift 03
2015-09-11 15:35:32 -07:00
Anthony Scalisi 8d733b7fca remove various typos 2015-09-11 12:29:54 -07:00
Wojciech Bederski 4cd1b09ad7 make Pause()/Resume()/isPaused() behave more like a semaphore
see: https://github.com/hashicorp/consul/issues/1173 #1173

Reasoning: somewhere during consul development Pause()/Resume() and
PauseSync()/ResumeSync() were added to protect larger changes to
the agent's localState. A few of the places that it tries to protect are:

- (a *Agent) AddService(...)       # part of the method
- (c *Command) handleReload(...)   # almost the whole method
- (l *localState) antiEntropy(...) # isPaused() prevents syncChanges()

The main problem is that, in the middle of handleReload(...)'s
critical section, it indirectly (via loadServices()) calls AddService(...).
AddService() in turn calls Pause() to protect itself against
syncChanges(). At the end of AddService() a deferred call to Resume() is
made.

With the current implementation, this releases the
isPaused() "lock" in the middle of handleReload(), allowing antiEntropy
to kick in while the configuration reload is still in progress.
Specifically, almost all services and probably all checks are unloaded
when syncChanges() is allowed to run.

This in turn can cause massive service/check de-/re-registration,
and since checks are by default registered in the critical state,
a majority of services on a node can be marked as failing.
It is made worse by automation, which often calls `consul reload` at
nearly the same time on many nodes in the cluster.

This change basically turns Pause()/Resume() into P()/V() of
a garden-variety semaphore: Pause() may be called multiple times,
and isPaused() is released only after all matching (deferred) Resume()
calls have been made as well.

TODO/NOTE: as with many semaphore implementations, it might be reasonable
to panic() if l.paused ever becomes negative.
2015-09-11 18:28:06 +02:00
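The counting behavior described in this commit can be sketched in Go. This is a minimal model, assuming a mutex-guarded integer counter; `syncPauser` and `addService` are hypothetical names, not Consul's actual types. The panic on a negative count follows the TODO note in the message and the later commit 9a1b52171f.

```go
package main

import (
	"fmt"
	"sync"
)

// syncPauser is a hypothetical stand-in for the pause state kept by the
// agent's localState; it is not Consul's actual type.
type syncPauser struct {
	mu     sync.Mutex
	paused int // number of outstanding Pause() calls
}

// Pause takes one unit of pause (the semaphore P operation).
func (s *syncPauser) Pause() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.paused++
}

// Resume gives back one unit (the V operation). A Resume without a
// matching Pause drives the counter negative, treated as a programming error.
func (s *syncPauser) Resume() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.paused--
	if s.paused < 0 {
		panic("unbalanced Resume() detected")
	}
}

// IsPaused reports whether any Pause() is still outstanding.
func (s *syncPauser) IsPaused() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.paused > 0
}

// addService mimics AddService (hypothetical): it pauses for its own
// work and resumes via defer.
func addService(s *syncPauser) {
	s.Pause()
	defer s.Resume()
	// ... register the service ...
}

func main() {
	s := &syncPauser{}

	// handleReload-style outer critical section.
	s.Pause()
	addService(s) // nested Pause()/Resume() pair
	fmt.Println("paused after nested call:", s.IsPaused())
	s.Resume()
	fmt.Println("paused after outer Resume:", s.IsPaused())
}
```

With a single boolean flag instead of a counter, the deferred Resume() inside addService would clear the pause and the first check in main would print false. That is the early release demonstrated by the failing test in 24ac26b3c1.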
Wojciech Bederski 24ac26b3c1 failing test showing that nested Pause()/Resume() release too early
see: #1173 / https://github.com/hashicorp/consul/issues/1173
2015-09-11 17:52:57 +02:00
Shawn Cook 99be758411 Rename EnableTagOverride and update formatting 2015-09-11 08:35:29 -07:00
Shawn Cook f448a62826 Remove debug lines 2015-09-11 08:32:59 -07:00
Shawn Cook 2f04917261 Merge remote-tracking branch 'hashicorp/master' into enable_tag_drift_03 2015-09-10 14:55:30 -07:00
James Phillips d00889c3a4 Adds missing token to maint unit test. 2015-09-10 14:53:00 -07:00
Shawn Cook 8a86eee9fb Add test cases TestAgentAntiEntropy_EnableTagDrift 2015-09-10 14:08:16 -07:00
Ryan Uber 08d12e978f Merge pull request #1230 from hashicorp/f-maintfix
Respect tokens in maintenance mode
2015-09-10 12:30:07 -07:00
Ryan Uber 948bd57d6a agent: testing node/service maintenance using tokens 2015-09-10 12:08:08 -07:00
Ryan Uber e129a59316 agent: thread tokens through for maintenance mode 2015-09-10 11:43:59 -07:00
Wim 3d7c3725d8 Allow AAAA queries for nodeLookup 2015-09-08 16:54:36 +02:00
Ryan Breen d63749b30e Merge pull request #1217 from 42wim/fix-rfc2308-part3
No NXDOMAIN when the answer is empty
2015-09-04 10:42:38 -04:00
Armon Dadgar 56efa4958b Merge pull request #1214 from zendesk/fix_lock_race_2
lock.go: fix another race condition
2015-09-02 16:04:55 -07:00
Wim 2336c6a4bd No NXDOMAIN when the answer is empty 2015-09-02 16:12:22 +02:00
Ryan Breen a013095f62 Merge pull request #1167 from railsguru/master
Add -http-port option to change the HTTP API port
2015-09-02 01:15:55 -04:00
Armon Dadgar 655666170a agent: Always enable the UI endpoints 2015-09-01 18:28:32 -07:00
Michael S. Fischer 01ec256c7e lock.go: fix another race condition
The previous fix to `consul lock` (commit 6875e8d) didn't completely
eliminate the race that could occur if the lock was acquired around the
same time SIGTERM was received: it was still possible for
Run() to spawn the process via startChild() after killChild() had
released the shared mutex.

Now, when SIGTERM is received, we acquire a mutex that prevents
spawning a new process and never release it.

We've tested this fix pretty thoroughly and believe it completely
resolves the issue.
2015-09-01 14:27:23 -07:00
Wim e97973c1e1 Limit the DNS responses after getting the NodeRecords 2015-09-01 23:23:05 +02:00
Ryan Breen 56d2fa4c17 Merge pull request #1195 from 42wim/fix-rfc2308-part2
Return SOA/NXDOMAIN when the answer is empty
2015-09-01 17:08:31 -04:00
Wim b806aceef4 Return SOA/not found when the answer is empty 2015-09-01 22:28:12 +02:00
James Phillips 0f49e1c3a9 Merge pull request #1200 from ryotarai/lock-pass-stdin
command/lock: Pass stdin to child process when -pass-stdin passed.
2015-08-31 21:14:45 -07:00
Ryan Uber d6b71de3f4 agent: reload SCADA client if endpoint changes 2015-08-27 13:29:07 -07:00
Ryan Uber 5bd7a5f239 command: atlas endpoint can be passed 2015-08-27 11:11:05 -07:00
Ryan Uber cda2bf6975 agent: atlas_endpoint is configurable 2015-08-27 11:08:01 -07:00
Ryota Arai c45f2971e7 command/lock: Pass stdin to child process when -pass-stdin passed. 2015-08-26 16:27:21 +09:00
Ryan Uber 00d78963bf agent: log a message when making a new scada connection 2015-08-25 21:03:16 -07:00
Ryan Uber 33cadcf925 agent: don't reload scada client if there is no config change 2015-08-25 20:43:57 -07:00
Ryan Uber 8eea77d58f agent: testing scada client creation in command 2015-08-25 20:22:22 -07:00
Ryan Uber 495cc41ba4 agent: test scada HTTP server creation 2015-08-25 18:51:04 -07:00
Ryan Uber e3cd2f2c0d agent: clean up scada connection manager 2015-08-25 18:27:07 -07:00
Ryan Uber bc96c14a6f agent: document the scada http creation func 2015-08-25 17:19:11 -07:00
Ryan Uber 1378fd93b0 agent: scada client and HTTP server are tracked separately 2015-08-25 16:59:53 -07:00
Andy Lo-A-Foe 325b54649a Remove duplicate code 2015-08-20 20:46:20 +02:00
Andy Lo-A-Foe 3d133ab78c Use Ports.HTTP directly 2015-08-20 20:27:20 +02:00
Andy Lo-A-Foe 7e2ecf6a3c Add documentation for http-port option 2015-08-20 20:19:35 +02:00
Shawn Cook d4ec6aa630 Update tests - NodeService init needs bool 2015-08-20 09:09:26 -07:00
Shawn Cook 854ff1eb41 Add EnableTagDrift logic to command/agent/local.go 2015-08-18 14:03:48 -07:00
Shawn Cook 3a740ac07b Remove from command/agent/config_test.go 2015-08-18 10:42:25 -07:00
Shawn Cook f6814c89ed EnableTagDrift in NodeService struct 2015-08-18 10:34:55 -07:00
Ryan Uber 5024e7c3c7 Merge pull request #1166 from hashicorp/f-dns-log
Log network address of DNS clients
2015-08-13 18:32:32 -07:00
Ryan Uber 07299a61dc agent: log network address of DNS clients 2015-08-11 10:33:27 -07:00