680 commits

Author SHA1 Message Date
Pierre Souchay 523feb0be4 Fixed comment about raftIndex + use test.Helper() 2018-02-19 19:30:25 +01:00
Pierre Souchay 4c188c1d08 Services Indexes modified per service instead of using a global Index
This patch improves watches for services on large clusters:
each service now has its own index, so watches on a specific service
are not affected by changes in the global catalog.

It should greatly improve the performance of tools such as consul-template
or libraries performing watches on very large clusters with many
services/watches.
2018-02-19 18:29:22 +01:00
Edd Steel 35c2083422
Clarify comments 2018-02-17 17:46:11 -08:00
Edd Steel 61be181f6f Test every endpoint for OPTIONS/MethodNotFound 2018-02-17 17:34:13 -08:00
Edd Steel 6c33163959 Allow endpoints to handle OPTIONS/MethodNotFound themselves 2018-02-17 17:34:03 -08:00
Edd Steel 4dc9d2ebd7
Initialise allowedMethods in init() 2018-02-17 17:31:24 -08:00
Kyle Havlovitz ea452c6032
Fix the coordinate update endpoint not passing the ACL token 2018-02-15 11:58:02 -08:00
Edd Steel 40eefc9f7d
Support OPTIONS requests
- register endpoints with supported methods
- support OPTIONS requests, indicating supported methods
- extract method validation (error 405) from individual endpoints
- on 405 where multiple methods are allowed, create a single Allow
  header with comma-separated values, not multiple Allow headers.
2018-02-12 10:15:31 -08:00
Andrei Burd dbb010c865 adding human readability for dns requests debug log (#3751) 2018-02-11 09:02:28 -06:00
Pierre Souchay 824b72cf90 Merge remote-tracking branch 'origin/master' into service_metadata 2018-02-11 13:20:49 +01:00
Pierre Souchay e99bf584c9 Fixed TestSanitize unit test 2018-02-11 12:11:11 +01:00
James Phillips 37cf6583db
Fixes a panic on TCP-based DNS lookups.
This came in via the monkey patch in #3861.

Fixes #3877
2018-02-08 17:57:41 -08:00
Pierre Souchay f2df4005fe Added unit tests for structs and fixed PartialClone() 2018-02-09 01:37:45 +01:00
James Phillips 4f3b4d0e55
Addresses additional state mutations.
Did a sweep of 84d6ac2d51
and checked them all.
2018-02-07 07:02:10 -08:00
James Phillips ca461f8890
Fixes all the racy output-side updates to tags. 2018-02-06 20:35:55 -08:00
James Phillips e7dd7b2d13
Adds a more robust unit test for index churn. 2018-02-06 20:35:38 -08:00
Pierre Souchay 3acc5b58d4 Added support for Service Metadata 2018-02-07 01:54:42 +01:00
James Phillips 41e3fcf205
Makes server manager shift away from failed servers from Serf events.
Because this code was doing pointer equality checks, it would work for
the case of a failed attempted RPC because the objects are from the
manager itself:

https://github.com/hashicorp/consul/blob/v1.0.3/agent/consul/rpc.go#L283-L302

But the pointer check would always fail for events coming in from the
Serf path because the server object is newly-created:

https://github.com/hashicorp/consul/blob/v1.0.3/agent/router/serf_adapter.go#L14-L40

This means that we didn't proactively shift RPC traffic away from a
failed server, we'd have to wait for an RPC to fail, which exposes
the error to the calling client.

By switching over to a name check vs. a pointer check we get the correct
behavior. We added a DEBUG log as well to help observe this behavior during
integrated testing.

Related to #3863 since the fix here needed the same logic duplicated, owing
to the complicated atomic stuff.

/cc @dadgar for a heads up in case this also affects Nomad.
2018-02-05 17:56:00 -08:00
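
The difference between the pointer check and the name check described in the commit above can be illustrated with a small sketch. The Server type and both lookup functions are hypothetical stand-ins, not the manager's real types.

```go
package main

import "fmt"

// Server is a hypothetical stand-in for a server record.
type Server struct {
	Name string
	Addr string
}

// indexByPointer finds s only if it is the very same object held in the list.
func indexByPointer(list []*Server, s *Server) int {
	for i, cur := range list {
		if cur == s {
			return i
		}
	}
	return -1
}

// indexByName finds the entry describing the same server, whatever its identity.
func indexByName(list []*Server, s *Server) int {
	for i, cur := range list {
		if cur.Name == s.Name {
			return i
		}
	}
	return -1
}

func main() {
	managed := []*Server{
		{Name: "s1", Addr: "10.0.0.1:8300"},
		{Name: "s2", Addr: "10.0.0.2:8300"},
	}
	// An event handler builds a fresh object describing the failed server.
	fromEvent := &Server{Name: "s2", Addr: "10.0.0.2:8300"}

	fmt.Println(indexByPointer(managed, managed[1])) // object from the list itself: found
	fmt.Println(indexByPointer(managed, fromEvent))  // newly created object: not found
	fmt.Println(indexByName(managed, fromEvent))     // name comparison: found
}
```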
James Phillips c718459e49
Adds a before/after test for #3845. 2018-02-05 16:18:29 -08:00
James Phillips 5b245c0201
Merge pull request #3845 from 42wim/tagfix
Fix service tags not added to health check. Part two
2018-02-05 16:18:00 -08:00
Kyle Havlovitz 46745eb89b
Add enterprise default config section 2018-02-05 13:33:59 -08:00
James Phillips 0aa05cc5f0
Merge pull request #3855 from hashicorp/pr-3782-slackpad
Adds support for gRPC health checks.
2018-02-02 17:57:27 -08:00
James Phillips 1a08e8c0f1
Changes "TLS" to "GRPCUseTLS" since it only applies to GRPC checks. 2018-02-02 17:29:34 -08:00
Wim 5cc76cce09 Fix service tags not added to health check. Part two 2018-01-29 20:32:44 +01:00
Veselkov Konstantin c2395d9bd0 fix refactoring 2018-01-28 22:53:30 +04:00
Veselkov Konstantin c4ad54e057 fix refactoring 2018-01-28 22:48:21 +04:00
Veselkov Konstantin 05666113a4 remove golint warnings 2018-01-28 22:40:13 +04:00
James Phillips 443250c76c
Improves user lookup error message.
Closes #3188
Closes #3184
2018-01-26 07:56:44 -08:00
Kyle Havlovitz 32dbb51c3b
Remove nonvoter from metadata.Server 2018-01-25 17:08:03 -08:00
James Phillips 38f5b2e7ce
Gets rid of named return parameters.
This wasn't wrong before but we don't generally use this style in
Consul.
2018-01-25 14:29:50 -08:00
James Phillips 1acaaecbdd
Moves non-stdlib includes into their own section. 2018-01-25 14:26:15 -08:00
Kyle Havlovitz 0e76d62846
Reset clusterHealth when autopilot starts 2018-01-23 12:52:28 -08:00
Kyle Havlovitz 6d1dbe6cc4
Move autopilot health loop into leader operations 2018-01-23 11:17:41 -08:00
James Phillips a4c3a3433c
Updates web assets to latest. 2018-01-22 14:46:07 -08:00
Kyle Havlovitz c4528a6110
Merge pull request #3821 from hashicorp/persist-file-handling
Add graceful handling of malformed persisted service/check files.
2018-01-22 12:31:33 -08:00
Kyle Havlovitz bb068b4c93
Merge pull request #3820 from hashicorp/serfwan-port-fix
Enforce a valid port for the Serf WAN since it can't be disabled.
2018-01-19 15:40:56 -08:00
James Phillips 77ab587ae1
Moves the coordinate fetch after the ACL check. 2018-01-19 15:25:22 -08:00
Kyle Havlovitz b651253cb2
Don't remove the files, just log an error 2018-01-19 14:25:51 -08:00
Kyle Havlovitz f191eb2df3
Enforce a valid port for the Serf WAN since it can't be disabled.
Fixes #3817
2018-01-19 14:22:23 -08:00
Kyle Havlovitz 17ec4a9394
Add graceful handling of malformed persisted service/check files.
Previously a change was made to make the file writing atomic,
but that wasn't enough to cover something like an OS crash so we
needed something here to handle the situation more gracefully.

Fixes #1221.
2018-01-19 14:07:36 -08:00
James Hartig 81d0ffc959 Resolve symlinks in config directory
Docker/Openshift/Kubernetes mount the config file as a symbolic link and
IsDir returns true if the file is a symlink. Before calling IsDir, the
symlink should be resolved to determine if it points at a file or
directory.

Fixes #3753
2018-01-12 15:43:38 -05:00
James Phillips ca43623734
Adds the NodeID field back to the /v1/agent/self Config block.
Fixes #3778
2018-01-10 15:17:54 -08:00
James Phillips ff2aae98f4
Adds more info about how to fix the private IP error.
Closes #3790
2018-01-10 09:53:41 -08:00
James Phillips e282e9285c
Fixes crash where body was optional for PQ endpoint (it is not).
Fixes #3791
2018-01-10 09:33:49 -08:00
Dmytro Kostiuchenko a45f6ad740 Add gRPC health-check #3073 2018-01-04 16:42:30 -05:00
Diptanu Choudhury f597d66392 Using labels 2017-12-21 20:30:29 -08:00
Diptanu Choudhury ac50568a1a Added telemetry around Catalog APIs 2017-12-21 16:35:12 -08:00
James Phillips 1cd94dec8a
Updates the checked in web assets. 2017-12-20 19:51:04 -08:00
James Phillips bfce81d721
Updates the built-in web assets. 2017-12-20 17:48:51 -08:00
James Phillips 8fd94a2e7c
Wraps HTTP mux to ban all non-printable characters from paths. 2017-12-20 15:47:53 -08:00
James Phillips 6f09cf48db
Updates the built-in web UI assets. 2017-12-20 13:43:52 -08:00
James Phillips 62e97a6602
Fixes a go fmt cleanup. 2017-12-20 13:43:38 -08:00
Kyle Havlovitz 74b0c58831
Fix vet error 2017-12-18 18:04:42 -08:00
Kyle Havlovitz dfc165a47b
Move autopilot initializing to oss file 2017-12-18 18:02:44 -08:00
Kyle Havlovitz 044c38aa7b
Move autopilot setup to a separate file 2017-12-18 16:55:51 -08:00
Kyle Havlovitz 9e1ba6fb4e
Make some final tweaks to autopilot package 2017-12-18 12:26:47 -08:00
Kyle Havlovitz 6b58df5898
Merge pull request #3737 from hashicorp/autopilot-refactor
Move autopilot to a standalone package
2017-12-15 14:09:40 -08:00
James Phillips 262cbbd9ca
Merge pull request #3728 from weiwei04/fix_globalRPC_goroutine_leak
fix globalRPC goroutine leak
2017-12-14 17:54:19 -08:00
James Phillips 518ab954bc
Merge pull request #3642 from yfouquet/master
[Fix] Service tags not added to health checks
2017-12-14 13:59:39 -08:00
James Phillips 47cd775b3d
Works around mapstructure behavior to enable sessions with no checks.
Fixes #3732
2017-12-14 09:07:56 -08:00
Kyle Havlovitz 798aca92c5
Expose IsPotentialVoter for advanced autopilot logic 2017-12-13 17:53:51 -08:00
James Phillips 68c94a5047
Changes maps to merge vs. overwrite when processing configs.
Fixes #3716
2017-12-13 16:06:01 -08:00
Kyle Havlovitz a4ac148077
Merge branch 'master' into autopilot-refactor 2017-12-13 11:54:32 -08:00
Kyle Havlovitz 6c985132de
A few last autopilot adjustments 2017-12-13 11:19:17 -08:00
Kyle Havlovitz 77d92bf15c
More autopilot reorganizing 2017-12-13 10:57:37 -08:00
James Phillips 984de6e2e0
Adds TODOs referencing #3744. 2017-12-13 10:52:06 -08:00
James Phillips 63011dd393
Copies the autopilot settings from the runtime config.
Fixes #3730
2017-12-13 10:32:05 -08:00
Kyle Havlovitz f347c8a531
More refactoring to make autopilot consul-agnostic 2017-12-12 17:46:28 -08:00
Yoann Fouquet f4f7db0059 [Fix] Service tags not added to health checks
Since commit 9685bdcd0ba4b4b3adb04f9c1dd67d637ca7894e, service tags are added to the health checks.
Otherwise, when adding a service, tags are not added to its check.

In updateSyncState, we compare the checks of the local agent with the checks of the catalog.
It appears that the service tags are different (missing in one case), and so the check is synchronized.
That increases the ModifyIndex periodically when nothing changes.

Fixed it by adding serviceTags to the check.

Note that the issue appeared in version 0.8.2.
Looks related to #3259.
2017-12-12 13:39:37 +01:00
Kyle Havlovitz 8546a1d3c6
Move autopilot to a standalone package 2017-12-11 16:45:33 -08:00
James Phillips 32b64575d1
Moves Serf helper into lib to fix import cycle in consul-enterprise. 2017-12-07 16:57:58 -08:00
James Phillips c16cce80bb
Turns off intent queue warnings and enables dynamic queue sizing. 2017-12-07 16:27:06 -08:00
Wei Wei 04531ff0fb fix globalRPC goroutine leak
Signed-off-by: Wei Wei <weiwei.inf@gmail.com>
2017-12-05 11:53:30 +08:00
James Phillips c4bc89a187
Creates a registration mechanism for snapshot and restore. 2017-11-29 18:36:53 -08:00
James Phillips 8571555703
Begins split out of snapshots from the main FSM class. 2017-11-29 18:36:53 -08:00
James Phillips 4eaee8e0ba
Creates a registration mechanism for FSM commands. 2017-11-29 18:36:53 -08:00
James Phillips 3e7ea1931c
Moves the FSM into its own package.
This will help make it clearer what happens when we add some registration
plumbing for the different operations and snapshots.
2017-11-29 18:36:53 -08:00
James Phillips 7f3783f4be
Resolves an FSM snapshot TODO.
This adds checks for sink write calls before we continue the refactor, which
will resolve the other TODO comment we deleted as part of this change.
2017-11-29 18:36:53 -08:00
James Phillips 5a24d37ac0
Creates a registration mechanism for schemas.
This also splits out the registration into the table-specific source
files.
2017-11-29 18:36:52 -08:00
James Phillips 36bb30e67a
Creates a registration mechanism for RPC endpoints. 2017-11-29 18:36:52 -08:00
James Phillips 8f802411c4
Creates HTTP endpoint registry. 2017-11-29 18:36:52 -08:00
James Phillips 8da7f0ff7e
Moves coordinate disabled logic down into endpoints.
Similar rationale to the previous change for ACLs.
2017-11-29 18:36:52 -08:00
James Phillips cf30d409c9
Moves ACL disabled response logic down into endpoints.
This lets us make the registration of endpoints less fancy, on the
road to adding a registration mechanism.
2017-11-29 18:36:52 -08:00
James Phillips 6234f0bd46
Renames "segments" to "segment" to be consistent with other files. 2017-11-29 18:36:52 -08:00
James Phillips ba56669ea8
Renames stubs to be more consistent. 2017-11-29 18:36:52 -08:00
James Phillips 56552095c9
Sheds monotonic time info so tombstone GC bins work properly. 2017-11-29 10:34:24 -08:00
James Phillips 8656b7a3e9
Gives back the lock before writing to the expire channel.
The lock isn't needed after we clean up the expire bin, and as seen
in #3700 we can get into a deadlock waiting to place the expire index
into the channel while holding this lock.

Fixes #3700
2017-11-19 16:24:16 -08:00
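
The locking pattern described in the commit above, releasing the lock before a channel send that may block, might look like the sketch below. binStore and its methods are hypothetical and far simpler than the real tombstone GC.

```go
package main

import (
	"fmt"
	"sync"
)

// binStore is a hypothetical stand-in for a structure with expiring bins.
type binStore struct {
	mu       sync.Mutex
	bins     map[int]uint64 // bin ID -> index stored in that bin
	expireCh chan uint64    // unbuffered: a send blocks until someone receives
}

// expire removes a bin and reports its index on expireCh.
func (s *binStore) expire(bin int) {
	s.mu.Lock()
	idx, ok := s.bins[bin]
	delete(s.bins, bin)
	s.mu.Unlock() // give the lock back before the potentially blocking send

	if ok {
		s.expireCh <- idx
	}
}

// pending needs the same lock. If expire held the lock while sending, a
// receiver that called pending before reading the channel could deadlock.
func (s *binStore) pending() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.bins)
}

func main() {
	s := &binStore{bins: map[int]uint64{1: 42}, expireCh: make(chan uint64)}
	go s.expire(1)
	_ = s.pending()           // takes the lock before anything reads the channel
	fmt.Println(<-s.expireCh) // prints 42
}
```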
James Phillips ae85cc4070
Skips files with unknown extensions when not forcing a format.
Fixes #3685
2017-11-10 18:06:07 -08:00
James Phillips d5bf4e9c6e
Adds a snapshot agent stub to the config structure.
Fixes #3678
2017-11-10 13:50:45 -08:00
James Phillips 50cdff36e5
Cleans up check logging.
There were places where we still didn't have the script vs. args sorted
correctly so changed all the logging to be just based on check IDs and
also made everything uniform.

Also removed some annoying debug logging, and moved some of the large output
logging to TRACE level.

Closes #3602
2017-11-10 12:48:44 -08:00
James Phillips 8210523b1b
Moves the LAN event handler after the router is created.
Fixes #3680
2017-11-10 12:26:48 -08:00
James Phillips bfbbfb62ca
Revert "Adds a small sleep to make sure we are in the next GC bucket." 2017-11-08 22:18:37 -08:00
James Phillips d6328a5bf8
Adds a sleep to make sure we are in the next GC bucket, ups time.
Fixes #3670
2017-11-08 22:02:40 -08:00
James Phillips 91824375be
Skips the tombstone GC test in Travis for now.
Related to #3670
2017-11-08 20:14:20 -08:00
James Phillips c060df20de
Adds missing os import. 2017-11-08 20:02:22 -08:00
James Phillips b94ba8aeb4
Removes bogus getPort() in favor of freeport. 2017-11-08 19:55:50 -08:00
James Phillips 04a7907a7e
Skips IPv6 test in Travis. 2017-11-08 18:28:45 -08:00
James Phillips c52824bab7
Adds a longer retry period for the AE deferred output test.
There's some justification in the comments about this and a TODO to
improve this later.

Fixes #3668
2017-11-08 18:10:13 -08:00
James Phillips 444a345a3a
Tightens timing up and reorders GC test to be less flaky. 2017-11-08 15:09:29 -08:00
James Phillips e00624425b
Doubles the GC timing. 2017-11-08 15:01:11 -08:00
James Phillips 8eb91777d9
Opens up test timing a little more. 2017-11-08 14:01:19 -08:00
James Phillips d45c2a01f1
Shifts off a gran boundary to help make test less flaky. 2017-11-08 13:57:17 -08:00
James Phillips 757e353334
Opens up the tombstone GC test timing. 2017-11-08 13:43:39 -08:00
James Phillips 532cafe0af
Adds enable_agent_tls_for_checks configuration option which allows (#3661)
HTTP health checks for services requiring 2-way TLS to be checked
using the agent's credentials.
2017-11-07 18:22:09 -08:00
James Phillips 9de2d8921f
Saves the cycled server list after a failed ping when rebalancing. (#3662)
Fixes #3463
2017-11-07 18:13:23 -08:00
James Phillips d938493671
Double-books the HTTP metrics w/ and w/o the "consul" prefix.
Fixes #3654
2017-11-07 16:32:45 -08:00
James Phillips 8709f65afd
Adds HTTP/2 support to Consul's HTTPS server. (#3657)
* Refactors the HTTP listen path to create servers in the same spot.

* Adds HTTP/2 support to Consul's HTTPS server.

* Vendors Go HTTP/2 library and associated deps.
2017-11-07 15:06:59 -08:00
James Phillips 021373d72e
Makes the metrics ACL test call the right endpoint.
This also required setting up a proper in-mem sink so we don't get
metrics-related errors.

Fixes #3655
2017-11-06 21:50:04 -08:00
Preetha Appan ae9e204b3a Sets tty in docker client back to true, as a potential fix for docker exec weirdness 2017-11-05 09:44:55 -06:00
Kyle Havlovitz 068ca11eb8
Move check definition to a sub-struct 2017-11-01 14:54:46 -07:00
Kyle Havlovitz bc3ba5f873
Merge branch 'master' into esm-changes 2017-11-01 11:37:48 -07:00
Kyle Havlovitz 83524f44c4
Merge pull request #3622 from hashicorp/coordinate-node-endpoint
agent: add /v1/coordinate/node/:node endpoint
2017-11-01 11:35:50 -07:00
Kyle Havlovitz 3542d7fcb6
Remove redundant lines from coordinate test 2017-11-01 11:25:33 -07:00
Kyle Havlovitz 9909b661ac
Fill out the tests around coordinate/node functionality 2017-10-31 15:36:44 -07:00
Frank Schröder 3cb1cd3723 config: add -config-format option (#3626)
* config: refactor ReadPath(s) methods without side-effects

Return the sources instead of modifying the state.

* config: clean data dir before every test

* config: add tests for config-file and config-dir

* config: add -config-format option

Starting with Consul 1.0 all config files must have a '.json' or '.hcl'
extension to make it unambiguous how the data should be parsed. Some
automation tools generate temporary files by appending a random string
to the generated file which obfuscates the extension and prevents the
file type detection.

This patch adds a -config-format option which can be used to override
the auto-detection behavior by forcing all config files or all files
within a config directory independent of their extension to be
interpreted as of this format.

Fixes #3620
2017-10-31 17:30:01 -05:00
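
The interplay between extension-based detection and a forced format, as described in the commit above, can be sketched as follows. configFormat is a hypothetical function; the real option handling in Consul may differ in detail.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// configFormat (hypothetical) returns the format to parse path with.
// A non-empty forced value overrides detection by file extension.
func configFormat(path, forced string) (string, bool) {
	if forced != "" {
		return forced, true
	}
	switch filepath.Ext(path) {
	case ".json":
		return "json", true
	case ".hcl":
		return "hcl", true
	}
	return "", false // unknown extension and no forced format
}

func main() {
	// A temp-file style name hides the real extension from auto-detection.
	for _, p := range []string{"a.json", "b.hcl", "c.json.tmp1234"} {
		format, ok := configFormat(p, "")
		fmt.Printf("%s -> %q (known: %v)\n", p, format, ok)
	}
	format, ok := configFormat("c.json.tmp1234", "json")
	fmt.Printf("forced -> %q (known: %v)\n", format, ok)
}
```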
Frank Schröder 56561523cf vendor: update go-discover (#3634)
* vendor: update go-discover

Pull in providers:

 * Aliyun (Alibaba Cloud)
 * Digital Ocean
 * OpenStack (os)
 * Scaleway

* doc: use ... instead of xxx

* doc: strip trailing whitespace

* doc: add docs for aliyun, digitalocean, os and scaleway

* agent: fix test
2017-10-31 17:03:54 -05:00
Kyle Havlovitz fd4d9f1c16
Factor out registerNodes function 2017-10-31 13:34:49 -07:00
James Phillips c6e0366c02
Relaxes Autopilot promotion logic. (#3623)
* Relaxes Autopilot promotion logic.

When we defaulted the Raft protocol version to 3 in #3477 we made
the numPeers() routine more strict to only count voters (this is
more conservative and more correct). This had the side effect of
breaking rolling updates because it's at odds with the Autopilot
non-voter promotion logic.

That logic used to wait to only promote to maintain an odd quorum
of servers. During a rolling update (add one new server, wait, and
then kill an old server) the dead server cleanup would still count
the old server as a peer, which is conservative and the right thing
to do, and no longer count the non-voter. This would wait to promote,
so you could get into a stalemate. It is safer to promote early than
remove early, so by promoting as soon as possible we have chosen
that as the solution here.

Fixes #3611

* Gets rid of unnecessary extra not-a-voter check.
2017-10-31 15:16:56 -05:00
Frank Schroeder 82a52d3b50
docker: fix failing test 2017-10-31 09:26:34 +01:00
Frank Schroeder ed1b1b54cd
docker: render errors with %v since they can be nil 2017-10-31 09:19:20 +01:00
Kyle Havlovitz 2c7f7799bb
Add tests around coordinate update endpoint 2017-10-26 20:12:54 -07:00
Kyle Havlovitz 496dd7ab5b
Merge branch 'coordinate-node-endpoint' of github.com:hashicorp/consul into esm-changes 2017-10-26 19:20:24 -07:00
Kyle Havlovitz f80e70271d
Added Coordinate.Node rpc endpoint and client api method 2017-10-26 19:16:40 -07:00
Frank Schroeder 87206133be
agent: add /v1/coordinate/node/:node endpoint
This patch adds a /v1/coordinate/node/:node endpoint to get the network
coordinates for a single node in the network.

Since Consul Enterprise supports network segments it is still possible
to receive multiple entries for a single node - one per segment.
2017-10-26 14:24:42 +02:00
Frank Schroeder 712447026f
docker: add comment about "connection reset by peer" error 2017-10-26 12:14:19 +02:00
Frank Schroeder 7d05e55734
docker: stop previous check on replace 2017-10-26 12:03:07 +02:00
Frank Schroeder bf98779d84
docker: close idle connections on stop 2017-10-26 12:02:39 +02:00
Frank Schroeder 0a9d2a367e
docker: do not alloc a tty since this is not interactive 2017-10-26 11:56:54 +02:00
Frank Schroeder b1a5a6b64d
docker: make sure to log the error when we fall through 2017-10-26 11:56:36 +02:00
Frank Schroeder b907c4611d
docker: ignore "connection reset by peer"
The Docker agent closes the connection during read after we have
read the body. This causes a "connection reset by peer" even though
the command was successful.

We ignore that error here since we got the correct status code
and a response body.
2017-10-26 11:56:08 +02:00
Kyle Havlovitz 16908be034
Add deregister critical service field and refactor duration parsing 2017-10-25 19:17:41 -07:00
Kyle Havlovitz ab3dac2379
Added coordinate update http endpoint 2017-10-25 19:37:30 +02:00
Kyle Havlovitz 7d82ece118
Added remaining HTTP health check fields to structs 2017-10-25 19:37:30 +02:00
Kyle Havlovitz 84a07ea113
Expose SkipNodeUpdate field and some health check info in the http api 2017-10-25 19:37:30 +02:00
Frank Schroeder 2719194b6d
fix go vet issue 2017-10-25 19:30:35 +02:00
Frank Schroeder 1eb3d0e0d4
replace custom unique id with a UUID 2017-10-25 19:30:35 +02:00
Frank Schroeder 1dab004335
Decouple the code that executes checks from the agent 2017-10-25 11:18:07 +02:00
Frank Schroeder 1d2ae14719
local state: fix go vet issue 2017-10-23 10:56:05 +02:00
Frank Schroeder a818414bb6
local state: remove stale comment 2017-10-23 10:56:05 +02:00
Frank Schroeder 329fdc40a8
local state: make test more robust 2017-10-23 10:56:05 +02:00
Frank Schroeder f5a3d73b27
local state: clone check to avoid side effect 2017-10-23 10:56:05 +02:00
Frank Schroeder b36613e7ff
local state: use synchronized access to internal maps 2017-10-23 10:56:05 +02:00
Frank Schroeder e5318061d1
ae: do not trigger on Resume while holding the lock 2017-10-23 10:56:05 +02:00
Frank Schroeder 7d5dfa9c53
ae: add remaining test cases 2017-10-23 10:56:05 +02:00
Frank Schroeder ae7269458c
ae: refactor StateSyncer to state machine for better testing 2017-10-23 10:56:05 +02:00
Frank Schroeder a72b68c562
ae: add test that we run a full before a partial sync 2017-10-23 10:56:05 +02:00
Frank Schroeder 27bc11f005
ae: make control flow more explicit 2017-10-23 10:56:05 +02:00
Frank Schroeder 29435004f6
ae: fix typo in constructor name 2017-10-23 10:56:05 +02:00
Frank Schroeder 1bb1a6787e
ae: add test for resume triggering SyncChanges 2017-10-23 10:56:05 +02:00
Frank Schroeder 6de645a8b8
ae: add test for ifNotPausedRun 2017-10-23 10:56:05 +02:00