Commit Graph

4650 Commits

Author SHA1 Message Date
Frank Schroeder 36677bc90d local state: fix test with updated error message 2017-10-23 08:03:18 +02:00
Frank Schroeder 37c8492e5e local state: fix failing tests 2017-10-23 08:03:18 +02:00
Frank Schroeder 884f98f8aa local state: tests compile 2017-10-23 08:03:18 +02:00
Frank Schroeder 60095484c4 local state: replace multi-map state with structs
The state of the service and health check records was spread out over
multiple maps guarded by a single lock. Access to the maps has to happen
in a coordinated effort and the tests often violated this which made
them brittle and racy.

This patch replaces the multiple maps with a single one for both checks
and services to make the code less fragile.

This is also necessary since moving the local state into its own package
creates circular dependencies for the tests. To avoid this the tests can
no longer access internal data structures which they should not be doing
in the first place.

The tests still don't compile but this is a ncessary step in that
direction.
2017-10-23 08:03:18 +02:00
Frank Schroeder ef9aa6b3b6 local state: move to separate package
This patch moves the local state to a separate package to further
decouple it from the agent code.

The code compiles but the tests do not yet.
2017-10-23 08:03:18 +02:00
Frank Schroeder c03eba91d0 agent: simplify some loops 2017-10-23 08:03:18 +02:00
Frank Schroeder 98e5dc86fb agent: refactor sync loop to linear flow of control 2017-10-23 08:03:18 +02:00
Frank Schroeder 5302479ad5 agent: cleanup StateSyncer
This patch cleans up the state syncer code by renaming fields, adding
helpers and documentation.
2017-10-23 08:03:18 +02:00
Frank Schroeder 034ee43cef agent: decouple anti-entropy from local state
The anti-entropy code manages background synchronizations of the local
state on a regular basis or on demand when either the state has changed
or a new consul server has been added.

This patch moves the anti-entropy code into its own package and
decouples it from the local state code since they are performing
two different functions.

To simplify code-review this revision does not make any optimizations,
renames or refactorings. This will happen in subsequent commits.
2017-10-23 08:03:18 +02:00
Frank Schroeder 6df6ac03b7 config: do not allow an ANY address as DNS recursor 2017-10-23 08:01:25 +02:00
Frank Schroeder 3b13290144 config: add support for go-sockaddr templates for DNS recursors
DNS recursors can be added through go-sockaddr templates. Entries
are deduplicated while the order is maintained.

Originally proposed by @taylorchu

See #2932
2017-10-23 08:01:25 +02:00
James Phillips 2e7d048345
Cleans up import sorting. 2017-10-21 20:08:11 -07:00
Hadar Greinsmark 0c5f5e2821 Implement HTTP Watch handler (#3413)
Implement HTTP Watch handler
2017-10-21 20:39:09 -05:00
Frank Schroeder 74859ff3c0 test: replace porter tool with freeport lib
This patch removes the porter tool which hands out free ports from a
given range with a library which does the same thing. The challenge for
acquiring free ports in concurrent go test runs is that go packages are
tested concurrently and run in separate processes. There has to be some
inter-process synchronization in preventing processes allocating the
same ports.

freeport allocates blocks of ports from a range expected to be not in
heavy use and implements a system-wide mutex by binding to the first
port of that block for the lifetime of the application. Ports are then
provided sequentially from that block and are tested on localhost before
being returned as available.
2017-10-21 22:01:09 +02:00
Frank Schröder d26b0406e4 dns: return NXDOMAIN if datacenter is invalid (#3200) (#3596)
Queries to the DNS server can contain an optional datacenter
name in the query name. You can query for 'foo.service.consul'
or 'foo.service.dc.consul' to get a response for either the
default or a specific datacenter.

Datacenter names cannot have dots, therefore the datacenter
name can refer to only one element in the DNS query name.

The DNS server allowed extra labels between the optional
datacenter name and the domain and returned a valid response
instead of returning NXDOMAIN. For example, if the domain
is set to '.consul' then 'foo.service.dc1.extra.consul'
should return NXDOMAIN because of 'extra' being between
the datacenter name 'dc1' and the domain '.consul'.

Fixes #3200
2017-10-20 16:49:17 -07:00
Frank Schroeder 6628ca1cf1
config: do not allow an ANY address as DNS recursor 2017-10-20 20:00:45 +02:00
Frank Schroeder 2c4f98cf12
config: add support for go-sockaddr templates for DNS recursors
DNS recursors can be added through go-sockaddr templates. Entries
are deduplicated while the order is maintained.

Originally proposed by @taylorchu

See #2932
2017-10-20 15:51:49 +02:00
James Phillips 3d52f42715 Fixes API client for ScriptArgs and updates documentation. (#3589)
* Updates the API client to support the current `ScriptArgs` parameter
for checks.

* Updates docs for checks to explain the `ScriptArgs` parameter issue.

* Adds mappings for "args" and "script-args" to give th API parity
with config.

* Adds checks on return codes.

* Removes debug logging that shows empty when args are used.
2017-10-18 11:28:39 -07:00
Ryan Slade 6f05ea91a3 Replace time.Now().Sub(x) with time.Since(x) 2017-10-17 20:38:24 +02:00
James Phillips 39f2359804 Fixes an XSS issue with unescaped node names. (#3578)
* Fixes an XSS issue with node names in the tomography graph.

* Updates built-in static web assets.

* Updates the change log.
2017-10-16 09:12:36 -07:00
James Phillips fdd08c78a9 Adds a brief wait and poll period to update check status after a timeout. (#3573)
* Adds a brief wait and poll period to update the check status
if we get stucking waiting for the processes to terminate.

Fixes #3570

* Jumps out of timeout case and includes script output.
2017-10-12 13:49:46 -07:00
James Phillips e9670761f9
Cleans up some drift between the OSS and Enterprise trees. 2017-10-11 15:53:07 -07:00
Kyle Havlovitz eea2bd2753 Kill check processes after the timeout is reached (#3567)
* Kill check processes after the timeout is reached

Kill the subprocess spawned by a script check once the timeout is reached. Previously Consul just marked the check critical and left the subprocess around.

Fixes #3565.

* Set err to non-nil when timeout occurs

* Fix check timeout test

* Kill entire process subtree on check timeout

* Add a docs note about windows subprocess termination
2017-10-11 11:57:39 -07:00
Frank Schroeder c4215bc04f
config: remove redundant code 2017-10-11 10:16:21 +02:00
Frank Schroeder 8cda75454a
config: fix check for segment.port <= 0 and add test 2017-10-11 10:15:55 +02:00
James Phillips a16dbc0212
Adds check to make sure port is given so we avoid a nil bind address. 2017-10-10 18:11:21 -07:00
James Phillips 275e83de08
Removes obsolete segment stub. 2017-10-10 17:21:32 -07:00
Frank Schröder 9b2e3c2091 agent: add option to discard health output (#3562)
* agent: add option to discard health output

In high volatile environments consul will have checks with "noisy"
output which changes every time even though the status does not change.
Since the output is stored in the raft log every health check update
unblocks a blocking call on health checks since the raft index has
changed even though the status of the health checks may not have changed
at all. By discarding the output of the health checks the users can
choose a different tradeoff. Less visibility on why a check failed in
exchange for a reduced change rate on the raft log.

* agent: discard output also when adding a check

* agent: add test for discard check output

* agent: update docs

* go vet

* Adds discard_check_output to reloadable config table.

* Updates the change log.
2017-10-10 17:04:52 -07:00
preetapan f6066f8305 Fixes agent error handling when check definition is invalid. Distingu… (#3560)
* Fixes agent error handling when check definition is invalid. Distinguishes between empty checks vs invalid checks

* Made CheckTypes return Checks from service definition struct rather than a new copy, and other changes from code review. This also errors when json payload contains empty structs

* Simplify and improve validate method, and make sure that CheckTypes always returns a new copy of validated check definitions

* Tweaks some small style things and error messages.

* Updates the change log.
2017-10-10 16:54:06 -07:00
Frank Schröder fa22ad4573 config: add generic method to translate between CamelCase and snake_case (#3557)
* doc: document discrepancy between id and CheckID

* doc: document enable_tag_override change

* config: add TranslateKeys helper

TranslateKeys makes it easier to map between different representations
of internal structures. It allows to recursively map alias keys to
canonical keys in structured maps.

* config: use TranslateKeys for config file

This also adds support for 'enabletagoverride' and removes
the need for a separate CheckID alias field.

* config: remove dead code

* agent: use TranslateKeys for FixupCheckType

* agent: translate enable_tag_override during service registration

* doc: add '.hcl' as valid extension

* config: map ScriptArgs to args

* config: add comment for TranslateKeys
2017-10-10 16:40:59 -07:00
James Phillips d1ad538345 Makes RPC handling more robust when rolling servers. (#3561)
* Adds client-side retry for no leader errors.

This paves over the case where the client was connected to the leader
when it loses leadership.

* Adds a configurable server RPC drain time and a fail-fast path for RPCs.

When a server leaves it gets removed from the Raft configuration, so it will
never know who the new leader server ends up being. Without this we'd be
doomed to wait out the RPC hold timeout and then fail. This makes things fail
a little quicker while a sever is draining, and since we added a client retry
AND since the server doing this has already shut down and left the Serf LAN,
clients should retry against some other server.

* Makes the RPC hold timeout configurable.

* Reorders struct members.

* Sets the RPC hold timeout default for test servers.

* Bumps the leave drain time up to 5 seconds.

* Robustifies retries with a simpler client-side RPC hold.

* Reverts untended delete.
2017-10-10 15:19:50 -07:00
Preetha Appan 25e64b5362 Fix unit test after dns library upgrade to account for correct data length 2017-10-06 17:40:17 -05:00
James Phillips a1db119d02 Fixes handling of stop channel and failed barrier attempts. (#3546)
* Fixes handling of stop channel and failed barrier attempts.

There were two issues here. First, we needed to not exit when there
was a timeout trying to write the barrier, because Raft might not
step down, so we'd be left as the leader but having run all the step
down actions.

Second, we didn't close over the stopCh correctly, so it was possible
to nil that out and have the leaderLoop never exit. We close over it
properly AND sequence the nil-ing of it AFTER the leaderLoop exits for
good measure, so the code is more robust.

Fixes #3545

* Cleans up based on code review feedback.

* Tweaks comments.

* Renames variables and removes comments.
2017-10-06 07:54:49 -07:00
Victor Boivie 77f7008363 Minor typo (boostrap) 2017-10-05 16:28:48 +02:00
James Phillips 97b580f593
Adds script warning and fixes Docker args recognition. 2017-10-04 21:41:27 -07:00
Kyle Havlovitz dde743700f Merge pull request #3535 from hashicorp/metric-docs
Update metric names and add a legacy config flag
2017-10-04 17:39:16 -07:00
Kyle Havlovitz d5fec6b7ac
Add a test for legacy metrics with a whitelist filter 2017-10-04 17:27:57 -07:00
Kyle Havlovitz be04bfed34 Clean up subprocess handling and make shell use optional (#3509)
* Clean up handling of subprocesses and make using a shell optional

* Update docs for subprocess changes

* Fix tests for new subprocess behavior

* More cleanup of subprocesses

* Minor adjustments and cleanup for subprocess logic

* Makes the watch handler reload test use the new path.

* Adds check tests for new args path, and updates existing tests to use new path.

* Adds support for script args in Docker checks.

* Fixes the sanitize unit test.

* Adds panic for unknown watch type, and reverts back to Run().

* Adds shell option back to consul lock command.

* Adds shell option back to consul exec command.

* Adds shell back into consul watch command.

* Refactors signal forwarding and makes Windows-friendly.

* Adds a clarifying comment.

* Changes error wording to a warning.

* Scopes signals to interrupt and kill.

This avoids us trying to send SIGCHILD to the dead process.

* Adds an error for shell=false for consul exec.

* Adds notes about the deprecated script and handler fields.

* De-nests an if statement.
2017-10-04 16:48:00 -07:00
Kyle Havlovitz 0063516e5e
Update metric names and add a legacy config flag 2017-10-04 16:43:27 -07:00
Frank Schröder b2c4dc4360 Provide stable config for agent/self (#3532)
* config: provide stable config for /v1/agent/self (#3530)

This patch adds a stable subset of the previous Config struct to the
agent/self response. The actual runtime configuration is moved into
DebugConfig and will be documented to change.

Fixes #3530

* config: fix tests

* doc: update api documentation for /v1/agent/self
2017-10-04 10:43:17 -07:00
James Phillips 6529c505a5 Merge pull request #3531 from hashicorp/pr-3521-slackpad
ui: Use monospace font for textarea controls.
2017-10-04 09:53:41 -07:00
James Phillips 539285cf1f
Updates checked in web assets to pick up CSS change.
Closes #3521
2017-10-04 09:52:15 -07:00
Preetha Appan f38d20eb40 Remove extra newline 2017-10-03 15:19:31 -05:00
Preetha Appan 3c81e2db7c Only allow 'list' policies within 'key' policy definitions. Consolidated two similar tests into one and fixed alignment. 2017-10-03 15:15:56 -05:00
Preetha Appan d5acfc3982 Introduces new 'list' permission that applies to KV store recursive reads, and enforced only when opted in. 2017-10-02 17:10:21 -05:00
Frank Schroeder 6b3a957c5e use ports from derived addresses 2017-09-29 20:26:43 +02:00
Frank Schroeder 8d8e2523eb config: drop advertise_addrs
Fixes #3516
2017-09-29 20:26:43 +02:00
Frank Schroeder f0efe2a3de
Fix tests after config refactor 2017-09-28 12:32:46 +02:00
Patrick Sodré 55c2746963
Implement encodeKVasRFC1464 function 2017-09-28 12:32:46 +02:00
Patrick Sodré d880634cfa
Add RFC1464 tests 2017-09-28 12:32:45 +02:00
Patrick Sodré 7083f9fb14
Turn encodeKVasRFC1464 into a plain function 2017-09-28 12:32:45 +02:00
Patrick Sodré be258a3315
Use verify for NodeLookup CNAME, and TXT tests 2017-09-28 12:32:45 +02:00
Patrick Sodré 8982719f5b
Refactor formatTxtRecords as encodeKVasRFC1464
- Move the logic of rfc1035 out of the encoding function
  - Left basic version of encodingKV as 'k=v'
2017-09-28 12:32:45 +02:00
Patrick Sodré a16e0f7419
Fix editorial suggestions 2017-09-28 12:32:45 +02:00
Patrick Sodré 4b2d1546fa
Remove redundant check of Node.Meta size 2017-09-28 12:32:45 +02:00
Patrick Sodré b8369b54fb
Return Node.Meta info using the DNS interface 2017-09-28 12:32:45 +02:00
Patrick Sodré b8905dd065
Add test for NoteLookup ANY request 2017-09-28 12:32:45 +02:00
Patrick Sodré 354765c549
Add test for querying Node.Meta with DNS TXT
- Lookup TXT records using recursive lookups
  - Expect TXT record equal to value if key starts with rfc1035-
  - Expect TXT record in rfc1464 otherwise, i.e. (k=v)

ref #2709
2017-09-28 12:32:45 +02:00
Frank Schröder 5f6d0fd8c5 fail early when advertise addr is set to ANY (#3507) 2017-09-27 13:57:55 -07:00
Frank Schröder beb803f0d9 only detect advertise address if derived value is any (#3506)
* only detect advertise address if derived value is any

* determine detect function only when advertise addr is any
2017-09-27 12:59:47 -07:00
James Phillips d677999258
Adds a comment about Datacenter and NodeName being stable interfaces
in the runtime config strucutre.
2017-09-27 11:59:22 -07:00
Frank Schröder cda0eacff1 Recursive sanitize (#3505)
* vendor: add github.com/sergi/go-diff/diffmatchpatch for diff'ing test output

* config: refactor Sanitize to recursively clean runtime config and format complex fields

* Removes an extra int cast.

* Adds a top-level check test case for sanitization.
2017-09-27 11:47:40 -07:00
James Phillips 330ce87851
Gets rid of flaky clause in stats fetcher unit test.
Given how the rutine is coded we can still get data so this wasn't
a reliable thing to check.
2017-09-26 20:53:06 -07:00
preetapan 783e24be64 Issue 3452 (#3500)
* Make sure that id and address are set in member created during reaping of catalog nodes that have been removed from serf

* Get address from node table in the state store rather than from service address

* Fix incorrect lookup by checkname instead of node name

* Make sure that serverlookup is called with the right address format, added unit test.

* Address code review comments

* Tweaks style stuff.
2017-09-26 20:49:41 -07:00
Frank Schröder 707f8e329a Metrics service prefix (#3498)
* metrics: replace statsite_prefix with service_prefix

The metrics prefix isn't statsite specific and is in fact used
for all metrics providers. Since we are deprecating fields
anyway we should fix this one as well.

Fixes #3293

* Updates docs and sorts telemetry section.

* Renames to "metrics_prefix" to disambiguate with Consul services.

* Updates the change log.
2017-09-26 17:49:55 -07:00
James Phillips 3130fcaccc Merge pull request #3501 from hashicorp/snapshot-test-hang
Cleans up some edge cases in TestSnapshot_Forward_Leader.
2017-09-26 14:08:33 -07:00
James Phillips 4b17c9618f
Cleans up some edge cases in TestSnapshot_Forward_Leader.
These could cause the tests to hang.
2017-09-26 14:07:28 -07:00
Kyle Havlovitz 3460506264 Fix watch error when http & https are disabled (#3493)
Remove an error in watch reloading that happens when http and https
are both disabled, and use an https address for running watches if
no http addresses are present.

Fixes #3425.
2017-09-26 13:47:27 -07:00
Preetha Appan 318d0232f7 Move Raft protocol version for list peers end point to server side, fix unit tests. This fixes #3449 2017-09-26 09:35:39 -05:00
Frank Schroeder 94fbae4732
fix data race
Since state.Checks() returns a shallow copy
its elements must not be modified. Copying
the elements in the handler does not guarantee
consistency since that list is guarded by a different
lock. Therefore, the only solution is to have state.Checks()
return a deep copy.
2017-09-26 13:42:10 +02:00
Frank Schroeder a1d65cbe78 config: do not clobber multiple check and service definitions
This patch ensures that multiple files with single 'check' or 'service'
definitions result in the combination of them.
2017-09-26 10:24:18 +02:00
James Phillips 23de0f9ea9
Renames `enable_ui` to `ui` to keep compatibility with existing configs. 2017-09-26 00:05:55 -07:00
Frank Schröder c7cc62ab5a agent: consolidate handling of 405 Method Not Allowed (#3405)
* agent: consolidate http method not allowed checks

This patch uses the error handling of the http handlers to handle HTTP
method not allowed errors across all available endpoints. It also adds a
test for testing whether the endpoints respond with the correct status
code.

* agent: do not panic on metrics tests

* agent: drop other tests for MethodNotAllowed

* agent: align /agent/join with reality

/agent/join uses PUT instead of GET as documented.

* agent: align /agent/check/{fail,warn,pass} with reality

/agent/check/{fail,warn,pass} uses PUT instead of GET as documented.

* fix some tests

* Drop more tests for method not allowed

* Align TestAgent_RegisterService_InvalidAddress with reality

* Changes API client join to use PUT instead of GET.

* Fixes agent endpoint verbs and removes obsolete tests.

* Updates the change log.
2017-09-25 23:11:19 -07:00
preetapan 4ced57c1f8 Merge pull request #3494 from hashicorp/enforce_json_extension
Enforce json or hcl extension to Consul config files, updated unit tests
2017-09-25 17:30:33 -05:00
James Phillips fcaa889116 Bumps default Raft protocol to version 3. (#3477)
* Changes default Raft protocol to 3.

* Changes numPeers() to report only voters.

This should have been there before, but it's more obvious that this
is incorrect now that we default the Raft protocol to 3, which puts
new servers in a read-only state while Autopilot waits for them to
become healthy.

* Fixes TestLeader_RollRaftServer.

* Fixes TestOperator_RaftRemovePeerByAddress.

* Fixes TestServer_*.

Relaxed the check for a given number of voter peers and instead do
a thorough check that all servers see each other in their Raft
configurations.

* Fixes TestACL_*.

These now just check for Raft replication to be set up, and don't
care about the number of voter peers.

* Fixes TestOperator_Raft_ListPeers.

* Fixes TestAutopilot_CleanupDeadServerPeriodic.

* Fixes TestCatalog_ListNodes_ConsistentRead_Fail.

* Fixes TestLeader_ChangeServerID and adjusts the conn pool to throw away
sockets when it sees io.EOF.

* Changes version to 1.0.0 in the options doc.

* Makes metrics test more deterministic with autopilot metrics possible.
2017-09-25 15:27:04 -07:00
Preetha Appan 1e8385df2c Enforce json or hcl extension to Consul config files, updated unit tests 2017-09-25 17:17:12 -05:00
James Phillips 5208a1ac96
Removes unused imports in agent_test.go. 2017-09-25 13:42:15 -07:00
Preetha Appan 8394ad08db Introduce Code Policy validation via sentinel, with a noop implementation 2017-09-25 13:44:55 -05:00
Frank Schröder 69a088ca85 New config parser, HCL support, multiple bind addrs (#3480)
* new config parser for agent

This patch implements a new config parser for the consul agent which
makes the following changes to the previous implementation:

 * add HCL support
 * all configuration fragments in tests and for default config are
   expressed as HCL fragments
 * HCL fragments can be provided on the command line so that they
   can eventually replace the command line flags.
 * HCL/JSON fragments are parsed into a temporary Config structure
   which can be merged using reflection (all values are pointers).
   The existing merge logic of overwrite for values and append
   for slices has been preserved.
 * A single builder process generates a typed runtime configuration
   for the agent.

The new implementation is more strict and fails in the builder process
if no valid runtime configuration can be generated. Therefore,
additional validations in other parts of the code should be removed.

The builder also pre-computes all required network addresses so that no
address/port magic should be required where the configuration is used
and should therefore be removed.

* Upgrade github.com/hashicorp/hcl to support int64

* improve error messages

* fix directory permission test

* Fix rtt test

* Fix ForceLeave test

* Skip performance test for now until we know what to do

* Update github.com/hashicorp/memberlist to update log prefix

* Make memberlist use the default logger

* improve config error handling

* do not fail on non-existing data-dir

* experiment with non-uniform timeouts to get a handle on stalled leader elections

* Run tests for packages separately to eliminate the spurious port conflicts

* refactor private address detection and unify approach for ipv4 and ipv6.

Fixes #2825

* do not allow unix sockets for DNS

* improve bind and advertise addr error handling

* go through builder using test coverage

* minimal update to the docs

* more coverage tests fixed

* more tests

* fix makefile

* cleanup

* fix port conflicts with external port server 'porter'

* stop test server on error

* do not run api test that change global ENV concurrently with the other tests

* Run remaining api tests concurrently

* no need for retry with the port number service

* monkey patch race condition in go-sockaddr until we understand why that fails

* monkey patch hcl decoder race condidtion until we understand why that fails

* monkey patch spurious errors in strings.EqualFold from here

* add test for hcl decoder race condition. Run with go test -parallel 128

* Increase timeout again

* cleanup

* don't log port allocations by default

* use base command arg parsing to format help output properly

* handle -dc deprecation case in Build

* switch autopilot.max_trailing_logs to int

* remove duplicate test case

* remove unused methods

* remove comments about flag/config value inconsistencies

* switch got and want around since the error message was misleading.

* Removes a stray debug log.

* Removes a stray newline in imports.

* Fixes TestACL_Version8.

* Runs go fmt.

* Adds a default case for unknown address types.

* Reoders and reformats some imports.

* Adds some comments and fixes typos.

* Reorders imports.

* add unix socket support for dns later

* drop all deprecated flags and arguments

* fix wrong field name

* remove stray node-id file

* drop unnecessary patch section in test

* drop duplicate test

* add test for LeaveOnTerm and SkipLeaveOnInt in client mode

* drop "bla" and add clarifying comment for the test

* split up tests to support enterprise/non-enterprise tests

* drop raft multiplier and derive values during build phase

* sanitize runtime config reflectively and add test

* detect invalid config fields

* fix tests with invalid config fields

* use different values for wan sanitiziation test

* drop recursor in favor of recursors

* allow dns_config.udp_answer_limit to be zero

* make sure tests run on machines with multiple ips

* Fix failing tests in a few more places by providing a bind address in the test

* Gets rid of skipped TestAgent_CheckPerformanceSettings and adds case for builder.

* Add porter to server_test.go to make tests there less flaky

* go fmt
2017-09-25 11:40:42 -07:00
James Phillips 268018c558
Robustifies check in TestCatalog_ListNodes_ConsistentRead_Fail test.
Fixes #3469
2017-09-13 21:22:53 -07:00
James Phillips 8be4ee766a
Revert "Manages segments list via a pointer."
This reverts commit c277a4250461443cbd63de0259e5e32766f651ea.
2017-09-07 16:37:11 -07:00
James Phillips 5008aabb62
Manages segments list via a pointer. 2017-09-07 16:21:07 -07:00
James Phillips 908f7be97f
Cleans up formatting. 2017-09-07 12:26:58 -07:00
James Phillips 02a3f3f27b
Shows the segment name in the keyring API and command output. 2017-09-07 12:17:39 -07:00
James Phillips 34bae2487d
Populates the segment keyrings based on the LAN keyring. 2017-09-07 12:17:20 -07:00
James Phillips 7c616e3768
Moves reconcile loop into segment stub. 2017-09-06 18:01:53 -07:00
James Phillips 4e34c2af06
Takes the skip out of the client check.
Without this the merge delegate won't check the segment for non-servers
a little below here.
2017-09-06 17:05:40 -07:00
James Phillips 78ac144fff Merge pull request #3447 from hashicorp/issue-3070
Skips unique node ID check for old versions of Consul.
2017-09-06 13:24:15 -07:00
James Phillips 62d9299646
Fixes incorrect comment. 2017-09-06 13:23:19 -07:00
James Phillips 031f1874d0
Pulls down some code for the check loop. 2017-09-06 13:07:42 -07:00
James Phillips 2fd9328b21
Uses the Raft configuration for the self-add skip check. 2017-09-06 13:05:51 -07:00
Preetha Appan 1eae9f1e2f Change member join reconcile step to process joining itself, to handle node IP address changes correctly when number of servers < 3 2017-09-06 13:53:01 -05:00
James Phillips 353e037c9b
Skips unique node ID check for old versions of Consul.
Fixes #3070.
2017-09-05 22:57:29 -07:00
James Phillips aecf890054
Allow _all for WAN as a no-op. 2017-09-05 13:40:19 -07:00
James Phillips c629773b40
Makes the all segments query explict, and the default for `consul members`. 2017-09-05 12:22:20 -07:00
James Phillips bc9780baad Adds simple rate limiting for client agent RPC calls to Consul servers. (#3440)
* Added rate limiting for agent RPC calls.
* Initializes the rate limiter based on the config.
* Adds the rate limiter into the snapshot RPC path.
* Adds unit tests for the RPC rate limiter.
* Groups the RPC limit parameters under "limits" in the config.
* Adds some documentation about the RPC limiter.
* Sends a 429 response when the rate limiter kicks in.
* Adds docs for new telemetry.
* Makes snapshot telemetry look like RPC telemetry and cleans up comments.
2017-09-01 15:02:50 -07:00
Kyle Havlovitz 334e082848 Merge pull request #3431 from hashicorp/network-segments-oss 2017-09-01 10:24:58 -07:00
Kyle Havlovitz ff994e9ade
Pass listeners into setupSegments 2017-08-31 17:56:43 -07:00
Kyle Havlovitz 5cc4b32a5d
Organize segments for a cleaner split between enterprise and OSS 2017-08-31 17:39:46 -07:00
Kyle Havlovitz 5ebea7f049
Fill in the segment in the QuerySource for prepared query lookups 2017-08-31 03:35:59 -07:00
Kyle Havlovitz b77a0aa932
Fix some inconsistencies with segment logic and comments 2017-08-30 17:43:46 -07:00
Kyle Havlovitz 3b0df3350f
Default bind/advertise for segments to BindAddr/AdvertiseAddr 2017-08-30 12:51:10 -07:00
Preetha Appan 0728a04dbb Wire server provider for raft layer only on protocol version 3 and above, and update changelog 2017-08-30 14:36:47 -05:00
Kyle Havlovitz d9fc2b3d75
Update coord display in ui to account for segments 2017-08-30 11:58:29 -07:00
Kyle Havlovitz 6ded43131a
Add segment addr field to tags for LAN flood joiner 2017-08-30 11:58:29 -07:00
Kyle Havlovitz 1c04f1537a
Add agent.segment interpolation to prepared queries 2017-08-30 11:58:29 -07:00
Kyle Havlovitz 107d7f6c5a
Add rpc_listener option to segment config 2017-08-30 11:58:29 -07:00
Kyle Havlovitz e582d02079
Add segment config validation 2017-08-30 11:58:29 -07:00
James Phillips 6a6eadd8c7
Adds open source side of network segments (feature is Enterprise-only). 2017-08-30 11:58:29 -07:00
Preetha Appan e944370cde More cleanup from code review 2017-08-30 12:31:36 -05:00
Preetha Appan a215c764cd Remove copy pasted duplicate line, update documentation. 2017-08-30 10:02:10 -05:00
Preetha Appan 5a29eb7486 Consolidate server lookup into one place and replace usages of localConsuls. 2017-08-30 09:30:33 -05:00
Preetha Appan cac1c29ec5 Remove unused function 2017-08-30 09:30:33 -05:00
Preetha Appan d8fe01db4c Remove stray commented line 2017-08-30 09:30:33 -05:00
Preetha Appan ca48e7e4c2 Remove server address tracking logic from manager/router and maintain it as part of lan event listener instead. Used sync.Map to track this, and added unit tests 2017-08-30 09:30:33 -05:00
Preetha Appan b4a9d77d49 ServerAddressProvider interface also returns an error now 2017-08-30 09:30:33 -05:00
Preetha Appan edb408bc22 Use config struct to create NetworkTransport layer when setting up raft 2017-08-30 09:30:33 -05:00
Preetha Appan 01f8e469aa Implement AddressProvider and wire that up to raft transport layer to support server nodes changing their IP addresses in containerized environments 2017-08-30 09:30:33 -05:00
Frank Schroeder 62c77d70f0 build: make tests independent of build tags
When the metadata server is scanning the agents for potential servers
it is parsing the version number which the agent provided when it
joined. This version number has to conform to a certain format, i.e.
'n.n.n'. Without this version number properly set some tests fail with
error messages that disguise the root cause.

The default version number is currently set to 'unknown' in
version/version.go which does not parse and triggers the tests to fail.
The work around is to use a build tag 'consul' which will use the
version number set in version_base.go instead which has the correct
format and is set to the current release version.

In addition, some parts of the code also require the version number to
be of a certain value. Setting it to '0.0.0' for example makes some
tests pass and others fail since they don't pass the semantic check.

When using go build/install/test one has to remember to use '-tags
consul' or tests will fail with non-obvious error messages.

Using build tags makes the build process more complex and error prone
since it prevents the use of the plain go toolchain and - at least in
its current form - introduces subtle build and test issues. We should
try to eliminate build tags for anything else but platform specific
code.

This patch removes all references to specific version numbers in the
code and tests and sets the default version to '9.9.9' which is
syntactically correct and passes the semantic check. This solves the
issue of running go build/install/test without tags for the OSS build.
2017-08-30 13:40:18 +02:00
Frank Schroeder 84a1bf0a99 agent: drop status code comments 2017-08-23 22:36:23 +02:00
Frank Schroeder b06584f631 agent: use http.StatusRequestEntityTooLarge instead of 413 2017-08-23 22:36:23 +02:00
Frank Schroeder cc83590962 agent: use http.StatusInternalServerError instead of 500 2017-08-23 22:36:23 +02:00
Frank Schroeder bf426beb45 agent: use http.StatusMethodNotAllowed instead of 405 2017-08-23 22:36:23 +02:00
Frank Schroeder 0c3534cbf7 agent: use http.StatusNotFound instead of 404 2017-08-23 22:36:23 +02:00
Frank Schroeder 970a7f97ec agent: use http.StatusForbidden instead of 403 2017-08-23 22:36:23 +02:00
Frank Schroeder 2e586be5aa agent: use http.StatusUnauthorized instead of 401 2017-08-23 22:36:23 +02:00
Frank Schroeder 923f8e2364 agent: use http.StatusBadRequest instead of 400 2017-08-23 22:36:23 +02:00
Frank Schroeder a32eab5923 agent: support go-discover retry-join for wan 2017-08-23 21:23:34 +02:00
Frank Schröder 44e6b8122d acl: consolidate error handling (#3401)
The error handling of the ACL code relies on the presence of certain
magic error messages. Since the error values are sent via RPC between
older and newer consul agents we cannot just replace the magic values
with typed errors and switch to type checks since this would break
compatibility with older clients.

Therefore, this patch moves all magic ACL error messages into the acl
package and provides default error values and helper functions which
determine the type of error.
2017-08-23 16:52:48 +02:00
Frank Schroeder d9e2a51887 agent: drop unused code
This code from http://github.com/hashicorp/consul/pull/3353 is no longer
required.
2017-08-22 00:02:46 +02:00
Frank Schroeder 4bfcf7b613 dns: replace nameserver lookup with consistent rpc call
This patch replaces the code which determines the list of servers in the
current cluster with an RPC call to get the list of active consul
service instances which only run on servers.

This replaces the previous implementation which was more complex and
relied on serf messages which can provide a different view than the
consistent response from the raft log.

As a side effect it makes the implementation independent of the server
and the agent which means it works consistently across both. Different
behavior for server and agent was the root cause for the bug in
http://github.com/hashicorp/consul/issue/3047.

Fixes #3407
2017-08-22 00:02:46 +02:00
Frank Schroeder 8e1f9b9b68 dns: split node lookup from request handling 2017-08-22 00:02:46 +02:00
Frank Schroeder db8ad8922e dns: refactor label by unrolling loop 2017-08-22 00:02:46 +02:00
Frank Schroeder c35206db07 dns: move ttl closer to usage 2017-08-22 00:02:46 +02:00
James Phillips 738ac55d96
Switches to using a read lock for the agent's RPC dispatcher.
This prevents RPC calls from getting serialized in this spot.

Fixes #3376
2017-08-09 18:51:55 -07:00
Frank Schröder 32d4eecc1a agent: honor deprecated flags for retry-join-{ec2,azure,gce} (#3384) 2017-08-09 16:18:30 -07:00
James Phillips 3518e27a76 Revert "Return 403 rather than a 404 when acls cause all results to be filter…" 2017-08-09 15:06:57 -07:00
James Phillips 91205b2cd6 Revert "Ensure that we return a permission denied only if the list of keys/en…" 2017-08-09 15:06:20 -07:00
Preetha Appan 121326161e Added unit test case to kvs_endpointtest 2017-08-09 15:50:22 -05:00
Preetha Appan d06002dc62 Ensure that we return a permission denied only if the list of keys/entries prior to filtering by ACL is non empty 2017-08-09 15:32:18 -05:00
Frank Schroeder c38dcf2d17
agent: move agent/consul/agent to agent/metadata 2017-08-09 14:36:52 +02:00
Frank Schroeder 85bdb77d90
agent: move agent/consul/servers to agent/router 2017-08-09 14:36:37 +02:00
Frank Schroeder 1d0bbfed9c
agent: move agent/consul/structs to agent/structs 2017-08-09 14:32:12 +02:00
James Phillips a600f681d6
Cleans up some go fmt issues. 2017-08-08 21:52:50 -07:00
James Phillips 7b4d3d5576
Fixes a vet error. 2017-08-08 16:00:18 -07:00
Kyle Havlovitz 8c2e422074 Merge pull request #3369 from hashicorp/metrics-enhancements
Add support for labels/filters from go-metrics
2017-08-08 13:55:30 -07:00
Kyle Havlovitz 160395d3c7
Add doc links for metrics endpoint 2017-08-08 13:05:38 -07:00
Kyle Havlovitz 308d7b785d
Update docs for metrics endpoint 2017-08-08 12:33:30 -07:00
Frank Schroeder 0f4986dcc7
dns: minor cleanups 2017-08-08 13:55:58 +02:00
Kyle Havlovitz 975ded2714
Add support for labels/filters from go-metrics 2017-08-08 01:45:10 -07:00
Preetha Appan 2df084968c Go back to using <nodename>.node.dc.consul as the name of the ns record being returned. 2017-08-07 16:02:33 -05:00
Frank Schroeder b571cb8097
dns: keep NS names in consul domain 2017-08-07 11:11:55 +02:00
Frank Schroeder 7b39af2b2d
dns: postmaster -> hostmaster 2017-08-07 11:11:55 +02:00
Frank Schroeder 98de22e13e
dns: we do not support zone transfers 2017-08-07 11:11:55 +02:00
Frank Schroeder e1bcbc6832
dns: drop CNAME for primary name server 2017-08-07 11:11:55 +02:00
Preetha Appan 393a0eae93
Added test case with IPV6 bind address for NS records, rewrote tests to use verify library and other code review feedback 2017-08-07 11:11:55 +02:00
Preetha Appan 52075bda1c
Added back glue records in NS response, expanded unit test. Also reused same function used in node lookup for adding A/AAAA records in the extra section of the NS response 2017-08-07 11:11:55 +02:00
Preetha Appan c7c4100503
Don't add A records for NS requests, because the record being returned already resolves correctly. Also fixed all the unit tests, and ignored hostnames that don't meet valid dns hostname criteria 2017-08-07 11:11:55 +02:00
Frank Schroeder 450d8a69b5
dns: provide correct SOA and NS responses
This patch changes the behavior of the DNS server as follows:

* The SOA response contains the SOA record in the Answer section instead
  of the Authority section. It also contains NS records in the Authority
  and the corresponding A glue records in the Extra section.
  In addition, CNAMEs are added to the Extra section to make the
  MNAME of the SOA record resolvable.

  AAAA glue records are not yet supported.

* The NS response returns up to three random servers from the
  consul cluster in the Answer section and the glue A
  records in the Extra section.

  AAAA glue records are not yet supported.
2017-08-07 11:11:55 +02:00
Preetha Appan bff45ee1da
Unify regex used to identify invalid dns characters 2017-08-07 11:11:55 +02:00
Preetha Appan 6bac9355fd
Use sanitized version of node name of server in NS record, and start with "server" rather than "ns" 2017-08-07 11:11:55 +02:00
Preetha Appan 7e9d683ab1
Removed a copy pasted irrelevant comment, and other code review feedback 2017-08-07 11:11:54 +02:00
Preetha Appan c38906daad
Add NS records and A records for each server. Constructs ns host names using the advertise address of the server. 2017-08-07 11:11:54 +02:00
James Phillips 803ed9a245 Adds secure introduction for the ACL replication token. (#3357)
Adds secure introduction for the ACL replication token, as well as a separate enable config for ACL replication.
2017-08-03 15:39:31 -07:00
Frank Schroeder 1cb602e085
agent: fix code for updated go-discover signature
Closes #3351
2017-08-03 21:32:11 +02:00
James Phillips c31b56a03e Adds a new /v1/acl/bootstrap API (#3349) 2017-08-02 17:05:18 -07:00
Miguel Prokop ea6d610dee agent: Fix script quoting on windows (#1875)
This patch fixes the quoting for executing scripts on windows
and splits the platform dependent code.

Fixes #1875
2017-08-02 17:01:21 +02:00
Frank Schroeder 68e8f3d0f7 agent: use github.com/hashicorp/go-discover
Replace the provider specific node discovery code
with go-discover to support AWS, Azure and GCE.

Fixes #3282
2017-08-01 11:41:43 +02:00
Preetha Appan 307049e17f Return nil instead of empty list when returning a PermissionDenied error, updated unit test 2017-07-31 17:23:20 -05:00
Preetha Appan da29b74d03 Return 403 rather than a 404 when acls cause all results to be filtered out. This fixes #2637 2017-07-31 13:50:29 -05:00
preetapan 677949b14d Merge pull request #3332 from hashicorp/issue_3322
This fixes #3322
2017-07-28 17:54:30 -05:00
Preetha Appan 3b12545844 Tweaked parsing error message to quote properly 2017-07-28 17:52:35 -05:00
James Phillips 8f1f762ddd Adds missing autopilot snapshot test and avoids snapshotting nil. (#3333) 2017-07-28 15:48:42 -07:00
Preetha Appan 86b9e3c5f3 Validate unix sockets and ip addresses as needed, more test cases 2017-07-28 17:18:10 -05:00
Preetha Appan ac068de336 Modify ResolveTmplAddrs to parse advertise IPs, added test cases that fail to parse correctly 2017-07-28 15:01:32 -05:00
Preetha Appan 4b82d09df0 Removed extra newlines 2017-07-28 10:51:11 -05:00
Preetha Appan 7b99f7ca08 Fix comments, and remove redundant TestConfig init from a couple of unit tests 2017-07-28 10:40:43 -05:00
Frank Schroeder f27202b608
add tests for go-sockaddr template parsing 2017-07-28 15:40:22 +02:00
Frank Schroeder 5ee498cbc5
agent: unix sockets are not ip addrs 2017-07-28 14:53:21 +02:00
Frank Schroeder 0b13a38d90
config: refactor tmpl resolution fn 2017-07-28 12:20:49 +02:00
Preetha Appan 28016190e0 Moved handling advertise address to readConfig and out of the agent's constructor, plus unit test fixes 2017-07-27 22:06:31 -05:00
Preetha Appan 398c1e450c Move go-socketaddr template parsing into config package to make it happen before creating a new agent. Also removed redundant parsetemplate calls from agent.go. 2017-07-27 16:17:35 -05:00
James Phillips 6b51744ddf Adds option to prepared queries to remove empty tags. (#3330) 2017-07-26 22:46:43 -07:00
James Phillips 6e794ea1b3 Adds support for agent-side ACL token management via API instead of config files. (#3324)
* Adds token store and removes all runtime use of config for ACL tokens.
* Adds a new API for changing agent tokens on the fly.
2017-07-26 11:03:43 -07:00
Preetha Appan 4692b1478e Add extra test case for deleting entire tree with empty prefix 2017-07-26 09:42:07 -05:00
Preetha Appan 74ba4c3c6b Don't insert tombstone for empty prefix delete. Other minor unit test fixes 2017-07-25 21:54:11 -05:00
Preetha Appan a6b7e66e9a Removed redundant comments and unit test 2017-07-25 20:39:33 -05:00
Preetha Appan 1503d63595 Removed redundant call to reap tombstone from unit test 2017-07-25 19:39:05 -05:00
Preetha Appan 996302c085 Improved unit test per code review 2017-07-25 19:17:40 -05:00
Preetha Appan f4cccf44e3 Use new DeletePrefixMethod for implementing KVSDeleteTree operation. This makes deletes on sub trees larger than one million nodes about 100 times faster. Added unit tests. 2017-07-25 17:21:18 -05:00
James Phillips cf7b1aaf04 Removes an unnecessary close. 2017-07-24 21:41:18 -07:00
Preetha Appan 213af3650f Removed redundant logging 2017-07-24 21:07:48 -05:00
Preetha Appan c08ff6c8ae Clean up temporary files on write errors, and ignore any temporary service files on load with a warning. This fixes #3207 2017-07-24 12:42:51 -05:00
James Phillips a0867b5d49
Tweaks the error when scripts are disabled.
This will hopefully help people self-serve if they upgrade without accounting
for this.
2017-07-19 22:15:04 -07:00
Kyle Havlovitz c74d7558a5 Fix UpgradeVersionTag field not being passed correctly (#3304) 2017-07-19 17:39:48 -07:00
Preetha Appan 9116186b4c Made unit test for AddCheck error check the actual error string 2017-07-19 11:00:56 -05:00
Preetha Appan f790c7279a Unit test for failure case of AddCheck 2017-07-19 10:28:52 -05:00
Frank Schroeder e6e711a401
fix spelling in filenames
Fixes #3301
2017-07-19 13:16:38 +02:00
Frank Schroeder 6d0bd1faaf agent: make docker client work on windows 2017-07-19 12:03:59 +02:00
Frank Schroeder e195d592be
build: add missing build tags 2017-07-19 05:17:01 +02:00
preetapan efae3cccc0 Merge pull request #3296 from hashicorp/ensure_registration_race
Fix race condition between removing a service and adding a check for …
2017-07-18 18:36:47 -05:00
Preetha Appan db1d477592 Clean up any watch monitors associated with a failed AddCheck 2017-07-18 16:54:20 -05:00
Preetha Appan 4b8958b35b Removed unit test, added clarifying comment and returned a friendlier error message similar to the one in agent's AddService method
Fixes #3297
2017-07-18 16:15:47 -05:00
Preetha Appan 530e87eab0 Fix race condition between removing a service and adding a check for the same service, which was causing orphaned checks 2017-07-18 16:15:47 -05:00
Kyle Havlovitz 1ffd2ec05b
Add UpgradeVersionTag to autopilot config 2017-07-18 13:35:41 -07:00
Frank Schroeder 8bcbb7b827 agent: stop docker checks on shutdown 2017-07-18 20:59:24 +02:00
Frank Schroeder c8ae94b688 agent: stop and remove docker checks
Note that there is no test since the correct way to solve (and test)
this is to replace the different maps with a single one or to hide
that functionality behind a separate data structure. This will be
addressed in #3294.

Fixes #3265
2017-07-18 20:59:24 +02:00
Frank Schroeder b4e5c0647b
agent: replace docker check
This patch replaces the Docker client which is used
for health checks with a simplified version tailored
for that purpose.

See #3254
See #3257
Fixes #3270
2017-07-18 20:24:38 +02:00
James Phillips 42472e8bb5 Prevents disabling gossip keyring file from disabling gossip encryption. (#3278) 2017-07-17 12:48:45 -07:00
James Phillips 788dd255a1 Adds new config to make script checks opt-in, updates documentation. (#3284) 2017-07-17 11:20:35 -07:00
James Phillips 838591c916 Changes remote exec KV read to call GetTokenForAgent(). (#3283)
* Changes remote exec KV read to call GetTokenForAgent(), which can use
the acl_agent_token instead of the acl_token.

Fixes #3160.

* Fixes remote exec unit test with ACLs.

* Adds unhappy ACL path to unit tests for remote exec.
2017-07-16 21:12:16 -07:00
James Phillips 5876b81896 Adds node read privileges to the acl_agent_master_token. (#3277)
Fixes #3113.
2017-07-16 20:08:26 -07:00
Frank Schröder de97fb0670 azure: tag map can return nil (#3280)
Fixes #3193
2017-07-16 14:29:43 -07:00
James Phillips ac7c48c3ea Obfuscates ACL tokens appearing in /v1/acl/<verb>/<token> APIs. (#3276)
* Obfuscates ACL tokens appearing in /v1/acl APIs.

* Makes test positively identify the desired strings.

* Adds an example and explanation of the regular expression.
2017-07-15 00:07:08 -07:00
James Phillips 759be97635 Changes ACL clone response to 403 if not authorized, or if token doesn't exist. (#3275)
Fixes #1113
2017-07-14 20:43:30 -07:00
Kyle Havlovitz d985dbc36b
Add TLS setting to router areas 2017-07-14 17:38:08 -07:00
James Phillips 8572931afe Cleans up version 8 ACLs in the agent and the docs. (#3248)
* Moves magic check and service constants into shared structs package.

* Removes the "consul" service from local state.

Since this service is added by the leader, it doesn't really make sense to
also keep it in local state (which requires special ACLs to configure), and
requires a bunch of special cases in the local state logic. This requires
fewer special cases and makes ACL bootstrapping cleaner.

* Makes coordinate update ACL log message a warning, similar to other AE warnings.

* Adds much more detailed examples for bootstrapping ACLs.

This can hopefully replace https://gist.github.com/slackpad/d89ce0e1cc0802c3c4f2d84932fa3234.
2017-07-13 22:33:47 -07:00
Frank Schroeder 3fcf1bc9e2
agent: fix go vet issue 2017-07-11 07:13:46 -07:00
James Phillips 68991da95f Adds the ability to blacklist specific HTTP endpoints. (#3252) 2017-07-10 13:51:25 -07:00
James Phillips 219fb6dd70 UI cleanup follow up from #3245. (#3251)
* Removes unnecessary set for model component which will be null.

* Returns a 404 for a missing node, not a 200 with an empty response.

* Updates built-in web assets.
2017-07-10 09:40:00 -07:00
James Phillips c849458d9b Changes the default ACL token type to "client" in web UI. (#3246)
* Changes the default ACL token type to "client".

* Updates built-in web assets.
2017-07-08 17:28:04 -07:00
James Phillips 0a17a8284f Cleans up web UI and fixes ACL token "stuckness" issue. (#3245)
* Removes GitHub reference.

* Doesn't display ACL token on the unauthorized page.

* Removes useless fetch for nodes and cleans up comments.

* Provides a path to reset the ACL token when it's invalid.

This included making the settings page global so it's reachable, and adding
some more information about an error on the error page.

* Updates built-in web assets.
2017-07-08 17:16:05 -07:00
Frank Schroeder 75cd1828b8 address review comments 2017-07-07 09:22:34 +02:00
Frank Schroeder 33588d8fdd agent: remove unused code 2017-07-07 09:22:34 +02:00
Frank Schroeder 37202cc751 agent: make TestClient_RPC_ConsulServerPing more robust 2017-07-07 09:22:34 +02:00
Frank Schroeder bbf715fdaf agent: fix data races with registerEndpoint
Only register a different endpoint after it has been
fully created.
2017-07-07 09:22:34 +02:00
Frank Schroeder cfe3437c0c agent: make Reap test timing less aggressive 2017-07-07 09:22:34 +02:00
James Phillips 7b54e325df Adds a comment about flood joining. 2017-07-07 09:22:34 +02:00
James Phillips 247f4a7e41 Simplifies Serf dynamic port selection code.
This isn't racy, it's just a little dirty. The listen will happen and a port
will be selected and injected into the config once the Serf instance is
created, so we don't need the retry loop here.
2017-07-07 09:22:34 +02:00
James Phillips e2935e7509 test: Changes WAN/LAN join confirmer to use port number vs. address.
This fixes TestServer_JoinSeparateLanAndWanAddresses which sets bogus
advertise addresses as part of the test. Port numbers uniquely identify
members since everything is running on localhost.
2017-07-07 09:22:34 +02:00
Frank Schroeder 98dc634f17 test: make joinLAN/WAN reliable
only return if the members can see each other
2017-07-07 09:22:34 +02:00
Frank Schroeder 74d3c4d896 rpc: make TestServer_JoinSeparateLanAndWanAddresses more robust 2017-07-07 09:22:34 +02:00
Frank Schroeder ae59198e38 rpc: make TestClient_SnapshotRPC_TLS more robust 2017-07-07 09:22:34 +02:00
Frank Schroeder f4af0b6ab6 agent: make timing sensitive tests more robust
* make timing less aggressive
* mark timing tests as non-parallel
2017-07-07 09:22:34 +02:00
Frank Schroeder 37a7e52dd9 agent: fix TestCheckHTTP_TLSSkipVerify_true_pass
Make check timing less aggressive and give the test some time
to execute.
2017-07-07 09:22:34 +02:00
Frank Schroeder 46221d2b56 agent: do not modify agent config after NewAgent 2017-07-07 09:22:34 +02:00
Frank Schroeder 217d34f66d agent: fix pending data races between localState and agent
This patch creates a local config structure for the local state
which is independent from the agent but populated from its
configuration. This avoids data races between the agent configuration
which can change during tests and concurrent go routines using the
configuraiton at the same time.
2017-07-07 09:22:34 +02:00
Frank Schroeder 6715a7a0c2 dns: fix data race in TestDNS_ServiceLookup_FilterACL
The agent config cannot be modified after start.
2017-07-07 09:22:34 +02:00
Frank Schroeder 62b695fb17 agent: fix data race in TestAgentAntiEntropy_EnableTagOverride 2017-07-07 09:22:34 +02:00
Frank Schroeder 188ea638d5 agent: clone partial consul config
The agent configuration for the consul server is a partial configuration
which needs to be cloned to avoid data races.

This is a stop-gap measure before moving the configuration into
a separate package.
2017-07-07 09:22:34 +02:00
Frank Schroeder 53e409758b dns: fix data races in DNS compression tests
Make the DisableCompression value configurable at runtime
to allow tests to change it without restarting/recreating
the server.
2017-07-07 09:22:34 +02:00
Frank Schroeder 24d8bdfb02 agent: fix data race between consul server and local state 2017-07-07 09:22:34 +02:00
Frank Schroeder 0ed76615d3 rpc: monkey patch fix for data races for localState
The tests that use the localState of the agent access the internal
variables and call methods which are not guarded by locks creating
data races in tests. While the use of internal variables is somewhat
easy to spot the fact that not all methods are thread-safe is a
surprise.

A proper fix requires the localState struct to be moved into its own
package so that tests in the agent can only access the external
interface.

However, the localState is currently dependent on the agent.Config
which would create a circular dependency. Therefore, the Config
struct needs to be moved first for this to happen.

This patch literally monkey patches the use of the lock around the
cases which have data races and marks them with a
// todo(fs): data race comment.
2017-07-07 09:22:34 +02:00
Frank Schroeder 96c03ce73b rpc: try shutting down leader first to avoid hang in TestLeader_LeftServer 2017-07-07 09:22:34 +02:00
Frank Schroeder 2eb2941e8c rpc: fix logging and try quicker timing of TestServer_JoinSeparateLanAndWanAddresses 2017-07-07 09:22:34 +02:00
Frank Schroeder 98510f898c rpc: less agressive raft timeouts
Allowing more time for raft to consolidate should
drop the number of leader elections.
2017-07-07 09:22:34 +02:00
Frank Schroeder 50c81a9397 rpc: run agent/consul tests in parallel 2017-07-07 09:22:34 +02:00
Frank Schroeder b3189a566a rpc: refactor sessionTimers and fix racy tests
The sessionTimers map was secured by a lock which wasn't used
properly in the tests. This lead to data races and failing tests
when accessing the length or the members of the map.

This patch adds a separate SessionTimers struct which is safe
for concurrent use and which ecapsulates the behavior of the
sessionTimers map.
2017-07-07 09:22:34 +02:00
Frank Schroeder 06ad8e96be rpc: fix TestServer_Leave
wait for the leader election.
2017-07-07 09:22:34 +02:00
Frank Schroeder 4a073aec1c rpc: fix TestSession_Renew
make the timing less tight
2017-07-07 09:22:34 +02:00
Frank Schroeder 77ff9f680f rpc: fix TestReadyForConsistentRead
timing was too tight. Standardized name.
2017-07-07 09:22:34 +02:00
Frank Schroeder e3252f921a rpc: fix for 'no leader' in TLS tests
Ensure both servers know about each other before looking
for a leader.
2017-07-07 09:22:34 +02:00
Frank Schroeder 2497b8416b rpc: fix TestServer_JoinWAN_Flood
The second server in the first data center should not be
in bootstrap mode.
2017-07-07 09:22:34 +02:00
Frank Schroeder 7af30dd7d7 rpc: provide unique node names for server and client 2017-07-07 09:22:34 +02:00
Frank Schroeder 457910b191 rpc: prefix log output with test name 2017-07-07 09:22:34 +02:00
Frank Schroeder c33f7ecbe2 rpc: discover serf wan port before starting serf lan
When using dynamic ports for the serf clusters then
the actual bind port of the serf WAN cluster needs to
be discovered before the serf LAN cluster is started
since the serf LAN cluster announces the port of the WAN
cluster.
2017-07-07 09:22:34 +02:00
Frank Schroeder 84c90cbd07 rpc: bind rpc test server to port 0 2017-07-07 09:22:34 +02:00
Frank Schroeder 7f5957ee93 rpc: refactor: unify test server setup 2017-07-07 09:22:34 +02:00
Frank Schroeder 325637b6be rpc: fix typos 2017-07-07 09:22:34 +02:00
Frank Schroeder 85aa360843 agent: refactor: log to stderr during tests 2017-07-07 09:22:34 +02:00
Frank Schroeder 865a825116 agent: refactor: use handler for test http tls server 2017-07-07 09:22:34 +02:00
Frank Schroeder 21a0e94aea agent: refactor: make address translation part of the agent 2017-07-07 09:22:34 +02:00
Preetha Appan ae656575ea Rename to raftNotifyCh, fix typo 2017-07-06 09:10:36 -05:00
Preetha Appan 777504ff0e Fixes deadlock between barrier write and leader notify channel read . Fixes #3230 2017-07-05 17:09:18 -05:00
Grégoire Seux 2e7c202ca2 Correctly forward Host header in healthcheck (#3203)
Host header must be set explicitely on http requests

Change-Id: I91a32f0fb1ec3fbc713adf0e10869797e91172c7
Signed-off-by: Grégoire Seux <g.seux@criteo.com>
2017-06-29 16:26:08 -07:00
Preetha Appan c872a05922 Fix missing formatting directive causing go vet to fail 2017-06-27 16:32:38 -05:00
Frank Schroeder 913748bcc4
Revert "agent: add allowStale option for HTTP API (#3142)"
This reverts commit 1e0fd27a74f5b18775ce91a84310430de35a4a80.
2017-06-27 07:04:55 +02:00
Frank Schröder 5500eb95eb agent: fix DNS recursor tests (#3190)
The makeRecursor function was using an unreliable mechanism
to start a server with a random port. This patch changes this
so that the server starts on port 0 to let the kernel pick
a free port.

In addition, to similar functions for starting a test DNS
server were folded into one.
2017-06-25 10:42:37 -07:00
James Phillips fb640d1ffe
Removes some useless comments. 2017-06-25 10:32:35 -07:00
James Phillips 4b85d33ef1 Fixes watch tracking during reloads and fixes address issue. (#3189)
This patch fixes watch registration through the config file and a broken log line when the watch registration fails. It also plumbs all the watch loading through a common function and tweaks the
unit test to create the watch before the reload.
2017-06-24 12:52:41 -07:00
James Phillips 2184136284 Changes host-based node IDs from opt-out to opt-in. (#3187) 2017-06-24 09:36:53 -07:00
James Phillips 59621dbccc Revert "discover: move instance discover code into separate package (#3144)" (#3180)
This reverts commit 26bfb2d00a30bf30ebdd85ba2e1e19f37355853f.
2017-06-23 01:38:55 -07:00
James Phillips 728971afdb Fixes broken HTTP header and method for health checks. (#3178)
* Fixes broken HTTP header and method for health checks.
* Adds a fuzz utility and test to make sure copy is complete.
2017-06-23 01:15:48 -07:00
wojtkiewicz f320bb9083
agent: add allowStale option for HTTP API (#3142)
This patch adds an "allowStale" option to the HTTP API
configuration which allows stale reads to provide linear
read scalability.

Fixes #3142
2017-06-22 10:31:13 +02:00
wojtkiewicz 26c8697a40
agent: add "http_config"
This patch adds an "http_config" object to the config file
and moves the "http_api_response_headers" option there.

"http_api_response_headers" is now deprecated in favor of
"http_config.response_headers"
2017-06-22 10:31:11 +02:00
James Phillips d2251018d9 Fixes checked in web assets and associated build scripts. (#3173) 2017-06-21 14:43:07 -07:00
Frank Schröder 4bdff5fff4 discover: move instance discover code into separate package (#3144)
This patch moves the code that discovers instances from metadata
information to github.com/hashicorp/go-discover with
sub-packages for each provider.
2017-06-21 10:40:38 +02:00
Frank Schröder 04b636d1f4 agent: notify systemd after JoinLAN (#2121)
This patch adds support for notifying systemd via the
NOTIFY_SOCKET by sending 'READY=1' to the socket after
a successful JoinLAN.

Fixes #2121
2017-06-21 06:43:55 +02:00
Frank Schroeder f8e52c897e agent: fix 'consul leave' shutdown race (#2880)
When the agent is triggered to shutdown via an external 'consul leave'
command delivered via the HTTP API then the client expects to receive a
response when the agent is down. This creates a race on when to shutdown
the agent itself like the RPC server, the checks and the state and the
external endpoints like DNS and HTTP.

This patch splits the shutdown process into two parts:

 * shutdown the agent
 * shutdown the endpoints (http and dns)

They can be executed multiple times, concurrently and in any order but
should be executed first agent, then endpoints to provide consistent
behavior across all use cases. Both calls have to be executed for a
proper shutdown.

This could be partially hidden in a single function but would introduce
some magic that happens behind the scenes which one has to know of but
isn't obvious.

Fixes #2880
2017-06-21 05:52:51 +02:00
Frank Schroeder 75f3add1f3 agent: drop unused constant 2017-06-21 05:42:39 +02:00
Frank Schroeder a1dec8a46f agent: make registerEndpoint private
This is only used for testing.
2017-06-21 05:42:39 +02:00
Frank Schroeder d3ab99244b agent: make the RPC endpoint overwrite mechanism more transparent
This patch hides the RPC handler overwrite mechanism from the
rest of the code so that it works in all cases and that there
is no cooperation required from the tested code, i.e. we can
drop a.getEndpoint().
2017-06-21 05:42:39 +02:00
Frank Schroeder 27adc31672 agent: rename agent var 2017-06-21 05:42:39 +02:00
Frank Schroeder 3e20a2ba81 agent: move structs into consul/structs pkg
* CheckDefinition
 * ServiceDefinition
 * CheckType
2017-06-21 05:42:39 +02:00
Frank Schroeder db78252019 agent: move NotifyGroup into the agent pkg 2017-06-21 05:42:39 +02:00
Frank Schroeder 2c47bc5d5b agent: move conn pool for muxed connections into separate pkg 2017-06-21 05:42:39 +02:00
Frank Schroeder e930b55f71 agent: move the SnapshotReplyFn out of the way
When splitting up the consul package into server and client
the SnapshotReplyFn needs to be in a separate package to avoid
a circular dependency.
2017-06-21 05:42:39 +02:00
Frank Schroeder b805a79078 agent: use the delegate interface for local state 2017-06-21 05:42:39 +02:00
Frank Schroeder 586b345767 agent: rename clientServer interface to delegate 2017-06-21 05:42:39 +02:00
preetapan 9e527836be Merge pull request #3154 from hashicorp/issue_2644_redux
Fix stale reads on server startup. Consistent reads will now wait for up to config.RPCHoldTimeout for the server to get past its raft log, before returning an error. Servers that are starting up will eventually catch up. 
This fixes issue #2644
2017-06-20 19:47:12 -05:00
Preetha Appan bb559d8e6e Minor fixes per code review 2017-06-20 19:43:07 -05:00
Frank Schroeder 280611d407
Revert "agent: fix 'consul leave' shutdown race (#2880)"
This reverts commit 90c83a32b586c7d4add8d8ca0096025ecb886a77.
2017-06-19 21:34:08 +02:00
Frank Schroeder 226a5d3db4 agent: fix 'consul leave' shutdown race (#2880)
When the agent is triggered to shutdown via an external 'consul leave'
command delivered via the HTTP API then the client expects to receive a
response when the agent is down. This creates a race on when to shutdown
the agent itself like the RPC server, the checks and the state and the
external endpoints like DNS and HTTP. Ideally, the external endpoints
should be shutdown before the internal state but if the goal is to
respond reliably that the agent is down then this is not possible.

This patch splits the agent shutdown into two parts implemented in a
single method to keep it simple and unambiguos for the caller. The first
stage shuts down the internal state, checks, RPC server, ...
synchronously and then triggers the shutdown of the external endpoints
asychronously. This way the caller is guaranteed that the internal state
services are down when Shutdown returns and there remains enough time to
send a response.

Fixes #2880
2017-06-19 21:24:26 +02:00
Preetha Appan f535a298f3 Added unit test to verify consistentRead method behavior 2017-06-16 11:58:12 -05:00
Preetha Appan f616a8dd06 Code review feedback, fixed major logic bug 2017-06-16 10:49:54 -05:00
Preetha Appan 42d3a3f3db Redo bug fix for stale reads on server startup, leveraging RPCHOldtimeout instead of maxQueryTime, plus tests 2017-06-15 22:41:30 -05:00
Kyle Havlovitz 5e45aec642 Add an option to disable keyring file (#3145)
Also disables keyring file in dev mode.
2017-06-15 15:24:04 -07:00
Seth Vargo 10c450c78d Add EDNS0 support (#3131)
This is a refactor of GH-1980. Originally I tried to do a straight
rebase, but the code has changed too much.
2017-06-14 16:22:54 -07:00
Preetha Appan ea1f301661 Fixed static asset build target and checked in new executable for assetfs 2017-06-12 12:57:02 -05:00
Frank Schroeder cd837b0b18 pkg refactor
command/agent/*                  -> agent/*
    command/consul/*                 -> agent/consul/*
    command/agent/command{,_test}.go -> command/agent{,_test}.go
    command/base/command.go          -> command/base.go
    command/base/*                   -> command/*
    commands.go                      -> command/commands.go

The script which did the refactor is:

(
	cd $GOPATH/src/github.com/hashicorp/consul
	git mv command/agent/command.go command/agent.go
	git mv command/agent/command_test.go command/agent_test.go
	git mv command/agent/flag_slice_value{,_test}.go command/
	git mv command/agent .
	git mv command/base/command.go command/base.go
	git mv command/base/config_util{,_test}.go command/
	git mv commands.go command/
	git mv consul agent
	rmdir command/base/

	gsed -i -e 's|package agent|package command|' command/agent{,_test}.go
	gsed -i -e 's|package agent|package command|' command/flag_slice_value{,_test}.go
	gsed -i -e 's|package base|package command|' command/base.go command/config_util{,_test}.go
	gsed -i -e 's|package main|package command|' command/commands.go

	gsed -i -e 's|base.Command|BaseCommand|' command/commands.go
	gsed -i -e 's|agent.Command|AgentCommand|' command/commands.go
	gsed -i -e 's|\tCommand:|\tBaseCommand:|' command/commands.go
	gsed -i -e 's|base\.||' command/commands.go
	gsed -i -e 's|command\.||' command/commands.go

	gsed -i -e 's|command|c|' main.go
	gsed -i -e 's|range Commands|range command.Commands|' main.go
	gsed -i -e 's|Commands: Commands|Commands: command.Commands|' main.go

	gsed -i -e 's|base\.BoolValue|BoolValue|' command/operator_autopilot_set.go
	gsed -i -e 's|base\.DurationValue|DurationValue|' command/operator_autopilot_set.go
	gsed -i -e 's|base\.StringValue|StringValue|' command/operator_autopilot_set.go
	gsed -i -e 's|base\.UintValue|UintValue|' command/operator_autopilot_set.go

	gsed -i -e 's|\bCommand\b|BaseCommand|' command/base.go
	gsed -i -e 's|BaseCommand Options|Command Options|' command/base.go
	gsed -i -e 's|base.Command|BaseCommand|' command/*.go
	gsed -i -e 's|c\.Command|c.BaseCommand|g' command/*.go
	gsed -i -e 's|\tCommand:|\tBaseCommand:|' command/*_test.go
	gsed -i -e 's|base\.||' command/*_test.go

	gsed -i -e 's|\bCommand\b|AgentCommand|' command/agent{,_test}.go
	gsed -i -e 's|cmd.AgentCommand|cmd.BaseCommand|' command/agent.go

	gsed -i -e 's|cli.AgentCommand = new(Command)|cli.Command = new(AgentCommand)|' command/agent_test.go
	gsed -i -e 's|exec.AgentCommand|exec.Command|' command/agent_test.go
	gsed -i -e 's|exec.BaseCommand|exec.Command|' command/agent_test.go
	gsed -i -e 's|NewTestAgent|agent.NewTestAgent|' command/agent_test.go
	gsed -i -e 's|= TestConfig|= agent.TestConfig|' command/agent_test.go
	gsed -i -e 's|: RetryJoin|: agent.RetryJoin|' command/agent_test.go

	gsed -i -e 's|\.\./\.\./|../|' command/config_util_test.go

	gsed -i -e 's|\bverifyUniqueListeners|VerifyUniqueListeners|' agent/config{,_test}.go command/agent.go
	gsed -i -e 's|\bserfLANKeyring\b|SerfLANKeyring|g' agent/{agent,keyring,testagent}.go command/agent.go
	gsed -i -e 's|\bserfWANKeyring\b|SerfWANKeyring|g' agent/{agent,keyring,testagent}.go command/agent.go
	gsed -i -e 's|\bNewAgent\b|agent.New|g' command/agent{,_test}.go
	gsed -i -e 's|\bNewAgent|New|' agent/{acl_test,agent,testagent}.go

	gsed -i -e 's|\bAgent\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bBool\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bDefaultConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bDevConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bMergeConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bReadConfigPaths\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bParseMetaPair\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bSerfLANKeyring\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bSerfWANKeyring\b|agent.&|g' command/agent{,_test}.go

	gsed -i -e 's|circonus\.agent|circonus|g' command/agent{,_test}.go
	gsed -i -e 's|logger\.agent|logger|g' command/agent{,_test}.go
	gsed -i -e 's|metrics\.agent|metrics|g' command/agent{,_test}.go
	gsed -i -e 's|// agent.Agent|// agent|' command/agent{,_test}.go
	gsed -i -e 's|a\.agent\.Config|a.Config|' command/agent{,_test}.go

	gsed -i -e 's|agent\.AppendSliceValue|AppendSliceValue|' command/{configtest,validate}.go

	gsed -i -e 's|consul/consul|agent/consul|' GNUmakefile

	gsed -i -e 's|\.\./test|../../test|' agent/consul/server_test.go

	# fix imports
	f=$(grep -rl 'github.com/hashicorp/consul/command/agent' * | grep '\.go')
	gsed -i -e 's|github.com/hashicorp/consul/command/agent|github.com/hashicorp/consul/agent|' $f
	goimports -w $f

	f=$(grep -rl 'github.com/hashicorp/consul/consul' * | grep '\.go')
	gsed -i -e 's|github.com/hashicorp/consul/consul|github.com/hashicorp/consul/agent/consul|' $f
	goimports -w $f

	goimports -w command/*.go main.go
)
2017-06-10 18:52:45 +02:00