* Adds rate limiting for agent RPC calls.
* Initializes the rate limiter based on the config.
* Adds the rate limiter into the snapshot RPC path.
* Adds unit tests for the RPC rate limiter.
* Groups the RPC limit parameters under "limits" in the config.
* Adds some documentation about the RPC limiter.
* Sends a 429 response when the rate limiter kicks in.
* Adds docs for new telemetry.
* Makes snapshot telemetry look like RPC telemetry and cleans up comments.
The error handling of the ACL code relies on the presence of certain
magic error messages. Since the error values are sent via RPC between
older and newer consul agents we cannot just replace the magic values
with typed errors and switch to type checks since this would break
compatibility with older clients.
Therefore, this patch moves all magic ACL error messages into the acl
package and provides default error values and helper functions which
determine the type of error.
Note that there is no test since the correct way to solve (and test)
this is to replace the different maps with a single one or to hide
that functionality behind a separate data structure. This will be
addressed in #3294.
Fixes #3265
This patch replaces the Docker client which is used
for health checks with a simplified version tailored
for that purpose.
See #3254
See #3257
Fixes #3270
* Moves magic check and service constants into shared structs package.
* Removes the "consul" service from local state.
Since this service is added by the leader, it doesn't really make sense to
also keep it in local state, which requires special ACLs to configure and a
bunch of special cases in the local state logic. Removing it requires
fewer special cases and makes ACL bootstrapping cleaner.
* Makes coordinate update ACL log message a warning, similar to other AE warnings.
* Adds much more detailed examples for bootstrapping ACLs.
This can hopefully replace https://gist.github.com/slackpad/d89ce0e1cc0802c3c4f2d84932fa3234.
The agent configuration for the consul server is a partial configuration
which needs to be cloned to avoid data races.
This is a stop-gap measure before moving the configuration into
a separate package.
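As an illustration of why a clone is needed, the sketch below copies a configuration struct together with its map and slice fields, so that the copy handed to the server shares no mutable memory with the agent's own value. The `serverConfig` type and its fields are hypothetical and stand in for the real configuration.

```go
package main

import "fmt"

// serverConfig is a hypothetical stand-in for the partial server configuration.
type serverConfig struct {
	NodeName  string
	Tags      map[string]string
	JoinAddrs []string
}

// clone returns a copy that shares no maps or slices with the receiver.
// A plain struct copy would still alias Tags and JoinAddrs.
func (c *serverConfig) clone() *serverConfig {
	cp := *c // copies scalar fields and the map and slice headers
	cp.Tags = make(map[string]string, len(c.Tags))
	for k, v := range c.Tags {
		cp.Tags[k] = v
	}
	cp.JoinAddrs = append([]string(nil), c.JoinAddrs...)
	return &cp
}

func main() {
	base := &serverConfig{
		NodeName:  "node1",
		Tags:      map[string]string{"role": "server"},
		JoinAddrs: []string{"10.0.0.1"},
	}
	cp := base.clone()
	cp.Tags["role"] = "client"
	cp.JoinAddrs[0] = "10.0.0.2"

	// The original is unaffected by changes to the clone.
	fmt.Println(base.Tags["role"], base.JoinAddrs[0])
}
```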
This patch fixes watch registration through the config file and a broken
log line when the watch registration fails. It also plumbs all the watch
loading through a common function and tweaks the unit test to create the
watch before the reload.
When the agent is triggered to shut down via an external 'consul leave'
command delivered via the HTTP API, the client expects to receive a
response when the agent is down. This creates a race over the order in
which to shut down the agent itself (the RPC server, the checks and the
state) and the external endpoints (DNS and HTTP).
This patch splits the shutdown process into two parts:
* shut down the agent
* shut down the endpoints (HTTP and DNS)
Both parts can be executed multiple times, concurrently and in any order,
but should be executed agent first, then endpoints, to provide consistent
behavior across all use cases. Both calls have to be executed for a
proper shutdown.
This could be partially hidden in a single function but would introduce
some magic that happens behind the scenes which one has to know of but
isn't obvious.
Fixes #2880
This patch hides the RPC handler overwrite mechanism from the
rest of the code so that it works in all cases and that there
is no cooperation required from the tested code, i.e. we can
drop a.getEndpoint().
When the agent is triggered to shut down via an external 'consul leave'
command delivered via the HTTP API, the client expects to receive a
response when the agent is down. This creates a race over the order in
which to shut down the agent itself (the RPC server, the checks and the
state) and the external endpoints (DNS and HTTP). Ideally, the external
endpoints should be shut down before the internal state, but if the goal
is to respond reliably that the agent is down then this is not possible.
This patch splits the agent shutdown into two parts implemented in a
single method to keep it simple and unambiguous for the caller. The first
stage shuts down the internal state, checks, RPC server, ...
synchronously and then triggers the shutdown of the external endpoints
asynchronously. This way the caller is guaranteed that the internal state
services are down when Shutdown returns and there remains enough time to
send a response.
Fixes #2880
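A minimal sketch of the single-method variant described in this message: the internal stage completes before `Shutdown` returns, while the endpoints are torn down in a goroutine. The `agent` type, its channels and the `isClosed` helper are hypothetical; closing a channel stands in for the real teardown work.

```go
package main

import (
	"fmt"
	"sync"
)

// agent is a hypothetical stand-in. Each channel is closed when its stage is done.
type agent struct {
	once          sync.Once
	internalDown  chan struct{}
	endpointsDown chan struct{}
}

func newAgent() *agent {
	return &agent{
		internalDown:  make(chan struct{}),
		endpointsDown: make(chan struct{}),
	}
}

// Shutdown stops the internal services synchronously and then triggers the
// endpoint shutdown asynchronously. It is safe to call more than once.
func (a *agent) Shutdown() {
	a.once.Do(func() {
		// Stage 1: internal state, checks, RPC server. Done before returning.
		close(a.internalDown)

		// Stage 2: external endpoints (HTTP, DNS). Runs in the background so
		// that an in-flight HTTP handler still has time to send its response.
		go func() {
			close(a.endpointsDown)
		}()
	})
}

// isClosed reports whether ch has been closed, without blocking.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	a := newAgent()
	a.Shutdown()
	fmt.Println("internal down:", isClosed(a.internalDown))

	<-a.endpointsDown // wait for the asynchronous stage to finish
	fmt.Println("endpoints down:", isClosed(a.endpointsDown))
}
```

Because of `sync.Once`, a second concurrent caller blocks until the first stage has finished, so every caller gets the same guarantee about the internal state.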