Commit Graph

147 Commits

Author SHA1 Message Date
Daniel Nephin 1c8eca2bfd agent: rename AddService->AddServiceFromSource
In preparation for extracting a single AddService func that accepts a request struct.
2021-01-25 17:25:01 -05:00
Daniel Nephin 45f0afcbf4 structs: Fix printing of IDs
These types are used as values (not pointers) in other structs. Using a pointer receiver causes
problems when the value is printed. fmt will not call the String method if it is passed a value
and the String method has a pointer receiver. By using a value receiver the correct string is printed.

Also remove some unused methods.
2021-01-07 18:47:38 -05:00
Michael Montgomery a1748aa2cb Merge branch 'master' into 6074-allow-config-MaxHeaderBytes 2020-12-30 14:14:05 -06:00
Michael Montgomery 519f537b8b Fixed failing tests
Removed use of `NewTestAgent`, per review comment
Removed CLI flag, per review comment
Updated website documentation
Added changelog entry
2020-12-30 14:09:50 -06:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
Michael Montgomery ed719c978b Merge branch 'master' into 6074-allow-config-MaxHeaderBytes 2020-11-20 07:43:53 -06:00
Daniel Nephin 8647483605 Use freeport
To prevent other tests which already use freeport from flaking when port 0 steals their reserved port.
2020-11-18 16:07:34 -05:00
Daniel Nephin fed2a61dfc agent: fix bug with multiple listeners
Previously the listener was being passed to a closure in a loop without
capturing the loop variable. The result is only the last listener is
used, so the http/https servers only listen on one address.

This problem is fixed by capturing the variable by passing it into a
function.
2020-11-18 13:03:29 -05:00
Michael Montgomery 1c0a46849a Resolves #6074. Adds new option to configure HTTP Server's MaxHeaderBytes with option `-http-max-header-bytes`
Adds tests for behavior
2020-10-29 12:38:19 -05:00
Mathilde Gilles a1ec792acc Fix: service LocallyRegisteredAsSidecar property is not persisted
When a service is deregistered, we check whever matching services were
registered as sidecar along with it and deregister them as well.
To determine if a service is indeed a sidecar we check the
structs.ServiceNode.LocallyRegisteredAsSidecar property. However, to
avoid interal API leakage, it is excluded from JSON serialization,
meaning it is not persisted to disk either.
When the agent is restarted, this property lost and sidecars are no
longer deregistered along with their parent service.
To fix this, we now specifically save this property in the persisted
service file.
2020-10-13 19:38:58 +02:00
R.B. Boyer 4b525e5d16
agent: enable enable_central_service_config by default (#8746) 2020-10-01 09:19:14 -05:00
R.B. Boyer 657995cc93
agent: when enable_central_service_config is enabled ensure agent reload doesn't revert check state to critical (#8747)
Likely introduced when #7345 landed.
2020-09-24 16:24:04 -05:00
Daniel Nephin e0119a6e92
Merge pull request #8680 from hashicorp/dnephin/replace-consul-opts-with-base-deps
agent: Repalce ConsulOptions with a new struct from agent.BaseDeps
2020-09-24 12:45:54 -04:00
Daniel Nephin c621b4a420 agent/consul: pass dependencies directly from agent
In an upcoming change we will need to pass a grpc.ClientConnPool from
BaseDeps into Server. While looking at that change I noticed all of the
existing consulOption fields are already on BaseDeps.

Instead of duplicating the fields, we can create a struct used by
agent/consul, and use that struct in BaseDeps. This allows us to pass
along dependencies without translating them into different
representations.

I also looked at moving all of BaseDeps in agent/consul, however that
created some circular imports. Resolving those cycles wouldn't be too
bad (it was only an error in agent/consul being imported from
cache-types), however this change seems a little better by starting to
introduce some structure to BaseDeps.

This change is also a small step in reducing the scope of Agent.

Also remove some constants that were only used by tests, and move the
relevant comment to where the live configuration is set.

Removed some validation from NewServer and NewClient, as these are not
really runtime errors. They would be code errors, which will cause a
panic anyway, so no reason to handle them specially here.
2020-09-15 17:29:32 -04:00
Daniel Nephin 4eb514a59f http: fix tests incorrectly using HTTPAddr to get the address of the
https server.

In #8234 I changed a few tests to use TestAgent.HTTPAddr() to find the
addr used in the test. Due to the way HTTPAddr() was implemented these
tests were passing, but I think the pass was incidental. HTTPAddr() was
not matching any servers, and was instead returning the last server,
which happened to be the one these tests wanted.

This commit fixes the implementation of HTTPAddr to panic if no match
was found. The tests which require an HTTPS server are changed to use
a new firstAddr() to look up the correct address.
2020-09-04 15:29:17 -04:00
Daniel Nephin ec5d20b0de
Merge pull request #8554 from hashicorp/dnephin/agent-setup-persisted-tokens
agent: move token persistence from agent into token.Store
2020-09-03 17:29:21 -04:00
Daniel Nephin 330b73725f agent: add apiServers type for managing HTTP servers
Remove Server field from HTTPServer. The field is no longer used.
2020-09-03 13:40:12 -04:00
Daniel Nephin b64ce07ef7 agent/token: Move token persistence out of agent
And into token.Store. This change isolates any awareness of token
persistence in a single place.

It is a small step in allowing Agent.New to accept its dependencies.
2020-08-31 15:00:34 -04:00
Daniel Nephin 39b06a0c0b
Merge pull request #8552 from pierresouchay/reload_cache_throttling_config
Ensure that Cache options are reloaded when `consul reload` is performed
2020-08-28 15:04:42 -04:00
Pierre Souchay f92ae5e6ca Also test reload of EntryFetchMaxBurst 2020-08-27 18:14:05 +02:00
Matt Keeler 106e1d50bd
Move RPC router from Client/Server and into BaseDeps (#8559)
This will allow it to be a shared component which is needed for AutoConfig
2020-08-27 11:23:52 -04:00
Pierre Souchay 4983e093a0 Tests that changes in rate limit are taken into account by agent 2020-08-27 16:41:20 +02:00
Daniel Nephin 8d35e37b3c testing: Remove all the defer os.Removeall
Now that testutil uses t.Cleanup to remove the directory the caller no longer has to manage
the removal
2020-08-14 19:58:53 -04:00
Daniel Nephin b6d91d59f3 testing: Remove TestAgent.Key and change TestAgent.DataDir
TestAgent.Key was only used by 3 tests. Extracting it from the common helper that is used in hundreds of
tests helps keep the shared part small and more focused.

This required a second change (which I was planning on making anyway), which was to change the behaviour of
DataDir. Now in all cases the TestAgent will use the DataDir, and clean it up once the test is complete.
2020-08-13 17:53:24 -04:00
Daniel Nephin 6be568119b Extract nodeID functions to a different file
In preparation for turning them into functions.
To reduce the scope of Agent, and refactor how Agent is created and started.
2020-08-12 13:05:10 -04:00
Daniel Nephin cbdceeb044 config: Make Source an interface
This will allow us to accept config from auto-config without needing to
go through a serialziation cycle.
2020-08-10 12:46:28 -04:00
Matt Keeler a0ebab5bac
Ensure certificates retrieved through the cache get persisted with auto-config (#8409) 2020-07-30 11:37:18 -04:00
Matt Keeler dad0f189a2
Agent Auto Config: Implement Certificate Generation (#8360)
Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.

This also refactors the auto-config package a bit to split things out into multiple files.
2020-07-28 15:31:48 -04:00
Matt Keeler 8df112526d
Fix some broken code in master
There were several PRs that while all passed CI independently, when they all got merged into the same branch caused compilation errors in test code.

The main changes that caused issues where changing agent/cache.Cache.New to require a concrete options struct instead of a pointer. This broke the cert monitor tests and the catalog_list_services_test.go. Another change was made to unembed the http.Server from the agent.HTTPServer struct. That coupled with another change to add a test to ensure cache rate limiting coming from HTTP requests was working as expected caused compilation failures.
2020-07-28 09:50:10 -04:00
Pierre Souchay 947d8eb039
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler fa6a2b38d9
Add an AutoEncrypt “integration” test
Also fix a bug where Consul could segfault if TLS was enabled but no client certificate was provided. How no one has reported this as a problem I am not sure.
2020-06-30 15:23:29 -04:00
Daniel Nephin 0285956fac Update TestAgent_GetCoordinate
The old test case was a very specific regresion test for a case that is no longer possible.
Replaced with a new test that checks the default coordinate is returned.
2020-06-24 13:00:15 -04:00
Daniel Nephin 07c1081d39 Fix a bunch of unparam lint issues 2020-06-24 13:00:14 -04:00
Matt Keeler 7086a50353 Change auto config authorizer to allow for future extension
The envisioned changes would allow extra settings to enable dynamically defined auth methods to be used instead of  or in addition to the statically defined one in the configuration.
2020-06-18 15:22:24 -04:00
Matt Keeler 2c7844d220
Implement Client Agent Auto Config
There are a couple of things in here.

First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.

This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
2020-06-17 16:49:46 -04:00
Daniel Nephin 89d95561df Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Hans Hasselberg 7f14d3ac8a
tests: use constructor instead init (#8024) 2020-06-04 22:59:06 +02:00
Pierre Souchay 7cd5477c3c
checks: when a service does not exists in an alias, consider it failing (#7384)
In current implementation of Consul, check alias cannot determine
if a service exists or not. Because a service without any check
is semantically considered as passing, so when no healthchecks
are found for an agent, the check was considered as passing.

But this make little sense as the current implementation does not
make any difference between:
 * a non-existing service (passing)
 * a service without any check (passing as well)

In order to make it work, we have to ensure that when a check did
not find any healthcheck, the service does indeed exists. If it
does not, lets consider the check as failing.
2020-06-04 14:50:52 +02:00
Daniel Nephin e8a883e829
Replace goe/verify.Values with testify/require.Equal (#7993)
* testing: replace most goe/verify.Values with require.Equal

One difference between these two comparisons is that go/verify considers
nil slices/maps to be equal to empty slices/maps, where as testify/require
does not, and does not appear to provide any way to enable that behaviour.

Because of this difference some expected values were changed from empty
slices to nil slices, and some calls to verify.Values were left.

* Remove github.com/pascaldekloe/goe/verify

Reduce the number of assertion packages we use from 2 to 1
2020-06-02 12:41:25 -04:00
Daniel Nephin 2e0f750f1a Add unconvert linter
To find unnecessary type convertions
2020-05-12 13:47:25 -04:00
Pierre Souchay 2b8da952a8
agent: show warning when enable_script_checks is enabled without safty net (#7437)
In order to enforce a bit security on Consul agents, add a new method in agent
to highlight possible security issues.

This does not return an error for now, but might in the future.

For now, it detects issues such as:

https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations/

This would display this kind of messages:

```
2020-03-11T18:27:49.873+0100 [ERROR] agent: [SECURITY] issue: error="using enable-script-checks without ACLs and without allow_write_http_from is DANGEROUS, use enable-local-script-checks instead see https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations/"
```
2020-04-02 09:59:23 +02:00
Daniel Nephin 09c6ac8b92 Rename NewTestAgentWithFields to StartTestAgent
This function now only starts the agent.

Using:

git grep -l 'StartTestAgent(t, true,' | \
        xargs sed -i -e 's/StartTestAgent(t, true,/StartTestAgent(t,/g'
2020-03-31 17:14:55 -04:00
Daniel Nephin d623dcbd01 Convert the remaining calls to NewTestAgentWithFields
After removing the t.Name() parameter with sed, convert the last few tests which
use a custom name to call NewTestAgentWithFields instead.
2020-03-31 17:14:55 -04:00
Daniel Nephin 8b6877febd Remove name from NewTestAgent
Using:

git grep -l 'NewTestAgent(t, t.Name(),' | \
    xargs sed -i -e 's/NewTestAgent(t, t.Name(),/NewTestAgent(t,/g'
2020-03-31 16:13:44 -04:00
Daniel Nephin 96c4a35de7 Remove t.Name() from TestAgent.Name
And re-add the name to the logger so that log messages from different agents
in a single can be identified.
2020-03-30 16:47:24 -04:00
Daniel Nephin 823295fe2a testing: reduce verbosity of output log
Previously the log output included the test name twice and a long date
format. The test output is already grouped by test, so adding the test
name did not add any new information. The date and time are only useful
to understand elapsed time, so using a short format should provide
succident detail.

Also fixed a bug in NewTestAgentWithFields where nil was returned
instead of the test agent.
2020-03-30 13:23:13 -04:00
Chris Piraino 0c5c97205f
Fix flakey health check reload test (#7490)
This test would occasionally fail because we checked for a status of
"critical" initially. This races with the actual healthcheck being run
and declared passing.

We instead use a ttl health check so that we don't rely on timing at all.
2020-03-25 09:09:13 -05:00
R.B. Boyer a7fb26f50f
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00
Pierre Souchay 49dc891737
agent: configuration reload preserves check's statuses for services (#7345)
This fixes issue #7318

Between versions 1.5.2 and 1.5.3, a regression has been introduced regarding health
of services. A patch #6144 had been issued for HealthChecks of nodes, but not for healthchecks
of services.

What happened when a reload was:

1. save all healthcheck statuses
2. cleanup everything
3. add new services with healthchecks

In step 3, the state of healthchecks was taken into account locally,
so at step 3, but since we cleaned up at step 2, state was lost.

This PR introduces the snap parameter, so step 3 can use information from step 1
2020-03-09 12:59:41 +01:00
Chris Piraino 3dd0b59793
Allow users to configure either unstructured or JSON logging (#7130)
* hclog Allow users to choose between unstructured and JSON logging
2020-01-28 17:50:41 -06:00