Commit graph

2485 commits

Author SHA1 Message Date
Aliaksandr Mianzhynski c79180980c Return grpc serving status in health check errors 2020-09-22 21:16:58 +03:00
Daniel Nephin b3ec7df80f api: rename HTTPServer to HTTPHandlers
Resolves a TODO about naming. This type is a set of handlers for an http.Server, it is not
itself a Server. It provides http.Handler functions.
2020-09-18 17:38:23 -04:00
Hans Hasselberg c6fa758d6f
fix TestLeader_SecondaryCA_IntermediateRenew (#8702)
* fix lessThanHalfTime
* get lock for CAProvider()
* make a var to relate both vars
* rename to getCAProviderWithLock
* move CertificateTimeDriftBuffer to agent/connect/ca
2020-09-18 10:13:29 +02:00
Daniel Nephin 2d3540d6b5
Merge pull request #8620 from hashicorp/dnephin/better-impl-of-TestAgent.HTTPAddr
http: fix tests incorrectly using HTTPAddr to get the address of the https server
2020-09-17 11:48:57 -04:00
Mike Morris fe984b3ee3
test: update tags for database service registrations and queries (#8693) 2020-09-16 14:05:01 -04:00
Kyle Havlovitz 8f83f7ac13
Merge pull request #8560 from hashicorp/vault-ca-renew-token
Automatically renew the token used by the Vault CA provider
2020-09-16 07:30:30 -07:00
Daniel Nephin 9f83eb3dc9
Merge pull request #8685 from pierresouchay/do_not_flood_logs_with_Non-server_in_server-only_area
[BUGFIX] Avoid GetDatacenter* methods to flood Consul servers logs
2020-09-15 17:57:05 -04:00
Kyle Havlovitz c8fd61abc7 Merge branch 'master' into vault-ca-renew-token 2020-09-15 14:39:04 -07:00
Daniel Nephin c621b4a420 agent/consul: pass dependencies directly from agent
In an upcoming change we will need to pass a grpc.ClientConnPool from
BaseDeps into Server. While looking at that change I noticed all of the
existing consulOption fields are already on BaseDeps.

Instead of duplicating the fields, we can create a struct used by
agent/consul, and use that struct in BaseDeps. This allows us to pass
along dependencies without translating them into different
representations.

I also looked at moving all of BaseDeps in agent/consul, however that
created some circular imports. Resolving those cycles wouldn't be too
bad (it was only an error in agent/consul being imported from
cache-types), however this change seems a little better by starting to
introduce some structure to BaseDeps.

This change is also a small step in reducing the scope of Agent.

Also remove some constants that were only used by tests, and move the
relevant comment to where the live configuration is set.

Removed some validation from NewServer and NewClient, as these are not
really runtime errors. They would be code errors, which will cause a
panic anyway, so no reason to handle them specially here.
2020-09-15 17:29:32 -04:00
Daniel Nephin 0536b2047e agent/consul: make router required 2020-09-15 17:26:26 -04:00
Daniel Nephin 49086dd5ae
Merge pull request #8679 from hashicorp/streaming/fix-TestHandler_EmitsStats
streaming: Fix TestHandler_EmitsStats
2020-09-15 17:04:55 -04:00
Kyle Havlovitz 316600a685 Update vault CA for latest api client 2020-09-15 13:33:55 -07:00
Paul Banks 0062106c46
Update UI Config passing to not use an inline script (#8645)
* Update UI Config passing to not use an inline script

* Update agent/http.go

* Fix incorrect placeholder name
2020-09-15 20:57:37 +01:00
Kyle Havlovitz 63d3a5fc1f Clean up CA shutdown logic and error 2020-09-15 12:28:58 -07:00
Kyle Havlovitz 6f1dd25139
Merge pull request #8646 from hashicorp/common-intermediate-ttl
Move IntermediateCertTTL to common CA config
2020-09-15 12:03:29 -07:00
Pierre Souchay 617e0d2364 [BUGFIX] Avoid GetDatacenter* methods to flood Consul servers logs
When calling `GetDatacentersByDistance()` or `GetDatacentersMap()`, an
incorrect condition was used to diplay log message, thus flooding
Consul's logs.

Example of message:

```
  [WARN] agent.router: Non-server in server-only area: non_server=myClientNode area=lan
```

This message is only valid for WAN areas, filter to avoid creating
hundreds of logs/s on our clusters, each time someone is calling this
method.

Our logs were flooded by such messages when migrating our Consul servers
from 1.7.7 to 1.8.4.

This will issue fix #8663
2020-09-15 11:54:59 +02:00
Daniel Nephin 1e40f00567 agent/grpc: make TestHandler_EmitsStats predictable
Occasionally this test would flake. The flakes were fixed by:

1. Stopping the service and retrying to check on metrics. This way we
   also include the active_streams going to 0 in the metric calls.

2. Using a reference to the global Metrics. This way when other tests
   have background goroutines that are still shutting down, they won't
   emit metrics to the metric instance with the fake Sink. The stats
   test can patch the local reference to the global, so the existing
   statHandlers will continue to emit to the global, but the stats
   test will send all metrics to the replacement.
2020-09-14 19:05:22 -04:00
Daniel Nephin c827e7f1e9 grpc: add Datacenter field to testing service response 2020-09-14 19:02:09 -04:00
freddygv e0db834148 Fix text type assertion 2020-09-14 16:28:40 -06:00
freddygv 43efb4809c Merge master 2020-09-14 16:17:43 -06:00
freddygv 66e5c5989a Fix type assertion 2020-09-14 16:12:21 -06:00
Daniel Nephin 75515f3431
Merge pull request #8587 from hashicorp/streaming/add-grpc-server
streaming: add gRPC server for handling connections
2020-09-14 15:24:54 -04:00
freddygv 33af8dab9a Resolve conflicts against master 2020-09-11 18:41:58 -06:00
freddygv 60cb306524 Add session flag to cookie config 2020-09-11 18:34:03 -06:00
freddygv ae8c609f10 PR comments 2020-09-11 10:49:26 -06:00
Kyle Havlovitz 1595add842 Clean up Vault renew tests and shutdown 2020-09-11 08:41:05 -07:00
freddygv 5871b667a5 Revert EnvoyConfig nesting 2020-09-11 09:21:43 -06:00
Kyle Havlovitz dc393336d1 Use mapstructure for decoding vault data 2020-09-10 06:31:04 -07:00
Kyle Havlovitz 7588e22739 Add a stop function to make sure the renewer is shut down on leader change 2020-09-10 06:12:48 -07:00
Kyle Havlovitz c52dfeb633 Move IntermediateCertTTL to common CA config 2020-09-10 00:23:22 -07:00
Kyle Havlovitz 1c57b72a9f Add a test for token renewal 2020-09-09 16:36:37 -07:00
Daniel Nephin 1a930fcf75 grpc: Add a simple test service for testing the gRPC server 2020-09-08 12:10:43 -04:00
Daniel Nephin 863a9df951 server: add gRPC server for streaming events
Includes a stats handler and stream interceptor for grpc metrics.

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-08 12:10:41 -04:00
Daniel Nephin 4eb514a59f http: fix tests incorrectly using HTTPAddr to get the address of the
https server.

In #8234 I changed a few tests to use TestAgent.HTTPAddr() to find the
addr used in the test. Due to the way HTTPAddr() was implemented these
tests were passing, but I think the pass was incidental. HTTPAddr() was
not matching any servers, and was instead returning the last server,
which happened to be the one these tests wanted.

This commit fixes the implementation of HTTPAddr to panic if no match
was found. The tests which require an HTTPS server are changed to use
a new firstAddr() to look up the correct address.
2020-09-04 15:29:17 -04:00
freddygv 1ee039ed95 Set tgw filter router config name to cluster name 2020-09-04 12:45:05 -06:00
Hans Hasselberg 51f079dcdd
secondaryIntermediateCertRenewalWatch abort on success (#8588)
secondaryIntermediateCertRenewalWatch was using `retryLoopBackoff` to
renew the intermediate certificate. Once it entered the inner loop and
started `retryLoopBackoff` it would never leave that.
`retryLoopBackoffAbortOnSuccess` will return when renewing is
successful, like it was intended originally.
2020-09-04 11:47:16 +02:00
freddygv 3e4bc36941 Add server receiver to routes and log tgw err 2020-09-03 16:19:58 -06:00
Daniel Nephin 670b7cbd99
Merge pull request #8357 from hashicorp/streaming/add-service-health-events
streaming: add ServiceHealth events
2020-09-03 17:53:56 -04:00
Daniel Nephin ec5d20b0de
Merge pull request #8554 from hashicorp/dnephin/agent-setup-persisted-tokens
agent: move token persistence from agent into token.Store
2020-09-03 17:29:21 -04:00
Daniel Nephin c17a5b0628 state: handle terminating gateways in service health events 2020-09-03 16:58:05 -04:00
Daniel Nephin b241debee7 state: improve comments in catalog_events.go
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:05 -04:00
Daniel Nephin 870823e8ed state: use changeType in serviceChanges
To be a little more explicit, instead of nil implying an indirect change
2020-09-03 16:58:05 -04:00
Daniel Nephin 68682e7e83 don't over allocate slice 2020-09-03 16:58:04 -04:00
Daniel Nephin 5f52220f53 state: fix a bug in building service health events
The nodeCheck slice was being used as the first arg in append, which in some cases will modify the array backing the slice. This would lead to service checks for other services in the wrong event.

Also refactor some things to reduce the arguments to functions.
2020-09-03 16:58:04 -04:00
Daniel Nephin c61313b78a state: Remove unused args and return values
Also rename some functions to identify them as constructors for events
2020-09-03 16:58:04 -04:00
Daniel Nephin 668b98bcce state: use an enum for tracking node changes 2020-09-03 16:58:04 -04:00
Daniel Nephin 7c3c627028 state: serviceHealthSnapshot
refactored to remove unused return value and remove duplication
2020-09-03 16:58:04 -04:00
Daniel Nephin fdfe176deb state: Add Change processor and snapshotter for service health
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:04 -04:00
Daniel Nephin 6a1a43721d state: fix bug in changeTrackerDB.publish
Creating a new readTxn does not work because it will not see the newly created objects that are about to be committed. Instead use the active write Txn.
2020-09-03 16:58:01 -04:00
Daniel Nephin 81cc3daf69 stream: have SnapshotFunc accept a non-pointer SubscribeRequest
The value is not expected to be modified. Passing a value makes that explicit.
2020-09-03 16:54:02 -04:00
freddygv 56fdae9ace Update resolver defaulting 2020-09-03 13:08:44 -06:00
freddygv b149185794 Update golden files after default route fix for tgw 2020-09-03 12:35:11 -06:00
Daniel Nephin 330b73725f agent: add apiServers type for managing HTTP servers
Remove Server field from HTTPServer. The field is no longer used.
2020-09-03 13:40:12 -04:00
freddygv 23147c1d5b Fix http assertion in route creation 2020-09-03 10:21:20 -06:00
freddygv 0c50b8e769 Add explicit protocol overrides in tgw xds test cases 2020-09-03 08:57:48 -06:00
freddygv 02d6acd8fc Ensure resolver node with LB isn't considered default 2020-09-03 08:55:57 -06:00
freddygv c4bce2154b Move valid policies to pkg level 2020-09-02 15:49:03 -06:00
freddygv daad3b9210 Remove LB infix and move injection to xds 2020-09-02 15:13:50 -06:00
R.B. Boyer b0bde51e70
connect: all config entries pick up a meta field (#8596)
Fixes #8595
2020-09-02 14:10:25 -05:00
Chris Piraino df1381f77f
Merge pull request #8603 from hashicorp/feature/usage-metrics
Track node and service counts in the state store and emit them periodically as metrics
2020-09-02 13:23:39 -05:00
R.B. Boyer 4197bed23b
connect: fix bug in preventing some namespaced config entry modifications (#8601)
Whenever an upsert/deletion of a config entry happens, within the open
state store transaction we speculatively test compile all discovery
chains that may be affected by the pending modification to verify that
the write would not create an erroneous scenario (such as splitting
traffic to a subset that did not exist).

If a single discovery chain evaluation references two config entries
with the same kind and name in different namespaces then sometimes the
upsert/deletion would be falsely rejected. It does not appear as though
this bug would've let invalid writes through to the state store so the
correction does not require a cleanup phase.
2020-09-02 10:47:19 -05:00
Chris Piraino b245d60200 Set metrics reporting interval to 9 seconds
This is below the 10 second interval that lib/telemetry.go implements as
its aggregation interval, ensuring that we always report these metrics.
2020-09-02 10:24:23 -05:00
Chris Piraino e9b397005c Update godoc string for memdb wrapper functions/structs 2020-09-02 10:24:22 -05:00
Chris Piraino 80f923a47a Refactor state store usage to track unique service names
This commit refactors the state store usage code to track unique service
name changes on transaction commit. This means we only need to lookup
usage entries when reading the information, as opposed to iterating over
a large number of service indices.

- Take into account a service instance's name being changed
- Do not iterate through entire list of service instances, we only care
about whether there is 0, 1, or more than 1.
2020-09-02 10:24:21 -05:00
Chris Piraino 79e6534345 Use ReadTxn interface in state store helper functions 2020-09-02 10:24:20 -05:00
Chris Piraino d90d95421d Add WriteTxn interface and convert more functions to ReadTxn
We add a WriteTxn interface for use in updating the usage memdb table,
with the forward-looking prospect of incrementally converting other
functions to accept interfaces.

As well, we use the ReadTxn in new usage code, and as a side effect
convert a couple of existing functions to use that interface as well.
2020-09-02 10:24:19 -05:00
Chris Piraino 45a4057f60 Report node/service usage metrics from every server
Using the newly provided state store methods, we periodically emit usage
metrics from the servers.

We decided to emit these metrics from all servers, not just the leader,
because that means we do not have to care about leader election flapping
causing metrics turbulence, and it seems reasonable for each server to
emit its own view of the state, even if they should always converge
rapidly.
2020-09-02 10:24:17 -05:00
Chris Piraino 3af96930eb Add new usage memdb table that tracks usage counts of various elements
We update the usage table on Commit() by using the TrackedChanges() API
of memdb.

Track memdb changes on restore so that usage data can be compiled
2020-09-02 10:24:16 -05:00
freddygv d7bda050e0 Restructure structs and other PR comments 2020-09-02 09:10:50 -06:00
Daniel Nephin 9535a1b57d token: OSS support for enterprise tokens 2020-08-31 15:10:15 -04:00
Daniel Nephin 8e477feb22 config: use token.Config for ACLToken config
Using the target Config struct reduces the amount of copying and
translating of configuration structs.
2020-08-31 15:10:15 -04:00
Daniel Nephin b64ce07ef7 agent/token: Move token persistence out of agent
And into token.Store. This change isolates any awareness of token
persistence in a single place.

It is a small step in allowing Agent.New to accept its dependencies.
2020-08-31 15:00:34 -04:00
Daniel Nephin fbae521775 fix TestStore_RegularTokens
This test was only passing because t.Parallel was causing every subtest to run with the last value in the iteration,
which sets a value for all tokens. The test started to fail once t.Parallel was removed, but the same failure could
have been produced by adding 'tt := tt' to the t.Run() func.

These tests run in under 10ms, so there is no reason to use t.Parallel.
2020-08-31 14:59:14 -04:00
Matt Keeler 335c604ced
Merge of auto-config and auto-encrypt code (#8523)
auto-encrypt is now handled as a special case of auto-config.

This also is moving all the cert-monitor code into the auto-config package.
2020-08-31 13:12:17 -04:00
freddygv 58a018c20b Add documentation for resolver LB cfg 2020-08-28 14:46:13 -06:00
freddygv 194d34b09d Pass LB config to Envoy via xDS 2020-08-28 14:27:40 -06:00
freddygv 8f470b30d7 Log error as error 2020-08-28 13:11:55 -06:00
freddygv afb14b6705 Compile down LB policy to disco chain nodes 2020-08-28 13:11:04 -06:00
Daniel Nephin 845661c8af
Merge pull request #8548 from edevil/fix_flake
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve
2020-08-28 15:10:55 -04:00
Daniel Nephin 39b06a0c0b
Merge pull request #8552 from pierresouchay/reload_cache_throttling_config
Ensure that Cache options are reloaded when `consul reload` is performed
2020-08-28 15:04:42 -04:00
Pierre Souchay ee50b55163 Added Unit test for cache reloading 2020-08-28 13:03:58 +02:00
freddygv 391d569a45 Add LB policy to service-resolver 2020-08-27 19:44:02 -06:00
Jack 145bcdc2bb
Add http2 and grpc support to ingress gateways (#8458) 2020-08-27 15:34:08 -06:00
R.B. Boyer f2b8bf109c
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
R.B. Boyer a7a8b8d6d9
agent: ensure that we normalize bootstrapped config entries (#8547) 2020-08-27 11:37:25 -05:00
Pierre Souchay f92ae5e6ca Also test reload of EntryFetchMaxBurst 2020-08-27 18:14:05 +02:00
Matt Keeler 106e1d50bd
Move RPC router from Client/Server and into BaseDeps (#8559)
This will allow it to be a shared component which is needed for AutoConfig
2020-08-27 11:23:52 -04:00
Pierre Souchay 4983e093a0 Tests that changes in rate limit are taken into account by agent 2020-08-27 16:41:20 +02:00
Pierre Souchay 084d0e8015 Added options.Equals() and minor fixes indentation fixes 2020-08-27 13:44:45 +02:00
R.B. Boyer 6fad634512
agent: expose the list of supported envoy versions on /v1/agent/self (#8545) 2020-08-26 10:04:11 -05:00
Kyle Havlovitz 6f7152841f Automatically renew the token used by the Vault CA provider 2020-08-25 10:34:49 -07:00
Pierre Souchay dd385f05e6 Ensure that Cache options are reloaded when consul reload is performed.
This will apply cache throttling parameters are properly applied:
 * cache.EntryFetchMaxBurst
 * cache.EntryFetchRate

When values are updated, a log is displayed in info.
2020-08-24 23:33:10 +02:00
André Cruz 673bd69f36
Decrease test flakiness
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve and TestCacheNotifyPolling
2020-08-24 20:30:02 +01:00
André Cruz a64686fab6
testing: Fix govet errors 2020-08-21 18:01:55 +01:00
Daniel Nephin 1c3a638d69
Merge pull request #8537 from hashicorp/dnephin/fix-panic-on-connect-nil
Fix panic when decoding 'Connect: null'
2020-08-20 18:00:25 -04:00
Daniel Nephin 4155cae1cb Fix panic when decoding 'Connect: null'
Surprisingly the json Unmarshal updates the aux pointer to a nil.
2020-08-20 17:52:14 -04:00
Daniel Nephin a97adadd2b config: use logging.Config in RuntimeConfig
To add structure to RuntimeConfig, and remove the need to translate into a third type.
2020-08-19 13:21:00 -04:00
Daniel Nephin e4578aace8 logging: move init of grpclog
This line initializes global state. Moving it out of the constructor and closer to where logging
is setup helps keep related things together.
2020-08-19 13:21:00 -04:00
Daniel Nephin 7349018ff3 logging: Setup accept io.Writer instead of []io.Writer
Also accept a non-pointer Config, since the config is not modified
2020-08-19 13:20:41 -04:00
Daniel Nephin a520cf3ea7 testing: disable global metrics sink in tests
This might be better handled by allowing configuration for the InMemSink interval and retail, and disabling
the global. For now this is a smaller change to remove the goroutine leak caused by tests because go-metrics
does not provide any way of shutting down the global goroutine.
2020-08-18 19:04:57 -04:00
Daniel Nephin 84642486b9 agent: extract dependency creation from New
With this change, Agent.New() accepts many of the dependencies instead
of creating them in New. Accepting fully constructed dependencies from
a constructor makes the type easier to test, and easier to change.

There are still a number of dependencies created in Start() which can
be addressed in a follow up.
2020-08-18 19:04:55 -04:00
Daniel Nephin b204e342c5
Merge pull request #8514 from hashicorp/dnephin/testing-improvements-1
testing: small improvements to TestSessionCreate and testutil.retry
2020-08-18 18:26:05 -04:00
Daniel Nephin ab0d206eac
Merge pull request #8528 from hashicorp/dnephin/move-node-name-validation
config: Move some config validation from Agent.Start to config.Builder.Validate
2020-08-18 18:25:41 -04:00
Hans Hasselberg 02de4c8b76
add primary keys to list keyring (#8522)
During gossip encryption key rotation it would be nice to be able to see if all nodes are using the same key. This PR adds another field to the json response from `GET v1/operator/keyring` which lists the primary keys in use per dc. That way an operator can tell when a key was successfully setup as primary key.

Based on https://github.com/hashicorp/serf/pull/611 to add primary key to list keyring output:

```json
[
  {
    "WAN": true,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 6,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "NumNodes": 6
  },
  {
    "WAN": false,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 8,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  },
  {
    "WAN": false,
    "Datacenter": "dc1",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 3,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  }
]
```

I intentionally did not change the CLI output because I didn't find a good way of displaying this information. There are a couple of options that we could implement later:
* add a flag to show the primary keys
* add a flag to show json output

Fixes #3393.
2020-08-18 09:50:24 +02:00
Daniel Nephin 7078ca07fa config: Move remote-script-checks warning to config
Previously it was done in Agent.Start, but it can be done much earlier
2020-08-17 17:39:49 -04:00
Daniel Nephin a0dc4222b6 config: move NodeName validation to config validation
Previsouly it was done in Agent.Start, which is much later then it needs to be.

The new 'dns' package was required, because otherwise there would be an
import cycle. In the future we should move more of the dns server into
the dns package.
2020-08-17 17:25:02 -04:00
Daniel Nephin 647236bb17
Merge pull request #8515 from hashicorp/dnephin/unexport-testing-shims
config: unexport fields and resolve TODOs in config.Builder
2020-08-17 16:03:07 -04:00
Daniel Nephin 3e0d63a6b7 testing: use t.Cleanup in testutil.TempFile
So that it has the same behaviour as TempDir.

Also remove the now unnecessary 'defer os.Remove'
2020-08-14 20:06:01 -04:00
Daniel Nephin 8d35e37b3c testing: Remove all the defer os.Removeall
Now that testutil uses t.Cleanup to remove the directory the caller no longer has to manage
the removal
2020-08-14 19:58:53 -04:00
Daniel Nephin fe8790da9e config: unexport and resolve TODOs in config.Builder
- unexport testing shims, and document their purpose
- resolve a TODO by moving validation to NewBuilder and storing the one
  field that is used instead of all of Options
- create a slice with the correct size to avoid extra allocations
2020-08-14 19:23:32 -04:00
Daniel Nephin 85655098be testing: Improve session_endpoint_test
While working on another change I caused a bunch of these tests to fail.
Unfortunately the failure messages were not super helpful at first.

One problem was that the request and response were created outside of
the retry. This meant that when the second attempt happened, the request
body was empty (because the buffer had been consumed), and so the
request was not actually being retried. This was fixed by moving more of
the request creation into the retry block.

Another problem was that these functions can return errors in two ways, and
are not consistent about which way they use. Some errors are returned to
the response writer, but the tests were not checking those errors, which
was causing a panic later on. This was fixed by adding a check for the
response code.

Also adds some missing t.Helper(), and has assertIndex use checkIndex so
that it is clear these are the same implementation.
2020-08-14 18:55:52 -04:00
Daniel Nephin 2725513eea testutil: Add t.Cleanup to TempDir
TempDir registers a Cleanup so that the directory is always removed. To disable to cleanup, set the TEST_NOCLEANUP
env var.
2020-08-14 13:19:10 -04:00
Daniel Nephin 6b0ac22c1b testing: fix flaky test TestDNS_NonExistentDC_RPC
I saw this test flake locally, and it was easy to reproduce with -count=10.

The failure was: 'TestAgent.dns: rpc error: error=No known Consul servers'.

Waiting for the agent seems to fix it.
2020-08-13 18:03:04 -04:00
Daniel Nephin 512a523a3e testing: wait until monitor has started before shutdown
This commit fixes a test that I saw flake locally while running tests. The test output from the monitor
started immediately after the line the test was looking for.

To fix the problem a channel is closed when the goroutine starts. Shutdown is not called until this channel
is closed, which seems to greatly reduce the chance of a flake.
2020-08-13 17:53:29 -04:00
Daniel Nephin b6d91d59f3 testing: Remove TestAgent.Key and change TestAgent.DataDir
TestAgent.Key was only used by 3 tests. Extracting it from the common helper that is used in hundreds of
tests helps keep the shared part small and more focused.

This required a second change (which I was planning on making anyway), which was to change the behaviour of
DataDir. Now in all cases the TestAgent will use the DataDir, and clean it up once the test is complete.
2020-08-13 17:53:24 -04:00
Daniel Nephin 3abe4e43d3 testing: use t.Cleanup in TestAgent for returnPorts 2020-08-13 17:09:37 -04:00
Daniel Nephin edf6e74a14 testing: remove unused fields from TestACLAgent 2020-08-13 17:03:55 -04:00
Daniel Nephin 24d8db906f agent: rename vars in newConsulConfig
'base' is a bit misleading, since it is the return value. Renamed to cfg.
2020-08-13 11:58:21 -04:00
Daniel Nephin 88ddd8e0e7 agent: Move setupKeyring functions to keyring.go
There are a couple reasons for this change:

1. agent.go is way too big. Smaller files makes code eaasier to read
   because tools that show usage also include filename which can give
   a lot more context to someone trying to understand which functions
   call other functions.
2. these two functions call into a large number of functions already in
   keyring.go.
2020-08-13 11:58:21 -04:00
Daniel Nephin 119df79d7c agent: unmethod consulConfig
To allow us to move newConsulConfig out of Agent.
2020-08-13 11:58:21 -04:00
Daniel Nephin 055e7e8ca3 Fix conflict in merged PRs
One PR renamed the var from config->cfg, and another used the old name config, which caused the
build to fail on master.
2020-08-13 11:28:26 -04:00
Daniel Nephin 629c34085d state: remove unused Store method receiver
And use ReadTxn interface where appropriate.
2020-08-13 11:25:22 -04:00
Daniel Nephin 20f94bf9ab
Merge pull request #8463 from hashicorp/dnephin/unmethod-make-node-id
agent: convert NodeID methods to functions
2020-08-13 11:18:11 -04:00
Daniel Nephin fc797a279a
Merge pull request #8461 from hashicorp/dnephin/remove-notify-shutdown
agent/consul: Remove NotifyShutdown
2020-08-13 11:16:48 -04:00
Daniel Nephin d8ffcd5686
Merge pull request #8365 from hashicorp/dnephin/fix-service-by-node-meta-flake
state: speed up tests that use watchLimit
2020-08-13 11:16:12 -04:00
Daniel Nephin 45dae87ee7 auto-config: reduce awareness of config
This is a small step to allowing Agent to accept its dependencies
instead of creating them in New.

There were two fields in autoconfig.Config that were used exclusively
to load config. These were replaced with a single function, allowing us
to move LoadConfig back to the config package.

Also removed the WithX functions for building a Config. Since these were
simple assignment, it appeared we were not getting much value from them.
2020-08-12 13:23:23 -04:00
Daniel Nephin 62e402c4f9 Remove check that hostID is a uuid.
Immediately afterward we hash the ID, so it does not need to be a uuid anymore.
2020-08-12 13:05:10 -04:00
Daniel Nephin 55b074f0eb agent: convert NodeID methods to functions
Making these functions allows us to cleanup how an agent is initialized. They only make use of a config and a logger, so they do not need to be agent methods.

Also cleanup the testing to use t.Run and require.
2020-08-12 13:05:10 -04:00
Daniel Nephin 6be568119b Extract nodeID functions to a different file
In preparation for turning them into functions.
To reduce the scope of Agent, and refactor how Agent is created and started.
2020-08-12 13:05:10 -04:00
R.B. Boyer 63422ca9c5
connect: use stronger validation that ingress gateways have compatible protocols defined for their upstreams (#8470)
Fixes #8466

Since Consul 1.8.0 there was a bug in how ingress gateway protocol
compatibility was enforced. At the point in time that an ingress-gateway
config entry was modified the discovery chain for each upstream was
checked to ensure the ingress gateway protocol matched. Unfortunately
future modifications of other config entries were not validated against
existing ingress-gateway definitions, such as:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. create service-defaults for 'api' setting protocol=http (worked, but not ok)
3. create service-splitter or service-router for 'api' (worked, but caused an agent panic)

If you were to do these in a different order, it would fail without a
crash:

1. create service-defaults for 'api' setting protocol=http (ok)
2. create service-splitter or service-router for 'api' (ok)
3. create tcp ingress-gateway pointing to 'api' (fail with message about
   protocol mismatch)

This PR introduces the missing validation. The two new behaviors are:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. (NEW) create service-defaults for 'api' setting protocol=http ("ok" for back compat)
3. (NEW) create service-splitter or service-router for 'api' (fail with
   message about protocol mismatch)

In consideration for any existing users that may be inadvertently be
falling into item (2) above, that is now officiall a valid configuration
to be in. For anyone falling into item (3) above while you cannot use
the API to manufacture that scenario anymore, anyone that has old (now
bad) data will still be able to have the agent use them just enough to
generate a new agent/proxycfg error message rather than a panic.
Unfortunately we just don't have enough information to properly fix the
config entries.
2020-08-12 11:19:20 -05:00
Freddy 77de9bbe22
Notify alias checks when aliased service is [de]registered (#8456) 2020-08-12 09:47:41 -06:00
Daniel Nephin b936f84c07
Merge pull request #8469 from hashicorp/dnephin/config-source
config: make Source an interface to avoid the marshal/unmarshal cycle in auto-config
2020-08-12 11:17:15 -04:00
Hans Hasselberg 7a6d916ddc
Merge pull request #8471 from hashicorp/local_only
thread local-only through the layers
2020-08-12 08:54:51 +02:00
Freddy 50fee12d62
Internal endpoint to query intentions associated with a gateway (#8400) 2020-08-11 17:20:41 -06:00
Kyle Havlovitz 8118e3db40 Fix a state store comment about version 2020-08-11 13:46:12 -07:00
Kyle Havlovitz 2601585017 fsm: Fix snapshot bug with restoring node/service/check indexes 2020-08-11 11:49:52 -07:00
Hans Hasselberg e0297b6e99 Refactor keyring ops:
* changes some functions to return data instead of modifying pointer
  arguments
* renames globalRPC() to keyringRPCs() to make its purpose more clear
* restructures KeyringOperation() to make it more understandable
2020-08-11 13:42:03 +02:00
Hans Hasselberg 08b1fea379 thread local-only through the layers
$ consul keyring -list -local-only
==> Gathering installed encryption keys...

dc1 (LAN):
  aUlAW4ST3+vwseI61so24CoORkyjZofcmHk+j7QPSYQ= [1/1]
2020-08-11 13:41:53 +02:00
Daniel Nephin 3a4242c121 auto-config: Avoid the marshal/unmarshal cycle in auto-config
Use a LiteralConfig and return a config.Config from translate.
2020-08-10 20:07:52 -04:00
freddygv 6dcfa11c21 Update error handling 2020-08-10 17:48:22 -06:00
Daniel Nephin cbdceeb044 config: Make Source an interface
This will allow us to accept config from auto-config without needing to
go through a serialziation cycle.
2020-08-10 12:46:28 -04:00
Mike Morris d9ef146d82
changelog: Update for 1.8.2, 1.7.6, 1.7.5 and 1.6.7 (#8462)
* update bindata_assetfs.go

* Release v1.8.2

* Putting source back into Dev Mode

* changelog: add entries for 1.7.6, 1.7.5 and 1.6.7

Co-authored-by: hashicorp-ci <hashicorp-ci@users.noreply.github.com>
2020-08-07 18:58:09 -04:00
Daniel Nephin bef9348ca8 testing: remove unnecessary defers in tests
The data directory is now removed by the test helper that created it.
2020-08-07 17:28:16 -04:00
Daniel Nephin f3b63514d5 testing: Remove NotifyShutdown
NotifyShutdown was only used for testing. Now that t.Cleanup exists, we
can use that instead of attaching cleanup to the Server shutdown.

The Autopilot test which used NotifyShutdown doesn't need this
notification because Shutdown is synchronous. Waiting for the function
to return is equivalent.
2020-08-07 17:14:44 -04:00
Matt Keeler 05ebf9b8c5
Require token replication to be enabled in secondary dcs when ACLs are enabled with AutoConfig (#8451)
AutoConfig will generate local tokens for clients and the ability to use local tokens is gated off of token replication being enabled and being configured with a replication token. Therefore we already have a hard requirement on having token replication enabled, this commit just makes sure to surface that to the operator instead of having to discern what the issue is from RPC errors.
2020-08-07 10:20:27 -04:00
Hans Hasselberg fdceb24323
auto_config implies connect (#8433) 2020-08-07 12:02:02 +02:00
freddygv 4ac644f401 Fix test build 2020-08-06 11:31:56 -06:00
Hans Hasselberg 417d4adfb7
Mark its own cluster as healthy when rebalancing. (#8406)
This code started as an optimization to avoid doing an RPC Ping to
itself. But in a single server cluster the rebalancing was led to
believe that there were no healthy servers because foundHealthyServer
was not set. Now this is being set properly.

Fixes #8401 and #8403.
2020-08-06 10:42:09 +02:00
freddygv 83f4e32376 PR comments and addtl tests 2020-08-05 16:07:11 -06:00
Daniel Nephin 62641b820a
Merge pull request #8404 from hashicorp/dnephin/remove-log-output-field
Use Logger consistently, instead of LogOutput
2020-08-05 14:31:43 -04:00