Commit Graph

3996 Commits

Author SHA1 Message Date
Daniel Nephin 460f8919c9 ca: make getLeafSigningCertFromRoot safer
As a method on the struct type this would not be safe to call without first checking
c.isIntermediateUsedToSignLeaf.

So for now, move this logic to the CAMananger, so that it is always correct.
2021-12-02 12:42:49 -05:00
Daniel Nephin 64532ef636 ca: fix stored CARoot representation with Vault provider
We were not adding the local signing cert to the CARoot. This commit
fixes that bug, and also adds support for fixing existing CARoot on
upgrade.

Also update the tests for both primary and secondary to be more strict.
Check the SigningKeyID is correct after initialization and rotation.
2021-12-02 12:42:49 -05:00
Dan Upton eff3dc09b6
Rename `agent_master` ACL token in the API and CLI (#11669) 2021-12-02 17:05:27 +00:00
Dan Upton e1829a8706
Rename `master` and `agent_master` ACL tokens in the config file format (#11665) 2021-12-01 21:08:14 +00:00
Chris S. Kim 67eacee31e
ENT to OSS sync (#11703) 2021-12-01 14:56:10 -05:00
R.B. Boyer 70b143ddc5
auto-config: ensure the feature works properly with partitions (#11699) 2021-12-01 13:32:34 -06:00
Daniel Nephin 963a9819d0 ca: add some godoc and func for finding leaf signing cert
This will be used in a follow up commit.
2021-11-30 18:36:41 -05:00
Daniel Nephin 056a52ba64 sdk/freeport: rename Port to GetOne
For better consistency with GetN
2021-11-30 17:32:41 -05:00
Chris S. Kim e9c661db7f
Refactor test helper (#11689)
Allow custom ACL root tokens to be passed
2021-11-30 13:22:07 -05:00
Chris S. Kim 0ec67cc2d1
acl: Fill authzContext from token in Coordinate endpoints (#11688) 2021-11-30 13:17:41 -05:00
freddygv 76146dfc5b Move ent config test to ent file 2021-11-29 12:15:17 -07:00
freddygv 6d51282adf Prevent partition-exports entry from OSS usage
Validation was added on the config entry kind since that is called when
validating config entries to bootstrap via agent configuration and when
applying entries via the config RPC endpoint.
2021-11-29 11:24:16 -07:00
Daniel Nephin 4f0d092c95 testing: remove unnecessary calls to freeport
Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.

https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.

This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.

This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
2021-11-29 12:19:43 -05:00
Daniel Nephin 20a8e11bf2 testing: use the new freeport interfaces 2021-11-27 15:39:46 -05:00
Daniel Nephin 2cf41e4dc8 go-sso: remove returnFunc now that freeport handles return 2021-11-27 15:29:38 -05:00
Daniel Nephin 8219e8571e sdk: add freeport functions that use t.Cleanup 2021-11-27 15:04:43 -05:00
Daniel Nephin 772d8f7381 ca: clean up unnecessary raft.Apply response checking
In d2ab767fef21244e9fe3b9887ea70fc177912381 raftApply was changed to handle this check in
a single place, instad of having every caller check it. It looks like these few places
were missed when I did that clean up.

This commit removes the remaining resp.(error) checks, since they are all no-ops now.
2021-11-26 17:57:55 -05:00
Daniel Nephin 48954adfdc
Merge pull request #11339 from hashicorp/dnephin/ca-manager-isolate-secondary-2
ca: reduce use of state in the secondary
2021-11-26 14:41:45 -05:00
Daniel Nephin 8240286956 ca: remove state check in secondarySetPrimaryRoots
This function is only ever called from operations that have already acquired the state lock, so checking
the value of state can never fail.

This change is being made in preparation for splitting out a separate type for the secondary logic. The
state can't easily be shared, so really only the expored top-level functions should acquire the 'state lock'.
2021-11-26 14:14:47 -05:00
Daniel Nephin 877094e2fa ca: remove actingSecondaryCA
This commit removes the actingSecondaryCA field, and removes the stateLock around it. This field
was acting as a proxy for providerRoot != nil, so replace it with that check instead.

The two methods which called secondarySetCAConfigured already set the state, so checking the
state again at this point will not catch runtime errors (only programming errors, which we can catch with tests).
In general, handling state transitions should be done on the "entrypoint" methods where execution starts, not
in every internal method.

This is being done to remove some unnecessary references to c.state, in preparations for extracting
types for primary/secondary.
2021-11-26 14:14:47 -05:00
Daniel Nephin cd5f6b2dfb ca: reduce consul provider backend interface a bit
This makes it easier to fake, which will allow me to use the ConsulProvider as
an 'external PKI' to test a customer setup where the actual root CA is not
the root we use for the Consul CA.

Replaces a call to the state store to fetch the clusterID with the
clusterID field already available on the built-in provider.
2021-11-25 11:46:06 -05:00
Dhia Ayachi f605689154
Partition/kv indexid sessions (#11639)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* remove partition for Checks as it's always use the session partition

* partition sessions index id table

* fix rebase issues

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-24 11:34:36 -05:00
Dhia Ayachi b1c4be3da0
Partition session checks store (#11638)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

* convert `IndexID` of `session_checks` table

* convert `indexSession` of `session_checks` table

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* fix oss linter

* fix review comments

* remove partition for Checks as it's always use the session partition

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-24 09:10:38 -05:00
Chris S. Kim c22adc8dc7
cleanup: Clarify deprecated legacy intention endpoints (#11635) 2021-11-23 19:32:18 -05:00
Chris S. Kim d2b86e7f48
Merge from ent (#11506) 2021-11-19 11:50:44 -05:00
R.B. Boyer fa7a66cd30
agent: purge service/check registration files for incorrect partitions on reload (#11607) 2021-11-18 14:44:20 -06:00
Iryna Shustava bd3fb0d0e9
connect: Support auth methods for the vault connect CA provider (#11573)
* Support vault auth methods for the Vault connect CA provider
* Rotate the token (re-authenticate to vault using auth method) when the token can no longer be renewed
2021-11-18 13:15:28 -07:00
Daniel Nephin fee9696d4f ca: use the cluster ID passed to the primary
instead of fetching it from the state store.
2021-11-16 16:57:22 -05:00
Daniel Nephin 07a33a1526 ca: accept only the cluster ID to SpiffeIDSigningForCluster
To make it more obivous where ClusterID is used, and remove the need to create a struct
when only one field is used.
2021-11-16 16:57:21 -05:00
Will Jordan 2e66b7a5e6
Update node info sync comment (#11465) 2021-11-16 11:16:11 -08:00
R.B. Boyer 83bf7ab3ff
re-run gofmt on 1.17 (#11579)
This should let freshly recompiled golangci-lint binaries using Go 1.17
pass 'make lint'
2021-11-16 12:04:01 -06:00
R.B. Boyer 086ff42b56
partitions: various refactors to support partitioning the serf LAN pool (#11568) 2021-11-15 09:51:14 -06:00
freddygv f33eae6fe1 Update proxycfg for ingress service partitions 2021-11-12 14:33:31 -07:00
freddygv dc7ea2ef1e Accept partition for ingress services 2021-11-12 14:33:14 -07:00
freddygv 5ac1ab359b Move assertion to after config fetch 2021-11-10 10:50:08 -07:00
freddygv 2261d51515 Use ClusterID to check for readiness
The TrustDomain is populated from the Host() method which includes the
hard-coded "consul" domain. This means that despite having an empty
cluster ID, the TrustDomain won't be empty.
2021-11-10 10:45:22 -07:00
freddygv 482d3bc610 Prevent replicating partition-exports 2021-11-09 16:42:42 -07:00
freddygv 739490df12 handle error scenario of empty local DC 2021-11-09 16:42:42 -07:00
freddygv b9b41625b9 Restrict DC for partition-exports writes
There are two restrictions:
- Writes from the primary DC which explicitly target a secondary DC.
- Writes to a secondary DC that do not explicitly target the primary DC.

The first restriction is because the config entry is not supported in
secondary datacenters.

The second restriction is to prevent the scenario where a user writes
the config entry to a secondary DC, the write gets forwarded to the
primary, but then the config entry does not apply in the secondary.
This makes the scope more explicit.
2021-11-09 16:42:42 -07:00
Freddy eb2b40b22d
Update filter chain creation for sidecar/ingress listeners (#11245)
The duo of `makeUpstreamFilterChainForDiscoveryChain` and `makeListenerForDiscoveryChain` were really hard to reason about, and led to concealing a bug in their branching logic. There were several issues here:

- They tried to accomplish too much: determining filter name, cluster name, and whether RDS should be used. 
- They embedded logic to handle significantly different kinds of upstream listeners (passthrough, prepared query, typical services, and catch-all)
- They needed to coalesce different data sources (Upstream and CompiledDiscoveryChain)

Rather than handling all of those tasks inside of these functions, this PR pulls out the RDS/clusterName/filterName logic.

This refactor also fixed a bug with the handling of [UpstreamDefaults](https://www.consul.io/docs/connect/config-entries/service-defaults#defaults). These defaults get stored as UpstreamConfig in the proxy snapshot with a DestinationName of "*", since they apply to all upstreams. However, this wildcard destination name must not be used when creating the name of the associated upstream cluster. The coalescing logic in the original functions here was in some situations creating clusters with a `*.` prefix, which is not a valid destination.
2021-11-09 14:43:51 -07:00
Kyle Havlovitz 14591de8d2
Merge pull request #11461 from deblasis/feature/empty_client_addr_warning
config: warn the user if client_addr is empty
2021-11-09 09:37:38 -08:00
Daniel Upton caa5b5a5a6
xds: prefer fed state gateway definitions if they're fresher (#11522)
Fixes an issue described in #10132, where if two DCs are WAN federated
over mesh gateways, and the gateway in the non-primary DC is terminated
and receives a new IP address (as is commonly the case when running them
on ephemeral compute instances) the primary DC is unable to re-establish
its connection until the agent running on its own gateway is restarted.

This was happening because we always preferred gateways discovered by
the `Internal.ServiceDump` RPC (which would fail because there's no way
to dial the remote DC) over those discovered in the federation state,
which is replicated as long as the primary DC's gateway is reachable.
2021-11-09 16:45:36 +00:00
Freddy 0ad360fadf
Merge pull request #11514 from hashicorp/dnephin/ca-fix-secondary-init
ca: properly handle the case where the secondary initializes after the primary
2021-11-08 17:16:16 -07:00
freddygv e6622ab0ab Avoid returning empty roots with uninitialized CA
Currently getCARoots could return an empty object with an empty trust
domain before the CA is initialized. This commit returns an error while
there is no CA config or no trust domain.

There could be a CA config and no trust domain because the CA config can
be created in InitializeCA before initialization succeeds.
2021-11-08 16:51:49 -07:00
Dhia Ayachi f61892393f
refactor session state store tables to use the new index pattern (#11525)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused func `prefixIndexFromServiceNameAsString`

* fix test error check

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-08 16:20:50 -05:00
Dhia Ayachi dfafd4e38c
KV refactoring, part 2 (#11512)
* add partition to the kv get pretty print

* fix failing test

* add test for kvs RPC endpoint
2021-11-08 11:43:21 -05:00
Dhia Ayachi 17190c0076
KV state store refactoring and partitioning (#11510)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* partition kvs indexID table

* add `partitionedIndexEntryName` in oss for test purpose

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* remove entmeta reference from oss

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-08 09:35:56 -05:00
Giulio Micheloni 10cdc0a5c8
Merge branch 'main' into serve-panic-recovery 2021-11-06 16:12:06 +01:00
Daniel Nephin 69ad7c0544 ca: Only initialize clusterID in the primary
The secondary must get the clusterID from the primary
2021-11-05 18:08:44 -04:00
Daniel Nephin 3173582b75 ca: return an error when secondary fails to initialize
Previously secondaryInitialize would return nil in this case, which prevented the
deferred initialize from happening, and left the CA in an uninitialized state until a config
update or root rotation.

To fix this I extracted the common parts into the delegate implementation. However looking at this
again, it seems like the handling in secondaryUpdateRoots is impossible, because that function
should never be called before the secondary is initialzied. I beleive we can remove some of that
logic in a follow up.
2021-11-05 18:02:51 -04:00
Daniel Nephin db29ad346b acl: remove id and revision from Policy constructors
The fields were removed in a previous commit.

Also remove an unused constructor for PolicyMerger
2021-11-05 15:45:08 -04:00
Daniel Nephin 617b11302f acl: remove Policy.ID and Policy.Revision
These two fields do not appear to be used anywhere. We use the structs.ACLPolicy ID in the
ACLResolver cache, but the acl.Policy ID and revision are not used.
2021-11-05 15:43:52 -04:00
R.B. Boyer 1d8e7bb565
rename helper method to reflect the non-deprecated terminology (#11509) 2021-11-05 13:51:50 -05:00
Connor b3af482e09
Support Vault Namespaces explicitly in CA config (#11477)
* Support Vault Namespaces explicitly in CA config

If there is a Namespace entry included in the Vault CA configuration,
set it as the Vault Namespace on the Vault client

Currently the only way to support Vault namespaces in the Consul CA
config is by doing one of the following:
1) Set the VAULT_NAMESPACE environment variable which will be picked up
by the Vault API client
2) Prefix all Vault paths with the namespace

Neither of these are super pleasant. The first requires direct access
and modification to the Consul runtime environment. It's possible and
expected, not super pleasant.

The second requires more indepth knowledge of Vault and how it uses
Namespaces and could be confusing for anyone without that context. It
also infers that it is not supported

* Add changelog

* Remove fmt.Fprint calls

* Make comment clearer

* Add next consul version to website docs

* Add new test for default configuration

* go mod tidy

* Add skip if vault not present

* Tweak changelog text
2021-11-05 11:42:28 -05:00
R.B. Boyer 7fbf749bc4
segments: ensure that the serf_lan_allowed_cidrs applies to network segments (#11495) 2021-11-04 17:17:19 -05:00
Mark Anderson e9a0fa7d36
Remove some usage of md5 from the system (#11491)
* Remove some usage of md5 from the system

OSS side of https://github.com/hashicorp/consul-enterprise/pull/1253

This is a potential security issue because an attacker could conceivably manipulate inputs to cause persistence files to collide, effectively deleting the persistence file for one of the colliding elements.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-11-04 13:07:54 -07:00
FFMMM 9afecfa10c
plumb thru root cert tll to the aws ca provider (#11449)
* plumb thru root cert ttl to the aws ca provider

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* Update .changelog/11449.txt

Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>

Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
2021-11-04 12:19:08 -07:00
FFMMM e7ffef54ee
fix aws pca certs (#11470)
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2021-11-03 12:21:24 -07:00
Mathew Estafanous 508664440d
Convert (some) test endpoints to use ServeHTTP instead of direct calls to handlers. (#11445) 2021-11-03 11:12:36 -04:00
FFMMM 27227c0fd2
add root_cert_ttl option for consul connect, vault ca providers (#11428)
* add root_cert_ttl option for consul connect, vault ca providers

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>

* add changelog, pr feedback

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* Update .changelog/11428.txt, more docs

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* Update website/content/docs/agent/options.mdx

Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
2021-11-02 11:02:10 -07:00
Daniel Nephin 0ec2a804df
Merge pull request #10690 from tarat44/h2c-support-in-ping-checks
add support for h2c in h2 ping health checks
2021-11-02 13:53:06 -04:00
Alessandro De Blasis 2b3f4efbab config: warn the user if client_addr is empty
if the provided value is empty string then the client services
(DNS, HTTP, HTTPS, GRPC) are not listening and the user is not notified
in any way about what's happening.
Also, since a not provided client_addr defaults to 127.0.0.1, we make sure
we are not getting unwanted warnings

Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2021-11-01 22:47:20 +00:00
Daniel Nephin 00ed2b243f
Merge pull request #10771 from hashicorp/dnephin/emit-telemetry-metrics-immediately
telemetry: improve cert expiry metrics
2021-11-01 18:31:03 -04:00
freddygv ecccf22fd7 Exclude default partition from GatewayKey string
This will behave the way we handle SNI and SPIFFE IDs, where the default
partition is excluded.

Excluding the default ensures that don't attempt to compare default.dc2
to dc2 in OSS.
2021-11-01 14:45:52 -06:00
freddygv d944e6ae3a Update GatewayKeys deduplication
Federation states data is only keyed on datacenter, so it cannot be
directly compared against keys for gateway groups.
2021-11-01 13:58:53 -06:00
freddygv ce43e8cf99 Store GatewayKey in proxycfg snapshot for re-use 2021-11-01 13:58:53 -06:00
freddygv 51c888a41a Update locality check in xds 2021-11-01 13:58:53 -06:00
freddygv 6657c88296 Update locality check in proxycfg 2021-11-01 13:58:53 -06:00
Daniel Nephin c706bf135c
Merge pull request #11340 from hashicorp/dnephin/ca-manager-provider
ca: split the Provider interface into Primary/Secondary
2021-11-01 14:11:15 -04:00
Daniel Nephin eaaceedf31
Merge pull request #11338 from hashicorp/dnephin/ca-manager-isolate-secondary
ca: clearly identify methods that are primary-only or secondary-only
2021-11-01 14:10:31 -04:00
Daniel Upton a620b6be2e
Support Check-And-Set deletion of config entries (#11419)
Implements #11372
2021-11-01 16:42:01 +00:00
Dhia Ayachi 4d763ef9e6
regenerate expired certs (#11462)
* regenerate expired certs

* add documentation to generate tests certificates
2021-11-01 11:40:16 -04:00
Jared Kirschner 6dfcbeceec
Merge pull request #11348 from kbabuadze/fix-answers-alt-domain
Fix answers for alt domain
2021-10-29 17:09:20 -04:00
R.B. Boyer d40d098321
agent: for various /v1/agent endpoints parse the partition parameter on the request (#11444)
Also update the corresponding CLI commands to send the parameter
appropriately.

NOTE: Behavioral changes are not happening in this PR.
2021-10-28 16:44:38 -05:00
R.B. Boyer 017e9d5ae4
agent: add a clone function for duplicating the serf lan configuration (#11443) 2021-10-28 16:11:26 -05:00
Daniel Nephin a8d6392ab5 Add tests for cert expiry metrics 2021-10-28 14:38:57 -04:00
Daniel Nephin 503dee2d80
Merge pull request #10671 from hashicorp/dnephin/fix-subscribe-test-flake
subscribe: improve TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages
2021-10-28 12:57:09 -04:00
Evan Culver b3c92f22b1
connect: Remove support for Envoy 1.16 (#11354) 2021-10-27 18:51:35 -07:00
Evan Culver 98acbfa79c
connect: Add support for Envoy 1.20 (#11277) 2021-10-27 18:38:10 -07:00
freddygv 3dd21023bc Ensure partition-exports kind gets marshalled
The api module has decoding functions that rely on 'kind' being present
of payloads. This is so that we can decode into the appropriate api type
for the config entry.

This commit ensures that a static kind is marshalled in responses from
Consul's api endpoints so that the api module can decode them.
2021-10-27 15:01:26 -06:00
Daniel Nephin 0a19d7fd76 agent: move agent tls metric monitor to a more appropriate place
And add a test for it
2021-10-27 16:26:09 -04:00
Daniel Nephin 1b2144c982 telemetry: set cert expiry metrics to NaN on start
So that followers do not report 0, which would make alerting difficult.
2021-10-27 15:19:25 -04:00
Daniel Nephin a7fcf14c5c telemetry: fix cert expiry metrics by removing labels
These labels should be set by whatever process scrapes Consul (for
prometheus), or by the agent that receives them (for datadog/statsd).

We need to remove them here because the labels are part of the "metric
key", so we'd have to pre-declare the metrics with the labels. We could
do that, but that is extra work for labels that should be added from
elsewhere.

Also renames the closure to be more descriptive.
2021-10-27 15:19:25 -04:00
Daniel Nephin 4300daa2e6 telemetry: only emit leader cert expiry metrics on the servers 2021-10-27 15:19:25 -04:00
Daniel Nephin 9de725c17d telemetry: prevent stale values from cert monitors
Prometheus scrapes metrics from each process, so when leadership transfers to a different node
the previous leader would still be reporting the old cached value.

By setting NaN, I believe we should zero-out the value, so that prometheus should only consider the
value from the new leader.
2021-10-27 15:19:25 -04:00
Daniel Nephin 616cc9b6f8 telemetry: improve cert expiry metrics
Emit the metric immediately so that after restarting an agent, the new expiry time will be
emitted. This is particularly important when this metric is being monitored, because we want
the alert to resovle itself immediately.

Also fixed a bug that was exposed in one of these metrics. The CARoot can be nil, so we have
to handle that case.
2021-10-27 15:19:25 -04:00
Daniel Nephin 24951f0c7e subscribe: attempt to fix a flaky test
TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages has been
flaking a few times. This commit cleans up the test a bit, and improves
the failure output.

I don't believe this actually fixes the flake, but I'm not able to
reproduce it reliably.

The failure appears to be that the event with Port=0 is being sent in
both the snapshot and as the first event after the EndOfSnapshot event.

Hopefully the improved logging will show us if these are really
duplicate events, or actually different events with different indexes.
2021-10-27 15:09:09 -04:00
Freddy ae76144f55
Merge pull request #11435 from hashicorp/ent-authorizer-refactor
[OSS] Export ACLs refactor
2021-10-27 13:04:40 -06:00
Freddy 520bda999b
Merge pull request #11432 from hashicorp/ap/exports-mgw
[OSS] Update mesh gateways to handle partitions
2021-10-27 12:54:53 -06:00
freddygv 592965d61e Rework acl exports interface 2021-10-27 12:50:39 -06:00
Freddy 9bbeea0432
Merge pull request #11433 from hashicorp/exported-service-acls
[OSS] acl: Expand ServiceRead and NodeRead to account for partition exports
2021-10-27 12:48:08 -06:00
freddygv 05f91bd2b8 Update comments 2021-10-27 12:36:44 -06:00
Freddy d8ae915160
Merge pull request #11431 from hashicorp/ap/exports-proxycfg
[OSS] Update partitioned mesh gw handling for connect proxies
2021-10-27 11:27:43 -06:00
Freddy 8e23a6a0cc
Merge pull request #11416 from hashicorp/ap/exports-update
Rename service-exports to partition-exports
2021-10-27 11:27:31 -06:00
freddygv 40271beb38 Fixup partitions assertion 2021-10-27 11:15:25 -06:00
freddygv 67412ac5e7 Fixup imports 2021-10-27 11:15:25 -06:00
freddygv 4de3537391 Split up locality check from hostname check 2021-10-27 11:15:25 -06:00
freddygv 9769b31641 Move the exportingpartitions constant to enterprise 2021-10-27 11:15:25 -06:00
freddygv 0391a65772 Replace default partition check 2021-10-27 11:15:25 -06:00
freddygv ee45ac9dc5 PR comments 2021-10-27 11:15:25 -06:00
freddygv f99946553a Leave todo about default name 2021-10-27 11:15:25 -06:00
freddygv 9d375ad6d2 Add oss impl of registerEntCache 2021-10-27 11:15:25 -06:00
freddygv 183849416b Register the ExportingPartitions cache type 2021-10-27 11:15:25 -06:00
freddygv 8b5a9369eb Account for partitions in xds gen for mesh gw
This commit avoids skipping gateways in remote partitions of the local
DC when generating listeners/clusters/endpoints.
2021-10-27 11:15:25 -06:00
freddygv d1d513b1b3 Account for partition in SNI for gateways 2021-10-27 11:15:25 -06:00
freddygv 4f0432be5e Update xds pkg to account for GatewayKey 2021-10-27 09:03:56 -06:00
freddygv f3f15640a9 Update mesh gateway proxy watches for partitions
This commit updates mesh gateway watches for cross-partitions
communication.

* Mesh gateways are keyed by partition and datacenter.

* Mesh gateways will now watch gateways in partitions that export
services to their partition.

* Mesh gateways in non-default partitions will not have cross-datacenter
watches. They are not involved in traditional WAN federation.
2021-10-27 09:03:56 -06:00
freddygv af662c8c1c Avoid mixing named and unnamed params 2021-10-26 23:42:25 -06:00
freddygv 1de62bb0a2 Avoid passing nil config pointer 2021-10-26 23:42:25 -06:00
freddygv 4a2e40aa3c Avoid panic on nil partitionAuthorizer config
partitionAuthorizer.config can be nil if it wasn't provided on calls to
newPartitionAuthorizer outside of the ACLResolver. This usage happens
often in tests.

This commit: adds a nil check when the config is going to be used,
updates non-test usage of NewPolicyAuthorizerWithDefaults to pass a
non-nil config, and dettaches setEnterpriseConf from the ACLResolver.
2021-10-26 23:42:25 -06:00
freddygv 015d85cd74 Update NodeRead for partition-exports
When issuing cross-partition service discovery requests, ACL filtering
often checks for NodeRead privileges. This is because the common return
type is a CheckServiceNode, which contains node data.
2021-10-26 23:42:11 -06:00
Kyle Havlovitz afb0976eac acl: pass PartitionInfo through ent ACLConfig 2021-10-26 23:41:52 -06:00
Kyle Havlovitz 56d1858c4a acl: Expand ServiceRead logic to look at service-exports for cross-partition 2021-10-26 23:41:32 -06:00
freddygv 4737ad118d Swap in structs.EqualPartitions for cmp 2021-10-26 23:36:01 -06:00
freddygv 1bade08f91 Replace Split with SplitN 2021-10-26 23:36:01 -06:00
freddygv 3966677aaf Finish removing useInDatacenter 2021-10-26 23:36:01 -06:00
freddygv 69476221c1 Update XDS for sidecars dialing through gateways 2021-10-26 23:35:48 -06:00
freddygv ea311d2e47 Configure sidecars to watch gateways in partitions
Previously the datacenter of the gateway was the key identifier, now it
is the datacenter and partition.

When dialing services in other partitions or datacenters we now watch
the appropriate partition.
2021-10-26 23:35:37 -06:00
freddygv feaebde1f1 Remove useInDatacenter from disco chain requests
useInDatacenter was used to determine whether the mesh gateway mode of
the upstream should be returned in the discovery chain target. This
commit makes it so that the mesh gateway mode is returned every time,
and it is up to the caller to decide whether mesh gateways should be
watched or used.
2021-10-26 23:35:21 -06:00
R.B. Boyer e27e58c6cc
agent: refactor the agent delegate interface to be partition friendly (#11429) 2021-10-26 15:08:55 -05:00
Chris S. Kim 27f8a85664
agent: Ensure partition is considered in agent endpoints (#11427) 2021-10-26 15:20:57 -04:00
Konstantine 2f9ee8e558 remove spaces 2021-10-26 12:38:13 -04:00
Konstantine be14f6da90 fix altDomain responses for services where address is IP, added tests 2021-10-26 12:38:13 -04:00
Konstantine eec9d66e22 fix encodeIPAsFqdn to return alt-domain when requested, added test case 2021-10-26 12:38:12 -04:00
Konstantine 9d6797a463 fixed altDomain response for NS type queries, and added test 2021-10-26 12:38:12 -04:00
Konstantine 0735e12412 edited TestDNS_AltDomains_Service to test responses for altDomains, and added TXT additional section check 2021-10-26 12:38:12 -04:00
Konstantine 8972e093d9 fixed alt-domain answer for SRV records, and TXT records in additional section 2021-10-26 12:38:12 -04:00
Chris S. Kim 3f736467e6
ui: Pass primary dc through to uiserver (#11317)
Co-authored-by: John Cowen <johncowen@users.noreply.github.com>
2021-10-26 10:30:17 -04:00
freddygv 83d4d0e108 Remove outdated partition label from test 2021-10-25 18:47:02 -06:00
freddygv c3e381b4c1 Rename service-exports to partition-exports
Existing config entries prefixed by service- are specific to individual
services. Since this config entry applies to partitions it is being
renamed.

Additionally, the Partition label was changed to Name because using
Partition at the top-level and in the enterprise meta was leading to the
enterprise meta partition being dropped by msgpack.
2021-10-25 17:58:48 -06:00
Daniel Nephin f24bad2a52
Merge pull request #11232 from hashicorp/dnephin/acl-legacy-remove-docs
acl: add docs and changelog for the removal of the legacy ACL system
2021-10-25 18:38:00 -04:00
Daniel Nephin f7cdd210fe Update agent/consul/acl_client.go
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2021-10-25 17:25:14 -04:00
Daniel Nephin 732b841dd7 state: remove support for updating legacy ACL tokens 2021-10-25 17:25:14 -04:00
Daniel Nephin 76b007dacd acl: remove init check for legacy anon token
This token should always already be migrated from a previous version.
2021-10-25 17:25:14 -04:00
Daniel Nephin 8ae6ee4e36 acl: remove legacy parameter to ACLDatacenter
It is no longer used now that legacy ACLs have been removed.
2021-10-25 17:25:14 -04:00
Daniel Nephin d778113773 acl: remove ACLTokenTypeManagement 2021-10-25 17:25:14 -04:00
Daniel Nephin 2f0eba1980 acl: remove ACLTokenTypeClient,
along with the last test referencing it.
2021-10-25 17:25:14 -04:00
Daniel Nephin 88c6aeea34 acl: remove legacy arg to store.ACLTokenSet
And remove the tests for legacy=true
2021-10-25 17:25:14 -04:00
Daniel Nephin b31a7fc498 acl: remove EmbeddedPolicy
This method is no longer. It only existed for legacy tokens, which are no longer supported.
2021-10-25 17:25:14 -04:00
Daniel Nephin ceaa36f983 acl: remove tests for resolving legacy tokens
The code for this was already removed, which suggests this is not actually testing what it claims.

I'm guessing these are still resolving because the tokens are converted to non-legacy tokens?
2021-10-25 17:25:14 -04:00
Daniel Nephin a46e3bd2fc acl: stop replication on leadership lost
It seems like this was missing. Previously this was only called by init of ACLs during an upgrade.
Now that legacy ACLs are  removed, nothing was calling stop.

Also remove an unused method from client.
2021-10-25 17:24:12 -04:00
Daniel Nephin 15cd8c7ab8 Remove incorrect TODO 2021-10-25 17:20:06 -04:00
Daniel Nephin 589b238374 acl: move the legacy ACL struct to the one package where it is used
It is now only used for restoring snapshots. We can remove it in phase 2.
2021-10-25 17:20:06 -04:00
Daniel Nephin 0ba5d0afcd acl: remove most of the rest of structs/acl_legacy.go 2021-10-25 17:20:06 -04:00
Paul Banks ab5cdce760
Merge pull request #11163 from hashicorp/feature/ingress-tls-mixed
Add support for enabling connect-based ingress TLS per listener.
2021-10-25 21:36:01 +01:00
FFMMM 6433a57d3c
fix autopilot_failure_tolerance, add autopilot metrics test case (#11399)
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2021-10-25 10:55:59 -07:00
FFMMM 67a624a49f
use *telemetry.MetricsPrefix as prometheus.PrometheusOpts.Name (#11290)
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2021-10-21 13:33:01 -07:00
Dhia Ayachi 75f69a98a2
fix leadership transfer on leave suggestions (#11387)
* add suggestions

* set isLeader to false when leadership transfer succeed
2021-10-21 14:02:26 -04:00
Dhia Ayachi 2d1ac1f7d0
try to perform a leadership transfer when leaving (#11376)
* try to perform a leadership transfer when leaving

* add a changelog
2021-10-21 12:44:31 -04:00
Kyle Havlovitz 752a285552 Add new service-exports config entry 2021-10-20 12:24:18 -07:00