open-consul

Author	SHA1	Message	Date
Chris S. Kim	d73a9522cb	Add support for streaming CA roots to peers (#13260 ) Sender watches for changes to CA roots and sends them through the replication stream. Receiver saves CA roots to tablePeeringTrustBundle	2022-05-26 15:24:09 -04:00
Riddhi Shah	e5f1d8dce4	Add support for merge-central-config query param (#13001 ) Adds a new query param merge-central-config for use with the below endpoints: /catalog/service/:service /catalog/connect/:service /health/service/:service /health/connect/:service If set on the request, the response will include a fully resolved service definition which is merged with the proxy-defaults/global and service-defaults/:service config entries (on-demand style). This is useful to view the full service definition for a mesh service (connect-proxy kind or gateway kind) which might not be merged before being written into the catalog (example: in case of services in the agentless model).	2022-05-25 13:20:17 -07:00
R.B. Boyer	bc10055edc	peering: replicate expected SNI, SPIFFE, and service protocol to peers (#13218 ) The importing peer will need to know what SNI and SPIFFE name corresponds to each exported service. Additionally it will need to know at a high level the protocol in use (L4/L7) to generate the appropriate connection pool and local metrics. For replicated connect synthetic entities we edit the `Connect{}` part of a `NodeService` to have a new section: { "PeerMeta": { "SNI": [ "web.default.default.owt.external.183150d5-1033-3672-c426-c29205a576b8.consul" ], "SpiffeID": [ "spiffe://183150d5-1033-3672-c426-c29205a576b8.consul/ns/default/dc/dc1/svc/web" ], "Protocol": "tcp" } } This data is then replicated and saved as-is at the importing side. Both SNI and SpiffeID are slices for now until I can be sure we don't need them for how mesh gateways will ultimately work.	2022-05-25 12:37:44 -05:00
alex	451dc50f4f	peering: expose IsLeader, hung up on dialer if follower (#13164 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com> Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>	2022-05-23 11:30:58 -07:00
cskh	39cb731988	Upgrade golangci-lint for go v1.18 (#13176 )	2022-05-23 10:26:45 -04:00
R.B. Boyer	3b12a5179f	test: fix flaky test TestEventBufferFuzz (#13175 )	2022-05-23 09:22:30 -05:00
Matt Keeler	c629e89289	Fix tests broken in #13173 (#13178 ) I changed the error type returned in a situation but didn’t update the tests to expect that error.	2022-05-23 10:00:06 -04:00
Matt Keeler	8a968299dd	Fix flaky tests in the agent/grpc/public/services/serverdiscovery package (#13173 ) Occasionally we had seen the TestWatchServers_ACLToken_PermissionDenied be flagged as flaky in CircleCI. This change should fix that. Why it fixes it is complicated. The test was failing with a panic when a mocked ACL Resolver was being called more times than expected. I struggled for a while to determine how that could be. This test should call authorize once and only once and the error returned should cause the stream to be terminated and the error returned to the gRPC client. Another oddity was that no amount of running this test locally seemed to be able to reproduce the issue. I ran the test hundreds of thousands of times and it always passed. It turns out that there is nothing wrong with the test. It just so happens that the panic from unexpected invocation of a mocked call happened during the test but was caused by a previous test (specifically the TestWatchServers_StreamLifecycle test). The stream from the previous test remained open after all the test Cleanup functions were run and it just so happened that when the EventPublisher eventually picked up that the context was cancelled during cleanup, it force closes all subscriptions which causes some loops to be re-entered and the streams to be reauthorized. It’s that looping in response to forced subscription closures that causes the mock to eventually panic. All the components (publisher, server, client) operate based on contexts. We cancel all those contexts but there is no synchronous way to know when they are stopped. We could have implemented a synchronous stop but in the context of an actual running Consul, context cancellation + async stopping is perfectly fine. What we (Dan and I) eventually thought was that the behavior of gRPC streams such as this when a server was shutting down wasn’t super helpful. 
What we would want is for a client to be able to distinguish between subscription closed because something may have changed requiring re-authentication and subscription closed because the server is shutting down. That way we can send back appropriate error messages to detail that the server is shutting down and not confuse users with potentially needing to resubscribe. So that’s what this PR does. We have introduced a shutting down state to our event subscriptions and the various streaming gRPC services that rely on the event publisher will all just behave correctly and actually stop the stream (not attempt transparent reauthorization) if this particular error is the one we get from the stream. Additionally the error that gets transmitted back through gRPC when this does occur indicates to the consumer that the server is going away. That is more helpful so that a client can then attempt to reconnect to another server.	2022-05-23 08:59:13 -04:00
R.B. Boyer	69d3e729a4	agent: allow for service discovery queries involving peer name to use streaming (#13168 )	2022-05-20 15:27:01 -05:00
R.B. Boyer	68789effeb	test: TestServer_RPC_MetricsIntercept should use a concurrency-safe metrics store (#13157 )	2022-05-19 15:39:28 -05:00
R.B. Boyer	91691eca87	peering: replicate discovery chains information to importing peers Treat each exported service as a "discovery chain" and replicate one synthetic CheckServiceNode for each chain and remote mesh gateway. The health will be a flattened generated check of the checks for that mesh gateway node.	2022-05-19 14:21:44 -05:00
Freddy	8894365c5a	[OSS] Add upsert handling for receiving CheckServiceNode (#13061 )	2022-05-12 15:04:44 -06:00
R.B. Boyer	c855df87ec	remove remaining shim runStep functions (#13015 ) Wraps up the refactor from #13013	2022-05-10 16:24:45 -05:00
R.B. Boyer	9ad10318cd	add general runstep test helper instead of copying it all over the place (#13013 )	2022-05-10 15:25:51 -05:00
Evan Culver	d64726c8e9	peering: add store.PeeringsForService implementation (#12957 )	2022-05-06 12:35:31 -07:00
Dan Upton	6bfdb48560	acl: gRPC login and logout endpoints (#12935 ) Introduces two new public gRPC endpoints (`Login` and `Logout`) and includes refactoring of the equivalent net/rpc endpoints to enable the majority of logic to be reused (i.e. by extracting the `Binder` and `TokenWriter` types). This contains the OSS portions of the following enterprise commits: - 75fcdbfcfa6af21d7128cb2544829ead0b1df603 - bce14b714151af74a7f0110843d640204082630a - cc508b70fbf58eda144d9af3d71bd0f483985893	2022-05-04 17:38:45 +01:00
Kyle Havlovitz	3bd001fb29	Return ACLRemoteError from cache and test it correctly	2022-05-03 10:05:26 -07:00
Kyle Havlovitz	f84ed5f70b	Store and return rpc error in acl cache entries	2022-04-28 09:08:55 -07:00
R.B. Boyer	642b75b60b	health: ensure /v1/health/service/:service endpoint returns the most recent results when a filter is used with streaming (#12640 ) The primary bug here is in the streaming subsystem that makes the overall v1/health/service/:service request behave incorrectly when servicing a blocking request with a filter provided. There is also a secondary, much less obvious non-streaming bug fixed here, related to when to update the `reply` variable in a `blockingQuery` evaluation. It is unlikely that it is triggerable in practical environments and I could not actually get the bug to manifest, but I fixed it anyway while investigating the original issue. Simple reproduction (streaming): 1. Register a service with a tag. curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \ --header 'Content-Type: application/json' \ --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "a" ], "EnableTagOverride": true }' 2. Do an initial filter query that matches on the tag. curl -sLi --get 'http://localhost:8500/v1/health/service/test' --data-urlencode 'filter=a in Service.Tags' 3. Note you get one result. Use the `X-Consul-Index` header value as $INDEX to establish a blocking query in another terminal; this should not return yet. curl -sLi --get "http://localhost:8500/v1/health/service/test?index=$INDEX" --data-urlencode 'filter=a in Service.Tags' 4. Re-register that service with a different tag. curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \ --header 'Content-Type: application/json' \ --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "b" ], "EnableTagOverride": true }' 5. If it works correctly, your blocking query from (3) should return with a header `X-Consul-Query-Backend: streaming` and empty results (`[]`). Attempts to reproduce with non-streaming failed (where you add `&near=_agent` to the read queries and ensure `X-Consul-Query-Backend: blocking-query` shows up in the results).	2022-04-27 10:39:45 -05:00
Dhia Ayachi	9dc5200155	update raft to v1.3.8 (#12844 ) * update raft to v1.3.7 * add changelog * fix compilation error * fix HeartbeatTimeout * fix ElectionTimeout to reload only if value is valid * fix default values for `ElectionTimeout` and `HeartbeatTimeout` * fix test defaults * bump raft to v1.3.8	2022-04-25 10:19:26 -04:00
R.B. Boyer	809344a6f5	peering: initial sync (#12842 ) - Add endpoints related to peering: read, list, generate token, initiate peering - Update node/service/check table indexing to account for peers - Foundational changes for pushing service updates to a peer - Plumb peer name through Health.ServiceNodes path see: ENT-1765, ENT-1280, ENT-1283, ENT-1283, ENT-1756, ENT-1739, ENT-1750, ENT-1679, ENT-1709, ENT-1704, ENT-1690, ENT-1689, ENT-1702, ENT-1701, ENT-1683, ENT-1663, ENT-1650, ENT-1678, ENT-1628, ENT-1658, ENT-1640, ENT-1637, ENT-1597, ENT-1634, ENT-1613, ENT-1616, ENT-1617, ENT-1591, ENT-1588, ENT-1596, ENT-1572, ENT-1555 Co-authored-by: R.B. Boyer <rb@hashicorp.com> Co-authored-by: freddygv <freddy@hashicorp.com> Co-authored-by: Chris S. Kim <ckim@hashicorp.com> Co-authored-by: Evan Culver <eculver@hashicorp.com> Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>	2022-04-21 17:34:40 -05:00
Will Jordan	45ffdc360e	Add timeout to Client RPC calls (#11500 ) Adds a timeout (deadline) to client RPC calls, so that streams will no longer hang indefinitely in unstable network conditions. Co-authored-by: kisunji <ckim@hashicorp.com>	2022-04-21 16:21:35 -04:00
Matt Keeler	f49adfaaf0	Implement the ServerDiscovery.WatchServers gRPC endpoint (#12819 ) * Implement the ServerDiscovery.WatchServers gRPC endpoint * Fix the ConnectCA.Sign gRPC endpoints metadata forwarding. * Unify public gRPC endpoints around the public.TraceID function for request_id logging	2022-04-21 12:56:18 -04:00
Blake Covarrubias	2beea7eb7c	acl: Clarify node/service identities must be lowercase (#12807 ) Modify ACL error message for invalid node/service identities names to clearly state only lowercase alphanumeric characters are supported.	2022-04-21 09:29:16 -07:00
R.B. Boyer	bbd38e95ce	chore: upgrade mockery to v2 and regenerate (#12836 )	2022-04-21 09:48:21 -05:00
Riddhi Shah	1d49f5c84e	[OSS] gRPC call to get envoy bootstrap params (#12825 ) Adds a new gRPC endpoint to get envoy bootstrap params. The new consul-dataplane service will use this endpoint to generate an envoy bootstrap configuration.	2022-04-19 17:24:21 -07:00
Matt Keeler	3badd4c35c	Add event generation for autopilot state updates (#12626 ) Whenever autopilot updates its state it notifies Consul. That notification will then trigger Consul to extract out the ready server information. If the ready servers have changed, then an event will be published to notify any subscribers of the full set of ready servers. All of the ready-server event code is contained within an autopilotevents package instead of the consul package, so that the gRPC-related packages can import it.	2022-04-19 13:03:03 -04:00
DanStough	a050aa39b9	Update go version to 1.18.1	2022-04-18 11:41:10 -04:00
Dan Upton	769d1d6e8e	ConnectCA.Sign gRPC Endpoint (#12787 ) Introduces a gRPC endpoint for signing Connect leaf certificates. It's also the first of the public gRPC endpoints to perform leader-forwarding, so establishes the pattern of forwarding over the multiplexed internal RPC port.	2022-04-14 14:26:14 +01:00
Kyle Havlovitz	199f1c7200	Fix namespace default field names in expanded token output	2022-04-13 16:46:39 -07:00
Paul Glass	5eea62b47a	acl: Adjust region handling in AWS IAM auth method (#12774 ) * acl: Adjust region handling in AWS IAM auth method	2022-04-13 14:31:37 -05:00
Karl Cardenas	b0b197964c	Merge pull request #12562 from hashicorp/docs/blake-agent-config docs: Agent configuration hierarchy reorganization	2022-04-12 12:33:42 -07:00
FFMMM	cf7e6484aa	add more labels to RequestRecorder (#12727 ) Co-authored-by: Daniel Nephin <dnephin@hashicorp.com> Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>	2022-04-12 10:50:25 -07:00
Matt Keeler	2a4ca71d3f	Move to using a shared EventPublisher (#12673 ) Previously we had 1 EventPublisher per state.Store. When a state store was closed/abandoned such as during a consul snapshot restore, this had the behavior of force closing subscriptions for that topic and evicting event snapshots from the cache. The intention of this commit is to keep all that behavior. To that end, the shared EventPublisher now supports the ability to refresh a topic. That will perform the force close + eviction. The FSM upon abandoning the previous state.Store will call RefreshTopic for all the topics with events generated by the state.Store.	2022-04-12 09:47:42 -04:00
Blake Covarrubias	3175bf6b1b	Remove .html extensions from docs URLs	2022-04-11 17:38:49 -07:00
Natalie Smith	cd17e98800	docs: fix yet more references to agent/options	2022-04-11 17:38:49 -07:00
R.B. Boyer	f4eac06b21	xds: ensure that all connect timeout configs can apply equally to tproxy direct dial connections (#12711 ) Just like standard upstreams the order of applicability in descending precedence: 1. caller's `service-defaults` upstream override for destination 2. caller's `service-defaults` upstream defaults 3. destination's `service-resolver` ConnectTimeout 4. system default of 5s Co-authored-by: mrspanishviking <kcardenas@hashicorp.com>	2022-04-07 16:58:21 -05:00
Matt Keeler	3447880091	Enable running autopilot state updates on all servers (#12617 ) * Fixes a lint warning about t.Errorf not supporting %w * Enable running autopilot on all servers Non-leader servers only update the state and do not attempt any modifications. * Fix the RPC conn limiting tests Technically they were relying on racy behavior before. Now they should be reliable.	2022-04-07 10:48:48 -04:00
FFMMM	0f68bf879a	[rpc/middleware][consul] plumb intercept off, add server level happy test (#12692 )	2022-04-06 14:33:05 -07:00
Mark Anderson	ed3e42296d	Fixup acl.EnterpriseMeta Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-04-05 15:11:49 -07:00
Riddhi Shah	0e5d46e9c4	Merge pull request #12695 from hashicorp/feature-negotiation-grpc-api-oss [OSS] Supported dataplane features gRPC endpoint	2022-04-05 11:26:33 -07:00
Dan Upton	e3d2b91e34	ca: move ConnectCA.Sign authorization logic to CAManager (#12609 ) OSS sync of enterprise changes at 8d6fd125	2022-04-05 13:16:20 -05:00
Kyle Havlovitz	9380343689	Merge pull request #12672 from hashicorp/tgate-san-validation Respect SNI with terminating gateways and log a warning if it isn't set alongside TLS	2022-04-05 11:15:59 -07:00
Riddhi Shah	f053279c4e	[OSS] Supported dataplane features gRPC endpoint Adds a new gRPC service and endpoint to return the list of supported consul dataplane features. The Consul Dataplane will use this API to customize its interaction with that particular server.	2022-04-05 07:38:58 -07:00
Dan Upton	e48c1611ee	WatchRoots gRPC endpoint (#12678 ) Adds a new gRPC streaming endpoint (WatchRoots) that dataplane clients will use to fetch the current list of active Connect CA roots and receive new lists whenever the roots are rotated.	2022-04-05 15:26:14 +01:00
Kyle Havlovitz	4974d8471b	Log a warning when a terminating gateway service has TLS but not SNI configured	2022-03-31 12:18:40 -07:00
Kyle Havlovitz	9a2474381a	Add expanded token read flag and endpoint option	2022-03-31 10:49:49 -07:00
Paul Glass	aae6d8080d	Add IAM Auth Method (#12583 ) This adds an aws-iam auth method type which supports authenticating to Consul using AWS IAM identities. Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>	2022-03-31 10:18:48 -05:00
Eric	91a493efe9	Bump go-control-plane * `go get cloud.google.com/go@v0.59.0` * `go get github.com/envoyproxy/go-control-plane@v0.9.9` * `make envoy-library` * Bump protoc to 3.15.8	2022-03-30 13:11:27 -04:00
R.B. Boyer	d4e80b8800	server: ensure that service-defaults meta is incorporated into the discovery chain response (#12511 ) Also add a new "Default" field to the discovery chain response to clients	2022-03-30 10:04:18 -05:00
FFMMM	0fd6cdc900	introduce EmptyReadRequest for status_endpoint (#12653 ) Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>	2022-03-29 18:05:45 -07:00
Eric	8fd73ede3e	remove gogo from acl protobufs	2022-03-28 16:20:56 -04:00
Kyle Havlovitz	d9f31345e0	Merge pull request #12596 from hashicorp/overview-endpoint oss: Add overview UI internal endpoint	2022-03-24 14:27:54 -07:00
Mike Morris	8020fb2098	agent: convert listener config to TLS types (#12522 ) * tlsutil: initial implementation of types/TLSVersion tlsutil: add test for parsing deprecated agent TLS version strings tlsutil: return TLSVersionInvalid with error tlsutil: start moving tlsutil cipher suite lookups over to types/tls tlsutil: rename tlsLookup to ParseTLSVersion, add cipherSuiteLookup agent: attempt to use types in runtime config agent: implement b.tlsVersion validation in config builder agent: fix tlsVersion nil check in builder tlsutil: update to renamed ParseTLSVersion and goTLSVersions tlsutil: fixup TestConfigurator_CommonTLSConfigTLSMinVersion tlsutil: disable invalid config parsing tests tlsutil: update tests auto_config: lookup old config strings from base.TLSMinVersion auto_config: update endpoint tests to use TLS types agent: update runtime_test to use TLS types agent: update TestRuntimeCinfig_Sanitize.golden agent: update config runtime tests to expect TLS types * website: update Consul agent tls_min_version values * agent: fixup TLS parsing and compilation errors * test: fixup lint issues in agent/config_runtime_test and tlsutil/config_test * tlsutil: add CHACHA20_POLY1305 cipher suites to goTLSCipherSuites * test: revert autoconfig tls min version fixtures to old format * types: add TLSVersions public function * agent: add warning for deprecated TLS version strings * agent: move agent config specific logic from tlsutil.ParseTLSVersion into agent config builder * tlsutil(BREAKING): change default TLS min version to TLS 1.2 * agent: move ParseCiphers logic from tlsutil into agent config builder * tlsutil: remove unused CipherString function * agent: fixup import for types package * Revert "tlsutil: remove unused CipherString function" This reverts commit 6ca7f6f58d268e617501b7db9500113c13bae70c. 
* agent: fixup config builder and runtime tests * tlsutil: fixup one remaining ListenerConfig -> ProtocolConfig * test: move TLS cipher suites parsing test from tlsutil into agent config builder tests * agent: remove parseCiphers helper from auto_config_endpoint_test * test: remove unused imports from tlsutil * agent: remove resolved FIXME comment * tlsutil: remove TODO and FIXME in cipher suite validation * agent: prevent setting inherited cipher suite config when TLS 1.3 is specified * changelog: add entry for converting agent config to TLS types * agent: remove FIXME in runtime test, this is covered in builder tests with invalid tls9 value now * tlsutil: remove config tests for values checked at agent config builder boundary * tlsutil: remove tls version check from loadProtocolConfig * tlsutil: remove tests and TODOs for logic checked in TestBuilder_tlsVersion and TestBuilder_tlsCipherSuites * website: update search link for supported Consul agent cipher suites * website: apply review suggestions for tls_min_version description * website: attempt to clean up markdown list formatting for tls_min_version * website: moar linebreaks to fix tls_min_version formatting * Revert "website: moar linebreaks to fix tls_min_version formatting" This reverts commit 38585927422f73ebf838a7663e566ac245f2a75c. * autoconfig: translate old values for TLSMinVersion * agent: rename var for translated value of deprecated TLS version value * Update agent/config/deprecated.go Co-authored-by: Dan Upton <daniel@floppy.co> * agent: fix lint issue * agent: fixup deprecated config test assertions for updated warning Co-authored-by: Dan Upton <daniel@floppy.co>	2022-03-24 15:32:25 -04:00
Kyle Havlovitz	0d5cbf6f30	Sort by partition/ns/servicename instead of the reverse	2022-03-24 12:16:05 -07:00
Kyle Havlovitz	1b654c9807	Clean up ent meta id usage in overview summary	2022-03-23 12:47:12 -07:00
Mark Anderson	28c925f6d0	Fixup dropped SecretID usage Looks like something got munged at some point. Not sure how it slipped in, but my best guess is that because TestTxn_Apply_ACLDeny is marked flaky we didn't block merge because it failed. Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-03-22 21:20:03 -07:00
Kyle Havlovitz	04f1d9bcc9	oss: Add overview UI internal endpoint	2022-03-22 17:05:09 -07:00
Dan Upton	2fe06f663b	streaming: emit events when Connect CA Roots change (#12590 ) OSS sync of enterprise changes at 614f786d	2022-03-22 19:13:59 +00:00
Dan Upton	fb441e323a	Restructure gRPC server setup (#12586 ) OSS sync of enterprise changes at 0b44395e	2022-03-22 12:40:24 +00:00
Mark Anderson	2b367626f0	Add source of authority annotations to the PermissionDeniedError output. (#12567 ) This extends the acl.AllowAuthorizer with source of authority information. The next step is to unify the AllowAuthorizer and ACLResolveResult structures; that will be done in a separate PR. Part of #12481 Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-03-18 10:32:25 -07:00
Dan Upton	57f0f42733	Support per-listener TLS configuration ⚙️ (#12504 ) Introduces the capability to configure TLS differently for Consul's listeners/ports (i.e. HTTPS, gRPC, and the internal multiplexed RPC port) which is useful in scenarios where you may want the HTTPS or gRPC interfaces to present a certificate signed by a well-known/public CA, rather than the certificate used for internal communication which must have a SAN in the form `server.<dc>.consul`.	2022-03-18 10:46:58 +00:00
FFMMM	3c08843847	[sync oss] add net/rpc interceptor implementation (#12573 ) * sync ent changes from 866dcb0667 Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> * update oss go.mod Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>	2022-03-17 16:02:26 -07:00
Eric	ae1cdc85b1	Remove the stdduration gogo extension	2022-03-16 12:12:29 -04:00
mrspanishviking	1ae820ea0a	Revert "[Docs] Agent configuration hierarchy "	2022-03-15 16:13:58 -07:00
trujillo-adam	667976c94f	fixing merge conflicts part 3	2022-03-15 15:25:03 -07:00
trujillo-adam	60a88bb40f	merging new hierarchy for agent configuration	2022-03-14 15:44:41 -07:00
Mark Anderson	ab099e5fcb	Refactor config checks oss (#12550 ) Currently the config_entry.go subsystem delegates authorization decisions via the ConfigEntry interface CanRead and CanWrite code. Unfortunately this returns a true/false value and loses the details of the source. This is not helpful, especially since the config subsystem can be more complex to understand because it covers so many domains. This refactors CanRead/CanWrite to return a structured error message (PermissionDenied or the like) with more details about the reason for denial. Part of #12241 Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-03-11 13:45:51 -08:00
Mark Anderson	5591cb1e11	Bulk acl message fixup oss (#12470 ) * First pass for helper for bulk changes Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Convert ACLRead and ACLWrite to new form Signed-off-by: Mark Anderson <manderson@hashicorp.com> * AgentRead and AgentWrite Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Fix EventWrite Signed-off-by: Mark Anderson <manderson@hashicorp.com> * KeyRead, KeyWrite, KeyList Signed-off-by: Mark Anderson <manderson@hashicorp.com> * KeyRing Signed-off-by: Mark Anderson <manderson@hashicorp.com> * NodeRead NodeWrite Signed-off-by: Mark Anderson <manderson@hashicorp.com> * OperatorRead and OperatorWrite Signed-off-by: Mark Anderson <manderson@hashicorp.com> * PreparedQuery Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Intention partial Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Fix ServiceRead, Write, etc Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Error check ServiceRead? Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Fix SessionRead/Write Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Fixup snapshot ACL Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Error fixups for txn Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Add changelog Signed-off-by: Mark Anderson <manderson@hashicorp.com> * Fixup review comments Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-03-10 18:48:27 -08:00
Eric Haberkorn	45312886fe	Code review changes	2022-03-07 14:39:33 -05:00
Eric	3d46f9ef7c	Add `Meta` to `ServiceConfigResponse`	2022-03-07 10:05:18 -05:00
R.B. Boyer	b63a0f3909	reduce flakiness/raciness of errNotFound and errNotChanged blocking query tests (#12518 ) Improves tests from #12362 These tests try to set up the following concurrent scenario: 1. (goroutine 1) execute read RPC with index=0 2. (goroutine 1) get response from (1) @ index=10 3. (goroutine 1) execute read RPC with index=10 and block 4. (goroutine 2) WHILE (3) is blocking, start slamming the system with stray writes that will cause the WatchSet to wake up 5. (goroutine 2) after doing all writes, shut down the reader above 6. (goroutine 1) stops reading, double checks that it only ever woke up once (from 1)	2022-03-04 11:20:01 -06:00
R.B. Boyer	07b92a2855	server: fix spurious blocking query suppression for discovery chains (#12512 ) Minor fix for behavior in #12362 IsDefault sometimes returns true even if there was a proxy-defaults or service-defaults config entry that was consulted. This PR fixes that.	2022-03-03 16:54:41 -06:00
Daniel Nephin	8f4b6af68a	Merge pull request #12298 from jorgemarey/b-persistnewrootandconfig Avoid raft change when no config is provided on persistNewRootAndConfig	2022-03-03 11:03:50 -05:00
Daniel Nephin	2082bdc286	ca: make sure the test fails without the fix Also change the path used for the secondary so that both primary and secondary do not overwrite each other.	2022-03-02 18:22:49 -05:00
R.B. Boyer	679cea7171	raft: upgrade to v1.3.6 (#12496 ) Add additional protections on the Consul side to prevent NonVoters from bootstrapping raft. This should un-flake TestServer_Expect_NonVoters	2022-03-02 17:00:02 -06:00
Daniel Nephin	849d86e7f5	Merge pull request #12467 from hashicorp/dnephin/ci-vault-test-safer ca: require that tests that use Vault are named correctly	2022-03-01 12:54:02 -05:00
R.B. Boyer	033e0ed13f	test: parallelize more of TestLeader_ReapOrLeftMember_IgnoreSelf (#12468 ) before: $ go test ./agent/consul -run TestLeader_ReapOrLeftMember_IgnoreSelf ok github.com/hashicorp/consul/agent/consul 21.147s after: $ go test ./agent/consul -run TestLeader_ReapOrLeftMember_IgnoreSelf ok github.com/hashicorp/consul/agent/consul 5.402s	2022-03-01 10:30:06 -06:00
Jorge Marey	aba9e724a8	Fix vault test with suggested changes	2022-03-01 10:20:00 +01:00
Jorge Marey	8b1b264b6f	Add test case to verify #12298	2022-03-01 09:25:52 +01:00
Jorge Marey	2ca00df0d8	Avoid raft change when no config is provided on CAmanager - This avoids a change to the raft store when no roots or config are provided to persistNewRootAndConfig	2022-03-01 09:25:52 +01:00
Daniel Nephin	dd565aa5e4	ca: fix a test This test does not use Vault, so does not need ca.SkipIfVaultNotPresent	2022-02-28 16:26:18 -05:00
R.B. Boyer	3804677570	server: suppress spurious blocking query returns where multiple config entries are involved (#12362 ) Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect: - `DiscoveryChain.Get` - `ConfigEntry.ResolveServiceConfig` - `Intentions.Match` All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism. ### No Config Entries Found In this case, despite looking for many config entries, none may be found at all. Unlike #12110 in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110 with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110. ### No Change Since Last Wakeup Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here to detect if any of the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically externally handle it similar to #12110. This would mean that even if 20 discovery chain read RPC handling goroutines wake up due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report. 
### Extra Endpoints Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` also were updated: - `ConfigEntry.List` - `Internal.IntentionUpstreams` (tproxy)	2022-02-25 15:46:34 -06:00
R.B. Boyer	4b0f657b31	fix flaky test panic (#12446 )	2022-02-24 17:35:46 -06:00
R.B. Boyer	a97d20cf63	catalog: compare node names case insensitively in more places (#12444 ) Many places in consul already treated node names case insensitively. The state store indexes already do it, but there are a few places that did a direct byte comparison which have now been corrected. One place of particular consideration is ensureCheckIfNodeMatches which is executed during snapshot restore (among other places). If a node check used a slightly different casing than the casing of the node during register then the snapshot restore here would deterministically fail. This has been fixed. Primary approach: git grep -i "node.[!=]=.node" -- ':!_test.go' ':!docs' git grep -i '\[[^]]member[^]]\] git grep -i '\[[^]]$member\\|name\\|node$[^]]\]' -- ':!_test.go' ':!website' ':!ui' ':!agent/proxycfg/testing.go:' ':!*.md'	2022-02-24 16:54:47 -06:00
R.B. Boyer	d860384731	server: partly fix config entry replication issue that prevents replication in some circumstances (#12307 ) There are some cross-config-entry relationships that are enforced during "graph validation" at persistence time that are required to be maintained. This means that config entries may form a digraph at times. Config entry replication proceeds in a particular sorted order by kind and name. Occasionally there are some fixups to these digraphs that end up replicating in the wrong order and replicating the leaves (ingress-gateway) before the roots (service-defaults) leading to replication halting due to a graph validation error related to things like mismatched service protocol requirements. This PR changes replication to give each computed change (upsert/delete) a fair shot at being applied before deciding to terminate that round of replication in error. In the case where we've simply tried to do the operations in the wrong order at least ONE of the outstanding requests will complete in the right order, leading the subsequent round to have fewer operations to do, with a smaller likelihood of graph validation errors. This does not address all scenarios, but for scenarios where the edits are being applied in the wrong order this should avoid replication halting. Fixes #9319 The scenario that is NOT ADDRESSED by this PR is as follows: 1. create: service-defaults: name=new-web, protocol=http 2. create: service-defaults: name=old-web, protocol=http 3. create: service-resolver: name=old-web, redirect-to=new-web 4. delete: service-resolver: name=old-web 5. update: service-defaults: name=old-web, protocol=grpc 6. update: service-defaults: name=new-web, protocol=grpc 7. create: service-resolver: name=old-web, redirect-to=new-web If you shut down dc2 just before (4) and turn it back on after (7) replication is impossible as there is no single edit you can make to make forward progress.	2022-02-23 17:27:48 -06:00
Daniel Nephin	3639f4b551	Merge pull request #11910 from hashicorp/dnephin/ca-provider-interface-for-ica-in-primary ca: add support for an external trusted CA	2022-02-22 13:14:52 -05:00
R.B. Boyer	11fdc70b34	configentry: make a new package to hold shared config entry structs that aren't used for RPC or the FSM (#12384 ) First two candidates are ConfigEntryKindName and DiscoveryChainConfigEntries.	2022-02-22 10:36:36 -06:00
Daniel Nephin	cb1a80184f	rpc: set response to nil when not found Otherwise when the query times out we might incorrectly send a value for the reply, when we should send an empty reply. Also document errNotFound and how to handle the result in that case.	2022-02-18 12:26:06 -05:00
Daniel Nephin	79820738cc	ca: test that original certs from secondary still verify There's a chance this could flake if the secondary hasn't received the update yet, but running this test many times doesn't show any flakes yet.	2022-02-17 18:45:16 -05:00
Daniel Nephin	ca4e60e09b	Update TODOs to reference an issue with more details And remove a no longer needed TODO	2022-02-17 18:21:30 -05:00
Daniel Nephin	0abaf29c10	ca: add test cases for rotating external trusted CA	2022-02-17 18:21:30 -05:00
Daniel Nephin	aacc40012f	ca: add a test for secondary with external CA	2022-02-17 18:21:30 -05:00
Daniel Nephin	471b2098bb	ca: examine the full chain in newCARoot make TestNewCARoot much more strict compare the full result instead of only a few fields. add a test case with 2 and 3 certificates in the pem	2022-02-17 18:21:30 -05:00
Daniel Nephin	2d5254a73b	Merge pull request #12110 from hashicorp/dnephin/blocking-queries-not-found rpc: make blocking queries for non-existent items more efficient	2022-02-17 18:09:39 -05:00
Florian Apolloner	895da50986	Support for connect native services in topology view. (#12098 )	2022-02-16 16:51:54 -05:00
Chris S. Kim	18096fd2fb	Move IndexEntryName helpers to common files (#12365 )	2022-02-16 12:56:38 -05:00
Daniel Nephin	06657e5be0	rpc: add errNotFound to all Get queries Any query that returns a list of items is not part of this commit.	2022-02-15 18:24:34 -05:00
Daniel Nephin	bdafa24c50	Make blockingQuery efficient with 'not found' results. By using the query results as state. Blocking queries are efficient when the query matches some results, because the ModifyIndex of those results, returned as queryMeta.Mindex, will never change unless the items themselves change. Blocking queries for non-existent items are not efficient because the queryMeta.Index can (and often does) change when other entities are written. This commit reduces the churn of these queries by using a different comparison for "has changed". Instead of using the modified index, we use the existence of the results. If the previous result was "not found" and the new result is still "not found", we know we can ignore the modified index and continue to block. This is done by setting the minQueryIndex to the returned queryMeta.Index, which prevents the query from returning before a state change is observed.	2022-02-15 18:24:33 -05:00
Daniel Nephin	6e73df7dc2	Add a test for blocking query on non-existent entry This test shows how blocking queries are not efficient when the query returns no results. The test fails with 100+ calls instead of the expected 2. This test is still a bit flaky because it depends on the timing of the writes. It can sometimes return 3 calls. A future commit should fix this and make blocking queries even more optimal for not-found results.	2022-02-15 18:23:17 -05:00
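
The "not found" handling described in commits bdafa24c50 and 6e73df7dc2 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the code from those commits: `readResult`, `runBlockingQuery`, and the error text are hypothetical names introduced here, and each element of `reads` stands in for one wake-up of a blocked query. Only `errNotFound`, `minQueryIndex`, and the idea of comparing "still not found" instead of the modified index come from the commit messages.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is named in the commit messages; this definition is an assumption.
var errNotFound = errors.New("no data found for query")

// readResult is a hypothetical stand-in for one evaluation of the query.
type readResult struct {
	index uint64 // the index reported with the result (queryMeta.Index)
	err   error  // errNotFound when the item does not exist, nil otherwise
}

// runBlockingQuery replays successive query evaluations and reports how many
// were consumed and whether the query would have returned to the caller.
func runBlockingQuery(minQueryIndex uint64, reads []readResult) (calls int, returned bool) {
	notFound := false
	for i, r := range reads {
		if errors.Is(r.err, errNotFound) {
			if notFound {
				// Previous and current results are both "not found":
				// ignore the index change and keep blocking.
				minQueryIndex = r.index
			}
			notFound = true
		} else {
			notFound = false
		}
		if r.index > minQueryIndex {
			return i + 1, true
		}
	}
	return len(reads), false
}

func main() {
	reads := []readResult{
		{index: 5, err: errNotFound},
		{index: 6, err: errNotFound}, // unrelated write bumped the index
		{index: 7, err: errNotFound}, // another unrelated write
		{index: 8, err: nil},         // the item finally exists
	}
	calls, returned := runBlockingQuery(5, reads)
	fmt.Println(calls, returned)
}
```

In this sketch the caller already holds index 5 from an earlier "not found" response, so the two unrelated writes do not wake it; the query returns only when the item appears.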

1831 commits