open-consul

Commit Graph

Author	SHA1	Message	Date
Chris S. Kim	e4c20ec190	Refactor client RPC timeouts (#14965 ) Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created. Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.	2022-10-18 15:05:09 -04:00
Derek Menteer	b641dcf03d	Expose `grpc_tls` via serf for cluster peering.	2022-08-29 13:43:49 -05:00
cskh	e7b5baa3cc	feat(telemetry): add labels to serf and memberlist metrics (#14161 ) * feat(telemetry): add labels to serf and memberlist metrics * changelog * doc update Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>	2022-08-11 22:09:56 -04:00
Luke Kysow	e9960dfdf3	peering: default to false (#13963 ) * defaulting to false because peering will be released as beta * Ignore peering disabled error in bundles cachetype Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: freddygv <freddy@hashicorp.com> Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>	2022-08-01 15:22:36 -04:00
alex	0f6354685b	block PeerName register requests (#13887 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-29 14:36:22 -07:00
Luke Kysow	d21f793b74	peering: add config to enable/disable peering (#13867 ) * peering: add config to enable/disable peering Add config: ``` peering { enabled = true } ``` Defaults to true. When disabled: 1. All peering RPC endpoints will return an error 2. Leader won't start its peering establishment goroutines 3. Leader won't start its peering deletion goroutines	2022-07-22 15:20:21 -07:00
R.B. Boyer	40c5c7eee2	server: broadcast the public grpc port using lan serf and update the consul service in the catalog with the same data (#13687 ) Currently servers exchange information about their WAN serf port and RPC port with serf tags, so that they all learn of each other's addressing information. We intend to make larger use of the new public-facing gRPC port exposed on all of the servers, so this PR addresses that by passing around the gRPC port via serf tags and then ensuring the generated consul service in the catalog has metadata about that new port as well for ease of non-serf-based lookup.	2022-07-07 13:55:41 -05:00
Dhia Ayachi	9dc5200155	update raft to v1.3.8 (#12844 ) * update raft to v1.3.7 * add changelog * fix compilation error * fix HeartbeatTimeout * fix ElectionTimeout to reload only if value is valid * fix default values for `ElectionTimeout` and `HeartbeatTimeout` * fix test defaults * bump raft to v1.3.8	2022-04-25 10:19:26 -04:00
Dan Upton	8bc11b08dc	Rename `ACLMasterToken` => `ACLInitialManagementToken` (#11746 )	2021-12-07 12:39:28 +00:00
Daniel Nephin	8e2c71528f	config: add NoFreelistSync option # Conflicts: # agent/config/testdata/TestRuntimeConfig_Sanitize-enterprise.golden # agent/consul/server.go	2021-12-02 16:56:15 -05:00
FFMMM	27227c0fd2	add root_cert_ttl option for consul connect, vault ca providers (#11428 ) * add root_cert_ttl option for consul connect, vault ca providers Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Chris S. Kim <ckim@hashicorp.com> * add changelog, pr feedback Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> * Update .changelog/11428.txt, more docs Co-authored-by: Daniel Nephin <dnephin@hashicorp.com> * Update website/content/docs/agent/options.mdx Co-authored-by: Kyle Havlovitz <kylehav@gmail.com> Co-authored-by: Chris S. Kim <ckim@hashicorp.com> Co-authored-by: Daniel Nephin <dnephin@hashicorp.com> Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>	2021-11-02 11:02:10 -07:00
R.B. Boyer	017e9d5ae4	agent: add a clone function for duplicating the serf lan configuration (#11443 )	2021-10-28 16:11:26 -05:00
Daniel Nephin	09ae0ab94a	acl: make ACLDisabledTTL a constant This field was never user-configurable. We always overwrote the value with 120s from NonUserSource. However, we also never copied the value from RuntimeConfig to consul.Config, So the value in NonUserSource was always ignored, and we used the default value of 30s set by consul.DefaultConfig. All of this code is an unnecessary distraction because a user can not actually configure this value. This commit removes the fields and uses a constant value instad. Someone attempting to set acl.disabled_ttl in their config will now get an error about an unknown field, but previously the value was completely ignored, so the new behaviour seems more correct. We have to keep this field in the AutoConfig response for backwards compatibility, but the value will be ignored by the client, so it doesn't really matter what value we set.	2021-08-17 13:34:18 -04:00
Daniel Nephin	75baa22e64	acl: remove ACLResolver config fields from consul.Config	2021-08-17 13:32:52 -04:00
Daniel Nephin	047abdd73c	acl: remove ACLDatacenter This field has been unnecessary for a while now. It was always set to the same value as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter directly. This change was made by GoLand refactor, which did most of the work for me.	2021-08-06 18:27:00 -04:00
Daniel Nephin	1e23d181b5	config: remove misleading UseTLS field This field was documented as enabling TLS for outgoing RPC, but that was not the case. All this field did was set the use_tls serf tag. Instead of setting this field in a place far from where it is used, move the logic to where the serf tag is set, so that the code is much more obvious.	2021-07-09 19:01:45 -04:00
Daniel Nephin	3c60a46376	config: remove duplicate TLSConfig fields from agent/consul.Config tlsutil.Config already presents an excellent structure for this configuration. Copying the runtime config fields to agent/consul.Config makes code harder to trace, and provides no advantage. Instead of copying the fields around, use the tlsutil.Config struct directly instead. This is one small step in removing the many layers of duplicate configuration.	2021-07-09 18:49:42 -04:00
Daniel Nephin	b4a10443d1	ca: remove unused RotationPeriod field This field was never used. Since it is persisted as part of a map[string]interface{} it is pretty easy to remove it.	2021-07-05 19:15:44 -04:00
Matt Keeler	58b934133d	hcs-1936: Prepare for adding license auto-retrieval to auto-config in enterprise	2021-05-24 13:20:30 -04:00
Paul Banks	d47eea3a3f	Make Raft trailing logs and snapshot timing reloadable (#10129 ) * WIP reloadable raft config * Pre-define new raft gauges * Update go-metrics to change gauge reset behaviour * Update raft to pull in new metric and reloadable config * Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important * Update telemetry docs * Update config and telemetry docs * Add note to oldestLogAge on when it is visible * Add changelog entry * Update website/content/docs/agent/options.mdx Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>	2021-05-04 15:36:53 +01:00
Daniel Nephin	e8427a48ab	agent/consuk: Rename RPCRate -> RPCRateLimit so that the field name is consistent across config structs.	2021-01-14 17:26:00 -05:00
Daniel Nephin	e5320c2db6	agent/consul: make Client/Server config reloading more obvious I believe this commit also fixes a bug. Previously RPCMaxConnsPerClient was not being re-read from the RuntimeConfig, so passing it to Server.ReloadConfig was never changing the value. Also improve the test runtime by not doing a lot of unnecessary work.	2021-01-14 17:21:10 -05:00
Matt Keeler	4bca029be9	Refactor to call non-voting servers read replicas (#9191 ) Co-authored-by: Kit Patella <kit@jepsen.io>	2020-11-17 10:53:57 -05:00
R.B. Boyer	a5bd1ba323	agent: return the default ACL policy to callers as a header (#9101 ) Header is: X-Consul-Default-ACL-Policy=<allow\|deny> This is of particular utility when fetching matching intentions, as the fallthrough for a request that doesn't match any intentions is to enforce using the default acl policy.	2020-11-12 10:38:32 -06:00
Matt Keeler	755fb72994	Switch to using the external autopilot module	2020-11-09 09:22:11 -05:00
Daniel Nephin	dd0e8d42c4	Merge pull request #8825 from hashicorp/streaming/add-config streaming: add config and docs	2020-10-09 14:33:58 -04:00
Matt Keeler	141eb60f06	Add per-agent reconnect timeouts (#8781 ) This allows for client agent to be run in a more stateless manner where they may be abruptly terminated and not expected to come back. If advertising a per-agent reconnect timeout using the advertise_reconnect_timeout configuration when that agent leaves, other agents will wait only that amount of time for the agent to come back before reaping it. This has the advantageous side effect of causing servers to deregister the node/services/checks for that agent sooner than if the global reconnect_timeout was used.	2020-10-08 15:02:19 -04:00
Daniel Nephin	05df7b18a9	config: add field for enabling streaming RPC endpoint	2020-10-08 12:11:20 -04:00
R.B. Boyer	d6dce2332a	connect: intentions are now managed as a new config entry kind "service-intentions" (#8834 ) - Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older copies of consul will not see new config entries it doesn't understand replicate down. - Add shim conversion code so that the old API/CLI method of interacting with intentions will continue to work so long as none of these are edited via config entry endpoints. Almost all of the read-only APIs will continue to function indefinitely. - Add new APIs that operate on individual intentions without IDs so that the UI doesn't need to implement CAS operations. - Add a new serf feature flag indicating support for intentions-as-config-entries. - The old line-item intentions way of interacting with the state store will transparently flip between the legacy memdb table and the config entry representations so that readers will never see a hiccup during migration where the results are incomplete. It uses a piece of system metadata to control the flip. - The primary datacenter will begin migrating intentions into config entries on startup once all servers in the datacenter are on a version of Consul with the intentions-as-config-entries feature flag. When it is complete the old state store representations will be cleared. We also record a piece of system metadata indicating this has occurred. We use this metadata to skip ALL of this code the next time the leader starts up. - The secondary datacenters continue to run the old intentions replicator until all servers in the secondary DC and primary DC support intentions-as-config-entries (via serf flag). Once this condition it met the old intentions replicator ceases. - The secondary datacenters replicate the new config entries as they are migrated in the primary. When they detect that the primary has zeroed it's old state store table it waits until all config entries up to that point are replicated and then zeroes its own copy of the old state store table. We also record a piece of system metadata indicating this has occurred. We use this metadata to skip ALL of this code the next time the leader starts up.	2020-10-06 13:24:05 -05:00
Daniel Nephin	863a9df951	server: add gRPC server for streaming events Includes a stats handler and stream interceptor for grpc metrics. Co-authored-by: Paul Banks <banks@banksco.de>	2020-09-08 12:10:41 -04:00
Chris Piraino	b245d60200	Set metrics reporting interval to 9 seconds This is below the 10 second interval that lib/telemetry.go implements as its aggregation interval, ensuring that we always report these metrics.	2020-09-02 10:24:23 -05:00
Chris Piraino	45a4057f60	Report node/service usage metrics from every server Using the newly provided state store methods, we periodically emit usage metrics from the servers. We decided to emit these metrics from all servers, not just the leader, because that means we do not have to care about leader election flapping causing metrics turbulence, and it seems reasonable for each server to emit its own view of the state, even if they should always converge rapidly.	2020-09-02 10:24:17 -05:00
Daniel Nephin	f3b63514d5	testing: Remove NotifyShutdown NotifyShutdown was only used for testing. Now that t.Cleanup exists, we can use that instead of attaching cleanup to the Server shutdown. The Autopilot test which used NotifyShutdown doesn't need this notification because Shutdown is synchronous. Waiting for the function to return is equivalent.	2020-08-07 17:14:44 -04:00
Daniel Nephin	e6c94c1411	Remove LogOutput from Server	2020-08-05 14:00:44 -04:00
Daniel Nephin	c7c941811d	config: Remove unused field	2020-08-05 13:25:08 -04:00
Matt Keeler	e7d8a02ae8	Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363 ) This allows this to be reused elsewhere.	2020-07-23 16:05:28 -04:00
Matt Keeler	eda8cb39fd	Implement the insecure version of the Cluster.AutoConfig RPC endpoint Right now this is only hooked into the insecure RPC server and requires JWT authorization. If no JWT authorizer is setup in the configuration then we inject a disabled “authorizer” to always report that JWT authorization is disabled.	2020-06-17 11:25:29 -04:00
R.B. Boyer	16db20b1f3	acl: remove the deprecated `acl_enforce_version_8` option (#7991 ) Fixes #7292	2020-05-29 16:16:03 -05:00
Kit Patella	c3d24d7c3e	agent: stub out auditing functionality in OSS	2020-04-16 15:07:52 -07:00
R.B. Boyer	a7fb26f50f	wan federation via mesh gateways (#6884 ) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Hans Hasselberg	50281032e0	Security fixes (#7182 ) * Mitigate HTTP/RPC Services Allow Unbounded Resource Usage Fixes #7159. Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: Paul Banks <banks@banksco.de>	2020-01-31 11:19:37 -05:00
Matt Keeler	485a0a65ea	Updates to Config Entries and Connect for Namespaces (#7116 )	2020-01-24 10:04:58 -05:00
Hans Hasselberg	315ba7d6ad	connect: check if intermediate cert needs to be renewed. (#6835 ) Currently when using the built-in CA provider for Connect, root certificates are valid for 10 years, however secondary DCs get intermediates that are valid for only 1 year. There is no mechanism currently short of rotating the root in the primary that will cause the secondary DCs to renew their intermediates. This PR adds a check that renews the cert if it is half way through its validity period. In order to be able to test these changes, a new configuration option was added: IntermediateCertTTL which is set extremely low in the tests.	2020-01-17 23:27:13 +01:00
Matej Urbas	d877e091d6	agent: configurable MaxQueryTime and DefaultQueryTime. (#3777 )	2020-01-17 14:20:57 +01:00
Matt Keeler	9bd378a95c	Add EnterpriseConfig stubs (#6566 )	2019-10-01 14:34:55 -04:00
R.B. Boyer	5c5f21088c	sdk: add freelist tracking and ephemeral port range skipping to freeport This should cut down on test flakiness. Problems handled: - If you had enough parallel test cases running, the former circular approach to handling the port block could hand out the same port to multiple cases before they each had a chance to bind them, leading to one of the two tests to fail. - The freeport library would allocate out of the ephemeral port range. This has been corrected for Linux (which should cover CI). - The library now waits until a formerly-in-use port is verified to be free before putting it back into circulation.	2019-09-17 14:30:43 -05:00
Hans Hasselberg	73c4e9f07c	tls: auto_encrypt enables automatic RPC cert provisioning for consul clients (#5597 )	2019-06-27 22:22:07 +02:00
Pierre Souchay	e394a9469b	Support for maximum size for Output of checks (#5233 ) * Support for maximum size for Output of checks This PR allows users to limit the size of output produced by checks at the agent and check level. When set at the agent level, it will limit the output for all checks monitored by the agent. When set at the check level, it can override the agent max for a specific check but only if it is lower than the agent max. Default value is 4k, and input must be at least 1.	2019-06-26 09:43:25 -06:00
Hans Hasselberg	0d8d7ae052	agent: transfer leadership when establishLeadership fails (#5247 )	2019-06-19 14:50:48 +02:00
Kyle Havlovitz	ad24456f49	Set the dead node reclaim timer at 30s	2019-05-15 11:59:33 -07:00

1 2

98 Commits