open-consul

Commit Graph

Author	SHA1	Message	Date
Daniel Nephin	44da869ed4	stream: Use a no-op event publisher if streaming is disabled	2020-10-28 13:54:19 -04:00
Kyle Havlovitz	926a393a5c	Merge pull request #8784 from hashicorp/renew-intermediate-primary connect: Enable renewing the intermediate cert in the primary DC	2020-10-09 12:18:59 -07:00
Kyle Havlovitz	62270c3f9a	Merge branch 'master' into renew-intermediate-primary	2020-10-09 04:40:34 -07:00
Daniel Nephin	05df7b18a9	config: add field for enabling streaming RPC endpoint	2020-10-08 12:11:20 -04:00
Daniel Nephin	783627aeef	Merge pull request #8768 from hashicorp/streaming/add-subscribe-service subscribe: add subscribe service for streaming change events	2020-10-07 21:24:03 -04:00
R.B. Boyer	d6dce2332a	connect: intentions are now managed as a new config entry kind "service-intentions" (#8834) - Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older copies of consul will not see new config entries it doesn't understand replicate down. - Add shim conversion code so that the old API/CLI method of interacting with intentions will continue to work so long as none of these are edited via config entry endpoints. Almost all of the read-only APIs will continue to function indefinitely. - Add new APIs that operate on individual intentions without IDs so that the UI doesn't need to implement CAS operations. - Add a new serf feature flag indicating support for intentions-as-config-entries. - The old line-item intentions way of interacting with the state store will transparently flip between the legacy memdb table and the config entry representations so that readers will never see a hiccup during migration where the results are incomplete. It uses a piece of system metadata to control the flip. - The primary datacenter will begin migrating intentions into config entries on startup once all servers in the datacenter are on a version of Consul with the intentions-as-config-entries feature flag. When it is complete the old state store representations will be cleared. We also record a piece of system metadata indicating this has occurred. We use this metadata to skip ALL of this code the next time the leader starts up. - The secondary datacenters continue to run the old intentions replicator until all servers in the secondary DC and primary DC support intentions-as-config-entries (via serf flag). Once this condition is met the old intentions replicator ceases. - The secondary datacenters replicate the new config entries as they are migrated in the primary.
When they detect that the primary has zeroed its old state store table they wait until all config entries up to that point are replicated and then zero their own copy of the old state store table. We also record a piece of system metadata indicating this has occurred. We use this metadata to skip ALL of this code the next time the leader starts up.	2020-10-06 13:24:05 -05:00
Daniel Nephin	fa115c6249	Move agent/subscribe -> agent/rpc/subscribe	2020-10-06 12:49:35 -04:00
Daniel Nephin	011109a6f6	subscribe: extract streamID and logging from Subscribe By extracting all of the tracing logic the core logic of the Subscribe endpoint is much easier to read.	2020-10-06 12:49:35 -04:00
Daniel Nephin	371ec2d70a	subscribe: add a stateless subscribe service for the gRPC server With a Backend that provides access to the necessary dependencies.	2020-10-06 12:49:35 -04:00
R.B. Boyer	ccd0200bd9	server: ensure that we also shutdown network segment serf instances on server shutdown (#8786) This really only matters for unit tests, since typically if an agent shuts down its server, it follows that up by exiting the process, which would also clean up all of the networking anyway.	2020-09-30 16:23:43 -05:00
Kyle Havlovitz	2956313f2d	connect: Enable renewing the intermediate cert in the primary DC	2020-09-30 12:31:21 -07:00
R.B. Boyer	45609fccdf	server: make sure that the various replication loggers use consistent logging (#8745)	2020-09-24 15:49:38 -05:00
Daniel Nephin	c621b4a420	agent/consul: pass dependencies directly from agent In an upcoming change we will need to pass a grpc.ClientConnPool from BaseDeps into Server. While looking at that change I noticed all of the existing consulOption fields are already on BaseDeps. Instead of duplicating the fields, we can create a struct used by agent/consul, and use that struct in BaseDeps. This allows us to pass along dependencies without translating them into different representations. I also looked at moving all of BaseDeps in agent/consul, however that created some circular imports. Resolving those cycles wouldn't be too bad (it was only an error in agent/consul being imported from cache-types), however this change seems a little better by starting to introduce some structure to BaseDeps. This change is also a small step in reducing the scope of Agent. Also remove some constants that were only used by tests, and move the relevant comment to where the live configuration is set. Removed some validation from NewServer and NewClient, as these are not really runtime errors. They would be code errors, which will cause a panic anyway, so no reason to handle them specially here.	2020-09-15 17:29:32 -04:00
Daniel Nephin	0536b2047e	agent/consul: make router required	2020-09-15 17:26:26 -04:00
Daniel Nephin	863a9df951	server: add gRPC server for streaming events Includes a stats handler and stream interceptor for grpc metrics. Co-authored-by: Paul Banks <banks@banksco.de>	2020-09-08 12:10:41 -04:00
Chris Piraino	45a4057f60	Report node/service usage metrics from every server Using the newly provided state store methods, we periodically emit usage metrics from the servers. We decided to emit these metrics from all servers, not just the leader, because that means we do not have to care about leader election flapping causing metrics turbulence, and it seems reasonable for each server to emit its own view of the state, even if they should always converge rapidly.	2020-09-02 10:24:17 -05:00
Matt Keeler	106e1d50bd	Move RPC router from Client/Server and into BaseDeps (#8559) This will allow it to be a shared component which is needed for AutoConfig	2020-08-27 11:23:52 -04:00
Daniel Nephin	f3b63514d5	testing: Remove NotifyShutdown NotifyShutdown was only used for testing. Now that t.Cleanup exists, we can use that instead of attaching cleanup to the Server shutdown. The Autopilot test which used NotifyShutdown doesn't need this notification because Shutdown is synchronous. Waiting for the function to return is equivalent.	2020-08-07 17:14:44 -04:00
Daniel Nephin	061ae94c63	Rename NewClient/NewServer Now that duplicate constructors have been removed we can use the shorter names for the single constructor.	2020-08-05 14:00:55 -04:00
Daniel Nephin	e6c94c1411	Remove LogOutput from Server	2020-08-05 14:00:44 -04:00
Daniel Nephin	73493ca01b	Pass a logger to ConnPool and yamux, instead of an io.Writer Allowing us to remove the LogOutput field from config.	2020-08-05 13:25:08 -04:00
Freddy	7c2c8815d7	Avoid panics during shutdown routine (#8412)	2020-07-30 11:11:10 -06:00
Matt Keeler	76add4f24c	Allow setting verify_incoming* when using auto_encrypt or auto_config (#8394) Ensure that enabling AutoConfig sets the tls configurator properly This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.	2020-07-30 10:15:12 -04:00
Matt Keeler	dad0f189a2	Agent Auto Config: Implement Certificate Generation (#8360) Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server. This also refactors the auto-config package a bit to split things out into multiple files.	2020-07-28 15:31:48 -04:00
Daniel Nephin	23a940daad	server: Abandon state store to shutdown EventPublisher So that we don't leak goroutines	2020-07-14 15:57:47 -04:00
Daniel Nephin	13e0d258b5	Merge pull request #8237 from hashicorp/dnephin/remove-acls-enabled-from-delegate Remove ACLsEnabled from delegate interface	2020-07-09 16:35:43 -04:00
Matt Keeler	39d9babab3	Pass the Config and TLS Configurator into the AutoConfig constructor This is instead of having the AutoConfigBackend interface provide functions for retrieving them. NOTE: the config is not reloadable. For now this is fine as we don’t look at any reloadable fields. If that changes then we should provide a way to make it reloadable.	2020-07-08 12:36:11 -04:00
Matt Keeler	a77ed471c8	Rename (Server).forward to (Server).ForwardRPC Also get rid of the preexisting shim in server.go that existed before to have this name just call the unexported one.	2020-07-08 11:05:44 -04:00
Matt Keeler	386ec3a2a2	Refactor AutoConfig RPC to not have a direct dependency on the Server type Instead it has an interface which can be mocked for better unit testing that is deterministic and not prone to flakiness.	2020-07-08 11:05:44 -04:00
Daniel Nephin	8b6036c077	Remove ACLsEnabled from delegate interface In all cases (oss/ent, client/server) this method was returning a value from config. Since the value is consistent, it doesn't need to be part of the delegate interface.	2020-07-03 17:00:20 -04:00
Matt Keeler	5600069d69	Store the Connect CA rate limiter on the server This fixes a bug where auto_encrypt was operating without utilizing a common rate limiter.	2020-06-30 09:59:07 -04:00
Matt Keeler	d471977f62	Fix go routine leak in auto encrypt ca roots tracking	2020-06-24 17:09:50 -04:00
Matt Keeler	341aedbce9	Ensure that retryLoopBackoff can be cancelled We needed to pass a cancellable context into the limiter.Wait instead of context.Background. So I made the func take a context instead of a chan as most places were just passing through a Done chan from a context anyways. Fix go routine leak in the gateway locator	2020-06-24 12:41:08 -04:00
Matt Keeler	f5d57ccd48	Allow the Agent and its child Client/Server to share a connection pool This is needed so that we can make an AutoConfig RPC at the Agent level prior to creating the Client/Server.	2020-06-17 16:19:33 -04:00
Matt Keeler	8c601ad8db	Merge pull request #8035 from hashicorp/feature/auto-config/server-rpc	2020-06-17 16:07:25 -04:00
Matt Keeler	eda8cb39fd	Implement the insecure version of the Cluster.AutoConfig RPC endpoint Right now this is only hooked into the insecure RPC server and requires JWT authorization. If no JWT authorizer is setup in the configuration then we inject a disabled “authorizer” to always report that JWT authorization is disabled.	2020-06-17 11:25:29 -04:00
Daniel Nephin	89d95561df	Enable gofmt simplify Code changes done automatically with 'gofmt -s -w'	2020-06-16 13:21:11 -04:00
R.B. Boyer	3ad570ba99	server: don't activate federation state replication or anti-entropy until all servers are running 1.8.0+ (#8014)	2020-06-04 16:05:27 -05:00
Hans Hasselberg	dd8cd9bc24	Merge pull request #7966 from hashicorp/pool_improvements Agent connection pool cleanup	2020-06-04 08:56:26 +02:00
R.B. Boyer	16db20b1f3	acl: remove the deprecated `acl_enforce_version_8` option (#7991) Fixes #7292	2020-05-29 16:16:03 -05:00
Hans Hasselberg	5cda505495	pool: remove useTLS and ForceTLS In the past TLS usage was enforced with these variables, but these days this decision is made by TLSConfigurator and there is no reason to keep using the variables.	2020-05-29 08:21:24 +02:00
R.B. Boyer	813d69622e	agent: handle re-bootstrapping in a secondary datacenter when WAN federation via mesh gateways is configured (#7931) The main fix here is to always union the `primary-gateways` list with the list of mesh gateways in the primary returned from the replicated federation states list. This will allow any replicated (incorrect) state to be supplemented with user-configured (correct) state in the config file. Eventually the game of random selection whack-a-mole will pick a winning entry and re-replicate the latest federation states from the primary. If the user-configured state is actually the incorrect one, then the same eventual correct selection process will work in that case, too. The secondary fix is actually to finish making wanfed-via-mgws actually work as originally designed. Once a secondary datacenter has replicated federation states for the primary AND managed to stand up its own local mesh gateways then all of the RPCs from a secondary to the primary SHOULD go through two sets of mesh gateways to arrive in the consul servers in the primary (one hop for the secondary datacenter's mesh gateway, and one hop through the primary datacenter's mesh gateway). This was neglected in the initial implementation. While everything works, ideally we should treat communications that go around the mesh gateways as just provided for bootstrapping purposes. Now we heuristically use the success/failure history of the federation state replicator goroutine loop to determine if our current mesh gateway route is working as intended. If it is, we try using the local gateways, and if those don't work we fall back on trying the primary via the union of the replicated state and the go-discover configuration flags. This can be improved slightly in the future by possibly initializing the gateway choice to local on startup if we already have replicated state. This PR does not address that improvement. Fixes #7339	2020-05-27 11:31:10 -05:00
Hans Hasselberg	854aac510f	agent: refactor to use a single addrFn	2020-05-05 21:08:10 +02:00
Hans Hasselberg	6626cb69d6	rpc: oss changes for network area connection pooling (#7735)	2020-04-30 22:12:17 +02:00
Daniel Nephin	ebb851f32d	agent: Remove unused Encrypted from interface It appears to be unused. It looks like it has been around a while, I guess at some point we stopped using this method.	2020-03-26 12:34:31 -04:00
R.B. Boyer	a7fb26f50f	wan federation via mesh gateways (#6884 ) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. 
This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
ShimmerGlass	a27ccc7248	agent: add server raft.{last,applied}_index gauges (#6694) These metrics are useful for: * Tracking the rate of updates to the db * Getting a rough idea of when an index originated	2020-02-11 10:50:18 +01:00
Kit Patella	d28bc1acbe	rpc: measure blocking queries (#7224) * agent: measure blocking queries * agent.rpc: update docs to mention we only record blocking queries * agent.rpc: make go fmt happy * agent.rpc: fix non-atomic read and decrement with bitwise xor of uint64 0 * agent.rpc: clarify review question * agent.rpc: today I learned that one must declare all variables before interacting with goto labels * Update agent/consul/server.go agent.rpc: more precise comment on `Server.queriesBlocking` Co-Authored-By: Paul Banks <banks@banksco.de> * Update website/source/docs/agent/telemetry.html.md agent.rpc: improve queries_blocking description Co-Authored-By: Paul Banks <banks@banksco.de> * agent.rpc: fix some bugs found in review * add a note about the updated counter behavior to telemetry.md * docs: add upgrade-specific note on consul.rpc.quer{y,ies_blocking} behavior Co-authored-by: Paul Banks <banks@banksco.de>	2020-02-10 10:01:15 -08:00
Hans Hasselberg	50281032e0	Security fixes (#7182) * Mitigate HTTP/RPC Services Allow Unbounded Resource Usage Fixes #7159. Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: Paul Banks <banks@banksco.de>	2020-01-31 11:19:37 -05:00
R.B. Boyer	01ebdff2a9	various tweaks on top of the hclog work (#7165)	2020-01-29 11:16:08 -06:00

152 Commits