open-consul

Commit Graph

Author	SHA1	Message	Date
Chris Piraino	df1381f77f	Merge pull request #8603 from hashicorp/feature/usage-metrics Track node and service counts in the state store and emit them periodically as metrics	2020-09-02 13:23:39 -05:00
R.B. Boyer	4197bed23b	connect: fix bug in preventing some namespaced config entry modifications (#8601 ) Whenever an upsert/deletion of a config entry happens, within the open state store transaction we speculatively test compile all discovery chains that may be affected by the pending modification to verify that the write would not create an erroneous scenario (such as splitting traffic to a subset that did not exist). If a single discovery chain evaluation references two config entries with the same kind and name in different namespaces then sometimes the upsert/deletion would be falsely rejected. It does not appear as though this bug would've let invalid writes through to the state store so the correction does not require a cleanup phase.	2020-09-02 10:47:19 -05:00
Chris Piraino	b245d60200	Set metrics reporting interval to 9 seconds This is below the 10 second interval that lib/telemetry.go implements as its aggregation interval, ensuring that we always report these metrics.	2020-09-02 10:24:23 -05:00
Chris Piraino	e9b397005c	Update godoc string for memdb wrapper functions/structs	2020-09-02 10:24:22 -05:00
Chris Piraino	80f923a47a	Refactor state store usage to track unique service names This commit refactors the state store usage code to track unique service name changes on transaction commit. This means we only need to lookup usage entries when reading the information, as opposed to iterating over a large number of service indices. - Take into account a service instance's name being changed - Do not iterate through entire list of service instances, we only care about whether there is 0, 1, or more than 1.	2020-09-02 10:24:21 -05:00
Chris Piraino	79e6534345	Use ReadTxn interface in state store helper functions	2020-09-02 10:24:20 -05:00
Chris Piraino	d90d95421d	Add WriteTxn interface and convert more functions to ReadTxn We add a WriteTxn interface for use in updating the usage memdb table, with the forward-looking prospect of incrementally converting other functions to accept interfaces. As well, we use the ReadTxn in new usage code, and as a side effect convert a couple of existing functions to use that interface as well.	2020-09-02 10:24:19 -05:00
Chris Piraino	45a4057f60	Report node/service usage metrics from every server Using the newly provided state store methods, we periodically emit usage metrics from the servers. We decided to emit these metrics from all servers, not just the leader, because that means we do not have to care about leader election flapping causing metrics turbulence, and it seems reasonable for each server to emit its own view of the state, even if they should always converge rapidly.	2020-09-02 10:24:17 -05:00
Chris Piraino	3af96930eb	Add new usage memdb table that tracks usage counts of various elements We update the usage table on Commit() by using the TrackedChanges() API of memdb. Track memdb changes on restore so that usage data can be compiled	2020-09-02 10:24:16 -05:00
Matt Keeler	335c604ced	Merge of auto-config and auto-encrypt code (#8523 ) auto-encrypt is now handled as a special case of auto-config. This also is moving all the cert-monitor code into the auto-config package.	2020-08-31 13:12:17 -04:00
Daniel Nephin	845661c8af	Merge pull request #8548 from edevil/fix_flake Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve	2020-08-28 15:10:55 -04:00
R.B. Boyer	f2b8bf109c	xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569 )	2020-08-27 12:20:58 -05:00
Matt Keeler	106e1d50bd	Move RPC router from Client/Server and into BaseDeps (#8559 ) This will allow it to be a shared component which is needed for AutoConfig	2020-08-27 11:23:52 -04:00
André Cruz	673bd69f36	Decrease test flakiness Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve and TestCacheNotifyPolling	2020-08-24 20:30:02 +01:00
André Cruz	a64686fab6	testing: Fix govet errors	2020-08-21 18:01:55 +01:00
Hans Hasselberg	02de4c8b76	add primary keys to list keyring (#8522 ) During gossip encryption key rotation it would be nice to be able to see if all nodes are using the same key. This PR adds another field to the json response from `GET v1/operator/keyring` which lists the primary keys in use per dc. That way an operator can tell when a key was successfully setup as primary key. Based on https://github.com/hashicorp/serf/pull/611 to add primary key to list keyring output: ```json [ { "WAN": true, "Datacenter": "dc2", "Segment": "", "Keys": { "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 6, "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6 }, "PrimaryKeys": { "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6 }, "NumNodes": 6 }, { "WAN": false, "Datacenter": "dc2", "Segment": "", "Keys": { "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 8, "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8 }, "PrimaryKeys": { "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8 }, "NumNodes": 8 }, { "WAN": false, "Datacenter": "dc1", "Segment": "", "Keys": { "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 3, "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8 }, "PrimaryKeys": { "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8 }, "NumNodes": 8 } ] ``` I intentionally did not change the CLI output because I didn't find a good way of displaying this information. There are a couple of options that we could implement later: * add a flag to show the primary keys * add a flag to show json output Fixes #3393.	2020-08-18 09:50:24 +02:00
Daniel Nephin	8d35e37b3c	testing: Remove all the defer os.Removeall Now that testutil uses t.Cleanup to remove the directory the caller no longer has to manage the removal	2020-08-14 19:58:53 -04:00
Daniel Nephin	629c34085d	state: remove unused Store method receiver And use ReadTxn interface where appropriate.	2020-08-13 11:25:22 -04:00
Daniel Nephin	fc797a279a	Merge pull request #8461 from hashicorp/dnephin/remove-notify-shutdown agent/consul: Remove NotifyShutdown	2020-08-13 11:16:48 -04:00
Daniel Nephin	d8ffcd5686	Merge pull request #8365 from hashicorp/dnephin/fix-service-by-node-meta-flake state: speed up tests that use watchLimit	2020-08-13 11:16:12 -04:00
R.B. Boyer	63422ca9c5	connect: use stronger validation that ingress gateways have compatible protocols defined for their upstreams (#8470 ) Fixes #8466 Since Consul 1.8.0 there was a bug in how ingress gateway protocol compatibility was enforced. At the point in time that an ingress-gateway config entry was modified the discovery chain for each upstream was checked to ensure the ingress gateway protocol matched. Unfortunately future modifications of other config entries were not validated against existing ingress-gateway definitions, such as: 1. create tcp ingress-gateway pointing to 'api' (ok) 2. create service-defaults for 'api' setting protocol=http (worked, but not ok) 3. create service-splitter or service-router for 'api' (worked, but caused an agent panic) If you were to do these in a different order, it would fail without a crash: 1. create service-defaults for 'api' setting protocol=http (ok) 2. create service-splitter or service-router for 'api' (ok) 3. create tcp ingress-gateway pointing to 'api' (fail with message about protocol mismatch) This PR introduces the missing validation. The two new behaviors are: 1. create tcp ingress-gateway pointing to 'api' (ok) 2. (NEW) create service-defaults for 'api' setting protocol=http ("ok" for back compat) 3. (NEW) create service-splitter or service-router for 'api' (fail with message about protocol mismatch) In consideration for any existing users that may be inadvertently be falling into item (2) above, that is now officiall a valid configuration to be in. For anyone falling into item (3) above while you cannot use the API to manufacture that scenario anymore, anyone that has old (now bad) data will still be able to have the agent use them just enough to generate a new agent/proxycfg error message rather than a panic. Unfortunately we just don't have enough information to properly fix the config entries.	2020-08-12 11:19:20 -05:00
Hans Hasselberg	7a6d916ddc	Merge pull request #8471 from hashicorp/local_only thread local-only through the layers	2020-08-12 08:54:51 +02:00
Freddy	50fee12d62	Internal endpoint to query intentions associated with a gateway (#8400 )	2020-08-11 17:20:41 -06:00
Kyle Havlovitz	8118e3db40	Fix a state store comment about version	2020-08-11 13:46:12 -07:00
Kyle Havlovitz	2601585017	fsm: Fix snapshot bug with restoring node/service/check indexes	2020-08-11 11:49:52 -07:00
Hans Hasselberg	e0297b6e99	Refactor keyring ops: * changes some functions to return data instead of modifying pointer arguments * renames globalRPC() to keyringRPCs() to make its purpose more clear * restructures KeyringOperation() to make it more understandable	2020-08-11 13:42:03 +02:00
Daniel Nephin	bef9348ca8	testing: remove unnecessary defers in tests The data directory is now removed by the test helper that created it.	2020-08-07 17:28:16 -04:00
Daniel Nephin	f3b63514d5	testing: Remove NotifyShutdown NotifyShutdown was only used for testing. Now that t.Cleanup exists, we can use that instead of attaching cleanup to the Server shutdown. The Autopilot test which used NotifyShutdown doesn't need this notification because Shutdown is synchronous. Waiting for the function to return is equivalent.	2020-08-07 17:14:44 -04:00
Hans Hasselberg	fdceb24323	auto_config implies connect (#8433 )	2020-08-07 12:02:02 +02:00
Daniel Nephin	061ae94c63	Rename NewClient/NewServer Now that duplicate constructors have been removed we can use the shorter names for the single constructor.	2020-08-05 14:00:55 -04:00
Daniel Nephin	e6c94c1411	Remove LogOutput from Server	2020-08-05 14:00:44 -04:00
Daniel Nephin	fdf966896f	Remove LogOutput from Client	2020-08-05 14:00:42 -04:00
Daniel Nephin	73493ca01b	Pass a logger to ConnPool and yamux, instead of an io.Writer Allowing us to remove the LogOutput field from config.	2020-08-05 13:25:08 -04:00
Daniel Nephin	c7c941811d	config: Remove unused field	2020-08-05 13:25:08 -04:00
Freddy	7c2c8815d7	Avoid panics during shutdown routine (#8412 )	2020-07-30 11:11:10 -06:00
Matt Keeler	76add4f24c	Allow setting verify_incoming* when using auto_encrypt or auto_config (#8394 ) Ensure that enabling AutoConfig sets the tls configurator properly This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.	2020-07-30 10:15:12 -04:00
Matt Keeler	dad0f189a2	Agent Auto Config: Implement Certificate Generation (#8360 ) Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server. This also refactors the auto-config package a bit to split things out into multiple files.	2020-07-28 15:31:48 -04:00
Matt Keeler	3a1058a06b	Move connect root retrieval and cert signing logic out of the RPC endpoints (#8364 ) The code now lives on the Server type itself. This was done so that all of this could be shared with auto config certificate signing.	2020-07-24 10:00:51 -04:00
Matt Keeler	e7d8a02ae8	Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363 ) This allows this to be reused elsewhere.	2020-07-23 16:05:28 -04:00
Daniel Nephin	597dcf2bfb	Merge pull request #8323 from hashicorp/dnephin/add-event-publisher-2 stream: close subscriptions on shutdown	2020-07-23 13:12:50 -04:00
Matt Keeler	c3e7d689b7	Refactor the agentpb package (#8362 ) First move the whole thing to the top-level proto package name. Secondly change some things around internally to have sub-packages.	2020-07-23 11:24:20 -04:00
Daniel Nephin	decba06b7d	stream: close all subs when EventProcessor is shutdown.	2020-07-22 19:04:10 -04:00
Daniel Nephin	e802689bbe	stream: fix overallocation in filter And add tests	2020-07-22 19:04:10 -04:00
Daniel Nephin	f64725f7aa	state: speed up TestStateStore_ServicesByNodeMeta Make watchLimit a var so that we can patch it in tests and reduce the time spent creating state.	2020-07-22 16:57:06 -04:00
Daniel Nephin	a44ddea9ba	state: Use subtests in TestStateStore_ServicesByNodeMeta These subtests make it much easier to identify the slow part of the test, but they also help enumerate all the different cases which are being tested.	2020-07-22 16:39:09 -04:00
Daniel Nephin	6d3b042872	Merge pull request #7948 from hashicorp/dnephin/buffer-test-logs testutil: NewLogBuffer - buffer logs until a test fails	2020-07-21 15:21:52 -04:00
Matt Keeler	8ea8a939f0	Merge pull request #8311 from hashicorp/bugfix/auto-encrypt-token-update	2020-07-21 13:15:27 -04:00
Daniel Nephin	80ff174880	testutil: NewLogBuffer - buffer logs until a test fails Replaces #7559 Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case. Attaching log output from background goroutines also cause data races. If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race. The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long. This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful. All of the logs are printed from the test goroutine, so they should be associated with the correct test. Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly. Related: https://github.com/golang/go/issues/38458 (may be fixed in go1.15) https://github.com/golang/go/issues/38382#issuecomment-612940030	2020-07-21 12:50:40 -04:00
Matt Keeler	133a6d99f2	Fix issue with changing the agent token causing failure to renew the auto-encrypt certificate The fallback method would still work but it would get into a state where it would let the certificate expire for 10s before getting a new one. And the new one used the less secure RPC endpoint. This is also a pretty large refactoring of the auto encrypt code. I was going to write some tests around the certificate monitoring but it was going to be impossible to get a TestAgent configured in such a way that I could write a test that ran in less than an hour or two to exercise the functionality. Moving the certificate monitoring into its own package will allow for dependency injection and in particular mocking the cache types to control how it hands back certificates and how long those certificates should live. This will allow for exercising the main loop more than would be possible with it coupled so tightly with the Agent.	2020-07-21 12:19:25 -04:00
Daniel Nephin	7599e280de	stream: handle empty event in TestEventSnapshot When the race detector is enabled we see this test fail occasionally. The reordering of execution seems to make it possible for the snapshot splice to happen before any events are published to the topicBuffers. We can handle this case in the test the same way it is handled by a subscription, by proceeding to the next event.	2020-07-20 18:20:02 -04:00

1 2 3 4 5 ...

934 Commits