open-consul

Author	SHA1	Message	Date
Freddy	f5634a24e8	Clean up StatsFetcher work when context is exceeded (#6086 )	2019-07-12 08:23:28 -06:00
Matt Keeler	6cc936d64b	Move ctx and cancel func setup into the Replicator.Start (#6115 ) Previously a sequence of events like: Start Stop Start Stop would segfault on the second stop because the original ctx and cancel func were only initialized during the constructor and not during Start.	2019-07-12 10:10:48 -04:00
Jack Pearkes	2b1761bab3	Make cluster names SNI always (#6081 ) * Make cluster names SNI always * Update some tests * Ensure we check for prepared query types * Use sni for route cluster names * Proper mesh gateway mode defaulting when the discovery chain is used * Ignore service splits from PatchSliceOfMaps * Update some xds golden files for proper test output * Allow for grpc/http listeners/cluster configs with the disco chain * Update stats expectation	2019-07-08 12:48:48 +01:00
Matt Keeler	35a839952b	Fix Internal.ServiceDump blocking (#6076 ) maxIndexWatchTxn was only watching the IndexEntry of the max index of all the entries. It needed to watch all of them regardless of which was the max. Also plumbed the query source through in the proxy config to help better track requests.	2019-07-04 16:17:49 +01:00
R.B. Boyer	a1900754db	digest the proxy-defaults protocol into the graph (#6050 )	2019-07-02 11:01:17 -05:00
R.B. Boyer	bccbb2b4ae	activate most discovery chain features in xDS for envoy (#6024 )	2019-07-01 22:10:51 -05:00
Matt Keeler	39bb0e3e77	Implement Mesh Gateways This includes both ingress and egress functionality.	2019-07-01 16:28:30 -04:00
Matt Keeler	03ccc7c5ae	Fix secondary dc connect CA roots watch issue The general problem was that the CA config which contained the trust domain was being fetched outside of the blocking mechanism, so if the client started the blocking query before the primary dc's roots had been set then a stale trust domain was being pushed down. This was fixed here but in the future we should probably fix up the CA initialization code to not initialize the CA config twice when it doesn’t need to.	2019-07-01 16:28:30 -04:00
Matt Keeler	44dea31d1f	Include a content hash of the intention for use during replication	2019-07-01 16:28:30 -04:00
Matt Keeler	0fc4da6861	Implement intention replication and secondary CA initialization	2019-07-01 16:28:30 -04:00
Matt Keeler	24749bc7e5	Implement Kind based ServiceDump and caching of the ServiceDump RPC	2019-07-01 16:28:30 -04:00
R.B. Boyer	686e4606c6	do some initial config entry graph validation during writes (#6047 )	2019-07-01 15:23:36 -05:00
hashicorp-ci	e36792395e	Merge Consul OSS branch 'master' at commit e91f73f59249f5756896b10890e9298e7c1fbacc	2019-06-30 02:00:31 +00:00
Sarah Christoff	8a930f7d3a	Remove failed nodes from serfWAN (#6028 ) * Prune Servers from WAN and LAN * cleaned up and fixed LAN to WAN * moving things around * force-leave remove from serfWAN, create pruneSerfWAN * removed serfWAN remove, reduced complexity, fixed comments * add another place to remove from serfWAN * add nil check * Update agent/consul/server.go Co-Authored-By: Paul Banks <banks@banksco.de>	2019-06-28 12:40:07 -05:00
Hans Hasselberg	73c4e9f07c	tls: auto_encrypt enables automatic RPC cert provisioning for consul clients (#5597 )	2019-06-27 22:22:07 +02:00
R.B. Boyer	3eb1f00371	initial version of L7 config entry compiler (#5994 ) With this you should be able to fetch all of the relevant discovery chain config entries from the state store in one query and then feed them into the compiler outside of a transaction. There are a lot of TODOs scattered through here, but they're mostly around handling fun edge cases and can be deferred until more of the plumbing works completely.	2019-06-27 13:38:21 -05:00
R.B. Boyer	8850656580	adding new config entries for L7 discovery chain (unused) (#5987 )	2019-06-27 12:37:43 -05:00
Todd Radel	8ece11a24a	connect: store signingKeyId instead of authorityKeyId (#6005 )	2019-06-27 16:47:22 +02:00
Aestek	04a52a967b	acl: allow service deregistration with node write permission (#5217 ) With ACLs enabled if an agent is wiped and restarted without a leave it can no longer deregister the services it had previously registered because it no longer has the tokens the services were registered with. To remedy that we allow service deregistration from tokens with node write permission.	2019-06-27 14:24:34 +02:00
hashicorp-ci	3224bea082	Merge Consul OSS branch 'master' at commit 4eb73973b6e53336fd505dc727ac84c1f7e78872	2019-06-27 02:00:41 +00:00
Pierre Souchay	ca7c7faac8	agent: added metadata information about servers into consul service description (#5455 ) This makes information about servers available from the HTTP APIs without using the command line.	2019-06-26 23:46:47 +02:00
Pierre Souchay	e394a9469b	Support for maximum size for Output of checks (#5233 ) * Support for maximum size for Output of checks This PR allows users to limit the size of output produced by checks at the agent and check level. When set at the agent level, it will limit the output for all checks monitored by the agent. When set at the check level, it can override the agent max for a specific check but only if it is lower than the agent max. Default value is 4k, and input must be at least 1.	2019-06-26 09:43:25 -06:00
hashicorp-ci	d237e86d83	Merge Consul OSS branch 'master' at commit 88b15d84f9fdb58ceed3dc971eb0390be85e3c15 skip-checks: true	2019-06-25 02:00:26 +00:00
Matt Keeler	f0f28707bc	New Cache Types (#5995 ) * Add a cache type for the Catalog.ListServices endpoint * Add a cache type for the Catalog.ListDatacenters endpoint	2019-06-24 14:11:34 -04:00
Matt Keeler	93debd2610	Ensure that looking for services by addresses works with Tagged Addresses (#5984 )	2019-06-21 13:16:17 -04:00
Hans Hasselberg	0d8d7ae052	agent: transfer leadership when establishLeadership fails (#5247 )	2019-06-19 14:50:48 +02:00
Aestek	24c29e195b	kv: do not trigger watches when setting the same value (#5885 ) If a KVSet is performed but does not update the entry, do not trigger watches for this key. This avoids releasing blocking queries for KV values that did not actually change.	2019-06-18 15:06:29 +02:00
Matt Keeler	4c03f99a85	Fix CAS operations on Services (#5971 ) * Fix CAS operations on services * Update agent/consul/state/catalog_test.go Co-Authored-By: R.B. Boyer <public@richardboyer.net>	2019-06-17 10:41:04 -04:00
Paul Banks	e90fab0aec	Add rate limiting to RPCs sent within a server instance too (#5927 )	2019-06-13 04:26:27 -05:00
Freddy	8f5fe058ea	Increase reliability of TestResetSessionTimerLocked_Renew	2019-05-24 13:54:51 -04:00
Freddy	f7f0207f78	Run TestServer_Expect on its own (#5890 )	2019-05-23 19:52:33 -04:00
Freddy	e9bdb3a4f9	Flaky test: ACLReplication_Tokens (#5891 ) * Exclude non-go workflows while testing * Wait for s2 global-management policy * Revert "Exclude non-go workflows while testing" This reverts commit 47a83cbe9f19d0e1e475eabaa223d61fb4c56019.	2019-05-23 19:52:02 -04:00
Freddy	c9e6640337	Add retries to StatsFetcherTest (#5892 )	2019-05-23 19:51:31 -04:00
freddygv	d133d565a5	Wait for s2 global-management policy	2019-05-21 17:58:37 -06:00
Freddy	7ce28bbfee	Stop running TestLeader_ChangeServerID in parallel	2019-05-21 15:28:08 -06:00
Kyle Havlovitz	ad24456f49	Set the dead node reclaim timer at 30s	2019-05-15 11:59:33 -07:00
Kyle Havlovitz	dcbffdb956	Merge branch 'master' into change-node-id	2019-05-15 10:51:04 -07:00
Matt Keeler	46956ed769	Copy the proxy config instead of direct assignment (#5786 ) This prevents modifying the data in the state store which is supposed to be immutable.	2019-05-06 12:09:59 -04:00
R.B. Boyer	372bb06c83	acl: a role binding rule for a role that does not exist should be ignored (#5778 ) I wrote the docs under this assumption but completely forgot to actually enforce it.	2019-05-03 14:22:44 -05:00
R.B. Boyer	7d0f729f77	acl: enforce that you cannot persist tokens and roles with missing links except during replication (#5779 )	2019-05-02 15:02:21 -05:00
Matt Keeler	26708570c5	Fix ConfigEntryResponse binary marshaller and ensure we watch the chan in ConfigEntry.Get even when no entry exists. (#5773 )	2019-05-02 15:25:29 -04:00
Paul Banks	df0c61fd31	Fix previous accidental master push 🤦 (#5771 ) * Fix previous accidental master push 🤦 * Fix ACL test	2019-05-02 15:49:37 +01:00
Paul Banks	95bb1e368f	Fix panic in Resolving service config when proxy-defaults isn't defined yet (#5769 )	2019-05-02 14:12:21 +01:00
Paul Banks	cf24e7d1ed	Fix uint8 conversion issues for service config response maps.	2019-05-02 14:11:33 +01:00
Paul Banks	078f4cf5bb	Add integration test for central config; fix central config WIP (#5752 ) * Add integration test for central config; fix central config WIP * Set proxy protocol correctly and begin adding upstream support * Add upstreams to service config cache key and start new notify watcher if they change. This doesn't update the tests to pass though. * Fix some merging logic to get things working manually with a hack (TODO fix properly) * Simplification to not allow enabling sidecars centrally - it makes no sense without upstreams anyway * Tests compile again and obvious ones pass. Lots of failures locally not debugged yet but may be flakes. Pushing up to see what CI does * Fix up service manager and API test failures * Remove the enable command since it no longer makes much sense without being able to turn on sidecar proxies centrally * Remove version.go hack - will make integration test fail until release * Remove unused code from commands and upstream merge * Re-bump version to 1.5.0	2019-05-01 16:39:31 -07:00
Matt Keeler	9c77f2c52a	Update to use a consulent build tag instead of just ent (#5759 )	2019-05-01 11:11:27 -04:00
Matt Keeler	ea6cbf01a5	Centralized Config CLI (#5731 ) * Add HTTP endpoints for config entry management * Finish implementing decoding in the HTTP Config entry apply endpoint * Add CAS operation to the config entry apply endpoint Also use this for the bootstrapping and move the config entry decoding function into the structs package. * First pass at the API client for the config entries * Fixup some of the ConfigEntry APIs Return a singular response object instead of a list for the ConfigEntry.Get RPC. This gets plumbed through the HTTP API as well. Don’t return QueryMeta in the JSON response for the config entry listing HTTP API. Instead just return a list of config entries. * Minor API client fixes * Attempt at some ConfigEntry api client tests These don’t currently work due to weak typing in JSON * Get some of the api client tests passing * Implement reflectwalk magic to correct JSON encoding a ProxyConfigEntry Also added a test for the HTTP endpoint that exposes the problem. However, since the test doesn’t actually do the JSON encode/decode it’s still failing. * Move MapWalk magic into a binary marshaller instead of JSON. * Add a MapWalk test * Get rid of unused func * Get rid of unused imports * Fixup some tests now that the decoding from msgpack coerces things into json compat types * Stub out most of the central config cli Fully implement the config read command. * Basic config delete command implementation * Implement config write command * Implement config list subcommand Not entirely sure about the output here. It’s basically the read output indented with a line specifying the kind/name of each type which is also duplicated in the indented output. * Update command usage * Update some help usage formatting * Add the connect enable helper cli command * Update list command output * Rename the config entry API client methods. * Use renamed apis * Implement config write tests Stub the others with the noTabs tests. * Change list output format Now just simply output 1 line per named config * Add config read tests * Add invalid args write test. * Add config delete tests * Add config list tests * Add connect enable tests * Update some CLI commands to use CAS ops This also modifies the HTTP API for a write op to return a boolean indicating whether the value was written or not. * Fix up the HTTP API CAS tests as I realized they weren’t testing what they should. * Update config entry rpc tests to properly test CAS * Fix up a few more tests * Fix some tests that were using ConfigEntries.Apply * Update config_write_test.go * Get rid of unused import	2019-04-30 16:27:16 -07:00
Matt Keeler	697efb588c	Make a few config entry endpoints return 404s and allow for snake_case and lowercase key names. (#5748 )	2019-04-30 18:19:19 -04:00
Matt Keeler	8beb5c6082	ACL Token ID Initialization (#5307 )	2019-04-30 11:45:36 -04:00
Kyle Havlovitz	64174f13d6	Add HTTP endpoints for config entry management (#5718 )	2019-04-29 18:08:09 -04:00
R.B. Boyer	ea2740fd32	Merge pull request #5617 from hashicorp/f-acl-ux Secure ACL Introduction for Kubernetes	2019-04-26 15:34:26 -05:00
Aestek	9813abcb09	Fix: fail messages after a node rename replace the new node definition (#5520 ) When receiving a serf failed message for a node which is not in the catalog, do not perform a register request to set its serf health to critical as it could overwrite the node information and services if it was renamed. Fixes: #5518	2019-04-26 21:33:41 +01:00
R.B. Boyer	5a505c5b3a	acl: adding support for kubernetes auth provider login (#5600 ) * auth providers * binding rules * auth provider for kubernetes * login/logout	2019-04-26 14:49:25 -05:00
R.B. Boyer	9542fdc9bc	acl: adding Roles to Tokens (#5514 ) Roles are named and can express the same bundle of permissions that can currently be assigned to a Token (lists of Policies and Service Identities). The difference with a Role is that it is not itself a bearer token, but just another entity that can be tied to a Token. This lets an operator potentially curate a set of smaller reusable Policies and compose them together into reusable Roles, rather than always exploding that same list of Policies on any Token that needs similar permissions. This also refactors the acl replication code to be semi-generic to avoid 3x copypasta.	2019-04-26 14:49:12 -05:00
R.B. Boyer	f43bc981e9	making ACLToken.ExpirationTime a *time.Time value instead of time.Time (#5663 ) This is mainly to avoid having the API return "0001-01-01T00:00:00Z" as a value for the ExpirationTime field when it is not set. Unfortunately time.Time doesn't respect the json marshalling "omitempty" directive.	2019-04-26 14:48:16 -05:00
R.B. Boyer	b3956e511c	acl: ACL Tokens can now be assigned an optional set of service identities (#5390 ) These act like a special cased version of a Policy Template for granting a token the privileges necessary to register a service and its connect proxy, and read upstreams from the catalog.	2019-04-26 14:48:04 -05:00
R.B. Boyer	76321aa952	acl: tokens can be created with an optional expiration time (#5353 )	2019-04-26 14:47:51 -05:00
Matt Keeler	3ea9fe3bff	Implement bootstrapping proxy defaults from the config file (#5714 )	2019-04-26 14:25:03 -04:00
Matt Keeler	3b5d38fb49	Implement config entry replication (#5706 )	2019-04-26 13:38:39 -04:00
Alvin Huang	96c2c79908	Add fmt and vet (#5671 ) * add go fmt and vet * go fmt fixes	2019-04-25 12:26:33 -04:00
Kyle Havlovitz	6faa8ba451	Fill out the service manager functionality and fix tests	2019-04-23 00:17:28 -07:00
Kyle Havlovitz	d51fd740bf	Merge pull request #5615 from hashicorp/config-entry-rpc Add RPC endpoints for config entry operations	2019-04-23 00:16:54 -07:00
Kyle Havlovitz	e64d1b8016	Rename config entry ACL methods	2019-04-22 23:55:11 -07:00
kaitlincarter-hc	7859d8c409	[docs] Server Performance (#5627 ) * Moving server performance guide to docs. * fixing broken links * updating broken link * fixing broken links	2019-04-17 13:17:12 -05:00
Matt Keeler	ac78c23021	Implement data filtering of some endpoints (#5579 ) Fixes: #4222 # Data Filtering This PR will implement filtering for the following endpoints: ## Supported HTTP Endpoints - `/agent/checks` - `/agent/services` - `/catalog/nodes` - `/catalog/service/:service` - `/catalog/connect/:service` - `/catalog/node/:node` - `/health/node/:node` - `/health/checks/:service` - `/health/service/:service` - `/health/connect/:service` - `/health/state/:state` - `/internal/ui/nodes` - `/internal/ui/services` More can be added going forward and any endpoint which is used to list some data is a good candidate. ## Usage When using the HTTP API a `filter` query parameter can be used to pass a filter expression to Consul. Filter Expressions take the general form of: ``` <selector> == <value> <selector> != <value> <value> in <selector> <value> not in <selector> <selector> contains <value> <selector> not contains <value> <selector> is empty <selector> is not empty not <other expression> <expression 1> and <expression 2> <expression 1> or <expression 2> ``` Normal boolean logic and precedence is supported. All of the actual filtering and evaluation logic is coming from the [go-bexpr](https://github.com/hashicorp/go-bexpr) library ## Other changes Adding the `Internal.ServiceDump` RPC endpoint. This will allow the UI to filter services better.	2019-04-16 12:00:15 -04:00
Kyle Havlovitz	2cffe4894f	Move the ACL logic into the ConfigEntry interface	2019-04-10 14:27:28 -07:00
Kyle Havlovitz	81254deb59	Add RPC endpoints for config entry operations	2019-04-06 23:38:08 -07:00
Alvin Huang	aacb81a566	Merge pull request #5376 from hashicorp/fix-tests Fix tests in prep for CircleCI Migration	2019-04-04 17:09:32 -04:00
Kyle Havlovitz	d6c25a13a5	Merge pull request #5539 from hashicorp/service-config Service config state model	2019-04-02 16:34:58 -07:00
Kyle Havlovitz	63c9434779	Cleaned up some error handling/comments around config entries	2019-04-02 15:42:12 -07:00
Kyle Havlovitz	ace5c7a1cb	Encode config entry FSM messages in a generic type	2019-03-28 00:06:56 -07:00
Kyle Havlovitz	96a460c0cf	Clean up service config state store methods	2019-03-27 16:52:38 -07:00
R.B. Boyer	ab57b02ff8	acl: memdb filter of tokens-by-policy was inverted (#5575 ) The inversion wasn't noticed because the parallel execution of TokenList tests was operating incorrectly due to variable shadowing.	2019-03-27 15:24:44 -05:00
Jeff Mitchell	d3c7d57209	Move internal/ to sdk/ (#5568 ) * Move internal/ to sdk/ * Add a readme to the SDK folder	2019-03-27 08:54:56 -04:00
Jeff Mitchell	a41c865059	Convert to Go Modules (#5517 ) * First conversion * Use serf 0.8.2 tag and associated updated deps * * Move freeport and testutil into internal/ * Make internal/ its own module * Update imports * Add replace statements so API and normal Consul code are self-referencing for ease of development * Adapt to newer goe/values * Bump to new cleanhttp * Fix ban nonprintable chars test * Update lock bad args test The error message when the duration cannot be parsed changed in Go 1.12 (ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test. * Update another test as well * Bump travis * Bump circleci * Bump go-discover and godo to get rid of launchpad dep * Bump dockerfile go version * fix tar command * Bump go-cleanhttp	2019-03-26 17:04:58 -04:00
Paul Banks	68e8933ba5	Connect: Make Connect health queries unblock correctly (#5508 ) * Make Connect health queries unblock correctly in all cases and use optimal number of watch chans. Fixes #5506. * Node check test cases and clearer bug test doc * Comment update	2019-03-21 16:01:56 +00:00
Kyle Havlovitz	c2cba68042	Fix fsm serialization and add snapshot/restore	2019-03-20 16:13:13 -07:00
Kyle Havlovitz	9df597b257	Fill out state store/FSM functions and add tests	2019-03-19 15:56:17 -07:00
Kyle Havlovitz	53913461db	Add config types and state store table	2019-03-19 10:06:46 -07:00
Kyle Havlovitz	bb0839ea5b	Condense some test logic and add a comment about renaming	2019-03-18 16:15:36 -07:00
Paul Banks	dd08426b04	Optimize health watching to single chan/goroutine. (#5449 ) Refs #4984. Watching chans for every node we touch in a health query is wasteful. In #4984 it shows that if there are more than 682 service instances we always fall back to watching all services which kills performance. We already have a record in MemDB that is reliably updated whenever the service health result should change thanks to per-service watch indexes. So in general, provided there is at least one service instance and we actually have a service index for it (we always do now) we only ever need to watch a single channel. This saves us from ever falling back to the general index and causing the performance cliff in #4984, but it also means fewer goroutines and work done for every blocking health query. It also saves some allocations made during the query because we no longer have to populate a WatchSet with 3 chans per service instance which saves the internal map allocation. This passes all state store tests except the one that explicitly checked for the fallback behaviour we've now optimized away and in general seems safe.	2019-03-15 20:18:48 +00:00
R.B. Boyer	d65008700a	acl: reduce complexity of token resolution process with alternative singleflighting (#5480 ) Switches acl resolution to use golang.org/x/sync/singleflight. For the identity/legacy lookups this is a drop-in replacement with the same overall approach to request coalescing. For policies this is technically a change in behavior, but when considered holistically is approximately performance neutral (with the benefit of less code). There are two goals with this blob of code (speaking specifically of policy resolution here): 1) Minimize cross-DC requests. 2) Minimize client-to-server LAN requests. The previous iteration of this code was optimizing for the case of many possibly different tokens being resolved concurrently that have a significant overlap in linked policies such that deduplication would be worth the complexity. While this is laudable there are some things to consider that can help to adjust expectations: 1) For v1.4+ policies are always replicated, and once a single policy shows up in a secondary DC the replicated data is considered authoritative for requests made in that DC. This means that our earlier concerns about minimizing cross-DC requests are irrelevant because there will be no cross-DC policy reads that occur. 2) For Server nodes the in-memory ACL policy cache is capped at zero, meaning it has no caching. Only Client nodes run with a cache. This means that instead of having an entire DC's worth of tokens (what a Server might see) that can have policy resolutions coalesced these nodes will only ever be seeing node-local token resolutions. In a reasonable worst-case scenario where a scheduler like Kubernetes has "filled" a node with Connect services, even that will only schedule ~100 connect services per node. If every service has a unique token there will only be 100 tokens to coalesce and even then those requests have to occur concurrently AND be hitting an empty consul cache. Instead of seeing a great coalescing opportunity for cutting down on redundant Policy resolutions, in practice it's far more likely given node densities that you'd see requests for the same token concurrently than you would for two tokens sharing a policy concurrently (to a degree that would warrant the overhead of the current variation of singleflighting). Given that, this patch switches the Policy resolution process to only singleflight by requesting token (but keeps the cache as by-policy).	2019-03-14 09:35:34 -05:00
Kyle Havlovitz	3aec844fd2	Update state store test for changing node ID	2019-03-13 17:05:31 -07:00
Kyle Havlovitz	df4ec913f0	Add a test for changing a failed node's ID	2019-03-13 15:39:07 -07:00
Hans Hasselberg	d511e86491	agent: enable reloading of tls config (#5419 ) This PR introduces reloading tls configuration. Consul will now be able to reload the TLS configuration which previously required a restart. It is not yet possible to turn TLS ON or OFF with these changes. Only when TLS is already turned on, the configuration can be reloaded. Most importantly the certificates and CAs.	2019-03-13 10:29:06 +01:00
R.B. Boyer	e9614ee92f	acl: correctly extend the cache for acl identities during resolution (#5475 )	2019-03-12 10:23:43 -05:00
Aestek	071fcb28ba	[catalog] Update the node's services indexes on update (#5458 ) Node updates were not updating the service indexes, which are used for service related queries. This caused the X-Consul-Index to stay the same after a node update as seen from a service query even though the node data is returned in health queries. If that happened in between queries the client would miss this change. We now update the indexes of the services on the node when it is updated. Fixes: #5450	2019-03-11 14:48:19 +00:00
Kyle Havlovitz	bf09061e86	Add logic to allow changing a failed node's ID	2019-03-07 22:42:54 -08:00
Alvin Huang	ece3b5907d	fix typos	2019-03-06 14:47:33 -05:00
R.B. Boyer	91e78e00c7	fix typos reported by golangci-lint:misspell (#5434 )	2019-03-06 11:13:28 -06:00
R.B. Boyer	c24e3584be	improve flaky LANReap tests by explicitly configuring the tombstone timeout In TestServer_LANReap autopilot is running, so the alternate flow through the serf reaping function is possible. In that situation the ReconnectTimeout is not relevant so for parity also override the TombstoneTimeout value. For additional parity update the TestServer_WANReap and TestClient_LANReap versions of this test in the same way even though autopilot is irrelevant here.	2019-03-05 14:34:03 -06:00
Matt Keeler	87f9365eee	Fixes for CVE-2019-8336 Fix error in detecting raft replication errors. Detect redacted token secrets and prevent attempting to insert. Add a Redacted field to the TokenBatchRead and TokenRead RPC endpoints This will indicate whether token secrets have been redacted. Ensure any token with a redacted secret in secondary datacenters is removed. Test that redacted tokens cannot be replicated.	2019-03-04 19:13:24 +00:00
Matt Keeler	612aba7ced	Don't modify memdb owned token data for get/list requests of tokens (#5412 ) Previously we were fixing up the token links directly on the *ACLToken returned by memdb. This invalidated some assumptions that a snapshot is immutable as well as potentially being able to cause a crash. The fix here is to give the policy link fixing function copy on write semantics. When no fixes are necessary we can return the memdb object directly, otherwise we copy it and create a new list of links. Eventually we might find a better way to keep those policy links in sync but for now this fixes the issue.	2019-03-04 09:28:46 -05:00
Matt Keeler	416a6543a6	Call RemoveServer for reap events (#5317 ) This ensures that servers are removed from RPC routing when they are reaped.	2019-03-04 09:19:35 -05:00
R.B. Boyer	d3be5c1d3a	fix ignored errors in state store internals as reported by errcheck	2019-03-01 14:18:00 -06:00
Matt Keeler	0c76a4389f	ACL Token Persistence and Reloading (#5328 ) This PR adds two features which will be useful for operators when ACLs are in use. 1. Tokens set in configuration files are now reloadable. 2. If `acl.enable_token_persistence` is set to `true` in the configuration, tokens set via the `v1/agent/token` endpoint are now persisted to disk and loaded when the agent starts (or during configuration reload) Note that token persistence is opt-in so our users who do not want tokens on the local disk will see no change. Some other secondary changes: * Refactored a bunch of places where the replication token is retrieved from the token store. This token isn't just for replicating ACLs and now it is named accordingly. * Allowed better paths in the `v1/agent/token/` API. Instead of paths like: `v1/agent/token/acl_replication_token` the path can now be just `v1/agent/token/replication`. The old paths remain valid. * Added a couple new API functions to set tokens via the new paths. Deprecated the old ones and pointed to the new names. The names are also generally better and don't imply that what you are setting is for ACLs but rather are setting ACL tokens. There is a minor semantic difference there especially for the replication token as again, it's no longer used only for ACL token/policy replication. The new functions will detect 404s and fall back to using the older token paths when talking to pre-1.4.3 agents. * Docs updated to reflect the API additions and to show using the new endpoints. * Updated the ACL CLI set-agent-tokens command to use the non-deprecated APIs.	2019-02-27 14:28:31 -05:00
Hans Hasselberg	75ababb54f	Centralise tls configuration part 1 (#5366 ) In order to be able to reload the TLS configuration, we need one way to generate the different configurations. This PR introduces a `tlsutil.Configurator` which holds a `tlsutil.Config`. Afterwards it is responsible for rendering every `tls.Config`. In this particular PR I moved `IncomingHTTPSConfig`, `IncomingTLSConfig`, and `OutgoingTLSWrapper` into `tlsutil.Configurator`. This PR is a pure refactoring - not a single feature added. And not a single test added. I only slightly modified existing tests as necessary.	2019-02-26 16:52:07 +01:00
Alvin Huang	c4168e6dfc	add wait to TestClient_JoinLAN	2019-02-22 17:34:45 -05:00
Alvin Huang	2e961d6539	add retry to TestResetSessionTimerLocked	2019-02-22 17:34:45 -05:00
R.B. Boyer	ae1cb27126	fix incorrect body of TestACLEndpoint_PolicyBatchRead Lifted from PR #5307 as it was an unrelated drive-by fix on that PR anyway. s/token/policy/	2019-02-22 09:32:51 -06:00
R.B. Boyer	8e344c0218	test: switch test file from assert -> require for consistency Also in acl_endpoint_test.go: * convert logical blocks in some token tests to subtests * remove use of require.New This removes a lot of noise in a later PR.	2019-02-14 14:21:19 -06:00
R.B. Boyer	57be6ca215	correct some typos	2019-02-13 13:02:12 -06:00
R.B. Boyer	a3e0fb8370	ensure that we plumb our configured logger into all parts of the raft library	2019-02-13 13:02:09 -06:00
R.B. Boyer	3b60891bf8	reduce the local scope of variable	2019-02-13 11:54:28 -06:00
R.B. Boyer	77d28fe9ce	clarify the ACL.PolicyDelete endpoint (#5337 ) There was an errant early-return in PolicyDelete() that bypassed the rest of the function. This was ok because the only caller of this function ignores the results. This removes the early-return making it structurally behave like TokenDelete() and for both PolicyDelete and TokenDelete clarify the lone callers to indicate that the return values are ignored. We may wish to avoid the entire return value as well, but this patch doesn't go that far.	2019-02-13 09:16:30 -06:00
R.B. Boyer	106d87a4a8	update TestStateStore_ACLBootstrap to not rely upon request mutation (#5335 )	2019-02-12 16:09:26 -06:00
Matt Keeler	fa2c7059a2	Move autopilot initialization to prevent race (#5322 ) `establishLeadership` invoked during leadership monitoring may use autopilot to do promotions etc. There was a race with doing that and having autopilot initialized and this fixes it.	2019-02-11 11:12:24 -05:00
Matt Keeler	210c3a56b0	Improve Connect with Prepared Queries (#5291 ) Given a query like: ``` { "Name": "tagged-connect-query", "Service": { "Service": "foo", "Tags": ["tag"], "Connect": true } } ``` And a Consul configuration like: ``` { "services": [ { "name": "foo", "port": 8080, "connect": { "sidecar_service": {} }, "tags": ["tag"] } ] } ``` If you executed the query it would always turn up with 0 results. This was because the sidecar service was being created without any tags. You could instead make your config look like: ``` { "services": [ { "name": "foo", "port": 8080, "connect": { "sidecar_service": { "tags": ["tag"] } }, "tags": ["tag"] } ] } ``` However that is a bit redundant for most cases. This PR ensures that the tags and service meta of the parent service get copied to the sidecar service. If there are any tags or service meta set in the sidecar service definition then this copying does not take place. After the changes, the query will now return the expected results. A second change was made to prepared queries in this PR which is to allow filtering on ServiceMeta just like we allow for filtering on NodeMeta.	2019-02-04 09:36:51 -05:00
R.B. Boyer	b5d71ea779	testutil: redirect some test agent logs to testing.T.Logf (#5304 ) When tests fail, only the logs for the failing run are dumped to the console which helps in diagnosis. This is easily added to other test scenarios as they come up.	2019-02-01 09:21:54 -06:00
Kyle Havlovitz	b30b541007	connect: Forward intention RPCs if this isn't the primary	2019-01-22 11:29:21 -08:00
Kyle Havlovitz	a731173661	Merge pull request #5249 from hashicorp/ca-fixes-oss Minor CA fixes	2019-01-22 11:25:09 -08:00
Kyle Havlovitz	b0f07d9b5e	Merge pull request #4869 from hashicorp/txn-checks Add node/service/check operations to transaction api	2019-01-22 11:16:09 -08:00
Matt Keeler	cc2cd75f5c	Fix several ACL token/policy resolution issues. (#5246 ) * Fix 2 remote ACL policy resolution issues 1. Use the right method to fire async not found errors when the ACL.PolicyResolve RPC returns that error. This was previously accidentally firing a token result instead of a policy result which would have effectively done nothing (unless there happened to be a token with a secret id == the policy id being resolved). 2. When concurrent policy resolution is being done we single flight the requests. The bug before was that for the policy resolution that was going to piggy back on another's RPC results it wasn’t waiting long enough for the results to come back due to looping with the wrong variable. * Fix a handful of other edge case ACL scenarios The main issue was that token specific issues (not able to access a particular policy or the token being deleted after initial fetching) were poisoning the policy cache. A second issue was that for concurrent token resolutions, the first resolution to get started would go fetch all the policies. If before the policies were retrieved a second resolution request came in, the new request would register watchers for those policies but then never block waiting for them to complete. This resulted in using the default policy when it shouldn't have.	2019-01-22 13:14:43 -05:00
Paul Banks	1c4dfbcd2e	connect: tame thundering herd of CSRs on CA rotation (#5228 ) * Support rate limiting and concurrency limiting CSR requests on servers; handle CA rotations gracefully with jitter and backoff-on-rate-limit in client * Add CSR rate limiting docs * Fix config naming and add tests for new CA configs	2019-01-22 17:19:36 +00:00
Kyle Havlovitz	4f53fe897a	oss: add the enterprise server stub for intention replication check	2019-01-18 17:32:10 -08:00
Matt Keeler	2f6a9edfac	Store leaf cert indexes in raft and use for the ModifyIndex on the returned certs (#5211 ) * Store leaf cert indexes in raft and use for the ModifyIndex on the returned certs This ensures that future certificate signings will have a strictly greater ModifyIndex than any previous certs signed.	2019-01-11 16:04:57 -05:00
Aestek	ff13518961	Improve blocking queries on services that do not exist (#4810 ) ## Background When making a blocking query on a missing service (was never registered, or is not registered anymore) the query returns as soon as any service is updated. On clusters with frequent updates (5~10 updates/s in our DCs) these queries virtually do not block, and clients with no protections against this waste resources on the agent and server side. Clients that do protect against this get updates later than they should because of the backoff time they implement between requests. ## Implementation While reducing the number of unnecessary updates we still want: * Clients to be notified as soon as the last instance of a service disappears. * Clients to be notified whenever there is an update for the service. * Clients to be notified as soon as the first instance of the requested service is added. To reduce the number of unnecessary updates we need to block when a request to a missing service is made. However in the following case: 1. Client `client1` makes a query for service `foo`, gets back a node and X-Consul-Index 42 2. `foo` is unregistered 3. `client1` makes a query for `foo` with `index=42` -> `foo` does not exist, the query blocks and `client1` is not notified of the change on `foo` We could store the last raft index when each service was last alive to know whether we should block on the incoming query or not, but that list could grow indefinitely. We instead store the last raft index when a service was unregistered and use it when a query targets a service that does not exist. When a service `srv` is unregistered this "missing service index" is always greater than any X-Consul-Index held by the clients while `srv` was up, allowing us to immediately notify them. 1. Client `client1` makes a query for service `foo`, gets back a node and `X-Consul-Index: 42` 2. `foo` is unregistered, we set the "missing service index" to 43 3. `client1` makes a blocking query for `foo` with `index=42` -> `foo` does not exist, we check against the "missing service index" and return immediately with `X-Consul-Index: 43` 4. `client1` makes a blocking query for `foo` with `index=43` -> we block 5. Other changes happen in the cluster, but foo still doesn't exist and "missing service index" hasn't changed, the query is still blocked 6. `foo` is registered again on index 62 -> `foo` exists and its index is greater than 43, we unblock the query	2019-01-11 09:26:14 -05:00
Matt Keeler	29b4512120	acl: Prevent tokens from deleting themselves (#5210 ) Fixes #4897 Also apparently token deletion could segfault in secondary DCs when attempting to delete nonexistent tokens. For that reason both checks are wrapped within the non-nil check.	2019-01-10 09:22:51 -05:00
Kyle Havlovitz	c266277a49	txn: clean up some state store/acl code	2019-01-09 11:59:23 -08:00
Pierre Souchay	5b8a7d7127	Avoid to have infinite recursion in DNS lookups when resolving CNAMEs (#4918 ) * Avoid to have infinite recursion in DNS lookups when resolving CNAMEs This will avoid killing Consul when a Service.Address is using CNAME to a Consul CNAME that creates an infinite recursion. This will fix https://github.com/hashicorp/consul/issues/4907 * Use maxRecursionLevel = 3 to allow several recursions	2019-01-07 16:53:54 -05:00
Paul Banks	0962e95e85	bugfix: use ServiceTags to generate cache key hash (#4987 ) * bugfix: use ServiceTags to generate cache key hash * update unit test * update * remote print log * Update .gitignore * Completely deprecate ServiceTag field internally for clarity * Add explicit test for CacheInfo cases	2019-01-07 21:30:47 +00:00
Kyle Havlovitz	8b1dc6a22c	txn: fix an issue with querying nodes by name instead of ID	2018-12-12 12:46:33 -08:00
Pierre Souchay	61870be137	[Travis][UnstableTests] Fixed unstable tests in travis (#5013 ) * [Travis][UnstableTests] Fixed unstable tests in travis as seen in https://travis-ci.org/hashicorp/consul/jobs/460824602 * Fixed unstable tests in https://travis-ci.org/hashicorp/consul/jobs/460857687	2018-12-12 12:09:42 -08:00
Kyle Havlovitz	efcdc85e1a	api: add support for new txn operations	2018-12-12 10:54:09 -08:00
Kyle Havlovitz	2408f99cca	txn: add tests for RPC endpoint	2018-12-12 10:04:10 -08:00
Kyle Havlovitz	9f4f673c4d	txn: add ACL enforcement/validation to new txn ops	2018-12-12 10:04:10 -08:00
Kyle Havlovitz	41e8120d3d	state: add tests for new txn ops	2018-12-12 10:04:10 -08:00
Kyle Havlovitz	a40a346be8	txn: add service operations	2018-12-12 10:04:10 -08:00
Kyle Havlovitz	b1aeb3b943	txn: add node operations	2018-12-12 10:04:10 -08:00
Kyle Havlovitz	bd6b7ad162	txn: add pre-check operations to txn endpoint	2018-12-12 10:04:10 -08:00
Kyle Havlovitz	8a0d7b65d6	Add check operations to transaction api	2018-12-12 10:04:10 -08:00
Kyle Havlovitz	e7946197b8	connect/ca: prevent blank CA config in snapshot This PR both prevents a blank CA config from being written out to a snapshot and allows Consul to gracefully recover from a snapshot with an invalid CA config. Fixes #4954.	2018-12-06 17:40:53 -08:00
R.B. Boyer	c86eff8859	agent: remove some stray fmt.Print* calls (#5015 )	2018-11-29 09:45:51 -06:00
Pierre Souchay	d0ca1bade9	Fixed another list of unstable unit tests in travis (#4915 ) * Fixed another list of unstable unit tests in travis Fixed failing tests in https://travis-ci.org/hashicorp/consul/jobs/451357061 * Fixed another list of unstable unit tests in travis. Fixed failing tests in https://travis-ci.org/hashicorp/consul/jobs/451357061	2018-11-20 11:27:26 +00:00
Kyle Havlovitz	3cc7d6ebb5	Merge pull request #4952 from hashicorp/test-version tests: Bump test server version to 1.4.0	2018-11-13 13:37:10 -08:00
R.B. Boyer	8662a6d260	acl: add stub hooks to support some plumbing in enterprise (#4951 )	2018-11-13 15:35:54 -06:00
Kyle Havlovitz	19f9cad3fe	oss: bump test server version to 1.4.0	2018-11-13 13:13:26 -08:00
Aestek	4fb564abbc	Fix catalog tag filter backward compat (#4944 ) Fix catalog service node filtering (ex /v1/catalog/service/srv?tag=tag1) between agent version <=v1.2.3 and server >=v1.3.0. New server version did not account for the old field when filtering hence request made from old agent were not tag-filtered.	2018-11-13 14:44:36 +00:00
Kyle Havlovitz	b0dcf54e50	Merge pull request #4917 from hashicorp/replication-token-cleanup Use acl replication_token for connect	2018-11-12 09:12:54 -08:00
Kyle Havlovitz	038aefa0bc	update non-voting server test to fix enterprise diff	2018-11-09 12:50:24 -08:00
Kyle Havlovitz	70accbb2e0	oss: do a proper check-and-set on the CA roots/config fsm operation	2018-11-09 12:36:23 -08:00
R.B. Boyer	2e29f234b1	acl: fixes ACL replication for legacy tokens without AccessorIDs (#4885 )	2018-11-07 07:59:44 -08:00
Kyle Havlovitz	1a4204f363	agent: fix formatting	2018-11-07 02:16:03 -08:00
R.B. Boyer	a5d57f5326	fix comment typos (#4890 )	2018-11-02 12:00:39 -05:00
Kyle Havlovitz	5b7b8bf842	Merge pull request #4872 from hashicorp/node-snapshot-fix Node ID/datacenter snapshot fix	2018-10-31 15:51:07 -07:00
Matt Keeler	26b1873b3b	Adds documentation for the new ACL APIs (#4851 ) * Update the ACL API docs * Add a CreateTime to the anon token Also require acl:read permissions at least to perform rule translation. Don’t want someone DoSing the system with an open endpoint that actually does a bit of work. * Fix one place where I was referring to id instead of AccessorID * Add godocs for the API package additions. * Minor updates: removed some extra commas and updated the acl intro paragraph * minor tweaks * Updated the language to be clearer * Updated the language to be clearer for policy page * I was also confused by that! Your updates are much clearer. Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com> * Sounds much better. Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com> * Updated sidebar layout and deprecated warning	2018-10-31 15:11:51 -07:00
Matt Keeler	ec9934b6f8	Remaining ACL Unit Tests (#4852 ) * Add leader token upgrade test and fix various ACL enablement bugs * Update the leader ACL initialization tests. * Add a StateStore ACL tests for ACLTokenSet and ACLTokenGetBy* functions * Advertise the agents acl support status with the agent/self endpoint. * Make batch token upsert CAS’able to prevent consistency issues with token auto-upgrade * Finish up the ACL state store token tests * Finish the ACL state store unit tests Also rename some things to make them more consistent. * Do as much ACL replication testing as I can.	2018-10-31 13:00:46 -07:00
Kyle Havlovitz	cf2210b5c5	fsm: update snapshot/restore test to include ID and datacenter	2018-10-30 15:53:14 -07:00
Kyle Havlovitz	58ff5e46cb	fsm: add missing ID/datacenter to persistNodes	2018-10-30 15:52:54 -07:00
Matt Keeler	0dd537e506	Fix the NonVoter Bootstrap test (#4786 )	2018-10-24 10:23:50 -04:00

667 commits