open-consul

Commit Graph

Author	SHA1	Message	Date
Daniel Nephin	8a68c6d517	lib/retry: allow jitter to exceed max wait. I changed this in https://github.com/hashicorp/consul/pull/8802#pullrequestreview-500779357 because exceeding the MaxWait seemed wrong, but as others have pointed out, that behaviour is probably correct. When multiple waiters hit the max value, we don't want them to converge, so restore the behaviour of allowing jitter to exceed max, and document it.	2021-04-07 18:33:11 -04:00
Daniel Nephin	23df31f7c0	Merge pull request #8698 from pierreca/fix-iserreof Use errors.Is() in IsErrEOF()	2021-03-16 17:56:15 -04:00
Daniel Nephin	3685f39970	lib/mutex: add mutex with TryLock and update vendor	2021-01-25 18:01:47 -05:00
Daniel Nephin	ef0999547a	testing: skip slow tests with -short Add a skip condition to all tests slower than 100ms. This change was made using `gotestsum tool slowest` with data from the last 3 CI runs of master. See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests With this change: ``` $ time go test -count=1 -short ./agent ok github.com/hashicorp/consul/agent 0.743s real 0m4.791s $ time go test -count=1 -short ./agent/consul ok github.com/hashicorp/consul/agent/consul 4.229s real 0m8.769s ```	2020-12-07 13:42:55 -05:00
Kit Patella	7c3013a60f	add note about deleting TelemetryConfig.MergeDefaults in the future	2020-11-16 15:53:52 -08:00
Kit Patella	4c30ebbb73	fix some tests that were broken from the TelemetryConfig change	2020-11-16 15:22:36 -08:00
Kit Patella	64c82130b9	prometheussink has the same number of params again	2020-11-16 14:01:40 -08:00
Kit Patella	6290be054a	use the MetricsPrefix to set the service name and provide as slice literal to avoid bugs from append modifying its first arg	2020-11-16 14:01:12 -08:00
Kit Patella	464d13d80b	push prometheus sink definitions into prometheus.PrometheusOpts	2020-11-16 12:44:47 -08:00
Kit Patella	9533372ded	first pass on agent-configured prometheusDefs and adding defs for every consul metric	2020-11-12 18:12:12 -08:00
Kit Patella	233a552bbe	remove definitions for consul.runtime... metrics - they're prepended with hostnames and won't init	2020-11-04 14:02:47 -08:00
Kit Patella	7f362b2d09	add definitions for key metrics. This will not build until we have the definitions patch to go-metrics	2020-11-02 15:01:00 -08:00
Daniel Nephin	09d62f1df0	lib/ttlcache: unexport key and additional godoc	2020-10-20 19:16:03 -04:00
Daniel Nephin	2601998766	lib/ttlcache: add a constant for NotIndexed	2020-10-20 19:10:20 -04:00
Daniel Nephin	0beaced90f	cache: fix a bug with Prepopulate Prepopulate was setting entry.Expiry.HeapIndex to 0. Previously this would result in a call to heap.Fix(0) which wasn't correct, but was also not really a problem because at worst it would re-notify. With the recent change to extract cachettl it was changed to call Update(idx), which would have updated the wrong entry. A previous commit removed the setting of entry.Expiry so that the HeapIndex would be reported as -1, and this commit adds a test and handles the -1 heap index.	2020-10-20 19:10:20 -04:00
Daniel Nephin	9d5b738cdb	lib/ttlcache: extract package from agent/cache	2020-10-20 19:10:20 -04:00
Kit Patella	40b9769b1f	Merge pull request #8877 from hashicorp/mkcp/telemetry/consul.api.http Add flag for disabling 1.9 metrics backwards compatibility and warnings when set to default	2020-10-08 13:22:37 -07:00
Matt Keeler	141eb60f06	Add per-agent reconnect timeouts (#8781 ) This allows for client agent to be run in a more stateless manner where they may be abruptly terminated and not expected to come back. If advertising a per-agent reconnect timeout using the advertise_reconnect_timeout configuration when that agent leaves, other agents will wait only that amount of time for the agent to come back before reaping it. This has the advantageous side effect of causing servers to deregister the node/services/checks for that agent sooner than if the global reconnect_timeout was used.	2020-10-08 15:02:19 -04:00
Kit Patella	328036dd37	add config flag to disable 1.9 metrics backwards compatibility. Add warnings on start and reload on default value	2020-10-07 17:12:52 -07:00
Daniel Nephin	40aac46cf4	lib/retry: Refactor to reduce the interface surface Reduce Jitter to one function Rename NewRetryWaiter Fix a bug in calculateWait where maxWait was applied before jitter, which would make it possible to wait longer than maxWait.	2020-10-04 18:12:42 -04:00
Daniel Nephin	7d82b21206	lib/retry: export fields The fields are only ever read by Waiter, and setting the fields makes the calling code read much better without having to create a bunch of constants that only ever get used once.	2020-10-04 17:43:02 -04:00
Daniel Nephin	0c7f9c72d7	lib/retry: extract a new package from lib	2020-10-04 17:43:01 -04:00
Pierre Cauchois	d620babae3	use errors.As() for wrapped ServerError	2020-09-24 19:23:48 +00:00
Pierre Cauchois	e70b5a33d8	ServerError type check before EOF string comparison	2020-09-19 01:59:04 +00:00
Pierre Cauchois	1d7b5bc5c0	remove t.Parallel()	2020-09-18 01:16:01 +00:00
Pierre Cauchois	736a04a473	Add unit tests for isErrEOF()	2020-09-17 21:43:04 +00:00
Pierre Cauchois	b35874e1a6	Use errors.Is() in IsErrEOF() IsErrEOF returns false when it should return true in a couple of cases: 1. if the error has been wrapped in another error (for example, if EOF is wrapped in an RPC error) 2. if the error has been created from an Error field in an RPC response (as it is the case in CallWithCodec in the net-rpc-msgpackrpc package for example)	2020-09-17 01:42:06 +00:00
Daniel Nephin	a520cf3ea7	testing: disable global metrics sink in tests This might be better handled by allowing configuration for the InMemSink interval and retain, and disabling the global. For now this is a smaller change to remove the goroutine leak caused by tests because go-metrics does not provide any way of shutting down the global goroutine.	2020-08-18 19:04:57 -04:00
Daniel Nephin	7d5f1ba6bd	Merge pull request #8176 from hashicorp/dnephin/add-linter-unparam-1 lint: add unparam linter and fix some of the issues	2020-06-25 15:34:48 -04:00
Daniel Nephin	07c1081d39	Fix a bunch of unparam lint issues	2020-06-24 13:00:14 -04:00
Matt Keeler	e395efdbdc	Add test to ensure the StopChannelContext works properly	2020-06-24 12:34:57 -04:00
Daniel Nephin	1ef8279ac9	Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4 ci: enable SA4006 staticcheck check and add ineffassign	2020-06-17 12:16:02 -04:00
Daniel Nephin	8753d1f1ba	ci: Add ineffassign linter And fix an additional ineffective assignment that was not caught by staticcheck	2020-06-16 17:32:50 -04:00
Daniel Nephin	97342de262	Merge pull request #8070 from hashicorp/dnephin/add-gofmt-simplify ci: Enable gofmt simplify	2020-06-16 17:18:38 -04:00
Matt Keeler	d994dc7b35	Agent Auto Configuration: Configuration Syntax Updates (#8003 )	2020-06-16 15:03:22 -04:00
Daniel Nephin	89d95561df	Enable gofmt simplify Code changes done automatically with 'gofmt -s -w'	2020-06-16 13:21:11 -04:00
Daniel Nephin	0fb5e53d14	decode: do not modify the source data in HookTranslateKeys This was causing a 'fatal error: concurrent map iteration and map write' with gateways	2020-06-15 14:22:41 -04:00
Daniel Nephin	dad8f29d4e	decode: Only recursively unslice when the target is an interface{}	2020-06-15 12:56:51 -04:00
Daniel Nephin	f613c919d2	decode: recursively unslice opaque config Also handle []interface{} in HookWeakDecodeFromSlice Without this change only the top level []map[string]interface{} will be unpacked as a single item. With this change any nested config will be unpacked.	2020-06-12 22:00:33 -04:00
Daniel Nephin	7b99d9a25d	config: add HookWeakDecodeFromSlice Currently opaque config blocks (config entries, and CA provider config) are modified by PatchSliceOfMaps, making it impossible for these opaque config sections to contain slices of maps. In order to fix this problem, any lazy-decoding of these blocks needs to support weak decoding of []map[string]interface{} to a struct type before PatchSliceOfMaps is replaced. This is necessary because these config blobs are persisted, and during an upgrade an older version of Consul could read one of the new configuration values, which would cause an error. To support the upgrade path, this commit first introduces the new hooks for weak decoding of []map[string]interface{} and uses them only in the lazy-decode paths. That way, in a future release, new style configuration will be supported by the older version of Consul. This decode hook has a number of advantages: 1. It no longer panics. It allows mapstructure to report the error 2. It no longer requires the user to declare which fields are slices of structs. It can deduce that information from the 'to' value. 3. It will make it possible to preserve opaque configuration, allowing for structured opaque config.	2020-06-08 17:05:09 -04:00
Daniel Nephin	e8a883e829	Replace goe/verify.Values with testify/require.Equal (#7993 ) * testing: replace most goe/verify.Values with require.Equal One difference between these two comparisons is that goe/verify considers nil slices/maps to be equal to empty slices/maps, whereas testify/require does not, and does not appear to provide any way to enable that behaviour. Because of this difference some expected values were changed from empty slices to nil slices, and some calls to verify.Values were left. * Remove github.com/pascaldekloe/goe/verify Reduce the number of assertion packages we use from 2 to 1	2020-06-02 12:41:25 -04:00
Daniel Nephin	e359b10f77	Merge pull request #7963 from hashicorp/dnephin/replace-lib-translate-keys Replace lib.TranslateKeys with a mapstructure decode hook	2020-05-27 16:51:26 -04:00
Daniel Nephin	8f939da431	config: use the new HookTranslateKeys instead of lib.TranslateKeys With the exception of CA provider config, which will be migrated at some later time.	2020-05-27 16:24:47 -04:00
Daniel Nephin	8dc52a56ea	config: add HookTranslateKeys This hook replaces lib.TranslateKeys and has a number of advantages: 1. Primarily, aliases for fields are defined on the field itself, making the aliases much easier to maintain, and more obvious to the reader. 2. TranslateKeys translation rules are not aware of structure. It could very easily incorrectly translate a key on one struct that was intended to be a translation rule for a completely different struct, leading to very hard to debug errors. The hook removes the need for the unexpected "translation rule is an empty string to indicate stop traversal" special case. 3. TranslateKeys attempts to duplicate a bunch of tree traversal logic that already exists in mapstructure. Using mapstructure for traversal removes the need to traverse the entire structure multiple times, and makes the behaviour more obvious to the reader. This change is being made to enable a future change of replacing PatchSliceOfMaps. TranslateKeys sits in between PatchSliceOfMaps and mapstructure.Decode, so it must be converted to a hook first, before PatchSliceOfMaps can be replaced by a decode hook.	2020-05-27 16:24:47 -04:00
R.B. Boyer	54c7f825d6	create lib/stringslice package (#7934 )	2020-05-27 11:47:32 -05:00
R.B. Boyer	813d69622e	agent: handle re-bootstrapping in a secondary datacenter when WAN federation via mesh gateways is configured (#7931 ) The main fix here is to always union the `primary-gateways` list with the list of mesh gateways in the primary returned from the replicated federation states list. This will allow any replicated (incorrect) state to be supplemented with user-configured (correct) state in the config file. Eventually the game of random selection whack-a-mole will pick a winning entry and re-replicate the latest federation states from the primary. If the user-configured state is actually the incorrect one, then the same eventual correct selection process will work in that case, too. The secondary fix is actually to finish making wanfed-via-mgws actually work as originally designed. Once a secondary datacenter has replicated federation states for the primary AND managed to stand up its own local mesh gateways then all of the RPCs from a secondary to the primary SHOULD go through two sets of mesh gateways to arrive in the consul servers in the primary (one hop for the secondary datacenter's mesh gateway, and one hop through the primary datacenter's mesh gateway). This was neglected in the initial implementation. While everything works, ideally we should treat communications that go around the mesh gateways as just provided for bootstrapping purposes. Now we heuristically use the success/failure history of the federation state replicator goroutine loop to determine if our current mesh gateway route is working as intended. If it is, we try using the local gateways, and if those don't work we fall back on trying the primary via the union of the replicated state and the go-discover configuration flags. This can be improved slightly in the future by possibly initializing the gateway choice to local on startup if we already have replicated state. This PR does not address that improvement. Fixes #7339	2020-05-27 11:31:10 -05:00
Kyle Havlovitz	04b6bd637a	Filter wildcard gateway services to match listener protocol This now requires some type of protocol setting in ingress gateway tests to ensure the services are not filtered out. - small refactor to add a max(x, y) function - Use internal configEntryTxn function and add MaxUint64 to lib	2020-05-06 15:06:13 -05:00
R.B. Boyer	032e0ae901	cli: fix usage of gzip.Reader to better detect corrupt snapshots during save/restore (#7697 )	2020-04-24 17:18:56 -05:00
R.B. Boyer	a7fb26f50f	wan federation via mesh gateways (#6884 ) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Matt Keeler	3e56f5c8b8	ACL enforcement for the agent/health/services endpoints (#7191 ) ACL enforcement for the agent/health/services endpoints	2020-01-31 11:16:24 -05:00

103 Commits