open-consul/agent
R.B. Boyer 813d69622e
agent: handle re-bootstrapping in a secondary datacenter when WAN federation via mesh gateways is configured (#7931)
The main fix here is to always union the `primary-gateways` list with
the list of mesh gateways in the primary returned from the replicated
federation states list. This will allow any replicated (incorrect) state
to be supplemented with user-configured (correct) state in the config
file. Eventually the game of random selection whack-a-mole will pick a
winning entry and re-replicate the latest federation states from the
primary. If the user-configured state is actually the incorrect one,
then the same eventual correct selection process will work in that case,
too.
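
As a rough illustration of the union described above, the following Go
sketch merges a user-configured gateway list with a replicated one and
drops duplicates. It is not the code from this commit: the function name
`unionGateways`, its parameters, and the sample addresses are all
hypothetical, and the real agent may represent gateways differently.

```go
package main

import (
	"fmt"
	"sort"
)

// unionGateways merges the user-configured primary gateway addresses with
// the addresses learned from replicated federation state, dropping
// duplicates. The name and signature are hypothetical, not Consul's API.
func unionGateways(configured, replicated []string) []string {
	seen := make(map[string]struct{}, len(configured)+len(replicated))
	out := make([]string, 0, len(configured)+len(replicated))
	for _, list := range [][]string{configured, replicated} {
		for _, addr := range list {
			if _, ok := seen[addr]; ok {
				continue // already present from the other source
			}
			seen[addr] = struct{}{}
			out = append(out, addr)
		}
	}
	// Sort only so the result is deterministic for display; a caller
	// doing random selection would not depend on the order.
	sort.Strings(out)
	return out
}

func main() {
	// Sample addresses are made up for illustration.
	configured := []string{"10.0.0.1:8443", "10.0.0.2:8443"}
	replicated := []string{"10.0.0.2:8443", "10.0.9.9:8443"} // possibly stale
	fmt.Println(unionGateways(configured, replicated))
}
```

Because both sources stay in the candidate set, a stale entry from either
side only costs a failed attempt; it cannot hide the correct address.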

The secondary fix is to finish making wanfed-via-mgws work as originally
designed. Once a secondary datacenter has replicated federation states
for the primary AND managed to stand up its own local mesh gateways,
all of the RPCs from a secondary to the primary SHOULD go through two
sets of mesh gateways to arrive at the consul servers in the primary
(one hop through the secondary datacenter's mesh gateway, and one hop
through the primary datacenter's mesh gateway). This was neglected in
the initial implementation. While everything works without it, ideally
we should treat communications that go around the mesh gateways as
provided only for bootstrapping purposes.

Now we heuristically use the success/failure history of the federation
state replicator goroutine loop to determine if our current mesh gateway
route is working as intended. If it is, we try using the local gateways,
and if those don't work we fall back on trying the primary via the union
of the replicated state and the go-discover configuration flags.

This can be improved slightly in the future by possibly initializing the
gateway choice to local on startup if we already have replicated state.
This PR does not address that improvement.

Fixes #7339
2020-05-27 11:31:10 -05:00
ae agent: ensure node info sync and full sync. (#7189) 2020-02-06 15:30:58 +01:00
agentpb server: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#7419) 2020-03-10 11:15:22 -05:00
cache agent/cache: remove error return from fetch 2020-04-17 11:55:01 -04:00
cache-types Fix a number of problems found by staticcheck 2020-05-19 16:50:14 -04:00
checks grpc: use default resolver scheme for grpc dialing (#7617) 2020-05-20 22:26:26 +02:00
config Allow to restrict servers that can join a given Serf Consul cluster. (#7628) 2020-05-20 11:31:19 +02:00
connect ci: Run all connect/ca tests from the integration suite 2020-03-24 15:22:01 -04:00
consul agent: handle re-bootstrapping in a secondary datacenter when WAN federation via mesh gateways is configured (#7931) 2020-05-27 11:31:10 -05:00
debug fix comment typos (#4890) 2018-11-02 12:00:39 -05:00
exec fix go vet issue 2017-10-25 19:30:35 +02:00
local tests: fix unstable test TestAgentAntiEntropy_Checks. (#7594) 2020-05-14 09:54:49 +02:00
metadata wan federation via mesh gateways (#6884) 2020-03-09 15:59:02 -05:00
mock Sync of OSS changes to support namespaces (#6909) 2019-12-09 21:26:41 -05:00
pool rpc: oss changes for network area connection pooling (#7735) 2020-04-30 22:12:17 +02:00
proxycfg Standardize support for Tagged and BindAddresses in Ingress Gateways (#7924) 2020-05-21 09:08:12 -05:00
router Fix a number of problems found by staticcheck 2020-05-19 16:50:14 -04:00
structs Add unconvert linter 2020-05-12 13:47:25 -04:00
systemd agent: notify systemd after JoinLAN (#2121) 2017-06-21 06:43:55 +02:00
token Updates to allow for using an enterprise specific token as the agents token 2020-04-28 09:44:26 -04:00
xds connect: fix endpoints clusterName when using cluster escape hatch (#7319) 2020-05-26 10:57:22 +02:00
acl.go Fix identity resolution on clients and in secondary dcs (#7862) 2020-05-13 13:00:08 -04:00
acl_endpoint.go test: move some test helpers over from enterprise (#7754) 2020-05-01 14:52:15 -05:00
acl_endpoint_legacy.go Use encoding/json as JSON decoder instead of mapstructure (#6680) 2019-10-29 11:13:36 -07:00
acl_endpoint_legacy_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
acl_endpoint_test.go acl: add auth method for JWTs (#7846) 2020-05-11 20:59:29 -05:00
acl_test.go Fix identity resolution on clients and in secondary dcs (#7862) 2020-05-13 13:00:08 -04:00
agent.go Stop all watches before shuting down anything dring shutdown. (#7526) 2020-05-26 10:01:49 +02:00
agent_endpoint.go Fix a number of problems found by staticcheck 2020-05-19 16:50:14 -04:00
agent_endpoint_test.go tests: added unit test to ensure watches are not re-triggered on consul reload (#7449) 2020-05-20 12:38:29 +02:00
agent_oss.go Some boilerplate to allow for ACL Bootstrap disabling configurability 2020-04-28 09:42:46 -04:00
agent_test.go Add unconvert linter 2020-05-12 13:47:25 -04:00
bindata_assetfs.go update bindata_assetfs.go 2020-02-11 15:19:16 +00:00
blacklist.go Adds the ability to blacklist specific HTTP endpoints. (#3252) 2017-07-10 13:51:25 -07:00
blacklist_test.go Adds the ability to blacklist specific HTTP endpoints. (#3252) 2017-07-10 13:51:25 -07:00
catalog_endpoint.go Catalog + Namespace OSS changes. (#7219) 2020-02-10 10:40:44 -05:00
catalog_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
check.go Sync of OSS changes to support namespaces (#6909) 2019-12-09 21:26:41 -05:00
config_endpoint.go Small refactoring to move meta parsing into the switch statement (#7170) 2020-01-29 19:12:48 -05:00
config_endpoint_test.go Expect default enterprise metadata in gateway tests (#7664) 2020-04-20 09:02:35 -05:00
connect_auth.go Intentions ACL enforcement updates (#7028) 2020-01-13 15:51:40 -05:00
connect_ca_endpoint.go connect: Add AWS PCA provider (#6795) 2019-11-21 17:40:29 +00:00
connect_ca_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
coordinate_endpoint.go Use encoding/json as JSON decoder instead of mapstructure (#6680) 2019-10-29 11:13:36 -07:00
coordinate_endpoint_test.go Fix a number of problems found by staticcheck 2020-05-19 16:50:14 -04:00
discovery_chain_endpoint.go Remove deadcode 2020-04-22 16:48:28 -04:00
discovery_chain_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
dns.go Ingress Gateways for TCP services (#7509) 2020-04-16 14:00:48 -07:00
dns_oss.go Sync of OSS changes to support namespaces (#6909) 2019-12-09 21:26:41 -05:00
dns_test.go Fix a number of problems found by staticcheck 2020-05-19 16:50:14 -04:00
enterprise_delegate_oss.go Update to use a consulent build tag instead of just ent (#5759) 2019-05-01 11:11:27 -04:00
event_endpoint.go Allow users to configure either unstructured or JSON logging (#7130) 2020-01-28 17:50:41 -06:00
event_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
federation_state_endpoint.go wan federation via mesh gateways (#6884) 2020-03-09 15:59:02 -05:00
health_endpoint.go Catalog + Namespace OSS changes. (#7219) 2020-02-10 10:40:44 -05:00
health_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
http.go Handle error from template.Execute 2020-05-19 16:50:14 -04:00
http_decode_test.go Remove deadcode 2020-04-22 16:48:28 -04:00
http_oss.go http: migrate from instrumentation in s.wrap() to an s.enterpriseHandler() 2020-05-13 15:47:05 -07:00
http_oss_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
http_register.go Gateway Services Nodes UI Endpoint (#7685) 2020-05-11 11:35:17 -06:00
http_test.go Fix a number of problems found by staticcheck 2020-05-19 16:50:14 -04:00
intentions_endpoint.go Fix a couple bugs regarding intentions with namespaces (#7169) 2020-01-29 17:30:38 -05:00
intentions_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
keyring.go agent: sensible keyring error (#7272) 2020-02-13 20:35:09 +01:00
keyring_test.go Rename NewTestAgentWithFields to StartTestAgent 2020-03-31 17:14:55 -04:00
kvs_endpoint.go docs: add docs for kv_max_value_size (#7405) 2020-03-09 11:13:40 +01:00
kvs_endpoint_test.go Fix a number of problems found by staticcheck 2020-05-19 16:50:14 -04:00
notify.go Fixes memory leak when blocking on /event/list (#4482) 2018-08-02 14:54:48 +01:00
notify_test.go Fixes memory leak when blocking on /event/list (#4482) 2018-08-02 14:54:48 +01:00
operator_endpoint.go Use encoding/json as JSON decoder instead of mapstructure (#6680) 2019-10-29 11:13:36 -07:00
operator_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
prepared_query_endpoint.go Add support for dual stack IPv4/IPv6 network (#6640) 2020-01-17 09:54:17 -05:00
prepared_query_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
remote_exec.go Allow users to configure either unstructured or JSON logging (#7130) 2020-01-28 17:50:41 -06:00
remote_exec_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
retry_join.go wan federation via mesh gateways (#6884) 2020-03-09 15:59:02 -05:00
retry_join_test.go wan federation via mesh gateways (#6884) 2020-03-09 15:59:02 -05:00
service_checks_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
service_manager.go Enable CLI to register terminating gateways (#7500) 2020-03-26 10:20:56 -06:00
service_manager_test.go Rename NewTestAgentWithFields to StartTestAgent 2020-03-31 17:14:55 -04:00
session_endpoint.go Fix session backwards incompatibility with 1.6.x and earlier. 2020-03-05 15:34:55 -05:00
session_endpoint_test.go Fix a number of problems found by staticcheck 2020-05-19 16:50:14 -04:00
sidecar_service.go wan federation via mesh gateways (#6884) 2020-03-09 15:59:02 -05:00
sidecar_service_test.go Rename NewTestAgentWithFields to StartTestAgent 2020-03-31 17:14:55 -04:00
signal_unix.go cli: forward SIGTERM to child process of 'lock' and 'watch' subcommands (#4737) 2018-10-02 15:57:21 -05:00
signal_windows.go cli: forward SIGTERM to child process of 'lock' and 'watch' subcommands (#4737) 2018-10-02 15:57:21 -05:00
snapshot_endpoint.go Remove SnapshotRPC passthrough 2020-04-13 12:32:57 -04:00
snapshot_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
status_endpoint.go Allow forwarding of some status RPCs (#6198) 2019-07-25 14:26:22 -04:00
status_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
testagent.go Rename NewTestAgentWithFields to StartTestAgent 2020-03-31 17:14:55 -04:00
testagent_test.go New config parser, HCL support, multiple bind addrs (#3480) 2017-09-25 11:40:42 -07:00
translate_addr.go Add the v1/catalog/node-services/:node endpoint (#7115) 2020-01-24 09:27:25 -05:00
txn_endpoint.go docs: add docs for kv_max_value_size (#7405) 2020-03-09 11:13:40 +01:00
txn_endpoint_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
ui_endpoint.go Gateway Services Nodes UI Endpoint (#7685) 2020-05-11 11:35:17 -06:00
ui_endpoint_test.go Fix a number of problems found by staticcheck 2020-05-19 16:50:14 -04:00
user_event.go agent: ensure that we always use the same settings for msgpack (#7245) 2020-02-07 15:50:24 -06:00
user_event_test.go Remove name from NewTestAgent 2020-03-31 16:13:44 -04:00
util.go agent: ensure that we always use the same settings for msgpack (#7245) 2020-02-07 15:50:24 -06:00
util_test.go Fixed unstable test TestForwardSignals() 2020-04-03 14:23:03 +02:00
watch_handler.go Allow users to configure either unstructured or JSON logging (#7130) 2020-01-28 17:50:41 -06:00
watch_handler_test.go Allow users to configure either unstructured or JSON logging (#7130) 2020-01-28 17:50:41 -06:00