open-consul/agent
Aestek ff13518961 Improve blocking queries on services that do not exist (#4810)
## Background

When a blocking query targets a missing service (one that was never registered, or is no longer registered), the query returns as soon as any service is updated.
On clusters with frequent updates (5~10 updates/s in our DCs) these queries virtually never block, and clients with no protection against this waste resources on both the agent and the server side. Clients that do protect against it get updates later than they should, because of the backoff time they implement between requests.

## Implementation

While reducing the number of unnecessary updates, we still want:
* Clients to be notified as soon as the last instance of a service disappears.
* Clients to be notified whenever there is an update for the service.
* Clients to be notified as soon as the first instance of the requested service is added.

To reduce the number of unnecessary updates we need to block when a request to a missing service is made. However, blocking unconditionally would hide a deregistration from the client in the following case:

1. Client `client1` makes a query for service `foo`, gets back a node and `X-Consul-Index: 42`
2. `foo` is unregistered
3. `client1` makes a query for `foo` with `index=42` -> `foo` does not exist, the query blocks and `client1` is not notified of the change on `foo`

We could store, for each service, the last raft index at which it was alive to know whether we should block on the incoming query or not, but that list could grow indefinitely.
We instead store the last raft index at which a service was unregistered and use it when a query targets a service that does not exist.
When a service `srv` is unregistered this "missing service index" is always greater than any X-Consul-Index held by the clients while `srv` was up, allowing us to immediately notify them.

1. Client `client1` makes a query for service `foo`, gets back a node and `X-Consul-Index: 42`
2. `foo` is unregistered, we set the "missing service index" to 43
3. `client1` makes a blocking query for `foo` with `index=42` -> `foo` does not exist, we check against the "missing service index" and return immediately with `X-Consul-Index: 43`
4. `client1` makes a blocking query for `foo` with `index=43` -> we block
5. Other changes happen in the cluster, but `foo` still doesn't exist and the "missing service index" hasn't changed, so the query stays blocked
6. `foo` is registered again on index 62 -> `foo` exists and its index is greater than 43, we unblock the query
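
The walkthrough above can be sketched as a small in-memory model. This is only an illustration of the idea, not the code from this change: the `catalog` type and its `register`, `deregister`, `queryIndex` and `shouldBlock` methods are hypothetical names, and the real state store is considerably more involved.

```go
package main

// catalog is a hypothetical, heavily simplified stand-in for the server's
// state store. None of these names are taken from the Consul code base.
type catalog struct {
	// services maps a registered service name to the raft index of its
	// latest modification.
	services map[string]uint64
	// missingServiceIndex is the raft index of the most recent service
	// deregistration. It is a single value, so it does not grow with the
	// number of services that ever existed.
	missingServiceIndex uint64
}

func newCatalog() *catalog {
	return &catalog{services: make(map[string]uint64)}
}

// register records that name was added or updated at raft index idx.
func (c *catalog) register(name string, idx uint64) {
	c.services[name] = idx
}

// deregister removes name and bumps the missing service index to idx.
// Deregistering an unknown service is a no-op in this sketch.
func (c *catalog) deregister(name string, idx uint64) {
	if _, ok := c.services[name]; !ok {
		return
	}
	delete(c.services, name)
	c.missingServiceIndex = idx
}

// queryIndex returns the index reported to a client querying name: the
// service's own index if it exists, the missing service index otherwise.
func (c *catalog) queryIndex(name string) uint64 {
	if idx, ok := c.services[name]; ok {
		return idx
	}
	return c.missingServiceIndex
}

// shouldBlock reports whether a blocking query for name carrying
// clientIndex has nothing new to return and must keep waiting.
func (c *catalog) shouldBlock(name string, clientIndex uint64) bool {
	return c.queryIndex(name) <= clientIndex
}
```

Replaying the steps against this model: after `register("foo", 42)` and `deregister("foo", 43)`, `shouldBlock("foo", 42)` is false and the reported index is 43; `shouldBlock("foo", 43)` is true and stays true when an unrelated service is registered; after `register("foo", 62)` it becomes false again. Because the sketch keeps one index for all services, deregistering any service also wakes queries that are waiting on other missing services.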
2019-01-11 09:26:14 -05:00
..
ae Add -sidecar-for and new /agent/service/:service_id endpoint (#4691) 2018-10-10 16:55:34 +01:00
cache connect: agent leaf cert caching improvements (#5091) 2019-01-10 12:46:11 +00:00
cache-types cache: Pass through wait query param to the cache.Get (#5203) 2019-01-10 11:23:37 -05:00
checks Add -sidecar-for and new /agent/service/:service_id endpoint (#4691) 2018-10-10 16:55:34 +01:00
config [Security] Allow blocking Write endpoints on Agent using Network Addresses (#4719) 2019-01-10 09:27:26 -05:00
connect connect: agent leaf cert caching improvements (#5091) 2019-01-10 12:46:11 +00:00
consul Improve blocking queries on services that do not exist (#4810) 2019-01-11 09:26:14 -05:00
debug fix comment typos (#4890) 2018-11-02 12:00:39 -05:00
exec
local [Fix] Services sometimes not being synced with acl_enforce_version_8 = false (#4771) 2019-01-04 10:01:50 -05:00
metadata New ACLs (#4791) 2018-10-19 12:04:07 -04:00
mock
pool
proxycfg merge feedback: fix typos; actually use deliverLatest added previously but not plumbed in 2018-10-10 16:55:34 +01:00
proxyprocess Add -sidecar-for and new /agent/service/:service_id endpoint (#4691) 2018-10-10 16:55:34 +01:00
router
structs connect: agent leaf cert caching improvements (#5091) 2019-01-10 12:46:11 +00:00
systemd
token re-add Connect multi-dc config changes 2018-10-19 08:41:03 -07:00
xds New ACLs (#4791) 2018-10-19 12:04:07 -04:00
acl.go New ACLs (#4791) 2018-10-19 12:04:07 -04:00
acl_endpoint.go Adds documentation for the new ACL APIs (#4851) 2018-10-31 15:11:51 -07:00
acl_endpoint_legacy.go New ACLs (#4791) 2018-10-19 12:04:07 -04:00
acl_endpoint_legacy_test.go New ACLs (#4791) 2018-10-19 12:04:07 -04:00
acl_endpoint_test.go New ACLs (#4791) 2018-10-19 12:04:07 -04:00
acl_test.go New ACLs (#4791) 2018-10-19 12:04:07 -04:00
agent.go connect: agent leaf cert caching improvements (#5091) 2019-01-10 12:46:11 +00:00
agent_endpoint.go cache: Pass through wait query param to the cache.Get (#5203) 2019-01-10 11:23:37 -05:00
agent_endpoint_test.go cache: Pass through wait query param to the cache.Get (#5203) 2019-01-10 11:23:37 -05:00
agent_test.go agent: add default weights to service in local state to prevent AE churn (#5126) 2019-01-08 10:13:49 +00:00
bindata_assetfs.go Release v1.4.0 2018-11-14 22:37:47 +00:00
blacklist.go
blacklist_test.go
catalog_endpoint.go Support multiple tags for health and catalog http api endpoints (#4717) 2018-10-11 12:50:05 +01:00
catalog_endpoint_test.go Implement /v1/agent/health/service/<service name> endpoint (#3551) 2019-01-07 09:39:23 -05:00
check.go
config.go
connect_auth.go connect: remove additional trust-domain validation (#4934) 2018-11-12 20:20:12 +00:00
connect_ca_endpoint.go Fix CA pruning when CA config uses string durations. (#4669) 2018-09-13 15:43:00 +01:00
connect_ca_endpoint_test.go Fix more unstable tests in agent and command 2018-09-12 14:49:27 +01:00
coordinate_endpoint.go
coordinate_endpoint_test.go Fix more unstable tests in agent and command 2018-09-12 14:49:27 +01:00
dns.go Avoid to have infinite recursion in DNS lookups when resolving CNAMEs (#4918) 2019-01-07 16:53:54 -05:00
dns_test.go Avoid to have infinite recursion in DNS lookups when resolving CNAMEs (#4918) 2019-01-07 16:53:54 -05:00
enterprise_delegate_oss.go
event_endpoint.go Fixes memory leak when blocking on /event/list (#4482) 2018-08-02 14:54:48 +01:00
event_endpoint_test.go New ACLs (#4791) 2018-10-19 12:04:07 -04:00
health_endpoint.go Support multiple tags for health and catalog http api endpoints (#4717) 2018-10-11 12:50:05 +01:00
health_endpoint_test.go Add Proxy Upstreams to Service Definition (#4639) 2018-10-10 16:55:34 +01:00
http.go [Security] Allow blocking Write endpoints on Agent using Network Addresses (#4719) 2019-01-10 09:27:26 -05:00
http_oss.go Implement /v1/agent/health/service/<service name> endpoint (#3551) 2019-01-07 09:39:23 -05:00
http_oss_test.go [Security] Allow blocking Write endpoints on Agent using Network Addresses (#4719) 2019-01-10 09:27:26 -05:00
http_test.go [Security] Allow blocking Write endpoints on Agent using Network Addresses (#4719) 2019-01-10 09:27:26 -05:00
intentions_endpoint.go agent: 400 error on invalid UUID format, api handles errors properly 2018-06-27 07:40:06 +02:00
intentions_endpoint_test.go agent: 400 error on invalid UUID format, api handles errors properly 2018-06-27 07:40:06 +02:00
keyring.go
keyring_test.go
kvs_endpoint.go
kvs_endpoint_test.go Improve reliability of tests with TestAgent (#4525) 2018-08-14 12:08:33 -04:00
notify.go Fixes memory leak when blocking on /event/list (#4482) 2018-08-02 14:54:48 +01:00
notify_test.go Fixes memory leak when blocking on /event/list (#4482) 2018-08-02 14:54:48 +01:00
operator_endpoint.go
operator_endpoint_test.go Fix unstable tests in agent, api, and command/watch 2018-09-10 16:58:53 +01:00
prepared_query_endpoint.go Support Agent Caching for Service Discovery Results (#4541) 2018-10-10 16:55:34 +01:00
prepared_query_endpoint_test.go Support Agent Caching for Service Discovery Results (#4541) 2018-10-10 16:55:34 +01:00
remote_exec.go
remote_exec_test.go Fix unstable tests in agent, api, and command/watch 2018-09-10 16:58:53 +01:00
retry_join.go agent: configure k8s go-discover 2018-09-05 13:38:13 -07:00
retry_join_test.go Vendoring update for go-discover. (#4412) 2018-07-25 16:21:04 -07:00
session_endpoint.go
session_endpoint_test.go Avoid to have infinite recursion in DNS lookups when resolving CNAMEs (#4918) 2019-01-07 16:53:54 -05:00
sidecar_service.go merge feedback: fix typos; actually use deliverLatest added previously but not plumbed in 2018-10-10 16:55:34 +01:00
sidecar_service_test.go [Security] Add finer control over script checks (#4715) 2018-10-11 13:22:11 +01:00
signal_unix.go cli: forward SIGTERM to child process of 'lock' and 'watch' subcommands (#4737) 2018-10-02 15:57:21 -05:00
signal_windows.go cli: forward SIGTERM to child process of 'lock' and 'watch' subcommands (#4737) 2018-10-02 15:57:21 -05:00
snapshot_endpoint.go
snapshot_endpoint_test.go
status_endpoint.go
status_endpoint_test.go Fixed another list of unstable unit tests in travis (#4915) 2018-11-20 11:27:26 +00:00
testagent.go connect: agent leaf cert caching improvements (#5091) 2019-01-10 12:46:11 +00:00
testagent_test.go
translate_addr.go
txn_endpoint.go
txn_endpoint_test.go Improve reliability of tests with TestAgent (#4525) 2018-08-14 12:08:33 -04:00
ui_endpoint.go agent: ExternalSources instead of Meta 2018-09-07 10:06:55 -07:00
ui_endpoint_test.go Added SOA configuration for DNS settings. (#4714) 2018-10-10 15:50:56 -04:00
user_event.go
user_event_test.go New ACLs (#4791) 2018-10-19 12:04:07 -04:00
util.go cli: forward SIGTERM to child process of 'lock' and 'watch' subcommands (#4737) 2018-10-02 15:57:21 -05:00
util_test.go cli: forward SIGTERM to child process of 'lock' and 'watch' subcommands (#4737) 2018-10-02 15:57:21 -05:00
watch_handler.go
watch_handler_test.go