Commit Graph

105 Commits

Author SHA1 Message Date
Daniel Nephin 4ef9578a07
Merge pull request #9703 from pierresouchay/streaming_tags_and_case_insensitive
Streaming filter tags + case insensitive lookups for Service Names
2021-02-26 12:06:26 -05:00
Daniel Nephin c40d063a0e structs: rename EnterpriseMeta constructor
To match the Go convention.
2021-02-16 14:45:43 -05:00
Daniel Nephin 0683964519 streaming: move ServiceTag and NodeMetaFiltering to the cache-entry
So that all the client side filtering is in the same place. Previously
only the bexpr filter was in the cache-entry.

Also makes a small change to the filtering so that instead of rebuilding
slices of items, the filtering can return a bool to determine if the
event payload is saved or not.
2021-02-11 20:20:09 -05:00
Daniel Nephin bd122bb9f5 streaming: double the cache TTL
10 minutes is the default blocking query timeout. Using the same value results in us hitting
the expired cache entry bug frequently. By extending this TTL we at least mitigate the problem.

The underlying bug still needs to be fixed.
2021-02-09 14:36:26 -05:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
Pierre Souchay 09673426e3 Applied suggestions from @dnephin
* Renamed `cachedHealResultSorter` into `sortCheckServiceNodes`
* Use `<` instead of `strings.Compare`
* Single line comparison in unit test
2020-11-25 21:40:51 +01:00
Pierre Souchay 9239df6dbd [Streaming] Predictable order for results of /health/service/:serviceName to mimic memdb
This ensures the result is consitent with/witout streaming

Will partially fix #9239
2020-11-20 16:23:35 +01:00
Daniel Nephin 78260952b0 cache-type: use namespace in tests
to verify that the namespace is passed through correctly to the server.
2020-10-30 15:07:04 -04:00
Daniel Nephin 8da30fcb9a subscribe: set the request namespace 2020-10-30 14:34:04 -04:00
Daniel Nephin c106d94742 proto: remove Event.Key field
The field is never used, and the value is available from the payload.
2020-10-28 16:33:00 -04:00
Daniel Nephin ab43236f86 proto: remove Event.Namespace field
All events are part of a single Topic, so we don't need this field.
2020-10-28 16:33:00 -04:00
Daniel Nephin 312a3bb9b1 streaming: apply filter to a single item
Instead of the whole map. This should save a lot of time performing reflecting on a large map.
The filter does not change, so there is no reason to re-apply it to older entries.
2020-10-19 18:24:02 -04:00
Daniel Nephin dd0e8d42c4
Merge pull request #8825 from hashicorp/streaming/add-config
streaming: add config and docs
2020-10-09 14:33:58 -04:00
Daniel Nephin 6a8eac77af cache-types: skip tests with races 2020-10-08 20:15:13 -04:00
Daniel Nephin 3483e2fb89 streaming: Use a shorter LastGetTTL for the cache 2020-10-08 12:11:20 -04:00
Daniel Nephin dbfa6530f1 streaming: store services with a unique ID that includes namespace 2020-10-06 16:54:56 -04:00
Daniel Nephin 83401194ab streaming: improve godoc for cache-type
And fix a bug where any error that implemented the temporary interface was considered
a temporary error, even when the method would return false.
2020-10-06 13:52:02 -04:00
Daniel Nephin f857aef4a8 submatview: add a test for handling of NewSnapshotToFollow
Also add some godoc
Rename some vars and functions
Fix a data race in the new cache test for entry closing.
2020-10-06 13:22:02 -04:00
Daniel Nephin 58cf09247b submatview: refactor Materializer
Refactor of Materializer.Run
Use handlers to manage state in Materializer
Rename Materializer receiver
rename m.l to m.lock, and flip some conditionals to remove the negative.
Improve godoc, rename Deps, move resetErr, and pass err into notifyUpdate
Update for NewSnapshotToFollow events
Refactor to move context cancel out of Materializer
2020-10-06 13:22:02 -04:00
Daniel Nephin e8c7881196 submatview: Move the 'use materialize from result.State' logic
No need to do all this other work if we have one already.

This logic moved closer to this call site 3 times during the process
of refactoring.
2020-10-06 13:22:02 -04:00
Daniel Nephin 3bb252888b submatview: Move Materializer to submatview package 2020-10-06 13:22:02 -04:00
Daniel Nephin d24e243f70 submatview: Refactor MaterializeView
Replace InitFilter with Reset.
Removes the need to store a fatalErr and the cache-type, and removes the need to recreate the filter
each time.
Pass dependencies into MaterializedView.
Remove context from MaterializedView.
Rename state to view.
Rename MaterialziedView to Materialzier.
Rename to NewMaterializer
Pass in retry.Waiter
2020-10-06 13:22:02 -04:00
Daniel Nephin 50846a96ff cache-types: Update Streaming health cache-type
To use latest protobuf types
2020-10-06 13:22:02 -04:00
Daniel Nephin e5d37bdf23 agent/cache: Add cache-type and materialized view for streaming health
Extracted from d97412ce4c399a35b41bbdae2716f0e32dce80bf

Co-authored-by: Paul Banks <banks@banksco.de>
2020-10-06 13:21:57 -04:00
freddygv 43efb4809c Merge master 2020-09-14 16:17:43 -06:00
freddygv 4ac644f401 Fix test build 2020-08-06 11:31:56 -06:00
Daniel Nephin 21fa99a83b Return nil value on error.
The main bug was fixed in cb050b280ceb4186de765118611a7a92d8158c3f, but the return value of 'result' is still misleading.
Change the return value to nil to make the code more clear.
2020-08-05 13:10:17 -04:00
freddygv 94d1f0a310 end to end changes to pass gatewayservices to /ui/services/ 2020-07-30 10:21:11 -06:00
Matt Keeler 2ec4e46eb2
Default Cache rate limiting options in New
Also get rid of the TestCache helper which was where these defaults were happening previously.
2020-07-28 12:34:35 -04:00
Matt Keeler 8df112526d
Fix some broken code in master
There were several PRs that while all passed CI independently, when they all got merged into the same branch caused compilation errors in test code.

The main changes that caused issues where changing agent/cache.Cache.New to require a concrete options struct instead of a pointer. This broke the cert monitor tests and the catalog_list_services_test.go. Another change was made to unembed the http.Server from the agent.HTTPServer struct. That coupled with another change to add a test to ensure cache rate limiting coming from HTTP requests was working as expected caused compilation failures.
2020-07-28 09:50:10 -04:00
Matt Keeler 6d94900cd7
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.
2020-07-21 12:19:25 -04:00
Daniel Nephin 797abe1f00 agent/cache: Use AllowNotModifiedResponse in CatalogListServices
Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-14 18:58:20 -04:00
Matt Keeler e9e88e4527
Initialize the agent leaf cert cache result with a state to prevent unnecessary second certificate signing 2020-06-30 09:59:07 -04:00
Matt Keeler fa42d9b34f
Fix auto_encrypt IP/DNS SANs
The initial auto encrypt CSR wasn’t containing the user supplied IP and DNS SANs. This fixes that. Also We were configuring a default :: IP SAN. This should be ::1 instead and was fixed.
2020-06-30 09:59:07 -04:00
Daniel Nephin 1ef8279ac9
Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-06-17 12:16:02 -04:00
Daniel Nephin 89d95561df Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Daniel Nephin 5f24171f13 ci: enable SA4006 staticcheck check
And fix the 'value not used' issues.

Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
2020-06-16 13:10:11 -04:00
Matt Keeler 976f922abf
Make the Agent Cache more Context aware (#8092)
Blocking queries issues will still be uncancellable (that cannot be helped until we get rid of net/rpc). However this makes it so that if calling getWithIndex (like during a cache Notify go routine) we can cancell the outer routine. Previously it would keep issuing more blocking queries until the result state actually changed.
2020-06-15 11:01:25 -04:00
freddygv 1e7e716742 Move compound service names to use ServiceName type 2020-06-12 13:47:43 -06:00
freddygv 806b1fb608 Move GatewayServices out of Internal 2020-06-12 13:46:47 -06:00
Daniel Nephin 1cdfc4f290 ci: Enabled SA2002 staticcheck check
And handle errors in the main test goroutine
2020-06-05 17:50:11 -04:00
Daniel Nephin 545bd766e7 Fix a number of problems found by staticcheck
Some of these problems are minor (unused vars), but others are real bugs (ignored errors).

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2020-05-19 16:50:14 -04:00
Freddy ebbb234ecb
Gateway Services Nodes UI Endpoint (#7685)
The endpoint supports queries for both Ingress Gateways and Terminating Gateways. Used to display a gateway's linked services in the UI.
2020-05-11 11:35:17 -06:00
Chris Piraino 2a10984efb Add test for adding DNSSAN for ConnectCALeaf cache type 2020-05-06 15:12:02 -05:00
Kyle Havlovitz bd6bb3bf2d Add TLS option and DNS SAN support to ingress config
xds: Only set TLS context for ingress listener when requested
2020-05-06 15:12:02 -05:00
Freddy f5c1e5268b
TLS Origination for Terminating Gateways (#7671) 2020-04-27 16:25:37 -06:00
Daniel Nephin 1251c01b73 agent/cache: Make all cache options RegisterOptions
Previously the SupportsBlocking option was specified by a method on the
type, and all the other options were specified from RegisterOptions.

This change moves RegisterOptions to a method on the type, and moves
SupportsBlocking into the options struct.

Currently there are only 2 cache-types. So all cache-types can implement
this method by embedding a struct with those predefined values. In the
future if a cache type needs to be registered more than once with different
options it can remove the embedded type and implement the method in a way
that allows for paramaterization.
2020-04-16 18:56:34 -04:00
Kyle Havlovitz 6a5eba63ab
Ingress Gateways for TCP services (#7509)
* Implements a simple, tcp ingress gateway workflow

This adds a new type of gateway for allowing Ingress traffic into Connect from external services.

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-04-16 14:00:48 -07:00
sasha 8afa406177
add DNSSAN and IPSAN to cache key (#7597) 2020-04-15 10:11:11 -05:00
R.B. Boyer a7fb26f50f
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00