Commit Graph

53 Commits

Author SHA1 Message Date
Mark Anderson ed3e42296d Fixup acl.EnterpriseMeta
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-04-05 15:11:49 -07:00
R.B. Boyer e9230e93d8
xds: adding control of the mesh-wide min/max TLS versions and cipher suites from the mesh config entry (#12601)
- `tls.incoming`: applies to the inbound mTLS targeting the public
  listener on `connect-proxy` and `terminating-gateway` envoy instances

- `tls.outgoing`: applies to the outbound mTLS dialing upstreams from
  `connect-proxy` and `ingress-gateway` envoy instances

Fixes #11966
2022-03-30 13:43:59 -05:00
freddygv 7fba7456ec Fix race of upstreams with same passthrough ip
Due to timing, a transparent proxy could have two upstreams to dial
directly with the same address.

For example:
- The orders service can dial upstreams shipping and payment directly.
- An instance of shipping at address 10.0.0.1 is deregistered.
- Payments is scaled up and scheduled to have address 10.0.0.1.
- The orders service receives the event for the new payments instance
before seeing the deregistration for the shipping instance. At this
point two upstreams have the same passthrough address and Envoy will
reject the listener configuration.

To disambiguate this commit considers the Raft index when storing
passthrough addresses. In the example above, 10.0.0.1 would only be
associated with the newer payments service instance.
2022-02-10 17:01:57 -07:00
freddygv d5a2eb677f Ensure passthrough addresses get cleaned up
Transparent proxies can set up filter chains that allow direct
connections to upstream service instances. Services that can be dialed
directly are stored in the PassthroughUpstreams map of the proxycfg
snapshot.

Previously these addresses were not being cleaned up based on new
service health data. The list of addresses associated with an upstream
service would only ever grow.

As services scale up and down, eventually they will have instances
assigned to an IP that was previously assigned to a different service.
When IP addresses are duplicated across filter chain match rules the
listener config will be rejected by Envoy.

This commit updates the proxycfg snapshot management so that passthrough
addresses can get cleaned up when no longer associated with a given
upstream.

There is still the possibility of a race condition here where due to
timing an address is shared between multiple passthrough upstreams.
That concern is mitigated by #12195, but will be further addressed
in a follow-up.
2022-02-10 17:01:57 -07:00
R.B. Boyer 05c7373a28 bulk rewrite using this script
set -euo pipefail

    unset CDPATH

    cd "$(dirname "$0")"

    for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== require: $f ==="
        sed -i '/require := require.New(t)/d' $f
        # require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
        # require.XXX(tblah) but not require.XXX(t, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
        # require.XXX(rblah) but not require.XXX(r, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
        gofmt -s -w $f
    done

    for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== assert: $f ==="
        sed -i '/assert := assert.New(t)/d' $f
        # assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
        # assert.XXX(tblah) but not assert.XXX(t, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
        # assert.XXX(rblah) but not assert.XXX(r, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
        gofmt -s -w $f
    done
2022-01-20 10:46:23 -06:00
R.B. Boyer baf886c6f3
proxycfg: introduce explicit UpstreamID in lieu of bare string (#12125)
The gist here is that now we use a value-type struct proxycfg.UpstreamID
as the map key in ConfigSnapshot maps where we used to use "upstream
id-ish" strings. These are internal only and used just for bidirectional
trips through the agent cache keyspace (like the discovery chain target
struct).

For the few places where the upstream id needs to be projected into xDS,
that's what (proxycfg.UpstreamID).EnvoyID() is for. This lets us ALWAYS
inject the partition and namespace into these things without making
stuff like the golden testdata diverge.
2022-01-20 10:12:04 -06:00
Dhia Ayachi 5f6bf369af
reset `coalesceTimer` to nil as soon as the event is consumed (#11924)
* reset `coalesceTimer` to nil as soon as the event is consumed

* add change log

* refactor to add relevant test.

* fix linter

* Apply suggestions from code review

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* remove non needed check

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-01-05 12:17:47 -05:00
R.B. Boyer a0156785dd
various partition related todos (#11822) 2021-12-13 11:43:33 -06:00
freddygv ce43e8cf99 Store GatewayKey in proxycfg snapshot for re-use 2021-11-01 13:58:53 -06:00
freddygv 3966677aaf Finish removing useInDatacenter 2021-10-26 23:36:01 -06:00
freddygv 7927a97c2f Fixup manager tests 2021-09-15 17:24:05 -06:00
Dhia Ayachi 96d7842118
partition dicovery chains (#10983)
* partition dicovery chains

* fix default partition for OSS
2021-09-07 16:29:32 -04:00
freddygv 1f192eb7d9 Fixup proxy config test fixtures
- The TestNodeService helper created services with the fixed name "web",
and now that name is overridable.

- The discovery chain snapshot didn't have prepared query endpoints so
the endpoints tests were missing data for prepared queries
2021-08-20 17:38:57 -06:00
R.B. Boyer 61f1c01b83
agent: ensure that most agent behavior correctly respects partition configuration (#10880) 2021-08-19 15:09:42 -05:00
Daniel Nephin 7c865d03ac proxycfg: Lookup the agent token as a default
When no ACL token is provided with the service registration.
2021-08-12 15:51:34 -04:00
Daniel Nephin d189524e71 proxycfg: Add a test to show the bug
When a token is not provided at registration, the agent token is not being used.
2021-08-12 15:47:59 -04:00
R.B. Boyer 62ac98b564
agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669) 2021-07-22 13:20:45 -05:00
Freddy 61ae2995b7
Add flag for transparent proxies to dial individual instances (#10329) 2021-06-09 14:34:17 -06:00
Mark Anderson 626b27a874 Continue working through proxy and agent
Rework/listeners, rename makeListener

Refactor, tests pass

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-05-04 12:41:43 -07:00
Daniel Nephin 18c9e73832 connect: do not set QuerySource.Node
Setting this field to a value is equivalent to using the 'near' query paramter.
The intent is to sort the results by proximity to the node requesting
them. However with connect we send the results to envoy, which doesn't
care about the order, so setting this field is increasing the work
performed for no gain.

It is necessary to unset this field now because we would like connect
to use streaming, but streaming does not support sorting by proximity.
2021-04-27 19:03:16 -04:00
Freddy 6d15569062
Split Upstream.Identifier() so non-empty namespace is always prepended in ent (#10031) 2021-04-15 13:54:40 -06:00
freddygv 6c43195e2a Merge master and fix upstream config protocol defaulting 2021-03-17 21:13:40 -06:00
freddygv 3c97e5a777 Update proxycfg for transparent proxy 2021-03-17 13:40:39 -06:00
Daniel Nephin 2a53b8293a proxycfg: use rpcclient/health.Client instead of passing around cache name
This should allow us to swap out the implementation with something other
than `agent/cache` without making further code changes.
2021-03-12 11:46:04 -05:00
Daniel Nephin 410b1261c2 proxycfg: Use streaming in connect state 2021-03-12 11:35:42 -05:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
R.B. Boyer f2b8bf109c
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
Pierre Souchay 947d8eb039
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Daniel Nephin 07c1081d39 Fix a bunch of unparam lint issues 2020-06-24 13:00:14 -04:00
Kyle Havlovitz 28b4819882
Merge pull request #7759 from hashicorp/ingress/tls-hosts
Add TLS option for Ingress Gateway listeners
2020-05-11 09:18:43 -07:00
Freddy a749f46316
Remove timeout and call to Fatal from goroutine (#7797) 2020-05-06 14:33:17 -06:00
Kyle Havlovitz bd6bb3bf2d Add TLS option and DNS SAN support to ingress config
xds: Only set TLS context for ingress listener when requested
2020-05-06 15:12:02 -05:00
Kyle Havlovitz 6a5eba63ab
Ingress Gateways for TCP services (#7509)
* Implements a simple, tcp ingress gateway workflow

This adds a new type of gateway for allowing Ingress traffic into Connect from external services.

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-04-16 14:00:48 -07:00
R.B. Boyer a7fb26f50f
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00
Matt Keeler 2524a028ea
OSS Changes for various config entry namespacing bugs (#7226) 2020-02-06 10:52:25 -05:00
Chris Piraino 3dd0b59793
Allow users to configure either unstructured or JSON logging (#7130)
* hclog Allow users to choose between unstructured and JSON logging
2020-01-28 17:50:41 -06:00
Matt Keeler 485a0a65ea
Updates to Config Entries and Connect for Namespaces (#7116) 2020-01-24 10:04:58 -05:00
Matt Keeler 442924c35a
Sync of OSS changes to support namespaces (#6909) 2019-12-09 21:26:41 -05:00
Freddy 5eace88ce2
Expose HTTP-based paths through Connect proxy (#6446)
Fixes: #5396

This PR adds a proxy configuration stanza called expose. These flags register
listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only
listening on the loopback interface, while still accepting traffic from non
Connect-enabled services.

Under expose there is a boolean checks flag that would automatically expose all
registered HTTP and gRPC check paths.

This stanza also accepts a paths list to expose individual paths. The primary
use case for this functionality would be to expose paths for third parties like
Prometheus or the kubelet.

Listeners for requests to exposed paths are be configured dynamically at run
time. Any time a proxy, or check can be registered, a listener can also be
created.

In this initial implementation requests to these paths are not
authenticated/encrypted.
2019-09-25 20:55:52 -06:00
R.B. Boyer 0675e0606e
connect: generate the full SNI names for discovery targets in the compiler rather than in the xds package (#6340) 2019-08-19 13:03:03 -05:00
R.B. Boyer 64fc002e03
connect: fix failover through a mesh gateway to a remote datacenter (#6259)
Failover is pushed entirely down to the data plane by creating envoy
clusters and putting each successive destination in a different load
assignment priority band. For example this shows that normally requests
go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080:

- name: foo
  load_assignment:
    cluster_name: foo
    policy:
      overprovisioning_factor: 100000
    endpoints:
    - priority: 0
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 1.2.3.4
              port_value: 8080
    - priority: 1
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 6.7.8.9
              port_value: 8080

Mesh gateways route requests based solely on the SNI header tacked onto
the TLS layer. Envoy currently only lets you configure the outbound SNI
header at the cluster layer.

If you try to failover through a mesh gateway you ideally would
configure the SNI value per endpoint, but that's not possible in envoy
today.

This PR introduces a simpler way around the problem for now:

1. We identify any target of failover that will use mesh gateway mode local or
   remote and then further isolate any resolver node in the compiled discovery
   chain that has a failover destination set to one of those targets.

2. For each of these resolvers we will perform a small measurement of
   comparative healths of the endpoints that come back from the health API for the
   set of primary target and serial failover targets. We walk the list of targets
   in order and if any endpoint is healthy we return that target, otherwise we
   move on to the next target.

3. The CDS and EDS endpoints both perform the measurements in (2) for the
   affected resolver nodes.

4. For CDS this measurement selects which TLS SNI field to use for the cluster
   (note the cluster is always going to be named for the primary target)

5. For EDS this measurement selects which set of endpoints will populate the
   cluster. Priority tiered failover is ignored.

One of the big downsides to this approach to failover is that the failover
detection and correction is going to be controlled by consul rather than
deferring that entirely to the data plane as with the prior version. This also
means that we are bound to only failover using official health signals and
cannot make use of data plane signals like outlier detection to affect
failover.

In this specific scenario the lack of data plane signals is ok because the
effectiveness is already muted by the fact that the ultimate destination
endpoints will have their data plane signals scrambled when they pass through
the mesh gateway wrapper anyway so we're not losing much.

Another related fix is that we now use the endpoint health from the
underlying service, not the health of the gateway (regardless of
failover mode).
2019-08-05 13:30:35 -05:00
R.B. Boyer 0165e93517
connect: expose an API endpoint to compile the discovery chain (#6248)
In addition to exposing compilation over the API cleaned up the structures that would be exchanged to be cleaner and easier to support and understand.

Also removed ability to configure the envoy OverprovisioningFactor.
2019-08-02 15:34:54 -05:00
R.B. Boyer 4666599e18
connect: reconcile how upstream configuration works with discovery chains (#6225)
* connect: reconcile how upstream configuration works with discovery chains

The following upstream config fields for connect sidecars sanely
integrate into discovery chain resolution:

- Destination Namespace/Datacenter: Compilation occurs locally but using
different default values for namespaces and datacenters. The xDS
clusters that are created are named as they normally would be.

- Mesh Gateway Mode (single upstream): If set this value overrides any
value computed for any resolver for the entire discovery chain. The xDS
clusters that are created may be named differently (see below).

- Mesh Gateway Mode (whole sidecar): If set this value overrides any
value computed for any resolver for the entire discovery chain. If this
is specifically overridden for a single upstream this value is ignored
in that case. The xDS clusters that are created may be named differently
(see below).

- Protocol (in opaque config): If set this value overrides the value
computed when evaluating the entire discovery chain. If the normal chain
would be TCP or if this override is set to TCP then the result is that
we explicitly disable L7 Routing and Splitting. The xDS clusters that
are created may be named differently (see below).

- Connect Timeout (in opaque config): If set this value overrides the
value for any resolver in the entire discovery chain. The xDS clusters
that are created may be named differently (see below).

If any of the above overrides affect the actual result of compiling the
discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op
override to "tcp") then the relevant parameters are hashed and provided
to the xDS layer as a prefix for use in naming the Clusters. This is to
ensure that if one Upstream discovery chain has no overrides and
tangentially needs a cluster named "api.default.XXX", and another
Upstream does have overrides for "api.default.XXX" that they won't
cross-pollinate against the operator's wishes.

Fixes #6159
2019-08-01 22:03:34 -05:00
R.B. Boyer 1cc6d07d0f
add test for discovery chain agent cache-type (#6130) 2019-07-15 10:09:52 -05:00
R.B. Boyer 72a8195839
implement some missing service-router features and add more xDS testing (#6065)
- also implement OnlyPassing filters for non-gateway clusters
2019-07-12 14:16:21 -05:00
hashicorp-ci 8adbb8471e Merge Consul OSS branch 'master' at commit a58d8e91ac258c04174afca3818cbdae23aa8d3f 2019-07-03 02:00:31 +00:00
R.B. Boyer bccbb2b4ae
activate most discovery chain features in xDS for envoy (#6024) 2019-07-01 22:10:51 -05:00
Matt Keeler bcb3439c4c
Fix some tests that I broke when refactoring the ConfigSnapshot (#6051)
* Fix some tests that I broke when refactoring the ConfigSnapshot

* Make sure the MeshGateway config is added to all the right api structs

* Fix some more tests
2019-07-01 19:47:58 -04:00
Pierre Souchay 2e9370ba11 Bump timeout in TestManager_BasicLifecycle (#6030) 2019-07-01 17:02:00 -06:00
Matt Keeler 39bb0e3e77 Implement Mesh Gateways
This includes both ingress and egress functionality.
2019-07-01 16:28:30 -04:00