UNIX domain socket paths are limited to 104-108 characters, depending on
the OS. This limit was quite easy to exceed when testing the feature on
Kubernetes, due to how proxy IDs encode the Pod ID eg:
metrics-collector-59467bcb9b-fkkzl-hcp-metrics-collector-sidecar-proxy
To ensure we stay under that character limit this commit makes a
couple changes:
- Use a b64 encoded SHA1 hash of the namespace + proxy ID to create a
short and deterministic socket file name.
- Add validation to proxy registrations and proxy-defaults to enforce a
limit on the socket directory length.
This is being added so that metrics sent to HCP can be augmented with the source node's ID.
Opting not to add this to stats_tag out of caution, since it would increase the cardinality of metrics emitted by Envoy for all users.
There is no functional impact to Envoy expected from this change.
Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Add a new envoy flag: "envoy_hcp_metrics_bind_socket_dir", a directory
where a unix socket will be created with the name
`<namespace>_<proxy_id>.sock` to forward Envoy metrics.
If set, this will configure:
- In bootstrap configuration a local stats_sink and static cluster.
These will forward metrics to a loopback listener sent over xDS.
- A dynamic listener listening at the socket path that the previously
defined static cluster is sending metrics to.
- A dynamic cluster that will forward traffic received at this listener
to the hcp-metrics-collector service.
Reasons for having a static cluster pointing at a dynamic listener:
- We want to secure the metrics stream using TLS, but the stats sink can
only be defined in bootstrap config. With dynamic listeners/clusters
we can use the proxy's leaf certificate issued by the Connect CA,
which isn't available at bootstrap time.
- We want to intelligently route to the HCP collector. Configuring its
addreess at bootstrap time limits our flexibility routing-wise. More
on this below.
Reasons for defining the collector as an upstream in `proxycfg`:
- The HCP collector will be deployed as a mesh service.
- Certificate management is taken care of, as mentioned above.
- Service discovery and routing logic is automatically taken care of,
meaning that no code changes are required in the xds package.
- Custom routing rules can be added for the collector using discovery
chain config entries. Initially the collector is expected to be
deployed to each admin partition, but in the future could be deployed
centrally in the default partition. These config entries could even be
managed by HCP itself.
- When an envoy version is out of a supported range, we now return the envoy version being used as `major.minor.x` to indicate that it is the minor version at most that is incompatible
- When an envoy version is in the list of unsupported envoy versions we return back the envoy version in the error message as `major.minor.patch` as now the exact version matters.
* Add some fixes to allow for registering via consul connect envoy -gateway api
* Fix infinite recursion
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Ensure nothing in the troubleshoot go module depends on consul's top level module. This is so we can import troubleshoot into consul-k8s and not import all of consul.
* turns troubleshoot into a go module [authored by @curtbushko]
* gets the envoy protos into the troubleshoot module [authored by @curtbushko]
* adds a new go module `envoyextensions` which has xdscommon and extensioncommon folders that both the xds package and the troubleshoot package can import
* adds testing and linting for the new go modules
* moves the unit tests in `troubleshoot/validateupstream` that depend on proxycfg/xds into the xds package, with a comment describing why those tests cannot be in the troubleshoot package
* fixes all the imports everywhere as a result of these changes
Co-authored-by: Curt Bushko <cbushko@gmail.com>
* Add support for envoy readiness flags
- add flags 'envoy-ready-bind-port` and `envoy-ready-bind-addr` on consul connect envoy to create a ready listener on that address.
Fix issue where TLS configuration was ignored for unix sockets in consul connect envoy.
Disable xds check on bootstrap mode and change check to warn only.
* add functions for returning the max and min Envoy major versions
- added an UnsupportedEnvoyVersions list
- removed an unused error from TestDetermineSupportedProxyFeaturesFromString
- modified minSupportedVersion to use the function for getting the Min Envoy major version. Using just the major version without the patch is equivalent to using `.0`
* added a function for executing the envoy --version command
- added a new exec.go file to not be locked to unix system
* added envoy version check when using consul connect envoy
* added changelog entry
* added docs change
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.
The resulting behavior from this commit is:
`ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
`ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
`ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
Now that peered upstreams can generate envoy resources (#13758), we need a way to disambiguate local from peered resources in our metrics. The key difference is that datacenter and partition will be replaced with peer, since in the context of peered resources partition is ambiguous (could refer to the partition in a remote cluster or one that exists locally). The partition and datacenter of the proxy will always be that of the source service.
Regexes were updated to make emitting datacenter and partition labels mutually exclusive with peer labels.
Listener filter names were updated to better match the existing regex.
Cluster names assigned to peered upstreams were updated to be synthesized from local peer name (it previously used the externally provided primary SNI, which contained the peer name from the other side of the peering). Integration tests were updated to assert for the new peer labels.
When the protocol is http-like, and an intention has a peered source
then the normal RBAC mTLS SAN field check is replaces with a joint combo
of:
mTLS SAN field must be the service's local mesh gateway leaf cert
AND
the first XFCC header (from the MGW) must have a URI field that matches the original intention source
Also:
- Update the regex program limit to be much higher than the teeny
defaults, since the RBAC regex constructions are more complicated now.
- Fix a few stray panics in xds generation.
Adds the merge-central-config query param option to the /catalog/node-services/:node-name API,
to get a service definition in the response that is merged with central defaults (proxy-defaults/service-defaults).
Updated the consul connect envoy command to use this option when
retrieving the proxy service details so as to render the bootstrap configuration correctly.
Changes the sourcing of the envoy bootstrap configuration
to not use agent APIs and instead use the catalog(server) API.
This is done by passing a node-name flag to the command,
(which can only be used with proxy-id).
Also fixes a bug where the golden envoy bootstrap config files
used for tests did not use the expected destination service name
in certain places for connect proxy kind.
set -euo pipefail
unset CDPATH
cd "$(dirname "$0")"
for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
echo "=== require: $f ==="
sed -i '/require := require.New(t)/d' $f
# require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
# require.XXX(tblah) but not require.XXX(t, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
# require.XXX(rblah) but not require.XXX(r, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
gofmt -s -w $f
done
for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
echo "=== assert: $f ==="
sed -i '/assert := assert.New(t)/d' $f
# assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
# assert.XXX(tblah) but not assert.XXX(t, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
# assert.XXX(rblah) but not assert.XXX(r, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
gofmt -s -w $f
done
The DebugConfig in the self endpoint can change at any time. It's not a stable API.
With the previous change to rename GRPCPort to XDSPort this command would have broken.
This commit adds the XDSPort to a stable part of the XDS api, and changes the envoy command to read
this new field.
It includes support for the old API as well, in case a newer CLI is used with an older API, and
adds a test for both cases.
The main branch is being renamed from master->main. This commit should
update all references to the main branch to the new name.
Co-Authored-By: Mike Morris <mikemorris@users.noreply.github.com>
Normally the named pipe would buffer up to 64k, but in some cases when a
soft limit is reached, they will start only buffering up to 4k.
In either case, we should not deadlock.
This commit changes the pipe-bootstrap command to first buffer all of
stdin into the process, before trying to write it to the named pipe.
This allows the process memory to act as the buffer, instead of the
named pipe.
Also changed the order of operations in `makeBootstrapPipe`. The new
test added in this PR showed that simply buffering in the process memory
was not enough to fix the issue. We also need to ensure that the
`pipe-bootstrap` process is started before we try to write to its
stdin. Otherwise the write will still block.
Also set stdout/stderr on the subprocess, so that any errors are visible
to the user.