To avoid unintended tampering with remote downstreams via service
config, refactor BasicEnvoyExtender and RuntimeConfig to disallow
typical Envoy extensions from being applied to non-local proxies.
Continue to allow this behavior for AWS Lambda and the read-only
Validate builtin extensions.
Addresses CVE-2023-2816.
* API Gateway XDS Primitives, endpoints and clusters (#17002)
* XDS primitive generation for endpoints and clusters
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* server_test
* deleted extra file
* add missing parents to test
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Routes for API Gateway (#17158)
* XDS primitive generation for endpoints and clusters
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* server_test
* deleted extra file
* add missing parents to test
* checkpoint
* delete extra file
* httproute flattening code
* linting issue
* so close on this, calling for tonight
* unit test passing
* add in header manip to virtual host
* upstream rebuild commented out
* Use consistent upstream name whether or not we're rebuilding
* Start working through route naming logic
* Fix typos in test descriptions
* Simplify route naming logic
* Simplify RebuildHTTPRouteUpstream
* Merge additional compiled discovery chains instead of overwriting
* Use correct chain for flattened route, clean up + add TODOs
* Remove empty conditional branch
* Restore previous variable declaration
Limit the scope of this PR
* Clean up, improve TODO
* add logging, clean up todos
* clean up function
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* checkpoint, skeleton, tests not passing
* checkpoint
* endpoints xds cluster configuration
* resources test fix
* fix reversion in resources_test
* checkpoint
* Update agent/proxycfg/api_gateway.go
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
* unit tests passing
* gofmt
* add deterministic sorting to appease the unit test gods
* remove panic
* Find ready upstream matching listener instead of first in list
* Clean up, improve TODO
* Modify getReadyUpstreams to filter upstreams by listener (#17410)
Each listener would previously have all upstreams from any route that bound to the listener. This is problematic when a route bound to one listener also binds to other listeners and so includes upstreams for multiple listeners. The list for a given listener would then wind up including upstreams for other listeners.
* clean up todos, references to api gateway in listeners_ingress
* merge in Nathan's fix
* Update agent/consul/discoverychain/gateway.go
* cleanup current todos, remove snapshot manipulation from generation code
* Update agent/structs/config_entry_gateways.go
Co-authored-by: Thomas Eckert <teckert@hashicorp.com>
* Update agent/consul/discoverychain/gateway.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Update agent/consul/discoverychain/gateway.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Update agent/proxycfg/snapshot.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* clarified header comment for FlattenHTTPRoute, changed RebuildHTTPRouteUpstream to BuildHTTPRouteUpstream
* simplify cert logic
* Delete scratch
* revert route related changes in listener PR
* Update agent/consul/discoverychain/gateway.go
* Update agent/proxycfg/snapshot.go
* clean up uneeded extra lines in endpoints
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
Co-authored-by: Thomas Eckert <teckert@hashicorp.com>
* endpoints xds cluster configuration
* clusters xds native generation
* resources test fix
* fix reversion in resources_test
* Update agent/proxycfg/api_gateway.go
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
* gofmt
* Modify getReadyUpstreams to filter upstreams by listener (#17410)
Each listener would previously have all upstreams from any route that bound to the listener. This is problematic when a route bound to one listener also binds to other listeners and so includes upstreams for multiple listeners. The list for a given listener would then wind up including upstreams for other listeners.
* Update agent/proxycfg/api_gateway.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Restore import blocking
* Undo removal of unrelated code
---------
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* endpoints xds cluster configuration
* resources test fix
* fix reversion in resources_test
* Update agent/proxycfg/api_gateway.go
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
* gofmt
* Modify getReadyUpstreams to filter upstreams by listener (#17410)
Each listener would previously have all upstreams from any route that bound to the listener. This is problematic when a route bound to one listener also binds to other listeners and so includes upstreams for multiple listeners. The list for a given listener would then wind up including upstreams for other listeners.
* Update agent/proxycfg/api_gateway.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Restore import blocking
* Skip to next route if route has no upstreams
* cleanup
* change set from bool to empty struct
---------
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
UNIX domain socket paths are limited to 104-108 characters, depending on
the OS. This limit was quite easy to exceed when testing the feature on
Kubernetes, due to how proxy IDs encode the Pod ID eg:
metrics-collector-59467bcb9b-fkkzl-hcp-metrics-collector-sidecar-proxy
To ensure we stay under that character limit this commit makes a
couple changes:
- Use a b64 encoded SHA1 hash of the namespace + proxy ID to create a
short and deterministic socket file name.
- Add validation to proxy registrations and proxy-defaults to enforce a
limit on the socket directory length.
* Add MaxEjectionPercent to config entry
* Add BaseEjectionTime to config entry
* Add MaxEjectionPercent and BaseEjectionTime to protobufs
* Add MaxEjectionPercent and BaseEjectionTime to api
* Fix integration test breakage
* Verify MaxEjectionPercent and BaseEjectionTime in integration test upstream confings
* Website docs for MaxEjectionPercent and BaseEjection time
* Add `make docs` to browse docs at http://localhost:3000
* Changelog entry
* so that is the difference between consul-docker and dev-docker
* blah
* update proto funcs
* update proto
---------
Co-authored-by: Maliz <maliheh.monshizadeh@hashicorp.com>
This implements permissive mTLS , which allows toggling services into "permissive" mTLS mode.
Permissive mTLS mode allows incoming "non Consul-mTLS" traffic to be forward unmodified to the application.
* Update service-defaults and proxy-defaults config entries with a MutualTLSMode field
* Update the mesh config entry with an AllowEnablingPermissiveMutualTLS field and implement the necessary validation. AllowEnablingPermissiveMutualTLS must be true to allow changing to MutualTLSMode=permissive, but this does not require that all proxy-defaults and service-defaults are currently in strict mode.
* Update xDS listener config to add a "permissive filter chain" when MutualTLSMode=permissive for a particular service. The permissive filter chain matches incoming traffic by the destination port. If the destination port matches the service port from the catalog, then no mTLS is required and the traffic sent is forwarded unmodified to the application.
This commit swaps the partition field to the local partition for
discovery chains targeting peers. Prior to this change, peer upstreams
would always use a value of default regardless of which partition they
exist in. This caused several issues in xds / proxycfg because of id
mismatches.
Some prior fixes were made to deal with one-off id mismatches that this
PR also cleans up, since they are no longer needed.
Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Add a new envoy flag: "envoy_hcp_metrics_bind_socket_dir", a directory
where a unix socket will be created with the name
`<namespace>_<proxy_id>.sock` to forward Envoy metrics.
If set, this will configure:
- In bootstrap configuration a local stats_sink and static cluster.
These will forward metrics to a loopback listener sent over xDS.
- A dynamic listener listening at the socket path that the previously
defined static cluster is sending metrics to.
- A dynamic cluster that will forward traffic received at this listener
to the hcp-metrics-collector service.
Reasons for having a static cluster pointing at a dynamic listener:
- We want to secure the metrics stream using TLS, but the stats sink can
only be defined in bootstrap config. With dynamic listeners/clusters
we can use the proxy's leaf certificate issued by the Connect CA,
which isn't available at bootstrap time.
- We want to intelligently route to the HCP collector. Configuring its
addreess at bootstrap time limits our flexibility routing-wise. More
on this below.
Reasons for defining the collector as an upstream in `proxycfg`:
- The HCP collector will be deployed as a mesh service.
- Certificate management is taken care of, as mentioned above.
- Service discovery and routing logic is automatically taken care of,
meaning that no code changes are required in the xds package.
- Custom routing rules can be added for the collector using discovery
chain config entries. Initially the collector is expected to be
deployed to each admin partition, but in the future could be deployed
centrally in the default partition. These config entries could even be
managed by HCP itself.
* Leverage ServiceResolver ConnectTimeout for route timeouts to make TerminatingGateway upstream timeouts configurable
* Regenerate golden files
* Add RequestTimeout field
* Add changelog entry
Protobuf Refactoring for Multi-Module Cleanliness
This commit includes the following:
Moves all packages that were within proto/ to proto/private
Rewrites imports to account for the packages being moved
Adds in buf.work.yaml to enable buf workspaces
Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml
Bumps the buf version dependency to 1.14.0 (I was trying out the version to see if it would get around an issue - it didn't but it also doesn't break things and it seemed best to keep up with the toolchain changes)
Why:
In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage.
There were some recent changes to have our own ratelimiting annotations.
The two combined were not working when I was trying to use them together (attempting to rebase another branch)
Buf workspaces should be the solution to the problem
Buf workspaces means that each module will have generated Go code that embeds proto file names relative to the proto dir and not the top level repo root.
This resulted in proto file name conflicts in the Go global protobuf type registry.
The solution to that was to add in a private/ directory into the path within the proto/ directory.
That then required rewriting all the imports.
Is this safe?
AFAICT yes
The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc)
Other than imports, there were no changes to any generated code as a result of this.
* Include secret type when building resources from config snapshot
* First pass at generating envoy secrets from api-gateway snapshot
* Update comments for xDS update order
* Add secret type + corresponding golden files to existing tests
* Initialize test helpers for testing api-gateway resource generation
* Generate golden files for new api-gateway xDS resource test
* Support ADS for TLS certificates on api-gateway
* Configure TLS on api-gateway listeners
* Inline TLS cert code
* update tests
* Add SNI support so we can have multiple certificates
* Remove commented out section from helper
* regen deep-copy
* Add tcp tls test
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Simple API Gateway e2e test for tcp routes
* Drop DNSSans since we don't front the Gateway with a leaf cert
* WIP listener tests for api-gateway
* Return early if no routes
* Add back in leaf cert to testing
* Fix merge conflicts
* Re-add kind to setup
* Fix iteration over listener upstreams
* New tcp listener test
* Add tests for API Gateway with TCP and HTTP routes
* Move zero-route check back
* Drop generateIngressDNSSANs
* Check for chains not routes
---------
Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com>
Fix mesh gateways incorrectly matching peer locality.
This fixes an issue where local mesh gateways use an
incorrect address when attempting to forward traffic to a
peered datacenter. Prior to this change it would use the
lan address instead of the wan if the locality matched. This
should never be done for peering, since we must route all
traffic through the remote mesh gateway.
* Stub proxycfg handler for API gateway
* Add Service Kind constants/handling for API Gateway
* Begin stubbing for SDS
* Add new Secret type to xDS order of operations
* Continue stubbing of SDS
* Iterate on proxycfg handler for API gateway
* Handle BoundAPIGateway config entry subscription in proxycfg-glue
* Add API gateway to config snapshot validation
* Add API gateway to config snapshot clone, leaf, etc.
* Subscribe to bound route + cert config entries on bound-api-gateway
* Track routes + certs on API gateway config snapshot
* Generate DeepCopy() for types used in watch.Map
* Watch all active references on api-gateway, unwatch inactive
* Track loading of initial bound-api-gateway config entry
* Use proper proto package for SDS mapping
* Use ResourceReference instead of ServiceName, collect resources
* Fix typo, add + remove TODOs
* Watch discovery chains for TCPRoute
* Add TODO for updating gateway services for api-gateway
* make proto
* Regenerate deep-copy for proxycfg
* Set datacenter on upstream ID from query source
* Watch discovery chains for http-route service backends
* Add ServiceName getter to HTTP+TCP Service structs
* Clean up unwatched discovery chains on API Gateway
* Implement watch for ingress leaf certificate
* Collect upstreams on http-route + tcp-route updates
* Remove unused GatewayServices update handler
* Remove unnecessary gateway services logic for API Gateway
* Remove outdate TODO
* Use .ToIngress where appropriate, including TODO for cleaning up
* Cancel before returning error
* Remove GatewayServices subscription
* Add godoc for handlerAPIGateway functions
* Update terminology from Connect => Consul Service Mesh
Consistent with terminology changes in https://github.com/hashicorp/consul/pull/12690
* Add missing TODO
* Remove duplicate switch case
* Rerun deep-copy generator
* Use correct property on config snapshot
* Remove unnecessary leaf cert watch
* Clean up based on code review feedback
* Note handler properties that are initialized but set elsewhere
* Add TODO for moving helper func into structs pkg
* Update generated DeepCopy code
* gofmt
* Begin stubbing for SDS
* Start adding tests
* Remove second BoundAPIGateway case in glue
* TO BE PICKED: fix formatting of str
* WIP
* Fix merge conflict
* Implement HTTP Route to Discovery Chain config entries
* Stub out function to create discovery chain
* Add discovery chain merging code (#16131)
* Test adding TCP and HTTP routes
* Add some tests for the synthesizer
* Run go mod tidy
* Pairing with N8
* Run deep copy
* Clean up GatewayChainSynthesizer
* Fix missing assignment of BoundAPIGateway topic
* Separate out synthesizeChains and toIngressTLS
* Fix build errors
* Ensure synthesizer skips non-matching routes by protocol
* Rebase on N8s work
* Generate DeepCopy() for API gateway listener types
* Improve variable name
* Regenerate DeepCopy() code
* Fix linting issue
* fix protobuf import
* Fix more merge conflict errors
* Fix synthesize test
* Run deep copy
* Add URLRewrite to proto
* Update agent/consul/discoverychain/gateway_tcproute.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Remove APIGatewayConfigEntry that was extra
* Error out if route kind is unknown
* Fix formatting errors in proto
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com>
* Stub proxycfg handler for API gateway
* Add Service Kind constants/handling for API Gateway
* Begin stubbing for SDS
* Add new Secret type to xDS order of operations
* Continue stubbing of SDS
* Iterate on proxycfg handler for API gateway
* Handle BoundAPIGateway config entry subscription in proxycfg-glue
* Add API gateway to config snapshot validation
* Add API gateway to config snapshot clone, leaf, etc.
* Subscribe to bound route + cert config entries on bound-api-gateway
* Track routes + certs on API gateway config snapshot
* Generate DeepCopy() for types used in watch.Map
* Watch all active references on api-gateway, unwatch inactive
* Track loading of initial bound-api-gateway config entry
* Use proper proto package for SDS mapping
* Use ResourceReference instead of ServiceName, collect resources
* Fix typo, add + remove TODOs
* Watch discovery chains for TCPRoute
* Add TODO for updating gateway services for api-gateway
* make proto
* Regenerate deep-copy for proxycfg
* Set datacenter on upstream ID from query source
* Watch discovery chains for http-route service backends
* Add ServiceName getter to HTTP+TCP Service structs
* Clean up unwatched discovery chains on API Gateway
* Implement watch for ingress leaf certificate
* Collect upstreams on http-route + tcp-route updates
* Remove unused GatewayServices update handler
* Remove unnecessary gateway services logic for API Gateway
* Remove outdate TODO
* Use .ToIngress where appropriate, including TODO for cleaning up
* Cancel before returning error
* Remove GatewayServices subscription
* Add godoc for handlerAPIGateway functions
* Update terminology from Connect => Consul Service Mesh
Consistent with terminology changes in https://github.com/hashicorp/consul/pull/12690
* Add missing TODO
* Remove duplicate switch case
* Rerun deep-copy generator
* Use correct property on config snapshot
* Remove unnecessary leaf cert watch
* Clean up based on code review feedback
* Note handler properties that are initialized but set elsewhere
* Add TODO for moving helper func into structs pkg
* Update generated DeepCopy code
* gofmt
* Generate DeepCopy() for API gateway listener types
* Improve variable name
* Regenerate DeepCopy() code
* Fix linting issue
* Temporarily remove the secret type from resource generation
* remove legacy tokens
* remove lingering legacy token references from docs
* update language and naming for token secrets and accessor IDs
* updates all tokenID references to clarify accessorID
* remove token type references and lookup tokens by accessorID index
* remove unnecessary constants
* replace additional tokenID param names
* Add warning info for deprecated -id parameter
Co-authored-by: Paul Glass <pglass@hashicorp.com>
* Update field comment
Co-authored-by: Paul Glass <pglass@hashicorp.com>
---------
Co-authored-by: Paul Glass <pglass@hashicorp.com>
Ensure nothing in the troubleshoot go module depends on consul's top level module. This is so we can import troubleshoot into consul-k8s and not import all of consul.
* turns troubleshoot into a go module [authored by @curtbushko]
* gets the envoy protos into the troubleshoot module [authored by @curtbushko]
* adds a new go module `envoyextensions` which has xdscommon and extensioncommon folders that both the xds package and the troubleshoot package can import
* adds testing and linting for the new go modules
* moves the unit tests in `troubleshoot/validateupstream` that depend on proxycfg/xds into the xds package, with a comment describing why those tests cannot be in the troubleshoot package
* fixes all the imports everywhere as a result of these changes
Co-authored-by: Curt Bushko <cbushko@gmail.com>
* Add Peer field to service-defaults upstream overrides.
* add api changes, compat mode for service default overrides
* Fixes based on testing
---------
Co-authored-by: DanStough <dan.stough@hashicorp.com>
* Add Tproxy support to Envoy Extensions (this is needed for service to service validation)
* Add validation for Envoy configuration for an upstream service
* Use both /config_dump and /cluster to validate Envoy configuration
This is because of a bug in Envoy where the EndpointsConfigDump does not
include a cluster_name, making it impossible to match an endpoint to
verify it exists.
This removes endpoints support for builtin extensions since only the
validate plugin was using it, and it is no longer used. It also removes
test cases for endpoint validation. Endpoints validation now only occurs
in the top level test from config_dump and clusters json files.
Co-authored-by: Eric <eric@haberkorn.co>
* updated builtin extension to parse region directly from ARN
- added a unit test
- added some comments/light refactoring
* updated golden files with proper ARNs
- ARNs need to be right format now that they are being processed
* updated tests and integration tests
- removed 'region' from all EnvoyExtension arguments
- added properly formatted ARN which includes the same region found in the removed "Region" field: 'us-east-1'
Fix configuration merging for implicit tproxy upstreams.
Change the merging logic so that the wildcard upstream has correct proxy-defaults
and service-defaults values combined into it. It did not previously merge all fields,
and the wildcard upstream did not exist unless service-defaults existed (it ignored
proxy-defaults, essentially).
Change the way we fetch upstream configuration in the xDS layer so that it falls back
to the wildcard when no matching upstream is found. This is what allows implicit peer
upstreams to have the correct "merged" config.
Change proxycfg to always watch local mesh gateway endpoints whenever a peer upstream
is found. This simplifies the logic so that we do not have to inspect the "merged"
configuration on peer upstreams to extract the mesh gateway mode.