If a CA config update did not cause a root change, the codepath would return early and skip some steps which preserve its intermediate certificates and signing key ID. This commit re-orders some code and prevents updates from generating new intermediate certificates.
This commit adds a sameness-group config entry to the API and structs packages. It includes some validation logic and a new memdb index that tracks the default sameness-group for each partition. Sameness groups will simplify the effort of managing failovers / intentions / exports for peers and partitions.
Note that this change purely to introduce the configuration entry and does not include the full functionality of sameness-groups.
* Leverage ServiceResolver ConnectTimeout for route timeouts to make TerminatingGateway upstream timeouts configurable
* Regenerate golden files
* Add RequestTimeout field
* Add changelog entry
Registering gRPC balancers is thread-unsafe because they are stored in a
global map variable that is accessed without holding a lock. Therefore,
it's expected that balancers are registered _once_ at the beginning of
your program (e.g. in a package `init` function) and certainly not after
you've started dialing connections, etc.
> NOTE: this function must only be called during initialization time
> (i.e. in an init() function), and is not thread-safe.
While this is fine for us in production, it's challenging for tests that
spin up multiple agents in-memory. We currently register a balancer per-
agent which holds agent-specific state that cannot safely be shared.
This commit introduces our own registry that _is_ thread-safe, and
implements the Builder interface such that we can call gRPC's `Register`
method once, on start-up. It uses the same pattern as our resolver
registry where we use the dial target's host (aka "authority"), which is
unique per-agent, to determine which builder to use.
Prior to this commit, all peer services were transmitted as connect-enabled
as long as a one or more mesh-gateways were healthy. With this change, there
is now a difference between typical services and connect services transmitted
via peering.
A service will be reported as "connect-enabled" as long as any of these
conditions are met:
1. a connect-proxy sidecar is registered for the service name.
2. a connect-native instance of the service is registered.
3. a service resolver / splitter / router is registered for the service name.
4. a terminating gateway has registered the service.
Protobuf Refactoring for Multi-Module Cleanliness
This commit includes the following:
Moves all packages that were within proto/ to proto/private
Rewrites imports to account for the packages being moved
Adds in buf.work.yaml to enable buf workspaces
Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml
Bumps the buf version dependency to 1.14.0 (I was trying out the version to see if it would get around an issue - it didn't but it also doesn't break things and it seemed best to keep up with the toolchain changes)
Why:
In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage.
There were some recent changes to have our own ratelimiting annotations.
The two combined were not working when I was trying to use them together (attempting to rebase another branch)
Buf workspaces should be the solution to the problem
Buf workspaces means that each module will have generated Go code that embeds proto file names relative to the proto dir and not the top level repo root.
This resulted in proto file name conflicts in the Go global protobuf type registry.
The solution to that was to add in a private/ directory into the path within the proto/ directory.
That then required rewriting all the imports.
Is this safe?
AFAICT yes
The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc)
Other than imports, there were no changes to any generated code as a result of this.
* [API Gateway] Add integration test for conflicted TCP listeners
* [API Gateway] Update simple test to leverage intentions and multiple listeners
* Fix broken unit test
* [API Gateway] Add integration test for HTTP routes
Prior to this commit, secondary datacenters could not be initialized
as peering acceptors if ACLs were enabled. This is due to the fact that
internal server-to-server API calls would fail because the management
token was not generated. This PR makes it so that both primary and
secondary datacenters generate their own management token whenever
a leader is elected in their respective clusters.
* Stub proxycfg handler for API gateway
* Add Service Kind constants/handling for API Gateway
* Begin stubbing for SDS
* Add new Secret type to xDS order of operations
* Continue stubbing of SDS
* Iterate on proxycfg handler for API gateway
* Handle BoundAPIGateway config entry subscription in proxycfg-glue
* Add API gateway to config snapshot validation
* Add API gateway to config snapshot clone, leaf, etc.
* Subscribe to bound route + cert config entries on bound-api-gateway
* Track routes + certs on API gateway config snapshot
* Generate DeepCopy() for types used in watch.Map
* Watch all active references on api-gateway, unwatch inactive
* Track loading of initial bound-api-gateway config entry
* Use proper proto package for SDS mapping
* Use ResourceReference instead of ServiceName, collect resources
* Fix typo, add + remove TODOs
* Watch discovery chains for TCPRoute
* Add TODO for updating gateway services for api-gateway
* make proto
* Regenerate deep-copy for proxycfg
* Set datacenter on upstream ID from query source
* Watch discovery chains for http-route service backends
* Add ServiceName getter to HTTP+TCP Service structs
* Clean up unwatched discovery chains on API Gateway
* Implement watch for ingress leaf certificate
* Collect upstreams on http-route + tcp-route updates
* Remove unused GatewayServices update handler
* Remove unnecessary gateway services logic for API Gateway
* Remove outdate TODO
* Use .ToIngress where appropriate, including TODO for cleaning up
* Cancel before returning error
* Remove GatewayServices subscription
* Add godoc for handlerAPIGateway functions
* Update terminology from Connect => Consul Service Mesh
Consistent with terminology changes in https://github.com/hashicorp/consul/pull/12690
* Add missing TODO
* Remove duplicate switch case
* Rerun deep-copy generator
* Use correct property on config snapshot
* Remove unnecessary leaf cert watch
* Clean up based on code review feedback
* Note handler properties that are initialized but set elsewhere
* Add TODO for moving helper func into structs pkg
* Update generated DeepCopy code
* gofmt
* Begin stubbing for SDS
* Start adding tests
* Remove second BoundAPIGateway case in glue
* TO BE PICKED: fix formatting of str
* WIP
* Fix merge conflict
* Implement HTTP Route to Discovery Chain config entries
* Stub out function to create discovery chain
* Add discovery chain merging code (#16131)
* Test adding TCP and HTTP routes
* Add some tests for the synthesizer
* Run go mod tidy
* Pairing with N8
* Run deep copy
* Clean up GatewayChainSynthesizer
* Fix missing assignment of BoundAPIGateway topic
* Separate out synthesizeChains and toIngressTLS
* Fix build errors
* Ensure synthesizer skips non-matching routes by protocol
* Rebase on N8s work
* Generate DeepCopy() for API gateway listener types
* Improve variable name
* Regenerate DeepCopy() code
* Fix linting issue
* fix protobuf import
* Fix more merge conflict errors
* Fix synthesize test
* Run deep copy
* Add URLRewrite to proto
* Update agent/consul/discoverychain/gateway_tcproute.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Remove APIGatewayConfigEntry that was extra
* Error out if route kind is unknown
* Fix formatting errors in proto
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com>
* Fix detecting when a route doesn't bind to a gateway because it's already bound
* Clean up status setting code
* rework binding a bit
* More cleanup
* Flatten all files
* Fix up docstrings
* Stub proxycfg handler for API gateway
* Add Service Kind constants/handling for API Gateway
* Begin stubbing for SDS
* Add new Secret type to xDS order of operations
* Continue stubbing of SDS
* Iterate on proxycfg handler for API gateway
* Handle BoundAPIGateway config entry subscription in proxycfg-glue
* Add API gateway to config snapshot validation
* Add API gateway to config snapshot clone, leaf, etc.
* Subscribe to bound route + cert config entries on bound-api-gateway
* Track routes + certs on API gateway config snapshot
* Generate DeepCopy() for types used in watch.Map
* Watch all active references on api-gateway, unwatch inactive
* Track loading of initial bound-api-gateway config entry
* Use proper proto package for SDS mapping
* Use ResourceReference instead of ServiceName, collect resources
* Fix typo, add + remove TODOs
* Watch discovery chains for TCPRoute
* Add TODO for updating gateway services for api-gateway
* make proto
* Regenerate deep-copy for proxycfg
* Set datacenter on upstream ID from query source
* Watch discovery chains for http-route service backends
* Add ServiceName getter to HTTP+TCP Service structs
* Clean up unwatched discovery chains on API Gateway
* Implement watch for ingress leaf certificate
* Collect upstreams on http-route + tcp-route updates
* Remove unused GatewayServices update handler
* Remove unnecessary gateway services logic for API Gateway
* Remove outdate TODO
* Use .ToIngress where appropriate, including TODO for cleaning up
* Cancel before returning error
* Remove GatewayServices subscription
* Add godoc for handlerAPIGateway functions
* Update terminology from Connect => Consul Service Mesh
Consistent with terminology changes in https://github.com/hashicorp/consul/pull/12690
* Add missing TODO
* Remove duplicate switch case
* Rerun deep-copy generator
* Use correct property on config snapshot
* Remove unnecessary leaf cert watch
* Clean up based on code review feedback
* Note handler properties that are initialized but set elsewhere
* Add TODO for moving helper func into structs pkg
* Update generated DeepCopy code
* gofmt
* Generate DeepCopy() for API gateway listener types
* Improve variable name
* Regenerate DeepCopy() code
* Fix linting issue
* Temporarily remove the secret type from resource generation
This endpoint shows total services, connect service instances and
billable service instances in the local datacenter or globally. Billable
instances = total service instances - connect services - consul server instances.
* Add additional controller implementations
* remove additional interface
* Fix comparison checks and mark unused contexts
* Switch to time.Now().UTC()
* Add a pointer helper for shadowing loop variables
* Extract anonymous functions for readability
* clean up logging
* Add Type to the Condition proto
* Update some comments and add additional space for readability
* Address PR feedback
* Fix up dirty checks and change to pointer receiver
* remove legacy tokens
* remove lingering legacy token references from docs
* update language and naming for token secrets and accessor IDs
* updates all tokenID references to clarify accessorID
* remove token type references and lookup tokens by accessorID index
* remove unnecessary constants
* replace additional tokenID param names
* Add warning info for deprecated -id parameter
Co-authored-by: Paul Glass <pglass@hashicorp.com>
* Update field comment
Co-authored-by: Paul Glass <pglass@hashicorp.com>
---------
Co-authored-by: Paul Glass <pglass@hashicorp.com>
* feat: calculate retry wait time with exponential back-off
* test: add test for getWaitTime method
* feat: enforce random jitter between min value from previous iteration and current
* extract randomStagger to simplify tests and use Milliseconds to avoid float math.
* rename variables
* add test and rename comment
---------
Co-authored-by: Poonam Jadhav <poonam.jadhav@hashicorp.com>
* Add Peer field to service-defaults upstream overrides.
* add api changes, compat mode for service default overrides
* Fixes based on testing
---------
Co-authored-by: DanStough <dan.stough@hashicorp.com>
* remove legacy tokens
* Update test comment
Co-authored-by: Paul Glass <pglass@hashicorp.com>
* fix imports
* update docs for additional CLI changes
* add test case for anonymous token
* set deprecated api fields to json ignore and fix patch errors
* update changelog to breaking-change
* fix import
* update api docs to remove legacy reference
* fix docs nav data
---------
Co-authored-by: Paul Glass <pglass@hashicorp.com>
* Stub out bind code
* Move into a new package and flesh out binding
* Fill in the actual binding logic
* Bind to all listeners if not specified
* Move bind code up to gateways package
* Fix resource type check
* Add UpsertRoute to listeners
* Add RemoveRoute to listener
* Implement binding as associated functions
* Pass in gateways to BindRouteToGateways
* Add a bunch of tests
* Fix hopping from one listener on a gateway to another
* Remove parents from HTTPRoute
* Apply suggestions from code review
* Fix merge conflict
* Unify binding into a single variadic function 🙌 @nathancoleman
* Remove vestigial error
* Add TODO on protocol check
* rate limit test
* Have tests for the 3 modes
* added assertions for logs and metrics
* add comments to test sections
* add check for rate limit exceeded text in log assertion section.
* fix linting error
* updating test to use KV get and put. move log assertion tolast.
* Adding logging for blocking messages in enforcing mode. refactoring tests.
* modified test description
* formatting
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* Update test/integration/consul-container/test/ratelimit/ratelimit_test.go
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
* expand log checking so that it ensures both logs are they when they are supposed to be and not there when they are not expected to be.
* add retry on test
* Warn once when rate limit exceed regardless of enforcing vs permissive.
* Update test/integration/consul-container/test/ratelimit/ratelimit_test.go
Co-authored-by: Dan Upton <daniel@floppy.co>
Co-authored-by: Dan Upton <daniel@floppy.co>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
* Stub Config Entries for Consul Native API Gateway (#15644)
* Add empty InlineCertificate struct and protobuf
* apigateway stubs
* Stub HTTPRoute in api pkg
* Stub HTTPRoute in structs pkg
* Simplify api.APIGatewayConfigEntry to be consistent w/ other entries
* Update makeConfigEntry switch, add docstring for HTTPRouteConfigEntry
* Add TCPRoute to MakeConfigEntry, return unique Kind
* Stub BoundAPIGatewayConfigEntry in agent
* Add RaftIndex to APIGatewayConfigEntry stub
* Add new config entry kinds to validation allow-list
* Add RaftIndex to other added config entry stubs
* Update usage metrics assertions to include new cfg entries
* Add Meta and acl.EnterpriseMeta to all new ConfigEntry types
* Remove unnecessary Services field from added config entry types
* Implement GetMeta(), GetEnterpriseMeta() for added config entry types
* Add meta field to proto, name consistently w/ existing config entries
* Format config_entry.proto
* Add initial implementation of CanRead + CanWrite for new config entry types
* Add unit tests for decoding of new config entry types
* Add unit tests for parsing of new config entry types
* Add unit tests for API Gateway config entry ACLs
* Return typed PermissionDeniedError on BoundAPIGateway CanWrite
* Add unit tests for added config entry ACLs
* Add BoundAPIGateway type to AllConfigEntryKinds
* Return proper kind from BoundAPIGateway
* Add docstrings for new config entry types
* Add missing config entry kinds to proto def
* Update usagemetrics_oss_test.go
* Use utility func for returning PermissionDeniedError
* EventPublisher subscriptions for Consul Native API Gateway (#15757)
* Create new event topics in subscribe proto
* Add tests for PBSubscribe func
* Make configs singular, add all configs to PBToStreamSubscribeRequest
* Add snapshot methods
* Add config_entry_events tests
* Add config entry kind to topic for new configs
* Add unit tests for snapshot methods
* Start adding integration test
* Test using the new controller code
* Update agent/consul/state/config_entry_events.go
* Check value of error
* Add controller stubs for API Gateway (#15837)
* update initial stub implementation
* move files, clean up mutex references
* Remove embed, use idiomatic names for constructors
* Remove stray file introduced in merge
* Add APIGateway validation (#15847)
* Add APIGateway validation
* Add additional validations
* Add cert ref validation
* Add protobuf definitions
* Fix up field types
* Add API structs
* Move struct fields around a bit
* APIGateway InlineCertificate validation (#15856)
* Add APIGateway validation
* Add additional validations
* Add protobuf definitions
* Tabs to spaces
* Add API structs
* Move struct fields around a bit
* Add validation for InlineCertificate
* Fix ACL test
* APIGateway BoundAPIGateway validation (#15858)
* Add APIGateway validation
* Add additional validations
* Add cert ref validation
* Add protobuf definitions
* Fix up field types
* Add API structs
* Move struct fields around a bit
* Add validation for BoundAPIGateway
* APIGateway TCPRoute validation (#15855)
* Add APIGateway validation
* Add additional validations
* Add cert ref validation
* Add protobuf definitions
* Fix up field types
* Add API structs
* Add TCPRoute normalization and validation
* Add forgotten Status
* Add some more field docs in api package
* Fix test
* Format imports
* Rename snapshot test variable names
* Add plumbing for Native API GW Subscriptions (#16003)
Co-authored-by: Sarah Alsmiller <sarah.alsmiller@hashicorp.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: sarahalsmiller <100602640+sarahalsmiller@users.noreply.github.com>
Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com>
Fix configuration merging for implicit tproxy upstreams.
Change the merging logic so that the wildcard upstream has correct proxy-defaults
and service-defaults values combined into it. It did not previously merge all fields,
and the wildcard upstream did not exist unless service-defaults existed (it ignored
proxy-defaults, essentially).
Change the way we fetch upstream configuration in the xDS layer so that it falls back
to the wildcard when no matching upstream is found. This is what allows implicit peer
upstreams to have the correct "merged" config.
Change proxycfg to always watch local mesh gateway endpoints whenever a peer upstream
is found. This simplifies the logic so that we do not have to inspect the "merged"
configuration on peer upstreams to extract the mesh gateway mode.
Previously, we'd begin a session with the xDS concurrency limiter
regardless of whether the proxy was registered in the catalog or in
the server's local agent state.
This caused problems for users who run `consul connect envoy` directly
against a server rather than a client agent, as the server's locally
registered proxies wouldn't be included in the limiter's capacity.
Now, the `ConfigSource` is responsible for beginning the session and we
only do so for services in the catalog.
Fixes: https://github.com/hashicorp/consul/issues/15753
* Protobuf Modernization
Remove direct usage of golang/protobuf in favor of google.golang.org/protobuf
Marshallers (protobuf and json) needed some changes to account for different APIs.
Moved to using the google.golang.org/protobuf/types/known/* for the well known types including replacing some custom Struct manipulation with whats available in the structpb well known type package.
This also updates our devtools script to install protoc-gen-go from the right location so that files it generates conform to the correct interfaces.
* Fix go-mod-tidy make target to work on all modules
- Fixes a panic when Operation.SourceAddr is nil (internal net/rpc calls)
- Adds proper HTTP response codes (429 and 503) for rate limit errors
- Makes the error messages clearer
- Enables automatic retries for rate-limit errors in the net/rpc stack
* inject logger and create logdrop sink
* init sink with an empty struct instead of nil
* wrap a logger instead of a sink and add a discard logger to avoid double logging
* fix compile errors
* fix linter errors
* Fix bug where log arguments aren't properly formatted
* Move log sink construction outside of handler
* Add prometheus definition and docs for log drop counter
Co-authored-by: Daniel Upton <daniel@floppy.co>
This is the OSS portion of enterprise PR 3822.
Adds a custom gRPC balancer that replicates the router's server cycling
behavior. Also enables automatic retries for RESOURCE_EXHAUSTED errors,
which we now get for free.
* Rate limiting handler - ensure configuration has changed before modifying limiters
* Updating test to validate arguments to UpdateConfig
* Removing duplicate test. Updating mock.
* Renaming NullRateLimiter to NullRequestLimitsHandler
* Rate Limit Handler - ensure rate limiting is not in the code path when not configured
* Update agent/consul/rate/handler.go
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
* formatting handler.go
* Rate limiting handler - ensure configuration has changed before modifying limiters
* Updating test to validate arguments to UpdateConfig
* Removing duplicate test. Updating mock.
* adding logging for when UpdateConfig is called but the config has not changed.
* Update agent/consul/rate/handler.go
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
* Update agent/consul/rate/handler_test.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* modifying existing variable name based on pr feedback
* updating a broken merge conflict;
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
* Rate limiting handler - ensure configuration has changed before modifying limiters
* Updating test to validate arguments to UpdateConfig
* Removing duplicate test. Updating mock.
* adding logging for when UpdateConfig is called but the config has not changed.
* Update agent/consul/rate/handler.go
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
* change to perform all tree writes in the same go routine to avoid race condition.
* rename runStoreOnce to reconcile
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* reduce nesting
Co-authored-by: Dan Upton <daniel@floppy.co>
* server: add placeholder glue for rate limit handler
This commit adds a no-op implementation of the rate-limit handler and
adds it to the `consul.Server` struct and setup code.
This allows us to start working on the net/rpc and gRPC interceptors and
config logic.
* Add handler errors
* Set the global read and write limits
* fixing multilimiter moving packages
* Fix typo
* Simplify globalLimit usage
* add multilimiter and tests
* exporting LimitedEntity
* Apply suggestions from code review
Co-authored-by: John Murret <john.murret@hashicorp.com>
* add config update and rename config params
* add doc string and split config
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* use timer to avoid go routine leak and change the interface
* add comments to tests
* fix failing test
* add prefix with config edge, refactor tests
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* refactor to apply configs for limiters under a prefix
* add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic
* make KeyType an exported type
* split the config and limiter trees to fix race conditions in config update
* rename variables
* fix race in test and remove dead code
* fix reconcile loop to not create a timer on each loop
* add extra benchmark tests and fix tests
* fix benchmark test to pass value to func
* server: add placeholder glue for rate limit handler
This commit adds a no-op implementation of the rate-limit handler and
adds it to the `consul.Server` struct and setup code.
This allows us to start working on the net/rpc and gRPC interceptors and
config logic.
* Set the global read and write limits
* fixing multilimiter moving packages
* add server configuration for global rate limiting.
* remove agent test
* remove added stuff from handler
* remove added stuff from multilimiter
* removing unnecessary TODOs
* Removing TODO comment from handler
* adding in defaulting to infinite
* add disabled status in there
* adding in documentation for disabled mode.
* make disabled the default.
* Add mock and agent test
* addig documentation and missing mock file.
* Fixing test TestLoad_IntegrationWithFlags
* updating docs based on PR feedback.
* Updating Request Limits mode to use int based on PR feedback.
* Adding RequestLimits struct so we have a nested struct in ReloadableConfig.
* fixing linting references
* Update agent/consul/rate/handler.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* Update agent/consul/config.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* removing the ignore of the request limits in JSON. addingbuilder logic to convert any read rate or write rate less than 0 to rate.Inf
* added conversion function to convert request limits object to handler config.
* Updating docs to reflect gRPC and RPC are rate limit and as a result, HTTP requests are as well.
* Updating values for TestLoad_FullConfig() so that they were different and discernable.
* Updating TestRuntimeConfig_Sanitize
* Fixing TestLoad_IntegrationWithFlags test
* putting nil check in place
* fixing rebase
* removing change for missing error checks. will put in another PR
* Rebasing after default multilimiter config change
* resolving rebase issues
* updating reference for incomingRPCLimiter to use interface
* updating interface
* Updating interfaces
* Fixing mock reference
Co-authored-by: Daniel Upton <daniel@floppy.co>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
Implements the gRPC middleware for rate-limiting as a tap.ServerInHandle
function (executed before the request is unmarshaled).
Mappings between gRPC methods and their operation type are generated by
a protoc plugin introduced by #15564.
Adds a no-op implementation of the rate-limit handler and exposes
it on the consul.Server struct.
It allows us to start working on the net/rpc and gRPC interceptors
and config (re)loading logic, without having to implement the full
handler up-front.
Co-authored-by: John Murret <john.murret@hashicorp.com>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
* add multilimiter and tests
* exporting LimitedEntity
* go mod tidy
* Apply suggestions from code review
Co-authored-by: John Murret <john.murret@hashicorp.com>
* add config update and rename config params
* add doc string and split config
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* use timer to avoid go routine leak and change the interface
* add comments to tests
* fix failing test
* add prefix with config edge, refactor tests
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* refactor to apply configs for limiters under a prefix
* add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic
* make KeyType an exported type
* split the config and limiter trees to fix race conditions in config update
* rename variables
* fix race in test and remove dead code
* fix reconcile loop to not create a timer on each loop
* add extra benchmark tests and fix tests
* fix benchmark test to pass value to func
* use a separate go routine to write limiters (#15643)
* use a separate go routine to write limiters
* Add updating limiter when another limiter is created
* fix waiter to be a ticker, so we commit more than once.
* fix tests and add tests for coverage
* unexport members and add tests
* make UpdateConfig thread safe and multi call to Run safe
* replace swith with if
* fix review comments
* replace time.sleep with retries
* fix flaky test and remove unnecessary init
* fix test races
* remove unnecessary negative test case
* remove fixed todo
Co-authored-by: John Murret <john.murret@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
All of the current integration tests where Vault is the Connect CA now use non-root tokens for the test. This helps us detect privilege changes in the vault model so we can keep our guides up to date.
One larger change was that the RenewIntermediate function got refactored slightly so it could be used from a test, rather than the large duplicated function we were testing in a test which seemed error prone.
The fix outlined and merged in #15253 fixed the issue as it occurs in the primary
DC. There is a similar issue that arises when vault is used as the Connect CA in a
secondary datacenter that is fixed by this PR.
Additionally: this PR adds support to run the existing suite of vault related integration
tests against the last 4 versions of vault (1.9, 1.10, 1.11, 1.12)
* Remove log line about server mgmt token init
Currently the server management token is only being bootstrapped in the
primary datacenter. That means that servers on the secondary datacenter
will never have this token available, and would log this line any time a
token is resolved.
Bootstrapping the token in secondary datacenters will be done in a
follow-up.
* Add changelog entry
* auto-config: relax node name validation for JWT authorization
This changes the JWT authorization logic to allow all non-whitespace,
non-quote characters when validating node names. Consul had previously
allowed these characters in node names, until this validation was added
to fix a security vulnerability with whitespace/quotes being passed to
the `bexpr` library. This unintentionally broke node names with
characters like `.` which aren't related to this vulnerability.
* Update website/content/docs/agent/config/cli-flags.mdx
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
* add leadership transfer command
* add RPC call test (flaky)
* add missing import
* add changelog
* add command registration
* Apply suggestions from code review
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* add the possibility of providing an id to raft leadership transfer. Add few tests.
* delete old file from cherry pick
* rename changelog filename to PR #
* rename changelog and fix import
* fix failing test
* check for OperatorWrite
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* rename from leader-transfer to transfer-leader
* remove version check and add test for operator read
* move struct to operator.go
* first pass
* add code for leader transfer in the grpc backend and tests
* wire the http endpoint to the new grpc endpoint
* remove the RPC endpoint
* remove non needed struct
* fix naming
* add mog glue to API
* fix comment
* remove dead code
* fix linter error
* change package name for proto file
* remove error wrapping
* fix failing test
* add command registration
* add grpc service mock tests
* fix receiver to be pointer
* use defined values
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* reuse MockAclAuthorizer
* add documentation
* remove usage of external.TokenFromContext
* fix failing tests
* fix proto generation
* Apply suggestions from code review
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
* Apply suggestions from code review
* add more context in doc for the reason
* Apply suggestions from docs code review
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
* regenerate proto
* fix linter errors
Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
There are a few changes that needed to be made to to handle authorizing
reads for imported data:
- If the data was imported from a peer we should not attempt to read the
data using the traditional authz rules. This is because the name of
services/nodes in a peer cluster are not equivalent to those of the
importing cluster.
- If the data was imported from a peer we need to check whether the
token corresponds to a service, meaning that it has service:write
permissions, or to a local read only token that can read all
nodes/services in a namespace.
This required changes at the policyAuthorizer level, since that is the
only view available to OSS Consul, and at the enterprise
partition/namespace level.
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* Fix mesh gateway proxy-defaults not affecting upstreams.
* Clarify distinction with upstream settings
Top-level mesh gateway mode in proxy-defaults and service-defaults gets
merged into NodeService.Proxy.MeshGateway, and only gets merged with
the mode attached to an an upstream in proxycfg/xds.
* Fix mgw mode usage for peered upstreams
There were a couple issues with how mgw mode was being handled for
peered upstreams.
For starters, mesh gateway mode from proxy-defaults
and the top-level of service-defaults gets stored in
NodeService.Proxy.MeshGateway, but the upstream watch for peered data
was only considering the mesh gateway config attached in
NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway
mode via global proxy-defaults or service-defaults on the downstream
would not have an effect.
Separately, transparent proxy watches for peered upstreams didn't
consider mesh gateway mode at all.
This commit addresses the first issue by ensuring that we overlay the
upstream config for peered upstreams as we do for non-peered. The second
issue is addressed by re-using setupWatchesForPeeredUpstream when
handling transparent proxy updates.
Note that for transparent proxies we do not yet support mesh gateway
mode per upstream, so the NodeService.Proxy.MeshGateway mode is used.
* Fix upstream mesh gateway mode handling in xds
This commit ensures that when determining the mesh gateway mode for
peered upstreams we consider the NodeService.Proxy.MeshGateway config as
a baseline.
In absense of this change, setting a mesh gateway mode via
proxy-defaults or the top-level of service-defaults will not have an
effect for peered upstreams.
* Merge service/proxy defaults in cfg resolver
Previously the mesh gateway mode for connect proxies would be
merged at three points:
1. On servers, in ComputeResolvedServiceConfig.
2. On clients, in MergeServiceConfig.
3. On clients, in proxycfg/xds.
The first merge returns a ServiceConfigResponse where there is a
top-level MeshGateway config from proxy/service-defaults, along with
per-upstream config.
The second merge combines per-upstream config specified at the service
instance with per-upstream config specified centrally.
The third merge combines the NodeService.Proxy.MeshGateway
config containing proxy/service-defaults data with the per-upstream
mode. This third merge is easy to miss, which led to peered upstreams
not considering the mesh gateway mode from proxy-defaults.
This commit removes the third merge, and ensures that all mesh gateway
config is available at the upstream. This way proxycfg/xds do not need
to do additional overlays.
* Ensure that proxy-defaults is considered in wc
Upstream defaults become a synthetic Upstream definition under a
wildcard key "*". Now that proxycfg/xds expect Upstream definitions to
have the final MeshGateway values, this commit ensures that values from
proxy-defaults/service-defaults are the default for this synthetic
upstream.
* Add changelog.
Co-authored-by: freddygv <freddy@hashicorp.com>
Previously, the MergeNodeServiceWithCentralConfig method accepted a
ServiceSpecificRequest argument, of which only the Datacenter and
QueryOptions fields were used.
Digging a little deeper, it turns out these fields were only passed
down to the ComputeResolvedServiceConfig method (through the
ServiceConfigRequest struct) which didn't actually use them.
As such, not all call-sites passed a valid ServiceSpecificRequest
so it's safer to remove the argument altogether to prevent future
changes from depending on it.
Re-add ServerExternalAddresses parameter in GenerateToken endpoint
This reverts commit 5e156772f6a7fba5324eb6804ae4e93c091229a6
and adds extra functionality to support newer peering behaviors.
Allow for some message duplication in subscription events during assertions.
I'm pretty sure the subscriptions machinery allows for messages to occasionally
be duplicated instead of dropping them, as a once-and-only-once queue is a pipe
dream and you have to pick one of the other two options.
* autoencrypt: helpful error for clients with wrong dc
If clients have set a different datacenter than the servers they're
connecting with for autoencrypt, give a helpful error message.
To support Destinations on the service-defaults (for tproxy with terminating gateway), we need to now also make servers watch service-defaults config entries.
* peering: skip register duplicate node and check from the peer
* Prebuilt the nodes map and checks map to avoid repeated for loop
* use key type to struct: node id, service id, and check id
Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created.
Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.
When peering through mesh gateways we expect outbound dials to peer
servers to flow through the local mesh gateway addresses.
Now when establishing a peering we get a list of dial addresses as a
ring buffer that includes local mesh gateway addresses if the local DC
is configured to peer through mesh gateways. The ring buffer includes
the mesh gateway addresses first, but also includes the remote server
addresses as a fallback.
This fallback is present because it's possible that direct egress from
the servers may be allowed. If not allowed then the leader will cycle
back to a mesh gateway address through the ring.
When attempting to dial the remote servers we retry up to a fixed
timeout. If using mesh gateways we also have an initial wait in
order to allow for the mesh gateways to configure themselves.
Note that if we encounter a permission denied error we do not retry
since that error indicates that the secret in the peering token is
invalid.
memdb's `WatchCh` method creates a goroutine that will publish to the
returned channel when the watchset is triggered or the given context
is canceled. Although this is called out in its godoc comment, it's
not obvious that this method creates a goroutine who's lifecycle you
need to manage.
In the xDS capacity controller, we were calling `WatchCh` on each
iteration of the control loop, meaning the number of goroutines would
grow on each autopilot event until there was catalog churn.
In the catalog config source, we were calling `WatchCh` with the
background context, meaning that the goroutine would keep running after
the sync loop had terminated.
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
A previous commit introduced an internally-managed server certificate
to use for peering-related purposes.
Now the peering token has been updated to match that behavior:
- The server name matches the structure of the server cert
- The CA PEMs correspond to the Connect CA
Note that if Conect is disabled, and by extension the Connect CA, we
fall back to the previous behavior of returning the manually configured
certs and local server SNI.
Several tests were updated to use the gRPC TLS port since they enable
Connect by default. This means that the peering token will embed the
Connect CA, and the dialer will expect a TLS listener.
* updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric
* update changelog
* update changelog
* correcting changelog
* adding "QueueCheckInterval" for memberlist to test
* updating integration test containers to grab latest api
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.
The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.
This commit introduces a new ACL token used for internal server
management purposes.
It has a few key properties:
- It has unlimited permissions.
- It is persisted through Raft as System Metadata rather than in the
ACL tokens table. This is to avoid users seeing or modifying it.
- It is re-generated on leadership establishment.
Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.
In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.
This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.
If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.
Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.
The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.