The nodeCheck slice was being used as the first arg in append, which in some cases will modify the array backing the slice. This would lead to service checks for other services in the wrong event.
Also refactor some things to reduce the arguments to functions.
Creating a new readTxn does not work because it will not see the newly created objects that are about to be committed. Instead use the active write Txn.
Whenever an upsert/deletion of a config entry happens, within the open
state store transaction we speculatively test compile all discovery
chains that may be affected by the pending modification to verify that
the write would not create an erroneous scenario (such as splitting
traffic to a subset that did not exist).
If a single discovery chain evaluation references two config entries
with the same kind and name in different namespaces then sometimes the
upsert/deletion would be falsely rejected. It does not appear as though
this bug would've let invalid writes through to the state store so the
correction does not require a cleanup phase.
This commit refactors the state store usage code to track unique service
name changes on transaction commit. This means we only need to lookup
usage entries when reading the information, as opposed to iterating over
a large number of service indices.
- Take into account a service instance's name being changed
- Do not iterate through entire list of service instances, we only care
about whether there is 0, 1, or more than 1.
We add a WriteTxn interface for use in updating the usage memdb table,
with the forward-looking prospect of incrementally converting other
functions to accept interfaces.
As well, we use the ReadTxn in new usage code, and as a side effect
convert a couple of existing functions to use that interface as well.
Fixes#8466
Since Consul 1.8.0 there was a bug in how ingress gateway protocol
compatibility was enforced. At the point in time that an ingress-gateway
config entry was modified the discovery chain for each upstream was
checked to ensure the ingress gateway protocol matched. Unfortunately
future modifications of other config entries were not validated against
existing ingress-gateway definitions, such as:
1. create tcp ingress-gateway pointing to 'api' (ok)
2. create service-defaults for 'api' setting protocol=http (worked, but not ok)
3. create service-splitter or service-router for 'api' (worked, but caused an agent panic)
If you were to do these in a different order, it would fail without a
crash:
1. create service-defaults for 'api' setting protocol=http (ok)
2. create service-splitter or service-router for 'api' (ok)
3. create tcp ingress-gateway pointing to 'api' (fail with message about
protocol mismatch)
This PR introduces the missing validation. The two new behaviors are:
1. create tcp ingress-gateway pointing to 'api' (ok)
2. (NEW) create service-defaults for 'api' setting protocol=http ("ok" for back compat)
3. (NEW) create service-splitter or service-router for 'api' (fail with
message about protocol mismatch)
In consideration for any existing users that may be inadvertently be
falling into item (2) above, that is now officiall a valid configuration
to be in. For anyone falling into item (3) above while you cannot use
the API to manufacture that scenario anymore, anyone that has old (now
bad) data will still be able to have the agent use them just enough to
generate a new agent/proxycfg error message rather than a panic.
Unfortunately we just don't have enough information to properly fix the
config entries.
This change was mostly automated with the following
First generate a list of functions with:
git grep -o 'Store) \([^(]\+\)(tx \*txn' ./agent/consul/state | awk '{print $2}' | grep -o '^[^(]\+'
Then the list was curated a bit with trial/error to remove and add funcs
as necessary.
Finally the replacement was done with:
dir=agent/consul/state
file=${1-funcnames}
while read fn; do
echo "$fn"
sed -i -e "s/(s \*Store) $fn(/$fn(/" $dir/*.go
sed -i -e "s/s\.$fn(/$fn(/" $dir/*.go
sed -i -e "s/s\.store\.$fn(/$fn(/" $dir/*.go
done < $file
Making these functions allows them to be used without introducing
an artificial dependency on the struct. Many of these will be called
from streaming Event processors, which do not have a store.
This change is being made ahead of the streaming work to get to reduce
the size of the streaming diff.
Move the subscription context to Next. context.Context should generally
never be stored in a struct because it makes that struct only valid
while the context is valid. This is rarely obvious from the caller.
Adds a forceClosed channel in place of the old context, and uses the new
context as a way for the caller to stop the Subscription blocking.
Remove some recursion out of bufferImte.Next. The caller is already looping so we can continue
in that loop instead of recursing. This ensures currentItem is updated immediately (which probably
does not matter in practice), and also removes the chance that we overflow the stack.
NextNoBlock and FollowAfter do not need to handle bufferItem.Err, the caller already
handles it.
Moves filter to a method to simplify Next, and more explicitly separate filtering from looping.
Also improve some godoc
Only unwrap itemBuffer.Err when necessary
EventPublisher was receiving TopicHandlers, which had a couple of
problems:
- ChangeProcessors were being grouped by Topic, but they completely
ignored the topic and were performed on every change
- ChangeProcessors required EventPublisher to be aware of database
changes
By moving ChangeProcesors out of EventPublisher, and having Publish
accept events instead of changes, EventPublisher no longer needs to
be aware of these things.
Handlers is now only SnapshotHandlers, which are still mapped by Topic.
Also allows us to remove the small 'db' package that had only two types.
They can now be unexported types in state.
The EventPublisher is the central hub of the PubSub system. It is toughly coupled with much of
stream. Some stream internals were exported exclusively for EventPublisher.
The two Subscribe cases (with or without index) were also awkwardly split between two packages. By
moving EventPublisher into stream they are now both in the same package (although still in different files).
Also store the index in Changes instead of the Txn.
This change is in preparation for movinng EventPublisher to the stream package, and
making handleACLUpdates async once again.
It is critical that Unsubscribe be called with the same pointer to a
SubscriptionRequest that was used to create the Subscription. The
docstring made that clear, but it sill allowed a caler to get it wrong by
creating a new SubscriptionRequest.
By hiding this detail from the caller, and only exposing an Unsubscribe
method, it should be impossible to fail to Unsubscribe.
Also update some godoc strings.
Use a separate lock for subscriptions.ByToken to allow it to happen synchronously
in the commit flow.
This removes the need to create a new txn for the goroutine, and removes
the need for EventPublisher to contain a reference to DB.
Make topicRegistry use functions instead of unbound methods
Use a regular memDB in EventPublisher to remove a reference cycle
Removes the need for EventPublisher to use a store
Also remove secretHash, which was used to hash tokens. We don't expose
these tokens anywhere, so we can use the string itself instead of a
Hash.
Fix acl_events_test.go for storing a structs type.
Highlights:
- add new endpoint to query for intentions by exact match
- using this endpoint from the CLI instead of the dump+filter approach
- enforcing that OSS can only read/write intentions with a SourceNS or
DestinationNS field of "default".
- preexisting OSS intentions with now-invalid namespace fields will
delete those intentions on initial election or for wildcard namespaces
an attempt will be made to downgrade them to "default" unless one
exists.
- also allow the '-namespace' CLI arg on all of the intention subcommands
- update lots of docs
And fix the 'value not used' issues.
Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
A Node Identity is very similar to a service identity. Its main targeted use is to allow creating tokens for use by Consul agents that will grant the necessary permissions for all the typical agent operations (node registration, coordinate updates, anti-entropy).
Half of this commit is for golden file based tests of the acl token and role cli output. Another big updates was to refactor many of the tests in agent/consul/acl_endpoint_test.go to use the same style of tests and the same helpers. Besides being less boiler plate in the tests it also uses a common way of starting a test server with ACLs that should operate without any warnings regarding deprecated non-uuid master tokens etc.
* Fixes#5606: Tokens converted from legacy ACLs get their Hash computed
This allows new style token replication to work for legacy tokens as well when they change.
* tests: fix timestamp comparison
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
* testing: replace most goe/verify.Values with require.Equal
One difference between these two comparisons is that go/verify considers
nil slices/maps to be equal to empty slices/maps, where as testify/require
does not, and does not appear to provide any way to enable that behaviour.
Because of this difference some expected values were changed from empty
slices to nil slices, and some calls to verify.Values were left.
* Remove github.com/pascaldekloe/goe/verify
Reduce the number of assertion packages we use from 2 to 1
Errors are values. We can use the error value to identify the 'comparison failed' case which makes the function easier to use and should make it harder to miss handle the error case
Handling errors at the end of a log switch/case block is somewhat
brittle. This block included a couple cases where errors were ignored,
but it was not obvious the way it was written.
This change moves all error handling into each case block. There is
still potentially one case where err is ignored, which will be handled
in a follow up.
Some of these problems are minor (unused vars), but others are real bugs (ignored errors).
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* Return early from updateGatewayServices if nothing to update
Previously, we returned an empty slice of gatewayServices, which caused
us to accidentally delete everything in the memdb table
* PR comment and better formatting
We require any non-wildcard services to match the protocol defined in
the listener on write, so that we can maintain a consistent experience
through ingress gateways. This also helps guard against accidental
misconfiguration by a user.
- Update tests that require an updated protocol for ingress gateways
This now requires some type of protocol setting in ingress gateway tests
to ensure the services are not filtered out.
- small refactor to add a max(x, y) function
- Use internal configEntryTxn function and add MaxUint64 to lib
- Validate that this cannot be set on a 'tcp' listener nor on a wildcard
service.
- Add Hosts field to api and test in consul config write CLI
- xds: Configure envoy with user-provided hosts from ingress gateways
This commit adds the necessary changes to allow an ingress gateway to
route traffic from a single defined port to multiple different upstream
services in the Consul mesh.
To do this, we now require all HTTP requests coming into the ingress
gateway to specify a Host header that matches "<service-name>.*" in
order to correctly route traffic to the correct service.
- Differentiate multiple listener's route names by port
- Adds a case in xds for allowing default discovery chains to create a
route configuration when on an ingress gateway. This allows default
services to easily use host header routing
- ingress-gateways have a single route config for each listener
that utilizes domain matching to route to different services.
Also ensure that WatchSets in tests are reset between calls to watchFired.
Any time a watch fires, subsequent calls to watchFired on the same WatchSet
will also return true even if there were no changes.
Previously, if a blocking query called CheckConnectServiceNodes
before the gateway-services memdb table had any entries,
a nil watchCh would be returned when calling serviceTerminatingGatewayNodes.
This means that the blocking query would not fire if a gateway config entry
was added after the watch started.
In cases where the blocking query started on proxy registration,
the proxy could potentially never become aware of an upstream endpoint
if that upstream was going to be represented by a gateway.
On every service registration, we check to see if a service should be
assassociated to a wildcard gateway-service. This fixes an issue where
we did not correctly check to see if the service being registered was a
"typical" service or not.
* Implements a simple, tcp ingress gateway workflow
This adds a new type of gateway for allowing Ingress traffic into Connect from external services.
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>