Protobuf Refactoring for Multi-Module Cleanliness
This commit includes the following:
Moves all packages that were within proto/ to proto/private
Rewrites imports to account for the packages being moved
Adds in buf.work.yaml to enable buf workspaces
Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml
Bumps the buf version dependency to 1.14.0 (I tried this version to see whether it would get around an issue. It didn't, but it also doesn't break anything, and it seemed best to keep up with the toolchain changes)
Why:
In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage.
There were some recent changes to have our own ratelimiting annotations.
The two combined were not working when I was trying to use them together (attempting to rebase another branch)
Buf workspaces should be the solution to the problem
Using Buf workspaces means that each module has generated Go code that embeds proto file names relative to the proto dir rather than the top-level repo root.
This resulted in proto file name conflicts in the Go global protobuf type registry.
The solution to that was to add in a private/ directory into the path within the proto/ directory.
That then required rewriting all the imports.
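To illustrate the conflict described above, here is a minimal sketch of a registry keyed by file path that rejects a second file registered under a path that is already taken. The `registry` type, the module names and the paths are hypothetical stand-ins; this is not the real protobuf registry API.

```go
package main

import (
	"errors"
	"fmt"
)

// registry is a toy stand-in for a process-wide registry keyed by the
// proto file path embedded in generated code. It is hypothetical and only
// models the "one owner per path" rule.
type registry struct {
	files map[string]string // file path -> owning module
}

var errConflict = errors.New("proto file path already registered")

func newRegistry() *registry {
	return &registry{files: make(map[string]string)}
}

// register records path as owned by module and fails if a different
// module already registered the same path.
func (r *registry) register(module, path string) error {
	if owner, ok := r.files[path]; ok && owner != module {
		return fmt.Errorf("%w: %q (owned by %s)", errConflict, path, owner)
	}
	r.files[path] = module
	return nil
}

func main() {
	r := newRegistry()
	// Paths relative to each module's own root can collide.
	fmt.Println(r.register("proto-public", "pbexample/example.proto"))
	fmt.Println(r.register("proto", "pbexample/example.proto"))
	// An extra private/ path segment in one module removes the collision.
	fmt.Println(r.register("proto", "private/pbexample/example.proto"))
}
```

The first and third registrations succeed and the second fails, which mirrors why the extra `private/` path segment was needed.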
Is this safe?
AFAICT yes
The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc)
Other than imports, there were no changes to any generated code as a result of this.
* Protobuf Modernization
Remove direct usage of golang/protobuf in favor of google.golang.org/protobuf
Marshallers (protobuf and json) needed some changes to account for different APIs.
Moved to using the google.golang.org/protobuf/types/known/* packages for the well-known types, including replacing some custom Struct manipulation with what's available in the structpb well-known type package.
This also updates our devtools script to install protoc-gen-go from the right location so that files it generates conform to the correct interfaces.
* Fix go-mod-tidy make target to work on all modules
* server: add placeholder glue for rate limit handler
This commit adds a no-op implementation of the rate-limit handler and
adds it to the `consul.Server` struct and setup code.
This allows us to start working on the net/rpc and gRPC interceptors and
config logic.
* Add handler errors
* Set the global read and write limits
* fixing multilimiter moving packages
* Fix typo
* Simplify globalLimit usage
* add multilimiter and tests
* exporting LimitedEntity
* Apply suggestions from code review
Co-authored-by: John Murret <john.murret@hashicorp.com>
* add config update and rename config params
* add doc string and split config
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* use timer to avoid go routine leak and change the interface
* add comments to tests
* fix failing test
* add prefix with config edge, refactor tests
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* refactor to apply configs for limiters under a prefix
* add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic
* make KeyType an exported type
* split the config and limiter trees to fix race conditions in config update
* rename variables
* fix race in test and remove dead code
* fix reconcile loop to not create a timer on each loop
* add extra benchmark tests and fix tests
* fix benchmark test to pass value to func
* add server configuration for global rate limiting.
* remove agent test
* remove added stuff from handler
* remove added stuff from multilimiter
* removing unnecessary TODOs
* Removing TODO comment from handler
* adding in defaulting to infinite
* add disabled status in there
* adding in documentation for disabled mode.
* make disabled the default.
* Add mock and agent test
* adding documentation and missing mock file.
* Fixing test TestLoad_IntegrationWithFlags
* updating docs based on PR feedback.
* Updating Request Limits mode to use int based on PR feedback.
* Adding RequestLimits struct so we have a nested struct in ReloadableConfig.
* fixing linting references
* Update agent/consul/rate/handler.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* Update agent/consul/config.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* removing the ignore of the request limits in JSON. adding builder logic to convert any read rate or write rate less than 0 to rate.Inf
* added conversion function to convert request limits object to handler config.
* Updating docs to reflect that gRPC and RPC are rate limited and, as a result, HTTP requests are as well.
* Updating values for TestLoad_FullConfig() so that they were different and discernable.
* Updating TestRuntimeConfig_Sanitize
* Fixing TestLoad_IntegrationWithFlags test
* putting nil check in place
* fixing rebase
* removing change for missing error checks. will put in another PR
* Rebasing after default multilimiter config change
* resolving rebase issues
* updating reference for incomingRPCLimiter to use interface
* updating interface
* Updating interfaces
* Fixing mock reference
Co-authored-by: Daniel Upton <daniel@floppy.co>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
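The builder change listed above (a read or write rate below 0 becomes rate.Inf) can be sketched as follows. This is a minimal illustration, not the actual builder code: `limit` and `inf` are local copies of the shape of `rate.Limit` and `rate.Inf` from golang.org/x/time/rate so that the example has no external dependency, and `normalizeRate` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math"
)

// limit mirrors rate.Limit and inf mirrors rate.Inf from
// golang.org/x/time/rate; both are redefined locally for this sketch.
type limit float64

const inf = limit(math.MaxFloat64)

// normalizeRate is a hypothetical helper: a negative configured rate is
// treated as "no limit" and mapped to inf; other values pass through.
func normalizeRate(configured float64) limit {
	if configured < 0 {
		return inf
	}
	return limit(configured)
}

func main() {
	fmt.Println(normalizeRate(-1) == inf)
	fmt.Println(normalizeRate(100) == limit(100))
}
```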
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
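For reference, the ioutil replacements follow a mechanical mapping onto the io and os packages. The sketch below shows the common cases using only the standard library; `readAll` and `roundTrip` are hypothetical helper names used for illustration.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// readAll uses io.ReadAll where older code used ioutil.ReadAll.
func readAll(r io.Reader) (string, error) {
	b, err := io.ReadAll(r)
	return string(b), err
}

// roundTrip writes data to a file in a fresh temporary directory and
// reads it back, using the os equivalents of the ioutil helpers.
func roundTrip(data string) (string, error) {
	dir, err := os.MkdirTemp("", "ioutil-example") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(path, []byte(data), 0o600); err != nil { // was ioutil.WriteFile
		return "", err
	}
	b, err := os.ReadFile(path) // was ioutil.ReadFile
	return string(b), err
}

func main() {
	s, err := readAll(strings.NewReader("from a reader"))
	fmt.Println(s, err)
	s, err = roundTrip("from a file")
	fmt.Println(s, err)
}
```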
* Backport agent tests.
Original commit: 0710b2d12fb51a29cedd1119b5fb086e5c71f632
Original commit: aaedb3c28bfe247266f21013d500147d8decb7cd (partial)
* Backport test fix and reduce flaky failures.
This continues the work done in #14908 where a crude solution to prevent a
goroutine leak was implemented. The former code would launch a perpetual
goroutine family every iteration (+1 +1) and the fixed code simply caused a
new goroutine family to first cancel the prior one to prevent the
leak (-1 +1 == 0).
This PR refactors this code completely to:
- make it more understandable
- remove the recursion-via-goroutine strangeness
- prevent unnecessary RPC fetches when the prior one has errored.
The core issue arose from a conflation of the entry.Fetching field to mean both:
1. there is an RPC (blocking query) in flight right now
2. there is a goroutine running to manage the RPC fetch retry loop
The problem is that the goroutine-leak-avoidance check would treat
Fetching like (2), but within the body of a goroutine it would flip that
boolean back to false before the retry sleep. This would cause a new
chain of goroutines to launch which #14908 would correct crudely.
The refactored code uses a plain for-loop and changes the semantics
to track state for "is there a goroutine associated with this cache entry"
instead of the former.
We use a uint64 unique identity per goroutine instead of a boolean so
that any orphaned goroutines can tell when they've been replaced when
the expiry loop deletes a cache entry while the goroutine is still running
and is later replaced.
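The identity bookkeeping described above can be sketched roughly as follows. This is a simplified illustration under assumed names (`cache`, `entry`, `claim`, `stillOwner`, `evict` are all hypothetical), and it omits the fetch loop itself; it only shows how a per-goroutine uint64 lets an orphaned goroutine notice that its entry was deleted and later re-created.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical cache entry. goroutineID is non-zero while a
// fetch goroutine is associated with the entry.
type entry struct {
	goroutineID uint64
}

type cache struct {
	mu      sync.Mutex
	entries map[string]*entry
	nextID  uint64
}

// claim associates a new goroutine identity with key and returns it.
// It returns 0 if a goroutine is already associated with the entry.
func (c *cache) claim(key string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok {
		e = &entry{}
		c.entries[key] = e
	}
	if e.goroutineID != 0 {
		return 0
	}
	c.nextID++
	e.goroutineID = c.nextID
	return e.goroutineID
}

// stillOwner reports whether the goroutine identified by id is still the
// one associated with key. An orphaned goroutine sees false once the
// entry has been deleted or re-created under a different identity.
func (c *cache) stillOwner(key string, id uint64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	return ok && e.goroutineID == id
}

// evict models the expiry loop deleting an entry.
func (c *cache) evict(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, key)
}

func main() {
	c := &cache{entries: make(map[string]*entry)}
	first := c.claim("k")
	fmt.Println(c.claim("k") == 0) // a second goroutine is refused
	c.evict("k")                   // expiry removes the entry
	second := c.claim("k")         // the entry is re-created later
	fmt.Println(c.stillOwner("k", first), c.stillOwner("k", second))
}
```

With a plain boolean, the orphaned first goroutine could not distinguish "my entry" from "a replacement entry that also has a goroutine"; comparing identities makes that check unambiguous.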
Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created.
Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.
Adds a user-configurable rate limiter to proxycfg snapshot delivery,
with a default limit of 250 updates per second.
This addresses a problem observed in our load testing of Consul
Dataplane where updating a "global" resource such as a wildcard
intention or the proxy-defaults config entry could starve the Raft or
Memberlist goroutines of CPU time, causing general cluster instability.
Previously, when an alias check was removed, it would not be stopped or
cleaned up from the associated aliasChecks map.
This means that any time an alias check was deregistered we would
leak a goroutine for CheckAlias.run() because the stopCh would never
be closed.
This issue mostly affects service mesh deployments on platforms where
the client agent is mostly static but proxy services come and go
regularly, since by default sidecars are registered with an alias check.
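The shape of the fix can be sketched as below: removal must both stop the check (closing its stop channel so the run goroutine exits) and delete it from the map. The types and function names here are hypothetical simplifications, not the agent's actual code, and locking around the map is omitted for brevity.

```go
package main

import "fmt"

// aliasCheck is a hypothetical stand-in for a running alias check.
type aliasCheck struct {
	stopCh chan struct{}
	done   chan struct{}
}

// startAliasCheck launches the run goroutine, which exits once stopCh
// is closed.
func startAliasCheck() *aliasCheck {
	c := &aliasCheck{stopCh: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(c.done)
		<-c.stopCh // a real check would do its work here until stopped
	}()
	return c
}

// stop closes stopCh and waits for the run goroutine to exit.
func (c *aliasCheck) stop() {
	close(c.stopCh)
	<-c.done
}

type agent struct {
	aliasChecks map[string]*aliasCheck
}

// removeAliasCheck stops the check and removes it from the map. Skipping
// either step leaks the goroutine or leaves a stale map entry.
func (a *agent) removeAliasCheck(id string) bool {
	c, ok := a.aliasChecks[id]
	if !ok {
		return false
	}
	c.stop()
	delete(a.aliasChecks, id)
	return true
}

func main() {
	a := &agent{aliasChecks: map[string]*aliasCheck{"sidecar": startAliasCheck()}}
	fmt.Println(a.removeAliasCheck("sidecar"), len(a.aliasChecks))
}
```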
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.
The resulting behavior from this commit is:
`ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
`ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
`ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
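The three rules above can be read as a small decision function. The sketch below is illustrative only: `planGRPCListeners` and `listener` are hypothetical names, `tlsConfigured` stands in for "`grpc.tls != undefined`", and the branch for a TLS-only port is an assumption that the message does not cover.

```go
package main

import "fmt"

// listener describes one gRPC listener to spawn.
type listener struct {
	Port int
	TLS  bool
}

// planGRPCListeners is a hypothetical helper that maps the port settings
// to the listeners described in the commit message.
func planGRPCListeners(grpcPort, grpcTLSPort int, tlsConfigured bool) []listener {
	switch {
	case grpcPort > 0 && grpcTLSPort > 0:
		// Both ports set: one plain-text and one TLS listener.
		return []listener{{Port: grpcPort, TLS: false}, {Port: grpcTLSPort, TLS: true}}
	case grpcPort > 0 && tlsConfigured:
		// Deprecated backwards-compatibility mode: TLS on the original port.
		return []listener{{Port: grpcPort, TLS: true}}
	case grpcPort > 0:
		return []listener{{Port: grpcPort, TLS: false}}
	case grpcTLSPort > 0:
		// Assumption: this case is not covered by the three rules above.
		return []listener{{Port: grpcTLSPort, TLS: true}}
	default:
		return nil
	}
}

func main() {
	// The port numbers are example values.
	fmt.Println(planGRPCListeners(8502, 8503, false))
	fmt.Println(planGRPCListeners(8502, 0, false))
	fmt.Println(planGRPCListeners(8502, 0, true))
}
```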
If startListeners successfully created listeners for some of its input addresses but eventually failed, the function would return an error and existing listeners would not be cleaned up.
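A minimal sketch of the cleanup pattern, assuming plain TCP listeners (the real startListeners has a different signature and handles more than this):

```go
package main

import (
	"fmt"
	"net"
)

// startListeners opens a TCP listener for every address. If any address
// fails, the listeners opened so far are closed before the error is
// returned, so nothing is leaked on partial failure.
func startListeners(addrs []string) ([]net.Listener, error) {
	listeners := make([]net.Listener, 0, len(addrs))
	for _, addr := range addrs {
		ln, err := net.Listen("tcp", addr)
		if err != nil {
			for _, open := range listeners {
				open.Close()
			}
			return nil, fmt.Errorf("listen on %q: %w", addr, err)
		}
		listeners = append(listeners, ln)
	}
	return listeners, nil
}

func main() {
	// The second address has an out-of-range port, so the call fails and
	// the first listener is closed rather than leaked.
	_, err := startListeners([]string{"127.0.0.1:0", "127.0.0.1:99999"})
	fmt.Println(err != nil)
}
```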
* add config watcher to the config package
* add logging to watcher
* add test and refactor to add WatcherEvent.
* add all API calls and fix a bug with recreated files
* add tests for watcher
* remove the unnecessary use of context
* Add debug log and a test for file rename
* use inode to detect if the file is recreated/replaced and only listen to create events.
* tidy ups (#1535)
* tidy ups
* Add tests for inode reconcile
* fix linux vs windows syscall
* fix linux vs windows syscall
* fix windows compile error
* increase timeout
* use ctime ID
* remove remove/creation test as it's a use case that fails on linux
* fix linux/windows to use Ino/CreationTime
* fix the watcher to only overwrite current file id
* fix linter error
* fix remove/create test
* set reconcile loop to 200 Milliseconds
* fix watcher to not trigger event on remove, add more tests
* on a remove event try to add the file back to the watcher and trigger the handler if successful
* fix race condition
* fix flaky test
* fix race conditions
* set level to info
* fix when file is removed and get an event for it after
* fix to trigger handler when we get a remove but re-add fails
* fix error message
* add tests for directory watch and fixes
* detect if a file is a symlink and return an error on Add
* rename Watcher to FileWatcher and remove symlink deref
* add fsnotify@v1.5.1
* fix go mod
* do not reset timer on errors, rename OS specific files
* rename New func
* events trigger on write and rename
* add missing test
* fix flaking tests
* fix flaky test
* check reconcile when removed
* delete invalid file
* fix test to create files with different mod time.
* back date file instead of sleeping
* add watching file in agent command.
* fix watcher call to use new API
* add configuration and stop watcher when server stop
* add certs as watched files
* move FileWatcher to the agent start instead of the command code
* stop watcher before replacing it
* save watched files in agent
* add add and remove interfaces to the file watcher
* fix remove to not return an error
* use `Add` and `Remove` to update certs files
* fix tests
* close events channel on the file watcher even when the context is done
* extract `NotAutoReloadableRuntimeConfig` as a separate struct
* fix linter errors
* add Ca configs and outgoing verify to the not auto reloadable config
* add some logs and fix to use background context
* add tests to auto-config reload
* remove stale test
* add tests to changes to config files
* add check to see if old cert files still trigger updates
* rename `NotAutoReloadableRuntimeConfig` to `StaticRuntimeConfig`
* fix to re add both key and cert file. Add test to cover this case.
* review suggestion
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* add check to static runtime config changes
* fix test
* add changelog file
* fix review comments
* Apply suggestions from code review
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* update flag description
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
* fix compilation error
* add static runtime config support
* fix test
* fix review comments
* fix log test
* Update .changelog/12329.txt
Co-authored-by: Dan Upton <daniel@floppy.co>
* transfer tests to runtime_test.go
* fix filewatcher Replace to not deadlock.
* avoid having lingering locks
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* split ReloadConfig func
* fix warning message
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* convert `FileWatcher` into an interface
* fix compilation errors
* fix tests
* extract func for adding and removing files
* add a coalesceTimer with a very small timer
* extract coalesce Timer and add a shim for testing
* add tests to coalesceTimer fix to send remaining events
* set `coalesceTimer` to 1 Second
* support symlink, fix a nil deref.
* fix compile error
* fix compile error
* refactor file watcher rate limiting to be a Watcher implementation
* fix linter issue
* fix runtime config
* fix runtime test
* fix flaky tests
* fix compile error
* Apply suggestions from code review
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* fix agent New to return an error if File watcher New return an error
* quit timer loop if ctx is canceled
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
Co-authored-by: Daniel Upton <daniel@floppy.co>
set -euo pipefail
unset CDPATH
cd "$(dirname "$0")"
for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
    echo "=== require: $f ==="
    sed -i '/require := require.New(t)/d' "$f"
    # require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
    sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' "$f"
    # require.XXX(tblah) but not require.XXX(t, blah)
    sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' "$f"
    # require.XXX(rblah) but not require.XXX(r, blah)
    sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' "$f"
    gofmt -s -w "$f"
done
for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
    echo "=== assert: $f ==="
    sed -i '/assert := assert.New(t)/d' "$f"
    # assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
    sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' "$f"
    # assert.XXX(tblah) but not assert.XXX(t, blah)
    sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' "$f"
    # assert.XXX(rblah) but not assert.XXX(r, blah)
    sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' "$f"
    gofmt -s -w "$f"
done
Previously we believed it was necessary for all code that required ports
to use freeport to prevent conflicts.
https://github.com/dnephin/freeport-test shows that it is actually safe
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.
This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.
This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
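As a small illustration of the port 0 pattern described above, the sketch below binds to `127.0.0.1:0` and reads the assigned port back from the listener. `listenOnFreePort` is a hypothetical helper name, not a function from the codebase.

```go
package main

import (
	"fmt"
	"net"
)

// listenOnFreePort asks the kernel for an unused port by binding to
// port 0. The returned listener holds the port until it is closed, so
// the port cannot be handed to anyone else in the meantime.
func listenOnFreePort() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	port := ln.Addr().(*net.TCPAddr).Port
	return ln, port, nil
}

func main() {
	ln, port, err := listenOnFreePort()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	fmt.Println("assigned a port:", port > 0)
}
```

The important detail is that the port is never released between being chosen and being used, which is what makes this safe without freeport.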
* Remove some usage of md5 from the system
OSS side of https://github.com/hashicorp/consul-enterprise/pull/1253
This is a potential security issue because an attacker could conceivably manipulate inputs to cause persistence files to collide, effectively deleting the persistence file for one of the colliding elements.
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
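A sketch of deriving a persistence file name from an identifier with a collision-resistant hash. SHA-256 is an assumption here, since the message above only says md5 usage was removed, and `persistFileName` is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// persistFileName derives a file name from an identifier. SHA-256 is
// assumed as the md5 replacement for the purposes of this sketch.
func persistFileName(id string) string {
	sum := sha256.Sum256([]byte(id))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(len(persistFileName("service-a")))
	fmt.Println(persistFileName("service-a") != persistFileName("service-b"))
}
```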
The LogOutput io.Writer used by TestAgent must allow concurrent reads and writes, and a
bytes.Buffer does not allow this. The bytes.Buffer must be wrapped with a lock to make this safe.
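A minimal sketch of such a wrapper, assuming the writer only needs `Write` and `String` (`lockedBuffer` is a hypothetical name, not the type used in the test helpers):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// lockedBuffer wraps bytes.Buffer with a mutex so that Write and String
// are safe to call from multiple goroutines.
type lockedBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

// Write appends p to the buffer while holding the lock.
func (b *lockedBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.Write(p)
}

// String returns the buffered contents while holding the lock.
func (b *lockedBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.String()
}

func main() {
	var out lockedBuffer
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Fprint(&out, "x") // concurrent writers
		}()
	}
	wg.Wait()
	fmt.Println(len(out.String()))
}
```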
This test is super racy (it's not just a single line).
This test also starts failing once streaming is enabled, because the
cache rate limit no longer applies to the requests in the test. The
queries use streaming instead of the cache.
This test is no longer valid, and the functionality is already well
tested by TestCacheThrottle. Instead of spending time rewriting this
test, let's remove it.
```
WARNING: DATA RACE
Read at 0x00c01de410fc by goroutine 735:
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
/home/daniel/pers/code/consul/agent/agent_test.go:1024 +0x9af
github.com/hashicorp/consul/testrpc.WaitForTestAgent()
/home/daniel/pers/code/consul/testrpc/wait.go:99 +0x209
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
/home/daniel/pers/code/consul/agent/agent_test.go:966 +0x1ad
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1193 +0x202
Previous write at 0x00c01de410fc by goroutine 605:
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1.2()
/home/daniel/pers/code/consul/agent/agent_test.go:998 +0xe9
Goroutine 735 (running) created at:
testing.(*T).Run()
/usr/lib/go/src/testing/testing.go:1238 +0x5d7
github.com/hashicorp/consul/agent.TestCacheRateLimit()
/home/daniel/pers/code/consul/agent/agent_test.go:961 +0x375
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1193 +0x202
Goroutine 605 (finished) created at:
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
/home/daniel/pers/code/consul/agent/agent_test.go:1022 +0x91e
github.com/hashicorp/consul/testrpc.WaitForTestAgent()
/home/daniel/pers/code/consul/testrpc/wait.go:99 +0x209
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
/home/daniel/pers/code/consul/agent/agent_test.go:966 +0x1ad
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1193 +0x202
```
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.
Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.
Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:
    ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
* Save exposed HTTP or GRPC ports to the agent's store
* Add those to the health checks API so we can retrieve them from the API
* Change redirect-traffic command to also exclude those ports from inbound traffic redirection when expose.checks is set to true.