Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect:
- `DiscoveryChain.Get`
- `ConfigEntry.ResolveServiceConfig`
- `Intentions.Match`
All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism.
### No Config Entries Found
In this case, despite looking for many config entries none may be found at all. Unlike #12110 in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110 with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110.
### No Change Since Last Wakeup
Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here to detect if any of the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically externally handle it similar to #12110.
This would mean that even if 20 discovery chain read RPC handling goroutines wakeup due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report.
### Extra Endpoints
Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` also were updated:
- `ConfigEntry.List`
- `Internal.IntentionUpstreams` (tproxy)
* add config watcher to the config package
* add logging to watcher
* add test and refactor to add WatcherEvent.
* add all API calls and fix a bug with recreated files
* add tests for watcher
* remove the unnecessary use of context
* Add debug log and a test for file rename
* use inode to detect if the file is recreated/replaced and only listen to create events.
* tidy ups (#1535)
* tidy ups
* Add tests for inode reconcile
* fix linux vs windows syscall
* fix linux vs windows syscall
* fix windows compile error
* increase timeout
* use ctime ID
* remove remove/creation test as it's a use case that fail in linux
* fix linux/windows to use Ino/CreationTime
* fix the watcher to only overwrite current file id
* fix linter error
* fix remove/create test
* set reconcile loop to 200 Milliseconds
* fix watcher to not trigger event on remove, add more tests
* on a remove event try to add the file back to the watcher and trigger the handler if success
* fix race condition
* fix flaky test
* fix race conditions
* set level to info
* fix when file is removed and get an event for it after
* fix to trigger handler when we get a remove but re-add fail
* fix error message
* add tests for directory watch and fixes
* detect if a file is a symlink and return an error on Add
* rename Watcher to FileWatcher and remove symlink deref
* add fsnotify@v1.5.1
* fix go mod
* fix flaky test
* Apply suggestions from code review
Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
* fix a possible stack overflow
* do not reset timer on errors, rename OS specific files
* start the watcher when creating it
* fix data race in tests
* rename New func
* do not call handler when a remove event happen
* events trigger on write and rename
* fix watcher tests
* make handler async
* remove recursive call
* do not produce events for sub directories
* trim "/" at the end of a directory when adding
* add missing test
* fix logging
* add todo
* fix failing test
* fix flaking tests
* fix flaky test
* add logs
* fix log text
* increase timeout
* reconcile when remove
* check reconcile when removed
* fix reconcile move test
* fix logging
* delete invalid file
* Apply suggestions from code review
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* fix review comments
* fix is watched to properly catch a remove
* change test timeout
* fix test and rename id
* fix test to create files with different mod time.
* fix deadlock when stopping watcher
* Apply suggestions from code review
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* fix a deadlock when calling stop while emitting event is blocked
* make sure to close the event channel after the event loop is done
* add go doc
* back date file instead of sleeping
* Apply suggestions from code review
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* check error
Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
This commit syncs ENT changes to the OSS repo.
Original commit details in ENT:
```
commit 569d25f7f4578981c3801e6e067295668210f748
Author: FFMMM <FFMMM@users.noreply.github.com>
Date: Thu Feb 10 10:23:33 2022 -0800
Vendor fork net rpc (#1538)
* replace net/rpc w consul-net-rpc/net/rpc
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* replace msgpackrpc and go-msgpack with fork from mono repo
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* gofmt all files touched
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
```
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
Use gotest.tools/v3/fs to make better assertions about the files
Remove the TestAgent from TestDebugCommand_Prepare_ValidateTiming, since we can test that validation
without making any API calls.
* deps: upgrade gogo-protobuf to v1.3.2
* go mod tidy using go 1.16
* proto: regen protobufs after upgrading gogo/protobuf
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* debug: remove the CLI check for debug_enabled
The API allows collecting profiles even debug_enabled=false as long as
ACLs are enabled. Remove this check from the CLI so that users do not
need to set debug_enabled=true for no reason.
Also:
- fix the API client to return errors on non-200 status codes for debug
endpoints
- improve the failure messages when pprof data can not be collected
Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com>
* remove parallel test runs
parallel runs create a race condition that fail the debug tests
* snapshot the timestamp at the beginning of the capture
- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist
* Revert "snapshot the timestamp at the beginning of the capture"
This reverts commit c2d03346
* Refactor captureDynamic to extract capture logic for each item in a different func
* snapshot the timestamp at the beginning of the capture
- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist
* Revert "snapshot the timestamp at the beginning of the capture"
This reverts commit c2d03346
* Refactor captureDynamic to extract capture logic for each item in a different func
* extract wait group outside the go routine to avoid a race condition
* capture pprof in a separate go routine
* perform a single capture for pprof data for the whole duration
* add missing vendor dependency
* add a change log and fix documentation to reflect the change
* create function for timestamp dir creation and simplify error handling
* use error groups and ticker to simplify interval capture loop
* Logs, profile and traces are captured for the full duration. Metrics, Heap and Go routines are captured every interval
* refactor Logs capture routine and add log capture specific test
* improve error reporting when log test fail
* change test duration to 1s
* make time parsing in log line more robust
* refactor log time format in a const
* test on log line empty the earliest possible and return
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* rename function to captureShortLived
* more specific changelog
Co-authored-by: Paul Banks <banks@banksco.de>
* update documentation to reflect current implementation
* add test for behavior when invalid param is passed to the command
* fix argument line in test
* a more detailed description of the new behaviour
Co-authored-by: Paul Banks <banks@banksco.de>
* print success right after the capture is done
* remove an unnecessary error check
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* upgraded github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57 => v0.0.0-20210601050228-01bbb1931b22
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Co-authored-by: Paul Banks <banks@banksco.de>
* WIP reloadable raft config
* Pre-define new raft gauges
* Update go-metrics to change gauge reset behaviour
* Update raft to pull in new metric and reloadable config
* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important
* Update telemetry docs
* Update config and telemetry docs
* Add note to oldestLogAge on when it is visible
* Add changelog entry
* Update website/content/docs/agent/options.mdx
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>