* Add support for unauthenticated pprof access on a per-listener basis, as we do for metrics.
* Add missing pprof sub-targets like 'allocs' and 'block'. Capture the goroutine subtarget a second time in text form. This is mostly a convenience, but also I think the pprof format might be a bit lossy?
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)
* k8s doc: update for 0.9.1 and 0.8.0 releases
* Update website/content/docs/platform/k8s/helm/configuration.mdx
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
* Autopilot initial commit
* Move autopilot related backend implementations to its own file
* Abstract promoter creation
* Add nil check for health
* Add server state oss no-ops
* Config ext stub for oss
* Make way for non-voters
* s/health/state
* s/ReadReplica/NonVoter
* Add synopsis and description
* Remove struct tags from AutopilotConfig
* Use var for config storage path
* Handle nin-config when reading
* Enable testing autopilot by using inmem cluster
* First passing test
* Only report the server as known if it is present in raft config
* Autopilot defaults to on for all existing and new clusters
* Add locking to some functions
* Persist initial config
* Clarify the command usage doc
* Add health metric for each node
* Fix audit logging issue
* Don't set DisablePerformanceStandby to true in test
* Use node id label for health metric
* Log updates to autopilot config
* Less aggressively consume config loading failures
* Return a mutable config
* Return early from known servers if raft config is unable to be pulled
* Update metrics name
* Reduce log level for potentially noisy log
* Add knob to disable autopilot
* Don't persist if default config is in use
* Autopilot: Dead server cleanup (#10857)
* Dead server cleanup
* Initialize channel in any case
* Fix a bunch of tests
* Fix panic
* Add follower locking in heartbeat tracker
* Add LastContactFailureThreshold to config
* Add log when marking node as dead
* Update follower state locking in heartbeat tracker
* Avoid follower states being nil
* Pull test to its own file
* Add execution status to state response
* Optionally enable autopilot in some tests
* Updates
* Added API function to fetch autopilot configuration
* Add test for default autopilot configuration
* Configuration tests
* Add State API test
* Update test
* Added TestClusterOptions.PhysicalFactoryConfig
* Update locking
* Adjust locking in heartbeat tracker
* s/last_contact_failure_threshold/left_server_last_contact_threshold
* Add disabling autopilot as a core config option
* Disable autopilot in some tests
* s/left_server_last_contact_threshold/dead_server_last_contact_threshold
* Set the lastheartbeat of followers to now when setting up active node
* Don't use config defaults from CLI command
* Remove config file support
* Remove HCL test as well
* Persist only supplied config; merge supplied config with default to operate
* Use pointer to structs for storing follower information
* Test update
* Retrieve non voter status from configbucket and set it up when a node comes up
* Manage desired suffrage
* Consider bucket being created already
* Move desired suffrage to its own entry
* s/DesiredSuffrageKey/LocalNodeConfigKey
* s/witnessSuffrage/recordSuffrage
* Fix test compilation
* Handle local node config post a snapshot install
* Commit to storage first; then record suffrage in fsm
* No need of local node config being nili case, post snapshot restore
* Reconcile autopilot config when a new leader takes over duty
* Grab fsm lock when recording suffrage
* s/Suffrage/DesiredSuffrage in FollowerState
* Instantiate autopilot only in leader
* Default to old ways in more scenarios
* Make API gracefully handle 404
* Address some feedback
* Make IsDead an atomic.Value
* Simplify follower hearbeat tracking
* Use uber.atomic
* Don't have multiple causes for having autopilot disabled
* Don't remove node from follower states if we fail to remove the dead server
* Autopilot server removals map (#11019)
* Don't remove node from follower states if we fail to remove the dead server
* Use map to track dead server removals
* Use lock and map
* Use delegate lock
* Adjust when to remove entry from map
* Only hold the lock while accessing map
* Fix race
* Don't set default min_quorum
* Fix test
* Ensure follower states is not nil before starting autopilot
* Fix race
Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
* sketch out partial month activity log client API
* unit test partialMonthClientCount
* cleanup api
* add api doc, fix test, update api nomenclature to match existing
* cleanup
* add PR changelog file
* integration test for API
* report entities and tokens separately
* first commit
* update
* removed some ent features from backport
* final refactor
* backport patch
Co-authored-by: Hridoy Roy <hridoyroy@Hridoys-MacBook-Pro.local>
Co-authored-by: Hridoy Roy <hridoyroy@Hridoys-MBP.hitronhub.home>
* backport VAULT-672
* backport VAULT-672
* go mod tidy
* go mod tidy
* add back indirect import
* replace go mod and go sum with master version
* go mod vendor
* more go mod vendor
Co-authored-by: Hridoy Roy <hridoyroy@Hridoys-MBP.hitronhub.home>
Co-authored-by: Hridoy Roy <hridoyroy@Hridoys-MacBook-Pro.local>
* raft: initial work on raft ha storage support
* add note on join
* add todo note
* raft: add support for bootstrapping and joining existing nodes
* raft: gate bootstrap join by reading leader api address from storage
* raft: properly check for raft-only for certain conditionals
* raft: add bootstrap to api and cli
* raft: fix bootstrap cli command
* raft: add test for setting up new cluster with raft HA
* raft: extend TestRaft_HA_NewCluster to include inmem and consul backends
* raft: add test for updating an existing cluster to use raft HA
* raft: remove debug log lines, clean up verifyRaftPeers
* raft: minor cleanup
* raft: minor cleanup
* Update physical/raft/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/ha.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/ha.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/logical_system_raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* address feedback comments
* address feedback comments
* raft: refactor tls keyring logic
* address feedback comments
* Update vault/raft.go
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
* Update vault/raft.go
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
* address feedback comments
* testing: fix import ordering
* raft: rename var, cleanup comment line
* docs: remove ha_storage restriction note on raft
* docs: more raft HA interaction updates with migration and recovery mode
* docs: update the raft join command
* raft: update comments
* raft: add missing isRaftHAOnly check for clearing out state set earlier
* raft: update a few ha_storage config checks
* Update command/operator_raft_bootstrap.go
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
* raft: address feedback comments
* raft: fix panic when checking for config.HAStorage.Type
* Update vault/raft.go
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
* Update website/pages/docs/commands/operator/raft.mdx
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
* raft: remove bootstrap cli command
* Update vault/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* raft: address review feedback
* raft: revert vendored sdk
* raft: don't send applied index and node ID info if we're HA-only
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
* move adjustForSealMigration to vault package
* fix adjustForSealMigration
* begin working on new seal migration test
* create shamir seal migration test
* refactor testhelpers
* add VerifyRaftConfiguration to testhelpers
* stub out TestTransit
* Revert "refactor testhelpers"
This reverts commit 39593defd0d4c6fd79aedfd37df6298391abb9db.
* get shamir test working again
* stub out transit join
* work on transit join
* remove debug code
* initTransit now works with raft join
* runTransit works with inmem
* work on runTransit with raft
* runTransit works with raft
* cleanup tests
* TestSealMigration_TransitToShamir_Pre14
* TestSealMigration_ShamirToTransit_Pre14
* split for pre-1.4 testing
* add simple tests for transit and shamir
* fix typo in test suite
* debug wrapper type
* test debug
* test-debug
* refactor core migration
* Revert "refactor core migration"
This reverts commit a776452d32a9dca7a51e3df4a76b9234d8c0c7ce.
* begin refactor of adjustForSealMigration
* fix bug in adjustForSealMigration
* clean up tests
* clean up core refactoring
* fix bug in shamir->transit migration
* stub out test that brings individual nodes up and down
* refactor NewTestCluster
* pass listeners into newCore()
* simplify cluster address setup
* simplify extra test core setup
* refactor TestCluster for readability
* refactor TestCluster for readability
* refactor TestCluster for readability
* add shutdown func to TestCore
* add cleanup func to TestCore
* create RestartCore
* stub out TestSealMigration_ShamirToTransit_Post14
* refactor address handling in NewTestCluster
* fix listener setup in newCore()
* remove unnecessary lock from setSealsForMigration()
* rename sealmigration test package
* use ephemeral ports below 30000
* work on post-1.4 migration testing
* clean up pre-1.4 test
* TestSealMigration_ShamirToTransit_Post14 works for non-raft
* work on raft TestSealMigration_ShamirToTransit_Post14
* clean up test code
* refactor TestClusterCore
* clean up TestClusterCore
* stub out some temporary tests
* use HardcodedServerAddressProvider in seal migration tests
* work on raft for TestSealMigration_ShamirToTransit_Post14
* always use hardcoded raft address provider in seal migration tests
* debug TestSealMigration_ShamirToTransit_Post14
* fix bug in RestartCore
* remove debug code
* TestSealMigration_ShamirToTransit_Post14 works now
* clean up debug code
* clean up tests
* cleanup tests
* refactor test code
* stub out TestSealMigration_TransitToShamir_Post14
* set seals properly for transit->shamir migration
* migrateFromTransitToShamir_Post14 works for inmem
* migrateFromTransitToShamir_Post14 works for raft
* use base ports per-test
* fix seal verification test code
* simplify seal migration test suite
* simplify test suite
* cleanup test suite
* use explicit ports below 30000
* simplify use of numTestCores
* Update vault/external_tests/sealmigration/seal_migration_test.go
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* Update vault/external_tests/sealmigration/seal_migration_test.go
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* clean up imports
* rename to StartCore()
* Update vault/testing.go
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* simplify test suite
* clean up tests
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* enable seal wrap in all seal migration tests
* move adjustForSealMigration to vault package
* fix adjustForSealMigration
* begin working on new seal migration test
* create shamir seal migration test
* refactor testhelpers
* add VerifyRaftConfiguration to testhelpers
* stub out TestTransit
* Revert "refactor testhelpers"
This reverts commit 39593defd0d4c6fd79aedfd37df6298391abb9db.
* get shamir test working again
* stub out transit join
* work on transit join
* Revert "move resuable storage test to avoid creating import cycle"
This reverts commit b3ff2317381a5af12a53117f87d1c6fbb093af6b.
* remove debug code
* initTransit now works with raft join
* runTransit works with inmem
* work on runTransit with raft
* runTransit works with raft
* get rid of dis-used test
* cleanup tests
* TestSealMigration_TransitToShamir_Pre14
* TestSealMigration_ShamirToTransit_Pre14
* split for pre-1.4 testing
* add simple tests for transit and shamir
* fix typo in test suite
* debug wrapper type
* test debug
* test-debug
* refactor core migration
* Revert "refactor core migration"
This reverts commit a776452d32a9dca7a51e3df4a76b9234d8c0c7ce.
* begin refactor of adjustForSealMigration
* fix bug in adjustForSealMigration
* clean up tests
* clean up core refactoring
* fix bug in shamir->transit migration
* remove unnecessary lock from setSealsForMigration()
* rename sealmigration test package
* use ephemeral ports below 30000
* simplify use of numTestCores
* Refactor PG container creation.
* Rework rotation tests to use shorter sleeps.
* Refactor rotation tests.
* Add a static role rotation test for MongoDB Atlas.