Unlike the other libraries that were migrated, there are no usages of
this lib in any of our plugins, and the only other known usage was in
go-kms-wrapping, which has been updated. Aliasing it like the other libs
would still keep the aws-sdk-go dep in the sdk module because of the
function signatures. So I've simply removed it entirely here.
* save
* save
* save
* first round of the diagnose language pass
* capitalization
* first round of feedback
* fix bug in advise
* a few more nouns to verbs
* Diagnose warns if HTTPS is not used for ha-storage-tls-consul
* Skipping TLS verification if https is not used in ha storage tls consul
* Adding diagnose skip message for consul service registration
* `vault delete` and `vault kv delete` should allow the same output options as `vault write`, as delete operations can similarly return data. This is needed if you want to use control groups with deletion.
* diagnose: Add seal transit tls check
* Fixing the path to the config file and the path to the cert files
* Addressing comment
* Addressing seal transit tls check comments
* tls verification bugfix
* tls verification bugfix
* allow diagnose fail to report status when there are also warnings
* allow diagnose fail to report status when there are also warnings
* Update vault/diagnose/helpers_test.go
Co-authored-by: swayne275 <swayne275@gmail.com>
* comments
Co-authored-by: swayne275 <swayne275@gmail.com>
* Fix diagnose panic when configuration file does not exist
* Addressing comments
* Update command/operator_diagnose.go
Co-authored-by: Hridoy Roy <roy@hashicorp.com>
Co-authored-by: Hridoy Roy <roy@hashicorp.com>
* agent: restart template runner on retry for unlimited retries
* template: log error message early
* template: delegate retries back to template if param is set to true
* agent: add and use the new template config stanza
* agent: fix panic, fix existing tests
* changelog: add changelog entry
* agent: add tests for exit_on_retry_failure
* agent: properly check on agent exit cases, add separate tests for missing key vs missing secrets
* agent: add note on difference between missing key vs missing secret
* docs: add docs for template_config
* Update website/content/docs/agent/template-config.mdx
Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
* Update website/content/docs/agent/template-config.mdx
Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
* Update website/content/docs/agent/template-config.mdx
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
* Update website/content/docs/agent/template-config.mdx
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
* Update website/content/docs/agent/template-config.mdx
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
* docs: fix exit_on_retry_failure, fix Functionality section
* docs: update interaction title
* template: add internal note on behavior for persist case
* docs: update agent, template, and template-config docs
* docs: update agent docs on retry stanza
* Apply suggestions from code review
Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
* Update changelog/11775.txt
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* agent/test: rename expectExit to expectExitFromError
* agent/test: add check on early exits on the happy path
* Update website/content/docs/agent/template-config.mdx
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Actually call config.Validate in diagnose
* Wire configuration checks into diagnose and fix resulting bugs.
* go mod vendor
* Merge to vendorless version
* Remove sentinel section to allow diagnose_ok to pass
* Fix unit tests
* raft file and quorum checks
* raft checks
* backup
* raft file checks test
* address comments and add more raft and file and process checks
* syntax issues
* modularize functions to compile differently on different os
* compile raft checks everywhere
* more build tag issues
* raft-diagnose
* correct file permission checks
* upgrade tests and add a getConfigOffline test that currently does not work
* comment
* update file checks method signature on windows
* Update physical/raft/raft_test.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* raft tests
* add todo comment for windows root ownership
* voter count message
* raft checks test fixes
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* initial refactoring of unseal step in run
* remove waitgroup
* remove waitgroup
* backup work
* backup
* backup
* completely modularize run and move into diagnose
* add diagnose errors for incorrect number of unseal keys
* comment tests back in
* backup
* first subspan
* finished subspanning but running into error with timeouts
* remove runtime checks
* merge main branch
* meeting updates
* remove telemetry block
* roy comment
* subspans for seal finalization and wrapping diagnose latency checks
* backup while I fix something else
* fix storage latency test errors
* runtime checks
* diagnose with timeout on seal
* wip
* wip
* Finish implementing advice handling and word wrapping
* Properly word wrap messages and warnings
* Remove debugging
* Remove debugging
* Remove unnecessary test
* unit test bug
* go vendor
This allows operators to run diagnose in scripts and detect the difference between success, warning, and failure.
Exit codes are now:
0: Success (no warnings)
1: Failure (some test failed)
2: Warning (some test warned)
3: User input failure such as a bad flag
4: Other error
* Segment out disk checks to disable on openbsd/arm
Also add a spot skipped helper.
* Expected results may be fewer than actual because of variable length tests like disk usage
* Move to os_common and build on windows
* Add ulimit check, and tidy unit test cases to avoid needing to have all results and perfect ordering
* Make order independent check recursive
* Fix unit tests
* Try a 5s request timeout
Vault cares about the VAULT_LICENSE env var, but we don't want to set that in CI because it would change behaviour of tests that don't intend for it to be set. Instead, we use VAULT_LICENSE_CI so that only packages/tests that opt-in will use it.
* Disk usage checks
* Move disk free earlier
* Move logic to helpers
* Bring over test logic from the ulimit PR
* imports
* Report error
* Get unit tests working
* initial refactoring of unseal step in run
* remove waitgroup
* remove waitgroup
* backup work
* backup
* backup
* completely modularize run and move into diagnose
* add diagnose errors for incorrect number of unseal keys
* comment tests back in
* backup
* first subspan
* finished subspanning but running into error with timeouts
* remove runtime checks
* meeting updates
* remove telemetry block
* roy comment
* subspans for seal finalization and wrapping diagnose latency checks
* fix storage latency test errors
* review comments
* use random uuid for latency checks instead of static id
* Create helpers which integrate with OpenTelemetry for diagnose collection
* Go mod vendor
* Comments
* Update vault/diagnose/helpers.go
Co-authored-by: swayne275 <swayne275@gmail.com>
* Add unit test/example
* tweak output
* More comments
* add spot check concept
* Get unit tests working on Result structs
* wip
* Fix unit test
* Get unit tests working, and make diagnose sessions local rather than global
* Comments
* Last comments
* No need for init
* :|
* Fix helpers_test
* wip
* wip
* wip
* Revendor otel
* Fix merge related problems
* imports
* Fix unit tests
Co-authored-by: swayne275 <swayne275@gmail.com>
* Add infrastructure for skipping tests
* Add infrastructure for skipping tests
* Set it
* Update vault/diagnose/helpers.go
Co-authored-by: swayne275 <swayne275@gmail.com>
* Implement type alias for test functions
Co-authored-by: swayne275 <swayne275@gmail.com>
* Expose unknown fields and duplicate sections as diagnose warnings
* section counts not needed, already handled
* Address PR feedback
* Prune more of the new fields before tests call deep.Equals
* Update go.mod
* Create helpers which integrate with OpenTelemetry for diagnose collection
* Go mod vendor
* consul tls checks
* draft for storage end to end check
* Comments
* Update vault/diagnose/helpers.go
Co-authored-by: swayne275 <swayne275@gmail.com>
* Add unit test/example
* tweak output
* More comments
* add spot check concept
* Get unit tests working on Result structs
* Fix unit test
* Get unit tests working, and make diagnose sessions local rather than global
* Comments
* Last comments
* No need for init
* :|
* Fix helpers_test
* cleaned up chan logic. Tests next.
* fix tests
* remove a comment
* tests
* remove a comment
* run direct access checks in diagnose command
* review comments
Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
Co-authored-by: swayne275 <swayne275@gmail.com>
* Create helpers which integrate with OpenTelemetry for diagnose collection
* Go mod vendor
* consul tls checks
* draft for storage end to end check
* Comments
* Update vault/diagnose/helpers.go
Co-authored-by: swayne275 <swayne275@gmail.com>
* Add unit test/example
* tweak output
* More comments
* add spot check concept
* Get unit tests working on Result structs
* Fix unit test
* Get unit tests working, and make diagnose sessions local rather than global
* Comments
* Last comments
* No need for init
* :|
* Fix helpers_test
* cleaned up chan logic. Tests next.
* fix tests
* remove a comment
* tests
* remove a comment
* cosmetic changes
Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
Co-authored-by: swayne275 <swayne275@gmail.com>
* Update Agent Auth with GCP to use new SignJWT endpoint
* use iamcredentials name instead of renaming the package on import
* add changelog
* Update changelog/11473.txt
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
* Create helpers which integrate with OpenTelemetry for diagnose collection
* Go mod vendor
* Comments
* Update vault/diagnose/helpers.go
Co-authored-by: swayne275 <swayne275@gmail.com>
* Add unit test/example
* tweak output
* More comments
* add spot check concept
* Get unit tests working on Result structs
* Fix unit test
* Get unit tests working, and make diagnose sessions local rather than global
* Comments
* Last comments
* No need for init
* :|
* Fix helpers_test
Co-authored-by: swayne275 <swayne275@gmail.com>
* Add support for unauthenticated pprof access on a per-listener basis, as we do for metrics.
* Add missing pprof sub-targets like 'allocs' and 'block'. Capture the goroutine subtarget a second time in text form. This is mostly a convenience, but also I think the pprof format might be a bit lossy?
* sanity checks for tls config in diagnose
* backup
* backup
* backup
* added necessary tests
* remove comment
* remove parallels causing test flakiness
* comments
* small fix
* separate out config hcl test case into new hcl file
* newline
* addressed comments
* addressed comments
* addressed comments
* addressed comments
* addressed comments
* reload funcs should be allowed to be nil
* a few tests to the operator diagnose stub command
* a few tests to the operator diagnose stub command
* a few tests to the operator diagnose stub command
* empty commit to fix circle ci permissions issue
* empty commit to fix circle ci permissions issue
Remove template_retry config section. Add new vault.retry section which only has num_retries field; if num_retries is 0 or absent, default it to 12 for backwards compat with pre-1.7 template retrying. Setting num_retries=-1 disables retries.
Configured retries are used for both templating and api proxy, though if template requests go through proxy (currently requires persistence enabled) we'll only configure retries for the latter to avoid duplicate retrying. Though there is some duplicate retrying already because whenever the template server does a retry when not going through the proxy, the Vault client it uses allows for 2 behind-the-scenes retries for some 400/500 http error codes.
* snapshot
* basic test
* update command and add documentation
* update help text
* typo
* add changelog for lease lookup command
* run go mod vendor
* remove tabs from help output
The existing code would retain the previous backoff value even after the
system had recovered. This PR fixes that issue and improves the
structure of the backoff code.
Adds the option of a write-through cache, backed by boltdb
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)
* k8s doc: update for 0.9.1 and 0.8.0 releases
* Update website/content/docs/platform/k8s/helm/configuration.mdx
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
* Autopilot initial commit
* Move autopilot related backend implementations to its own file
* Abstract promoter creation
* Add nil check for health
* Add server state oss no-ops
* Config ext stub for oss
* Make way for non-voters
* s/health/state
* s/ReadReplica/NonVoter
* Add synopsis and description
* Remove struct tags from AutopilotConfig
* Use var for config storage path
* Handle nin-config when reading
* Enable testing autopilot by using inmem cluster
* First passing test
* Only report the server as known if it is present in raft config
* Autopilot defaults to on for all existing and new clusters
* Add locking to some functions
* Persist initial config
* Clarify the command usage doc
* Add health metric for each node
* Fix audit logging issue
* Don't set DisablePerformanceStandby to true in test
* Use node id label for health metric
* Log updates to autopilot config
* Less aggressively consume config loading failures
* Return a mutable config
* Return early from known servers if raft config is unable to be pulled
* Update metrics name
* Reduce log level for potentially noisy log
* Add knob to disable autopilot
* Don't persist if default config is in use
* Autopilot: Dead server cleanup (#10857)
* Dead server cleanup
* Initialize channel in any case
* Fix a bunch of tests
* Fix panic
* Add follower locking in heartbeat tracker
* Add LastContactFailureThreshold to config
* Add log when marking node as dead
* Update follower state locking in heartbeat tracker
* Avoid follower states being nil
* Pull test to its own file
* Add execution status to state response
* Optionally enable autopilot in some tests
* Updates
* Added API function to fetch autopilot configuration
* Add test for default autopilot configuration
* Configuration tests
* Add State API test
* Update test
* Added TestClusterOptions.PhysicalFactoryConfig
* Update locking
* Adjust locking in heartbeat tracker
* s/last_contact_failure_threshold/left_server_last_contact_threshold
* Add disabling autopilot as a core config option
* Disable autopilot in some tests
* s/left_server_last_contact_threshold/dead_server_last_contact_threshold
* Set the lastheartbeat of followers to now when setting up active node
* Don't use config defaults from CLI command
* Remove config file support
* Remove HCL test as well
* Persist only supplied config; merge supplied config with default to operate
* Use pointer to structs for storing follower information
* Test update
* Retrieve non voter status from configbucket and set it up when a node comes up
* Manage desired suffrage
* Consider bucket being created already
* Move desired suffrage to its own entry
* s/DesiredSuffrageKey/LocalNodeConfigKey
* s/witnessSuffrage/recordSuffrage
* Fix test compilation
* Handle local node config post a snapshot install
* Commit to storage first; then record suffrage in fsm
* No need of local node config being nili case, post snapshot restore
* Reconcile autopilot config when a new leader takes over duty
* Grab fsm lock when recording suffrage
* s/Suffrage/DesiredSuffrage in FollowerState
* Instantiate autopilot only in leader
* Default to old ways in more scenarios
* Make API gracefully handle 404
* Address some feedback
* Make IsDead an atomic.Value
* Simplify follower hearbeat tracking
* Use uber.atomic
* Don't have multiple causes for having autopilot disabled
* Don't remove node from follower states if we fail to remove the dead server
* Autopilot server removals map (#11019)
* Don't remove node from follower states if we fail to remove the dead server
* Use map to track dead server removals
* Use lock and map
* Use delegate lock
* Adjust when to remove entry from map
* Only hold the lock while accessing map
* Fix race
* Don't set default min_quorum
* Fix test
* Ensure follower states is not nil before starting autopilot
* Fix race
Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
* Stub "operator diagnose" command.
* Parse configuration files.
* Refactor storage setup to call from diagnose.
* Add the ability to run Diagnose as a prequel to server start.
* agent: do not grap idLock writelock until caching entry
* agent: inflight cache using sync.Map
* agent: implement an inflight caching mechanism
* agent/lease: add lock for inflight cache to prevent simultaneous Set calls
* agent/lease: lock on a per-ID basis so unique requests can be processed independently
* agent/lease: add some concurrency tests
* test: use lease_id for uniqueness
* agent: remove env flags, add comments around locks
* agent: clean up test comment
* agent: clean up test comment
* agent: remove commented debug code
* agent/lease: word-smithing
* Update command/agent/cache/lease_cache.go
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* agent/lease: return the context error if the Done ch got closed
* agent/lease: fix data race in concurrency tests
* agent/lease: mockDelayProxier: return ctx.Err() if context got canceled
* agent/lease: remove unused inflightCacheLock
* agent/lease: test: bump context timeout to 3s
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* feat(agent): add retry configuration for vault agent
* feat(agent): add test fixtures for retry
* fix(retry): move retry stanza to top level as template_retry
* fix(retry): add retry config to ServerConfig struct
* fix(retry): point config parser to parse template_retry instead of retry
* remove netlify config (#10711)
* Fix build (#10749)
* Move the declaration to a OSS build tag file to not have it collide w… (#10750)
* Move the declaration to a OSS build tag file to not have it collide with ent declarations
* Add comment
* Remove comment to trigger ci
* Unconditionally use the root namespace when calling sys/seal-status. (#10742)
* feat(agent): add retry configuration for vault agent
* feat(agent): add test fixtures for retry
* fix(retry): move retry stanza to top level as template_retry
* fix(retry): add retry config to ServerConfig struct
* fix(retry): point config parser to parse template_retry instead of retry
Co-authored-by: Hridoy Roy <roy@hashicorp.com>
Co-authored-by: Jeff Escalante <jescalan@users.noreply.github.com>
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
Co-authored-by: Mark Gritter <mgritter@hashicorp.com>
* Adding snowflake as a bundled database secrets plugin
* Add snowflake-database-plugin to expected bundled plugins
* Add snowflake plugin name to the mockBuiltinRegistry
* Send a test message before committing a new audit device.
Also, lower timeout on connection attempts in socket device.
* added changelog
* go mod vendor (picked up some unrelated changes.)
* Skip audit device check in integration test.
Co-authored-by: swayne275 <swayne@hashicorp.com>
* core: Record the time a node became active
* Update vault/core.go
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* Add omitempty field
* Update vendor
* Added CL entry and fixed test
* Fix test
* Fix command package tests
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* Fix KV list command with whitespaces
* Fix kv list whitespace
* Fix list whitespace
* Fix failing test
Co-authored-by: swayne275 <swayne@hashicorp.com>
* Update go version to 1.15.3
* Fix OU ordering for go1.15.x testing
* Fix CI version
* Update docker image
* Fix test
* packagespec upgrade -version 0.1.8
Co-authored-by: Sam Salisbury <samsalisbury@gmail.com>
This also temporarily disables couchbase, elasticsearch, and
mongodbatlas because the `Serve` function needs to change signatures
and those plugins are vendored in from external repos, causing problems
when building.
* agent: return a non-zero exit code on error
* agent/template: always return on template server error, add case for error_on_missing_key
* agent: fix tests by updating Run params to use an errCh
* agent/template: add permission denied test case, clean up test var
* agent: use unbuffered errCh, emit fatal errors directly to the UI output
* agent: use oklog's run.Group to schedule subsystem runners (#9761)
* agent: use oklog's run.Group to schedule subsystem runners
* agent: clean up unused DoneCh, clean up agent's main Run func
* agent/template: use ts.stopped.CAS to atomically swap value
* fix tests
* fix tests
* agent/template: add timeout on TestRunServer
* agent: output error via logs and return a generic error on non-zero exit
* fix TestAgent_ExitAfterAuth
* agent/template: do not restart ct runner on new incoming token if exit_after_auth is set to true
* agent: drain ah.OutputCh after sink exits to avoid blocking on the channel
* use context.WithTimeout, expand comments around ordering of defer cancel()
Adds debug and warn logging around AWS credential chain generation,
specifically to help users debugging auto-unseal problems on AWS, by
logging which role is being used in the case of a webidentity token.
Adds a deferred call to flush the log output as well, to ensure logs
are output in the event of an initialization failure.
* normalize format output for vault status
* interim commit
* interim commit
* make formatting idiomatic
* clean up comments
* added formatting test
* updated comments in format test to match godocs
Co-authored-by: HridoyRoy <hridoyroy@Hridoys-MBP.hitronhub.home>
Co-authored-by: HridoyRoy <hridoyroy@Hridoys-MacBook-Pro.local>
* strip redundant field type declarations
* root credential rotation for aws creds plugin
* Change location of mocks awsutil and update methods that no longer exist
* Update website/pages/docs/auth/aws.mdx
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* Update sdk version to get the awsutil mock file
* Re-vendor modules to pass CI
* Use write lock for the entirety of AWS root cred rotation
* Update docs for AWS root cred rotation for clarity
Co-authored-by: Becca Petrin <beccapetrin@gmail.com>
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* Register a log sink that delays the printing of the big dev warning until logs have settled down
* Since this is always an intercept logger, just be explicit about the type
* changelog++
* Add date/time argument type.
* Add an argument to select which time formats are valid.
* Increase minimum date for epoch timestamps to avoid ambiguity.
* TOB-018 remediation
* Make key derivation an optional config flag, off by default, for backwards compatibility
* Fix unit tests
* Address some feedback
* Set config on unit test
* Fix another test failure
* One more conf fail
* Switch one of the test cases to not use a derive dkey
* wip
* comments
Hexadecimal integers will be converted to decimal, which is unfortunate but shouldn't have any negative effects other than perhaps confusion in the `vault debug` output.