Commit graph

39 commits

Author SHA1 Message Date
Brian Kassouf fd72d92434
raft: Fix some snapshot restore issues (#9533)
* raft: Remove double read lock

* Reload TLS keyring after reloading the barrier keys
2020-07-21 10:59:07 -07:00
Mike Jarmy a3ab902e18
set path properly in NewRaftBackend() (#9128)
* set path properly in NewRaftBackend()

* get rid of storeLatestState
2020-07-21 12:48:24 -04:00
Alexander Bezobchuk bd96d1dae1
physical/raft: Add nil check to shutdown (#9322) 2020-06-25 17:51:13 -07:00
Calvin Leung Huang c45bdca0b3
raft: add support for using backend for ha_storage (#9193)
* raft: initial work on raft ha storage support

* add note on join

* add todo note

* raft: add support for bootstrapping and joining existing nodes

* raft: gate bootstrap join by reading leader api address from storage

* raft: properly check for raft-only for certain conditionals

* raft: add bootstrap to api and cli

* raft: fix bootstrap cli command

* raft: add test for setting up new cluster with raft HA

* raft: extend TestRaft_HA_NewCluster to include inmem and consul backends

* raft: add test for updating an existing cluster to use raft HA

* raft: remove debug log lines, clean up verifyRaftPeers

* raft: minor cleanup

* raft: minor cleanup

* Update physical/raft/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/ha.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/ha.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/logical_system_raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* address feedback comments

* address feedback comments

* raft: refactor tls keyring logic

* address feedback comments

* Update vault/raft.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* address feedback comments

* testing: fix import ordering

* raft: rename var, cleanup comment line

* docs: remove ha_storage restriction note on raft

* docs: more raft HA interaction updates with migration and recovery mode

* docs: update the raft join command

* raft: update comments

* raft: add missing isRaftHAOnly check for clearing out state set earlier

* raft: update a few ha_storage config checks

* Update command/operator_raft_bootstrap.go

Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>

* raft: address feedback comments

* raft: fix panic when checking for config.HAStorage.Type

* Update vault/raft.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* Update website/pages/docs/commands/operator/raft.mdx

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* raft: remove bootstrap cli command

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* raft: address review feedback

* raft: revert vendored sdk

* raft: don't send applied index and node ID info if we're HA-only

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
2020-06-23 12:04:13 -07:00
Brian Kassouf 09593283b8
Improve the performance of snapshot installs by using rename (#9247)
* initial work on improving snapshot performance

* Work on snapshots

* rename a few functions

* Cleanup the snapshot file

* vendor the safeio library

* Add a test

* Add more tests

* Some review comments

* Fix comment

* Update physical/raft/snapshot.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* Update physical/raft/snapshot.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* Review feedback

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
2020-06-23 11:08:30 -07:00
Brian Kassouf 3b4ba9d1fb
Upgrade raft library (#9170)
* Upgrade raft library

* Update vendor

* Update physical/raft/snapshot_test.go

Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>

* Update physical/raft/snapshot_test.go

Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
2020-06-08 16:34:20 -07:00
Alexander Bezobchuk 9dd67cbeb6
Merge PR #9027: Integrated Storage (Raft): Add Support for max_entry_size Config 2020-06-01 10:17:24 -04:00
Brian Kassouf 1bb0bd489d
storage/raft: Add committed and applied indexes to the status output (#9011)
* storage/raft: Add committed and applied indexes to the status output

* Update api vendor

* changelog++

* Update http/sys_leader.go

Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>

Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>
2020-05-18 16:07:27 -07:00
Mike Jarmy 724af764bb
Test reusable storage (#8983)
* stub out reusable storage test

* implement reusable inmem test

* work on reusable raft test

* stub out simple raft test

* switch to reusable raft storage

* cleanup tests

* cleanup tests

* refactor tests

* verify raft configuration

* cleanup tests

* stub out reuseStorage

* use common base address across clusters

* attempt to reuse raft cluster

* tinker with test

* fix typo

* start debugging

* debug raft configuration

* add BaseClusterListenPort to TestCluster options

* use BaseClusterListenPort in test

* raft join works now

* misc cleanup of raft tests

* use configurable base port for raft test

* clean up raft tests

* add parallelized tests for all backends

* clean up reusable storage tests

* remove debugging code from startClusterListener()

* improve comments in testhelpers

* improve comments in teststorage

* improve comments and test logging

* fix typo in vault/testing

* fix typo in comments

* remove debugging code

* make number of cores parameterizable in test
2020-05-14 08:31:02 -04:00
Calvin Leung Huang e7af25b969
raft: use file paths for TLS info in the retry_join block (#8894)
* raft: use file paths for TLS info in the retry_join stanza

* raft: maintain backward compat for existing tls params

* docs: update raft docs with new file-based TLS params

* Update godoc comment, fix docs
2020-05-06 18:26:08 -07:00
Calvin Leung Huang 2659c34910
raft: check for nil on concrete type in SetupCluster (#8784)
* raft: check for nil on concrete type in SetupCluster

* raft: move check to its own func

* raft: func cleanup

* raft: disallow disable_clustering = true when raft storage is used

* docs: update disable_clustering to mention new behavior
2020-04-21 13:45:07 -07:00
Vishal Nayak 387677e251
Raft recovery peers non voter (#8681)
* Disallow non-voter setting from peers.json

* Fix bug that would make the actual fix a no-op

* Change order of evaluation

* Error out instead of resetting the value

* Update physical/raft/raft.go

Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>

* Print node ID

Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
2020-04-03 19:13:51 -04:00
Brian Kassouf 1ede406559
storage/raft: Buffer leader notify channel more aggresively (#8650) 2020-03-31 17:42:48 -07:00
Brian Kassouf 4aa582a6f4
storage/raft: Fix leadership deadlock (#8547)
* storage/raft: Fix leadership deadlock

* Update comment
2020-03-13 15:03:58 -07:00
Clint d3cda0fe2c
Guard against using Raft as a seperate HA Storage (#8239)
* Guard against using Raft as a seperate HA Storage

* Document that Raft cannot be used as a seperate ha_storage backend at this time

* remove duplicate imports from updating with master
2020-02-14 14:25:53 -06:00
Jim Kalafut f17fc4e5c1
Run goimports (#8251) 2020-01-27 21:11:00 -08:00
Brian Kassouf f32a86ee7a
Create network layer abstraction to allow in-memory cluster traffic (#8173) 2020-01-16 23:03:02 -08:00
Vishal Nayak 8891f2ba88 Raft retry join (#7856)
* Raft retry join

* update

* Make retry join work with shamir seal

* Return upon context completion

* Update vault/raft.go

Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>

* Address some review comments

* send leader information slice as a parameter

* Make retry join work properly with Shamir case. This commit has a blocking issue

* Fix join goroutine exiting before the job is done

* Polishing changes

* Don't return after a successful join during unseal

* Added config parsing test

* Add test and fix bugs

* minor changes

* Address review comments

* Fix build error

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
2020-01-13 17:02:16 -08:00
Jeff Mitchell a0694943cc
Migrate built in auto seal to go-kms-wrapping (#8118) 2020-01-10 20:39:52 -05:00
Daniel Lohse de2d3073d7 Allow Raft storage to be configured via env variables (#7745)
* Fix unordered imports

* Allow Raft node ID to be set via the environment variable `VAULT_RAFT_NODE_ID`

* Allow Raft path to be set via the environment variable `VAULT_RAFT_PATH`

* Prioritize the environment when fetching the Raft configuration values

Values in environment variables should override the config as per the
documentation as well as common sense.
2019-10-28 09:43:12 -07:00
Vishal Nayak 0d077d7945
Recovery Mode (#7559)
* Initial work

* rework

* s/dr/recovery

* Add sys/raw support to recovery mode (#7577)

* Factor the raw paths out so they can be run with a SystemBackend.

# Conflicts:
#	vault/logical_system.go

* Add handleLogicalRecovery which is like handleLogical but is only
sufficient for use with the sys-raw endpoint in recovery mode.  No
authentication is done yet.

* Integrate with recovery-mode.  We now handle unauthenticated sys/raw
requests, albeit on path v1/raw instead v1/sys/raw.

* Use sys/raw instead raw during recovery.

* Don't bother persisting the recovery token.  Authenticate sys/raw
requests with it.

* RecoveryMode: Support generate-root for autounseals (#7591)

* Recovery: Abstract config creation and log settings

* Recovery mode integration test. (#7600)

* Recovery: Touch up (#7607)

* Recovery: Touch up

* revert the raw backend creation changes

* Added recovery operation token prefix

* Move RawBackend to its own file

* Update API path and hit it using CLI flag on generate-root

* Fix a panic triggered when handling a request that yields a nil response. (#7618)

* Improve integ test to actually make changes while in recovery mode and
verify they're still there after coming back in regular mode.

* Refuse to allow a second recovery token to be generated.

* Resize raft cluster to size 1 and start as leader (#7626)

* RecoveryMode: Setup raft cluster post unseal (#7635)

* Setup raft cluster post unseal in recovery mode

* Remove marking as unsealed as its not needed

* Address review comments

* Accept only one seal config in recovery mode as there is no scope for migration
2019-10-15 00:55:31 -04:00
Brian Kassouf 1167fad704
Improve raft write performance by utilizing FSM Batching (#7527)
* Start benchmark work

* Add batching FSM function

* dedupe some code

* Update dependency on chunking FSM

* fix raft external tests

* fix go.mod

* Add batching test

* uncomment test

* update raft deps

* update vendor

* Update physical/raft/fsm.go

Co-Authored-By: Michel Vocks <michelvocks@gmail.com>

* Update physical/raft/fsm.go
2019-10-14 09:25:07 -06:00
Brian Kassouf 024c29c36a
OSS portions of raft non-voters (#7634)
* OSS portions of raft non-voters

* add file

* Update vault/raft.go

Co-Authored-By: Vishal Nayak <vishalnayak@users.noreply.github.com>
2019-10-11 11:56:59 -07:00
Brian Kassouf c2905773e4
Add download headers to snapshot take API (#7369)
* Add download headers to snapshot take API

* Add content type
2019-09-06 10:34:36 -07:00
Brian Kassouf bdfa2c7828
Add additional raft chunk test (#7192)
* Add an end-to-end raft chunk test

* Apply suggestions from code review

Co-Authored-By: Jim Kalafut <jkalafut@hashicorp.com>
2019-07-29 14:11:46 -07:00
Brian Kassouf b83aaf7331
storage/raft: Support storage migration to raft storage (#7207)
* Support raft in the migration command

* Add comments
2019-07-29 13:05:43 -07:00
Brian Kassouf a77995cdb1 Ensure raft configuration properly lists the leader (#7188) 2019-07-25 08:41:14 -04:00
Jeff Mitchell 0425db59ab
Raft chunk snapshotting (#7185)
Support chunking, including snapshot handling
2019-07-24 20:44:13 -04:00
Brian Kassouf 965066161a
Revert "Fix the config output (#7113)" (#7184)
This reverts commit 2f7cfc9aae911c8860db37e556363fbfb1567075.
2019-07-24 10:23:30 -07:00
Jeff Mitchell fd376b4bdf Use ChunkingConfigurationStore for raft 2019-07-23 10:59:21 -04:00
Vishal Nayak 0010d79498 Fix the config output (#7113) 2019-07-22 12:59:46 -04:00
Jeff Mitchell 3b22ab2486 Add chunking support to raft 2019-07-22 12:17:58 -04:00
Christian Muehlhaeuser e6febc5839 Fixed a bunch of typos (#7146) 2019-07-18 21:10:15 -04:00
Brian Kassouf 4d7d0d729a
storage/raft: When restoring a snapshot preseal first (#7011)
* storage/raft: When restoring a snapshot preseal first

* best-effort allow standbys to apply the restoreOp before sealing active node

* Don't cache the raft tls key

* Update physical/raft/raft.go

* Move pending raft peers to core

* Fix race on close bool

* Extend the leaderlease time for tests

* Update raft deps

* Fix audit hashing

* Fix race with auditing
2019-07-03 13:56:30 -07:00
Vishal Nayak 4484de3ea6
Fix raft config response (#6975) 2019-06-27 17:39:52 -04:00
Brian Kassouf 62e14c280d
storage/raft: fix races in tests (#6996)
* storage/raft: fix races in tests

* Fix another test race
2019-06-27 10:00:03 -07:00
Vishal Nayak 53035ce390
Raft CLI (#6893)
* raft cli

* Reuse the command's client

* Better response handling

* minor touchups
2019-06-20 21:32:00 -04:00
Jeff Mitchell 07dcdc8b79 Sync 2019-06-20 20:55:10 -04:00
Brian Kassouf ed14061578
Raft Storage Backend (#6888)
* Work on raft backend

* Add logstore locally

* Add encryptor and unsealable interfaces

* Add clustering support to raft

* Remove client and handler

* Bootstrap raft on init

* Cleanup raft logic a bit

* More raft work

* Work on TLS config

* More work on bootstrapping

* Fix build

* More work on bootstrapping

* More bootstrapping work

* fix build

* Remove consul dep

* Fix build

* merged oss/master into raft-storage

* Work on bootstrapping

* Get bootstrapping to work

* Clean up FMS and node-id

* Update local node ID logic

* Cleanup node-id change

* Work on snapshotting

* Raft: Add remove peer API (#906)

* Add remove peer API

* Add some comments

* Fix existing snapshotting (#909)

* Raft get peers API (#912)

* Read raft configuration

* address review feedback

* Use the Leadership Transfer API to step-down the active node (#918)

* Raft join and unseal using Shamir keys (#917)

* Raft join using shamir

* Store AEAD instead of master key

* Split the raft join process to answer the challenge after a successful unseal

* get the follower to standby state

* Make unseal work

* minor changes

* Some input checks

* reuse the shamir seal access instead of new default seal access

* refactor joinRaftSendAnswer function

* Synchronously send answer in auto-unseal case

* Address review feedback

* Raft snapshots (#910)

* Fix existing snapshotting

* implement the noop snapshotting

* Add comments and switch log libraries

* add some snapshot tests

* add snapshot test file

* add TODO

* More work on raft snapshotting

* progress on the ConfigStore strategy

* Don't use two buckets

* Update the snapshot store logic to hide the file logic

* Add more backend tests

* Cleanup code a bit

* [WIP] Raft recovery (#938)

* Add recovery functionality

* remove fmt.Printfs

* Fix a few fsm bugs

* Add max size value for raft backend (#942)

* Add max size value for raft backend

* Include physical.ErrValueTooLarge in the message

* Raft snapshot Take/Restore API  (#926)

* Inital work on raft snapshot APIs

* Always redirect snapshot install/download requests

* More work on the snapshot APIs

* Cleanup code a bit

* On restore handle special cases

* Use the seal to encrypt the sha sum file

* Add sealer mechanism and fix some bugs

* Call restore while state lock is held

* Send restore cb trigger through raft log

* Make error messages nicer

* Add test helpers

* Add snapshot test

* Add shamir unseal test

* Add more raft snapshot API tests

* Fix locking

* Change working to initalize

* Add underlying raw object to test cluster core

* Move leaderUUID to core

* Add raft TLS rotation logic (#950)

* Add TLS rotation logic

* Cleanup logic a bit

* Add/Remove from follower state on add/remove peer

* add comments

* Update more comments

* Update request_forwarding_service.proto

* Make sure we populate all nodes in the followerstate obj

* Update times

* Apply review feedback

* Add more raft config setting (#947)

* Add performance config setting

* Add more config options and fix tests

* Test Raft Recovery (#944)

* Test raft recovery

* Leave out a node during recovery

* remove unused struct

* Update physical/raft/snapshot_test.go

* Update physical/raft/snapshot_test.go

* fix vendoring

* Switch to new raft interface

* Remove unused files

* Switch a gogo -> proto instance

* Remove unneeded vault dep in go.sum

* Update helper/testhelpers/testhelpers.go

Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>

* Update vault/cluster/cluster.go

* track active key within the keyring itself (#6915)

* track active key within the keyring itself

* lookup and store using the active key ID

* update docstring

* minor refactor

* Small text fixes (#6912)

* Update physical/raft/raft.go

Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>

* review feedback

* Move raft logical system into separate file

* Update help text a bit

* Enforce cluster addr is set and use it for raft bootstrapping

* Fix tests

* fix http test panic

* Pull in latest raft-snapshot library

* Add comment
2019-06-20 12:14:58 -07:00