Taking inspiration from
https://github.com/golang/go/issues/17604#issuecomment-256384471
suggests that taking the address of a stack variable for use in atomics
works (at least, the race detector doesn't complain) but is doing it
wrong.
The only other change is a change in Leader() detecting if HA is enabled
to fast-path out. This value never changes after NewCore, so we don't
need to grab the read lock to check it.
* Do some best-effort cleanup in file backend
If put results in an encoding error and after the file is closed we
detect it's zero bytes, it could be caused by an out of space error on
the disk since file info is often stored in filesystem metadata with
reserved space. This tries to detect that scenario and perform
best-effort cleanup. We only do this on zero length files to ensure that
if an encode fails to write but the system hasn't already performed
truncation, we leave the existing data alone.
Vault should never write a zero-byte file (as opposed to a zero-byte
value in the encoded JSON) so if this case is hit it's always an error.
* Also run a check on Get
* Add tests to ExerciseBackend to expose nested-values bug
* Update DynamoDB physical backend Delete and hasChildren logic to prevent overzealous cleanup of folders and values
* Use the AWS SDK's UnmarshalMap method for dynamodb backend, not the deprecated ConvertFromMap method
* Use the AWS SDK's MarshalMap method for dynamodb backend, not the deprecated ConvertToMap method
* Use the AWS SDK's session.NewSession method for dynamodb backend, not the deprecated session.New method
* Fix variable name awserr that colides with imported package in dynamodb backend
* logbridge with hclog and identical output
* Initial search & replace
This compiles, but there is a fair amount of TODO
and commented out code, especially around the
plugin logclient/logserver code.
* strip logbridge
* fix majority of tests
* update logxi aliases
* WIP fixing tests
* more test fixes
* Update test to hclog
* Fix format
* Rename hclog -> log
* WIP making hclog and logxi love each other
* update logger_test.go
* clean up merged comments
* Replace RawLogger interface with a Logger
* Add some logger names
* Replace Trace with Debug
* update builtin logical logging patterns
* Fix build errors
* More log updates
* update log approach in command and builtin
* More log updates
* update helper, http, and logical directories
* Update loggers
* Log updates
* Update logging
* Update logging
* Update logging
* Update logging
* update logging in physical
* prefixing and lowercase
* Update logging
* Move phyisical logging name to server command
* Fix som tests
* address jims feedback so far
* incorporate brians feedback so far
* strip comments
* move vault.go to logging package
* update Debug to Trace
* Update go-plugin deps
* Update logging based on review comments
* Updates from review
* Unvendor logxi
* Remove null_logger.go
* Switch reading from S3 to io.Copy from io.ReadFull
If the Content-Length header wasn't being sent back, the current
behavior could panic. It's unclear when it will not be sent; it appears
to be CORS dependent. But this works around it by not trying to
preallocate a buffer of a specific size and instead just read until EOF.
In addition I noticed that Close wasn't being called.
https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#GetObjectOutput
specifies that Body is an io.ReadCloser so I added a call to Close.
Fixes#4222
* Add some extra efficiency
Allow the storage backend for MySQL to use a custom connection lifetime and max idle connection value if the parameter is specified in the config file of vault otherwise do not set in order to leave at default value.
* Consul service address is blank
Setting an explicit service address eliminates the ability for Consul
to dynamically decide what it should be based on its translate_wan_addrs
setting.
translate_wan_addrs configures Consul to return its lan address to nodes
in its same datacenter but return its wan address to nodes in foreign
datacenters.
* service_address parameter for Consul storage backend
This parameter allows users to override the use of what Vault knows to
be its HA redirect address.
This option is particularly commpelling because if set to a blank
string, Consul will leverage the node configuration where the service is
registered which includes the `translate_wan_addrs` option. This option
conditionally associates nodes' lan or wan address based on where
requests originate.
* Add TestConsul_ServiceAddress
Ensures that the service_address configuration parameter is setting the
serviceAddress field of ConsulBackend instances properly.
If the "service_address" parameter is not set, the ConsulBackend
serviceAddress field must instantiate as nil to indicate that it can be
ignored.
* Add useragent package
This helper provides a consistent user-agent header for Vault, taking into account different versions.
* Add user-agent headers to spanner and gcs
This PR adds a new Storage Backend for Triton's Object Storage - Manta
```
make testacc TEST=./physical/manta
==> Checking that code complies with gofmt requirements...
==> Checking that build is using go version >= 1.9.1...
go generate
VAULT_ACC=1 go test -tags='vault' ./physical/manta -v -timeout 45m
=== RUN TestMantaBackend
--- PASS: TestMantaBackend (61.18s)
PASS
ok github.com/hashicorp/vault/physical/manta 61.210s
```
Manta behaves differently to how S3 works - it has no such concepts of Buckets - it is merely a filesystem style object store
Therefore, we have chosen the approach of when writing a secret `foo` it will actually map (on disk) as foo/.vault_value
The reason for this is because if we write the secret `foo/bar` and then try and Delete a key using the name `foo` then Manta
will complain that the folder is not empty because `foo/bar` exists. Therefore, `foo/bar` is written as `foo/bar/.vault_value`
The value of the key is *always* written to a directory tree of the name and put in a `.vault_value` file.
The original reason for the split was physical's dependencies, but those
haven't been onerous for a long time. Meanwhile it's a totally separate
implementation so we could be getting faulty results from tests. Get rid
of it and use the unified physical/inmem.
This change makes these errors transient instead of permanent:
[ERROR] core: failed to acquire lock: error=etcdserver: requested lease not found
After this change, there can still be one of these errors when a
standby vault that lost its lease tries to become leader, but on the
next lock acquisition attempt a new session will be created. With this
new session, the standby will be able to become the leader.
* Fix cassandra tests, explicitly set cluster port if provided
* Update cassandra.yml test-fixture
* Add port as part of the config option, fix tests
* Remove hostport splitting in cassandraConnectionProducer.createSession
* Include port in API docs
* Add max_parallel parameter to MySQL backend.
This limits the number of concurrent connections, so that vault does not die
suddenly from "Too many connections".
This can happen when e.g. vault starts up, and tries to load all the
existing leases in parallel. At the time of writing this, the value
ExpirationRestoreWorkerCount in vault/helper/consts/const.go is set to
64, meaning that if there are enough leases in the vault's DB, it will
generate AT LEAST 64 concurrent connections to MySQL when loading the
data during start-up. On certain configurations, e.g. smaller AWS
RDS/Aurora instances, this will cause Vault to fail startup.
* Fix a typo in mysql storage readme
This change addresses an issue where deep paths would not be enumerated if parent paths did not contain a key.
Given the keys `shallow` and `deep` at the following paths...
```
secret/shallow
secret/path/deep
```
... a `LIST` request against `/v1/secret` would produce only one result, `shallow`. With this change, the same list request will now list `shallow` and `path/`.
* Add a benchmark for exiration.Restore
* Add benchmarks for consul Restore functions
* Add a parallel version of expiration.Restore
* remove debug code
* Up the MaxIdleConnsPerHost
* Add tests for etcd
* Return errors and ensure go routines are exited
* Refactor inmem benchmark
* Add s3 bench and refactor a bit
* Few tweaks
* Fix race with waitgroup.Add()
* Fix waitgroup race condition
* Move wait above the info log
* Add helper/consts package to store consts that are needed in cyclic packages
* Remove not used benchmarks
S3 results require paging to ensure that all results are returned. This
PR changes the S3 physical backend to use the new ListObjectV2 method
and pages through all the results.
Fixes#2223.
This patch fixes two bugs in Zookeeper backends:
* backend was determining if the node is a leaf or not basing on the number
of the childer given node has. This is incorrect if you consider the fact
that deleteing nested node can leave empty prefixes/dirs behind which have
neither children nor data inside. The fix changes this situation by testing
if the node has any data set - if not then it is not a leaf.
* zookeeper does not delete nodes that do not have childern just like consul
does and this leads to leaving empty nodes behind. In order to fix it, we
scan the logical path of a secret being deleted for empty dirs/prefixes and
remove them up until first non-empty one.
Current tests were not checking if backends are properly removing
nested secrets. We follow here the behaviour of Consul backend, where
empty "directories/prefixes" are automatically removed by Consul itself.
Prepared statements prevent the use of connection multiplexing software
such as PGBouncer. Even when PGBouncer is configured for [session mode][1]
there's a possibility that a connection to PostgreSQL can be re-used by
different clients. This leads to errors when clients use session based
features (like prepared statements).
This change removes prepared statements from the PostgreSQL physical
backend. This will allow vault to successfully work in infrastructures
that employ the use of PGBouncer or other connection multiplexing
software.
[1]: https://pgbouncer.github.io/config.html#poolmode
If the local Consul agent is not available while attempting to step down from active or up to active, retry once a second. Allow for concurrent changes to the state with a single registration updater. Fix standby initialization.
Vault will now register itself with Consul. The active node can be found using `active.vault.service.consul`. All standby vaults are available via `standby.vault.service.consul`. All unsealed vaults are considered healthy and available via `vault.service.consul`. Change in status and registration is event driven and should happen at the speed of a write to Consul (~network RTT + ~1x fsync(2)).
Healthy/active:
```
curl -X GET 'http://127.0.0.1:8500/v1/health/service/vault?pretty' && echo;
[
{
"Node": {
"Node": "vm1",
"Address": "127.0.0.1",
"TaggedAddresses": {
"wan": "127.0.0.1"
},
"CreateIndex": 3,
"ModifyIndex": 20
},
"Service": {
"ID": "vault:127.0.0.1:8200",
"Service": "vault",
"Tags": [
"active"
],
"Address": "127.0.0.1",
"Port": 8200,
"EnableTagOverride": false,
"CreateIndex": 17,
"ModifyIndex": 20
},
"Checks": [
{
"Node": "vm1",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": "",
"CreateIndex": 3,
"ModifyIndex": 3
},
{
"Node": "vm1",
"CheckID": "vault-sealed-check",
"Name": "Vault Sealed Status",
"Status": "passing",
"Notes": "Vault service is healthy when Vault is in an unsealed status and can become an active Vault server",
"Output": "",
"ServiceID": "vault:127.0.0.1:8200",
"ServiceName": "vault",
"CreateIndex": 19,
"ModifyIndex": 19
}
]
}
]
```
Healthy/standby:
```
[snip]
"Service": {
"ID": "vault:127.0.0.2:8200",
"Service": "vault",
"Tags": [
"standby"
],
"Address": "127.0.0.2",
"Port": 8200,
"EnableTagOverride": false,
"CreateIndex": 17,
"ModifyIndex": 20
},
"Checks": [
{
"Node": "vm2",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": "",
"CreateIndex": 3,
"ModifyIndex": 3
},
{
"Node": "vm2",
"CheckID": "vault-sealed-check",
"Name": "Vault Sealed Status",
"Status": "passing",
"Notes": "Vault service is healthy when Vault is in an unsealed status and can become an active Vault server",
"Output": "",
"ServiceID": "vault:127.0.0.2:8200",
"ServiceName": "vault",
"CreateIndex": 19,
"ModifyIndex": 19
}
]
}
]
```
Sealed:
```
"Checks": [
{
"Node": "vm2",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": "",
"CreateIndex": 3,
"ModifyIndex": 3
},
{
"Node": "vm2",
"CheckID": "vault-sealed-check",
"Name": "Vault Sealed Status",
"Status": "critical",
"Notes": "Vault service is healthy when Vault is in an unsealed status and can become an active Vault server",
"Output": "Vault Sealed",
"ServiceID": "vault:127.0.0.2:8200",
"ServiceName": "vault",
"CreateIndex": 19,
"ModifyIndex": 38
}
]
```
From the PostgreSQL docs
(http://www.postgresql.org/docs/9.4/static/datatype-character.html):
> Tip: There is no performance difference among these three types,
> apart from increased storage space when using the blank-padded type,
> and a few extra CPU cycles to check the length when storing into a
> length-constrained column. While character(n) has performance
> advantages in some other database systems, there is no such advantage
> in PostgreSQL; in fact character(n) is usually the slowest of the
> three because of its additional storage costs. In most situations
> text or character varying should be used instead.
Some etcd configurations (such as that provided by compose.io) place the
etcd cluster behind multiple load balancers or proxies. In this
configuration, calling Sync (or AutoSync) on the etcd client will
replace the load balancer addresses with the underlying etcd server
address.
This will cause the etcd client to bypass the load balancers, and may
cause the connection to fail completely if the etcd servers are
protected by a firewall.
This patch provides a "sync" option for the etcd backend, which defaults
to the current behavior, but which can be used to turn off of sync.
This corresponds to etcdctl's --no-sync option.
When Vault is killed without the chance to clean up the lock
entry in DynamoDB, no further Vault nodes can become leaders after
that.
To recover from this situation, this commit adds an environment
variable and a configuration flag that when set to "1" causes Vault
to delete the lock entry from DynamoDB.
The permit pool controls the number of outstanding operations that can
be queued for Consul (and inmem, for testing purposes). This prevents
possible situations where Vault launches thousands of concurrent
connections to Consul if e.g. a huge number of leases need to be
expired.
Fixes#677
This strips out http.DefaultClient everywhere I could immediately find
it. Too many things use it and then modify it in incompatible ways.
Fixes#700, I believe.