After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading itself as another Vault
entity.
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.
Make Vault job validation its own function so it's easier to expand it.
Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just their policies.
Set `ChangeMode` on `Vault.Canonicalize`.
Add some missing tests.
Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.
An entity alias assigns an indentity to a token, allowing better control
and management of Vault clients since all tokens with the same indentity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and better control over Vault billing.
Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
As of 0.11.3 Vault token revocation and purging was done in batches.
However the batch size was only limited by the number of *non-expired*
tokens being revoked.
Due to bugs prior to 0.11.3, *expired* tokens were not properly purged.
Long-lived clusters could have thousands to *millions* of very old
expired tokens that never got purged from the state store.
Since these expired tokens did not count against the batch limit, very
large batches could be created and overwhelm servers.
This commit ensures expired tokens count toward the batch limit with
this one line change:
```
- if len(revoking) >= toRevoke {
+ if len(revoking)+len(ttlExpired) >= toRevoke {
```
However, this code was difficult to test due to being in a periodically
executing loop. Most of the changes are to make this one line change
testable and test it.
adds in oss components to support enterprise multi-vault namespace feature
upgrade specific doc on vault multi-namespaces
vault docs
update test to reflect new error
Establishing leadership should be very fast and never make external API
calls.
This fixes a situation where there is a long backlog of Vault tokens to
be revoked on when leadership is gained. In such case, revoking the
tokens will significantly slow down leadership establishment and slow
down processing. Worse, the revocation call does not honor leadership
`stopCh` signals, so it will not stop when the leader loses leadership.
Vault 1.2.0 deprecated `period` field in favor of `token_period` in auth
role:
> * Token store roles use new, common token fields for the values
> that overlap with other auth backends. `period`, `explicit_max_ttl`, and
> `bound_cidrs` will continue to work, with priority being given to the
> `token_` prefixed versions of those parameters. They will also be returned
> when doing a read on the role if they were used to provide values initially;
> however, in Vault 1.4 if `period` or `explicit_max_ttl` is zero they will no
> longer be returned. (`explicit_max_ttl` was already not returned if empty.)
https://github.com/hashicorp/vault/blob/master/CHANGELOG.md#120-july-30th-2019
This seems to be the minimum viable patch for fixing a deadlock between
establishConnection and SetConfig.
SetConfig calls tomb.Kill+tomb.Wait while holding v.lock.
establishConnection needs to acquire v.lock to exit but SetConfig is
holding v.lock until tomb.Wait exits. tomb.Wait can't exit until
establishConnect does!
```
SetConfig -> tomb.Wait
^ |
| v
v.lock <- establishConnection
```
Keep attempting to renew Vault token past locally recorded expiry, just
in case the token was renewed out of band, e.g. on another Nomad server,
until Vault returns an unrecoverable error.
This PR removes enforcement that the Vault token role disallows orphaned
tokens and recommends orphaned tokens to simplify the
bootstrapping/upgrading of Nomad clusters. The requirement that Nomad's
Vault token never expire and be shared by all instances of Nomad servers
is not operationally friendly.
The Vault API returns a nil secret and nil error when reading an object
that doesn't exist. The old code assumed an error would be returned and
thus will panic when trying to validate a non-existant role.
This PR fixes our vet script and fixes all the missed vet changes.
It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.