As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.
Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.
Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:
ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
This function was only used in one place, and the indirection makes it slightly
harder to see what the one caller is doing. Since it's only accesing a couple fields
it seems like the logic can exist in the one caller.
VerifyIncomingRPC and verifyIncomingRPC were duplicate functions, and
once one is removed, Config.verifyIncomingRPC is only called in one place.
Remove 2 of the 3 functions to make the behaviour easier to follow (less indirection).
Unexport outgoingALPNRPCConfig since it is only used internally
Remove the MutualTLSCapable->mutualTLSCapable indirection, we only need the exported method.
Inline enableAgentTLSForChecks to make it more clear what it does, since it only has a single caller and is wrapping a single field lookup.
Replace two methods with a single one that returns the cert. This moves more
of the logic into the single caller (auto-config).
tlsutil.Configurator is widely used. By keeping it smaller and focused only on storing and
returning TLS config, we make the code easier to follow.
These two methods were more related to auto-config than to tlsutil, so reducing the interface
moves the logic closer to the feature that requires it.
the autoTLS field on Configurator is only set once. By making it a value receiver it
should be allocated as a single block of memory along with Configurator.
Also add godoc to document what it is used for.
The log method only needed the lock because it accessed version. By using an atomic
instead of a lock, we can remove the risk that the comments call out, making log safer
to use.
Also updates the log name to match the function names, and adds some comments about how
the lock is used.
Embedded structs make code harder to navidate because an IDE can not show all uses of
the methods of that field separate from other uses.
Generally embedding of structs should only be used to satisfy an interface, and in this
case the Configurator type does not need to implement the RWMutex interface.
Using a TestSigner was causing problems because go1.16 has this change:
> CreateCertificate now verifies the generated certificate's signature
> using the signer's public key. If the signature is invalid, an error is
> returned, instead of a malformed certificate.
See https://golang.org/doc/go1.16#crypto/x509
Some TLS servers require SNI, but the Golang HTTP client doesn't
include it in the ClientHello when connecting to an IP address. This
change adds a new TLSServerName field to health check definitions to
optionally set it. This fixes#9473.
An old PR (#7623) was merged after #9585. The old code was incompatible with the new
changes, but none of the lines caused a git conflict so the merge was allowed.
The incompatible changes caused the tests to fail. This fixes the old code to
work with the new changes.
* ci: stop building darwin/386 binaries
Go 1.15 drops support for 32-bit binaries on Darwin https://golang.org/doc/go1.15#darwin
* tls: ConnectionState::NegotiatedProtocolIsMutual is deprecated in Go 1.15, this value is always true
* correct error messages that changed slightly
* Completely regenerate some TLS test data
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
Ensure that enabling AutoConfig sets the tls configurator properly
This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.
Also fix a bug where Consul could segfault if TLS was enabled but no client certificate was provided. How no one has reported this as a problem I am not sure.
Right now this is only hooked into the insecure RPC server and requires JWT authorization. If no JWT authorizer is setup in the configuration then we inject a disabled “authorizer” to always report that JWT authorization is disabled.
And fix the 'value not used' issues.
Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.
In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:
There are several distinct chunks of code that are affected:
* new flags and config options for the server
* retry join WAN is slightly different
* retry join code is shared to discover primary mesh gateways from secondary datacenters
* because retry join logic runs in the *agent* and the results of that
operation for primary mesh gateways are needed in the *server* there are
some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
at multiple layers of abstraction just to pass the data down to the right
layer.
* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers
* the function signature for RPC dialing picked up a new required field (the
node name of the destination)
* several new RPCs for manipulating a FederationState object:
`FederationState:{Apply,Get,List,ListMeshGateways}`
* 3 read-only internal APIs for debugging use to invoke those RPCs from curl
* raft and fsm changes to persist these FederationStates
* replication for FederationStates as they are canonically stored in the
Primary and replicated to the Secondaries.
* a special derivative of anti-entropy that runs in secondaries to snapshot
their local mesh gateway `CheckServiceNodes` and sync them into their upstream
FederationState in the primary (this works in conjunction with the
replication to distribute addresses for all mesh gateways in all DCs to all
other DCs)
* a "gateway locator" convenience object to make use of this data to choose
the addresses of gateways to use for any given RPC or gossip operation to a
remote DC. This gets data from the "retry join" logic in the agent and also
directly calls into the FSM.
* RPC (`:8300`) on the server sniffs the first byte of a new connection to
determine if it's actually doing native TLS. If so it checks the ALPN header
for protocol determination (just like how the existing system uses the
type-byte marker).
* 2 new kinds of protocols are exclusively decoded via this native TLS
mechanism: one for ferrying "packet" operations (udp-like) from the gossip
layer and one for "stream" operations (tcp-like). The packet operations
re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
overhead.
* the server instances specially wrap the `memberlist.NetTransport` when running
with gateway federation enabled (in a `wanfed.Transport`). The general gist is
that if it tries to dial a node in the SAME datacenter (deduced by looking
at the suffix of the node name) there is no change. If dialing a DIFFERENT
datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
gateways to eventually end up in a server's :8300 port.
* a new flag when launching a mesh gateway via `consul connect envoy` to
indicate that the servers are to be exposed. This sets a special service
meta when registering the gateway into the catalog.
* `proxycfg/xds` notice this metadata blob to activate additional watches for
the FederationState objects as well as the location of all of the consul
servers in that datacenter.
* `xds:` if the extra metadata is in place additional clusters are defined in a
DC to bulk sink all traffic to another DC's gateways. For the current
datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
balances all servers as well as one mini-cluster per node
(`<node>.server.<dc>.consul`)
* the `consul tls cert create` command got a new flag (`-node`) to help create
an additional SAN in certs that can be used with this flavor of federation.
* Switch to WithinDuration to improve test
This test was flaky before because of the time logic. Now it uses
WithinDuration and should be correct.
Fixes https://github.com/hashicorp/consul/issues/6857.
* relax requirements for auto_encrypt on server
* better error message when auto_encrypt and verify_incoming on
* docs: explain verify_incoming on Consul clients.
The fields in the certs are meant to hold the original binary
representation of this data, not some ascii-encoded version.
The only time we should be colon-hex-encoding fields is for display
purposes or marshaling through non-TLS mediums (like RPC).