* Change CA Configure struct to pass Datacenter through
* Remove connect/ca/plugin as we don't have immediate plans to use it.
We still intend to one day but there are likely to be several changes to the CA provider interface before we do so it's better to rebuild from history when we do that work properly.
* Rename PrimaryDC; fix endpoint in secondary DCs
Refactor dns to have same behavior between A and SRV.
Current implementation returns the node name instead of the service
address.
With this fix when querying for SRV record service address is return in
the SRV record.
And when performing a simple dns lookup it returns a CNAME to the
service address.
* Support Connect CAs that can't cross sign
* revert spurios mod changes from make tools
* Add log warning when forcing CA rotation
* Fixup SupportsCrossSigning to report errors and work with Plugin interface (fixes tests)
* Fix failing snake_case test
* Remove misleading comment
* Revert "Remove misleading comment"
This reverts commit bc4db9cabed8ad5d0e39b30e1fe79196d248349c.
* Remove misleading comment
* Regen proto files messed up by rebase
* pass logger through to provider
* test for proper operation of NeedsLogger
* remove public testServer function
* Ooops actually set the logger in all the places we need it - CA config set wasn't and causing segfault
* Fix all the other places in tests where we set the logger
* Allow CA Providers to persist some state
* Update CA provider plugin interface
* Fix plugin stubs to match provider changes
* Update agent/connect/ca/provider.go
Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
* Cleanup review comments
* add NeedsLogger to Provider interface
* implements NeedsLogger in default provider
* pass logger through to provider
* test for proper operation of NeedsLogger
* remove public testServer function
* Switch test to actually assert on logging output rather than reflection.
--amend
* Ooops actually set the logger in all the places we need it - CA config set wasn't and causing segfault
* Fix all the other places in tests where we set the logger
* Add TODO comment
* Allow RSA CA certs for consul and vault providers to correctly sign EC leaf certs.
* Ensure key type ad bits are populated from CA cert and clean up tests
* Add integration test and fix error when initializing secondary CA with RSA key.
* Add more tests, fix review feedback
* Update docs with key type config and output
* Apply suggestions from code review
Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
Main Changes:
• method signature updates everywhere to account for passing around enterprise meta.
• populate the EnterpriseAuthorizerContext for all ACL related authorizations.
• ACL resource listings now operate like the catalog or kv listings in that the returned entries are filtered down to what the token is allowed to see. With Namespaces its no longer all or nothing.
• Modified the acl.Policy parsing to abstract away basic decoding so that enterprise can do it slightly differently. Also updated method signatures so that when parsing a policy it can take extra ent metadata to use during rules validation and policy creation.
Secondary Changes:
• Moved protobuf encoding functions out of the agentpb package to eliminate circular dependencies.
• Added custom JSON unmarshalers for a few ACL resource types (to support snake case and to get rid of mapstructure)
• AuthMethod validator cache is now an interface as these will be cached per-namespace for Consul Enterprise.
• Added checks for policy/role link existence at the RPC API so we don’t push the request through raft to have it fail internally.
• Forward ACL token delete request to the primary datacenter when the secondary DC doesn’t have the token.
• Added a bunch of ACL test helpers for inserting ACL resource test data.
Previously the logic for configuring RDS during LDS for L7 upstreams was
overapplied to TCP proxies resulting in a cluster name of <emptystring>
being used incorrectly.
Fixes#6621
If acls have not yet replicated to the secondary then authz requests
will be remotely resolved by the primary. Now these tests explicitly
wait until replication has caught up first.
* ACL Authorizer overhaul
To account for upcoming features every Authorization function can now take an extra *acl.EnterpriseAuthorizerContext. These are unused in OSS and will always be nil.
Additionally the acl package has received some thorough refactoring to enable all of the extra Consul Enterprise specific authorizations including moving sentinel enforcement into the stubbed structs. The Authorizer funcs now return an acl.EnforcementDecision instead of a boolean. This improves the overall interface as it makes multiple Authorizers easily chainable as they now indicate whether they had an authoritative decision or should use some other defaults. A ChainedAuthorizer was added to handle this Authorizer enforcement chain and will never itself return a non-authoritative decision.
* Include stub for extra enterprise rules in the global management policy
* Allow for an upgrade of the global-management policy
A check may be set to become passing/critical only if a specified number of successive
checks return passing/critical in a row. Status will stay identical as before until
the threshold is reached.
This feature is available for HTTP, TCP, gRPC, Docker & Monitor checks.
* Implement leader routine manager
Switch over the following to use it for go routine management:
• Config entry Replication
• ACL replication - tokens, policies, roles and legacy tokens
• ACL legacy token upgrade
• ACL token reaping
• Intention Replication
• Secondary CA Roots Watching
• CA Root Pruning
Also added the StopAll call into the Server Shutdown method to ensure all leader routines get killed off when shutting down.
This should be mostly unnecessary as `revokeLeadership` should manually stop each one but just in case we really want these to go away (eventually).
This only works so long as we use simplistic protobuf types. Constructs such as oneof or Any types that require type annotations for decoding properly will fail hard but that is by design. If/when we want to use any of that we will probably need to consider a v2 API.
* Add JSON and Binary Marshaler Generators for Protobuf Types
* Generate files with the correct version of gogo/protobuf
I have pinned the version in the makefile so when you run make tools you get the right version. This pulls the version out of go.mod so it should remain up to date.
The version at the time of this commit we are using is v1.2.1
* Fixup some shell output
* Update how we determine the version of gogo
This just greps the go.mod file instead of expecting the go mod cache to already be present
* Fixup vendoring and remove no longer needed json encoder functions
## HTTPAdapter (#5637)
## Ember upgrade 2.18 > 3.12 (#6448)
### Proxies can no longer get away with not calling _super
This means that we can't use create anymore to define dynamic methods.
Therefore we dynamically make 2 extended Proxies on demand, and then
create from those. Therefore we can call _super in the init method of
the extended Proxies.
### We aren't allowed to reset a service anymore
We never actually need to now anyway, this is a remnant of the refactor
from browser based confirmations. We fix it as simply as possible here
but will revisit and remove the old browser confirm functionality at a
later date
### Revert classes to use ES5 style to workaround babel transp. probs
Using a mixture of ES6 classes (and hence super) and arrow functions
means that when babel transpiles the arrow functions down to ES5, a
reference to this is moved before the call to super, hence causing a js
error.
Furthermore, we the testing environment no longer lets use use
apply/call on the constructor.
These errors only manifests during testing (only in the testing
environment), the application itself runs fine with no problems without
this change.
Using ES5 style class definitions give us freedom to do all of the above
without causing any errors, so we reverted these classes back to ES5
class definitions
### Skip test that seems to have changed due to a change in RSVP timing
This test tests a usecase/area of the API that will probably never ever
be used, it was more testing out the API. We've skipped the test for now
as this doesn't affect the application itself, but left a note to come
back here later to investigate further
### Remove enumerableContentDidChange
Initial testing looks like we don't need to call this function anymore,
the function no longer exists
### Rework Changeset.isSaving to take into account new ember APIs
Setting/hanging a computedProperty of an instantiated object no longer
works. Move to setting it on the prototype/class definition instead
### Change how we detect whether something requires listening
New ember API's have changed how you can detect whether something is a
computedProperty or not. It's not immediately clear if its even possible
now. Therefore we change how we detect whether something should be
listened to or not by just looking for presence of `addEventListener`
### Potentially temporary change of ci test scripts to ensure deps exist
All our tooling scripts run through a Makefile (for people familiar with
only using those), which then call yarn scripts which can be called
independently (for people familar with only using yarn).
The Makefile targets always check to make sure all the dependencies are
installed before running anything that requires them (building, testing
etc).
The CI scripts/targets didn't follow this same route and called the yarn
scripts directly (usually CI builds a cache of the dependencies first).
For some reason this cache isn't doing what it usually does, and it
looks as though, in CI, ember isn't installed.
This commit makes the CI scripts consistently use the same method as all
of the other tooling scripts (Makefile target > Install Deps if
required > call yarn script). This should install the dependencies if
for some reason the CI cache building doesn't complete/isn't successful.
Potentially this commit may be reverted if, the root of the problem is
elsewhere, although consistency is always good, so it might be a good
idea to leave this commit as is even if we need to debug and fix things
elsewhere.
### Make test-parallel consistent with the rest of the tooling scripts
As we are here making changes for CI purposes (making test-ci
consistent), we spotted that test-parallel is also inconsistent and also
the README manual instructions won't work without `ember` installed
globally.
This commit makes everything consistent and changes the manual
instructions to use the local ember instance that gets installed via
yarn
### Re-wrangle catchable to fit with new ember 3.12 APIs
In the upgrade from ember 3.8 > 3.12 the public interfaces for
ComputedProperties have changed slightly. `meta` is no longer a public
property of ComputedProperty but of a ComputedDecoratorImpl mixin
instead.
7e4ba1096e/packages/%40ember/-internals/metal/lib/computed.ts (L725)
There seems to be no way, by just using publically available
methods, to replicate this behaviour so that we can create our own
'ComputedProperty` factory via injecting the ComputedProperty class as
we did previously.
3f333bada1/ui-v2/app/utils/computed/factory.js (L1-L18)
Instead we dynamically hang our `Catchable` `catch` method off the
instantiated ComputedProperty. In doing it like this `ComputedProperty`
has already has its `meta` method mixed in so we don't have to manually
mix it in ourselves (which doesn't seem possible)
This functionality is only used during our work in trying to ensure
our EventSource/BlockingQuery work was as 'ember-like' as possible (i.e.
using the traditional Route.model hooks and ember-like Controller
properties). Our ongoing/upcoming work on a componentized approach to
data a.k.a `<DataSource />` means we will be able to remove the majority
of the code involved here now that it seems to be under an amount of
flux in ember.
### Build bindata_assetfs.go with new UI changes
* Add support for parameterizing the ACL config used with a TestAgent
Using tokens that are UUIDs will get rid of some warnings
* Refactor to allow setting all tokens and change the template to ignore unset values.
This fixes an issue where leaf certificates issued in secondary
datacenters would be reissued very frequently (every ~20 seconds)
because the logic meant to detect root rotation was errantly triggering
because a hash of the ultimate root (in the primary) was being compared
against a hash of the local intermediate root (in the secondary) and
always failing.
Fixes#6521
Ensure that initial failures to fetch an agent cache entry using the
notify API where the underlying RPC returns a synthetic index of 1
correctly recovers when those RPCs resume working.
The bug in the Cache.notifyBlockingQuery used to incorrectly "fix" the
index for the next query from 0 to 1 for all queries, when it should
have not done so for queries that errored.
Also fixed some things that made debugging difficult:
- config entry read/list endpoints send back QueryMeta headers
- xds event loops don't swallow the cache notification errors
In a previous PR I made it so that we had interfaces that would work enough to allow blockingQueries to work. However to complete this we need all fields to be settable and gettable.
Notes:
• If Go ever gets contracts/generics then we could get rid of all the Getters/Setters
• protoc / protoc-gen-gogo are going to generate all the getters for us.
• I copied all the getters/setters from the protobuf funcs into agent/structs/protobuf_compat.go
• Also added JSON marshaling funcs that use jsonpb for protobuf types.
Fixes: #5396
This PR adds a proxy configuration stanza called expose. These flags register
listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only
listening on the loopback interface, while still accepting traffic from non
Connect-enabled services.
Under expose there is a boolean checks flag that would automatically expose all
registered HTTP and gRPC check paths.
This stanza also accepts a paths list to expose individual paths. The primary
use case for this functionality would be to expose paths for third parties like
Prometheus or the kubelet.
Listeners for requests to exposed paths are be configured dynamically at run
time. Any time a proxy, or check can be registered, a listener can also be
created.
In this initial implementation requests to these paths are not
authenticated/encrypted.
Also:
* Finished threading replaceExistingChecks setting (from GH-4905)
through service manager.
* Respected the original configSource value that was used to register a
service or a check when restoring persisted data.
* Run several existing tests with and without central config enabled
(not exhaustive yet).
* Switch to ioutil.ReadFile for all types of agent persistence.
The fields in the certs are meant to hold the original binary
representation of this data, not some ascii-encoded version.
The only time we should be colon-hex-encoding fields is for display
purposes or marshaling through non-TLS mediums (like RPC).
This only affects vault versions >=1.1.1 because the prior code
accidentally relied upon a bug that was fixed in
https://github.com/hashicorp/vault/pull/6505
The existing tests should have caught this, but they were using a
vendored copy of vault version 0.10.3. This fixes the tests by running
an actual copy of vault instead of an in-process copy. This has the
added benefit of changing the dependency on vault to just vault/api.
Also update VaultProvider to use similar SetIntermediate validation code
as the ConsulProvider implementation.
* Add build system support for protobuf generation
This is done generically so that we don’t have to keep updating the makefile to add another proto generation.
Note: anything not in the vendor directory and with a .proto extension will be run through protoc if the corresponding namespace.pb.go file is not up to date.
If you want to rebuild just a single proto file you can do so with: make proto-rebuild PROTOFILES=<list of proto files to rebuild>
Providing the PROTOFILES var will override the default behavior of finding all the .proto files.
* Start adding types to the agent/proto package
These will be needed for some other work and are by no means comprehensive.
* Add ability to resolve/fixup the agentpb.ACLLinks structure in the state store.
* Use protobuf marshalling of raft requests instead of msgpack for protoc generated types.
This does not change any encoding of existing types.
* Removed structs package automatically encoding with protobuf marshalling
Instead the caller of raftApply that wants to opt-in to protobuf encoding will have to call `raftApplyProtobuf`
* Run update-vendor to fixup modules.txt
Nothing changed as far as dependencies go but the ordering of modules in that file depends on the time they are first seen and its not alphabetical.
* Rename some things and implement the structs.RPCInfo interface bits
agentpb.QueryOptions and agentpb.WriteRequest implement 3 of the 4 RPCInfo funcs and the new TargetDatacenter message type implements the fourth.
* Use the right encoding function.
* Renamed agent/proto package to agent/agentpb to prevent package name conflicts
* Update modules.txt to fix ordering
* Change blockingQuery to take in interfaces for the query options and meta
* Add %T to error output.
* Add/Update some comments
This should cut down on test flakiness.
Problems handled:
- If you had enough parallel test cases running, the former circular
approach to handling the port block could hand out the same port to
multiple cases before they each had a chance to bind them, leading to
one of the two tests to fail.
- The freeport library would allocate out of the ephemeral port range.
This has been corrected for Linux (which should cover CI).
- The library now waits until a formerly-in-use port is verified to be
free before putting it back into circulation.
In normal operations there is a read/write race related to request
QueryOptions fields. An example race:
WARNING: DATA RACE
Read at 0x00c000836950 by goroutine 30:
github.com/hashicorp/consul/agent/structs.(*ServiceConfigRequest).CacheInfo()
/go/src/github.com/hashicorp/consul/agent/structs/config_entry.go:506 +0x109
github.com/hashicorp/consul/agent/cache.(*Cache).getWithIndex()
/go/src/github.com/hashicorp/consul/agent/cache/cache.go:262 +0x5c
github.com/hashicorp/consul/agent/cache.(*Cache).notifyBlockingQuery()
/go/src/github.com/hashicorp/consul/agent/cache/watch.go:89 +0xd7
Previous write at 0x00c000836950 by goroutine 147:
github.com/hashicorp/consul/agent/cache-types.(*ResolvedServiceConfig).Fetch()
/go/src/github.com/hashicorp/consul/agent/cache-types/resolved_service_config.go:31 +0x219
github.com/hashicorp/consul/agent/cache.(*Cache).fetch.func1()
/go/src/github.com/hashicorp/consul/agent/cache/cache.go:495 +0x112
This patch does a lightweight copy of the request struct so that the
embedded QueryOptions fields that are mutated during Fetch() are scoped
to just that one RPC.
* Store primaries root in secondary after intermediate signature
This ensures that the intermediate exists within the CA root stored in raft and not just in the CA provider state. This has the very nice benefit of actually outputting the intermediate cert within the ca roots HTTP/RPC endpoints.
This change means that if signing the intermediate fails it will not set the root within raft. So far I have not come up with a reason why that is bad. The secondary CA roots watch will pull the root again and go through all the motions. So as soon as getting an intermediate CA works the root will get set.
* Make TestAgentAntiEntropy_Check_DeferSync less flaky
I am not sure this is the full fix but it seems to help for me.
When this test flakes sometimes this happens:
--- FAIL: TestCoordinate_Node (1.69s)
panic: interface conversion: interface {} is nil, not structs.Coordinates [recovered]
FAIL github.com/hashicorp/consul/agent 19.999s
Exit code: 1
panic: interface conversion: interface {} is nil, not structs.Coordinates [recovered]
panic: interface conversion: interface {} is nil, not structs.Coordinates
There is definitely a bug lurking, but the code seems to imply this can
only return nil on 404. The tests previously were not checking the
status code.
The underlying cause of the flake is unknown, but this should turn the
failure into a more normal test failure.
When there is an node name conflicts, such messages are displayed within Consul:
`consul.fsm: EnsureRegistration failed: failed inserting node: Error while renaming Node ID: "e1d456bc-f72d-98e5-ebb3-26ae80d785cf": Node name node001 is reserved by node 05f10209-1b9c-b90c-e3e2-059e64556d4a with name node001`
While it is easy to find the node that has reserved the name, it is hard to find
the node trying to aquire the name since it is not registered, because it
is not part of `consul members` output
This PR will display the IP of the offender and solve far more easily those issues.
The embedded `Server` field on a `DNSServer` is only set inside of the
`ListenAndServe` method. If that method fails for reasons like the
address being in use and is not bindable, then the `Server` field will
not be set and the overall `Agent.Start()` will fail.
This will trigger the inner loop of `TestAgent.Start()` to invoke
`ShutdownEndpoints` which will attempt to pretty print the DNS servers
using fields on that inner `Server` field. Because it was never set,
this causes a nil pointer dereference and crashes the test.
Previously `verify_incoming` was required when turning on `auto_encrypt.allow_tls`, but that doesn't work together with HTTPS UI in some scenarios. Adding `verify_incoming_rpc` to the allowed configurations.
AutoEncrypt needs the server-port because it wants to talk via RPC. Information from gossip might not be available at that point and thats why the server-port is being used.
- Bootstrap escape hatches are OK.
- Public listener/cluster escape hatches are OK.
- Upstream listener/cluster escape hatches are not supported.
If an unsupported escape hatch is configured and the discovery chain is
activated log a warning and act like it was not configured.
Fixes#6160
Compiling this will set an optional SNI field on each DiscoveryTarget.
When set this value should be used for TLS connections to the instances
of the target. If not set the default should be used.
Setting ExternalSNI will disable mesh gateway use for that target. It also
disables several service-resolver features that do not make sense for an
external service.