Fixes#10563
The `resourceVersion` map was doing two jobs prior to this PR. The first job was
to track what version of every resource we know envoy currently has. The
second was to track subscriptions to those resources (by way of the empty
string for a version). This mostly works out fine, but occasionally leads to
consul removing a resource and accidentally (effectively) unsubscribing at the
same time.
The fix separates these two jobs. When all of the resources for a subscription
are removed we continue to track the subscription until envoy explicitly
unsubscribes
Signed-off-by: Jakub Sokołowski <jakub@status.im>
* agent: add failures_before_warning setting
The new setting allows users to specify the number of check failures
that have to happen before a service status us updated to be `warning`.
This allows for more visibility for detected issues without creating
alerts and pinging administrators. Unlike the previous behavior, which
caused the service status to not update until it reached the configured
`failures_before_critical` setting, now Consul updates the Web UI view
with the `warning` state and the output of the service check when
`failures_before_warning` is breached.
The default value of `FailuresBeforeWarning` is the same as the value of
`FailuresBeforeCritical`, which allows for retaining the previous default
behavior of not triggering a warning.
When `FailuresBeforeWarning` is set to a value higher than that of
`FailuresBeforeCritical it has no effect as `FailuresBeforeCritical`
takes precedence.
Resolves: https://github.com/hashicorp/consul/issues/10680
Signed-off-by: Jakub Sokołowski <jakub@status.im>
Co-authored-by: Jakub Sokołowski <jakub@status.im>
Licensing recently changed in Consul v1.10 and along with those changes
the client API was updated such that PutLicense and ResetLicense both
immediately return an error to avoid an unecessary round trip that will
inevitably fail.
For reference, see: 08eb600ee5
Unfortunately, this change broke forward compatibility such that a v1.10
client can no longer make these requests to a v1.9 server which is a
valid use case.
This commit reintroduces these requests to fix this compatibility
breakage but leaves the deprecation notices in tact.
This commit fixes a problem where parent Failovers where not showing (subset children were fine).
Seems to have been introduced with a move/glimmer upgrade here #9154 so I'm adding a 1.9.x backport.
This commit fixes 2 problems with our OIDC flow in the UI, the first is straightforwards, the second is relatively more in depth:
1: A typo (1.10.1 only)
During #10503 we injected our settings service into the our oidc-provider service, there are some comments in the PR as to the whys and wherefores for this change (https://github.com/hashicorp/consul/pull/10503/files#diff-aa2ffda6d0a966ba631c079fa3a5f60a2a1bdc7eed5b3a98ee7b5b682f1cb4c3R28)
Fixing the typo so it was no longer looking for an unknown service (repository/settings > settings)
fixed this.
2: URL encoding (1.9.x, 1.10.x)
TL;DR: /oidc/authorize/provider/with/slashes/code/with/slashes/status/with/slashes should be /oidc/authorize/provider%2Fwith%2Fslashes/code%2Fwith%2Fslashes/status%2Fwith%2Fslashes
When we receive our authorization response back from the OIDC 3rd party, we POST the code and status data from that response back to consul via acallback as part of the OIDC flow. From what I remember back when this feature was originally added, the method is a POST request to avoid folks putting secret-like things into API requests/URLs/query params that are more likely to be visible to the human eye, and POSTing is expected behaviour.
Additionally, in the UI we identify all external resources using unique resource identifiers. Our OIDC flow uses these resources and their identifiers to perform the OIDC flow using a declarative state machine. If any information in these identifiers uses non-URL-safe characters then these characters require URL encoding and we added a helper a while back to specifically help us to do this once we started using this for things that required URL encoding.
The final fix here make sure that we URL encode code and status before using them with one of our unique resource identifiers, just like we do with the majority of other places where we use these identifiers.
* deps: upgrade gogo-protobuf to v1.3.2
* go mod tidy using go 1.16
* proto: regen protobufs after upgrading gogo/protobuf
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Missed the need to add support for unix domain socket config via
api/command line. This is a variant of the problems described in
it is easy to drop one.
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
Consul 1.10 (PR #9792) introduced the ability to specify a prefix when
importing KV's. This however introduced a regression on Windows
systems which breaks `kv import`. The key name is joined with
specified`-prefix` using `filepath.Join()` which uses a forward slash
(/) to delimit values on Unix-based systems, and a backslash (\) to
delimit values on Windows – the latter of which is incompatible with
Consul KV paths.
This commit replaces filepath.Join() with path.Join() which uses a
forward slash as the delimiter, providing consistent key join behavior
across supported operating systems.
Fixes#10583
Replace call to /agent/self with /status/leader to verify agent
reachability before initializing a watch. This endpoint is not guarded
by ACLs, and as such can be queried by any API client regardless of
their permissions.
Fixes#9353
* defer setting the state before returning to avoid being stuck in `INITIALIZING` state
* add changelog
* move comment with the right if statement
* ca: report state transition error from setSTate
* update comment to reflect state transition
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Add support for setting QueryOptions on the following agent API endpoints:
- /agent/health/service/name/:name
- /agent/health/service/id/:id
- /agent/service/maintenance/:id
This follows the same pattern used in #9903 to support query options
for other agent API endpoints.
Resolves#9710
Knowing that blocking queries are firing does not provide much
information on its own. If we know the correlation IDs we can
piece together which parts of the snapshot have been populated.
Some of these responses might be empty from the blocking
query timing out. But if they're returning quickly I think we
can reasonably assume they contain data.
* return an error when the index is not valid
* check response as bool when applying `CAOpSetConfig`
* remove check for bool response
* fix error message and add check to test
* fix comment
* add changelog
If multiple instances of a service are co-located on the same node then
their proxies will all share a cache entry for their resolved service
configuration. This is because the cache key contains the name of the
watched service but does not take into account the ID of the watching
proxies.
This means that there will be multiple agent service manager watches
that can wake up on the same cache update. These watchers then
concurrently modify the value in the cache when merging the resolved
config into the local proxy definitions.
To avoid this concurrent map write we will only delete the key from
opaque config in the local proxy definition after the merge, rather
than from the cached value before the merge.
This change adds a new `dns_config.recursor_strategy` option which
controls how Consul queries DNS resolvers listed in the `recursors`
config option. The supported options are `sequential` (default), and
`random`.
Closes#8807
Co-authored-by: Blake Covarrubias <blake@covarrubi.as>
Co-authored-by: Priyanka Sengupta <psengupta@flatiron.com>
Previously when namespaces were enabled, we weren't requesting permission for the actively selected namespace, and instead always checking the permissions for the default namespace.
This commit ensures we request permissions for the actively selected namespace.
This commit adds a bit of string wrangling to avoid the keys in our javascript source file also being transformed. Additionally, whilst looking at this we decided that Maps are a better dictionary than javascript objects, so we moved to use those here also (but this doesn't affect the issue)
Adds 'can access ACLs' which means one of two things
1. When ACLs are disabled I can access the 'please enable ACLs' page
2. When ACLs are enabled, its the same as canRead
When clicking to create a KV within folder name, would would be viewing a form that was a form for creating a KV in the root, which when the user clicked to save, saved the KV in the root.
For the moment at least I've removed the code that strips double slashes, and whilst this isn't ideal, it looks like we've picked up one of those bugs that turns into a 'feature', and completely reworking KV to not rely on the double slashes is not really an option right now.
The compatv2 integration tests were failing because they use an older CLI version with a newer
HTTP API. This commit restores the GRPCPort field to the DebugConfig output to allow older
CIs to continue to fetch the port.
* ca: move provider creation into CAManager
This further decouples the CAManager from Server. It reduces the interface between them and
removes the need for the SetLogger method on providers.
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* ca: move SignCertificate to the file where it is used
* auto-config: move autoConfigBackend impl off of Server
Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.
* fix linter issues
* check error when `raftApplyMsgpack`
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* check expiry date of the intermediate before using it to sign a leaf
* fix typo in comment
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
* Fix test name
* do not check cert start date
* wrap error to mention it is the intermediate expired
* Fix failing test
* update comment
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use shim to avoid sleep in test
* add root cert validation
* remove duplicate code
* Revert "fix linter issues"
This reverts commit 6356302b54f06c8f2dee8e59740409d49e84ef24.
* fix import issue
* gofmt leader_connect_ca
* add changelog entry
* update error message
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* fix error message in test
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* add intermediate ca metric routine
* add Gauge config for intermediate cert
* Stop metrics routine when stopping leader
* add changelog entry
* updage changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use variables instead of a map
* go imports sort
* Add metrics for primary and secondary ca
* start metrics routine in the right DC
* add telemetry documentation
* update docs
* extract expiry fetching in a func
* merge metrics for primary and secondary into signing ca metric
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
The default namespace, and the tokens default namespace (or its origin namespace) is slightly more complicated than other things we deal with in the UI, there's plenty of info/docs on this that I've added in this PR.
Previously:
When a namespace was not specified in the URL, we used to default to the default namespace. When you logged in using a token we automatically forward you the namespace URL that your token originates from, so you are then using the namespace for your token by default. You can of course then edit the URL to remove the namespace portion, or perhaps revisit the UI at the root path with you token already set. In these latter cases we would show you information from the default namespace. So if you had no namespace segment/portion in the URL, we would assume default, perform actions against the default namespace and highlight the default namespace in the namespace selector menu. If you wanted to perform actions in your tokens origin namespace you would have to manually select it from the namespace selector menu.
This PR:
Now, when you have no namespace segment/portion in the URL, we use the token's origin namespace instead (and if you don't have a token, we then use the default namespace like it was previously)
Notes/thoughts:
I originally thought we were showing an incorrectly selected namespace in the namespace selector, but it also matched up with what we were doing with the API, so it was in fact correct. The issue was more that we weren't selecting the origin namespace of the token for the user when a namespace segment was omitted from the URL. Seeing as we automatically forward you to the tokens origin namespace when you log in, and we were correctly showing the namespace we were acting on when you had no namespace segment in the URL (in the previous case default), I'm not entirely sure how much of an issue this actually was.
This characteristic of namespace+token+namespace is a little weird and its easy to miss a subtlety or two so I tried to add some documentation in here for future me/someone else (including some in depth code comment around one of the API endpoints where this is very subtle and very hard to miss). I'm not the greatest at words, so would be great to get some edits there if it doesn't seem clear to folks.
The fact that we used to save your previous datacenter and namespace into local storage for reasons also meant the interaction here was slightly more complicated than it needed to be, so whilst we were here we rejigged things slightly to satisfy said reasons still but not use local storage (we try and grab the info from higher up). A lot of the related code here is from before we had our Routlets which I think could probably make all of this a lot less complicated, but I didn't want to do a wholesale replacement in this PR, we can save that for a separate PR on its own at some point.
* trim carriage return from certificates when inserting rootCA in the inMemDB
* format rootCA properly when returning the CA on the connect CA endpoint
* Fix linter warnings
* Fix providers to trim certs before returning it
* trim newlines on write when possible
* add changelog
* make sure all provider return a trailing newline after the root and intermediate certs
* Fix endpoint to return trailing new line
* Fix failing test with vault provider
* make test more robust
* make sure all provider return a trailing newline after the leaf certs
* Check for suffix before removing newline and use function
* Add comment to consul provider
* Update change log
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* fix typo
* simplify code callflow
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* extract requireNewLine as shared func
* remove dependency to testify in testing file
* remove extra newline in vault provider
* Add cert newline fix to envoy xds
* remove new line from mock provider
* Remove adding a new line from provider and fix it when the cert is read
* Add a comment to explain the fix
* Add missing for leaf certs
* fix missing new line
* fix missing new line in leaf certs
* remove extra new line in test
* updage changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* fix in vault provider and when reading cache (RPC call)
* fix AWS provider
* fix failing test in the provider
* remove comments and empty lines
* add check for empty cert in test
* fix linter warnings
* add new line for leaf and private key
* use string concat instead of Sprintf
* fix new lines for leaf signing
* preallocate slice and remove append
* Add new line to `SignIntermediate` and `CrossSignCA`
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
sync/atomic must be used with 64-bit aligned fields, and that alignment is difficult to
ensure unless the field is the first one in the struct.
https://golang.org/pkg/sync/atomic/#pkg-note-BUG.
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.
Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.
Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:
ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
* return an invalid record when asked for an addr dns with type other then A and AAAA
* add changelog
* fix ANY use case and add a test for it
* update changelog type
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* return empty response if the question record type do not match for addr
* set comment in the right place
* return A\AAAA record in extra section if record type is not A\AAAA for addr
* Fix failing test
* remove commented code
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use require for test validation
* use variable to init struct
* fix failing test
* Update agent/dns.go
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update .changelog/10401.txt
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update agent/dns.go
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update agent/dns.go
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update agent/dns.go
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* fix compilation error
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* ui: Move all our icons to use CSS custom properties
The good thing about SASS vars is, if you don't use them they get removed from the final build. Whereas with CSS we have no tree-shaking to get rid of unused CSS custom properties. We can mostly work around this and for some things like colors its no big deal if we have some hex-codes in the build that we don't use as hex-codes are relatively small.
We've been slowly but surely moving all of our colors (and other things) to use CSS custom properties instead of SASS vars now that we have them available.
This commit makes use of the 'tree-shaking' abilities of @extend to ensure that we only compile in the icons that we use.
This commit is mostly churn-less as we already use @extend for the majority of our icons, so generally there is zero change here for working on the UI, but I did spot one single place where we were using SASS vars instead of @extend. This now uses the new form (second commit)
Interestingly this reduces our CSS payload by ~2kb to ~53kb (around 25kb of that is these icons)
* remove flush for each write to http response in the agent monitor endpoint
* fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover.
* start log reading goroutine before adding the sink to avoid filling the log channel before getting a chance of reading from it
* flush every 500ms to optimize log writing in the http server side.
* add changelog file
* add issue url to changelog
* fix changelog url
* Update changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use ticker to flush and avoid race condition when flushing in a different goroutine
* stop the ticker when done
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Revert "fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover."
This reverts commit 1eeddf7a
* wait for log consumer loop to start before registering the sink
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Updates to a cluster will clear the associated endpoints, and updates to
a listener will clear the associated routes. Update the incremental xDS
logic to account for this implicit cleanup so that we can finish warming
the clusters and listeners.
Fixes#10379