Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect:
- `DiscoveryChain.Get`
- `ConfigEntry.ResolveServiceConfig`
- `Intentions.Match`
All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism.
### No Config Entries Found
In this case, despite looking for many config entries none may be found at all. Unlike #12110 in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110 with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110.
### No Change Since Last Wakeup
Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here to detect if any of the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically externally handle it similar to #12110.
This would mean that even if 20 discovery chain read RPC handling goroutines wakeup due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report.
### Extra Endpoints
Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` also were updated:
- `ConfigEntry.List`
- `Internal.IntentionUpstreams` (tproxy)
Many places in consul already treated node names case insensitively.
The state store indexes already do it, but there are a few places that
did a direct byte comparison which have now been corrected.
One place of particular consideration is ensureCheckIfNodeMatches
which is executed during snapshot restore (among other places). If a
node check used a slightly different casing than the casing of the node
during register then the snapshot restore here would deterministically
fail. This has been fixed.
Primary approach:
git grep -i "node.*[!=]=.*node" -- ':!*_test.go' ':!docs'
git grep -i '\[[^]]*member[^]]*\]
git grep -i '\[[^]]*\(member\|name\|node\)[^]]*\]' -- ':!*_test.go' ':!website' ':!ui' ':!agent/proxycfg/testing.go:' ':!*.md'
There are some cross-config-entry relationships that are enforced during
"graph validation" at persistence time that are required to be
maintained. This means that config entries may form a digraph at times.
Config entry replication procedes in a particular sorted order by kind
and name.
Occasionally there are some fixups to these digraphs that end up
replicating in the wrong order and replicating the leaves
(ingress-gateway) before the roots (service-defaults) leading to
replication halting due to a graph validation error related to things
like mismatched service protocol requirements.
This PR changes replication to give each computed change (upsert/delete)
a fair shot at being applied before deciding to terminate that round of
replication in error. In the case where we've simply tried to do the
operations in the wrong order at least ONE of the outstanding requests
will complete in the right order, leading the subsequent round to have
fewer operations to do, with a smaller likelihood of graph validation
errors.
This does not address all scenarios, but for scenarios where the edits
are being applied in the wrong order this should avoid replication
halting.
Fixes#9319
The scenario that is NOT ADDRESSED by this PR is as follows:
1. create: service-defaults: name=new-web, protocol=http
2. create: service-defaults: name=old-web, protocol=http
3. create: service-resolver: name=old-web, redirect-to=new-web
4. delete: service-resolver: name=old-web
5. update: service-defaults: name=old-web, protocol=grpc
6. update: service-defaults: name=new-web, protocol=grpc
7. create: service-resolver: name=old-web, redirect-to=new-web
If you shutdown dc2 just before (4) and turn it back on after (7)
replication is impossible as there is no single edit you can make to
make forward progress.
* Add %panel CSS component
* Deprecate old menu-panel component
* Various smallish tweaks to disclosure-menu
* Move all menus in the app chrome to use new DisclosureMenu
* Follow up CSS to move all app chrome menus to new components
* Don't prevent default any events from anchors
* Add a tick to click steps
* Delete collapsible notices component and related helper
* Add relative t action/helper to our Route component
* Replace single use CollapsibleNotices with multi-use Disclosure
* Parse datacenter from request
- Parse the value of the datacenter from the create/delete requests for AuthMethods and BindingRules so that they can be created in and deleted from the datacenters specified in the request.
By using the query results as state.
Blocking queries are efficient when the query matches some results,
because the ModifyIndex of those results, returned as queryMeta.Mindex,
will never change unless the items themselves change.
Blocking queries for non-existent items are not efficient because the
queryMeta.Index can (and often does) change when other entities are
written.
This commit reduces the churn of these queries by using a different
comparison for "has changed". Instead of using the modified index, we
use the existence of the results. If the previous result was "not found"
and the new result is still "not found", we know we can ignore the
modified index and continue to block.
This is done by setting the minQueryIndex to the returned
queryMeta.Index, which prevents the query from returning before a state
change is observed.
We've noticed that a trace that is captured over the full duration is
too large to open on most machines. A trace.out captured over just the
interval period (30s by default) should be a more than enough time to
capture trace data.
This will both save on unnecessary raft operations as well as
unnecessarily incrementing the raft modify index of config entries
subject to no-op updates.
The race detector noticed this initially in `TestAgentConfigWatcherSidecarProxy` but it is not restricted to just tests.
The two main changes here were:
- ensure that before we mutate the internal `agent/local` representation of a Service (for tags or VIPs) we clone those fields
- ensure that there's no function argument joint ownership between the caller of a function and the local state when calling `AddService`, `AddCheck`, and related using `copystructure` for now.
* First phase of refactoring PermissionDeniedError
Add extended type PermissionDeniedByACLError that captures information
about the accessor, particular permission type and the object and name
of the thing being checked.
It may be worth folding the test and error return into a single helper
function, that can happen at a later date.
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
This commit excludes the health of any service instances from the Node Listing page. This means that if you are viewing the Node listing page you will only see failing nodes if there are any Node Checks failing, Service Instance Health checks are no longer taken into account.
Co-authored-by: Jamie White <jamie@jgwhite.co.uk>
We noticed that the Service Instance listing on both Node and Service views where not taking into account proxy instance health. This fixes that up so that the small health check information in each Service Instance row includes the proxy instances health checks when displaying Service Instance health (afterall if the proxy instance is unhealthy then so is the service instance that it should be proxying)
* Refactor Consul::InstanceChecks with docs
* Add to-hash helper, which will return an object keyed by a prop
* Stop using/relying on ember-data type things, just use a hash lookup
* For the moment add an equivalent "just give me proxies" model prop
* Start stitching things together, this one requires an extra HTTP request
..previously we weren't even requesting proxies instances here
* Finish up the stitching
* Document Consul::ServiceInstance::List while I'm here
* Fix up navigation mocks Name > Service
Fixes#11876
This enforces that multiple xDS mutations are not issued on the same ADS connection at once, so that we can 100% control the order that they are applied. The original code made assumptions about the way multiple in-flight mutations were applied on the Envoy side that was incorrect.
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream,
prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field
with the full list of currently subscribed items, instead of simply omitting it
to infer that it wanted everything (which is what wildcard mode means).
This upstream issue was filed in envoyproxy/envoy#16063 and fixed in
envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later
versions (later refactored in envoyproxy/envoy#16855).
This PR conditionally forces LDS/CDS to be wildcard-only even when the
connected Envoy requests a non-wildcard subscription, but only does so on
versions prior to `1.19.0`, as we should not need to do this on later versions.
This fixes the failure case as described here: #11833 (comment)
Co-authored-by: Huan Wang <fredwanghuan@gmail.com>
- Adding a 'Partition' and 'RetryJoin' command allows test cases where
one would like to spin up a Consul Agent in a non-default partition to
test use-cases that are common when enabling Admin Partition on
Kubernetes.
The fix here is two fold:
- We shouldn't be providing the DataSource (which loads the data) with an id when we are creating from within a folder (in the buggy code we are providing the parentKey of the new KV you are creating)
- Being able to provide an empty id to the DataSource/KV repository and that repository responding with a newly created object is more towards the "new way of doing forms", therefore the corresponding code to return a newly created ember-data object. As we changed the actual bug in point 1 here, we need to make sure the repository responds with an empty object when the request id is empty.
* xds: refactor ingress listener SDS configuration
* xds: update resolveListenerSDS call args in listeners_test
* ingress: add TLS min, max and cipher suites to GatewayTLSConfig
* xds: implement envoyTLSVersions and envoyTLSCipherSuites
* xds: merge TLS config
* xds: configure TLS parameters with ingress TLS context from leaf
* xds: nil check in resolveListenerTLSConfig validation
* xds: nil check in makeTLSParameters* functions
* changelog: add entry for TLS params on ingress config entries
* xds: remove indirection for TLS params in TLSConfig structs
* xds: return tlsContext, nil instead of ambiguous err
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
* xds: switch zero checks to types.TLSVersionUnspecified
* ingress: add validation for ingress config entry TLS params
* ingress: validate listener TLS config
* xds: add basic ingress with TLS params tests
* xds: add ingress listeners mixed TLS min version defaults precedence test
* xds: add more explicit tests for ingress listeners inheriting gateway defaults
* xds: add test for single TLS listener on gateway without TLS defaults
* xds: regen golden files for TLSVersionInvalid zero value, add TLSVersionAuto listener test
* types/tls: change TLSVersion to string
* types/tls: update TLSCipherSuite to string type
* types/tls: implement validation functions for TLSVersion and TLSCipherSuites, make some maps private
* api: add TLS params to GatewayTLSConfig, add tests
* api: add TLSMinVersion to ingress gateway config entry test JSON
* xds: switch to Envoy TLS cipher suite encoding from types package
* xds: fixup validation for TLSv1_3 min version with cipher suites
* add some kitchen sink tests and add a missing struct tag
* xds: check if mergedCfg.TLSVersion is in TLSVersionsWithConfigurableCipherSuites
* xds: update connectTLSEnabled comment
* xds: remove unsued resolveGatewayServiceTLSConfig function
* xds: add makeCommonTLSContextFromLeafWithoutParams
* types/tls: add LessThan comparator function for concrete values
* types/tls: change tlsVersions validation map from string to TLSVersion keys
* types/tls: remove unused envoyTLSCipherSuites
* types/tls: enable chacha20 cipher suites for Consul agent
* types/tls: remove insecure cipher suites from allowed config
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 and TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 are both explicitly listed as insecure and disabled in the Go source.
Refs https://cs.opensource.google/go/go/+/refs/tags/go1.17.3:src/crypto/tls/cipher_suites.go;l=329-330
* types/tls: add ValidateConsulAgentCipherSuites function, make direct lookup map private
* types/tls: return all unmatched cipher suites in validation errors
* xds: check that Envoy API value matching TLS version is found when building TlsParameters
* types/tls: check that value is found in map before appending to slice in MarshalEnvoyTLSCipherSuiteStrings
* types/tls: cast to string rather than fmt.Printf in TLSCihperSuite.String()
* xds: add TLSVersionUnspecified to list of configurable cipher suites
* structs: update note about config entry warning
* xds: remove TLS min version cipher suite unconfigurable test placeholder
* types/tls: update tests to remove assumption about private map values
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
* Make sure the mocks reflect the requested partition/namespace
* Ensure partition is passed through to the HTTP adapter
* Pass AuthMethod object through to TokenSource in order to use Partition
* Change up docs and add potential improvements for future
* Pass the query partition back onto the response
* Make sure the OIDC callback mock returns a Partition
* Enable OIDC provider mock overwriting during acceptance testing
* Make sure we can enable partitions and SSO post bootup only required
...for now
* Wire up oidc provider mocking
* Add SSO full auth flow acceptance tests
* ui: Don't even ask whether we are authorized for a KV...
...just let the actual API tell us in the response, thin-client style.
* Add some similar commenting for previous PRs related to this problem
* Add some less fake API data
* Rename the models class so as to not be confused with JS Proxies
* Rearrange routlets slightly and add some initial outletFor tests
* Move away from a MeshChecks computed property and just use a helper
* Just use ServiceChecks for healthiness filtering for the moment
* Make TProxy cookie configurable
* Amend exposed paths and upstreams so they know about meta AND proxy
* Slight bit of TaggedAddresses refactor while I was checking for `meta` etc
* Document CONSUL_TPROXY_ENABLE
We recently changed the intentions form to take a full model of a dc rather than just the string identifier (so {Name: 'dc', Primary: true} vs just 'dc' in order to know whether the DC is the primary or not.
Unfortunately, we only did this on the global intentions page not the per service intentions page. This makes it impossible to save an intention from the per service intention page (whilst you can still save intentions from the global intention page as normal).
The fix here pretty much copy/pastes the approach taken in the global intention edit template over to the per service intention edit template.
Tests have been added for creation in the per service intention section, which again are pretty much just copied from the global one, unfortunately this didn't exist previously which would have helped prevent this.
Update the `/agent/check/deregister/` API endpoint to return a 404
HTTP response code when an attempt is made to de-register a check ID
that does not exist on the agent.
This brings the behavior of /agent/check/deregister/ in line with the
behavior of /agent/service/deregister/ which was changed in #10632 to
similarly return a 404 when de-registering non-existent services.
Fixes#5821
* clone the service under lock to avoid a data race
* add change log
* create a struct and copy the pointer to mutate it to avoid a data race
* fix failing test
* revert added space
* add comments, to clarify the data race.
Error messages related to service and check operations previously included
the following substrings:
- service %q
- check %q
From this error message, it isn't clear that the expected field is the ID for
the entity, not the name. For example, if the user has a service named test,
the error message would read 'Unknown service "test"'. This is misleading -
a service with that *name* does exist, but not with that *ID*.
The substrings above have been modified to make it clear that ID is needed,
not name:
- service with ID %q
- check with ID %q
* ui: Add login button to per service intentions for zero results
* Add login button and consistent header for when you have zero nodes
* `services` doesn't exists use `items` consequently:
Previous to this fix we would not show a more tailored message for when
you empty result set was due to a user search rather than an empty
result set straight from the backend
* Fix `error` > `@error` in ErrorState plus code formatting and more docs
* Changelog
When a URL path is not found, return a non-empty message with the 404 status
code to help the user understand what went wrong. If the URL path was not
prefixed with '/v1/', suggest that may be the cause of the problem (which is a
common mistake).
* Update disco fixtures now we have partitions
* Add virtual-admin-6 fixture with partition 'redirects' and failovers
* Properly cope with extra partition segment for splitters and resolvers
* Make 'redirects' and failovers look/act consistently
* Fixup some unit tests
Fixes a bug whereby servers present in multiple network areas would be
properly segmented in the Router, but not in the gRPC mirror. This would
lead servers in the current datacenter leaving from a network area
(possibly during the network area's removal) from deleting their own
records that still exist in the standard WAN area.
The gRPC client stack uses the gRPC server tracker to execute all RPCs,
even those targeting members of the current datacenter (which is unlike
the net/rpc stack which has a bypass mechanism).
This would manifest as a gRPC method call never opening a socket because
it would block forever waiting for the current datacenter's pool of
servers to be non-empty.
types: add TLS constants
types: distinguish between human and Envoy serialization for TLSVersion constants
types: add DeprecatedAgentTLSVersions for backwards compatibility
types: add methods for printing TLSVersion as strings
types: add TLSVersionInvalid error value
types: add a basic test for TLSVersion comparison
types: add TLS cihper suite mapping using IANA constant names and values
types: adding ConsulAutoConfigTLSVersionStrings
changelog: add entry for TLSVersion and TLSCipherSuite types
types: initialize TLSVerison constants starting at zero
types: remove TLSVersionInvalid < 0 test
types: update note for ConsulAutoConfigTLSVersionStrings
types: programmatically invert TLSCipherSuites for HumanTLSCipherSuiteStrings lookup map
Co-authored-by: Dan Upton <daniel@floppy.co>
types: add test for TLSVersion zero-value
types: remove unused EnvoyTLSVersionStrings
types: implement MarshalJSON for TLSVersion
types: implement TLSVersionUnspecified as zero value
types: delegate TLS.MarshalJSON to json.Marshal, use ConsulConfigTLSVersionStrings as default String() values
Co-authored-by: Dan Upton <daniel@floppy.co>
The test added in this commit shows the problem. Previously the
SigningKeyID was set to the RootCert not the local leaf signing cert.
This same bug was fixed in two other places back in 2019, but this last one was
missed.
While fixing this bug I noticed I had the same few lines of code in 3
places, so I extracted a new function for them.
There would be 4 places, but currently the InitializeCA flow sets this
SigningKeyID in a different way, so I've left that alone for now.
We were not adding the local signing cert to the CARoot. This commit
fixes that bug, and also adds support for fixing existing CARoot on
upgrade.
Also update the tests for both primary and secondary to be more strict.
Check the SigningKeyID is correct after initialization and rotation.
This commit uses all our new ways of doing things to Lock Sessions and their interactions with KV and Nodes. This is mostly around are new under-the-hood things, but also I took the opportunity to upgrade some of the CSS to reuse some of our CSS utils that have been made over the past few months (%csv-list and %horizontal-kv-list).
Also added (and worked on existing) documentation for Lock Session related components.
- Moves where they appear up to the <App /> component.
- Instead of a <Notification /> wrapping component to move whatever you use for a notification up to where they need to appear (via ember-cli-flash), we now use a {{notification}} modifier now we have modifiers.
- Global notifications/flashes are no longer special styles of their own. You just use the {{notification}} modifier to hoist whatever component/element you want up to the top of the page. This means we can re-use our existing <Notice /> component for all our global UI notifications (this is the user visible change here)
* Upgrade AuthForm and document current state a little better
* Hoist SSO out of the AuthForm
* Bare minimum admin partitioned SSO
also:
ui: Tabbed Login with Token or SSO interface (#11619)
- I upgraded our super old, almost the first ember component I wrote, to use glimmer/almost template only. This should use slots/contextual components somehow, but thats a bigger upgrade so I didn't go that far.
- I've been wanting to upgrade the shape of our StateChart component for a very long while now, here its very apparent that it would be much better to do this sooner rather than later. I left it as is for now, but there will be a PR coming soon with a slight reshaping of this component.
- Added a did-upsert modifier which is a mix of did-insert/did-update
- Documentation added/amended for all the new things.
* Support vault auth methods for the Vault connect CA provider
* Rotate the token (re-authenticate to vault using auth method) when the token can no longer be renewed
For our dc, nspace and partition 'bucket' menus, sometimes when selecting one 'bucket' we need to reset a different 'bucket' back to the one that your token has by default (or the default if not). For example when switching to a different partition whilst you are in a non-default namespace of another partition, we need to switch you to the token default namespace of the partition you are switching to.
Fixes an issue where the code editor would not resizing to the full extent of the browser window plus CodeEditor restyling/refactoring
- :label named block
- :tools named block
- :content named block
- code and CSS cleanup
- CodeEditor.mdx
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
Co-authored-by: John Cowen <johncowen@users.noreply.github.com>
Most HTTP API calls will use the default namespace of the calling token to additionally filter/select the data used for the response if one is not specified by the frontend.
The internal permissions/authorize endpoint does not do this (you can ask for permissions from different namespaces in on request).
Therefore this PR adds the tokens default namespace in the frontend only to our calls to the authorize endpoint. I tried to do it in a place that made it feel like it's getting added in the backend, i.e. in a place which was least likely to ever require changing or thinking about.
Note: We are probably going to change this internal endpoint to also inspect the tokens default namespace on the backend. At which point we can revert this commit/PR.
* Add the same support for the tokens default partition
* command/redirect_traffic: add rules to redirect DNS to Consul. Currently uses a hack to get the consul dns service ip, and this hack only works when the service is deployed in the same namespace as consul.
* command/redirect_traffic: redirect DNS to Consul when -consul-dns-ip is passed in
* Add unit tests to Consul DNS IP table redirect rules
Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: Iryna Shustava <ishustava@users.noreply.github.com>
Temporarily revert to pre-1.10 UI functionality by overwriting frontend
permissions. These are used to hide certain UI elements, but they are
still enforced on the backend.
This temporary measure should be removed again once https://github.com/hashicorp/consul/issues/11098
has been resolved
The duo of `makeUpstreamFilterChainForDiscoveryChain` and `makeListenerForDiscoveryChain` were really hard to reason about, and led to concealing a bug in their branching logic. There were several issues here:
- They tried to accomplish too much: determining filter name, cluster name, and whether RDS should be used.
- They embedded logic to handle significantly different kinds of upstream listeners (passthrough, prepared query, typical services, and catch-all)
- They needed to coalesce different data sources (Upstream and CompiledDiscoveryChain)
Rather than handling all of those tasks inside of these functions, this PR pulls out the RDS/clusterName/filterName logic.
This refactor also fixed a bug with the handling of [UpstreamDefaults](https://www.consul.io/docs/connect/config-entries/service-defaults#defaults). These defaults get stored as UpstreamConfig in the proxy snapshot with a DestinationName of "*", since they apply to all upstreams. However, this wildcard destination name must not be used when creating the name of the associated upstream cluster. The coalescing logic in the original functions here was in some situations creating clusters with a `*.` prefix, which is not a valid destination.
* ui: Filter global intentions list by namespace and partition
Filters global intention listing by the current partition rather than trying to use a wildcard.
Fixes an issue described in #10132, where if two DCs are WAN federated
over mesh gateways, and the gateway in the non-primary DC is terminated
and receives a new IP address (as is commonly the case when running them
on ephemeral compute instances) the primary DC is unable to re-establish
its connection until the agent running on its own gateway is restarted.
This was happening because we always preferred gateways discovered by
the `Internal.ServiceDump` RPC (which would fail because there's no way
to dial the remote DC) over those discovered in the federation state,
which is replicated as long as the primary DC's gateway is reachable.
* Support Vault Namespaces explicitly in CA config
If there is a Namespace entry included in the Vault CA configuration,
set it as the Vault Namespace on the Vault client
Currently the only way to support Vault namespaces in the Consul CA
config is by doing one of the following:
1) Set the VAULT_NAMESPACE environment variable which will be picked up
by the Vault API client
2) Prefix all Vault paths with the namespace
Neither of these are super pleasant. The first requires direct access
and modification to the Consul runtime environment. It's possible and
expected, not super pleasant.
The second requires more indepth knowledge of Vault and how it uses
Namespaces and could be confusing for anyone without that context. It
also infers that it is not supported
* Add changelog
* Remove fmt.Fprint calls
* Make comment clearer
* Add next consul version to website docs
* Add new test for default configuration
* go mod tidy
* Add skip if vault not present
* Tweak changelog text
* Remove some usage of md5 from the system
OSS side of https://github.com/hashicorp/consul-enterprise/pull/1253
This is a potential security issue because an attacker could conceivably manipulate inputs to cause persistence files to collide, effectively deleting the persistence file for one of the colliding elements.
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
Port of: Ensure we check intention service prefix permissions for per service (#11270)
Previously, when showing some action buttons for 'per service intentions' we used a global 'can I do something with any intention' permission to decide whether to show a certain button or not. If a user has a token that does not have 'global' intention permissions, but does have intention permissions on one or more specific services (for example via service / service_prefix), this meant that we did not show them certain buttons required to create/edit the intentions for this specific service.
This PR adds that extra permissions check so we now check the intentions permissions per service instead of using the 'global' "can I edit intentions" question/request.
**Notes:**
- If a HTML button is `disabled` this means tippy.js doesn't adopt the
popover properly and subsequently hide it from the user, so aswell as
just disabling the button so you can't active the popover, we also don't
even put the popover on the page
- If `ability.item` or `ability.item.Resources` are empty then assume no access
**We don't try to disable service > right hand side intention actions here**
Whether you can create intentions for a service depends on the
_destination_ of the intention you would like to create. For the
topology view going from the LHS to the center, this is straightforwards
as we only need to know the permissions for the central service, as when
you are going from the LHS to the center, the center is the
_destination_.
When going from the center to the RHS the _destination[s]_ are on the
RHS. This means we need to know the permissions for potentially 1000s of
services all in one go in order to know when to show a button or not.
We can't realistically discover the permissions for service > RHS
services as we'd have either make a HTTP request per right hand service,
or potentially make an incredibly large POST request for all the
potentially 1000s of services on the right hand side (more preferable to
1000s of HTTP requests).
Therefore for the moment at least we keep the old functionality (thin client)
for the middle to RHS here. If you do go to click on the button and you
don't have permissions to update the intention you will still not be
able to update it, only you won't know this until you click the button
(at which point you'll get a UI visible 403 error)
Note: We reversed the conditional here between 1.10 and 1.11
So this make 100% sense that the port is different here to 1.11
* add root_cert_ttl option for consul connect, vault ca providers
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
* add changelog, pr feedback
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* Update .changelog/11428.txt, more docs
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update website/content/docs/agent/options.mdx
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
* ui: Ensure dc selector correctly shows the currently selected dc
* ui: Restrict access to non-default partitions in non-primaries (#11420)
This PR restricts access via the UI to only the default partition when in a non-primary datacenter i.e. you can only have multiple (non-default) partitions in the primary datacenter.
* Add `is` and `test` helpers in a similar vein to `can`
Adds 2 new helpers in a similar vein to ember-cans can:
- `is` allows you to use vocab/phrases such as (is "something model") which calls isSomething() on the models ability.
- `test` allows you to use vocab/phrases such as (test "is something model") or (test "can something model")which calls isSomething() / canSomething() on the models ability. Mostly using the is helper and the can helper. It's basically the is/can helper combined.
* Adds TextInput component + related modifiers/helpers/machines/services (#11189)
Adds a few new components/modifiers/helpers to aid building forms.
- state-chart helper, used in lieu of a more generic approach for requiring our statecharts.
- A few modifications to our existing disabled modifier.
- A new 'validation' modifier, a super small form validation approach built to make use of state charts (optionally). Eventually we should be able to replace our current validation approach (ember-changeset-validations + extra deps) with this.
- A new TextInput component, which is the first of our new components specifically to make it easy to build forms with validations. This is still a WIP, I left some comments in pointing out where this one would be progressed, but as we don't need the planned functionality yet, I left it where it was. All of this will be fleshed out more at a later date.
Documentation is included for all of ^
* ui: Adds initial CRUD for partitions (#11190)
Adds basic CRUD support for partitions. Engineering-wise probably the biggest takeaway here is that we needed to write very little javascript code to add this entire feature, and the little javascript we did need to write was very straightforwards. Everything is pretty much just HTML. Another note to make is that both ember-changeset and ember-data (model layer things) are now completely abstracted away from the view layer of the application.
New components:
- Consul::Partition::Form
- Consul::Partition::List
- Consul::Partition::Notifications
- Consul::Partition::SearchBar
- Consul::Partition::Selector
See additional documentation here for more details
New Route templates:
- index.hbs partition listing/searching/filtering
- edit.hbs partition editing and creation
Additionally:
There is some additional debug work here for better observability and to prevent any errors regarding our href-to usage when a dc is not available in our documentation site.
Our softDelete functionality has been DRYed out a little to be used across two repos.
isLinkable was removed from our ListCollection component for lists like upstream and service listing, and instead use our new is helper from within the ListCollection, meaning we've added a few more lighterweight templateOnly components.
* ui: Exclude all debug-like files from the build (#11211)
This PR adds **/*-debug.* to our test/prod excluded files (realised I needed to add test-support.js also so added that here as its more or less the same thing). Conditionally juggling ES6 static imports (specifically debug ones) for this was also getting a little hairy, so I moved it all to use the same approach as our conditional routes. All in all it brings the vendor build back down to ~430kb gzipped.
From an engineers perspective, whenever specifying colors from now on we should use the form:
```
color: rgb(var(--tone-red-500));
```
Please note:
- Use rgb. This lets us do this like rgb(var(--tone-red-500) / 10%) so we can use a 10% opacity red-500 if we ever need to whilst still making use of our color tokens.
- Use --tone-colorName-000 (so the prefix tone). Previously we could use a mix of --gray-500: $gray-500 (note the left hand CSS prop and right hand SASS var) for the things we need to theme currently. As we no longer use SASS we can't do --gray-500: --gray-500, so we now do --tone-gray-500: --gray-500.
Just for clarity after that, whenever specifying a color anywhere, use rgb and --tone. There is only one reason where you might not use tone, and that is if you never want a color to be affected by a theme (for example a background shadow probably always should use --black)
There are a 2 or 3 left for the code editor, plus our custom-query values
> In the future, this should all be moved to each individual repository now, which will mean we can finally get rid of this service.
This PR moves reconciliation to 'each individual repository'. I stopped short of getting rid of the service, but its so small now we pretty much don't need it. I'd rather wait until I look at the equivalent DataSink service and see if we can get rid of both equivalent services together (this also currently dependant on work soon to be merged)
Reconciliation of models (basically doing the extra work to clean up the ember-data store and bring our frontend 'truth' into line with the actual backend truth) when blocking/long-polling on different views/filters of data is slightly more complicated due to figuring out what should be cleaned up and what should be left in the store. This is especially apparent for KVs.
I built in a such a way to hopefully make sure it will all make sense for the future. I also checked that this all worked nicely with all our models, even KV which has never supported blocking queries. I left all that work in so that if we want to enable blocking queries/live updates for KV it now just involves deleting a couple of lines of code.
There is a tonne of old stuff that we can clean up here now (our 'fake headers' that we pass around) and I've added that to my list of thing for a 'Big Cleanup PR' that will remove lots of code that we no longer require.
Add changelog to document what changed.
Add entry to telemetry section of the website to document what changed
Add docs to the usagemetric endpoint to help document the metrics in code
Our DataSource came in very iteratively, when we first started using it we specifically tried not to use it for things that would require portions of the @src="" attribute to be URL encoded (so things like service names couldn't be used, but dc etc would be fine). We then gradually added an easy way to url encode the @src="" attributes with a uri helper and began to use the DataSource component more and more. This meant that some DataSource usage continued to be used without our uri helper.
Recently we hit #10901 which was a direct result of us not encoding @src values/URIs (I didn't realise this was one of the places that required URL encoding) and not going back over things to finish things off once we had implemented our uri helper, resulting in ~half of the codebase using it and ~half of it not.
Now that almost all of the UI uses our DataSource component, this PR makes it even harder to not use the uri helper, by wrapping the string that it requires in a private URI class/object, that is then expected/asserted within the DataSource component/service. This means that as a result of this PR you cannot pass a plain string to the DataSource component without seeing an error in your JS console, which in turn means you have to use the uri helper, and it's very very hard to not URL encode any dynamic/user provided values, which otherwise could lead to bugs/errors similar to the one mentioned above.
The error that you see when you don't use the uri helper is currently a 'soft' dev time only error, but like our other functionality that produces a soft error when you mistakenly pass an undefined value to a uri, at some point soon we will make these hard failing "do not do this" errors.
Both of these 'soft error' DX features have been used this to great effect to implement our Admin Partition feature and these kind of things will minimize the amount of these types of bugs moving forwards in a preventative rather than curative manner. Hopefully these are the some of the kinds of things that get added to our codebase that prevent a multitude of problems and therefore are often never noticed/appreciated.
Additionally here we moved the remaining non-uri using DataSources to use uri (that were now super easy to find), and also fixed up a place where I noticed (due to the soft errors) where we were sometimes passing undefined values to a uri call.
The work here also led me to find another couple of non-important 'bugs' that I've PRed already separately, one of which is yet to be merged (#11105), hence the currently failing tests here. I'll rebase that once that PR is in and the tests here should then pass 🤞
Lastly, I didn't go the whole hog here to make DataSink also be this strict with its uri usage, there is a tiny bit more work on DataSink as a result of recently work, so I may (or may not) make DataSink equally as strict as part of that work in a separate PR.
This PR adds a check to policy, role and namespace list pages to make sure the user has can write those things before offering to create them via a button. (The create page/form would then be a read-only form)
* ui: Don't show the CRD menu for read-only intentions
The UI bug here manifests itself only when a user/token is configured to have read-only access to intentions. Instead of only letting folks click to see a read only page of the intention, we would show an additional message saying that the intention was read-only due to it being 'Managed by [a kubernetes] CRD'. Whilst the intention was still read only, this extra message was still confusing for users.
This PR fixes up the conditional logic and further moves the logic to use ember-can - looking at the history of the files in question, this bug snuck itself in partly due to it being 'permission-y type stuff' previous to using ember-can and when something being editable or not was nothing to do with ACLs. Then we moved to start using ember-can without completely realising what IsEditable previously meant. So overall the code here is a tiny bit clearer/cleaner by adding a proper can view CRD intention instead of overloading the idea of 'editability'.
* ui: Gracefully recover from non-existent DC errors
This PR fixes what happens in the UI if you try to navigate to a non-existing DC.
When we received a 500 error from an API response due to a non-existent DC, previously we would show a 404 error, which is what we were trying to convey. But in the spirit of the UI being a 'thin client', its probably best to just show the 500 error from the API response, which may help folks to debug any issues better.
* Automatically set the CONSUL_DATACENTER_LOCAL env var for testing
* ui: Ignore response from API for KV permissions
Currently there is no way for us to use our HTTP authorization API
endpoint to tell us whether a user has access to any KVs (including the
case where a user may not have access to the root KV store, but do have
access to a sub item)
This is a little weird still as in the above case the user would click
on this link and still get a 403 for the root, and then have to manually
type in the URL for the KV they do have access to.
Despite this we think this change makes sense as at least something about KV is
visible in the main navigation.
Once we have the ability to know if any KVs are accessible, we can add
this guard back in.
We'd initially just removed the logic around the button, but then
noticed there may be further related KV issues due to the nested nature
of KVs so we finally decided on simply ignoring the responses from the
HTTP API, essentially reverting the KV area back to being a thin client.
This means when things are revisited in the backend we can undo this
easily change in one place.
* Move acceptance tests to use ACLs perms instead of KV ones
This PR supersedes #10706 and fixes#10686 whilst making sure that saving intentions continues to work.
The original fix in #10706 ignored the change action configured for the change event on the menus, meaning that the selected source/destination namespace could not be set by the user when editing/creating intentions. This, coupled with the fact that using the later intention exact endpoint for API requests endpoint means that you could not use wildcard namespaces for saving intentions.
All in all this meant that intentions could no longer be saved using the UI (whilst using ENT)
This PR reverts #10706 to fix the intention saving issue, and adds a fix for the original visual issue of nspaces doubling up in the menu once clicked. This meant repeating the existing functionality for nspaces aswell as services. It did seem strange to me that the original issue was only apparent for the nspace menus and not the service menus which should all function exactly the same way.
There is potentially more to come here partly related to what the exact functionality should be, but I'm working with other folks to figure out what the best way forwards is longer term. In the meantime this brings us back to the original functionality with the visual issue fixed.
Squashed commits:
* Revert "ui: Fix dropdown option duplications (#10706)"
This reverts commit eb5512fb74781ea49be743e2f0f16b3f1863ef61.
* ui: Ensure additional nspaces are added to the unique list of nspaces
* Add some acceptance tests
Fixes#10563
The `resourceVersion` map was doing two jobs prior to this PR. The first job was
to track what version of every resource we know envoy currently has. The
second was to track subscriptions to those resources (by way of the empty
string for a version). This mostly works out fine, but occasionally leads to
consul removing a resource and accidentally (effectively) unsubscribing at the
same time.
The fix separates these two jobs. When all of the resources for a subscription
are removed we continue to track the subscription until envoy explicitly
unsubscribes
Signed-off-by: Jakub Sokołowski <jakub@status.im>
* agent: add failures_before_warning setting
The new setting allows users to specify the number of check failures
that have to happen before a service status us updated to be `warning`.
This allows for more visibility for detected issues without creating
alerts and pinging administrators. Unlike the previous behavior, which
caused the service status to not update until it reached the configured
`failures_before_critical` setting, now Consul updates the Web UI view
with the `warning` state and the output of the service check when
`failures_before_warning` is breached.
The default value of `FailuresBeforeWarning` is the same as the value of
`FailuresBeforeCritical`, which allows for retaining the previous default
behavior of not triggering a warning.
When `FailuresBeforeWarning` is set to a value higher than that of
`FailuresBeforeCritical it has no effect as `FailuresBeforeCritical`
takes precedence.
Resolves: https://github.com/hashicorp/consul/issues/10680
Signed-off-by: Jakub Sokołowski <jakub@status.im>
Co-authored-by: Jakub Sokołowski <jakub@status.im>
Licensing recently changed in Consul v1.10 and along with those changes
the client API was updated such that PutLicense and ResetLicense both
immediately return an error to avoid an unecessary round trip that will
inevitably fail.
For reference, see: 08eb600ee5
Unfortunately, this change broke forward compatibility such that a v1.10
client can no longer make these requests to a v1.9 server which is a
valid use case.
This commit reintroduces these requests to fix this compatibility
breakage but leaves the deprecation notices in tact.
This commit fixes a problem where parent Failovers where not showing (subset children were fine).
Seems to have been introduced with a move/glimmer upgrade here #9154 so I'm adding a 1.9.x backport.
This commit fixes 2 problems with our OIDC flow in the UI, the first is straightforwards, the second is relatively more in depth:
1: A typo (1.10.1 only)
During #10503 we injected our settings service into the our oidc-provider service, there are some comments in the PR as to the whys and wherefores for this change (https://github.com/hashicorp/consul/pull/10503/files#diff-aa2ffda6d0a966ba631c079fa3a5f60a2a1bdc7eed5b3a98ee7b5b682f1cb4c3R28)
Fixing the typo so it was no longer looking for an unknown service (repository/settings > settings)
fixed this.
2: URL encoding (1.9.x, 1.10.x)
TL;DR: /oidc/authorize/provider/with/slashes/code/with/slashes/status/with/slashes should be /oidc/authorize/provider%2Fwith%2Fslashes/code%2Fwith%2Fslashes/status%2Fwith%2Fslashes
When we receive our authorization response back from the OIDC 3rd party, we POST the code and status data from that response back to consul via acallback as part of the OIDC flow. From what I remember back when this feature was originally added, the method is a POST request to avoid folks putting secret-like things into API requests/URLs/query params that are more likely to be visible to the human eye, and POSTing is expected behaviour.
Additionally, in the UI we identify all external resources using unique resource identifiers. Our OIDC flow uses these resources and their identifiers to perform the OIDC flow using a declarative state machine. If any information in these identifiers uses non-URL-safe characters then these characters require URL encoding and we added a helper a while back to specifically help us to do this once we started using this for things that required URL encoding.
The final fix here make sure that we URL encode code and status before using them with one of our unique resource identifiers, just like we do with the majority of other places where we use these identifiers.
* deps: upgrade gogo-protobuf to v1.3.2
* go mod tidy using go 1.16
* proto: regen protobufs after upgrading gogo/protobuf
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Missed the need to add support for unix domain socket config via
api/command line. This is a variant of the problems described in
it is easy to drop one.
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
Consul 1.10 (PR #9792) introduced the ability to specify a prefix when
importing KV's. This however introduced a regression on Windows
systems which breaks `kv import`. The key name is joined with
specified`-prefix` using `filepath.Join()` which uses a forward slash
(/) to delimit values on Unix-based systems, and a backslash (\) to
delimit values on Windows – the latter of which is incompatible with
Consul KV paths.
This commit replaces filepath.Join() with path.Join() which uses a
forward slash as the delimiter, providing consistent key join behavior
across supported operating systems.
Fixes#10583
Replace call to /agent/self with /status/leader to verify agent
reachability before initializing a watch. This endpoint is not guarded
by ACLs, and as such can be queried by any API client regardless of
their permissions.
Fixes#9353
* defer setting the state before returning to avoid being stuck in `INITIALIZING` state
* add changelog
* move comment with the right if statement
* ca: report state transition error from setSTate
* update comment to reflect state transition
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Add support for setting QueryOptions on the following agent API endpoints:
- /agent/health/service/name/:name
- /agent/health/service/id/:id
- /agent/service/maintenance/:id
This follows the same pattern used in #9903 to support query options
for other agent API endpoints.
Resolves#9710
Knowing that blocking queries are firing does not provide much
information on its own. If we know the correlation IDs we can
piece together which parts of the snapshot have been populated.
Some of these responses might be empty from the blocking
query timing out. But if they're returning quickly I think we
can reasonably assume they contain data.
* return an error when the index is not valid
* check response as bool when applying `CAOpSetConfig`
* remove check for bool response
* fix error message and add check to test
* fix comment
* add changelog
If multiple instances of a service are co-located on the same node then
their proxies will all share a cache entry for their resolved service
configuration. This is because the cache key contains the name of the
watched service but does not take into account the ID of the watching
proxies.
This means that there will be multiple agent service manager watches
that can wake up on the same cache update. These watchers then
concurrently modify the value in the cache when merging the resolved
config into the local proxy definitions.
To avoid this concurrent map write we will only delete the key from
opaque config in the local proxy definition after the merge, rather
than from the cached value before the merge.
This change adds a new `dns_config.recursor_strategy` option which
controls how Consul queries DNS resolvers listed in the `recursors`
config option. The supported options are `sequential` (default), and
`random`.
Closes#8807
Co-authored-by: Blake Covarrubias <blake@covarrubi.as>
Co-authored-by: Priyanka Sengupta <psengupta@flatiron.com>
Previously when namespaces were enabled, we weren't requesting permission for the actively selected namespace, and instead always checking the permissions for the default namespace.
This commit ensures we request permissions for the actively selected namespace.
This commit adds a bit of string wrangling to avoid the keys in our javascript source file also being transformed. Additionally, whilst looking at this we decided that Maps are a better dictionary than javascript objects, so we moved to use those here also (but this doesn't affect the issue)
Adds 'can access ACLs' which means one of two things
1. When ACLs are disabled I can access the 'please enable ACLs' page
2. When ACLs are enabled, its the same as canRead
When clicking to create a KV within folder name, would would be viewing a form that was a form for creating a KV in the root, which when the user clicked to save, saved the KV in the root.
For the moment at least I've removed the code that strips double slashes, and whilst this isn't ideal, it looks like we've picked up one of those bugs that turns into a 'feature', and completely reworking KV to not rely on the double slashes is not really an option right now.
The compatv2 integration tests were failing because they use an older CLI version with a newer
HTTP API. This commit restores the GRPCPort field to the DebugConfig output to allow older
CIs to continue to fetch the port.
* ca: move provider creation into CAManager
This further decouples the CAManager from Server. It reduces the interface between them and
removes the need for the SetLogger method on providers.
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* ca: move SignCertificate to the file where it is used
* auto-config: move autoConfigBackend impl off of Server
Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.
* fix linter issues
* check error when `raftApplyMsgpack`
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* check expiry date of the intermediate before using it to sign a leaf
* fix typo in comment
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
* Fix test name
* do not check cert start date
* wrap error to mention it is the intermediate expired
* Fix failing test
* update comment
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use shim to avoid sleep in test
* add root cert validation
* remove duplicate code
* Revert "fix linter issues"
This reverts commit 6356302b54f06c8f2dee8e59740409d49e84ef24.
* fix import issue
* gofmt leader_connect_ca
* add changelog entry
* update error message
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* fix error message in test
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* add intermediate ca metric routine
* add Gauge config for intermediate cert
* Stop metrics routine when stopping leader
* add changelog entry
* updage changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use variables instead of a map
* go imports sort
* Add metrics for primary and secondary ca
* start metrics routine in the right DC
* add telemetry documentation
* update docs
* extract expiry fetching in a func
* merge metrics for primary and secondary into signing ca metric
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
The default namespace, and the tokens default namespace (or its origin namespace) is slightly more complicated than other things we deal with in the UI, there's plenty of info/docs on this that I've added in this PR.
Previously:
When a namespace was not specified in the URL, we used to default to the default namespace. When you logged in using a token we automatically forward you the namespace URL that your token originates from, so you are then using the namespace for your token by default. You can of course then edit the URL to remove the namespace portion, or perhaps revisit the UI at the root path with you token already set. In these latter cases we would show you information from the default namespace. So if you had no namespace segment/portion in the URL, we would assume default, perform actions against the default namespace and highlight the default namespace in the namespace selector menu. If you wanted to perform actions in your tokens origin namespace you would have to manually select it from the namespace selector menu.
This PR:
Now, when you have no namespace segment/portion in the URL, we use the token's origin namespace instead (and if you don't have a token, we then use the default namespace like it was previously)
Notes/thoughts:
I originally thought we were showing an incorrectly selected namespace in the namespace selector, but it also matched up with what we were doing with the API, so it was in fact correct. The issue was more that we weren't selecting the origin namespace of the token for the user when a namespace segment was omitted from the URL. Seeing as we automatically forward you to the tokens origin namespace when you log in, and we were correctly showing the namespace we were acting on when you had no namespace segment in the URL (in the previous case default), I'm not entirely sure how much of an issue this actually was.
This characteristic of namespace+token+namespace is a little weird and its easy to miss a subtlety or two so I tried to add some documentation in here for future me/someone else (including some in depth code comment around one of the API endpoints where this is very subtle and very hard to miss). I'm not the greatest at words, so would be great to get some edits there if it doesn't seem clear to folks.
The fact that we used to save your previous datacenter and namespace into local storage for reasons also meant the interaction here was slightly more complicated than it needed to be, so whilst we were here we rejigged things slightly to satisfy said reasons still but not use local storage (we try and grab the info from higher up). A lot of the related code here is from before we had our Routlets which I think could probably make all of this a lot less complicated, but I didn't want to do a wholesale replacement in this PR, we can save that for a separate PR on its own at some point.
* trim carriage return from certificates when inserting rootCA in the inMemDB
* format rootCA properly when returning the CA on the connect CA endpoint
* Fix linter warnings
* Fix providers to trim certs before returning it
* trim newlines on write when possible
* add changelog
* make sure all provider return a trailing newline after the root and intermediate certs
* Fix endpoint to return trailing new line
* Fix failing test with vault provider
* make test more robust
* make sure all provider return a trailing newline after the leaf certs
* Check for suffix before removing newline and use function
* Add comment to consul provider
* Update change log
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* fix typo
* simplify code callflow
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* extract requireNewLine as shared func
* remove dependency to testify in testing file
* remove extra newline in vault provider
* Add cert newline fix to envoy xds
* remove new line from mock provider
* Remove adding a new line from provider and fix it when the cert is read
* Add a comment to explain the fix
* Add missing for leaf certs
* fix missing new line
* fix missing new line in leaf certs
* remove extra new line in test
* updage changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* fix in vault provider and when reading cache (RPC call)
* fix AWS provider
* fix failing test in the provider
* remove comments and empty lines
* add check for empty cert in test
* fix linter warnings
* add new line for leaf and private key
* use string concat instead of Sprintf
* fix new lines for leaf signing
* preallocate slice and remove append
* Add new line to `SignIntermediate` and `CrossSignCA`
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
sync/atomic must be used with 64-bit aligned fields, and that alignment is difficult to
ensure unless the field is the first one in the struct.
https://golang.org/pkg/sync/atomic/#pkg-note-BUG.
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.
Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.
Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:
ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
* return an invalid record when asked for an addr dns with type other then A and AAAA
* add changelog
* fix ANY use case and add a test for it
* update changelog type
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* return empty response if the question record type do not match for addr
* set comment in the right place
* return A\AAAA record in extra section if record type is not A\AAAA for addr
* Fix failing test
* remove commented code
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use require for test validation
* use variable to init struct
* fix failing test
* Update agent/dns.go
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update .changelog/10401.txt
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update agent/dns.go
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update agent/dns.go
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update agent/dns.go
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* fix compilation error
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* ui: Move all our icons to use CSS custom properties
The good thing about SASS vars is, if you don't use them they get removed from the final build. Whereas with CSS we have no tree-shaking to get rid of unused CSS custom properties. We can mostly work around this and for some things like colors its no big deal if we have some hex-codes in the build that we don't use as hex-codes are relatively small.
We've been slowly but surely moving all of our colors (and other things) to use CSS custom properties instead of SASS vars now that we have them available.
This commit makes use of the 'tree-shaking' abilities of @extend to ensure that we only compile in the icons that we use.
This commit is mostly churn-less as we already use @extend for the majority of our icons, so generally there is zero change here for working on the UI, but I did spot one single place where we were using SASS vars instead of @extend. This now uses the new form (second commit)
Interestingly this reduces our CSS payload by ~2kb to ~53kb (around 25kb of that is these icons)
* remove flush for each write to http response in the agent monitor endpoint
* fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover.
* start log reading goroutine before adding the sink to avoid filling the log channel before getting a chance of reading from it
* flush every 500ms to optimize log writing in the http server side.
* add changelog file
* add issue url to changelog
* fix changelog url
* Update changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use ticker to flush and avoid race condition when flushing in a different goroutine
* stop the ticker when done
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Revert "fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover."
This reverts commit 1eeddf7a
* wait for log consumer loop to start before registering the sink
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Updates to a cluster will clear the associated endpoints, and updates to
a listener will clear the associated routes. Update the incremental xDS
logic to account for this implicit cleanup so that we can finish warming
the clusters and listeners.
Fixes#10379
Previously we would return an error if duplicate paths were specified.
This could lead to problems in cases where a user has the same path,
say /healthz, on two different ports.
This validation was added to signal a potential misconfiguration.
Instead we will only check for duplicate listener ports, since that is
what would lead to ambiguity issues when generating xDS config.
In the future we could look into using a single listener and creating
distinct filter chains for each path/port.
* debug: remove the CLI check for debug_enabled
The API allows collecting profiles even debug_enabled=false as long as
ACLs are enabled. Remove this check from the CLI so that users do not
need to set debug_enabled=true for no reason.
Also:
- fix the API client to return errors on non-200 status codes for debug
endpoints
- improve the failure messages when pprof data can not be collected
Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com>
* remove parallel test runs
parallel runs create a race condition that fail the debug tests
* snapshot the timestamp at the beginning of the capture
- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist
* Revert "snapshot the timestamp at the beginning of the capture"
This reverts commit c2d03346
* Refactor captureDynamic to extract capture logic for each item in a different func
* snapshot the timestamp at the beginning of the capture
- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist
* Revert "snapshot the timestamp at the beginning of the capture"
This reverts commit c2d03346
* Refactor captureDynamic to extract capture logic for each item in a different func
* extract wait group outside the go routine to avoid a race condition
* capture pprof in a separate go routine
* perform a single capture for pprof data for the whole duration
* add missing vendor dependency
* add a change log and fix documentation to reflect the change
* create function for timestamp dir creation and simplify error handling
* use error groups and ticker to simplify interval capture loop
* Logs, profile and traces are captured for the full duration. Metrics, Heap and Go routines are captured every interval
* refactor Logs capture routine and add log capture specific test
* improve error reporting when log test fail
* change test duration to 1s
* make time parsing in log line more robust
* refactor log time format in a const
* test on log line empty the earliest possible and return
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* rename function to captureShortLived
* more specific changelog
Co-authored-by: Paul Banks <banks@banksco.de>
* update documentation to reflect current implementation
* add test for behavior when invalid param is passed to the command
* fix argument line in test
* a more detailed description of the new behaviour
Co-authored-by: Paul Banks <banks@banksco.de>
* print success right after the capture is done
* remove an unnecessary error check
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* upgraded github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57 => v0.0.0-20210601050228-01bbb1931b22
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Co-authored-by: Paul Banks <banks@banksco.de>
This PR adds cluster members to the metrics API. The number of members per
segment are reported as well as the total number of members.
Tested by running a multi-node cluster locally and ensuring the numbers were
correct. Also added unit test coverage to add the new expected gauges to
existing test cases.
Normally the named pipe would buffer up to 64k, but in some cases when a
soft limit is reached, they will start only buffering up to 4k.
In either case, we should not deadlock.
This commit changes the pipe-bootstrap command to first buffer all of
stdin into the process, before trying to write it to the named pipe.
This allows the process memory to act as the buffer, instead of the
named pipe.
Also changed the order of operations in `makeBootstrapPipe`. The new
test added in this PR showed that simply buffering in the process memory
was not enough to fix the issue. We also need to ensure that the
`pipe-bootstrap` process is started before we try to write to its
stdin. Otherwise the write will still block.
Also set stdout/stderr on the subprocess, so that any errors are visible
to the user.
* debug: remove the CLI check for debug_enabled
The API allows collecting profiles even debug_enabled=false as long as
ACLs are enabled. Remove this check from the CLI so that users do not
need to set debug_enabled=true for no reason.
Also:
- fix the API client to return errors on non-200 status codes for debug
endpoints
- improve the failure messages when pprof data can not be collected
Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com>
* remove parallel test runs
parallel runs create a race condition that fail the debug tests
* Add changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Create and use collapsible notices
* Refactor collapsible-notices
* Split up the topology acceptance tests
* Add acceptance tests for tproxy notices
* Add component file
* Adds additional TProxy notices tests
* Adds conditional to only show collapsable if more than 2 notices are present
* Adds changelog
* Refactorting the conditonal for collapsing the notices
* Renaming undefinedIntention to be notDefinedIntention
* Refactor tests
The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
The prior solution to call reply.Reset() aged poorly since newer fields
were added to the reply, but not added to Reset() leading serial
blocking query loops on the server to blend replies.
This could manifest as a service-defaults protocol change from
default=>http not reverting back to default after the config entry
reponsible was deleted.