75b0b93998
* [OSS] Post Consul 1.16 updates (#17606) * chore: update dev build to 1.17 * chore(ci): add nightly 1.16 test Drop the oldest and add the newest running release branch to nightly builds. * Add writeAuditRPCEvent to agent_oss (#17607) * Add writeAuditRPCEvent to agent_oss * fix the other diffs * backport change log * Add Envoy and Consul version constraints to Envoy extensions (#17612) * [API Gateway] Fix trust domain for external peered services in synthesis code (#17609) * [API Gateway] Fix trust domain for external peered services in synthesis code * Add changelog * backport ent changes to oss (#17614) * backport ent changes to oss * Update .changelog/_5669.txt Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com> --------- Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com> * Update intentions.mdx (#17619) Make behaviour of L7 intentions clearer * enterprise changelog update for audit (#17625) * Update list of Envoy versions (#17546) * [API Gateway] Fix rate limiting for API gateways (#17631) * [API Gateway] Fix rate limiting for API gateways * Add changelog * Fix failing unit tests * Fix operator usage tests for api package * sort some imports that are wonky between oss and ent (#17637) * PmTLS and tproxy improvements with failover and L7 traffic mgmt for k8s (#17624) * porting over changes from enterprise repo to oss * applied feedback on service mesh for k8s overview * fixed typo * removed ent-only build script file * Apply suggestions from code review Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: David Yu <dyu@hashicorp.com> Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Co-authored-by: David Yu <dyu@hashicorp.com> * Delete check-legacy-links-format.yml (#17647) * docs: Reference doc updates for permissive mTLS settings (#17371) * Reference doc updates for permissive mTLS settings * Document config entry filtering * Fix minor doc errors (double slashes in link url paths) --------- Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Add generic experiments configuration and use it to enable catalog v2 resources (#17604) * Add generic experiments configuration and use it to enable catalog v2 resources * Run formatting with -s as CI will validate that this has been done * api-gateway: stop adding all header filters to virtual host when generating xDS (#17644) * Add header filter to api-gateway xDS golden test * Stop adding all header filters to virtual host when generating xDS for api-gateway * Regenerate xDS golden file for api-gateway w/ header filter * fix: add agent info reporting log (#17654) * Add new Consul 1.16 docs (#17651) * Merge pull request #5773 from hashicorp/docs/rate-limiting-from-ip-addresses-1.16 updated docs for rate limiting for IP addresses - 1.16 * Merge pull request #5609 from hashicorp/docs/enterprise-utilization-reporting Add docs for enterprise utilization reporting * Merge pull request #5734 from hashicorp/docs/envoy-ext-1.16 Docs/envoy ext 1.16 * Merge pull request #5773 from hashicorp/docs/rate-limiting-from-ip-addresses-1.16 updated docs for rate limiting for IP addresses - 1.16 * Merge pull request #5609 from hashicorp/docs/enterprise-utilization-reporting Add docs for enterprise utilization reporting * Merge pull request #5734 from hashicorp/docs/envoy-ext-1.16 Docs/envoy ext 1.16 * fix build errors --------- Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Default `ProxyType` for builtin extensions (#17657) * Post 1.16.0-rc1 updates (#17663) - Update changelog to include new entries from release - Update submodule versions to latest published * Update service-defaults.mdx (#17656) * docs: Sameness Groups (#17628) * port from enterprise branch * Apply suggestions from code review Co-authored-by: shanafarkas <105076572+shanafarkas@users.noreply.github.com> * Update website/content/docs/connect/cluster-peering/usage/create-sameness-groups.mdx * next steps * Update website/content/docs/connect/cluster-peering/usage/create-sameness-groups.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/k8s/connect/cluster-peering/usage/create-sameness-groups.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> --------- Co-authored-by: shanafarkas <105076572+shanafarkas@users.noreply.github.com> Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Remove "BETA" marker from config entries (#17670) * CAPIgw for K8s installation updates for 1.16 (#17627) * trimmed CRD step and reqs from installation * updated tech specs * Apply suggestions from code review Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com> * added upgrade instruction * removed tcp port req * described downtime and DT-less upgrades * applied additional review feedback --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com> * additional feedback on API gateway upgrades (#17677) * additional feedback * Update website/content/docs/api-gateway/upgrades.mdx Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com> --------- Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com> * docs: JWT Authorization for intentions (#17643) * Initial page/nav creation * configuration entry reference page * Usage + fixes * service intentions page * usage * description * config entry updates * formatting fixes * Update website/content/docs/connect/config-entries/service-intentions.mdx Co-authored-by: Paul Glass <pglass@hashicorp.com> * service intentions review fixes * Overview page review fixes * Apply suggestions from code review Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> --------- Co-authored-by: Paul Glass <pglass@hashicorp.com> Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * docs: minor fixes to JWT auth docs (#17680) * Fixes * service intentions fixes * Fix two WAL metrics in docs/agent/telemetry.mdx (#17593) * updated failover for k8s w-tproxy page title (#17683) * Add release notes 1.16 rc (#17665) * Merge pull request #5773 from hashicorp/docs/rate-limiting-from-ip-addresses-1.16 updated docs for rate limiting for IP addresses - 1.16 * Merge pull request #5609 from hashicorp/docs/enterprise-utilization-reporting Add docs for enterprise utilization reporting * Merge pull request #5734 from hashicorp/docs/envoy-ext-1.16 Docs/envoy ext 1.16 * Add release notes for 1.16-rc * Add consul-e license utlization reporting * Update with rc absolute links * Update with rc absolute links * fix typo * Apply suggestions from code review Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update to use callout component * address typo * docs: FIPS 140-2 Compliance (#17668) * Page + nav + formatting * link fix * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * link fix * Apply suggestions from code review Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com> * Update website/content/docs/enterprise/fips.mdx --------- Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com> * fix apigw install values file * fix typos in release notes --------- Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com> * fix release notes links (#17687) * adding redirects for tproxy and envoy extensions (#17688) * adding redirects * Apply suggestions from code review * Fix FIPS copy (#17691) * fix release notes links * fix typos on fips docs * [NET-4107][Supportability] Log Level set to TRACE and duration set to 5m for consul-debug (#17596) * changed duration to 5 mins and log level to trace * documentation update * change log * ENT merge of ext-authz extension updates (#17684) * docs: Update default values for Envoy extension proxy types (#17676) * fix: stop peering delete routine on leader loss (#17483) * Refactor disco chain prioritize by locality structs (#17696) This includes prioritize by localities on disco chain targets rather than resolvers, allowing different targets within the same partition to have different policies. * agent: remove agent cache dependency from service mesh leaf certificate management (#17075) * agent: remove agent cache dependency from service mesh leaf certificate management This extracts the leaf cert management from within the agent cache. This code was produced by the following process: 1. All tests in agent/cache, agent/cache-types, agent/auto-config, agent/consul/servercert were run at each stage. - The tests in agent matching .*Leaf were run at each stage. - The tests in agent/leafcert were run at each stage after they existed. 2. The former leaf cert Fetch implementation was extracted into a new package behind a "fake RPC" endpoint to make it look almost like all other cache type internals. 3. The old cache type was shimmed to use the fake RPC endpoint and generally cleaned up. 4. I selectively duplicated all of Get/Notify/NotifyCallback/Prepopulate from the agent/cache.Cache implementation over into the new package. This was renamed as leafcert.Manager. - Code that was irrelevant to the leaf cert type was deleted (inlining blocking=true, refresh=false) 5. Everything that used the leaf cert cache type (including proxycfg stuff) was shifted to use the leafcert.Manager instead. 6. agent/cache-types tests were moved and gently replumbed to execute as-is against a leafcert.Manager. 7. Inspired by some of the locking changes from derek's branch I split the fat lock into N+1 locks. 8. The waiter chan struct{} was eventually replaced with a singleflight.Group around cache updates, which was likely the biggest net structural change. 9. The awkward two layers or logic produced as a byproduct of marrying the agent cache management code with the leaf cert type code was slowly coalesced and flattened to remove confusion. 10. The .*Leaf tests from the agent package were copied and made to work directly against a leafcert.Manager to increase direct coverage. I have done a best effort attempt to port the previous leaf-cert cache type's tests over in spirit, as well as to take the e2e-ish tests in the agent package with Leaf in the test name and copy those into the agent/leafcert package to get more direct coverage, rather than coverage tangled up in the agent logic. There is no net-new test coverage, just coverage that was pushed around from elsewhere. * [core]: Pin github action workflows (#17695) * docs: missing changelog for _5517 (#17706) * add enterprise notes for IP-based rate limits (#17711) * add enterprise notes for IP-based rate limits * Apply suggestions from code review Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com> Co-authored-by: David Yu <dyu@hashicorp.com> * added bolded 'Enterprise' in list items. --------- Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com> Co-authored-by: David Yu <dyu@hashicorp.com> * Update compatibility.mdx (#17713) * Remove extraneous version info for Config entries (#17716) * Update terminating-gateway.mdx * Update exported-services.mdx * Update mesh.mdx * fix: typo in link to section (#17527) Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Bump Alpine to 3.18 (#17719) * Update Dockerfile * Create 17719.txt * NET-1825: New ACL token creation docs (#16465) Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com> * [NET-3865] [Supportability] Additional Information in the output of 'consul operator raft list-peers' (#17582) * init * fix tests * added -detailed in docs * added change log * fix doc * checking for entry in map * fix tests * removed detailed flag * removed detailed flag * revert unwanted changes * removed unwanted changes * updated change log * pr review comment changes * pr comment changes single API instead of two * fix change log * fix tests * fix tests * fix test operator raft endpoint test * Update .changelog/17582.txt Co-authored-by: Semir Patel <semir.patel@hashicorp.com> * nits * updated docs --------- Co-authored-by: Semir Patel <semir.patel@hashicorp.com> * OSS merge: Update error handling login when applying extensions (#17740) * Bump atlassian/gajira-transition from 3.0.0 to 3.0.1 (#17741) Bumps [atlassian/gajira-transition](https://github.com/atlassian/gajira-transition) from 3.0.0 to 3.0.1. - [Release notes](https://github.com/atlassian/gajira-transition/releases) - [Commits](4749176faf...38fc9cd61b
) --- updated-dependencies: - dependency-name: atlassian/gajira-transition dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add truncation to body (#17723) * docs: Failover overview minor fix (#17743) * Incorrect symbol * Clarification * slight edit for clarity * docs - update Envoy and Dataplane compat matrix (#17752) * Update envoy.mdx added more detail around default versus other compatible versions * validate localities on agent configs and registration endpoints (#17712) * Updated docs added explanation. (#17751) * init * fix tests * added -detailed in docs * added change log * fix doc * checking for entry in map * fix tests * removed detailed flag * removed detailed flag * revert unwanted changes * removed unwanted changes * updated change log * pr review comment changes * pr comment changes single API instead of two * fix change log * fix tests * fix tests * fix test operator raft endpoint test * Update .changelog/17582.txt Co-authored-by: Semir Patel <semir.patel@hashicorp.com> * nits * updated docs * explanation added --------- Co-authored-by: Semir Patel <semir.patel@hashicorp.com> * Update index.mdx (#17749) * added redirects and updated links (#17764) * Add transparent proxy enhancements changelog (#17757) * docs - remove use of consul leave during upgrade instructions (#17758) * Fix issue with streaming service health watches. (#17775) Fix issue with streaming service health watches. This commit fixes an issue where the health streams were unaware of service export changes. Whenever an exported-services config entry is modified, it is effectively an ACL change. The bug would be triggered by the following situation: - no services are exported - an upstream watch to service X is spawned - the streaming backend filters out data for service X (due to lack of exports) - service X is finally exported In the situation above, the streaming backend does not trigger a refresh of its data. This means that any events that were supposed to have been received prior to the export are NOT backfilled, and the watches never see service X spawning. We currently have decided to not trigger a stream refresh in this situation due to the potential for a thundering herd effect (touching exports would cause a re-fetch of all watches for that partition, potentially). Therefore, a local blocking-query approach was added by this commit for agentless. It's also worth noting that the streaming subscription is currently bypassed most of the time with agentful, because proxycfg has a `req.Source.Node != ""` which prevents the `streamingEnabled` check from passing. This means that while agents should technically have this same issue, they don't experience it with mesh health watches. Note that this is a temporary fix that solves the issue for proxycfg, but not service-discovery use cases. * Property Override validation improvements (#17759) * Reject inbound Prop Override patch with Services Services filtering is only supported for outbound TrafficDirection patches. * Improve Prop Override unexpected type validation - Guard against additional invalid parent and target types - Add specific error handling for Any fields (unsupported) * Fixes (#17765) * Update license get explanation (#17782) This PR is to clarify what happens if the license get command is run on a follower if the leader hasn't been updated with a newer license. * Add Patch index to Prop Override validation errors (#17777) When a patch is found invalid, include its index for easier debugging when multiple patches are provided. * Stop referenced jwt providers from being deleted (#17755) * Stop referenced jwt providers from being deleted * Implement a Catalog Controllers Lifecycle Integration Test (#17435) * Implement a Catalog Controllers Lifecycle Integration Test * Prevent triggering the race detector. This allows defining some variables for protobuf constants and using those in comparisons. Without that, something internal in the fmt package ended up looking at the protobuf message size cache and triggering the race detector. * HCP Add node id/name to config (#17750) * Catalog V2 Container Based Integration Test (#17674) * Implement the Catalog V2 controller integration container tests This now allows the container tests to import things from the root module. However for now we want to be very restrictive about which packages we allow importing. * Add an upgrade test for the new catalog Currently this should be dormant and not executed. However its put in place to detect breaking changes in the future and show an example of how to do an upgrade test with integration tests structured like catalog v2. * Make testutil.Retry capable of performing cleanup operations These cleanup operations are executed after each retry attempt. * Move TestContext to taking an interface instead of a concrete testing.T This allows this to be used on a retry.R or generally anything that meets the interface. * Move to using TestContext instead of background contexts Also this forces all test methods to implement the Cleanup method now instead of that being an optional interface. Co-authored-by: Daniel Upton <daniel@floppy.co> * Fix Docs for Trails Leader By (#17763) * init * fix tests * added -detailed in docs * added change log * fix doc * checking for entry in map * fix tests * removed detailed flag * removed detailed flag * revert unwanted changes * removed unwanted changes * updated change log * pr review comment changes * pr comment changes single API instead of two * fix change log * fix tests * fix tests * fix test operator raft endpoint test * Update .changelog/17582.txt Co-authored-by: Semir Patel <semir.patel@hashicorp.com> * nits * updated docs * explanation added * fix doc * fix docs --------- Co-authored-by: Semir Patel <semir.patel@hashicorp.com> * Improve Prop Override docs examples (#17799) - Provide more realistics examples for setting properties not already supported natively by Consul - Remove superfluous commas from HCL, correct target service name, and fix service defaults vs. proxy defaults in examples - Align existing integration test to updated docs * Test permissive mTLS filter chain not configured with tproxy disabled (#17747) * Add documentation for remote debugging of integration tests. (#17800) * Add documentation for remote debugging of integration tests. * add link from main docs page. * changes related to PR feedback * Clarify limitations of Prop Override extension (#17801) Explicitly document the limitations of the extension, particularly what kind of fields it is capable of modifying. * Fix formatting for webhook-certs Consul tutorial (#17810) * Fix formatting for webhook-certs Consul tutorial * Make a small grammar change to also pick up whitespace changes necessary for formatting --------- Co-authored-by: David Yu <dyu@hashicorp.com> * Add jwt-authn metrics to jwt-provider docs (#17816) * [NET-3095] add jwt-authn metrics docs * Change URLs for redirects from RC to default latest (#17822) * Set GOPRIVATE for all hashicorp repos in CI (#17817) Consistently set GOPRIVATE to include all hashicorp repos, s.t. private modules are successfully pulled in enterprise CI. * Make locality aware routing xDS changes (#17826) * Fixup consul-container/test/debugging.md (#17815) Add missing `-t` flag and fix minor typo. * fixes #17732 - AccessorID in request body should be optional when updating ACL token (#17739) * AccessorID in request body should be optional when updating ACL token * add a test case * fix test case * add changelog entry for PR #17739 * CA provider doc updates and Vault provider minor update (#17831) Update CA provider docs Clarify that providers can differ between primary and secondary datacenters Provide a comparison chart for consul vs vault CA providers Loosen Vault CA provider validation for RootPKIPath Update Vault CA provider documentation * ext-authz Envoy extension: support `localhost` as a valid target URI. (#17821) * CI Updates (#17834) * Ensure that git access to private repos uses the ELEVATED_GITHUB_TOKEN * Bump the runner size for the protobuf generation check This has failed previously when the runner process that communicates with GitHub gets starved causing the job to fail. * counter part of ent pr (#17618) * watch: support -filter for consul watch: checks, services, nodes, service (#17780) * watch: support -filter for watch checks * Add filter for watch nodes, services, and service - unit test added - Add changelog - update doc * Trigger OSS => ENT merge for all release branches (#17853) Previously, this only triggered for release/*.*.x branches; however, our release process involves cutting a release/1.16.0 branch, for example, at time of code freeze these days. Any PRs to that branch after code freeze today do not make their way to consul-enterprise. This will make behavior for a .0 branch consistent with current behavior for a .x branch. * Update service-mesh.mdx (#17845) Deleted two commas which looks quite like some leftovers. * Add docs for sameness groups with resolvers. (#17851) * docs: add note about path prefix matching behavior for HTTPRoute config (#17860) * Add note about path prefix matching behavior for HTTPRoute config * Update website/content/docs/connect/gateways/api-gateway/configuration/http-route.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> --------- Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * docs: update upgrade to consul-dataplane docs on k8s (#17852) * resource: add `AuthorizerContext` helper method (#17393) * resource: enforce consistent naming of resource types (#17611) For consistency, resource type names must follow these rules: - `Group` must be snake case, and in most cases a single word. - `GroupVersion` must be lowercase, start with a "v" and end with a number. - `Kind` must be pascal case. These were chosen because they map to our protobuf type naming conventions. * tooling: generate protoset file (#17364) Extends the `proto` make target to generate a protoset file for use with grpcurl etc. * Fix a bug that wrongly trims domains when there is an overlap with DC name (#17160) * Fix a bug that wrongly trims domains when there is an overlap with DC name Before this change, when DC name and domain/alt-domain overlap, the domain name incorrectly trimmed from the query. Example: Given: datacenter = dc-test, alt-domain = test.consul. Querying for "test-node.node.dc-test.consul" will faile, because the code was trimming "test.consul" instead of just ".consul" This change, fixes the issue by adding dot (.) before trimming * trimDomain: ensure domain trimmed without modyfing original domains * update changelog --------- Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> * deps: aws-sdk-go v1.44.289 (#17876) Signed-off-by: Dan Bond <danbond@protonmail.com> * api-gateway: add operation cannot be fulfilled error to common errors (#17874) * add error message * Update website/content/docs/api-gateway/usage/errors.mdx Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * fix formating issues --------- Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * api-gateway: add step to upgrade instructions for creating intentions (#17875) * Changelog - add 1.13.9, 1.14.8, and 1.15.4 (#17889) * docs: update config enable_debug (#17866) * update doc for config enable_debug * Update website/content/docs/agent/config/config-files.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> --------- Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Update wording on WAN fed and intermediate_pki_path (#17850) * Allow service identity tokens the ability to read jwt-providers (#17893) * Allow service identity tokens the ability to read jwt-providers * more tests * service_prefix tests * Update docs (#17476) Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Add emit_tags_as_labels to envoy bootstrap config when using Consul Telemetry Collector (#17888) * Fix command from kg to kubectl get (#17903) * Create and update release notes for 1.16 and 1.2 (#17895) * update release notes for 1.16 and 1.2 * update latest consul core release * Propose new changes to APIgw upgrade instructions (#17693) * Propose new changes to APIgw upgrade instructions * fix build error * update callouts to render correctly * Add hideClipboard to log messages * Added clarification around consul k8s and crds * Add workflow to verify linux release packages (#17904) * adding docker files to verify linux packages. * add verifr-release-linux.yml * updating name * pass inputs directly into jobs * add other linux package platforms * remove on push * fix TARGETARCH on debian and ubuntu so it can check arm64 and amd64 * fixing amazon to use the continue line * add ubuntu i386 * fix comment lines * working * remove commented out workflow jobs * Apply suggestions from code review Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * update fedora and ubuntu to use latest tag --------- Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * Reference hashicorp/consul instead of consul for Docker image (#17914) * Reference hashicorp/consul instead of consul for Docker image * Update Make targets that pull consul directly * Update Consul K8s Upgrade Doc Updates (#17921) Updating upgrade procedures to encompass expected errors during upgrade process from v1.13.x to v1.14.x. * Update sameness-group.mdx (#17915) * Update create-sameness-groups.mdx (#17927) * deps: coredns v1.10.1 (#17912) * Ensure RSA keys are at least 2048 bits in length (#17911) * Ensure RSA keys are at least 2048 bits in length * Add changelog * update key length check for FIPS compliance * Fix no new variables error and failing to return when error exists from validating * clean up code for better readability * actually return value * tlsutil: Fix check TLS configuration (#17481) * tlsutil: Fix check TLS configuration * Rewording docs. * Update website/content/docs/services/configuration/checks-configuration-reference.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Fix typos and add changelog entry. --------- Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * docs: Deprecations for connect-native SDK and specific connect native APIs (#17937) * Update v1_16_x.mdx * Update connect native golang page --------- Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * Revert "Add workflow to verify linux release packages (#17904)" (#17942) This reverts commit 3368f14fab500ebe9f6aeab5631dd1d5f5a453e5. * Fixes Secondary ConnectCA update (#17846) This fixes a bug that was identified which resulted in subsequent ConnectCA configuration update not to persist in the cluster. * fixing typo in link to jwt-validations-with-intentions doc (#17955) * Fix streaming backend link (#17958) * Fix streaming backend link * Update health.mdx * Dynamically create jwks clusters for jwt-providers (#17944) * website: remove deprecated agent rpc docs (#17962) * Fix missing BalanceOutboundConnections in v2 catalog. (#17964) * feature - [NET - 4005] - [Supportability] Reloadable Configuration - enable_debug (#17565) * # This is a combination of 9 commits. # This is the 1st commit message: init without tests # This is the commit message #2: change log # This is the commit message #3: fix tests # This is the commit message #4: fix tests # This is the commit message #5: added tests # This is the commit message #6: change log breaking change # This is the commit message #7: removed breaking change # This is the commit message #8: fix test # This is the commit message #9: keeping the test behaviour same * # This is a combination of 12 commits. # This is the 1st commit message: init without tests # This is the commit message #2: change log # This is the commit message #3: fix tests # This is the commit message #4: fix tests # This is the commit message #5: added tests # This is the commit message #6: change log breaking change # This is the commit message #7: removed breaking change # This is the commit message #8: fix test # This is the commit message #9: keeping the test behaviour same # This is the commit message #10: made enable debug atomic bool # This is the commit message #11: fix lint # This is the commit message #12: fix test true enable debug * parent 10f500e895d92cc3691ade7b74a33db755d22039 author absolutelightning <ashesh.vidyut@hashicorp.com> 1687352587 +0530 committer absolutelightning <ashesh.vidyut@hashicorp.com> 1687352592 +0530 init without tests change log fix tests fix tests added tests change log breaking change removed breaking change fix test keeping the test behaviour same made enable debug atomic bool fix lint fix test true enable debug using enable debug in agent as atomic bool test fixes fix tests fix tests added update on correct locaiton fix tests fix reloadable config enable debug fix tests fix init and acl 403 * revert commit * Fix formatting codeblocks on APIgw docs (#17970) * fix formatting codeblocks * remove unnecessary indents * Remove POC code (#17974) * update doc (#17910) * update doc * update link * Remove duplicate and unused newDecodeConfigEntry func (#17979) * docs: samenessGroup YAML examples (#17984) * configuration entry syntax * Example config * Add changelog entry for 1.16.0 (#17987) * Fix typo (#17198) servcies => services * Expose JWKS cluster config through JWTProviderConfigEntry (#17978) * Expose JWKS cluster config through JWTProviderConfigEntry * fix typos, rename trustedCa to trustedCA * Integration test for ext-authz Envoy extension (#17980) * Fix incorrect protocol for transparent proxy upstreams. (#17894) This PR fixes a bug that was introduced in: https://github.com/hashicorp/consul/pull/16021 A user setting a protocol in proxy-defaults would cause tproxy implicit upstreams to not honor the upstream service's protocol set in its `ServiceDefaults.Protocol` field, and would instead always use the proxy-defaults value. Due to the fact that upstreams configured with "tcp" can successfully contact upstream "http" services, this issue was not recognized until recently (a proxy-defaults with "tcp" and a listening service with "http" would make successful requests, but not the opposite). As a temporary work-around, users experiencing this issue can explicitly set the protocol on the `ServiceDefaults.UpstreamConfig.Overrides`, which should take precedence. The fix in this PR removes the proxy-defaults protocol from the wildcard upstream that tproxy uses to configure implicit upstreams. When the protocol was included, it would always overwrite the value during discovery chain compilation, which was not correct. The discovery chain compiler also consumes proxy defaults to determine the protocol, so simply excluding it from the wildcard upstream config map resolves the issue. * feat: include nodes count in operator usage endpoint and cli command (#17939) * feat: update operator usage api endpoint to include nodes count * feat: update operator usange cli command to includes nodes count * [OSS] Improve Gateway Test Coverage of Catalog Health (#18011) * fix(cli): remove failing check from 'connect envoy' registration for api gateway * test(integration): add tests to check catalog statsus of gateways on startup * remove extra sleep comment * Update test/integration/consul-container/libs/assert/service.go * changelog * Fixes Traffic rate limitting docs (#17997) * Fix removed service-to-service peering links (#17221) * docs: fix removed service-to-service peering links * docs: extend peering-via-mesh-gateways intro (thanks @trujillo-adam) --------- Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> * docs: Sameness "beta" warning (#18017) * Warning updates * .x * updated typo in tab heading (#18022) * updated typo in tab heading * updated tab group typo, too * Document that DNS lookups can target cluster peers (#17990) Static DNS lookups, in addition to explicitly targeting a datacenter, can target a cluster peer. This was added in 95dc0c7b301b70a6b955a8b7c9737c9b86f03df6 but didn't make the documentation. The driving function for the change is `parseLocality` here:0b1299c28d/agent/dns_oss.go (L25)
The biggest change in this is to adjust the standard lookup syntax to tie `.<datacenter>` to `.dc` as required-together, and to append in the similar `.<cluster-peer>.peer` optional argument, both to A record and SRV record lookups. Co-authored-by: David Yu <dyu@hashicorp.com> * Add first integration test for jwt auth with intention (#18005) * fix stand-in text for name field (#18030) * removed sameness conf entry from failover nav (#18033) * docs - add service sync annotations and k8s service weight annotation (#18032) * Docs for https://github.com/hashicorp/consul-k8s/pull/2293 * remove versions for enterprise features since they are old --------- Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com> * docs - add jobs use case for service mesh k8s (#18037) * docs - add jobs use case for service mesh k8s * add code blocks * address feedback (#18045) * Add verify server hostname to tls default (#17155) * [CC-5718] Remove HCP token requirement during bootstrap * Re-add error for loading HCP management token * backport of commit f1225f5d474406073ae9f46ddd3c51e93aee3d8a * backport of commit 5958ae0921522707652794668233741298863ade * backport of commit 536e6f3b3b68754cb9958f83e225e6f7d1565c08 * backport of commit a99dd82929013249c8f57f1867da8a8ed4fb93d9 * backport of commit fc680e806ee6428c3a363d066f6b093fbe2fdd20 --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Dan Bond <danbond@protonmail.com> Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com> Co-authored-by: Ronald <roncodingenthusiast@users.noreply.github.com> Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com> Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com> Co-authored-by: Luke Kysow <1034429+lkysow@users.noreply.github.com> Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com> Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Co-authored-by: David Yu <dyu@hashicorp.com> Co-authored-by: Bryce Kalow <bkalow@hashicorp.com> Co-authored-by: Paul Glass <pglass@hashicorp.com> Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> Co-authored-by: Poonam Jadhav <poonam.jadhav@hashicorp.com> Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com> Co-authored-by: Chris Thain <32781396+cthain@users.noreply.github.com> Co-authored-by: Hariram Sankaran <56744845+ramramhariram@users.noreply.github.com> Co-authored-by: shanafarkas <105076572+shanafarkas@users.noreply.github.com> Co-authored-by: Thomas Eckert <teckert@hashicorp.com> Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com> Co-authored-by: Joshua Timmons <josh.timmons@hashicorp.com> Co-authored-by: Ashesh Vidyut <134911583+absolutelightning@users.noreply.github.com> Co-authored-by: Dan Stough <dan.stough@hashicorp.com> Co-authored-by: Curt Bushko <cbushko@gmail.com> Co-authored-by: Tobias Birkefeld <t@craxs.de> Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com> Co-authored-by: Semir Patel <semir.patel@hashicorp.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: chappie <6537530+chapmanc@users.noreply.github.com> Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com> Co-authored-by: John Murret <john.murret@hashicorp.com> Co-authored-by: Mark Campbell-Vincent <mnmvincent@gmail.com> Co-authored-by: Daniel Upton <daniel@floppy.co> Co-authored-by: Steven Zamborsky <97125550+stevenzamborsky@users.noreply.github.com> Co-authored-by: George Bolo <george.bolo@gmail.com> Co-authored-by: Chris S. Kim <ckim@hashicorp.com> Co-authored-by: wangxinyi7 <121973291+wangxinyi7@users.noreply.github.com> Co-authored-by: cskh <hui.kang@hashicorp.com> Co-authored-by: V. K <cn007b@gmail.com> Co-authored-by: Iryna Shustava <ishustava@users.noreply.github.com> Co-authored-by: Alex Simenduev <shamil.si@gmail.com> Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> Co-authored-by: Dan Bond <danbond@protonmail.com> Co-authored-by: sarahalsmiller <100602640+sarahalsmiller@users.noreply.github.com> Co-authored-by: Gerard Nguyen <gerard@hashicorp.com> Co-authored-by: mr-miles <miles.waller@gmail.com> Co-authored-by: natemollica-dev <57850649+natemollica-nm@users.noreply.github.com> Co-authored-by: John Maguire <john.maguire@hashicorp.com> Co-authored-by: Samantha <hello@entropy.cat> Co-authored-by: Ranjandas <thejranjan@gmail.com> Co-authored-by: Evan Phoenix <evan@phx.io> Co-authored-by: Michael Hofer <karras@users.noreply.github.com> Co-authored-by: J.C. Jones <james.jc.jones@gmail.com> Co-authored-by: Fulvio <fulviodenza823@gmail.com> Co-authored-by: Jeremy Jacobson <jeremy.jacobson@hashicorp.com> Co-authored-by: Jeremy Jacobson <jjacobson93@users.noreply.github.com>
585 lines
19 KiB
Go
585 lines
19 KiB
Go
// Copyright (c) HashiCorp, Inc.
|
|
// SPDX-License-Identifier: MPL-2.0
|
|
|
|
// Package bootstrap handles bootstrapping an agent's config from HCP. It must be a
|
|
// separate package from other HCP components because it has a dependency on
|
|
// agent/config while other components need to be imported and run within the
|
|
// server process in agent/consul and that would create a dependency cycle.
|
|
package bootstrap
|
|
|
|
import (
|
|
"bufio"
|
|
"context"
|
|
"crypto/tls"
|
|
"crypto/x509"
|
|
"encoding/json"
|
|
"encoding/pem"
|
|
"errors"
|
|
"fmt"
|
|
"os"
|
|
"path/filepath"
|
|
"strings"
|
|
"time"
|
|
|
|
"github.com/hashicorp/consul/agent/config"
|
|
"github.com/hashicorp/consul/agent/connect"
|
|
hcpclient "github.com/hashicorp/consul/agent/hcp/client"
|
|
"github.com/hashicorp/consul/lib"
|
|
"github.com/hashicorp/consul/lib/retry"
|
|
"github.com/hashicorp/go-uuid"
|
|
)
|
|
|
|
const (
|
|
subDir = "hcp-config"
|
|
|
|
caFileName = "server-tls-cas.pem"
|
|
certFileName = "server-tls-cert.pem"
|
|
configFileName = "server-config.json"
|
|
keyFileName = "server-tls-key.pem"
|
|
tokenFileName = "hcp-management-token"
|
|
successFileName = "successful-bootstrap"
|
|
)
|
|
|
|
type ConfigLoader func(source config.Source) (config.LoadResult, error)
|
|
|
|
// UI is a shim to allow the agent command to pass in it's mitchelh/cli.UI so we
|
|
// can output useful messages to the user during bootstrapping. For example if
|
|
// we have to retry several times to bootstrap we don't want the agent to just
|
|
// stall with no output which is the case if we just returned all intermediate
|
|
// warnings or errors.
|
|
type UI interface {
|
|
Output(string)
|
|
Warn(string)
|
|
Info(string)
|
|
Error(string)
|
|
}
|
|
|
|
// RawBootstrapConfig contains the Consul config as a raw JSON string and the management token
|
|
// which either was retrieved from persisted files or from the bootstrap endpoint
|
|
type RawBootstrapConfig struct {
|
|
ConfigJSON string
|
|
ManagementToken string
|
|
}
|
|
|
|
// LoadConfig will attempt to load previously-fetched config from disk and fall back to
|
|
// fetch from HCP servers if the local data is incomplete.
|
|
// It must be passed a (CLI) UI implementation so it can deliver progress
|
|
// updates to the user, for example if it is waiting to retry for a long period.
|
|
func LoadConfig(ctx context.Context, client hcpclient.Client, dataDir string, loader ConfigLoader, ui UI) (ConfigLoader, error) {
|
|
ui.Output("Loading configuration from HCP")
|
|
|
|
// See if we have existing config on disk
|
|
//
|
|
// OPTIMIZE: We could probably be more intelligent about config loading.
|
|
// The currently implemented approach is:
|
|
// 1. Attempt to load data from disk
|
|
// 2. If that fails or the data is incomplete, block indefinitely fetching remote config.
|
|
//
|
|
// What if instead we had the following flow:
|
|
// 1. Attempt to fetch config from HCP.
|
|
// 2. If that fails, fall back to data on disk from last fetch.
|
|
// 3. If that fails, go into blocking loop to fetch remote config.
|
|
//
|
|
// This should allow us to more gracefully transition cases like when
|
|
// an existing cluster is linked, but then wants to receive TLS materials
|
|
// at a later time. Currently, if we observe the existing-cluster marker we
|
|
// don't attempt to fetch any additional configuration from HCP.
|
|
|
|
cfg, ok := loadPersistedBootstrapConfig(dataDir, ui)
|
|
if !ok {
|
|
ui.Info("Fetching configuration from HCP servers")
|
|
|
|
var err error
|
|
cfg, err = fetchBootstrapConfig(ctx, client, dataDir, ui)
|
|
if err != nil {
|
|
return nil, fmt.Errorf("failed to bootstrap from HCP: %w", err)
|
|
}
|
|
ui.Info("Configuration fetched from HCP and saved on local disk")
|
|
|
|
} else {
|
|
ui.Info("Loaded HCP configuration from local disk")
|
|
|
|
}
|
|
|
|
// Create a new loader func to return
|
|
newLoader := bootstrapConfigLoader(loader, cfg)
|
|
return newLoader, nil
|
|
}
|
|
|
|
// bootstrapConfigLoader is a ConfigLoader for passing bootstrap JSON config received from HCP
|
|
// to the config.builder. ConfigLoaders are functions used to build an agent's RuntimeConfig
|
|
// from various sources like files and flags. This config is contained in the config.LoadResult.
|
|
//
|
|
// The flow to include bootstrap config from HCP as a loader's data source is as follows:
|
|
//
|
|
// 1. A base ConfigLoader function (baseLoader) is created on agent start, and it sets the input
|
|
// source argument as the DefaultConfig.
|
|
//
|
|
// 2. When a server agent can be configured by HCP that baseLoader is wrapped in this bootstrapConfigLoader.
|
|
//
|
|
// 3. The bootstrapConfigLoader calls that base loader with the bootstrap JSON config as the
|
|
// default source. This data will be merged with other valid sources in the config.builder.
|
|
//
|
|
// 4. The result of the call to baseLoader() below contains the resulting RuntimeConfig, and we do some
|
|
// additional modifications to attach data that doesn't get populated during the build in the config pkg.
|
|
//
|
|
// Note that since the ConfigJSON is stored as the baseLoader's DefaultConfig, its data is the first
|
|
// to be merged by the config.builder and could be overwritten by user-provided values in config files or
|
|
// CLI flags. However, values set to RuntimeConfig after the baseLoader call are final.
|
|
func bootstrapConfigLoader(baseLoader ConfigLoader, cfg *RawBootstrapConfig) ConfigLoader {
|
|
return func(source config.Source) (config.LoadResult, error) {
|
|
// Don't allow any further attempts to provide a DefaultSource. This should
|
|
// only ever be needed later in client agent AutoConfig code but that should
|
|
// be mutually exclusive from this bootstrapping mechanism since this is
|
|
// only for servers. If we ever try to change that, this clear failure
|
|
// should alert future developers that the assumptions are changing rather
|
|
// than quietly not applying the config they expect!
|
|
if source != nil {
|
|
return config.LoadResult{},
|
|
fmt.Errorf("non-nil config source provided to a loader after HCP bootstrap already provided a DefaultSource")
|
|
}
|
|
|
|
// Otherwise, just call to the loader we were passed with our own additional
|
|
// JSON as the source.
|
|
//
|
|
// OPTIMIZE: We could check/log whether any fields set by the remote config were overwritten by a user-provided flag.
|
|
res, err := baseLoader(config.FileSource{
|
|
Name: "HCP Bootstrap",
|
|
Format: "json",
|
|
Data: cfg.ConfigJSON,
|
|
})
|
|
if err != nil {
|
|
return res, fmt.Errorf("failed to load HCP Bootstrap config: %w", err)
|
|
}
|
|
|
|
finalizeRuntimeConfig(res.RuntimeConfig, cfg)
|
|
return res, nil
|
|
}
|
|
}
|
|
|
|
const (
|
|
accessControlHeaderName = "Access-Control-Expose-Headers"
|
|
accessControlHeaderValue = "x-consul-default-acl-policy"
|
|
)
|
|
|
|
// finalizeRuntimeConfig will set additional HCP-specific values that are not
|
|
// handled by the config.builder.
|
|
func finalizeRuntimeConfig(rc *config.RuntimeConfig, cfg *RawBootstrapConfig) {
|
|
rc.Cloud.ManagementToken = cfg.ManagementToken
|
|
|
|
// HTTP response headers are modified for the HCP UI to work.
|
|
if rc.HTTPResponseHeaders == nil {
|
|
rc.HTTPResponseHeaders = make(map[string]string)
|
|
}
|
|
prevValue, ok := rc.HTTPResponseHeaders[accessControlHeaderName]
|
|
if !ok {
|
|
rc.HTTPResponseHeaders[accessControlHeaderName] = accessControlHeaderValue
|
|
} else {
|
|
rc.HTTPResponseHeaders[accessControlHeaderName] = prevValue + "," + accessControlHeaderValue
|
|
}
|
|
}
|
|
|
|
// fetchBootstrapConfig will fetch boostrap configuration from remote servers and persist it to disk.
|
|
// It will retry until successful or a terminal error condition is found (e.g. permission denied).
|
|
func fetchBootstrapConfig(ctx context.Context, client hcpclient.Client, dataDir string, ui UI) (*RawBootstrapConfig, error) {
|
|
w := retry.Waiter{
|
|
MinWait: 1 * time.Second,
|
|
MaxWait: 5 * time.Minute,
|
|
Jitter: retry.NewJitter(50),
|
|
}
|
|
|
|
var bsCfg *hcpclient.BootstrapConfig
|
|
for {
|
|
// Note we don't want to shadow `ctx` here since we need that for the Wait
|
|
// below.
|
|
reqCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
|
|
defer cancel()
|
|
|
|
resp, err := client.FetchBootstrap(reqCtx)
|
|
if err != nil {
|
|
ui.Error(fmt.Sprintf("Error: failed to fetch bootstrap config from HCP, will retry in %s: %s",
|
|
w.NextWait().Round(time.Second), err))
|
|
if err := w.Wait(ctx); err != nil {
|
|
return nil, err
|
|
}
|
|
// Finished waiting, restart loop
|
|
continue
|
|
}
|
|
bsCfg = resp
|
|
break
|
|
}
|
|
|
|
devMode := dataDir == ""
|
|
|
|
cfgJSON, err := persistAndProcessConfig(dataDir, devMode, bsCfg)
|
|
if err != nil {
|
|
return nil, fmt.Errorf("failed to persist config for existing cluster: %w", err)
|
|
}
|
|
|
|
return &RawBootstrapConfig{
|
|
ConfigJSON: cfgJSON,
|
|
ManagementToken: bsCfg.ManagementToken,
|
|
}, nil
|
|
}
|
|
|
|
// persistAndProcessConfig is called when we receive data from CCM.
|
|
// We validate and persist everything that was received, then also update
|
|
// the JSON config as needed.
|
|
func persistAndProcessConfig(dataDir string, devMode bool, bsCfg *hcpclient.BootstrapConfig) (string, error) {
|
|
if devMode {
|
|
// Agent in dev mode, we still need somewhere to persist the certs
|
|
// temporarily though to be able to start up at all since we don't support
|
|
// inline certs right now. Use temp dir
|
|
tmp, err := os.MkdirTemp(os.TempDir(), "consul-dev-")
|
|
if err != nil {
|
|
return "", fmt.Errorf("failed to create temp dir for certificates: %w", err)
|
|
}
|
|
dataDir = tmp
|
|
}
|
|
|
|
// Create subdir if it's not already there.
|
|
dir := filepath.Join(dataDir, subDir)
|
|
if err := lib.EnsurePath(dir, true); err != nil {
|
|
return "", fmt.Errorf("failed to ensure directory %q: %w", dir, err)
|
|
}
|
|
|
|
// Parse just to a map for now as we only have to inject to a specific place
|
|
// and parsing whole Config struct is complicated...
|
|
var cfg map[string]any
|
|
|
|
if err := json.Unmarshal([]byte(bsCfg.ConsulConfig), &cfg); err != nil {
|
|
return "", fmt.Errorf("failed to unmarshal bootstrap config: %w", err)
|
|
}
|
|
|
|
// Avoid ever setting an initial_management token from HCP now that we can
|
|
// separately bootstrap an HCP management token with a distinct accessor ID.
|
|
//
|
|
// CCM will continue to return an initial_management token because previous versions of Consul
|
|
// cannot bootstrap an HCP management token distinct from the initial management token.
|
|
// This block can be deleted once CCM supports tailoring bootstrap config responses
|
|
// based on the version of Consul that requested it.
|
|
acls, aclsOK := cfg["acl"].(map[string]any)
|
|
if aclsOK {
|
|
tokens, tokensOK := acls["tokens"].(map[string]interface{})
|
|
if tokensOK {
|
|
delete(tokens, "initial_management")
|
|
}
|
|
}
|
|
|
|
var cfgJSON string
|
|
if bsCfg.TLSCert != "" {
|
|
if err := validateTLSCerts(bsCfg.TLSCert, bsCfg.TLSCertKey, bsCfg.TLSCAs); err != nil {
|
|
return "", fmt.Errorf("invalid certificates: %w", err)
|
|
}
|
|
|
|
// Persist the TLS cert files from the response since we need to refer to them
|
|
// as disk files either way.
|
|
if err := persistTLSCerts(dir, bsCfg.TLSCert, bsCfg.TLSCertKey, bsCfg.TLSCAs); err != nil {
|
|
return "", fmt.Errorf("failed to persist TLS certificates to dir %q: %w", dataDir, err)
|
|
}
|
|
|
|
// Store paths to the persisted TLS cert files.
|
|
cfg["ca_file"] = filepath.Join(dir, caFileName)
|
|
cfg["cert_file"] = filepath.Join(dir, certFileName)
|
|
cfg["key_file"] = filepath.Join(dir, keyFileName)
|
|
|
|
// Convert the bootstrap config map back into a string
|
|
cfgJSONBytes, err := json.Marshal(cfg)
|
|
if err != nil {
|
|
return "", err
|
|
}
|
|
cfgJSON = string(cfgJSONBytes)
|
|
}
|
|
|
|
if !devMode {
|
|
// Persist the final config we need to add so that it is available locally after a restart.
|
|
// Assuming the configured data dir wasn't a tmp dir to start with.
|
|
if err := persistBootstrapConfig(dir, cfgJSON); err != nil {
|
|
return "", fmt.Errorf("failed to persist bootstrap config: %w", err)
|
|
}
|
|
|
|
// HCP only returns the management token if it requires Consul to
|
|
// initialize it
|
|
if bsCfg.ManagementToken != "" {
|
|
if err := validateManagementToken(bsCfg.ManagementToken); err != nil {
|
|
return "", fmt.Errorf("invalid management token: %w", err)
|
|
}
|
|
if err := persistManagementToken(dir, bsCfg.ManagementToken); err != nil {
|
|
return "", fmt.Errorf("failed to persist HCP management token: %w", err)
|
|
}
|
|
}
|
|
|
|
if err := persistSuccessMarker(dir); err != nil {
|
|
return "", fmt.Errorf("failed to persist success marker: %w", err)
|
|
}
|
|
}
|
|
return cfgJSON, nil
|
|
}
|
|
|
|
func persistSuccessMarker(dir string) error {
|
|
name := filepath.Join(dir, successFileName)
|
|
return os.WriteFile(name, []byte(""), 0600)
|
|
|
|
}
|
|
|
|
func persistTLSCerts(dir string, serverCert, serverKey string, caCerts []string) error {
|
|
if serverCert == "" || serverKey == "" {
|
|
return fmt.Errorf("unexpected bootstrap response from HCP: missing TLS information")
|
|
}
|
|
|
|
// Write out CA cert(s). We write them all to one file because Go's x509
|
|
// machinery will read as many certs as it finds from each PEM file provided
|
|
// and add them separaetly to the CertPool for validation
|
|
f, err := os.OpenFile(filepath.Join(dir, caFileName), os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0600)
|
|
if err != nil {
|
|
return err
|
|
}
|
|
bf := bufio.NewWriter(f)
|
|
for _, caPEM := range caCerts {
|
|
bf.WriteString(caPEM + "\n")
|
|
}
|
|
if err := bf.Flush(); err != nil {
|
|
return err
|
|
}
|
|
if err := f.Close(); err != nil {
|
|
return err
|
|
}
|
|
|
|
if err := os.WriteFile(filepath.Join(dir, certFileName), []byte(serverCert), 0600); err != nil {
|
|
return err
|
|
}
|
|
|
|
if err := os.WriteFile(filepath.Join(dir, keyFileName), []byte(serverKey), 0600); err != nil {
|
|
return err
|
|
}
|
|
|
|
return nil
|
|
}
|
|
|
|
// Basic validation to ensure a UUID was loaded and assumes the token is non-empty
|
|
func validateManagementToken(token string) error {
|
|
// note: we assume that the token is not an empty string
|
|
if _, err := uuid.ParseUUID(token); err != nil {
|
|
return errors.New("management token is not a valid UUID")
|
|
}
|
|
return nil
|
|
}
|
|
|
|
func persistManagementToken(dir, token string) error {
|
|
name := filepath.Join(dir, tokenFileName)
|
|
return os.WriteFile(name, []byte(token), 0600)
|
|
}
|
|
|
|
func persistBootstrapConfig(dir, cfgJSON string) error {
|
|
// Persist the important bits we got from bootstrapping. The TLS certs are
|
|
// already persisted, just need to persist the config we are going to add.
|
|
name := filepath.Join(dir, configFileName)
|
|
return os.WriteFile(name, []byte(cfgJSON), 0600)
|
|
}
|
|
|
|
func loadPersistedBootstrapConfig(dataDir string, ui UI) (*RawBootstrapConfig, bool) {
|
|
if dataDir == "" {
|
|
// There's no files to load when in dev mode.
|
|
return nil, false
|
|
}
|
|
|
|
dir := filepath.Join(dataDir, subDir)
|
|
|
|
_, err := os.Stat(filepath.Join(dir, successFileName))
|
|
if os.IsNotExist(err) {
|
|
// Haven't bootstrapped from HCP.
|
|
return nil, false
|
|
}
|
|
if err != nil {
|
|
ui.Warn("failed to check for config on disk, re-fetching from HCP: " + err.Error())
|
|
return nil, false
|
|
}
|
|
|
|
if err := checkCerts(dir); err != nil {
|
|
ui.Warn("failed to validate certs on disk, re-fetching from HCP: " + err.Error())
|
|
return nil, false
|
|
}
|
|
|
|
configJSON, err := loadBootstrapConfigJSON(dataDir)
|
|
if err != nil {
|
|
ui.Warn("failed to load bootstrap config from disk, re-fetching from HCP: " + err.Error())
|
|
return nil, false
|
|
}
|
|
|
|
mgmtToken, err := loadManagementToken(dir)
|
|
if err != nil {
|
|
ui.Warn("failed to load HCP management token from disk, re-fetching from HCP: " + err.Error())
|
|
return nil, false
|
|
}
|
|
|
|
return &RawBootstrapConfig{
|
|
ConfigJSON: configJSON,
|
|
ManagementToken: mgmtToken,
|
|
}, true
|
|
}
|
|
|
|
func loadBootstrapConfigJSON(dataDir string) (string, error) {
|
|
filename := filepath.Join(dataDir, subDir, configFileName)
|
|
|
|
_, err := os.Stat(filename)
|
|
if os.IsNotExist(err) {
|
|
return "", nil
|
|
}
|
|
if err != nil {
|
|
return "", fmt.Errorf("failed to check for bootstrap config: %w", err)
|
|
}
|
|
|
|
// Attempt to load persisted config to check for errors and basic validity.
|
|
// Errors here will raise issues like referencing unsupported config fields.
|
|
_, err = config.Load(config.LoadOpts{
|
|
ConfigFiles: []string{filename},
|
|
HCL: []string{
|
|
"server = true",
|
|
`bind_addr = "127.0.0.1"`,
|
|
fmt.Sprintf("data_dir = %q", dataDir),
|
|
},
|
|
ConfigFormat: "json",
|
|
})
|
|
if err != nil {
|
|
return "", fmt.Errorf("failed to parse local bootstrap config: %w", err)
|
|
}
|
|
|
|
jsonBs, err := os.ReadFile(filename)
|
|
if err != nil {
|
|
return "", fmt.Errorf(fmt.Sprintf("failed to read local bootstrap config file: %s", err))
|
|
}
|
|
return strings.TrimSpace(string(jsonBs)), nil
|
|
}
|
|
|
|
func loadManagementToken(dir string) (string, error) {
|
|
name := filepath.Join(dir, tokenFileName)
|
|
bytes, err := os.ReadFile(name)
|
|
if os.IsNotExist(err) {
|
|
return "", errors.New("configuration files on disk are incomplete, missing: " + name)
|
|
}
|
|
if err != nil {
|
|
return "", fmt.Errorf("failed to read: %w", err)
|
|
}
|
|
|
|
token := string(bytes)
|
|
if err := validateManagementToken(token); err != nil {
|
|
return "", fmt.Errorf("invalid management token: %w", err)
|
|
}
|
|
|
|
return token, nil
|
|
}
|
|
|
|
func checkCerts(dir string) error {
|
|
files := []string{
|
|
filepath.Join(dir, caFileName),
|
|
filepath.Join(dir, certFileName),
|
|
filepath.Join(dir, keyFileName),
|
|
}
|
|
|
|
missing := make([]string, 0)
|
|
for _, file := range files {
|
|
_, err := os.Stat(file)
|
|
if os.IsNotExist(err) {
|
|
missing = append(missing, file)
|
|
continue
|
|
}
|
|
if err != nil {
|
|
return err
|
|
}
|
|
}
|
|
|
|
// If all the TLS files are missing, assume this is intentional.
|
|
// Existing clusters do not receive any TLS certs.
|
|
if len(missing) == len(files) {
|
|
return nil
|
|
}
|
|
|
|
// If only some of the files are missing, something went wrong.
|
|
if len(missing) > 0 {
|
|
return fmt.Errorf("configuration files on disk are incomplete, missing: %v", missing)
|
|
}
|
|
|
|
cert, key, caCerts, err := loadCerts(dir)
|
|
if err != nil {
|
|
return fmt.Errorf("failed to load certs from disk: %w", err)
|
|
}
|
|
|
|
if err = validateTLSCerts(cert, key, caCerts); err != nil {
|
|
return fmt.Errorf("invalid certs on disk: %w", err)
|
|
}
|
|
return nil
|
|
}
|
|
|
|
func loadCerts(dir string) (cert, key string, caCerts []string, err error) {
|
|
certPEMBlock, err := os.ReadFile(filepath.Join(dir, certFileName))
|
|
if err != nil {
|
|
return "", "", nil, err
|
|
}
|
|
keyPEMBlock, err := os.ReadFile(filepath.Join(dir, keyFileName))
|
|
if err != nil {
|
|
return "", "", nil, err
|
|
}
|
|
|
|
caPEMs, err := os.ReadFile(filepath.Join(dir, caFileName))
|
|
if err != nil {
|
|
return "", "", nil, err
|
|
}
|
|
caCerts, err = splitCACerts(caPEMs)
|
|
if err != nil {
|
|
return "", "", nil, fmt.Errorf("failed to parse CA certs: %w", err)
|
|
}
|
|
|
|
return string(certPEMBlock), string(keyPEMBlock), caCerts, nil
|
|
}
|
|
|
|
// splitCACerts takes a list of concatenated PEM blocks and splits
|
|
// them back up into strings. This is used because CACerts are written
|
|
// into a single file, but validated individually.
|
|
func splitCACerts(caPEMs []byte) ([]string, error) {
|
|
var out []string
|
|
|
|
for {
|
|
nextBlock, remaining := pem.Decode(caPEMs)
|
|
if nextBlock == nil {
|
|
break
|
|
}
|
|
if nextBlock.Type != "CERTIFICATE" {
|
|
return nil, fmt.Errorf("PEM-block should be CERTIFICATE type")
|
|
}
|
|
|
|
// Collect up to the start of the remaining bytes.
|
|
// We don't grab nextBlock.Bytes because it's not PEM encoded.
|
|
out = append(out, string(caPEMs[:len(caPEMs)-len(remaining)]))
|
|
caPEMs = remaining
|
|
}
|
|
|
|
if len(out) == 0 {
|
|
return nil, errors.New("invalid CA certificate")
|
|
}
|
|
return out, nil
|
|
}
|
|
|
|
// validateTLSCerts checks that the CA cert, server cert, and key on disk are structurally valid.
|
|
//
|
|
// OPTIMIZE: This could be improved by returning an error if certs are expired or close to expiration.
|
|
// However, that requires issuing new certs on bootstrap requests, since returning an error
|
|
// would trigger a re-fetch from HCP.
|
|
func validateTLSCerts(cert, key string, caCerts []string) error {
|
|
leaf, err := tls.X509KeyPair([]byte(cert), []byte(key))
|
|
if err != nil {
|
|
return errors.New("invalid server certificate or key")
|
|
}
|
|
_, err = x509.ParseCertificate(leaf.Certificate[0])
|
|
if err != nil {
|
|
return errors.New("invalid server certificate")
|
|
}
|
|
|
|
for _, caCert := range caCerts {
|
|
_, err = connect.ParseCert(caCert)
|
|
if err != nil {
|
|
return errors.New("invalid CA certificate")
|
|
}
|
|
}
|
|
return nil
|
|
}
|