Read requests performed during anti antropy full sync currently target
the leader only. This generates a non-negligible load on the leader when
the DC is large enough and can be offloaded to the followers following
the "eventually consistent" policy for the agent state.
We switch the AE read calls to use stale requests with a small (2s)
MaxStaleDuration value and make sure we do not read too fast after a
write.
* clarify possibilities for centralized proxy configuration
* add line breaks to config entries file
* add info about centralized config to built in proxy doc
* mondify connect landing page to help with navigation
* move internals details to its own page
* link fixes and shortening text on main page
* put built-in proxy options on its own page
* add configuration details for connect
* clarify security title and add observability page
* reorganize menu
* remove observability from configuration section
* Update website/source/docs/connect/configuration.html.md
Co-Authored-By: Paul Banks <banks@banksco.de>
* Update website/source/docs/connect/index.html.md
Co-Authored-By: Paul Banks <banks@banksco.de>
* Update website/source/docs/agent/config_entries.html.md
Co-Authored-By: Paul Banks <banks@banksco.de>
* Update website/source/docs/connect/configuration.html.md
Co-Authored-By: Paul Banks <banks@banksco.de>
* rename connect section to include service mesh
* reorganize sections per suggestions from paul
* add configuration edits from paul
* add internals edits from paul
* add observability edits from paul
* reorganize pages and menu
* Update website/source/docs/connect/configuration.html.md
Co-Authored-By: Paul Banks <banks@banksco.de>
* menu corrections and edits
* incorporate some of pauls comments
* incorporate more of pauls comments
* Update website/source/docs/connect/configuration.html.md
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* Update website/source/docs/connect/index.html.md
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* Update website/source/docs/connect/index.html.md
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* Update website/source/docs/connect/registration.html.md
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* incorporate kaitlin and pavanni feedback
* add redirect
* fix conflicts in index file
* Resolve conflicts in index file
* correct links for new organization
* Update website/source/docs/connect/proxies.html.md
Co-Authored-By: Paul Banks <banks@banksco.de>
* Update website/source/docs/connect/registration.html.md
Co-Authored-By: Paul Banks <banks@banksco.de>
* Update website/source/docs/connect/registration.html.md
Co-Authored-By: Paul Banks <banks@banksco.de>
* Update website/source/docs/connect/registration.html.md
Co-Authored-By: Paul Banks <banks@banksco.de>
* add title to service registration page
* Upgrade xDS (go-control-plane) API to support Envoy 1.10.
This includes backwards compatibility shim to work around the ext_authz package rename in 1.10.
It also adds integration test support in CI for 1.10.0.
* Fix go vet complaints
* go mod vendor
* Update Envoy version info in docs
* Update website/source/docs/connect/proxies/envoy.md
The policy in the time-of-day Sentinel example incorrectly references
the top-level time.hour constant. This is actually the same as the
time.Hour Go value, so in other words, 3600000000000 (the int64 value
representing the time in nanoseconds).
This is corrected by just using time.now.hour instead.
1. All {{ivy-codemirror}} components need 'refreshing' when they become
visible via our own `didAppear` method on the `{{code-editor}}`
component
(also see:)
- https://github.com/hashicorp/consul/pull/4190#discussion_r193270223
- 73db111db8 (r225264296)
2. On initial investigation, it looks like the component we are using
for the code editor doesn't distinguish between setting its `value`
programatically and a `keyup` event, i.e. an interaction from the user.
We currently pretend that whenever its `value` changes, it is a `keyup`
event. This means that when we reset the `value` to `""`
programmatically for form resetting purposes, a 'pretend keyup' event
would also be fired, which would in turn kick off the validation, which
would fail and show an error message for empty values in other fields of
the form - something that is perfectly valid if you haven't typed
anything yet. We solved this by checking for `isPristine` on fields that
are allowed to be empty before you have typed anything.
Just because Consul gives us a 404 this doesn't guarantee the KV doesn't
exist, it doesn't even mean we don't have access to it. Furthermore we
should never destroyRecord's without user interaction (therefore only via the
repo.delete method).
This switches destroyRecord to unloadRecord which performs the
additional legwork to keep ember-data in sync with the actual truth.
unloadRecord unloads the record from ember-data rather than sending an API
delete request, which would have been the intent here.
The observed bug was that a full restart of a consul datacenter (servers
and clients) in conjunction with a restart of a connect-flavored
application with bring-your-own-service-registration logic would very
frequently cause the envoy sidecar service check to never reflect the
aliased service.
Over the course of investigation several bugs and unfortunate
interactions were corrected:
(1)
local.CheckState objects were only shallow copied, but the key piece of
data that gets read and updated is one of the things not copied (the
underlying Check with a Status field). When the stock code was run with
the race detector enabled this highly-relevant-to-the-test-scenario field
was found to be racy.
Changes:
a) update the existing Clone method to include the Check field
b) copy-on-write when those fields need to change rather than
incrementally updating them in place.
This made the observed behavior occur slightly less often.
(2)
If anything about how the runLocal method for node-local alias check
logic was ever flawed, there was no fallback option. Those checks are
purely edge-triggered and failure to properly notice a single edge
transition would leave the alias check incorrect until the next flap of
the aliased check.
The change was to introduce a fallback timer to act as a control loop to
double check the alias check matches the aliased check every minute
(borrowing the duration from the non-local alias check logic body).
This made the observed behavior eventually go away when it did occur.
(3)
Originally I thought there were two main actions involved in the data race:
A. The act of adding the original check (from disk recovery) and its
first health evaluation.
B. The act of the HTTP API requests coming in and resetting the local
state when re-registering the same services and checks.
It took awhile for me to realize that there's a third action at work:
C. The goroutines associated with the original check and the later
checks.
The actual sequence of actions that was causing the bad behavior was
that the API actions result in the original check to be removed and
re-added _without waiting for the original goroutine to terminate_. This
means for brief windows of time during check definition edits there are
two goroutines that can be sending updates for the alias check status.
In extremely unlikely scenarios the original goroutine sees the aliased
check start up in `critical` before being removed but does not get the
notification about the nearly immediate update of that check to
`passing`.
This is interlaced wit the new goroutine coming up, initializing its
base case to `passing` from the current state and then listening for new
notifications of edge triggers.
If the original goroutine "finishes" its update, it then commits one
more write into the local state of `critical` and exits leaving the
alias check no longer reflecting the underlying check.
The correction here is to enforce that the old goroutines must terminate
before spawning the new one for alias checks.
* Improve startup message to avoid confusing users when no error occurs
Several times, some users not very familiar with Consul get confused
by error message at startup:
`[INFO] agent: (LAN) joined: 1 Err: <nil>`
Having `Err: <nil>` seems weird to many users, I propose to have the
following instead:
* Success: `[INFO] agent: (LAN) joined: 1`
* Error: `[WARN] agent: (LAN) couldn't join: %d Err: ERROR`