While working on another change I caused a bunch of these tests to fail.
Unfortunately the failure messages were not super helpful at first.
One problem was that the request and response were created outside of
the retry. This meant that when the second attempt happened, the request
body was empty (because the buffer had been consumed), and so the
request was not actually being retried. This was fixed by moving more of
the request creation into the retry block.
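
A minimal sketch of the idea, not the actual test code: doWithRetry and
flakyHandler are hypothetical names, and the path "/v1/example" is made up.
The point is only that the request and recorder are built inside the loop,
so each attempt gets an unread body.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// doWithRetry is a hypothetical helper. It builds a fresh request and
// recorder on every attempt and returns the number of attempts made.
func doWithRetry(attempts int, payload []byte, handler http.HandlerFunc) (int, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		// Created inside the loop: a body created outside would already
		// have been consumed by the previous attempt.
		req := httptest.NewRequest(http.MethodPut, "/v1/example", bytes.NewReader(payload))
		resp := httptest.NewRecorder()
		handler(resp, req)
		if resp.Code == http.StatusOK {
			return i + 1, nil
		}
		lastErr = fmt.Errorf("attempt %d: unexpected status %d", i+1, resp.Code)
	}
	return attempts, lastErr
}

// flakyHandler (hypothetical) fails the first call, then succeeds only if
// the request body is not empty.
func flakyHandler() http.HandlerFunc {
	calls := 0
	return func(w http.ResponseWriter, r *http.Request) {
		calls++
		body, _ := io.ReadAll(r.Body)
		if calls == 1 || len(body) == 0 {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	n, err := doWithRetry(3, []byte(`{"name":"web"}`), flakyHandler())
	fmt.Println(n, err)
}
```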
Another problem was that these functions can return errors in two ways, and
are not consistent about which way they use. Some errors are returned to
the response writer, but the tests were not checking those errors, which
was causing a panic later on. This was fixed by adding a check for the
response code.
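
Roughly, the check looks like the sketch below. The endpoint type and the
callEndpoint, returnsError and writesError names are assumptions made for
illustration, not the real helpers.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// endpoint is a hypothetical handler signature that can fail in two ways:
// by returning an error, or by writing an error status to the response writer.
type endpoint func(w http.ResponseWriter, r *http.Request) (interface{}, error)

// callEndpoint invokes fn and folds both failure paths into a single error.
func callEndpoint(fn endpoint) (interface{}, error) {
	req := httptest.NewRequest(http.MethodGet, "/v1/example", nil)
	resp := httptest.NewRecorder()
	obj, err := fn(resp, req)
	if err != nil {
		return nil, err
	}
	// Without this check a failed call hands back a nil obj, and the test
	// only notices when it dereferences it later.
	if resp.Code != http.StatusOK {
		return nil, fmt.Errorf("unexpected response code %d", resp.Code)
	}
	return obj, nil
}

// returnsError reports its failure through the returned error.
func returnsError(w http.ResponseWriter, r *http.Request) (interface{}, error) {
	return nil, errors.New("returned error")
}

// writesError reports its failure through the response writer only.
func writesError(w http.ResponseWriter, r *http.Request) (interface{}, error) {
	http.Error(w, "permission denied", http.StatusForbidden)
	return nil, nil
}

func main() {
	_, err := callEndpoint(returnsError)
	fmt.Println(err)
	_, err = callEndpoint(writesError)
	fmt.Println(err)
}
```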
Also adds some missing t.Helper() calls, and has assertIndex use checkIndex so
that it is clear these are the same implementation.
I saw this test flake locally, and it was easy to reproduce with -count=10.
The failure was: 'TestAgent.dns: rpc error: error=No known Consul servers'.
Waiting for the agent seems to fix it.
This commit fixes a test that I saw flake locally while running tests. The test output from the monitor
started immediately after the line the test was looking for.
To fix the problem a channel is closed when the goroutine starts. Shutdown is not called until this channel
is closed, which seems to greatly reduce the chance of a flake.
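
The pattern is sketched below under simplified assumptions: monitorOutput is a
hypothetical stand-in for the test, and the "monitor" here only appends two
lines to a slice.

```go
package main

import "fmt"

// monitorOutput (hypothetical) starts a goroutine, waits for it to signal
// that it is running, and only then requests shutdown.
func monitorOutput() []string {
	var out []string
	started := make(chan struct{})
	shutdown := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		out = append(out, "monitor started")
		close(started) // signal that the goroutine is running
		<-shutdown
		out = append(out, "monitor stopped")
	}()

	<-started       // do not shut down before the goroutine has started
	close(shutdown) // stands in for the Shutdown call
	<-done
	return out
}

func main() {
	fmt.Println(monitorOutput())
}
```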
TestAgent.Key was only used by 3 tests. Extracting it from the common helper that is used in hundreds of
tests helps keep the shared part small and more focused.
This required a second change (which I was planning on making anyway), which was to change the behaviour of
DataDir. Now in all cases the TestAgent will use the DataDir, and clean it up once the test is complete.
There are a couple reasons for this change:
1. agent.go is way too big. Smaller files make code easier to read
because tools that show usage also include filename which can give
a lot more context to someone trying to understand which functions
call other functions.
2. these two functions call into a large number of functions already in
keyring.go.
Issue and PR numbers do not overlap, they are based off the same counter.
A PR can also be linked to via the issues URL; if the number belongs to a
PR, GitHub will redirect to it.
This change has the benefit that one can link to both issues and PRs.
* Update k8s sync docs
- remove docs that said for nodeport service we register each instance
on a node with its same node name. We instead register each instance
onto the k8s-sync node
- add docs describing which ports and ips are used
This is a small step to allowing Agent to accept its dependencies
instead of creating them in New.
There were two fields in autoconfig.Config that were used exclusively
to load config. These were replaced with a single function, allowing us
to move LoadConfig back to the config package.
Also removed the WithX functions for building a Config. Since these were
simple assignment, it appeared we were not getting much value from them.
Making these plain functions allows us to clean up how an agent is initialized. They only make use of a config and a logger, so they do not need to be agent methods.
Also clean up the testing to use t.Run and require.
Fixes #8466
Since Consul 1.8.0 there was a bug in how ingress gateway protocol
compatibility was enforced. At the point in time that an ingress-gateway
config entry was modified the discovery chain for each upstream was
checked to ensure the ingress gateway protocol matched. Unfortunately
future modifications of other config entries were not validated against
existing ingress-gateway definitions, such as:
1. create tcp ingress-gateway pointing to 'api' (ok)
2. create service-defaults for 'api' setting protocol=http (worked, but not ok)
3. create service-splitter or service-router for 'api' (worked, but caused an agent panic)
If you were to do these in a different order, it would fail without a
crash:
1. create service-defaults for 'api' setting protocol=http (ok)
2. create service-splitter or service-router for 'api' (ok)
3. create tcp ingress-gateway pointing to 'api' (fail with message about
protocol mismatch)
This PR introduces the missing validation. The two new behaviors are:
1. create tcp ingress-gateway pointing to 'api' (ok)
2. (NEW) create service-defaults for 'api' setting protocol=http ("ok" for back compat)
3. (NEW) create service-splitter or service-router for 'api' (fail with
message about protocol mismatch)
In consideration of any existing users that may inadvertently be
falling into item (2) above, that is now officially a valid configuration
to be in. For anyone falling into item (3) above: while you can no longer
use the API to manufacture that scenario, anyone that has old (now
bad) data will still be able to have the agent use it just enough to
generate a new agent/proxycfg error message rather than a panic.
Unfortunately we just don't have enough information to properly fix the
config entries.
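
As a rough illustration only: the sketch below is not the validation added in
this PR, which works on the discovery chain. The ingressGateway type and
validateL7Entry function are hypothetical and heavily simplified; they just
show the shape of the new check in item (3).

```go
package main

import "fmt"

// ingressGateway is a hypothetical, simplified stand-in for an
// ingress-gateway config entry: one listener protocol and its services.
type ingressGateway struct {
	Name     string
	Protocol string
	Services []string
}

// validateL7Entry rejects a new service-router or service-splitter for a
// service that an existing tcp ingress gateway already points to.
func validateL7Entry(service string, gateways []ingressGateway) error {
	for _, gw := range gateways {
		if gw.Protocol != "tcp" {
			continue
		}
		for _, s := range gw.Services {
			if s == service {
				return fmt.Errorf("protocol mismatch: ingress-gateway %q exposes %q over tcp", gw.Name, service)
			}
		}
	}
	return nil
}

func main() {
	gateways := []ingressGateway{{Name: "ingress", Protocol: "tcp", Services: []string{"api"}}}
	fmt.Println(validateL7Entry("api", gateways)) // rejected with a mismatch error
	fmt.Println(validateL7Entry("web", gateways)) // no gateway points at "web"
}
```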
* ui: Reduce reconnection attempts on disconnection
The UI will attempt to reconnect/retry a blocking query to Consul after
a disconnection in certain circumstances.
1. On receipt of a 5xx error (used for keeping blocking queries running
through reverse proxies that have lower timeouts than Consul itself)
2. When a user switches to a different tab and back again
3. When the connection to Consul is dropped entirely (when Consul itself
has exited)
In the last case the retry attempts were not using a 3 second interval
between attempts like the first case does.
This commit changes the last case to use the same 3 second pause as the
first case.
* ui: Switch selects to use more HTML-like approach for optgroups
* Add KV comparator
* Use new option/optgroup approach for sort/select
* Fix up tests for new order of menu items