* Added log-file flag to capture Consul logs in a user specified file
* Refactored code.
* Refactored code. Added flags to rotate logs based on bytes and duration
* Added the flags for log file and log rotation on the webpage
* Fixed TestSantize from failing due to the addition of 3 flags
* Introduced changes : mutex, data-dir log writes, rotation logic
* Added test for logfile and updated the default log destination for docs
* Log name now uses UnixNano
* TestLogFile is now uses t.Parallel()
* Removed unnecessary int64Val function
* Updated docs to reflect default log name for log-file
* No longer writes to data-dir and adds .log if the filename has no extension
- Add WaitForTestAgent to tests flaky due to missing serfHealth registration
- Fix bug in retries calling Fatalf with *testing.T
- Convert TestLockCommand_ChildExitCode to table driven test
- Improve resilience of testrpc.WaitForLeader()
- Add additionall retry to CI
- Increase "go test" timeout to 8m
- Add wait for cluster leader to several tests in the agent package
- Add retry to some tests in the api and command packages
- Dev mode assumed no persistence of services although proxy state is persisted which caused proxies to be killed on startup as their services were no longer registered. Fixed.
- Didn't snapshot the ProxyID which meant that proxies were adopted OK from snapshot but failed to restart if they died since there was no proxyID in the ENV on restart
- Dev mode with no persistence just kills all proxies on shutdown since it can't recover them later
- Naming things
This turns out to have a lot more subtelty than we accounted for. The test suite is especially prone to races now we can only poll the child and many extra levels of indirectoin are needed to correctly run daemon process without it becoming a Zombie.
I ran this test suite in a loop with parallel enabled to verify for races (-race doesn't find any as they are logical inter-process ones not actual data races). I made it through ~50 runs before hitting an error due to timing which is much better than before. I want to go back and see if we can do better though. Just getting this up.
- Includes some bug fixes for previous `api` work and `agent` that weren't tested
- Needed somewhat pervasive changes to support hash based blocking - some TODOs left in our watch toolchain that will explicitly fail on hash-based watches.
- Integration into `connect` is partially done here but still WIP
This has an explcit unit test already which somehow passes at least some of the time. I suspect it passes because under some conditions the actual KV delete fails and returns non-zero as well as printing the warning which is what is being checked for in the test.
For some reason despite working for quite some time like this, I now have a branch in which this test fails consistently. It may be a timing/env issue where another process running an agent causes the delete to be successful so the command returns a 0 by chance. Either way this is clearly wrong and fixing it stops the test being flaky in my branch.
Calling twice appears to have no adverse effects, however serves to
confuse as to what the semantics of such code may be! This seems like it
was probably introduced while resolving conflicts during the merge of
the fix for #2404.
The root cause is actually that the agent's streaming HTTP API didn't flush until the first log line was found which commonly was pretty soon since the default level is INFO. In cases where there were no logs immediately due to level for instance, the client gets stuck in the HTTP code waiting on a response packet from the server before we enter the loop that checks the shutdown channel from the signal handler.
This fix flushes the initial status immediately on the streaming endpoint which lets the client code get into it's expected state where it's listening for shutdown or log lines.
The `consul agent` command was ignoring extra command line arguments
which can lead to confusion when the user has for example forgotten to
add a dash in front of an argument or is not using an `=` when setting
boolean flags to `true`. `-bootstrap true` is not the same as
`-bootstrap=true`, for example.
Since all command line flags are known and we don't expect unparsed
arguments we can return an error. However, this may make it slightly
more difficult in the future if we ever wanted to have these kinds of
arguments.
Fixes#3397