Commit graph

29 commits

Author SHA1 Message Date
hc-github-team-consul-core 72409898bc
Backport of [NET-4107][Supportability] Log Level set to TRACE and duration set to 5m for consul-debug into release/1.16.x (#17728)
* backport of commit 9d72a262f357ef0ee6409acae7d3c24d8f87caec
* backport of commit 9b9bb8d206bd68721bc89a9cdf488e8bba38f4d2
* backport of commit ba448092974b8f0439e64e338267835a02480f73

---------

Co-authored-by: Ashesh Vidyut <ashesh.vidyut@hashicorp.com>
2023-06-20 13:48:33 -07:00
Ronald 7a5c8dc1eb
Copyright headers for command folder (#16705)
* copyright headers for agent folder

* Ignore test data files

* fix proto files and remove headers in agent/uiserver folder

* ignore deep-copy files

* copyright headers for agent folder

* Copyright headers for command folder

* fix merge conflicts
2023-03-28 15:12:30 -04:00
Chris S. Kim 652b74dd37
Fix various flaky tests (#16396) 2023-02-23 14:52:18 -05:00
Chris S. Kim a0ac76ecf5
Allow consul debug on non-ACL consul servers (#15155) 2022-10-27 09:25:18 -04:00
Daniel Nephin c6993bda15 debug: update CLI docs
To clarify how trace is captured.

Also remove the minimum seconds check, because that is already done in prepare()
2022-02-15 18:16:12 -05:00
Daniel Nephin 5bd73fc218 debug: limit the size of the trace
We've noticed that a trace that is captured over the full duration is
too large to open on most machines. A trace.out captured over just the
interval period (30s by default) should be a more than enough time to
capture trace data.
2022-02-15 14:15:34 -05:00
Daniel Nephin e2a19b1799 debug: use human readable dates for filenames
The unix timestamps that were used make the debug data a little bit more
difficult to consume. By using human readable dates we can easily see
when the profile data was collected.

This commit also improves the test coverage. Two test cases are removed
and the assertions from those cases are moved to TestDebugCommand.

Now TestDebugCommand is able to validate the contents of all files. This
change reduces the test runtime of the command/debug package by almost
50%. It also makes much more strict assertions about the contents by
using gotest.tools/v3/fs.
2021-08-18 13:06:57 -04:00
Daniel Nephin dccaf95cc8 debug: small cleanup
Use the new WriteJsonFile function to write index.json
Remove .String() from time.local() since that is done by %s
Remove an unused field.
2021-08-18 12:30:59 -04:00
Daniel Nephin 064c43ee69 debug: restore cancel on SigInt
Some previous changes broke interrupting the debug on SigInterupt. This change restores
the original behaviour by passing a context to requests.

Since a new API client function was required to pass the context, I had
it also return an io.ReadCloser, so that output can be streamed to files
instead of fully buffering in process memory.
2021-08-18 12:29:34 -04:00
Daniel Nephin d2f5b4d335 debug: improve a couple of the test cases
Use gotest.tools/v3/fs to make better assertions about the files

Remove the TestAgent from TestDebugCommand_Prepare_ValidateTiming, since we can test that validation
without making any API calls.
2021-08-18 12:29:34 -04:00
Daniel Nephin bf30404412 debug: rename cluster target to members
The API is called members. Using the same name as the API should help describe the contents
of the file.
2021-08-18 12:29:34 -04:00
Daniel Nephin e1eab6509c debug: remove unused 2021-08-18 12:29:33 -04:00
Daniel Nephin cf2e25c6bb http: emit indented JSON in the metrics stream endpoint
To remove the need to decode and re-encode in the CLI
2021-07-26 17:53:33 -04:00
Daniel Nephin d716f709fd debug: use the new metrics stream in debug command 2021-07-26 17:53:32 -04:00
Dhia Ayachi e3dd0f9a44
generate a single debug file for a long duration capture (#10279)
* debug: remove the CLI check for debug_enabled

The API allows collecting profiles even debug_enabled=false as long as
ACLs are enabled. Remove this check from the CLI so that users do not
need to set debug_enabled=true for no reason.

Also:
- fix the API client to return errors on non-200 status codes for debug
  endpoints
- improve the failure messages when pprof data can not be collected

Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com>

* remove parallel test runs

parallel runs create a race condition that fail the debug tests

* snapshot the timestamp at the beginning of the capture

- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist

* Revert "snapshot the timestamp at the beginning of the capture"

This reverts commit c2d03346

* Refactor captureDynamic to extract capture logic for each item in a different func

* snapshot the timestamp at the beginning of the capture

- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist

* Revert "snapshot the timestamp at the beginning of the capture"

This reverts commit c2d03346

* Refactor captureDynamic to extract capture logic for each item in a different func

* extract wait group outside the go routine to avoid a race condition

* capture pprof in a separate go routine

* perform a single capture for pprof data for the whole duration

* add missing vendor dependency

* add a change log and fix documentation to reflect the change

* create function for timestamp dir creation and simplify error handling

* use error groups and ticker to simplify interval capture loop

* Logs, profile and traces are captured for the full duration. Metrics, Heap and Go routines are captured every interval

* refactor Logs capture routine and add log capture specific test

* improve error reporting when log test fail

* change test duration to 1s

* make time parsing in log line more robust

* refactor log time format in a const

* test on log line empty the earliest possible and return

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* rename function to captureShortLived

* more specific changelog

Co-authored-by: Paul Banks <banks@banksco.de>

* update documentation to reflect current implementation

* add test for behavior when invalid param is passed to the command

* fix argument line in test

* a more detailed description of the new behaviour

Co-authored-by: Paul Banks <banks@banksco.de>

* print success right after the capture is done

* remove an unnecessary error check

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* upgraded github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57 => v0.0.0-20210601050228-01bbb1931b22

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Co-authored-by: Paul Banks <banks@banksco.de>
2021-06-07 13:00:51 -04:00
Dhia Ayachi 00f7e0772a
debug: remove the CLI check for debug_enabled (#10273)
* debug: remove the CLI check for debug_enabled

The API allows collecting profiles even debug_enabled=false as long as
ACLs are enabled. Remove this check from the CLI so that users do not
need to set debug_enabled=true for no reason.

Also:
- fix the API client to return errors on non-200 status codes for debug
  endpoints
- improve the failure messages when pprof data can not be collected

Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com>

* remove parallel test runs

parallel runs create a race condition that fail the debug tests

* Add changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-05-27 09:41:53 -04:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
Daniel Nephin 8d35e37b3c testing: Remove all the defer os.Removeall
Now that testutil uses t.Cleanup to remove the directory the caller no longer has to manage
the removal
2020-08-14 19:58:53 -04:00
Daniel Nephin 8b6877febd Remove name from NewTestAgent
Using:

git grep -l 'NewTestAgent(t, t.Name(),' | \
    xargs sed -i -e 's/NewTestAgent(t, t.Name(),/NewTestAgent(t,/g'
2020-03-31 16:13:44 -04:00
Chris Piraino 3dd0b59793
Allow users to configure either unstructured or JSON logging (#7130)
* hclog Allow users to choose between unstructured and JSON logging
2020-01-28 17:50:41 -06:00
rerorero e210b0c854 fix: incorrect struct tag and WaitGroup usage (#6649)
* remove duplicated json tag

* fix: incorrect wait group usage
2019-10-18 13:59:29 -04:00
Christian Muehlhaeuser 2602f6907e Simplified code in various places (#6176)
All these changes should have no side-effects or change behavior:

- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
2019-07-20 09:37:19 -04:00
Alvin Huang 96c2c79908
Add fmt and vet (#5671)
* add go fmt and vet

* go fmt fixes
2019-04-25 12:26:33 -04:00
Jeff Mitchell d3c7d57209
Move internal/ to sdk/ (#5568)
* Move internal/ to sdk/

* Add a readme to the SDK folder
2019-03-27 08:54:56 -04:00
Jeff Mitchell a41c865059
Convert to Go Modules (#5517)
* First conversion

* Use serf 0.8.2 tag and associated updated deps

* * Move freeport and testutil into internal/

* Make internal/ its own module

* Update imports

* Add replace statements so API and normal Consul code are
self-referencing for ease of development

* Adapt to newer goe/values

* Bump to new cleanhttp

* Fix ban nonprintable chars test

* Update lock bad args test

The error message when the duration cannot be parsed changed in Go 1.12
(ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test.

* Update another test as well

* Bump travis

* Bump circleci

* Bump go-discover and godo to get rid of launchpad dep

* Bump dockerfile go version

* fix tar command

* Bump go-cleanhttp
2019-03-26 17:04:58 -04:00
Matt Keeler a34f8c751e
Pass a testing.T into NewTestAgent and TestAgent.Start (#5342)
This way we can avoid unnecessary panics which cause other tests not to run.

This doesn't remove all the possibilities for panics causing other tests not to run, it just fixes the TestAgent
2019-02-14 10:59:14 -05:00
Boris Popovschi 8831b043ab Fixed gziping function for debug archive (#5184) 2019-01-03 10:39:58 -05:00
R.B. Boyer 917488abc2 command/debug: make better use of atomic operations to write out the debug snapshots to disk 2018-11-02 13:13:49 -05:00
Jack Pearkes 197d62c6ca New command: consul debug (#4754)
* agent/debug: add package for debugging, host info

* api: add v1/agent/host endpoint

* agent: add v1/agent/host endpoint

* command/debug: implementation of static capture

* command/debug: tests and only configured targets

* agent/debug: add basic test for host metrics

* command/debug: add methods for dynamic data capture

* api: add debug/pprof endpoints

* command/debug: add pprof

* command/debug: timing, wg, logs to disk

* vendor: add gopsutil/disk

* command/debug: add a usage section

* website: add docs for consul debug

* agent/host: require operator:read

* api/host: improve docs and no retry timing

* command/debug: fail on extra arguments

* command/debug: fixup file permissions to 0644

* command/debug: remove server flags

* command/debug: improve clarity of usage section

* api/debug: add Trace for profiling, fix profile

* command/debug: capture profile and trace at the same time

* command/debug: add index document

* command/debug: use "clusters" in place of members

* command/debug: remove address in output

* command/debug: improve comment on metrics sleep

* command/debug: clarify usage

* agent: always register pprof handlers and protect

This will allow us to avoid a restart of a target agent
for profiling by always registering the pprof handlers.

Given this is a potentially sensitive path, it is protected
with an operator:read ACL and enable debug being
set to true on the target agent. enable_debug still requires
a restart.

If ACLs are disabled, enable_debug is sufficient.

* command/debug: use trace.out instead of .prof

More in line with golang docs.

* agent: fix comment wording

* agent: wrap table driven tests in t.run()
2018-10-19 08:41:03 -07:00