Commit Graph

16225 Commits

Author SHA1 Message Date
Dhia Ayachi e3dd0f9a44
generate a single debug file for a long duration capture (#10279)
* debug: remove the CLI check for debug_enabled

The API allows collecting profiles even when debug_enabled=false as long as
ACLs are enabled. Remove this check from the CLI so that users do not
need to set debug_enabled=true for no reason.

Also:
- fix the API client to return errors on non-200 status codes for debug
  endpoints
- improve the failure messages when pprof data cannot be collected

Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com>

* remove parallel test runs

parallel runs create a race condition that fails the debug tests

* snapshot the timestamp at the beginning of the capture

- the timestamp used to create the capture subfolder is snapshotted only at the beginning of the capture and reused for subsequent captures
- captures append to the file if it already exists

* Revert "snapshot the timestamp at the beginning of the capture"

This reverts commit c2d03346

* Refactor captureDynamic to extract capture logic for each item in a different func

* snapshot the timestamp at the beginning of the capture

- the timestamp used to create the capture subfolder is snapshotted only at the beginning of the capture and reused for subsequent captures
- captures append to the file if it already exists

* Revert "snapshot the timestamp at the beginning of the capture"

This reverts commit c2d03346

* Refactor captureDynamic to extract capture logic for each item in a different func

* extract wait group outside the go routine to avoid a race condition
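The race named here is the usual `sync.WaitGroup` pitfall: if `Add` runs inside the goroutine, `Wait` can observe a zero counter and return before the goroutine has registered. A minimal sketch of the safe pattern; `captureAll` is a hypothetical function, not the debug command's code.

```go
package main

import (
	"fmt"
	"sync"
)

// captureAll runs each capture in its own goroutine and waits for all of them.
// wg.Add is called in the parent goroutine, before `go`, so Wait cannot
// return while a goroutine has not yet registered itself.
func captureAll(captures []func() string) []string {
	var wg sync.WaitGroup
	results := make([]string, len(captures))
	for i, capture := range captures {
		wg.Add(1) // racy if moved inside the goroutine below
		go func(i int, capture func() string) {
			defer wg.Done()
			results[i] = capture() // each goroutine writes its own index
		}(i, capture)
	}
	wg.Wait()
	return results
}

func main() {
	out := captureAll([]func() string{
		func() string { return "metrics" },
		func() string { return "heap" },
	})
	fmt.Println(out) // prints "[metrics heap]"
}
```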

* capture pprof in a separate go routine

* perform a single capture for pprof data for the whole duration

* add missing vendor dependency

* add a change log and fix documentation to reflect the change

* create function for timestamp dir creation and simplify error handling

* use error groups and ticker to simplify interval capture loop

* Logs, profile and traces are captured for the full duration. Metrics, Heap and Go routines are captured every interval

* refactor Logs capture routine and add log capture specific test

* improve error reporting when log test fails

* change test duration to 1s

* make time parsing in log line more robust

* refactor log time format in a const

* check for an empty log line as early as possible and return
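The last three items (more robust time parsing, a const for the layout, and an early return on empty lines) fit together roughly as follows. The layout assumes an hclog-style timestamp prefix such as `2021-06-07T13:00:51.123-0400 [INFO]`; both `logTimeFormat` and `parseLogTime` are hypothetical names for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// logTimeFormat is assumed to match the timestamp prefix of agent log lines.
const logTimeFormat = "2006-01-02T15:04:05.000Z0700"

// parseLogTime extracts the leading timestamp from a single log line.
func parseLogTime(line string) (time.Time, error) {
	line = strings.TrimSpace(line)
	if line == "" { // bail out as early as possible on empty lines
		return time.Time{}, errors.New("empty log line")
	}
	// Only the first field is the timestamp; the rest is level, name and message.
	fields := strings.Fields(line)
	return time.Parse(logTimeFormat, fields[0])
}

func main() {
	ts, err := parseLogTime("2021-06-07T13:00:51.123-0400 [INFO]  agent: Consul agent running!")
	fmt.Println(ts, err)
}
```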

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* rename function to captureShortLived

* more specific changelog

Co-authored-by: Paul Banks <banks@banksco.de>

* update documentation to reflect current implementation

* add test for behavior when invalid param is passed to the command

* fix argument line in test

* a more detailed description of the new behaviour

Co-authored-by: Paul Banks <banks@banksco.de>

* print success right after the capture is done

* remove an unnecessary error check

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* upgraded github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57 => v0.0.0-20210601050228-01bbb1931b22

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Co-authored-by: Paul Banks <banks@banksco.de>
2021-06-07 13:00:51 -04:00
allisaurus f2c2809612
docs: Improve ECS routing example nesting (#10316) 2021-06-07 09:28:06 -07:00
Dhia Ayachi 60d51a50d1
fix monitor to only start the monitor in json format when requested (#10358)
* fix monitor to only start the monitor in json format when requested

* add release notes

* add test to validate json format when requested
2021-06-07 12:08:48 -04:00
Mark Anderson ce52d3502c
Docs for Unix Domain Sockets (#10252)
* Docs for Unix Domain Sockets

There are a number of cases where a user might wish to either 1)
expose a service through a Unix Domain Socket in the filesystem
('downstream') or 2) connect to an upstream service by a local unix
domain socket (upstream).
As of Consul 1.10-beta2, we've added new syntax and support for configuring
the Envoy proxy this way.
To connect to a service via local Unix Domain Socket instead of a
port, add local_bind_socket_path and optionally local_bind_socket_mode
to the upstream config for a service:
    upstreams = [
      {
         destination_name = "service-1"
         local_bind_socket_path = "/tmp/socket_service_1"
         local_bind_socket_mode = "0700"
         ...
      }
      ...
    ]
This will cause Envoy to create a socket with the path and mode
provided, and connect that to service-1.
The mode field is optional; if omitted, Envoy's default mode is used.
The mode is not applicable to abstract sockets. See
https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/core/v3/address.proto#envoy-v3-api-msg-config-core-v3-pipe
for details.
NOTE: These options conflict with the local_bind_port and
local_bind_address options. We can bind to a port or we can
bind to a socket, but not both.
To expose a service listening on a Unix Domain socket to the service
mesh use either the 'socket_path' field in the service definition or the
'local_service_socket_path' field in the proxy definition. These
fields are analogous to the 'port' and 'service_port' fields in their
respective locations.
    services {
      name = "service-2"
      socket_path = "/tmp/socket_service_2"
      ...
    }
OR
    proxy {
      local_service_socket_path = "/tmp/socket_service_2"
      ...
    }
There is no mode field since the service is expected to create the
socket it is listening on, not the Envoy proxy.
Again, the socket_path and local_service_socket_path fields conflict
with address/port and local_service_address/local_service_port
configuration entries.
Set up a simple service mesh with dummy services:
    socat -d UNIX-LISTEN:/tmp/downstream.sock,fork UNIX-CONNECT:/tmp/upstream.sock
    socat -v tcp-l:4444,fork exec:/bin/cat
Service definitions in ./consul.d:
    services {
      name = "sock_forwarder"
      id = "sock_forwarder.1"
      socket_path = "/tmp/downstream.sock"
      connect {
        sidecar_service {
          proxy {
            upstreams = [
              {
                destination_name = "echo-service"
                local_bind_socket_path = "/tmp/upstream.sock"
                config {
                  passive_health_check {
                    interval = "10s"
                    max_failures = 42
                  }
                }
              }
            ]
          }
        }
      }
    }
    services {
      name = "echo-service"
      port = 4444
      connect = { sidecar_service {} }
    }
ingress-gateway.hcl:
    Kind = "ingress-gateway"
    Name = "ingress-service"
    Listeners = [
      {
        Port = 8080
        Protocol = "tcp"
        Services = [
          {
            Name = "sock_forwarder"
          }
        ]
      }
    ]
Run the agent, the sidecars and the gateway, then test the connections:
    consul agent -dev -enable-script-checks -config-dir=./consul.d
    consul connect envoy -sidecar-for sock_forwarder.1
    consul connect envoy -sidecar-for echo-service -admin-bind localhost:19001
    consul config write ingress-gateway.hcl
    consul connect envoy -gateway=ingress -register -service ingress-service -address '{{ GetInterfaceIP "eth0" }}:8888' -admin-bind localhost:19002
    netcat 127.0.0.1 4444
    netcat 127.0.0.1 8080

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* fixup Unix capitalization

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Update website/content/docs/connect/registration/service-registration.mdx

Co-authored-by: Blake Covarrubias <blake@covarrubi.as>

* Provide examples in hcl and json

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Apply suggestions from code review

Co-authored-by: Blake Covarrubias <blake@covarrubi.as>

* One more fixup for docs

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

Co-authored-by: Blake Covarrubias <blake@covarrubi.as>
2021-06-04 18:54:31 -07:00
Jeff Escalante dbd278c09c
rotate algolia api key (#10297) 2021-06-04 19:54:05 -04:00
Daniel Nephin 48f388f590 stream: fix a bug with creating a snapshot
The head of the topic buffer was being ignored when creating a snapshot. This commit fixes
the bug by ensuring that the head of the topic buffer is included in the snapshot
before handing it off to the subscription.
2021-06-04 18:33:04 -04:00
Matt Keeler 42007d4a94
Add license inspect command documentation and changelog (#10351)
Also reformatted another changelog entry.
2021-06-04 14:33:13 -04:00
Daniel Nephin 1766aa8a84
Merge pull request #10348 from hashicorp/dnephin/fix-submatview-store-bug
submatview: fix a bug with Store.Get
2021-06-04 12:04:29 -04:00
Daniel Nephin 4fb6c5a137 submatview: fix a bug with Store.Get
When info.Timeout is 0, it should have no timeout. Previously it was using a 0 duration timeout
which caused it to return without waiting.

This bug was masked by using a timeout in the tests. Removing the timeout caused the tests to fail.
2021-06-03 17:48:44 -04:00
Matt Keeler 65b8929acf
Follow on to PR 10336 (#10343)
There was some PR feedback that came in just after I merged that other PR. This addresses that feedback.
2021-06-03 12:29:41 -04:00
Daniel Nephin 58116be6ca
Merge pull request #10338 from hashicorp/dnephin/fix-logging-indent
agent: remove leading whitespace from agent log lines
2021-06-03 12:16:45 -04:00
Paul Ewing e454a9aae0
usagemetrics: add cluster members to metrics API (#10340)
This PR adds cluster members to the metrics API. The number of members per
segment is reported, as well as the total number of members.

Tested by running a multi-node cluster locally and ensuring the numbers were
correct. Also added unit test coverage to add the new expected gauges to
existing test cases.
2021-06-03 08:25:53 -07:00
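The new gauges reduce to counting members per network segment plus an overall total. A minimal sketch; the `member` type and `countMembers` function are hypothetical stand-ins, not the usagemetrics code.

```go
package main

import "fmt"

// member is a hypothetical stand-in for a cluster member and its segment.
type member struct {
	Name    string
	Segment string
}

// countMembers returns the number of members in each segment and the total.
func countMembers(members []member) (map[string]int, int) {
	perSegment := make(map[string]int)
	for _, m := range members {
		perSegment[m.Segment]++
	}
	return perSegment, len(members)
}

func main() {
	per, total := countMembers([]member{
		{"a", ""}, {"b", ""}, {"c", "alpha"},
	})
	fmt.Println(per, total)
}
```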
Matt Keeler fd97cf9ecc
Merge pull request #10336 from hashicorp/docs/licensing-updates
[Docs] Update documentation with information about v1.10 licensing changes.
2021-06-03 10:50:55 -04:00
Matt Keeler 14ffc7331d Add enterprise v1.10 specific upgrade notes. 2021-06-03 10:48:16 -04:00
Matt Keeler 620b88e29a Add licensing information to snapshot agent docs. 2021-06-03 10:48:16 -04:00
Matt Keeler 798e693d5c Add deprecation/removal notices regarding the APIs/CLI commands for licensing that are going away.
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2021-06-03 10:48:16 -04:00
Matt Keeler 0bfbb8e22c Update licensing docs for 1.10 licensing 2021-06-03 10:47:33 -04:00
Matt Keeler fe104ad99c Add licensing telemetry docs. 2021-06-03 10:47:33 -04:00
Daniel Nephin d4c17f29a4 Add changelog 2021-06-02 17:39:30 -04:00
Daniel Nephin b93d63b8d2 cmd: remove unnecessary GatedUi
The intent of this struct was to prevent non-json output to stdout. With
the previous cleanup, this can now be done by simply changing the stdout
stream to io.Discard.

This is just one example of why passing around io.Writers for the
streams is better than the UI interface.
2021-06-02 17:33:20 -04:00
Daniel Nephin e207a5de05 cmd: move agent running message to logs
Previously this line was mixed up with logging, which made the output
quite ugly. Use the logger to output this message, instead of printing
directly to stdout.

This has the advantage that the message will be visible when json logs
are enabled.
2021-06-02 17:17:43 -04:00
Daniel Nephin 63c737017d agent: fix agent logging
Remove the leading whitespace on every log line. This was causing problems for
a customer because their logging system was interpreting the logs as a single
multi-line log.
2021-06-02 17:15:12 -04:00
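Many log collectors treat a line that begins with whitespace as a continuation of the previous record, which would explain why indented agent lines were merged into one multi-line entry. A sketch of that heuristic; it is an assumption about the customer's log system, not Consul code.

```go
package main

import (
	"fmt"
	"strings"
)

// groupRecords joins lines into records, treating any line that starts with
// a space or tab as a continuation of the previous record.
func groupRecords(lines []string) []string {
	var records []string
	for _, line := range lines {
		if len(records) > 0 && (strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t")) {
			records[len(records)-1] += "\n" + line
			continue
		}
		records = append(records, line)
	}
	return records
}

func main() {
	indented := []string{"    [INFO] agent: a", "    [INFO] agent: b"}
	plain := []string{"[INFO] agent: a", "[INFO] agent: b"}
	fmt.Println(len(groupRecords(indented)), len(groupRecords(plain))) // prints "1 2"
}
```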
Daniel Nephin 5086baeb7f cmd: introduce a shim to expose Stdout/Stderr writers
This will allow commands to do the right thing, and write to the proper
output stream.
2021-06-02 16:51:34 -04:00
Daniel Nephin 9f83bc97d6 cmd: remove unnecessary args to agent.New
The version args are static and passed in from the caller. Instead, read
the static values in New.

The shutdownCh was never closed, so it did nothing. Remove it as a field
and an arg.
2021-06-02 16:29:29 -04:00
Daniel Nephin b99f237b3c
Merge pull request #10334 from hashicorp/dnephin/grpc-fix-resolver-data-race
grpc: fix resolver data race
2021-06-02 13:23:27 -04:00
Daniel Nephin 0dfb7da610 grpc: fix a data race by using a static resolver
We have seen test flakes caused by 'concurrent map read and map write', and the race detector
reports the problem as well (preventing us from running some tests with -race).

The root of the problem is that gRPC expects resolvers to be registered at init time
before any requests are made, but we were using a separate resolver for each test.

This commit introduces a resolver registry. The registry is registered as the single
resolver for the consul scheme. Each test uses the Authority section of the target
(instead of the scheme) to identify the resolver that should be used for the test.
The scheme is used for lookup, which is why it can no longer be used as the unique
key.

This allows us to use a lock around the map of resolvers, preventing the data race.
2021-06-02 11:35:38 -04:00
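The registry idea (one resolver registered for the scheme, dispatching on the target's authority under a lock) can be sketched without gRPC. The `resolver` interface, `registry` type and the `consul://<authority>/...` target shape are simplified, hypothetical stand-ins for the real gRPC resolver plumbing.

```go
package main

import (
	"fmt"
	"net/url"
	"sync"
)

// resolver is a hypothetical stand-in for a per-test gRPC resolver.
type resolver interface {
	Addrs() []string
}

type staticResolver []string

func (s staticResolver) Addrs() []string { return s }

// registry is registered once for the "consul" scheme and picks the per-test
// resolver by the target's authority. The mutex guards the map, so concurrent
// registration and lookup no longer race.
type registry struct {
	mu          sync.RWMutex
	byAuthority map[string]resolver
}

func newRegistry() *registry {
	return &registry{byAuthority: make(map[string]resolver)}
}

func (r *registry) Register(authority string, res resolver) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byAuthority[authority] = res
}

func (r *registry) Lookup(target string) (resolver, error) {
	u, err := url.Parse(target) // e.g. consul://test-1/server.dc1
	if err != nil {
		return nil, err
	}
	r.mu.RLock()
	defer r.mu.RUnlock()
	res, ok := r.byAuthority[u.Host]
	if !ok {
		return nil, fmt.Errorf("no resolver registered for authority %q", u.Host)
	}
	return res, nil
}

func main() {
	reg := newRegistry()
	reg.Register("test-1", staticResolver{"10.0.0.1:8300"})
	res, err := reg.Lookup("consul://test-1/server.dc1")
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Addrs())
}
```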
Jimmy Merritello 7628bfcfa8
Fix broken link (#10335) 2021-06-02 09:33:12 -05:00
Jimmy Merritello 41b8fac464
[Website] WIP - Update Homepage (#10314)
* Initial structure for updated homepage

* Bring back <UseCases />

* Add section stubs

* Add ecosystem section

* Add features section

* Iron out features section

* Add Learn Callout section

* Copy updates

* Better together copy

* Add updated copy & swap assets

* Remove comment & just add existing icon for now

* Copy and asset tweaks

* Remove unwanted copy

* Process the codeblock

* Add transparent img

* Swap for transparent img

* More transparent img

* Use Learn cards pattern

* Rearrange img and finishing padding touches
2021-06-02 09:22:52 -05:00
Daniel Nephin 2dcfe4a0d5 submatview: improve a couple comments 2021-06-01 17:49:31 -04:00
Kendall Strautman bc733e1346 chore: updates alert banner — hashiconf 2021-06-01 14:28:08 -07:00
Daniel Nephin 044de812bd
Merge pull request #10325 from hashicorp/docs/clarify-acl-set-agent-token-persistence
docs: Clarify set-agent-token token persistence behavior
2021-06-01 16:47:45 -04:00
Daniel Nephin 6e236f0e49
Merge pull request #10315 from hashicorp/ma/haxandmat-replace
Fix strings.Replace->strings.ReplaceAll
2021-06-01 15:05:47 -04:00
Daniel Nephin 43408eddb7
Merge pull request #10324 from hashicorp/dnephin/fix-envoy-bootstrap-exec
envoy: fix deadlock when input is larger than named pipe buffer size
2021-06-01 13:02:51 -04:00
Dhia Ayachi 9f2f9ac3a5
make tests use a dummy node_name to avoid environment related failures (#10262)
* fix tests to use a dummy nodeName and not fail when hostname is not a valid nodeName

* remove conditional testing

* add test when node name is invalid
2021-06-01 11:58:03 -04:00
Daniel Nephin 3fd67dc611 envoy: improve comments 2021-06-01 11:35:32 -04:00
Daniel Nephin 8385a4f16a
Merge pull request #9556 from hashicorp/dnephin/add-more-cache-key-completness-tests
structs: Add more cache key completeness tests
2021-06-01 11:24:02 -04:00
Blake Covarrubias 035b0646a3 docs: Clarify set-agent-token token persistence behavior
Clarify that tokens configured via `set-agent-token` will not be
persisted if `acl.enable_token_persistence` is `false`.
2021-05-31 16:08:43 -07:00
Daniel Nephin 0a39ba2c54 envoy: fix bootstrap deadlock caused by a full named pipe
Normally the named pipe would buffer up to 64k, but in some cases, when a
soft limit is reached, it will only buffer up to 4k.
In either case, we should not deadlock.

This commit changes the pipe-bootstrap command to first buffer all of
stdin into the process, before trying to write it to the named pipe.
This allows the process memory to act as the buffer, instead of the
named pipe.

Also changed the order of operations in `makeBootstrapPipe`. The new
test added in this PR showed that simply buffering in the process memory
was not enough to fix the issue. We also need to ensure that the
`pipe-bootstrap` process is started before we try to write to its
stdin. Otherwise the write will still block.

Also set stdout/stderr on the subprocess, so that any errors are visible
to the user.
2021-05-31 18:53:17 -04:00
Daniel Nephin 177a504e9f envoy: start timeout func after validation
This removes the need to check arg length in the timeout function.
2021-05-31 17:37:58 -04:00
Daniel Nephin dcf80907a9 structs: fix cache keys
So that requests are cached properly, and the cache does not return the wrong data for a
request.
2021-05-31 17:22:16 -04:00
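The bug class behind these cache key fixes is a key that omits a field affecting the response, so two different requests share one cache entry; the completeness tests guard against it by checking that changing any field changes the key. A sketch with a hypothetical request type, not one of the real `structs` types.

```go
package main

import "fmt"

// serviceRequest is a hypothetical request whose response depends on all fields.
type serviceRequest struct {
	Datacenter  string
	ServiceName string
	Tags        []string
}

// incompleteKey omits Tags, so requests differing only by tag collide.
func (r serviceRequest) incompleteKey() string {
	return r.Datacenter + "/" + r.ServiceName
}

// cacheKey includes every field that changes the response. %q quotes each
// value, so delimiters inside values cannot make two keys ambiguous.
func (r serviceRequest) cacheKey() string {
	return fmt.Sprintf("%q/%q/%q", r.Datacenter, r.ServiceName, r.Tags)
}

func main() {
	a := serviceRequest{"dc1", "web", []string{"v1"}}
	b := serviceRequest{"dc1", "web", []string{"v2"}}
	fmt.Println(a.incompleteKey() == b.incompleteKey(), a.cacheKey() == b.cacheKey()) // prints "true false"
}
```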
Daniel Nephin 857799cd56 structs: add cache completeness tests for two types that implement cache.Request 2021-05-31 16:54:41 -04:00
Daniel Nephin 01790fbcb7 structs: improve the interface of assertCacheInfoKeyIsComplete 2021-05-31 16:54:41 -04:00
Daniel Nephin 9de439f66a structs: Add more cache key tests 2021-05-31 16:54:40 -04:00
Blake Covarrubias 9d333309fe docs: Fix agent token name under ACL Agent Token
Reference the correct name of the agent token in the ACL Agent Token
section for the ACL System docs.
2021-05-31 10:52:15 -07:00
Konstantin Albutov 4434f3386a Fix strings.Replace->strings.ReplaceAll 2021-05-27 16:57:10 -07:00
Dhia Ayachi 0c13f80d5a
RPC Timeout/Retries account for blocking requests (#8978) 2021-05-27 17:29:43 -04:00
Stanko 8ce18e82da ui: Fix broken link format in ECS install page 2021-05-27 14:11:04 -07:00
Bryce Kalow 4b678ad49e
fix(website): update node version to latest LTS for website (#10312) 2021-05-27 15:35:06 -05:00
allisaurus d09ec192d7
Add note about new ECS ARN format to ECS docs (#10304)
* docs: Add note about ECS task ARN format to ECS docs
2021-05-27 10:59:28 -07:00
Matt Keeler b45dd03b8f
Bump raft-autopilot version to the latest. (#10306) 2021-05-27 12:59:14 -04:00