Commit graph

48 commits

Author SHA1 Message Date
Daniel Nephin 52c8b4994b Merge remote-tracking branch 'origin/main' into serve-panic-recovery 2021-12-07 16:30:41 -05:00
R.B. Boyer 80422c0dfe
areas: make the gRPC server tracker network area aware (#11748)
Fixes a bug whereby servers present in multiple network areas would be
properly segmented in the Router, but not in the gRPC mirror. This would
lead servers in the current datacenter leaving from a network area
(possibly during the network area's removal) from deleting their own
records that still exist in the standard WAN area.

The gRPC client stack uses the gRPC server tracker to execute all RPCs,
even those targeting members of the current datacenter (which is unlike
the net/rpc stack which has a bypass mechanism).

This would manifest as a gRPC method call never opening a socket because
it would block forever waiting for the current datacenter's pool of
servers to be non-empty.
2021-12-06 09:55:54 -06:00
Daniel Nephin 056a52ba64 sdk/freeport: rename Port to GetOne
For better consistency with GetN
2021-11-30 17:32:41 -05:00
Daniel Nephin 4f0d092c95 testing: remove unnecessary calls to freeport
Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.

https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.

This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.

This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
2021-11-29 12:19:43 -05:00
Daniel Nephin 20a8e11bf2 testing: use the new freeport interfaces 2021-11-27 15:39:46 -05:00
Giulio Micheloni 10cdc0a5c8
Merge branch 'main' into serve-panic-recovery 2021-11-06 16:12:06 +01:00
Dhia Ayachi 4d763ef9e6
regenerate expired certs (#11462)
* regenerate expired certs

* add documentation to generate tests certificates
2021-11-01 11:40:16 -04:00
Giulio Micheloni b549de831d Restored comment. 2021-10-16 18:05:32 +01:00
Giulio Micheloni a5a4eb9cae Separete test file and no stack trace in ret error 2021-10-16 18:02:03 +01:00
Giulio Micheloni 10814d934e Merge branch 'main' of https://github.com/hashicorp/consul into hashicorp-main 2021-10-16 16:59:32 +01:00
R.B. Boyer a84f5fa25d
grpc: ensure that streaming gRPC requests work over mesh gateway based wan federation (#10838)
Fixes #10796
2021-08-24 16:28:44 -05:00
Giulio Micheloni 387f6f717b Fix merge conflicts 2021-08-22 19:35:08 +01:00
Giulio Micheloni 10b03c3f4e
Merge branch 'main' into serve-panic-recovery 2021-08-22 20:31:11 +02:00
Giulio Micheloni 465e9fecda grpc, xds: recovery middleware to return and log error in case of panic
1) xds and grpc servers:
   1.1) to use recovery middleware with callback that prints stack trace to log
   1.2) callback turn the panic into a core.Internal error
2) added unit test for grpc server
2021-08-22 19:06:26 +01:00
Mike Morris 86d76cb099
deps: upgrade gogo-protobuf to v1.3.2 (#10813)
* deps: upgrade gogo-protobuf to v1.3.2

* go mod tidy using go 1.16

* proto: regen protobufs after upgrading gogo/protobuf

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-08-12 14:05:46 -04:00
Giulio Micheloni 0bf124502e grpc Server: turn panic into error through middleware 2021-08-07 13:21:12 +01:00
R.B. Boyer 254557a1f6
sync changes to oss files made in enterprise (#10670) 2021-07-22 13:58:08 -05:00
Daniel Nephin e226733b26 fix 64-bit aligment for 32-bit platforms
sync/atomic must be used with 64-bit aligned fields, and that alignment is difficult to
ensure unless the field is the first one in the struct.

https://golang.org/pkg/sync/atomic/#pkg-note-BUG.
2021-06-29 16:10:21 -04:00
Daniel Nephin 0dfb7da610 grpc: fix a data race by using a static resolver
We have seen test flakes caused by 'concurrent map read and map write', and the race detector
reports the problem as well (prevent us from running some tests with -race).

The root of the problem is the grpc expects resolvers to be registered at init time
before any requests are made, but we were using a separate resolver for each test.

This commit introduces a resolver registry. The registry is registered as the single
resolver for the consul scheme. Each test uses the Authority section of the target
(instead of the scheme) to identify the resolver that should be used for the test.
The scheme is used for lookup, which is why it can no longer be used as the unique
key.

This allows us to use a lock around the map of resolvers, preventing the data race.
2021-06-02 11:35:38 -04:00
Paul Banks f4257f91f6 Set gRPC keepalives to mirror Yamux keepalive behaviour 2021-04-07 14:09:22 +01:00
Pierre Souchay 542852786c [Streaming][bugfix] handle TLS signalisation when TLS is disabled on client side
Tnis is an alternative to https://github.com/hashicorp/consul/pull/9494
2021-01-06 17:24:58 +01:00
Kit Patella 0b18f5612e trim help strings to save a few bytes 2020-11-16 11:02:11 -08:00
Kit Patella 374748dafc merge master 2020-11-16 10:46:53 -08:00
Kit Patella af719981f3 finish adding static server metrics 2020-11-13 16:26:08 -08:00
Kit Patella b486c1bce8 add the service name in the agent rather than in the definitions themselves 2020-11-13 13:18:04 -08:00
Kit Patella 9533372ded first pass on agent-configured prometheusDefs and adding defs for every consul metric 2020-11-12 18:12:12 -08:00
Daniel Nephin 956bff398a grpc: fix grpc metrics
defaultMetrics was being set at package import time, which meant that it received an instance of
the original default. But lib/telemetry.InitTelemetry sets a new global when it is called.

This resulted in the metrics being sent nowhere.

This commit changes defaultMetrics to be a function, so it will return the global instance when
called. Since it is called after InitTelemetry it will return the correct metrics instance.
2020-11-11 14:27:25 -05:00
Daniel Nephin 40cb72fe06 agent/grpc: add connection count metrics
Gauge metrics are great for understanding the current state, but can somtimes hide problems
if there are many disconnect/reconnects.

This commit adds counter metrics for connections and streams to make it easier to see the
count of newly created connections and streams.
2020-10-27 16:49:49 -04:00
Daniel Nephin 64284ed91a agent/grpc: rename metrics
These new names should make it easier to add counter metics with similar prefixes
2020-10-27 16:49:49 -04:00
Daniel Nephin 6e34759442 agent/grpc: Add an integration test for ClientPool with TLS
Also deregister the resolver.Builder in tests.
2020-10-27 16:34:18 -04:00
Daniel Nephin 87793cd090 agent/grpc: pass metrics to constructor
Instead of referencing a package var. This does not fix the flaky test, but it seems more correct.
2020-10-27 16:34:17 -04:00
Daniel Nephin 70fea7a77e agent/grpc: fix a flaky test by performing more retries
Instead of using retry.Run, which appears to have problems in some cases where it does not
emit an error message, use a for loop.

Increase the number of attempts and remove any sleep, since this operation is not that expensive to do
in a tight loop
2020-10-27 16:34:17 -04:00
Daniel Nephin 9b89fb492d agent/grpc: remove misleading warnings from test output
Handle shutdown properly in tests so that the tests don't warn about using a closed connection.
2020-10-27 16:34:16 -04:00
Daniel Nephin 64105079d9 agent/grpc: fix a flake in TestHandler_EmitsStats 2020-10-27 16:34:16 -04:00
Daniel Nephin 7e338693a8 agent/grpc: use a separate channel for closing the Accept
Closing l.conns can lead to a race and a 'panic: send on closed chan' when a
connection is in the middle of being handled when the server is shutting down.

Found using '-race -count=800'
2020-10-27 16:34:15 -04:00
Daniel Nephin e640d47319 agent/grpc/resolver: namespace the server ID with the DC name
So that if two datacenters end up with overlapping serverIDs we don't send requests to the wrong server
2020-10-27 16:34:15 -04:00
Daniel Nephin 022744699f grpc: fix data rate in stats handler test 2020-10-08 19:43:49 -04:00
Daniel Nephin 371ec2d70a subscribe: add a stateless subscribe service for the gRPC server
With a Backend that provides access to the necessary dependencies.
2020-10-06 12:49:35 -04:00
Daniel Nephin c89eb4ff20 agent/grpc: always close the conn when dialing fails. 2020-09-24 12:53:14 -04:00
Daniel Nephin 87f0eb6790 agent/grpc: seed the rand for shuffling servers 2020-09-24 12:53:14 -04:00
Daniel Nephin ff3610850e agent/grpc: use router.Manager to handle the rebalance
The router.Manager is already rebalancing servers for other connection pools, so it can call into our resolver to do the same.
This change allows us to remove the serf dependency from resolverBuilder, and remove Datacenter from the config.

Also revert the change to refreshServerRebalanceTimer
2020-09-24 12:53:14 -04:00
Daniel Nephin 5e7b9bffdb grpc: restore integration tests for grpc client conn pool
Add a fake rpc Listener
2020-09-24 12:53:14 -04:00
Daniel Nephin 4b041a018d grpc: redeuce dependencies, unexport, and add godoc
Rename GRPCClient to ClientConnPool. This type appears to be more of a
conn pool than a client. The clients receive the connections from this
pool.

Reduce some dependencies by adjusting the interface baoundaries.

Remove the need to create a second slice of Servers, just to pick one and throw the rest away.

Unexport serverResolver, it is not used outside the package.

Use a RWMutex for ServerResolverBuilder, some locking is read-only.

Add more godoc.
2020-09-24 12:53:10 -04:00
Daniel Nephin 4b24470887 grpc: move client conn pool to grpc package 2020-09-24 12:48:12 -04:00
Daniel Nephin 1e40f00567 agent/grpc: make TestHandler_EmitsStats predictable
Occasionally this test would flake. The flakes were fixed by:

1. Stopping the service and retrying to check on metrics. This way we
   also include the active_streams going to 0 in the metric calls.

2. Using a reference to the global Metrics. This way when other tests
   have background goroutines that are still shutting down, they won't
   emit metrics to the metric instance with the fake Sink. The stats
   test can patch the local reference to the global, so the existing
   statHandlers will continue to emit to the global, but the stats
   test will send all metrics to the replacement.
2020-09-14 19:05:22 -04:00
Daniel Nephin c827e7f1e9 grpc: add Datacenter field to testing service response 2020-09-14 19:02:09 -04:00
Daniel Nephin 1a930fcf75 grpc: Add a simple test service for testing the gRPC server 2020-09-08 12:10:43 -04:00
Daniel Nephin 863a9df951 server: add gRPC server for streaming events
Includes a stats handler and stream interceptor for grpc metrics.

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-08 12:10:41 -04:00