Commit Graph

257 Commits

Author SHA1 Message Date
Kyle Havlovitz b21cd112e5 Allow ingress gateways to route traffic based on Host header
This commit adds the necessary changes to allow an ingress gateway to
route traffic from a single defined port to multiple different upstream
services in the Consul mesh.

To do this, we now require all HTTP requests coming into the ingress
gateway to specify a Host header that matches "<service-name>.*" in
order to correctly route traffic to the correct service.

- Differentiate multiple listener's route names by port
- Adds a case in xds for allowing default discovery chains to create a
  route configuration when on an ingress gateway. This allows default
  services to easily use host header routing
- ingress-gateways have a single route config for each listener
  that utilizes domain matching to route to different services.
2020-05-06 15:06:13 -05:00
Freddy c34ee5d339
Watch fallback channel for gateways that do not exist (#7715)
Also ensure that WatchSets in tests are reset between calls to watchFired. 
Any time a watch fires, subsequent calls to watchFired on the same WatchSet
will also return true even if there were no changes.
2020-04-29 16:52:27 -06:00
Freddy f5c1e5268b
TLS Origination for Terminating Gateways (#7671) 2020-04-27 16:25:37 -06:00
freddygv e751b83a3f Clean up dead code, issue addressed by passing ws to serviceGatewayNodes 2020-04-27 11:08:41 -06:00
freddygv 7667567688 Avoid deleting mappings for services linked to other gateways on dereg 2020-04-27 11:08:41 -06:00
freddygv 28fe6920fe Re-fix bug in CheckConnectServiceNodes 2020-04-27 11:08:41 -06:00
freddygv bab101107c Fix ConnectQueryBlocking test 2020-04-27 11:08:40 -06:00
freddygv 65e60d02f1 Fix bug in CheckConnectServiceNodes
Previously, if a blocking query called CheckConnectServiceNodes
before the gateway-services memdb table had any entries,
a nil watchCh would be returned when calling serviceTerminatingGatewayNodes.
This means that the blocking query would not fire if a gateway config entry
was added after the watch started.

In cases where the blocking query started on proxy registration,
the proxy could potentially never become aware of an upstream endpoint
if that upstream was going to be represented by a gateway.
2020-04-27 11:08:40 -06:00
Chris Piraino c5ab43ebbc
Fix bug where non-typical services are associated with gateways (#7662)
On every service registration, we check to see if a service should be
assassociated to a wildcard gateway-service. This fixes an issue where
we did not correctly check to see if the service being registered was a
"typical" service or not.
2020-04-17 11:24:34 -05:00
Kyle Havlovitz 6a5eba63ab
Ingress Gateways for TCP services (#7509)
* Implements a simple, tcp ingress gateway workflow

This adds a new type of gateway for allowing Ingress traffic into Connect from external services.

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-04-16 14:00:48 -07:00
Freddy c1f79c6b3c
Terminating gateway discovery (#7571)
* Enable discovering terminating gateways

* Add TerminatingGatewayServices to state store

* Use GatewayServices RPC endpoint for ingress/terminating
2020-04-08 12:37:24 -06:00
Emre Savcı 7a99f29adc
agent: add len, cap while initializing arrays 2020-04-01 10:54:51 +02:00
Freddy 8a1e53754e
Add config entry for terminating gateways (#7545)
This config entry will be used to configure terminating gateways.

It accepts the name of the gateway and a list of services the gateway will represent.

For each service users will be able to specify: its name, namespace, and additional options for TLS origination.

Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-03-31 13:27:32 -06:00
Kyle Havlovitz 01a23b8eb4
Add config entry/state for Ingress Gateways (#7483)
* Add Ingress gateway config entry and other relevant structs

* Add api package tests for ingress gateways

* Embed EnterpriseMeta into ingress service struct

* Add namespace fields to api module and test consul config write decoding

* Don't require a port for ingress gateways

* Add snakeJSON and camelJSON cases in command test

* Run Normalize on service's ent metadata

Sadly cannot think of a way to test this in OSS.

* Every protocol requires at least 1 service

* Validate ingress protocols

* Update agent/structs/config_entry_gateways.go

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2020-03-31 11:59:10 -05:00
R.B. Boyer a7fb26f50f
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00
Matt Keeler 966d085066
Catalog + Namespace OSS changes. (#7219)
* Various Prepared Query + Namespace things

* Last round of OSS changes for a namespaced catalog
2020-02-10 10:40:44 -05:00
Freddy aca8b85440
Remove outdated TODO (#7244) 2020-02-07 13:14:48 -07:00
Matt Keeler 2524a028ea
OSS Changes for various config entry namespacing bugs (#7226) 2020-02-06 10:52:25 -05:00
Matt Keeler 119168203b
Fix disco chain graph validation for namespaces (#7217)
Previously this happened to be validating only the chains in the default namespace. Now it will validate all chains in all namespaces when the global proxy-defaults is changed.
2020-02-05 10:06:27 -05:00
Hans Hasselberg a9f9ed83cb
agent: increase watchLimit to 8192. (#7200)
The previous value was too conservative and users with many instances
were having problems because of it. This change increases the limit to
8192 which reportedly fixed most of the issues with that.

Related: #4984, #4986, #5050.
2020-02-04 13:11:30 +01:00
Matt Keeler 485a0a65ea
Updates to Config Entries and Connect for Namespaces (#7116) 2020-01-24 10:04:58 -05:00
Matt Keeler c8294b8595
AuthMethod updates to support alternate namespace logins (#7029) 2020-01-14 10:09:29 -05:00
Matt Keeler baa89c7c65
Intentions ACL enforcement updates (#7028)
* Renamed structs.IntentionWildcard to structs.WildcardSpecifier

* Refactor ACL Config

Get rid of remnants of enterprise only renaming.

Add a WildcardName field for specifying what string should be used to indicate a wildcard.

* Add wildcard support in the ACL package

For read operations they can call anyAllowed to determine if any read access to the given resource would be granted.

For write operations they can call allAllowed to ensure that write access is granted to everything.

* Make v1/agent/connect/authorize namespace aware

* Update intention ACL enforcement

This also changes how intention:read is granted. Before the Intention.List RPC would allow viewing an intention if the token had intention:read on the destination. However Intention.Match allowed viewing if access was allowed for either the source or dest side. Now Intention.List and Intention.Get fall in line with Intention.Matches previous behavior.

Due to this being done a few different places ACL enforcement for a singular intention is now done with the CanRead and CanWrite methods on the intention itself.

* Refactor Intention.Apply to make things easier to follow.
2020-01-13 15:51:40 -05:00
R.B. Boyer 20f51f9181 connect: derive connect certificate serial numbers from a memdb index instead of the provider table max index (#7011) 2020-01-09 16:32:19 +01:00
R.B. Boyer 42f80367be
Restore a few more service-kind index updates so blocking in ServiceDump works in more cases (#6948)
Restore a few more service-kind index updates so blocking in ServiceDump works in more cases

Namely one omission was that check updates for dumped services were not
unblocking.

Also adds a ServiceDump state store test and also fix a watch bug with the
normal dump.

Follow-on from #6916
2019-12-19 10:15:37 -06:00
Matt Keeler 9812b32155
Fix blocking for ServiceDumping by kind (#6919) 2019-12-10 13:58:30 -05:00
Matt Keeler 442924c35a
Sync of OSS changes to support namespaces (#6909) 2019-12-09 21:26:41 -05:00
Matt Keeler 609c9dab02
Miscellaneous Fixes (#6896)
Ensure we close the Sentinel Evaluator so as not to leak go routines

Fix a bunch of test logging so that various warnings when starting a test agent go to the ltest logger and not straight to stdout.

Various canned ent meta types always return a valid pointer (no more nils). This allows us to blindly deref + assign in various places.

Update ACL index tracking to ensure oss -> ent upgrades will work as expected.

Update ent meta parsing to include function to disallow wildcarding.
2019-12-06 14:01:34 -05:00
Matt Keeler 90ae4a1f1e
OSS KV Modifications to Support Namespaces 2019-11-25 12:57:35 -05:00
Matt Keeler 68d79142c4
OSS Modifications necessary for sessions namespacing 2019-11-25 12:07:04 -05:00
Matt Keeler 1270a93274
Updates to allow for Namespacing ACL resources in Consul Enterp… (#6675)
Main Changes:

• method signature updates everywhere to account for passing around enterprise meta.
• populate the EnterpriseAuthorizerContext for all ACL related authorizations.
• ACL resource listings now operate like the catalog or kv listings in that the returned entries are filtered down to what the token is allowed to see. With Namespaces its no longer all or nothing.
• Modified the acl.Policy parsing to abstract away basic decoding so that enterprise can do it slightly differently. Also updated method signatures so that when parsing a policy it can take extra ent metadata to use during rules validation and policy creation.

Secondary Changes:

• Moved protobuf encoding functions out of the agentpb package to eliminate circular dependencies.
• Added custom JSON unmarshalers for a few ACL resource types (to support snake case and to get rid of mapstructure)
• AuthMethod validator cache is now an interface as these will be cached per-namespace for Consul Enterprise.
• Added checks for policy/role link existence at the RPC API so we don’t push the request through raft to have it fail internally.
• Forward ACL token delete request to the primary datacenter when the secondary DC doesn’t have the token.
• Added a bunch of ACL test helpers for inserting ACL resource test data.
2019-10-24 14:38:09 -04:00
Matt Keeler f9a43a1e2d
ACL Authorizer overhaul (#6620)
* ACL Authorizer overhaul

To account for upcoming features every Authorization function can now take an extra *acl.EnterpriseAuthorizerContext. These are unused in OSS and will always be nil.

Additionally the acl package has received some thorough refactoring to enable all of the extra Consul Enterprise specific authorizations including moving sentinel enforcement into the stubbed structs. The Authorizer funcs now return an acl.EnforcementDecision instead of a boolean. This improves the overall interface as it makes multiple Authorizers easily chainable as they now indicate whether they had an authoritative decision or should use some other defaults. A ChainedAuthorizer was added to handle this Authorizer enforcement chain and will never itself return a non-authoritative decision.

* Include stub for extra enterprise rules in the global management policy

* Allow for an upgrade of the global-management policy
2019-10-15 16:58:50 -04:00
Matt Keeler 8431c5f533
Add support for implementing new requests with protobufs instea… (#6502)
* Add build system support for protobuf generation

This is done generically so that we don’t have to keep updating the makefile to add another proto generation.

Note: anything not in the vendor directory and with a .proto extension will be run through protoc if the corresponding namespace.pb.go file is not up to date.

If you want to rebuild just a single proto file you can do so with: make proto-rebuild PROTOFILES=<list of proto files to rebuild>

Providing the PROTOFILES var will override the default behavior of finding all the .proto files.

* Start adding types to the agent/proto package

These will be needed for some other work and are by no means comprehensive.

* Add ability to resolve/fixup the agentpb.ACLLinks structure in the state store.

* Use protobuf marshalling of raft requests instead of msgpack for protoc generated types.

This does not change any encoding of existing types.

* Removed structs package automatically encoding with protobuf marshalling

Instead the caller of raftApply that wants to opt-in to protobuf encoding will have to call `raftApplyProtobuf`

* Run update-vendor to fixup modules.txt

Nothing changed as far as dependencies go but the ordering of modules in that file depends on the time they are first seen and its not alphabetical.

* Rename some things and implement the structs.RPCInfo interface bits

agentpb.QueryOptions and agentpb.WriteRequest implement 3 of the 4 RPCInfo funcs and the new TargetDatacenter message type implements the fourth.

* Use the right encoding function.

* Renamed agent/proto package to agent/agentpb to prevent package name conflicts

* Update modules.txt to fix ordering

* Change blockingQuery to take in interfaces for the query options and meta

* Add %T to error output.

* Add/Update some comments
2019-09-20 14:37:22 -04:00
Pierre Souchay 35d90fc899 Display IPs of machines when node names conflict to ease troubleshooting
When there is an node name conflicts, such messages are displayed within Consul:

`consul.fsm: EnsureRegistration failed: failed inserting node: Error while renaming Node ID: "e1d456bc-f72d-98e5-ebb3-26ae80d785cf": Node name node001 is reserved by node 05f10209-1b9c-b90c-e3e2-059e64556d4a with name node001`

While it is easy to find the node that has reserved the name, it is hard to find
the node trying to aquire the name since it is not registered, because it
is not part of `consul members` output

This PR will display the IP of the offender and solve far more easily those issues.
2019-08-28 15:57:05 -04:00
R.B. Boyer 0675e0606e
connect: generate the full SNI names for discovery targets in the compiler rather than in the xds package (#6340) 2019-08-19 13:03:03 -05:00
R.B. Boyer 64fc002e03
connect: fix failover through a mesh gateway to a remote datacenter (#6259)
Failover is pushed entirely down to the data plane by creating envoy
clusters and putting each successive destination in a different load
assignment priority band. For example this shows that normally requests
go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080:

- name: foo
  load_assignment:
    cluster_name: foo
    policy:
      overprovisioning_factor: 100000
    endpoints:
    - priority: 0
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 1.2.3.4
              port_value: 8080
    - priority: 1
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 6.7.8.9
              port_value: 8080

Mesh gateways route requests based solely on the SNI header tacked onto
the TLS layer. Envoy currently only lets you configure the outbound SNI
header at the cluster layer.

If you try to failover through a mesh gateway you ideally would
configure the SNI value per endpoint, but that's not possible in envoy
today.

This PR introduces a simpler way around the problem for now:

1. We identify any target of failover that will use mesh gateway mode local or
   remote and then further isolate any resolver node in the compiled discovery
   chain that has a failover destination set to one of those targets.

2. For each of these resolvers we will perform a small measurement of
   comparative healths of the endpoints that come back from the health API for the
   set of primary target and serial failover targets. We walk the list of targets
   in order and if any endpoint is healthy we return that target, otherwise we
   move on to the next target.

3. The CDS and EDS endpoints both perform the measurements in (2) for the
   affected resolver nodes.

4. For CDS this measurement selects which TLS SNI field to use for the cluster
   (note the cluster is always going to be named for the primary target)

5. For EDS this measurement selects which set of endpoints will populate the
   cluster. Priority tiered failover is ignored.

One of the big downsides to this approach to failover is that the failover
detection and correction is going to be controlled by consul rather than
deferring that entirely to the data plane as with the prior version. This also
means that we are bound to only failover using official health signals and
cannot make use of data plane signals like outlier detection to affect
failover.

In this specific scenario the lack of data plane signals is ok because the
effectiveness is already muted by the fact that the ultimate destination
endpoints will have their data plane signals scrambled when they pass through
the mesh gateway wrapper anyway so we're not losing much.

Another related fix is that we now use the endpoint health from the
underlying service, not the health of the gateway (regardless of
failover mode).
2019-08-05 13:30:35 -05:00
R.B. Boyer 4e2fb5730c
connect: detect and prevent circular discovery chain references (#6246) 2019-08-02 09:18:45 -05:00
Alvin Huang 5b6fa58453 resolve circleci config conflicts 2019-07-23 20:18:36 -04:00
Christian Muehlhaeuser 2602f6907e Simplified code in various places (#6176)
All these changes should have no side-effects or change behavior:

- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
2019-07-20 09:37:19 -04:00
hashicorp-ci 8b109e5f9f Merge Consul OSS branch 'master' at commit ef257b084d2e2a474889518440515e360d0cd990 2019-07-20 02:00:29 +00:00
Christian Muehlhaeuser 26f9368567 Fixed typos in comments (#6175)
Just a few nitpicky typo fixes.
2019-07-19 07:54:53 -04:00
Matt Keeler 3914ec5c62
Various Gateway Fixes (#6093)
* Ensure the mesh gateway configuration comes back in the api within each upstream

* Add a test for the MeshGatewayConfig in the ToAPI functions

* Ensure we don’t use gateways for dc local connections

* Update the svc kind index for deletions

* Replace the proxycfg.state cache with an interface for testing

Also start implementing proxycfg state testing.

* Update the state tests to verify some gateway watches for upstream-targets of a discovery chain.
2019-07-12 17:19:37 -04:00
R.B. Boyer 72a8195839
implement some missing service-router features and add more xDS testing (#6065)
- also implement OnlyPassing filters for non-gateway clusters
2019-07-12 14:16:21 -05:00
Matt Keeler 35a839952b Fix Internal.ServiceDump blocking (#6076)
maxIndexWatchTxn was only watching the IndexEntry of the max index of all the entries. It needed to watch all of them regardless of which was the max.

Also plumbed the query source through in the proxy config to help better track requests.
2019-07-04 16:17:49 +01:00
R.B. Boyer a1900754db
digest the proxy-defaults protocol into the graph (#6050) 2019-07-02 11:01:17 -05:00
Matt Keeler 39bb0e3e77 Implement Mesh Gateways
This includes both ingress and egress functionality.
2019-07-01 16:28:30 -04:00
Matt Keeler 03ccc7c5ae Fix secondary dc connect CA roots watch issue
The general problem was that a the CA config which contained the trust domain was happening outside of the blocking mechanism so if the client started the blocking query before the primary dcs roots had been set then a state trust domain was being pushed down.

This was fixed here but in the future we should probably fixup the CA initialization code to not initialize the CA config twice when it doesn’t need to.
2019-07-01 16:28:30 -04:00
Matt Keeler 24749bc7e5 Implement Kind based ServiceDump and caching of the ServiceDump RPC 2019-07-01 16:28:30 -04:00
R.B. Boyer 686e4606c6
do some initial config entry graph validation during writes (#6047) 2019-07-01 15:23:36 -05:00
R.B. Boyer 3eb1f00371
initial version of L7 config entry compiler (#5994)
With this you should be able to fetch all of the relevant discovery
chain config entries from the state store in one query and then feed
them into the compiler outside of a transaction.

There are a lot of TODOs scattered through here, but they're mostly
around handling fun edge cases and can be deferred until more of the
plumbing works completely.
2019-06-27 13:38:21 -05:00
R.B. Boyer 8850656580
adding new config entries for L7 discovery chain (unused) (#5987) 2019-06-27 12:37:43 -05:00
hashicorp-ci d237e86d83 Merge Consul OSS branch 'master' at commit 88b15d84f9fdb58ceed3dc971eb0390be85e3c15
skip-checks: true
2019-06-25 02:00:26 +00:00
Matt Keeler 93debd2610
Ensure that looking for services by addreses works with Tagged Addresses (#5984) 2019-06-21 13:16:17 -04:00
Aestek 24c29e195b kv: do not trigger watches when setting the same value (#5885)
If a KVSet is performed but does not update the entry, do not trigger
watches for this key.
This avoids releasing blocking queries for KV values that did not
actually changed.
2019-06-18 15:06:29 +02:00
Matt Keeler 4c03f99a85
Fix CAS operations on Services (#5971)
* Fix CAS operations on services

* Update agent/consul/state/catalog_test.go

Co-Authored-By: R.B. Boyer <public@richardboyer.net>
2019-06-17 10:41:04 -04:00
Kyle Havlovitz dcbffdb956
Merge branch 'master' into change-node-id 2019-05-15 10:51:04 -07:00
R.B. Boyer 372bb06c83
acl: a role binding rule for a role that does not exist should be ignored (#5778)
I wrote the docs under this assumption but completely forgot to actually
enforce it.
2019-05-03 14:22:44 -05:00
R.B. Boyer 7d0f729f77
acl: enforce that you cannot persist tokens and roles with missing links except during replication (#5779) 2019-05-02 15:02:21 -05:00
Matt Keeler 26708570c5
Fix ConfigEntryResponse binary marshaller and ensure we watch the chan in ConfigEntry.Get even when no entry exists. (#5773) 2019-05-02 15:25:29 -04:00
R.B. Boyer 5a505c5b3a acl: adding support for kubernetes auth provider login (#5600)
* auth providers
* binding rules
* auth provider for kubernetes
* login/logout
2019-04-26 14:49:25 -05:00
R.B. Boyer 9542fdc9bc acl: adding Roles to Tokens (#5514)
Roles are named and can express the same bundle of permissions that can
currently be assigned to a Token (lists of Policies and Service
Identities). The difference with a Role is that it not itself a bearer
token, but just another entity that can be tied to a Token.

This lets an operator potentially curate a set of smaller reusable
Policies and compose them together into reusable Roles, rather than
always exploding that same list of Policies on any Token that needs
similar permissions.

This also refactors the acl replication code to be semi-generic to avoid
3x copypasta.
2019-04-26 14:49:12 -05:00
R.B. Boyer f43bc981e9 making ACLToken.ExpirationTime a *time.Time value instead of time.Time (#5663)
This is mainly to avoid having the API return "0001-01-01T00:00:00Z" as
a value for the ExpirationTime field when it is not set. Unfortunately
time.Time doesn't respect the json marshalling "omitempty" directive.
2019-04-26 14:48:16 -05:00
R.B. Boyer b3956e511c acl: ACL Tokens can now be assigned an optional set of service identities (#5390)
These act like a special cased version of a Policy Template for granting
a token the privileges necessary to register a service and its connect
proxy, and read upstreams from the catalog.
2019-04-26 14:48:04 -05:00
R.B. Boyer 76321aa952 acl: tokens can be created with an optional expiration time (#5353) 2019-04-26 14:47:51 -05:00
Matt Keeler 3b5d38fb49
Implement config entry replication (#5706) 2019-04-26 13:38:39 -04:00
Kyle Havlovitz d51fd740bf
Merge pull request #5615 from hashicorp/config-entry-rpc
Add RPC endpoints for config entry operations
2019-04-23 00:16:54 -07:00
Matt Keeler ac78c23021
Implement data filtering of some endpoints (#5579)
Fixes: #4222 

# Data Filtering

This PR will implement filtering for the following endpoints:

## Supported HTTP Endpoints

- `/agent/checks`
- `/agent/services`
- `/catalog/nodes`
- `/catalog/service/:service`
- `/catalog/connect/:service`
- `/catalog/node/:node`
- `/health/node/:node`
- `/health/checks/:service`
- `/health/service/:service`
- `/health/connect/:service`
- `/health/state/:state`
- `/internal/ui/nodes`
- `/internal/ui/services`

More can be added going forward and any endpoint which is used to list some data is a good candidate.

## Usage

When using the HTTP API a `filter` query parameter can be used to pass a filter expression to Consul. Filter Expressions take the general form of:

```
<selector> == <value>
<selector> != <value>
<value> in <selector>
<value> not in <selector>
<selector> contains <value>
<selector> not contains <value>
<selector> is empty
<selector> is not empty
not <other expression>
<expression 1> and <expression 2>
<expression 1> or <expression 2>
```

Normal boolean logic and precedence is supported. All of the actual filtering and evaluation logic is coming from the [go-bexpr](https://github.com/hashicorp/go-bexpr) library

## Other changes

Adding the `Internal.ServiceDump` RPC endpoint. This will allow the UI to filter services better.
2019-04-16 12:00:15 -04:00
Kyle Havlovitz 81254deb59 Add RPC endpoints for config entry operations 2019-04-06 23:38:08 -07:00
Kyle Havlovitz d6c25a13a5
Merge pull request #5539 from hashicorp/service-config
Service config state model
2019-04-02 16:34:58 -07:00
Kyle Havlovitz 96a460c0cf Clean up service config state store methods 2019-03-27 16:52:38 -07:00
R.B. Boyer ab57b02ff8
acl: memdb filter of tokens-by-policy was inverted (#5575)
The inversion wasn't noticed because the parallel execution of TokenList
tests was operating incorrectly due to variable shadowing.
2019-03-27 15:24:44 -05:00
Paul Banks 68e8933ba5
Connect: Make Connect health queries unblock correctly (#5508)
* Make Connect health queryies unblock correctly in all cases and use optimal number of watch chans. Fixes #5506.

* Node check test cases and clearer bug test doc

* Comment update
2019-03-21 16:01:56 +00:00
Kyle Havlovitz c2cba68042 Fix fsm serialization and add snapshot/restore 2019-03-20 16:13:13 -07:00
Kyle Havlovitz 9df597b257 Fill out state store/FSM functions and add tests 2019-03-19 15:56:17 -07:00
Kyle Havlovitz 53913461db Add config types and state store table 2019-03-19 10:06:46 -07:00
Kyle Havlovitz bb0839ea5b Condense some test logic and add a comment about renaming 2019-03-18 16:15:36 -07:00
Paul Banks dd08426b04
Optimize health watching to single chan/goroutine. (#5449)
Refs #4984.

Watching chans for every node we touch in a health query is wasteful. In #4984 it shows that if there are more than 682 service instances we always fallback to watching all services which kills performance.

We already have a record in MemDB that is reliably update whenever the service health result should change thanks to per-service watch indexes.

So in general, provided there is at least one service instances and we actually have a service index for it (we always do now) we only ever need to watch a single channel.

This saves us from ever falling back to the general index and causing the performance cliff in #4984, but it also means fewer goroutines and work done for every blocking health query.

It also saves some allocations made during the query because we no longer have to populate a WatchSet with 3 chans per service instance which saves the internal map allocation.

This passes all state store tests except the one that explicitly checked for the fallback behaviour we've now optimized away and in general seems safe.
2019-03-15 20:18:48 +00:00
Kyle Havlovitz 3aec844fd2 Update state store test for changing node ID 2019-03-13 17:05:31 -07:00
Aestek 071fcb28ba [catalog] Update the node's services indexes on update (#5458)
Node updates were not updating the service indexes, which are used for
service related queries. This caused the X-Consul-Index to stay the same
after a node update as seen from a service query even though the node
data is returned in heath queries. If that happened in between queries
the client would miss this change.
We now update the indexes of the services on the node when it is
updated.

Fixes: #5450
2019-03-11 14:48:19 +00:00
Kyle Havlovitz bf09061e86 Add logic to allow changing a failed node's ID 2019-03-07 22:42:54 -08:00
R.B. Boyer 91e78e00c7
fix typos reported by golangci-lint:misspell (#5434) 2019-03-06 11:13:28 -06:00
Matt Keeler 612aba7ced
Dont modify memdb owned token data for get/list requests of tokens (#5412)
Previously we were fixing up the token links directly on the *ACLToken returned by memdb. This invalidated some assumptions that a snapshot is immutable as well as potentially being able to cause a crash.

The fix here is to give the policy link fixing function copy on write semantics. When no fixes are necessary we can return the memdb object directly, otherwise we copy it and create a new list of links.

Eventually we might find a better way to keep those policy links in sync but for now this fixes the issue.
2019-03-04 09:28:46 -05:00
R.B. Boyer d3be5c1d3a fix ignored errors in state store internals as reported by errcheck 2019-03-01 14:18:00 -06:00
R.B. Boyer 57be6ca215 correct some typos 2019-02-13 13:02:12 -06:00
R.B. Boyer 3b60891bf8 reduce the local scope of variable 2019-02-13 11:54:28 -06:00
R.B. Boyer 106d87a4a8
update TestStateStore_ACLBootstrap to not rely upon request mutation (#5335) 2019-02-12 16:09:26 -06:00
Kyle Havlovitz b0f07d9b5e
Merge pull request #4869 from hashicorp/txn-checks
Add node/service/check operations to transaction api
2019-01-22 11:16:09 -08:00
Paul Banks 1c4dfbcd2e
connect: tame thundering herd of CSRs on CA rotation (#5228)
* Support rate limiting and concurrency limiting CSR requests on servers; handle CA rotations gracefully with jitter and backoff-on-rate-limit in client

* Add CSR rate limiting docs

* Fix config naming and add tests for new CA configs
2019-01-22 17:19:36 +00:00
Matt Keeler 2f6a9edfac
Store leaf cert indexes in raft and use for the ModifyIndex on the returned certs (#5211)
* Store leaf cert indexes in raft and use for the ModifyIndex on the returned certs

This ensures that future certificate signings will have a strictly greater ModifyIndex than any previous certs signed.
2019-01-11 16:04:57 -05:00
Aestek ff13518961 Improve blocking queries on services that do not exist (#4810)
## Background

When making a blocking query on a missing service (was never registered, or is not registered anymore) the query returns as soon as any service is updated.
On clusters with frequent updates (5~10 updates/s in our DCs) these queries virtually do not block, and clients with no protections againt this waste ressources on the agent and server side. Clients that do protect against this get updates later than they should because of the backoff time they implement between requests.

## Implementation

While reducing the number of unnecessary updates we still want :
* Clients to be notified as soon as when the last instance of a service disapears.
* Clients to be notified whenever there's there is an update for the service.
* Clients to be notified as soon as the first instance of the requested service is added.

To reduce the number of unnecessary updates we need to block when a request to a missing service is made. However in the following case :

1. Client `client1` makes a query for service `foo`, gets back a node and X-Consul-Index 42
2. `foo` is unregistered 
3. `client1`  makes a query for `foo` with `index=42` -> `foo` does not exist, the query blocks and `client1` is not notified of the change on `foo` 

We could store the last raft index when each service was last alive to know wether we should block on the incoming query or not, but that list could grow indefinetly. 
We instead store the last raft index when a service was unregistered and use it when a query targets a service that does not exist. 
When a service `srv` is unregistered this "missing service index" is always greater than any X-Consul-Index held by the clients while `srv` was up, allowing us to immediatly notify them.

1. Client `client1` makes a query for service `foo`, gets back a node and `X-Consul-Index: 42`
2. `foo` is unregistered, we set the "missing service index" to 43 
3. `client1` makes a blocking query for `foo` with `index=42` -> `foo` does not exist, we check against the "missing service index" and return immediatly with `X-Consul-Index: 43`
4. `client1` makes a blocking query for `foo` with `index=43` -> we block
5. Other changes happen in the cluster, but foo still doesn't exist and "missing service index" hasn't changed, the query is still blocked
6. `foo` is registered again on index 62 -> `foo` exists and its index is greater than 43, we unblock the query
2019-01-11 09:26:14 -05:00
Kyle Havlovitz c266277a49 txn: clean up some state store/acl code 2019-01-09 11:59:23 -08:00
Kyle Havlovitz 8b1dc6a22c txn: fix an issue with querying nodes by name instead of ID 2018-12-12 12:46:33 -08:00
Kyle Havlovitz efcdc85e1a api: add support for new txn operations 2018-12-12 10:54:09 -08:00
Kyle Havlovitz 2408f99cca txn: add tests for RPC endpoint 2018-12-12 10:04:10 -08:00
Kyle Havlovitz 41e8120d3d state: add tests for new txn ops 2018-12-12 10:04:10 -08:00
Kyle Havlovitz a40a346be8 txn: add service operations 2018-12-12 10:04:10 -08:00
Kyle Havlovitz b1aeb3b943 txn: add node operations 2018-12-12 10:04:10 -08:00
Kyle Havlovitz bd6b7ad162 txn: add pre-check operations to txn endpoint 2018-12-12 10:04:10 -08:00
Kyle Havlovitz 8a0d7b65d6 Add check operations to transaction api 2018-12-12 10:04:10 -08:00
Kyle Havlovitz e7946197b8 connect/ca: prevent blank CA config in snapshot
This PR both prevents a blank CA config from being written out to
a snapshot and allows Consul to gracefully recover from a snapshot
with an invalid CA config.

Fixes #4954.
2018-12-06 17:40:53 -08:00