* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0
Fixes#16808
* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack
* deps: use go-msgpack v2.0.0
go-msgpack v2.1.0 includes some code changes that we will need to
investigate furthere to assess its impact on Nomad, so keeping this
dependency on v2.0.0 for now since it's no-op.
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
The signature of the `raftApply` function requires that the caller unwrap the
first returned value (the response from `FSM.Apply`) to see if it's an
error. This puts the burden on the caller to remember to check two different
places for errors, and we've done so inconsistently.
Update `raftApply` to do the unwrapping for us and return any `FSM.Apply` error
as the error value. Similar work was done in Consul in
https://github.com/hashicorp/consul/pull/9991. This eliminates some boilerplate
and surfaces a few minor bugs in the process:
* job deregistrations of already-GC'd jobs were still emitting evals
* reconcile job summaries does not return scheduler errors
* node updates did not report errors associated with inconsistent service
discovery or CSI plugin states
Note that although _most_ of the `FSM.Apply` functions return only errors (which
makes it tempting to remove the first return value entirely), there are few that
return `bool` for some reason and Variables relies on the response value for
proper CAS checking.
This PR
- upgrades the serf library
- has the test start the join process using the un-joined server first
- disables schedulers on the servers
- uses the WaitForLeader and wantPeers helpers
Not sure which, if any of these actually improves the flakiness of this test.
Nomad inherited protocol version numbering configuration from Consul and
Serf, but unlike those projects Nomad has never used it. Nomad's
`protocol_version` has always been `1`.
While the code is effectively unused and therefore poses no runtime
risks to leave, I felt like removing it was best because:
1. Nomad's RPC subsystem has been able to evolve extensively without
needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without increment
`API{Major,Minor}Version`. If we want to version the HTTP API in the
future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration
parameter is confusing since `server.raft_protocol` *is* an important
parameter for operators to consider. Even more confusing is that
there is a distinct Serf protocol version which is included in `nomad
server members` output under the heading `Protocol`. `raft_protocol`
is the *only* protocol version relevant to Nomad developers and
operators. The other protocol versions are either deadcode or have
never changed (Serf).
4. If we were to need to version the RPC, HTTP API, or Serf protocols, I
don't think these configuration parameters and variables are the best
choice. If we come to that point we should choose a versioning scheme
based on the use case and modern best practices -- not this 6+ year
old dead code.
PR #11956 implemented a new mTLS RPC check to validate the role of the
certificate used in the request, but further testing revealed two flaws:
1. client-only endpoints did not accept server certificates so the
request would fail when forwarded from one server to another.
2. the certificate was being checked after the request was forwarded,
so the check would happen over the server certificate, not the
actual source.
This commit checks for the desired mTLS level, where the client level
accepts both, a server or a client certificate. It also validates the
cercertificate before the request is forwarded.
When mTLS is enabled, only nomad servers of the region should access the
Raft RPC layer. Clients and servers in other regions should only use the
Nomad RPC endpoints.
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>
The MaxQueryTime value used in QueryOptions.HasTimedOut() can be set to
an invalid value that would throw off how RPC requests are retried.
This fix uses the same logic that enforces the MaxQueryTime bounds in the
blockingRPC() call.
MultiplexV2 is a new connection multiplex header that supports multiplex both
RPC and streaming requests over the same Yamux connection.
MultiplexV2 was added in 0.8.0 as part of
https://github.com/hashicorp/nomad/pull/3892 . So Nomad 0.11 can expect it to
be supported. Though, some more rigorous testing is required before merging
this.
I want to call out some implementation details:
First, the current connection pool reuses the Yamux stream for multiple RPC calls,
and doesn't close them until an error is encountered. This commit doesn't
change it, and sets the `RpcNomad` byte only at stream creation.
Second, the StreamingRPC session gets closed by callers and cannot be reused.
Every StreamingRPC opens a new Yamux session.
Introduce limits to prevent unauthorized users from exhausting all
ephemeral ports on agents:
* `{https,rpc}_handshake_timeout`
* `{http,rpc}_max_conns_per_client`
The handshake timeout closes connections that have not completed the TLS
handshake by the deadline (5s by default). For RPC connections this
timeout also separately applies to first byte being read so RPC
connections with TLS enabled have `rpc_handshake_time * 2` as their
deadline.
The connection limit per client prevents a single remote TCP peer from
exhausting all ephemeral ports. The default is 100, but can be lowered
to a minimum of 26. Since streaming RPC connections create a new TCP
connection (until MultiplexV2 is used), 20 connections are reserved for
Raft and non-streaming RPCs to prevent connection exhaustion due to
streaming RPCs.
All limits are configurable and may be disabled by setting them to `0`.
This also includes a fix that closes connections that attempt to create
TLS RPC connections recursively. While only users with valid mTLS
certificates could perform such an operation, it was added as a
safeguard to prevent programming errors before they could cause resource
exhaustion.
Allows addressing servers with nomad monitor using the servers name or
ID.
Also unifies logic for addressing servers for client_agent_endpoint
commands and makes addressing logic region aware.
rpc getServer test
This ensures that server-to-server streaming RPC calls use the tls
wrapped connections.
Prior to this, `streamingRpcImpl` function uses tls for setting header
and invoking the rpc method, but returns unwrapped tls connection.
Thus, streaming writes fail with tls errors.
This tls streaming bug existed since 0.8.0[1], but PR #5654[2]
exacerbated it in 0.9.2. Prior to PR #5654, nomad client used to
shuffle servers at every heartbeat -- `servers.Manager.setServers`[3]
always shuffled servers and was called by heartbeat code[4]. Shuffling
servers meant that a nomad client would heartbeat and establish a
connection against all nomad servers eventually. When handling
streaming RPC calls, nomad servers used these local connection to
communicate directly to the client. The server-to-server forwarding
logic was left mostly unexercised.
PR #5654 means that a nomad client may connect to a single server only
and caused the server-to-server forward streaming RPC code to get
exercised more and unearthed the problem.
[1] https://github.com/hashicorp/nomad/blob/v0.8.0/nomad/rpc.go#L501-L515
[2] https://github.com/hashicorp/nomad/pull/5654
[3] https://github.com/hashicorp/nomad/blob/v0.9.1/client/servers/manager.go#L198-L216
[4] https://github.com/hashicorp/nomad/blob/v0.9.1/client/client.go#L1603
Here, we ensure that when leader only responds to RPC calls when state
store is up to date. At leadership transition or launch with restored
state, the server local store might not be caught up with latest raft
logs and may return a stale read.
The solution here is to have an RPC consistency read gate, enabled when
`establishLeadership` completes before we respond to RPC calls.
`establishLeadership` is gated by a `raft.Barrier` which ensures that
all prior raft logs have been applied.
Conversely, the gate is disabled when leadership is lost.
This is very much inspired by https://github.com/hashicorp/consul/pull/3154/files
Multiplexer continues to create rpc connections even when
the context which is passed to the underlying rpc connections
is cancelled by the server.
This was causing #4413 - when a SIGHUP causes everything to reload,
it uses context to cancel the underlying http/rpc connections
so that they may come up with the new configuration.
The multiplexer was not being cancelled properly so it would
continue to create rpc connections and constantly fail,
causing communication issues with other nomad agents.
Fixes#4413
This PR introduces an ack allowing the receiving end of the streaming
RPC to return any error that may have occured during the establishment
of the streaming RPC.