Submitting a job with an ingress gateway in host networking mode
with an absent gateway.proxy block would cause the Nomad client
to panic on NPE.
The consul registration bits would assume the proxy stanza was
not nil, but it could be if the user does not supply any manually
configured envoy proxy settings.
Check the proxy field is not nil before using it.
Fixes#9669
Nomad v1.0.0 introduced a regression where the client configurations
for `connect.sidecar_image` and `connect.gateway_image` would be
ignored despite being set. This PR restores that functionality.
There was a missing layer of interpolation that needs to occur for
these parameters. Since Nomad 1.0 now supports dynamic envoy versioning
through the ${NOMAD_envoy_version} psuedo variable, we basically need
to first interpolate
${connect.sidecar_image} => envoyproxy/envoy:v${NOMAD_envoy_version}
then use Consul at runtime to resolve to a real image, e.g.
envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.16.0
Of course, if the version of Consul is too old to provide an envoy
version preference, we then need to know to fallback to the old
version of envoy that we used before.
envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09
Beyond that, we also need to continue to support jobs that set the
sidecar task themselves, e.g.
sidecar_task { config { image: "custom/envoy" } }
which itself could include teh pseudo envoy version variable.
* fix acl event creation
* allow way to access secretID without exposing it to stream
test that values are omitted
test event creation
test acl events
payloads are pointers
fix failing tests, do all security steps inside constructor
* increase time
* ignore empty tokens
* uncomment line
* changelog
* use full name for events
use evaluation and allocation instead of short name
* update api event stream package and shortnames
* update docs
* make sync; fix typo
* backwards compat not from 1.0.0-beta event stream api changes
* use api types instead of string
* rm backwards compat note that only changed between prereleases
* remove backwards incompat that only existed in prereleases
* upsertaclpolicies
* delete acl policies msgtype
* upsert acl policies msgtype
* delete acl tokens msgtype
* acl bootstrap msgtype
wip unsubscribe on token delete
test that subscriptions are closed after an ACL token has been deleted
Start writing policyupdated test
* update test to use before/after policy
* add SubscribeWithACLCheck to run acl checks on subscribe
* update rpc endpoint to use broker acl check
* Add and use subscriptions.closeSubscriptionFunc
This fixes the issue of not being able to defer unlocking the mutex on
the event broker in the for loop.
handle acl policy updates
* rpc endpoint test for terminating acl change
* add comments
Co-authored-by: Kris Hicks <khicks@hashicorp.com>
* Remove Managed Sinks from Nomad
Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.
* update comment
* changelog
Before, upstreams could only be defined using the default datacenter.
Now, the `datacenter` field can be set in a connect upstream definition,
informing consul of the desire for an instance of the upstream service
in the specified datacenter. The field is optional and continues to
default to the local datacenter.
Closes#8964
The CSIVolume struct "denormalizes" allocations when it's first queried from
the state store. The CSIVolumeByID method on the state store copies the volume
before denormalizing so that we don't end up with unexpected changes. The
copying has some subtle bugs that meant that Allocations (as well as
Topologies and MountOptions) were not getting copied when expected.
Also, ensure we never write allocations attached to volumes to the state store
during claims.
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.
The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.
Closes#9306
The unpublish workflow requires that we know the mode (RW vs RO) if we want to
unpublish the node. Update the hook and the Unpublish RPC so that we mark the
claim for release in a new state but leave the mode alone. This fixes a bug
where RO claims were failing node unpublish.
The core job GC doesn't know the mode, but we don't need it for that workflow,
so add a mode specifically for GC; the volumewatcher uses this as a sentinel
to check whether claims (with their specific RW vs RO modes) need to be claimed.
* Improve managed sink run loop and reloading
resetCh no longer needed
length of buffer equal to count of items, not count of events in each item
update equality fn name, pr feedback
clean up sink manager sink creation
* update test to reflect changes
* bad editor find and replace
* pr feedback
state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling polices in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type
* Process to send events to configured sinks
This PR adds a SinkManager to a server which is responsible for managing
managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the eventstore are
made the sinkmanager and managed sink are responsible for reloading or
starting a new managed sink.
* periodically check in sink progress to raft
Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change
dereference eventsink so we can accurately use the watchch
When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger
* network sink rpc/api plumbing
state store methods and restore
upsert sink test
get sink
delete sink
event sink list and tests
go generate new msg types
validate sink on upsert
* go generate
* consul: advertise cni and multi host interface addresses
* structs: add service/check address_mode validation
* ar/groupservices: fetch networkstatus at hook runtime
* ar/groupservice: nil check network status getter before calling
* consul: comment network status can be nil
The MaxQueryTime value used in QueryOptions.HasTimedOut() can be set to
an invalid value that would throw off how RPC requests are retried.
This fix uses the same logic that enforces the MaxQueryTime bounds in the
blockingRPC() call.
properly wire up durable event count
move newline responsibility
moves newline creation from NDJson to the http handler, json stream only encodes and sends now
ignore snapshot restore if broker is disabled
enable dev mode to access event steam without acl
use mapping instead of switch
use pointers for config sizes, remove unused ttl, simplify closed conn logic
Fixes#9017
The ?resources=true query parameter includes resources in the object
stub listings. Specifically:
- For `/v1/nodes?resources=true` both the `NodeResources` and
`ReservedResources` field are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
included.
The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
* Node Drain events and Node Events (#8980)
Deployment status updates
handle deployment status updates (paused, failed, resume)
deployment alloc health
generate events from apply plan result
txn err check, slim down deployment event
one ndjson line per index
* consolidate down to node event + type
* fix UpdateDeploymentAllocHealth test invocations
* fix test
This Commit adds an /v1/events/stream endpoint to stream events from.
The stream framer has been updated to include a SendFull method which
does not fragment the data between multiple frames. This essentially
treats the stream framer as a envelope to adhere to the stream framer
interface in the UI.
If the `encode` query parameter is omitted events will be streamed as
newline delimted JSON.
As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.
This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.
Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.
Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.
`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.
Addresses #8585#7665
Volumes using attachment mode `file-system` use the CSI filesystem API when
they're mounted, and can be passed mount options. But `block-device` mode
volumes don't have this option. When RPCs are made to plugins, we are silently
dropping the mount options we don't expect to see, but this results in a poor
operator experience when the mount options aren't honored. This changeset
makes passing mount options to a `block-device` volume a validation error.
Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode
were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether
the volume had writer claims available for scheduling.
Extends CSI claim endpoint test to exercise multi-reader and make sure `WriteFreeClaims`
is exercised for multi-writer in feasibility test.