open-nomad

Author	SHA1	Message	Date
Drew Bailey	d62d8a8587	Event sink manager improvements (#9206 ) * Improve managed sink run loop and reloading resetCh no longer needed length of buffer equal to count of items, not count of events in each item update equality fn name, pr feedback clean up sink manager sink creation * update test to reflect changes * bad editor find and replace * pr feedback	2020-11-02 09:21:32 -05:00
Kris Hicks	a98a8253d8	Update subscription filter func (#9232 ) This adds support for specifying a global topic match for a specific key.	2020-10-30 10:07:38 -07:00
Chris Baker	719077a26d	added new policy capabilities for recommendations API state store: call-out to generic update of job recommendations from job update method recommendations API work, and http endpoint errors for OSS support for scaling policies in task block of job spec add query filters for ScalingPolicy list endpoint command: nomad scaling policy list: added -job and -type	2020-10-28 14:32:16 +00:00
Drew Bailey	86080e25a9	Send events to EventSinks (#9171 ) * Process to send events to configured sinks This PR adds a SinkManager to a server which is responsible for managing managed sinks. Managed sinks subscribe to the event broker and send events to a sink writer (webhook). When changes to the eventstore are made the sinkmanager and managed sink are responsible for reloading or starting a new managed sink. * periodically check in sink progress to raft Save progress on the last successfully sent index to raft. This allows a managed sink to resume close to where it left off in the event of a lost server or leadership change dereference eventsink so we can accurately use the watchch When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger	2020-10-26 17:27:54 -04:00
Drew Bailey	1ae39a9ed9	event sink crud operation api (#9155 ) * network sink rpc/api plumbing state store methods and restore upsert sink test get sink delete sink event sink list and tests go generate new msg types validate sink on upsert * go generate	2020-10-23 14:23:00 -04:00
Michael Schurter	c2dd9bc996	core: open source namespaces	2020-10-22 15:26:32 -07:00
Drew Bailey	f3dcefe5a9	remove event durability (#9147 ) * remove event durability temporarily removing go-memdb event durability until a new strategy is developed on how to best handle increased durability needs * drop events table schema and state store methods * fix neweventbuffer invocations	2020-10-22 12:21:03 -04:00
Tim Gross	8459f1ead5	csi: prevent in-use plugin GC from blocking volume GC (#9141 ) During CSI plugin GC, we don't return an error if the volume is in use, because this is not an error condition. If we were to return an error during a `nomad system gc`, we would not continue on to GC volumes. But check for the specific error message fails if the GC is performed on a worker rather than on the leader, due to RPC forwarding wrapping the error message. Use a less specific test so that we don't return an error.	2020-10-21 16:54:28 -04:00
Alexander Shtuchkin	90fd8bb85f	Implement 'batch mode' for persisting allocations on the client. (#9093 ) Fixes #9047, see problem details there. As a solution, we use BoltDB's 'Batch' mode that combines multiple parallel writes into small number of transactions. See https://github.com/boltdb/bolt#batch-read-write-transactions for more information.	2020-10-20 16:15:37 -04:00
Drew Bailey	8451de99b2	adds two base event stream e2e tests (#9126 ) * adds two base event stream e2e tests test evaluation filter keys are included * Apply suggestions from code review Co-authored-by: Tim Gross <tgross@hashicorp.com> * gc aftereach Co-authored-by: Tim Gross <tgross@hashicorp.com>	2020-10-20 08:26:21 -04:00
Drew Bailey	6c788fdccd	Events/msgtype cleanup (#9117 ) * use msgtype in upsert node adds message type to signature for upsert node, update tests, remove placeholder method * UpsertAllocs msg type test setup * use upsertallocs with msg type in signature update test usage of delete node delete placeholder msgtype method * add msgtype to upsert evals signature, update test call sites with test setup msg type handle snapshot upsert eval outside of FSM and ignore eval event remove placeholder upsertevalsmsgtype handle job plan rpc and prevent event creation for plan msgtype cleanup upsertnodeevents updatenodedrain msgtype msg type 0 is a node registration event, so set the default to the ignore type * fix named import * fix signature ordering on upsertnode to match	2020-10-19 09:30:15 -04:00
Drew Bailey	c57e760933	remove special node drain event type rely on standardized events instead of special node drain event	2020-10-15 16:44:36 -04:00
Nick Ethier	4903e5b114	Consul with CNI and host_network addresses (#9095 ) * consul: advertise cni and multi host interface addresses * structs: add service/check address_mode validation * ar/groupservices: fetch networkstatus at hook runtime * ar/groupservice: nil check network status getter before calling * consul: comment network status can be nil	2020-10-15 15:32:21 -04:00
Pierre Cauchois	13218dc345	Enforce bounds on MaxQueryTime (#9064 ) The MaxQueryTime value used in QueryOptions.HasTimedOut() can be set to an invalid value that would throw off how RPC requests are retried. This fix uses the same logic that enforces the MaxQueryTime bounds in the blockingRPC() call.	2020-10-15 08:43:06 -04:00
Michael Schurter	dd09fa1a4a	Merge pull request #9055 from hashicorp/f-9017-resources api: add field filters to /v1/{allocations,nodes}	2020-10-14 14:49:39 -07:00
Drew Bailey	c463479848	filter on additional filter keys, remove switch statement duplication properly wire up durable event count move newline responsibility moves newline creation from NDJson to the http handler, json stream only encodes and sends now ignore snapshot restore if broker is disabled enable dev mode to access event stream without acl use mapping instead of switch use pointers for config sizes, remove unused ttl, simplify closed conn logic	2020-10-14 14:14:33 -04:00
Michael Schurter	8ccbd92cb6	api: add field filters to /v1/{allocations,nodes} Fixes #9017 The ?resources=true query parameter includes resources in the object stub listings. Specifically: - For `/v1/nodes?resources=true` both the `NodeResources` and `ReservedResources` fields are included. - For `/v1/allocations?resources=true` the `AllocatedResources` field is included. The ?task_states=false query parameter removes TaskStates from /v1/allocations responses. (By default TaskStates are included.)	2020-10-14 10:35:22 -07:00
Drew Bailey	684807bddb	namespace filtering	2020-10-14 12:44:43 -04:00
Drew Bailey	fdc576af09	handle txn returning error	2020-10-14 12:44:42 -04:00
Drew Bailey	df96b89958	Add EvictCallbackFn to handle removing entries from go-memdb when they are removed from the event buffer. Wire up event buffer size config, use pointers for structs.Events instead of copying.	2020-10-14 12:44:42 -04:00
Drew Bailey	315f77a301	rehydrate event publisher on snapshot restore address pr feedback	2020-10-14 12:44:41 -04:00
Drew Bailey	d793529d61	event durability count and cfg	2020-10-14 12:44:40 -04:00
Drew Bailey	b4c135358d	use Events to wrap index and events, store in events table	2020-10-14 12:44:39 -04:00
Drew Bailey	9d48818eb8	writetxn can return error, add alloc and job generic events. Add events table for durability	2020-10-14 12:44:39 -04:00
Drew Bailey	400455d302	Events/eval alloc events (#9012 ) * generic eval update event first pass at alloc client update events * api/event client	2020-10-14 12:44:37 -04:00
Drew Bailey	4793bb4e01	Events/deployment events (#9004 ) * Node Drain events and Node Events (#8980) Deployment status updates handle deployment status updates (paused, failed, resume) deployment alloc health generate events from apply plan result txn err check, slim down deployment event one ndjson line per index * consolidate down to node event + type * fix UpdateDeploymentAllocHealth test invocations * fix test	2020-10-14 12:44:37 -04:00
Drew Bailey	a4a2975edf	Event Stream API/RPC (#8947 ) This commit adds an /v1/events/stream endpoint to stream events from. The stream framer has been updated to include a SendFull method which does not fragment the data between multiple frames. This essentially treats the stream framer as an envelope to adhere to the stream framer interface in the UI. If the `encode` query parameter is omitted events will be streamed as newline delimited JSON.	2020-10-14 12:44:36 -04:00
Drew Bailey	207068ca28	Events/event source node (#8918 ) * Node Register/Deregister event sourcing example upsert node with context fill in writetxnwithctx ctx passing to handle event type creation, wip test node deregistration event drop Node from registration event * node batch deregistration	2020-10-14 12:44:35 -04:00
Drew Bailey	4753904b90	Events/cfg enable publisher (#8916 ) * only enable publisher based on config * add default prune tick * back out state abandon changes on fsm close	2020-10-14 12:44:35 -04:00
Drew Bailey	f820744746	abandon current state on server shutdown	2020-10-14 12:44:34 -04:00
Drew Bailey	fddac3af00	Event Buffer Implementation adds an event buffer to hold events from raft changes. update events to use event buffer fix append call provide way to prune buffer items after TTL event publisher tests basic publish test wire up max item ttl rename package to stream, cleanup exploratory work subscription filtering subscription plumbing allow subscribers to consume events, handle closing subscriptions back out old exploratory ctx work fix lint remove unused ctx bits add a few comments fix test stop publisher on abandon	2020-10-14 12:44:34 -04:00
Chris Baker	1d35578bed	removed backwards-compatible/untagged metrics deprecated in 0.7	2020-10-13 20:18:39 +00:00
Seth Hoenig	ed13e5723f	consul/connect: dynamically select envoy sidecar at runtime As newer versions of Consul are released, the minimum version of Envoy it supports as a sidecar proxy also gets bumped. Starting with the upcoming Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the default implementation of Connect sidecar proxy. This PR introduces a change such that each Nomad Client will query its local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545) and then launch the Connect sidecar proxy task using the latest supported version of Envoy. If the `SupportedProxies` API component is not available from Consul, Nomad will fall back to the old version of Envoy supported by old versions of Consul. Setting the meta configuration option `meta.connect.sidecar_image` or setting the `connect.sidecar_task` stanza will take precedence as is the current behavior for sidecar proxies. Setting the meta configuration option `meta.connect.gateway_image` will take precedence as is the current behavior for connect gateways. `meta.connect.sidecar_image` and `meta.connect.gateway_image` may make use of the special `${NOMAD_envoy_version}` variable interpolation, which resolves to the newest version of Envoy supported by the Consul agent. Addresses #8585 #7665	2020-10-13 09:14:12 -05:00
Tim Gross	4335d847a4	Allow job Version to start at non-zero value (#9071 ) Stop coercing version of new job to 0 in the state_store, so that we can add regions to a multi-region deployment. Send new version, rather than existing version, to MRD to accommodate version-choosing logic changes in ENT. Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>	2020-10-12 13:59:48 -04:00
Nick Ethier	d45be0b5a6	client: add NetworkStatus to Allocation (#8657 )	2020-10-12 13:43:04 -04:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019 ) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Tim Gross	9b4917ae5f	csi: volumewatcher only needs one pass to collect past claims If a volume GC and a `nomad volume detach` command land concurrently, we can end up with multiple claims without an allocation, which results in extra no-op work when finding claims to collect as past claims.	2020-10-09 11:03:51 -04:00
Tim Gross	ec1e75d9f4	csi: remove stray TODO comment This item was completed in #8626	2020-10-09 11:03:51 -04:00
Tim Gross	e8c13a2307	csi: validate mount options during volume registration (#9044 ) Volumes using attachment mode `file-system` use the CSI filesystem API when they're mounted, and can be passed mount options. But `block-device` mode volumes don't have this option. When RPCs are made to plugins, we are silently dropping the mount options we don't expect to see, but this results in a poor operator experience when the mount options aren't honored. This changeset makes passing mount options to a `block-device` volume a validation error.	2020-10-08 09:23:21 -04:00
Tim Gross	3ceb5b36b1	csi: allow more than 1 writer claim for multi-writer mode (#9040 ) Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether the volume had writer claims available for scheduling. Extends CSI claim endpoint test to exercise multi-reader and make sure `WriteFreeClaims` is exercised for multi-writer in feasibility test.	2020-10-07 10:43:23 -04:00
Seth Hoenig	0c5ae5769f	Merge pull request #9029 from hashicorp/b-tgs-updates consul/connect: trigger update as necessary on connect changes	2020-10-05 16:48:04 -05:00
Seth Hoenig	f44a4f68ee	consul/connect: trigger update as necessary on connect changes This PR fixes a long standing bug where submitting jobs with changes to connect services would not trigger updates as expected. Previously, service blocks were not considered as sources of destructive updates since they could be synced with consul non-destructively. With Connect, task group services that have changes to their connect block or to the service port should be destructive, since the network plumbing of the alloc is going to need updating. Fixes #8596 #7991 Non-destructive half in #7192	2020-10-05 14:53:00 -05:00
Chris Baker	7f701fddd0	updated docs and validation to further prohibit null chars in region, datacenter, and job name	2020-10-05 18:01:50 +00:00
Chris Baker	23ea7cd27c	updated job validate to refute job/group/task IDs containing null characters updated CHANGELOG and upgrade guide	2020-10-05 18:01:49 +00:00
Chris Baker	c8fd9428d4	documenting tests around null characters in job id, task group name, and task name	2020-10-05 18:01:49 +00:00
Fredrik Hoem Grelland	a015c52846	configure nomad cluster to use a Consul Namespace [Consul Enterprise] (#8849 )	2020-10-02 14:46:36 -04:00
Michael Schurter	765473e8b0	jobspec: lower min cpu resources from 10->1 Since CPU resources are usually a soft limit it is desirable to allow setting it as low as possible to allow tasks to run only in "idle" time. Setting it to 0 is still not allowed to avoid potential unintentional side effects with allowing a zero value. While there may not be any side effects this commit attempts to minimize risk by avoiding the issue. This does not change the defaults.	2020-09-30 12:15:13 -07:00
Luiz Aoqui	88d4eecfd0	add scaling policy type	2020-09-29 17:57:46 -04:00
Seth Hoenig	af9543c997	consul: fix validation of task in group-level script-checks When defining a script-check in a group-level service, Nomad needs to know which task is associated with the check so that it can use the correct task driver to execute the check. This PR fixes two bugs: 1) validate service.task or service.check.task is configured 2) make service.check.task inherit service.task if it is itself unset Fixes #8952	2020-09-28 15:02:59 -05:00
Michael Schurter	9dd59ceaa7	core: improve job deregister error logging Noticed this error in some production logs, and they were far from helpful. Changes: 1. Include job ID in logs 2. Wrap errors and log once instead of double log lines 3. Test fsm error handling behavior	2020-09-21 08:59:03 -07:00


3529 commits