open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	0a19fe3b60	fix multiple overflow errors in exponential backoff (#18200 ) We use capped exponential backoff in several places in the code when handling failures. The code we've copy-and-pasted all over has a check to see if the backoff is greater than the limit, but this check happens after the bitshift and we always increment the number of attempts. This causes an overflow with a fairly small number of failures (ex. at one place I tested it occurs after only 24 iterations), resulting in a negative backoff which then never recovers. The backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC handler or an external API such as Vault. Note this doesn't occur in places where we cap the number of iterations so the loop breaks (usually to return an error), so long as the number of iterations is reasonable. Introduce a helper with a check on the cap before the bitshift to avoid overflow in all places this can occur. Fixes: #18199 Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>	2023-08-15 14:39:09 -04:00
hashicorp-copywrite[bot]	005636afa0	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
James Rasell	4b9bcf94da	chore: remove use of "err" a log line context key for errors. (#14433 ) Log lines which include an error should use the full term "error" as the context key. This provides consistency across the codebase and avoids a Go style which operators might not be aware of.	2022-09-01 15:06:10 +02:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Ben Buzbee	573fb840fa	Log error if there are no event handlers registered We see this error all the time ``` no handler registered for event event.Message=, event.Annotations=, event.Timestamp=0001-01-01T00:00:00Z, event.TaskName=, event.AllocID=, event.TaskID=, ``` So we're handling an even with all default fields. I noted that this can happen if only err is set as in ``` func (d driverPluginClient) handleTaskEvents(reqCtx context.Context, ch chan TaskEvent, stream proto.Driver_TaskEventsClient) { defer close(ch) for { ev, err := stream.Recv() if err != nil { if err != io.EOF { ch <- &TaskEvent{ Err: grpcutils.HandleReqCtxGrpcErr(err, reqCtx, d.doneCtx), } } ``` In this case Err fails to be serialized by the logger, see this test ``` ev := &drivers.TaskEvent{ Err: fmt.Errorf("errz"), } i.logger.Warn("ben test", "event", ev) i.logger.Warn("ben test2", "event err str", ev.Err.Error()) i.logger.Warn("ben test3", "event err", ev.Err) ev.Err = nil i.logger.Warn("ben test4", "nil error", ev.Err) 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.643900Z","driver":"mock_driver","event":{"TaskID":"","TaskName":"","AllocID":"","Timestamp":"0001-01-01T00:00:00Z","Message":"","Annotations":null,"Err":{}}} 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test2","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644226Z","driver":"mock_driver","event err str":"errz"} 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test3","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644240Z","driver":"mock_driver","event err":"errz"} 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test4","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644252Z","driver":"mock_driver","nil error":null} ``` Note in the first example err is set to an empty object and the error is lost. What we want is the last two examples which call out the err field explicitly so we can see what it is in this case	2021-10-11 19:44:52 +00:00
Mahmood Ali	4d90afb425	gofmt all the files mostly to handle build directives in 1.17.	2021-10-01 10:14:28 -04:00
James Rasell	d4a333e9b5	lint: mark false positive or fix gocritic append lint errors.	2021-09-06 10:49:44 +02:00
Mahmood Ali	2d0b80a0ed	Merge pull request #6517 from hashicorp/b-fingerprint-shutdown-race client: don't retry fingerprinting on shutdown	2020-07-24 11:56:32 -04:00
Mahmood Ali	2588b3bc98	cleanup driver eventor goroutines This fixes few cases where driver eventor goroutines are leaked during normal operations, but especially so in tests. This change makes few modifications: First, it switches drivers to use `Context`s to manage shutdown events. Previously, it relied on callers invoking `.Shutdown()` function that is specific to internal drivers only and require casting. Using `Contexts` provide a consistent idiomatic way to manage lifecycle for both internal and external drivers. Also, I discovered few places where we don't clean up a temporary driver instance in the plugin catalog code, where we dispense a driver to inspect and validate the schema config without properly cleaning it up.	2020-05-26 11:04:04 -04:00
Tim Gross	1cf7ef44ed	csi: docstring and log message fixups (#7327 ) Fix some docstring typos and fix noisy log message during client restarts. A log for the common case where the plugin socket isn't ready yet isn't actionable by the operator so having it at info is just noise.	2020-03-23 13:58:30 -04:00
Nick Ethier	d8eed3119d	drivermanager: attempt dispense on reattachment failure	2020-02-15 00:50:06 -05:00
Mahmood Ali	e1b3e208d1	client: don't retry fingerprinting on shutdown At shutdown, driver manager context expires and the fingerprinting channel closes. Thus it is undeterministic which clause of The select statement gets executed, and we may keep retrying until the `i.ctx.Done()` block is executed. Here, we check always check ctx expiration before retrying again.	2019-10-21 08:54:11 -04:00
Mahmood Ali	ab2cae0625	implement client endpoint of nomad exec Add a client streaming RPC endpoint for processing nomad exec tasks, by invoking the relevant task handler for execution.	2019-05-09 16:49:08 -04:00
Mahmood Ali	f74d60439f	client: log detected driver health state Noticed that `detected drivers` log line was misleading - when a driver doesn't fingerprint before timeout, their health status is empty string `""` which we would mark as detected. Now, we log all drivers along with their state to ease driver fingerprint debugging.	2019-04-19 09:15:25 -04:00
Preetha Appan	0e547d29ad	s/mananger/manager	2019-03-04 12:25:54 -06:00
Michael Schurter	f5e0dba9d1	fingerprint: improve initial fingerpint message The initial fingerprint message is actually fairly useful, so I bumped it to Debug and fixed the output formatting.	2019-02-21 15:32:18 -08:00
Nick Ethier	8d7a47340c	drivermanager: don't store nil reattach configs	2019-01-25 23:07:04 -05:00
Michael Schurter	32daa7b47b	goimports until make check is happy	2019-01-23 06:27:14 -08:00
Michael Schurter	be0bab7c3f	move pluginutils -> helper/pluginutils I wanted a different color bikeshed, so I get to paint it	2019-01-22 15:50:08 -08:00
Alex Dadgar	b2c7268843	move reattach config	2019-01-22 15:11:58 -08:00
Alex Dadgar	cdcd3c929c	loader and singleton	2019-01-22 15:11:57 -08:00
Alex Dadgar	6c2782f037	move catalog + grpcutils	2019-01-22 15:11:57 -08:00
Michael Schurter	324e989327	Merge pull request #5034 from hashicorp/test-fix-races Test fix races	2019-01-08 07:04:09 -08:00
Danielle Tomlinson	8df20f49f7	drivers: Add internal interface for Shutdown This allows us to correctly terminate internal state during runs of the nomad test suite, e.g closing eventer contexts correctly.	2019-01-08 13:48:49 +01:00
Alex Dadgar	c9825a9c36	recover	2019-01-07 14:49:40 -08:00
Alex Dadgar	c3f05f2476	Don't log event error on driver shutdown	2019-01-07 14:49:40 -08:00
Michael Schurter	17ed3f27ae	drivermgr: fix race in building driver list	2018-12-19 15:48:02 -08:00
Nick Ethier	6f1777284d	drivermanager: use correct plugin config types	2018-12-18 23:07:01 -05:00
Nick Ethier	a02308ee6a	drivermanager: attempt to reattach and shutdown driver plugin if blocked by allow/block lists	2018-12-18 23:01:57 -05:00
Nick Ethier	ce1a5cba0e	drivermanager: use allocID and task name to route task events	2018-12-18 23:01:51 -05:00
Nick Ethier	bda32f9c79	client/pluginmanager: add plugin manager interface to device/driver managers	2018-12-18 22:56:23 -05:00
Nick Ethier	d8a0265e68	client: batch initial fingerprinting in plugin manangers drivermanager: fix pr comments/feedback	2018-12-18 22:56:19 -05:00
Nick Ethier	7d23cbf448	client/drivermananger: fixup issues from rebase and address PR comments	2018-12-18 22:55:38 -05:00
Nick Ethier	82175d1328	client/drivermananger: add driver manager The driver manager is modeled after the device manager and is started by the client. It's responsible for handling driver lifecycle and reattachment state, as well as processing the incomming fingerprint and task events from each driver. The mananger exposes a method for registering event handlers for task events that is used by the task runner to update the server when a task has been updated with an event. Since driver fingerprinting has been implemented by the driver manager, it is no longer needed in the fingerprint mananger and has been removed.	2018-12-18 22:55:18 -05:00

34 Commits