open-nomad

Author	SHA1	Message	Date
Brandon Romano	8c863288ed	Merge pull request #11356 from hashicorp/update-alert-banner Update HashiConf alert-banner expiration	2021-10-20 16:28:30 -07:00
Brandon Romano	5c4f4be3ca	Update HashiConf alert-banner expiration Updates the HashiConf Alert Banner expiration to 10/20 @ 11pm (PT)	2021-10-20 16:02:45 -07:00
Michael Schurter	37a8f27a35	Merge pull request #11331 from shishir-a412ed/init Add support for --init to docker driver.	2021-10-20 10:49:51 -07:00
Michael Schurter	f95f966e8b	Merge pull request #11347 from shishir-a412ed/cleanup Code cleanup: Remove extra if clause.	2021-10-20 09:37:10 -07:00
Mahmood Ali	1de395b42c	Fix preemption panic (#11346 ) Fix a bug where the scheduler may panic when preemption is enabled. The conditions are a bit complicated: a job with higher priority schedules multiple allocations that preempt multiple other allocations on the same node, due to port/network/device assignments. The cause of the bug is incidental mutation of internal cached data. `RankedNode` computes and caches proposed allocations in https://github.com/hashicorp/nomad/blob/v1.1.6/scheduler/rank.go#L42-L53 . But the scheduler then mutates the list to remove pre-emptable allocs in https://github.com/hashicorp/nomad/blob/v1.1.6/scheduler/rank.go#L293-L294, and `RemoveAllocs` mutates the cached slice and sets its tail to `nil`s, triggering a nil-pointer dereference. I fixed the issue by avoiding the mutation in `RemoveAllocs` - the micro-optimization there doesn't seem necessary. Fixes https://github.com/hashicorp/nomad/issues/11342	2021-10-19 20:22:03 -04:00
Shishir Mahajan	dd93f72920	Code cleanup: Remove extra if clause. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2021-10-19 16:52:11 -07:00
Michael Schurter	081cfb85d7	docs: add #11331 to changelog	2021-10-19 16:30:06 -07:00
Michael Schurter	fd68bbc342	test: update tests to properly use AllocDir Also use t.TempDir when possible.	2021-10-19 10:49:07 -07:00
Brandon Romano	4d3bdc0dbf	Merge pull request #11341 from hashicorp/nq.update-alert-banner-hcg2021-live website: Update alert banner for HashiConf	2021-10-19 07:01:04 -07:00
Michael Schurter	d25b60a82d	docs: add #11334 to changelog	2021-10-18 09:22:01 -07:00
Michael Schurter	10c3bad652	client: never embed alloc_dir in chroot Fixes #2522 Skip embedding client.alloc_dir when building chroot. If a user configures a Nomad client agent so that the chroot_env will embed the client.alloc_dir, Nomad will happily infinitely recurse while building the chroot until something horrible happens. The best case scenario is the filesystem's path length limit is hit. The worst case scenario is disk space is exhausted. A bad agent configuration will look something like this: ```hcl data_dir = "/tmp/nomad-badagent" client { enabled = true chroot_env { # Note that the source matches the data_dir "/tmp/nomad-badagent" = "/ohno" # ... } } ``` Note that `/ohno/client` (the state_dir) will still be created but not `/ohno/alloc` (the alloc_dir). While I cannot think of a good reason why someone would want to embed Nomad's client (and possibly server) directories in chroots, there should be no cause for harm. chroots are only built when Nomad runs as root, and Nomad disables running exec jobs as root by default. Therefore even if client state is copied into chroots, it will be inaccessible to tasks. Skipping the `data_dir` and `{client,server}.state_dir` is possible, but this PR attempts to implement the minimum viable solution to reduce risk of unintended side effects or bugs. When running tests as root in a vm without the fix, the following error occurs: ``` === RUN TestAllocDir_SkipAllocDir alloc_dir_test.go:520: Error Trace: alloc_dir_test.go:520 Error: Received unexpected error: Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long Test: TestAllocDir_SkipAllocDir --- FAIL: TestAllocDir_SkipAllocDir (22.76s) ``` Also removed unused Copy methods on AllocDir and TaskDir structs. 
Thanks to @eveld for not letting me forget about this!	2021-10-18 09:22:01 -07:00
Noel Quiles	ef533b6e3b	Update alert banner for HashiConf Final cleanup/closer exp date	2021-10-18 11:52:29 -04:00
James Rasell	2f5f6e0fdd	website: fixup link formatting within interpolation doc.	2021-10-18 12:21:05 +02:00
Andy Assareh	8c638217ac	exactly one of ingress, terminating, or mesh must be configured I believe mesh should be included in this statement; it was omitted.	2021-10-15 14:15:02 -07:00
Shishir Mahajan	d4daef7ebf	Add support for --init to docker driver. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2021-10-15 12:53:25 -07:00
Mahmood Ali	73351c35dd	ease building Linux binaries on macOS (#11329 ) Meant for development purposes only, so one can compile a binary on a macOS host and then start a Docker container or scp the binary to a Linux host easily. The resulting binary is statically linked and has very subtle differences, e.g. static binaries use Go's native network stack, which honors /etc/hosts and /etc/resolv.conf differently from the glibc implementation. In a development environment, I don't expect these to materially change our experience.	2021-10-15 11:12:59 -04:00
Florian Apolloner	c762f64505	Follow up fixes for #11237 (#11260 )	2021-10-14 17:23:38 -04:00
Luiz Aoqui	130970e12e	Merge missing commits from 1.2.0-beta1 release branch (#11319 )	2021-10-14 16:10:05 -04:00
Luiz Aoqui	234aac14a8	Merge release branch (#11317 )	2021-10-14 13:06:04 -04:00
Luiz Aoqui	9d48daed8c	fix `nomad job allocs` command name (#11314 )	2021-10-14 12:44:59 -04:00
Luiz Aoqui	f1fb0987ab	docs: update Nvidia device plugin as external (#11313 )	2021-10-14 12:22:31 -04:00
Dave May	190716b4c6	Remove vendor folder during make clean (#11315 ) * Remove vendor folder during make clean * Add vendor warning to make dev build command	2021-10-14 11:32:19 -04:00
Luiz Aoqui	1bd9db3df0	changelog: add entry for #10796 (#11312 )	2021-10-14 09:01:43 -04:00
James Rasell	444d25db07	Merge pull request #11280 from benbuzbee/log-err Log error if there are no event handlers registered	2021-10-14 14:49:22 +02:00
Mahmood Ali	d5e136b82b	executor: set CpuWeight in cgroup-v2 (#11287 ) Cgroup-v2 uses the `cpu.weight` property instead of cpu shares: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files . And it uses a different range (i.e. `[1, 10000]`) from cpu.shares (i.e. `[2, 262144]`) to make things more interesting. Luckily, libcontainer provides a helper function to perform the conversion: [`ConvertCPUSharesToCgroupV2Value`](https://pkg.go.dev/github.com/opencontainers/runc@v1.0.2/libcontainer/cgroups#ConvertCPUSharesToCgroupV2Value). I have confirmed that docker/libcontainer performs the conversion as well in https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/specconv/spec_linux.go#L536-L541 , and that CpuShares is ignored by libcontainer in https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/cgroups/fs2/cpu.go#L24-L29 .	2021-10-14 08:46:07 -04:00
Luiz Aoqui	536a5751ff	changelog: add entries for #9160 and #11078 (#11290 )	2021-10-14 08:43:36 -04:00
Charlie Voiselle	cb8e52b5df	Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )	2021-10-13 21:23:13 -04:00
Michael Schurter	59fda1894e	Merge pull request #11167 from a-zagaevskiy/master Support configurable dynamic port range	2021-10-13 16:47:38 -07:00
Michael Schurter	e14cd34392	client: improve errors & tests for dynamic ports	2021-10-13 16:25:25 -07:00
Dave May	c37a6ed583	cli: rename paths in debug bundle for clarity (#11307 ) * Rename folders to reflect purpose * Improve captured files test coverage * Rename CSI plugins output file * Add changelog entry * fix test and make changelog message more explicit Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2021-10-13 18:00:55 -04:00
Mahmood Ali	fa4df28fcd	tests: ensure that tests restore env-var values (#11309 ) Fix a test corruption issue, where a test accidentally unsets the `NOMAD_LICENSE` environment variable, that's relied on by some tests. As a habit, tests should always restore the environment variable value on test completion. Golang 1.17 introduced [`t.Setenv`](https://pkg.go.dev/testing#T.Setenv) to address this issue. However, as 1.0.x and 1.1.x branches target golang 1.15 and 1.16, I opted to use a helper function to ease backports.	2021-10-13 17:26:56 -04:00
Dave May	305e8e98bf	cli: Improved autocomplete support for job dispatch and operator debug (#11270 ) * Add autocomplete to nomad job dispatch * Add autocomplete to nomad operator debug * Update incorrect comment * Update test to verify autocomplete * Add changelog * Apply lint suggestions * Create dynamic slices instead of specific length * Align style across predictors	2021-10-12 20:01:54 -04:00
Jorge Marey	2af0422bca	Add os-nova nomad autoscaler repo link (#11277 )	2021-10-12 17:04:58 -04:00
Dave May	2d14c54fa0	debug: Improve namespace and region support (#11269 ) * Include region and namespace in CLI output * Add region and prefix matching for server members * Add namespace and region API outputs to cluster metadata folder * Add region awareness to WaitForClient helper function * Add helper functions for SliceStringHasPrefix and StringHasPrefixInSlice * Refactor test client agent generation * Add tests for region * Add changelog	2021-10-12 16:58:41 -04:00
Florian Apolloner	511cae92b4	Fixed plan diffing to handle non-unique service names. (#10965 )	2021-10-12 16:42:39 -04:00
Luiz Aoqui	c0023c6c85	Update job details box (#11288 )	2021-10-12 16:36:10 -04:00
Dave May	76b05f3cd2	cli: Add nomad job allocs command (#11242 )	2021-10-12 16:30:36 -04:00
Luiz Aoqui	3e0bad5a41	wrap `log` messages with `hclog` (#11291 )	2021-10-12 14:38:44 -04:00
Ben Buzbee	573fb840fa	Log error if there are no event handlers registered We see this error all the time ``` no handler registered for event event.Message=, event.Annotations=, event.Timestamp=0001-01-01T00:00:00Z, event.TaskName=, event.AllocID=, event.TaskID=, ``` So we're handling an event with all default fields. I noted that this can happen if only err is set as in ``` func (d driverPluginClient) handleTaskEvents(reqCtx context.Context, ch chan TaskEvent, stream proto.Driver_TaskEventsClient) { defer close(ch) for { ev, err := stream.Recv() if err != nil { if err != io.EOF { ch <- &TaskEvent{ Err: grpcutils.HandleReqCtxGrpcErr(err, reqCtx, d.doneCtx), } } ``` In this case Err fails to be serialized by the logger, see this test ``` ev := &drivers.TaskEvent{ Err: fmt.Errorf("errz"), } i.logger.Warn("ben test", "event", ev) i.logger.Warn("ben test2", "event err str", ev.Err.Error()) i.logger.Warn("ben test3", "event err", ev.Err) ev.Err = nil i.logger.Warn("ben test4", "nil error", ev.Err) 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.643900Z","driver":"mock_driver","event":{"TaskID":"","TaskName":"","AllocID":"","Timestamp":"0001-01-01T00:00:00Z","Message":"","Annotations":null,"Err":{}}} 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test2","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644226Z","driver":"mock_driver","event err str":"errz"} 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test3","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644240Z","driver":"mock_driver","event err":"errz"} 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test4","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644252Z","driver":"mock_driver","nil error":null} ``` Note in the first example err is set to an empty object and the error is lost. 
What we want is the last two examples which call out the err field explicitly so we can see what it is in this case	2021-10-11 19:44:52 +00:00
Bryce Kalow	679c547aa3	website: upgrade deps to fix search styles (#11294 )	2021-10-11 11:33:59 -05:00
Aleksandr Zagaevskiy	d92666e6a7	fixup! Support configurable dynamic port range	2021-10-11 14:13:59 +03:00
James Rasell	6f3a6f5ccf	Merge pull request #11283 from hashicorp/f-update-hclog-dep deps: update hashicorp/go-hclog to v1.0.0	2021-10-11 08:39:41 +02:00
Jai	563d609118	System Batch UI, Client Status Bar Chart and Client Tab page view (#11078 )	2021-10-07 17:11:38 -04:00
Michael Lange	e69dbe60f3	Merge pull request #11279 from hashicorp/f-ui/storybook-upgrade UI: Storybook upgrade	2021-10-07 09:17:27 -07:00
James Rasell	7200858cca	changelog: add entry for #11283	2021-10-07 08:16:05 +01:00
James Rasell	61a417d7e2	deps: update hashicorp/go-hclog to v1.0.0	2021-10-07 07:48:41 +01:00
Matt Mukerjee	b56432e645	Add FailoverHeartbeatTTL to config (#11127 ) FailoverHeartbeatTTL is the amount of time to wait after a server leader failure before considering reallocating client tasks. This TTL should be fairly long as the new server leader needs to rebuild the entire heartbeat map for the cluster. In deployments with a small number of machines, the default TTL (5m) may be unnecessarily long. Let's allow operators to configure this value in their config files.	2021-10-06 18:48:12 -04:00
Michael Lange	93124622a3	Migrate: New hierarchical separator	2021-10-06 14:05:32 -07:00
Michael Lange	76255ae0ee	Migrate decorator to new file layout	2021-10-06 14:05:32 -07:00
Michael Lange	fd7970cf0d	Override the app rootURL for storybook Hopefully this work gets merged into ember-cli-storybook. For the time being, we get a fork instead.	2021-10-06 14:05:32 -07:00
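
The cgroup-v2 commit above (d5e136b82b) mentions converting a `cpu.shares` value in `[2, 262144]` to a `cpu.weight` value in `[1, 10000]`. A minimal sketch of that kind of linear mapping follows. The function name `sharesToWeight` and the clamping of out-of-range inputs are assumptions made for illustration; this is not Nomad's code, and real code would more likely call the libcontainer helper named in the commit message.

```go
package main

import "fmt"

// sharesToWeight is a hypothetical helper that linearly maps a cgroup-v1
// cpu.shares value in [2, 262144] onto the cgroup-v2 cpu.weight range
// [1, 10000]. Zero is passed through and treated as "not set".
func sharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	// Clamp to the documented cpu.shares range (an assumption of this
	// sketch) so the unsigned subtraction below cannot underflow.
	if shares < 2 {
		shares = 2
	}
	if shares > 262144 {
		shares = 262144
	}
	return 1 + ((shares-2)*9999)/262142
}

func main() {
	for _, s := range []uint64{2, 1024, 262144} {
		fmt.Printf("cpu.shares=%d -> cpu.weight=%d\n", s, sharesToWeight(s))
	}
}
```

With this mapping the endpoints line up (2 maps to 1, 262144 maps to 10000), and the common default of 1024 shares lands at a weight of 39.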

22090 commits
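
The preemption panic fix listed above (1de395b42c) attributes the crash to a filter that reuses a cached slice's backing array and sets its tail to `nil`. The sketch below illustrates that general Go pitfall and the copy-based alternative. The `alloc` type and the `removeInPlace`, `removeCopy`, and `countNil` functions are hypothetical names for illustration only; they are not Nomad's actual `RemoveAllocs` implementation.

```go
package main

import "fmt"

// alloc is a hypothetical stand-in for an allocation record.
type alloc struct{ ID string }

// removeInPlace filters by reusing the input's backing array and nil-ing
// the unused tail. Any other slice that shares the array (for example a
// cached copy of the slice header) now sees shifted and nil entries.
func removeInPlace(allocs []*alloc, remove map[string]bool) []*alloc {
	n := 0
	for _, a := range allocs {
		if !remove[a.ID] {
			allocs[n] = a
			n++
		}
	}
	for i := n; i < len(allocs); i++ {
		allocs[i] = nil
	}
	return allocs[:n]
}

// removeCopy builds a fresh slice and leaves the input untouched.
func removeCopy(allocs []*alloc, remove map[string]bool) []*alloc {
	out := make([]*alloc, 0, len(allocs))
	for _, a := range allocs {
		if !remove[a.ID] {
			out = append(out, a)
		}
	}
	return out
}

// countNil reports how many entries of the slice are nil pointers.
func countNil(allocs []*alloc) int {
	n := 0
	for _, a := range allocs {
		if a == nil {
			n++
		}
	}
	return n
}

func main() {
	remove := map[string]bool{"a": true}

	cached := []*alloc{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	kept := removeInPlace(cached, remove)
	fmt.Println(len(kept), countNil(cached)) // the cached slice now holds a nil

	cached = []*alloc{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	kept = removeCopy(cached, remove)
	fmt.Println(len(kept), countNil(cached)) // the cached slice is unchanged
}
```

The copying version costs one extra allocation per call, which matches the commit's remark that the in-place micro-optimization did not seem necessary.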