Commit graph

20789 commits

Author SHA1 Message Date
Tim Gross 64449cddc1 implement alloc runner task restart hook
Most allocation hooks don't need to know when a single task within the
allocation is restarted. The check watcher for group services triggers the
alloc runner to restart all tasks, but the alloc runner's `Restart` method
doesn't trigger any of the alloc hooks, including the group service hook. The
result is that after the first time a check triggers a restart, we'll never
restart the tasks of an allocation again.

This commit adds a `RunnerTaskRestartHook` interface so that alloc runner
hooks can act if a task within the alloc is restarted. The only implementation
is in the group service hook, which will force a re-registration of the
alloc's services and fix check restarts.
2021-01-22 10:55:40 -05:00
Drew Bailey bae0c6cd20
changelog entry for 9768 (#9873) 2021-01-22 09:22:02 -05:00
Drew Bailey 630babb886
prevent double job status update (#9768)
* Prevent Job Statuses from being calculated twice

https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard
2021-01-22 09:18:17 -05:00
Kris Hicks 8f9e47a8e7
Clean up Task Validation tests (#9833)
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2021-01-21 11:53:02 -08:00
Kris Hicks 7694a66414
Don't prepend https to docker cred helper call (#9852)
Some credential helpers, like the ECR helper, will strip the protocol if
given. Others, like the linux "pass" helper, do not.
2021-01-21 11:46:59 -08:00
Charlie Voiselle 4f4d6e6c37
Enable network namespaces for QEMU driver (#9861)
* Enable network namespaces for QEMU driver
* Add CHANGELOG entry
2021-01-21 14:05:46 -05:00
Mahmood Ali 1c6f4549ff
Merge pull request #9868 from hashicorp/e2e-tests-20200121
Deflake some e2e tests
2021-01-21 12:02:52 -05:00
Mahmood Ali 9dcdafe4cf e2e: show command output on failure
When a command fails, it's nice to have the full output, as it contains
diagnostic information. The status code isn't sufficient for debugging.
2021-01-21 10:32:16 -05:00
Mahmood Ali f03e67712a
docs: remove timestamp hcl2 function (#9867)
timestamp isn't actually implemented
2021-01-21 10:29:50 -05:00
Mahmood Ali 923725bf3d e2e: deflake TestVolumeMounts
After submitting an update, the test ought to wait until the new
allocations are placed. Previously, we'd use the original to-be-stopped
allocations and the test fails when attempting to exec.
2021-01-21 10:28:41 -05:00
Mahmood Ali 95b7fc80b8 e2e deflake namespaces: only check namespace jobs
Deflake namespace e2e test by only asserting on jobs related to the
namespace tests. During our e2e tests, some left over jobs (e.g.
prometheus) are left running while being shutdown and cause the test to
fail.
2021-01-21 10:26:24 -05:00
Mahmood Ali 2e8bcac261 e2e: deflake events
Handle streamCh channel being closed.
2021-01-21 10:25:42 -05:00
Buck Doyle 27f73f2b7b
Change to fork of audit to log flaky tests (#9518)
This will report the names of flaky tests instead of just counting them.
2021-01-21 08:25:16 -06:00
Dennis Schön 3eaf1432aa
validate connect block allowed only within group.service 2021-01-20 14:34:23 -05:00
Drew Bailey 3099eb0c73
fix changelog date (#9862) 2021-01-20 13:14:21 -05:00
Seth Hoenig 08e323b753
Merge pull request #9849 from hashicorp/b-cc-ig-id
consul/connect: Enable running multiple ingress gateways per Nomad agent
2021-01-20 10:08:14 -06:00
Seth Hoenig 53218716b3
docs: fix typo in changelog
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-01-20 09:50:59 -06:00
Seth Hoenig 5abaf1b86d consul/connect: ensure proxyID in test case 2021-01-20 09:48:12 -06:00
Seth Hoenig a18e63ed55 client: use closed variable in append 2021-01-20 09:20:50 -06:00
Dennis Schön c376545f49
don't prefix json logging 2021-01-20 09:09:31 -05:00
Marcus Naughton 8fe43cf063 Update server_join.mdx
Fixed Markdown for GCP example.
2021-01-19 16:47:12 -05:00
Seth Hoenig 991884e715 consul/connect: Enable running multiple ingress gateways per Nomad agent
Connect ingress gateway services were being registered into Consul without
an explicit deterministic service ID. Consul would generate one automatically,
but then Nomad would have no way to register a second gateway on the same agent
as it would not supply 'proxy-id' during envoy bootstrap.

Set the ServiceID for gateways, and supply 'proxy-id' when doing envoy bootstrap.

Fixes #9834
2021-01-19 12:58:36 -06:00
Seth Hoenig 1bebf475c6
Merge pull request #9851 from hashicorp/b-proxy-default-connect-timeout
consul/connect: always set gateway proxy default timeout
2021-01-19 12:09:14 -06:00
Seth Hoenig f213b8c51b consul/connect: always set gateway proxy default timeout
If the connect.proxy stanza is left unset, the connection timeout
value is not set but is assumed to be, and may cause a non-fatal NPE
on job submission.
2021-01-19 11:23:41 -06:00
Drew Bailey f42aa2a230
remove invalid duplicate (#9850) 2021-01-19 11:03:06 -05:00
Kris Hicks 8a8b95a119
executor_linux: Remove unreachable PATH= code (#9778)
This has to have been unused because the HasPrefix operation is
backwards, meaning a Command.Env that includes PATH= never would have
worked; the default path was always used.
2021-01-15 11:19:09 -08:00
Drew Bailey 009b8d5363
Persist shared allocated ports for inplace update (#9830)
* Persist shared allocated ports for inplace update

Ports were not copied over when performing inplace updates in the
generic scheduler

* changelog

* drop spew
2021-01-15 12:45:12 -05:00
Seth Hoenig 9263ae8a4a
Merge pull request #9828 from leonardobsjr/changelog-gw-sidecar-mention
Missed improvement regarding IGW task customization
2021-01-15 10:21:33 -06:00
Mahmood Ali 76ce6306a4 add helper for building ami 2021-01-15 10:49:13 -05:00
Mahmood Ali e51651c34a set sha 2021-01-15 10:49:13 -05:00
Mahmood Ali 82637715cf change ami naming 2021-01-15 10:49:12 -05:00
Mahmood Ali 0af1509a77 move config files to terraform 2021-01-15 10:49:12 -05:00
leonardobsjr 16ec2c76e7
Missed improvement regarding IGW improvement
Seems like this was missing from the docs. Was trying to make this work on 1.0.1 (it was on the docs), however, it's only working on 1.0.2.
2021-01-15 12:34:11 -03:00
Shishir Mahajan 661f7df7be Update FSIsolation from none to image. 2021-01-15 08:01:04 -05:00
Drew Bailey 9f56195dc2
bump upgrade guide version (#9822)
* bump upgrade guide version

* drop 1.0.3 until there are upgrade specifics
2021-01-14 16:18:54 -05:00
Kris Hicks d71a90c8a4
Fix some errcheck errors (#9811)
* Throw away result of multierror.Append

When given a *multierror.Error, it is mutated, therefore the return
value is not needed.

* Simplify MergeMultierrorWarnings, use StringBuilder

* Hash.Write() never returns an error

* Remove error that was always nil

* Remove error from Resources.Add signature

When this was originally written it could return an error, but that was
refactored away, and callers of it as of today never handle the error.

* Throw away results of io.Copy during Bridge

* Handle errors when computing node class in test
2021-01-14 12:46:35 -08:00
Kris Hicks 39e369c3bb
csi: Return error when deleting node (#9803)
In this change we'll properly return the error in the
CSIPluginTypeMonolith case (which is the type given in DeleteNode()),
and also return the error when the given ID is not found.

This was found via errcheck.
2021-01-14 12:44:50 -08:00
Kris Hicks 438717500d
gatedwriter: Fix race condition (#9791)
If one thread calls `Flush()` on a gatedwriter while another thread attempts to
`Write()` new data to it, strange things will happen.

The test I wrote shows that at the very least you can write _while_ flushing,
and the call to `Write()` will happen during the internal writes of the
buffered data, which is maybe not what is expected. (i.e. the `Write()`'d data
will be inserted somewhere in the middle of the data being `Flush()'d`)

It's also the case that, because `Write()` only has a read lock, if you had
multiple threads trying to write ("read") at the same time you might have data
loss because the `w.buf` that was read would not necessarily be up-to-date by
the time `p2` was appended to it and it was re-assigned to `w.buf`. You can see
this if you run the new gatedwriter tests with `-race` against the old implementation:

```
WARNING: DATA RACE
Read at 0x00c0000c0420 by goroutine 11:
  runtime.growslice()
      /usr/lib/go/src/runtime/slice.go:125 +0x0
  github.com/hashicorp/nomad/helper/gated-writer.(*Writer).Write()
      /home/hicks/workspace/nomad/helper/gated-writer/writer.go:41 +0x2b6
  github.com/hashicorp/nomad/helper/gated-writer.TestWriter_WithMultipleWriters.func1()
      /home/hicks/workspace/nomad/helper/gated-writer/writer_test.go:90 +0xea
```

This race condition is fixed in this change.
2021-01-14 12:43:14 -08:00
Kris Hicks abb8f2ebc0
Refactor Job.Scale() (#9771) 2021-01-14 12:40:42 -08:00
Kris Hicks f77ffb3b5b
Add missing sink.Cancel() in fsm (#9818) 2021-01-14 12:39:20 -08:00
Drew Bailey 199ec2d91d
bump website version (#9820) 2021-01-14 15:12:39 -05:00
Drew Bailey cdc7f85964
Release 1.0.2 (#9819)
* changelog for release 1.0.2

* Generate files for 1.0.2 release

* Release v1.0.2

* rm generated files, update changelog for next release

* checkout bindata_assetfs

* bump version

Co-authored-by: Nomad Release bot <nomad@hashicorp.com>
2021-01-14 15:08:28 -05:00
Brandon Romano 1087c20cf7
Merge pull request #9805 from hashicorp/br.stack-menu
Website StackMenu updates for 1/14
2021-01-14 09:31:54 -08:00
Mahmood Ali 8eedd8d3d0
ci: only read/modify GO_TAGS field (#9815)
Only lookup GO_TAGS variable, and avoid the false positives where GO_TAGS is a variable suffix.
2021-01-14 08:16:58 -05:00
Drew Bailey 9cd274ba8d
changelogfmt (#9807) 2021-01-13 15:21:17 -05:00
Seth Hoenig f1084b0a84
Merge pull request #9809 from hashicorp/f-use-jobspec2-in-e2eutil
e2e: use jobspec2 Parse for parsing jobfile in e2e utils
2021-01-13 14:14:34 -06:00
Seth Hoenig 536747f216 e2e: use jobspec2 Parse for parsing jobfile in e2e utils
We directly parse job files in e2eutil, but currently using jobspec
package. Instead, use the Parse method from the jobspec2 package so
we can parse job files with new features.
2021-01-13 14:00:40 -06:00
Brandon Romano 2588500bea Website StackMenu updates for 1/14 2021-01-13 10:21:55 -08:00
Nomad Release Bot 6ec555afd1
Release v1.0.2 2021-01-13 17:37:06 +00:00
Nomad Release bot f36d983863 Generate files for 1.0.2 release 2021-01-13 16:52:51 +00:00