Commit graph

13392 commits

Author SHA1 Message Date
Mahmood Ali c5de71a424 Allow nullable fields in StatValues
In state values, we need to be able to distinguish between zero values
(e.g. `false`) and unset values (e.g. `nil`).

We can alternatively use protobuf `oneOf` and nested map to ensure
consistency of fields that are set together, but the golang
representation does not represent that well and introducing a mismatch
between representations.  Thus, I opted not to use it.
2018-11-14 14:41:19 -05:00
Mahmood Ali 713c9fe683 Move Stat{Object|Value} to plugins/shared/structs
Moving them as they may be useful for other packages/plugins besides
devices.
2018-11-14 09:01:26 -05:00
Mahmood Ali 1f4db08f42 Regenerate proto files with protoc-gen-go@v1.2.0 2018-11-14 09:01:26 -05:00
Mahmood Ali a4a9347501 fix comment typos 2018-11-14 08:36:14 -05:00
Omar Khawaja 0b9f32c6d2
AWS sandbox environment upgrade (#4873)
* upgrade Nomad from 0.8.4 to 0.8.6

* update deprecated nomad and vault commands

* update AMI ID

* add ingress rule for default fabio port and fabio UI

* upgrade Consul and Vault versions

* update AMI ID in README.md and terraform.tfvars
2018-11-13 23:21:01 -05:00
Danielle Tomlinson 0917e93537
Merge pull request #4869 from hashicorp/b-executor-stdout
executor: Fix stdout stderr copy/paste
2018-11-13 19:22:37 -08:00
Jen 78362b5a83 Update open graph image 2018-11-13 17:36:13 -05:00
Danielle Tomlinson e5c641daa9 scheduler: Allow comparisons of nil values
This commit allows the ConstraintChecker to test values that do not exist.
This is useful when wanting to _exclude_ given nodes from executing a
job, for example, if you wanted to give canary nodes an attribute, and
not run critical services on them, you may specify something like the
below, but not want to tag all other nodes with the inverse.

```hcl
constraint {
  attribute = "${node.attr.canary}
  operator = "!="
  value = "1"
}
```

This also requires all constraint checkers to allow for nil target
values, as they will no longer be short circuited by resolving a target.
2018-11-13 13:36:51 -08:00
Mahmood Ali 1e92161f14
Merge pull request #4858 from hashicorp/b-fix-master-20181109
Fix some tests in master
2018-11-13 16:08:26 -05:00
Alex Dadgar 08dc2ea702
Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Alex Dadgar 17e8446484
Merge pull request #4868 from hashicorp/b-plugin-ctx
Plugin client's handle plugin dying
2018-11-13 10:26:53 -08:00
Mahmood Ali 683a74d153 Ignore apt-get update failures in CI
We run with ~120 apt sources, and apt-get update fails if any of them is
down.

True errors would be raised again at install phase as true dependencies
fetch would fail.
;
2018-11-13 10:21:40 -05:00
Mahmood Ali 356c194acc Use materialized duration fields for driver config 2018-11-13 10:21:40 -05:00
Mahmood Ali 865419e756 convert all config durations to strings in tests 2018-11-13 10:21:40 -05:00
Mahmood Ali 470d20cdf3 Avoid downloading image if present locally 2018-11-13 10:21:40 -05:00
Mahmood Ali ac3b4571eb Address review comments 2018-11-13 10:21:40 -05:00
Mahmood Ali 69f26783e4 avoid setting resource limit on rkt command
Was accidentally modified in 5b14d24bf4626bab420d00783d92bcf25e0b641e .
2018-11-13 10:21:40 -05:00
Mahmood Ali fa146d9b85 fix plugin test 2018-11-13 10:21:40 -05:00
Mahmood Ali 66f4c23848 increase timeout to 30 minutes
nomad/client take very long and exceed 15m sometimes:

In https://travis-ci.org/hashicorp/nomad/jobs/452990197 :

```
panic: test timed out after 15m0s

goroutine 4739 [running]:
testing.(*M).startAlarm.func1()
	/home/travis/.gimme/versions/go1.11.2.linux.amd64/src/testing/testing.go:1296 +0xfd
....
goroutine 4665 [select]:
github.com/hashicorp/nomad/vendor/google.golang.org/grpc.newClientStream.func5(0xc0003dd500, 0xc000420120, 0x2b3f86295588, 0xc000496810)
	/home/travis/gopath/src/github.com/hashicorp/nomad/vendor/google.golang.org/grpc/stream.go:287 +0xd7
created by github.com/hashicorp/nomad/vendor/google.golang.org/grpc.newClientStream
	/home/travis/gopath/src/github.com/hashicorp/nomad/vendor/google.golang.org/grpc/stream.go:286 +0x842
FAIL	github.com/hashicorp/nomad/client/driver	900.036s
```
2018-11-13 10:21:40 -05:00
Mahmood Ali 8fa26f5521 Fix docker log fetching in tests
We no longer use syslog for tracking logs so tracking them explicitly
here
2018-11-13 10:21:40 -05:00
Mahmood Ali 88fa968623 killing should be done with wait client
Incidentally changed in 5b14d24bf4626bab420d00783d92bcf25e0b641e
2018-11-13 10:21:40 -05:00
Mahmood Ali 7690f389a0 Prioritize checking consumer context cancellation
Tests expect that as soon as eventer shuts down immediately on context
cancellations; but golang does not guarantee priority when multiple
pending channels are ready in a select statement.
2018-11-13 10:21:40 -05:00
Mahmood Ali c62ec124c0 Set clean config for mock driver
The default job here contains some exec task config (for setting
command and args) that aren't used for mock driver.  Now, the alloc
runner seems stricter about validating fields and errors on unexpected
fields.

Updating configs in tests so we can have an explicit task config
whenever driver is set explicitly.
2018-11-13 10:21:40 -05:00
Mahmood Ali c7610d8c22 mark and skip failing consul failing tests 2018-11-13 10:21:40 -05:00
Mahmood Ali e5e6f9a785 Update Docker name parsing lookup
`ParseNamed` function changed in e9f3f2cfee9d729a8642344c4fa4ea70b2d49468
where became `ParsedNormalizedName` with extra checks.
2018-11-13 10:21:40 -05:00
Mahmood Ali e9067e52b4 pull alpine image needed for test
The test requires the image to be present locally, so importing it as
part of setup.
2018-11-13 10:21:40 -05:00
Mahmood Ali 4e18846fd9 Adjust streaming duration
This test expects 11 repeats of the same message emitted at intervals of
200ms; so we need more than 2 seconds to adjust for time sleep
variations and the like.  So raising it to 3s here that should be
enough.
2018-11-13 10:21:40 -05:00
Mahmood Ali 8923ea4663 Handle time.Duration in mock
Mock driver config uses `time.Duration` fields but we initialize them
inconsistently, as time.Duration sometimes and as duration strings other
times.  Previously, `mapstructure` handles it and does the right thing.

This is no longer the case with MsgPack.  I could not find a good way to
bring back old behavior without too much complexity.  `MsgPack` extended
types weren't ideal here as we lose type information (e.g. int64 vs
string), and the input is a generic map and not a MsgPack serialization
of duration.

As such, I went with the simple solution of declaring the config field
as duration string, and panicing if the test doesn't pass a valid
string.

I found this to cause the smallest change in tests, but we can
alternatively force all to be int64 instead.
2018-11-13 10:21:40 -05:00
Mahmood Ali fb56dd699d distinguish java driver tests from others 2018-11-13 10:21:40 -05:00
Mahmood Ali 343df28165 Fix java driver tests 2018-11-13 10:21:40 -05:00
Mahmood Ali 03148e3167
Merge pull request #4871 from hashicorp/r-job-rerun-bug-changelog
changelog job re-run fix
2018-11-13 10:18:54 -05:00
Mahmood Ali 1403ad21b9 Changelog job re-run fix 2018-11-13 07:52:51 -05:00
Danielle Tomlinson bfeded1f30 executor: Fix stdout stderr copy/paste 2018-11-12 22:08:04 -08:00
Mahmood Ali e2d668f21c
Merge pull request #4861 from hashicorp/b-batch-deregister-transaction
Run job deregistering in a single transaction
2018-11-12 20:59:44 -05:00
Mahmood Ali e506ebbc24
Merge pull request #4845 from hashicorp/r-exec-refactor
Update exec driver to match rawexec
2018-11-12 20:59:32 -05:00
Alex Dadgar 693f244cce Plugin client's handle plugin dying
This PR plumbs the plugins done ctx through the base and driver plugin
clients (device already had it). Further, it adds generic handling of
gRPC stream errors.
2018-11-12 17:09:27 -08:00
Alex Dadgar a90dc978e1 Handle new eval being the duplicate properly 2018-11-12 16:02:23 -08:00
Preetha 8218e197be
Merge pull request #4855 from hashicorp/b-device-scheduling-fixtest
Fixes device scheduling unit tests
2018-11-12 16:55:23 -06:00
Preetha 103bc03467
Merge pull request #4865 from hashicorp/b-preemption-config
improvements to scheduler config API
2018-11-12 16:54:58 -06:00
Preetha Appan fd0ba320da
change path to v1/scheduler/configuration 2018-11-12 15:57:45 -06:00
Preetha Appan de890b9d5c
blank line 2018-11-12 15:50:14 -06:00
Danielle Tomlinson 2f13e34888
Merge pull request #4859 from hashicorp/b-rawexec-cgroups-root
rawexec: Only use cgroups when running as root.
2018-11-12 13:47:07 -08:00
Mahmood Ali 8513b3cccb Comment public functions and batch write txn 2018-11-12 16:09:39 -05:00
Preetha Appan 20af09a1ef
Fix logic bug in tracking sum of matched affinity weights
We need to track the sum of matching weights per device, but only
change the final return value if its the highest scoring choice
2018-11-12 15:06:45 -06:00
Preetha Appan 3a10a589d7
Fix failing test 2018-11-10 19:53:47 -06:00
Preetha Appan 7ef126a027
Smaller methods, and added tests for RPC layer 2018-11-10 17:37:33 -06:00
Preetha Appan 75662b50d1
Use response object/querymeta/writemeta in scheduler config API 2018-11-10 10:31:10 -06:00
Danielle Tomlinson 880e5015f2 rawexec: Only use cgroups when running as root.
If Nomad is not running as root, we should not try to use cgroups for pid
freezing.

This originally was implemented pre-driver-support in
https://github.com/hashicorp/nomad/blob/v0.8.6/client/driver/raw_exec.go#L120-L130
2018-11-10 06:45:11 -08:00
Mahmood Ali 9c0a15f3ce Run job deregistering in a single transaction
Fixes https://github.com/hashicorp/nomad/issues/4299

Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals.

Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index.  However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals.  When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation.

This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.
2018-11-09 22:35:26 -05:00
Michael Lange aa0cbadb30 Improve mirage modeling of allocations
Pending allocations never have tasks
2018-11-09 17:11:47 -08:00