Commit graph

15152 commits

Author SHA1 Message Date
Preetha Appan 26ea951627 docs 2019-05-14 16:13:59 -05:00
Preetha Appan 4f9c8ea068 Fix one more test setup 2019-05-14 16:13:41 -05:00
Michael Schurter 2fe0768f3b docs: changelog entry for #5669 and fix comment 2019-05-14 10:54:00 -07:00
Michael Schurter af9096c8ba client: register before restoring
Registration and restoring allocs don't share state or depend on each
other in any way (syncing allocs with servers is done outside of
registration).

Since restoring is synchronous, start the registration goroutine first.

For nodes with lots of allocs to restore or close to their heartbeat
deadline, this could be the difference between becoming "lost" or not.
2019-05-14 10:53:27 -07:00
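A minimal Go sketch of the ordering this commit describes. `register`, `restore`, and `startClient` are hypothetical stand-ins, not the Nomad client's real methods, and the sleep only simulates per-alloc restore work. Because the two steps share no state, registration starts in a goroutine before the blocking restore begins:

```go
package main

import (
	"fmt"
	"time"
)

// register is a hypothetical stand-in for registering with the servers.
// It reports the time at which registration completed.
func register(done chan<- time.Time) {
	done <- time.Now()
}

// restore is a hypothetical stand-in for synchronously restoring n allocs.
func restore(n int) time.Time {
	for i := 0; i < n; i++ {
		time.Sleep(time.Millisecond) // simulated per-alloc restore work
	}
	return time.Now()
}

// startClient kicks off registration in a goroutine first, then runs the
// blocking restore. The two share no state, so no extra locking is needed.
func startClient(allocs int) (registeredAt, restoredAt time.Time) {
	registered := make(chan time.Time, 1)
	go register(registered)
	restoredAt = restore(allocs)
	registeredAt = <-registered
	return registeredAt, restoredAt
}

func main() {
	reg, res := startClient(50)
	fmt.Println("registered before restore finished:", reg.Before(res))
}
```

With this ordering, time spent restoring no longer delays the first registration, which is what matters for nodes close to their heartbeat deadline.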
Michael Schurter e07f73bfe0 client: do not restart dead tasks until server is contacted (try 2)
Refactoring of 104067bc2b2002a4e45ae7b667a476b89addc162

Switch the MarkLive method for a chan that is closed by the client.
Thanks to @notnoop for the idea!

The old approach called a method on most existing alloc runners (ARs) and
task runners (TRs) on every runAllocs call. The new approach does a once.Do
call in runAllocs to accomplish the same thing with less work. This also made
it possible to remove the gate abstraction, which did much more than was needed.
2019-05-14 10:53:27 -07:00
Michael Schurter 8589233a0e drivers/mock: implement InspectTask 2019-05-14 10:53:27 -07:00
Michael Schurter d7e5ace1ed client: do not restart dead tasks until server is contacted
Fixes #1795

Running restored allocations and pulling what allocations to run from
the server happen concurrently. This means that if a client is rebooted,
and has its allocations rescheduled, it may restart the dead allocations
before it contacts the server and determines they should be dead.

This commit makes tasks that fail to reattach on restore wait until the
server is contacted before restarting.
2019-05-14 10:53:27 -07:00
Michael Schurter 2b7f398726 e2e: fix nomad service for systemd<230 2019-05-14 10:53:26 -07:00
Yishan Lin 20638e7119 Merge pull request #5703 from hashicorp/yishan/corrected-website-redirects
Fixed Spark links in redirects.txt.
2019-05-14 10:36:31 -07:00
Yishan Lin 7ffd608456 Update redirects.txt
Fixed Spark redirects post-website restructuring for the guides.
2019-05-14 08:56:13 -07:00
Michael Schurter f072c7421c Merge pull request #5695 from hashicorp/f-squelch-logline
client: log when server list changes
2019-05-14 08:38:05 -07:00
Michael Schurter be973d32e9 Merge pull request #4590 from hashicorp/d-fix-stagger
docs: fix description of update.stagger
2019-05-14 08:36:08 -07:00
Michael Schurter f871b8998f Merge pull request #5693 from hashicorp/docs-task-config
docs: mention regression in task config validation
2019-05-14 07:50:39 -07:00
Michael Schurter 94ab5c8b43 Merge pull request #5657 from hashicorp/docs-plugin-link
docs: add lots of links to plugin guide
2019-05-14 07:50:09 -07:00
James Rasell 4c7f8bb0d7 Add jrasell/sherpa to resource page and remove Replicator.
Adds jrasell/sherpa to the resources page under the Integrations
section.

Replicator is no longer maintained and has not been under active
development for well over a year, so I have removed it from the
resources page.
2019-05-14 13:13:11 +01:00
Danielle Lancashire d9815888ed evalbroker: Simplify nextDelayedEval locking 2019-05-14 14:06:27 +02:00
Danielle Lancashire 38562afbc1 evalbroker: No new enqueues when disabled
Currently, when an evalbroker is disabled, it still receives delayed
enqueues via log application in the fsm. This causes an ever growing
heap of evaluations that will never be drained, and can cause memory
issues in larger clusters, or when left running for an extended period
of time without a leader election.

This commit prevents the enqueuing of evaluations while we are
disabled, and relies on the leader restoreEvals routine to handle
reconciling state during a leadership transition.

Existing dequeues during an Enabled->Disabled broker state transition are
handled by the enqueueLocked function dropping evals.
2019-05-14 13:59:10 +02:00
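A simplified Go sketch of dropping delayed enqueues while the broker is disabled. The `broker` type and its methods are hypothetical, and a slice stands in for the delayed-eval heap; this is not Nomad's `EvalBroker` implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// broker is a hypothetical, heavily simplified eval broker.
type broker struct {
	mu      sync.Mutex
	enabled bool
	delayed []string // stands in for the delayed-eval heap
}

// EnqueueDelayed drops evals while disabled, so a follower applying log
// entries cannot grow the heap without bound. After a leadership change,
// the new leader reconciles by restoring evals from state.
func (b *broker) EnqueueDelayed(evalID string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !b.enabled {
		return
	}
	b.delayed = append(b.delayed, evalID)
}

func (b *broker) SetEnabled(enabled bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.enabled = enabled
}

func (b *broker) Len() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.delayed)
}

func main() {
	b := &broker{}
	for i := 0; i < 1000; i++ {
		b.EnqueueDelayed(fmt.Sprintf("eval-%d", i))
	}
	fmt.Println("queued while disabled:", b.Len())
	b.SetEnabled(true)
	b.EnqueueDelayed("eval-leader")
	fmt.Println("queued while enabled:", b.Len())
}
```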
Danielle Lancashire c91ae21a6c evalbroker: Flush within update lock
Primarily a cleanup commit. However, there is currently a potential race
condition (which I'm not sure we've ever actually hit) when SetEnabled flaps
between enabled and disabled: if it is called from multiple goroutines, we
may never correctly restart the eval broker.
2019-05-14 13:26:56 +02:00
Preetha Appan 4d3f74e161 Fix test setup to have correct jobcreateindex for deployments 2019-05-13 18:53:47 -05:00
Preetha Appan d448750449 Lookup job only once, and fix tests 2019-05-13 18:33:41 -05:00
Preetha Appan 07690d6f9e Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Yishan Lin a850c3141a Added redirect for Spark guide link 2019-05-13 16:16:14 -07:00
Michael Schurter 3b1f8991a1 client: log when server list changes
Stop logging in the happy path when nothing has changed.
2019-05-13 15:42:55 -07:00
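One way to stop logging in the happy path is to compare each new server list with the previous one and log only on a difference. The `serverList` type is a hypothetical sketch, and treating the same addresses in a different order as unchanged is an assumption, not necessarily what the client does.

```go
package main

import (
	"log"
	"os"
	"sort"
	"strings"
)

// serverList remembers the last set of server addresses it was given.
type serverList struct {
	last []string
}

// Update stores servers and reports whether the set changed. Sorted copies
// are compared so that ordering alone does not count as a change.
func (s *serverList) Update(servers []string) bool {
	next := append([]string(nil), servers...)
	sort.Strings(next)
	if equal(s.last, next) {
		return false
	}
	s.last = next
	return true
}

func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	logger := log.New(os.Stdout, "", 0)
	var s serverList
	for _, update := range [][]string{
		{"10.0.0.1:4647", "10.0.0.2:4647"},
		{"10.0.0.2:4647", "10.0.0.1:4647"}, // same set, new order: not logged
		{"10.0.0.3:4647"},
	} {
		if s.Update(update) {
			logger.Printf("new server list: %s", strings.Join(update, ", "))
		}
	}
}
```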
Michael Schurter 1e4330bf2b docs: mention regression in task config validation 2019-05-13 14:08:46 -07:00
Michael Schurter 48db8135da Merge pull request #5492 from hashicorp/f-allocated-mem
client: expose allocated memory per task
2019-05-13 13:31:22 -07:00
Jasmine Dahilig e3b69ca98f fix update to changelog 2019-05-13 13:14:01 -07:00
Jasmine Dahilig 27161d8a12 update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665 2019-05-13 13:10:29 -07:00
Jasmine Dahilig 30d346ca15 Merge pull request #5665 from hashicorp/b-empty-datacenters
add non-empty string validation for datacenters
2019-05-13 10:23:26 -07:00
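A minimal Go sketch of non-empty string validation for a datacenters list. The function name and error text are illustrative assumptions; the actual check lives in Nomad's job validation and may differ.

```go
package main

import "fmt"

// validateDatacenters is hypothetical: every entry in a job's datacenters
// list must be a non-empty string.
func validateDatacenters(dcs []string) error {
	for i, dc := range dcs {
		if dc == "" {
			return fmt.Errorf("datacenter %d must be a non-empty string", i)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateDatacenters([]string{"dc1", "dc2"}))
	fmt.Println(validateDatacenters([]string{"dc1", ""}))
}
```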
Mahmood Ali 2ddc39973d Merge pull request #5668 from hashicorp/flaky-test-20190430
fix flaky test by allowing for call invocation overhead
2019-05-13 12:33:44 -04:00
Lang Martin 1d03a43ce2 Merge pull request #5642 from hashicorp/b-network-fingerprinting-ipv4
network fingerprinting multiple IPs on the configured network device
2019-05-13 11:46:53 -04:00
Mahmood Ali ed5b008ed0 Merge pull request #5690 from hashicorp/f-nomad-exec-part-04-rkt
implement nomad exec for rkt
2019-05-13 10:03:55 -04:00
Mahmood Ali dd8762e348 typo: "atleast" -> "at least" 2019-05-13 10:01:19 -04:00
Mahmood Ali 513303347c add CLI commands for nomad exec 2019-05-12 22:04:50 -04:00
Mahmood Ali d1526571a5 implement nomad exec for rkt
Implement the streaming exec handler for the rkt driver
2019-05-12 18:59:00 -04:00
Danielle 495ff647de Merge pull request #5685 from jweissig/patch-9
docs: fixed typo
2019-05-11 21:00:47 +02:00
Justin Weissig e137b7f2e3 docs: fixed typo
Fixed typo: programatic/programmatic
2019-05-11 10:40:39 -07:00
Mahmood Ali f58932afe9 Merge pull request #5634 from hashicorp/f-nomad-exec-parts-03-executors
nomad exec part 3: executor based drivers
2019-05-10 21:24:23 -04:00
Mahmood Ali b4df061fef use pty/tty terminology similar to github.com/kr/pty 2019-05-10 19:17:14 -04:00
Mahmood Ali 7fdb7564e8 vendor github.com/kr/pty 2019-05-10 19:17:14 -04:00
Mahmood Ali a4640db7a6 drivers: implement streaming exec for executor based drivers
These simply delegate the call to the backend executor.
2019-05-10 19:17:14 -04:00
Mahmood Ali 3055fd53df executors: implement streaming exec
Implements streaming exec handling in both executors (i.e. universal and
libcontainer).

Some incidental complexity leaked in around TTY creation. The universal
executor uses github.com/kr/pty to create TTYs.

Libcontainer, on the other hand, expects a console socket and creates the
underlying console object itself when the process starts. The caller can then
use `libcontainer.utils.RecvFd()` to get the TTY master end.

I chose github.com/kr/pty for managing TTYs here. I tried the
`github.com/containerd/console` package (which is already imported), but it
did not work as expected on macOS.
2019-05-10 19:17:14 -04:00
Mahmood Ali 085d2ef759 executor: scaffolding for executor grpc handling
Prepare the executor to handle streaming exec API calls that reuse the drivers'
protobuf structs.
2019-05-10 19:17:14 -04:00
Mahmood Ali ea241d5da7 Merge pull request #5674 from hashicorp/b-ui/flaky-client-detail-test
UI: Fixed flaky client-detail test
2019-05-10 18:51:00 -04:00
Michael Schurter 1c4e585fa7 client: expose allocated memory per task
Related to #4280

This PR adds
`client.allocs.<job>.<group>.<alloc>.<task>.memory.allocated` as a gauge
in bytes to metrics to ease calculating how close a task is to OOMing.

```
'nomad.client.allocs.memory.allocated.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 268435456.000
'nomad.client.allocs.memory.cache.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 5677056.000
'nomad.client.allocs.memory.kernel_max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.kernel_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8908800.000
'nomad.client.allocs.memory.rss.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 876544.000
'nomad.client.allocs.memory.swap.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8208384.000
```
2019-05-10 11:12:12 -07:00
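With the allocated gauge available, estimating how close a task is to OOMing reduces to comparing a usage gauge against it. This Go sketch uses the `allocated` and `rss` values from the sample output above; the `memoryHeadroom` helper and the choice of `rss` (rather than `usage` or another gauge) are assumptions for illustration.

```go
package main

import "fmt"

// memoryHeadroom is hypothetical: it returns the fraction of allocated
// memory in use and the bytes remaining. It returns zeros when allocated
// is not positive, since a ratio is meaningless then.
func memoryHeadroom(allocated, used float64) (fraction, remaining float64) {
	if allocated <= 0 {
		return 0, 0
	}
	return used / allocated, allocated - used
}

func main() {
	// Values copied from the sample gauges above, in bytes.
	const allocated = 268435456.0
	const rss = 876544.0
	frac, rem := memoryHeadroom(allocated, rss)
	fmt.Printf("rss is %.2f%% of allocated, %.0f bytes remaining\n", frac*100, rem)
}
```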
Lang Martin 99359d7fbe executor_linux only do path resolution in the taskDir, not local
split out lookPathIn to show its similarity to exec.LookPath
2019-05-10 11:33:35 -04:00
Charlie Voiselle dba077d5dd Merge pull request #5683 from hashicorp/docs-describe-sched-restart
Added Sparrow link
2019-05-10 11:25:27 -04:00
Lang Martin f6bc45dd23 client improve a comment in updateNetworks 2019-05-10 11:25:04 -04:00
Danielle 79ced20e20 stalebot: Add 'thinking' as an exempt label (#5684) 2019-05-10 11:00:35 -04:00
Danielle d529023040 Merge pull request #5375 from hashicorp/dani/stale-issues
Setup probot/stale
2019-05-10 16:53:05 +02:00
Charlie Voiselle 1af7e4c4d7 Added Sparrow link 2019-05-10 10:35:21 -04:00