Commit Graph

1134 Commits

Author SHA1 Message Date
Alex Dadgar 432784dae3 Fix alloc watcher snapshot streaming 2018-03-27 11:14:53 -07:00
Chelsea Komlo 57e2cd04bd
Merge pull request #4025 from hashicorp/reload-http-tls
Allow TLS configurations for HTTP and RPC connections to be reloaded …
2018-03-26 18:00:30 -04:00
Preetha Appan 33e170c15d
s/linear/constant/g 2018-03-26 14:45:09 -05:00
Chelsea Holland Komlo 96df419fff code review feedback 2018-03-26 10:55:22 -04:00
Alex Dadgar 34211f00a7 Allow separate enterprise config overlay 2018-03-22 13:53:08 -07:00
Michael Schurter 0e0b04afec test: fix by using mock.BatchJob 2018-03-21 16:51:45 -07:00
Michael Schurter 39cef16c73 test: don't call t.Fatal from within a goroutine 2018-03-21 16:51:45 -07:00
Michael Schurter cb61a4bdc7 Fix linting errors 2018-03-21 16:51:45 -07:00
Alex Dadgar 7b2bad8c5e Toggle Drain allows resetting eligibility
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar 02019f216a Correct defaulting 2018-03-21 16:51:44 -07:00
Alex Dadgar 78c7c36e65 code review 2018-03-21 16:51:44 -07:00
Alex Dadgar 8289cc3c6f HTTP and API 2018-03-21 16:51:44 -07:00
Alex Dadgar b3d2346419 Upgrade path 2018-03-21 16:51:43 -07:00
Alex Dadgar 010228577e Drain cli, api, http 2018-03-21 16:51:43 -07:00
Chelsea Holland Komlo 66e44cdb73 Allow TLS configurations for HTTP and RPC connections to be reloaded separately 2018-03-21 17:51:08 -04:00
Michael Schurter 70c370c6fe
Merge pull request #4003 from jrasell/f_gh_3988
Allow Nomads Consul health check names to be configurable.
2018-03-20 16:44:08 -07:00
James Rasell 121c3bc997 Update Consul check params from using health-check to check. 2018-03-20 16:03:58 +01:00
Michael Schurter 86ccdb9115 Fix generating static assets
Broke due to a change in go-bindata-assetfs
2018-03-19 15:52:38 -07:00
James Rasell 15afef9b77 Allow Nomads Consul health checks to be configurable.
This change allows the client HTTP and the server HTTP, Serf and
RPC health check names within Consul to be configurable with the
defaults as previous. The configuration can be done via either a
config file or using CLI flags.

Closes #3988
2018-03-19 19:37:56 +01:00
Preetha 6df57c177c
Merge pull request #4002 from hashicorp/b-reschedule-systemjob-panic
Fix incorrect initialization of reschedule policy for system jobs.
2018-03-19 13:06:55 -05:00
Preetha Appan 161bc66355
Fix incorrect initialization of reschedule policy for system jobs. 2018-03-19 12:16:13 -05:00
Alex Dadgar 9e05c9a50e
Merge pull request #3997 from hashicorp/b-serf-addr
RPC Advertise used exclusively for Clients
2018-03-19 09:30:20 -07:00
Alex Dadgar 9ef23ff277 enable server in test 2018-03-16 16:52:37 -07:00
Alex Dadgar b8607ad6d6 Heartbeat uses client rpc advertise and server defaults server rpc advertise addr 2018-03-16 16:47:08 -07:00
Alex Dadgar 52b7fb5361 Separate client and server rpc advertise addresses 2018-03-16 16:47:08 -07:00
Michael Schurter 86f562be3a Remove unnecessary conversions 2018-03-16 16:32:59 -07:00
Michael Schurter c3e8f6319c gofmt -s (simplify) files 2018-03-16 16:31:16 -07:00
Michael Schurter 1044bc0feb
Merge pull request #3984 from hashicorp/f-loosen-consul-skipverify
Replace Consul TLSSkipVerify handling
2018-03-16 11:21:28 -07:00
Michael Schurter 0971114f0c Replace Consul TLSSkipVerify handling
Instead of checking Consul's version on startup to see if it supports
TLSSkipVerify, assume that it does and only log in the job service
handler if we discover Consul does not support TLSSkipVerify.

The old code would break TLSSkipVerify support if Nomad started before
Consul (such as on system boot) as TLSSkipVerify would default to false
if Consul wasn't running. Since TLSSkipVerify has been supported since
Consul 0.7.2, it's safe to relax our handling.
2018-03-14 17:43:06 -07:00
Alex Dadgar 3537c73289
Merge pull request #3978 from hashicorp/b-core-sched
Always add core scheduler
2018-03-14 16:13:15 -07:00
Preetha Appan e75630f8e8
Fix formatting 2018-03-14 16:10:32 -05:00
Preetha Appan 9a5e6edf1f
Rename DelayCeiling to MaxDelay 2018-03-14 16:10:32 -05:00
Preetha Appan 5f50c3d618
Add new reschedule options to API layer and unit tests 2018-03-14 16:10:32 -05:00
Alex Dadgar 92cb552ff6 Always add core scheduler and detect invalid schedulers 2018-03-14 10:53:27 -07:00
Alex Dadgar 63e14b7d63 nodeevents -> events 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo 1488b076d1 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 00d9923454 Ensure node updates don't strip node events
Add node events to CLI
2018-03-13 18:05:40 -07:00
Michael Schurter ec381ee705 Revert spelling corrections in generated code 2018-03-12 11:19:29 -07:00
Josh Soref 1359fd2c3d spelling: unexpected 2018-03-11 19:08:07 +00:00
Josh Soref 42d7f19861 spelling: supports 2018-03-11 19:00:11 +00:00
Josh Soref c808dc3095 spelling: submitted 2018-03-11 18:59:27 +00:00
Josh Soref 6e1244b6c1 spelling: significantly 2018-03-11 18:56:45 +00:00
Josh Soref 8978caea28 spelling: shutdown 2018-03-11 18:55:49 +00:00
Josh Soref 05305afcd9 spelling: services 2018-03-11 18:53:58 +00:00
Josh Soref ad55e85e73 spelling: registrations 2018-03-11 18:40:53 +00:00
Josh Soref 6fa892a463 spelling: propagated 2018-03-11 18:39:26 +00:00
Josh Soref d208d26b6e spelling: preemptively 2018-03-11 17:58:48 +00:00
Josh Soref 8abf038f4d spelling: output 2018-03-11 18:35:30 +00:00
Josh Soref 3c1ce6d16d spelling: otherwise 2018-03-11 18:34:27 +00:00
Josh Soref 3e2f500cf9 spelling: largely 2018-03-11 18:21:52 +00:00
Josh Soref 85fabc63c8 spelling: expected 2018-03-11 17:57:01 +00:00
Josh Soref 680bbd6d4f spelling: encountered 2018-03-11 17:58:59 +00:00
Josh Soref 1f927cd343 spelling: each other 2018-03-11 17:56:50 +00:00
Josh Soref 7a6dfa4b1a spelling: current 2018-03-11 17:52:32 +00:00
Josh Soref 6f48e31c00 spelling: convenience 2018-03-11 17:50:48 +00:00
Josh Soref 76cf178933 spelling: cleanup 2018-03-11 17:47:09 +00:00
Josh Soref 3ec4ebc7b1 spelling: canonical 2018-03-11 17:46:01 +00:00
Josh Soref 1d58ae8899 spelling: bootstrap 2018-03-11 17:43:19 +00:00
Josh Soref 455eb1aeb3 spelling: authoritative 2018-03-11 17:42:05 +00:00
Josh Soref 5f87691df1 spelling: asynchronously 2018-03-11 17:41:50 +00:00
Josh Soref 58b794875f spelling: artifact 2018-03-11 17:41:02 +00:00
Josh Soref c74245279b spelling: arbitrary 2018-03-11 17:40:03 +00:00
Alex Dadgar 483e011720
Merge pull request #3892 from hashicorp/f-tunnel
Client RPC Endpoints, Server Routing and Streaming RPCs
2018-02-20 16:35:42 -08:00
Alex Dadgar 8b86e64fd8 Show HTTP request method 2018-02-16 15:55:26 -08:00
Alex Dadgar aa98f8ba7b Enhance API pkg to utilize Server's Client Tunnel
This PR enhances the API package by having client only RPCs route
through the server when they are low cost and for filesystem access to
first attempt a direct connection to the node and then falling back to
a server routed request.
2018-02-15 13:59:03 -08:00
Alex Dadgar 38b695b69c feedback and rebasing 2018-02-15 13:59:03 -08:00
Alex Dadgar c5e9ebb656 Use helper for forwarding 2018-02-15 13:59:03 -08:00
Alex Dadgar 9117ef4650 HTTP agent 2018-02-15 13:59:03 -08:00
Alex Dadgar d7029965ca Server side impl + touch ups 2018-02-15 13:59:02 -08:00
Alex Dadgar 14845bb918 Refactor determining the handler for a node id call 2018-02-15 13:59:02 -08:00
Alex Dadgar e685211892 Code review feedback 2018-02-15 13:59:02 -08:00
Alex Dadgar f5f43218f5 HTTP and tests 2018-02-15 13:59:02 -08:00
Alex Dadgar 9a5569678c Client Stat/List impl 2018-02-15 13:59:02 -08:00
Alex Dadgar 8854b35b34 Agent logs 2018-02-15 13:59:02 -08:00
Alex Dadgar 69def2ff22 Server tests of logs 2018-02-15 13:59:02 -08:00
Alex Dadgar ddd67f5f11 Server streaming 2018-02-15 13:59:01 -08:00
Alex Dadgar ca9379be09 Logs over RPC w/ lots to touch up 2018-02-15 13:59:01 -08:00
Alex Dadgar 993727c28f Use in-mem rpc 2018-02-15 13:59:01 -08:00
Alex Dadgar d5a834b801 fix lint 2018-02-15 13:59:01 -08:00
Alex Dadgar 9bc75f0ad4 Fix manager tests and make testagent recover from port conflicts 2018-02-15 13:59:01 -08:00
Alex Dadgar 8dcda29c21 Use nomad UUID 2018-02-15 13:59:00 -08:00
Alex Dadgar 71029b6329 Test http 2018-02-15 13:59:00 -08:00
Alex Dadgar 6dd1c9f49d Refactor 2018-02-15 13:59:00 -08:00
Alex Dadgar ad7bc0c6bd Server can forward ClientStats.Stats 2018-02-15 13:59:00 -08:00
Alex Dadgar 1472b943d6 Stats Endpoint 2018-02-15 13:59:00 -08:00
Kyle Havlovitz 54b691f538
Merge pull request #3852 from hashicorp/autopilot-cleanup
Clean up some leftover autopilot differences from Consul
2018-02-14 10:42:32 -08:00
Preetha df6400222b
Merge pull request #3868 from hashicorp/f-server-side-restarts
server side rescheduling
2018-02-13 20:09:51 -06:00
Kyle Havlovitz 709b693d39 Clean up some leftover autopilot differences from Consul 2018-02-08 10:27:26 -08:00
Mahmood Ali bebafb5234 Add tags option to datadog telemetry
Expose an global tags option in telemetry config for dogstatsd, for
purposes of distinguishing between multiple nomad cluster metrics.
2018-02-06 12:08:37 -05:00
Preetha Appan 4a78c5bc84
Fix unit test 2018-01-31 09:56:53 -06:00
Preetha Appan 1f834d1a31
Add reschedule policy to API, and HCL parsing support. 2018-01-31 09:56:53 -06:00
Kyle Havlovitz 0eb0acacdc Fix remaining issues with autopilot change 2018-01-30 15:21:28 -08:00
Kyle Havlovitz 2ccf565bf6 Refactor redundancy_zone/upgrade_version out of client meta 2018-01-29 20:03:38 -08:00
Chelsea Komlo d09cc2a69f
Merge pull request #3492 from hashicorp/f-client-tls-reload
Client/Server TLS dynamic reload
2018-01-23 05:51:32 -05:00
Michael Schurter 694b547a6b
Merge pull request #3682 from hashicorp/b-3681-always-set-driver-ip
Always advertise driver IP when in driver mode
2018-01-22 16:41:34 -08:00
Chelsea Holland Komlo 7d3c240871 swap raft layer tls wrapper 2018-01-19 17:00:15 -05:00
Michael Schurter 8a0cf66822 Improve invalid port error message for services
Related to #3681

If a user specifies an invalid port *label* when using
address_mode=driver they'll get an error message about the label being
an invalid number which is very confusing.

I also added a bunch of testing around Service.AddressMode validation
since I was concerned by the linked issue that there were cases I was
missing. Unfortunately when address_mode=driver is used there's only so
much validation that can be done as structs/structs.go validation never
peeks into the driver config which would be needed to verify the port
labels/map.
2018-01-18 15:35:24 -08:00
Michael Schurter 447dc5bbd3 Fix test 2018-01-18 15:35:24 -08:00
Michael Schurter 583e17fad5 Always advertise driver IP when in driver mode
Fixes #3681

When in drive address mode Nomad should always advertise the driver's IP
in Consul even when no network exists. This matches the 0.6 behavior.

When in host address mode Nomad advertises the alloc's network's IP if
one exists. Otherwise it lets Consul determine the IP.

I also added some much needed logging around Docker's network discovery.
2018-01-18 15:35:24 -08:00
Kyle Havlovitz 8d41f4ad40 Formatting/test adjustments 2018-01-18 15:03:35 -08:00
Kyle Havlovitz 12ff22ea70 Merge branch 'master' into autopilot 2018-01-18 13:29:25 -08:00
Chelsea Holland Komlo 35466a331a fixing up raft reload tests
close second goroutine in raft-net
2018-01-17 10:29:15 -05:00
Michael Schurter 57eb128dcf
Merge pull request #3718 from hashicorp/b-3713-fix-check-restart
Fix service.check_restart stanza propagation
2018-01-16 16:39:42 -08:00
Kyle Havlovitz 7b980c42d8 Add raft remove by id endpoint/command 2018-01-16 13:35:32 -08:00
Chelsea Holland Komlo 6c9f9c8ac3 adding additional test assertions; differentiate reloading agent and http server 2018-01-16 07:34:39 -05:00
Alex Dadgar 54124a8478 Test listener uses freeport instead of static ports 2018-01-12 15:10:26 -08:00
Michael Schurter 9f179e9fab Fix HTTP code for permission denied errors
Fixes #3697

The existing code and test case only covered the leader behavior. When
querying against non-leaders the error has an "rpc error: " prefix.

To provide consistency in HTTP error response I also strip the "rpc
error: " prefix for 403 responses as they offer no beneficial additional
information (and in theory disclose a tiny bit of data to unauthorized
users, but it would be a pretty weird bit of data to use in a malicious
way).
2018-01-09 15:25:53 -08:00
Michael Schurter 7c282f174b Fix service.check_restart stanza propagation
There was a bug in jobspec parsing, a bug in CheckRestart merging, and a
bug in CheckRestart canonicalization. All are now tested.
2018-01-09 15:15:36 -08:00
Chelsea Holland Komlo 214d128eb9 reload raft transport layer
fix up linting
2018-01-08 14:52:28 -05:00
Chelsea Holland Komlo 0708d34135 call reload on agent, client, and server separately 2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo d9ec538d6a don't ignore error in http reloading
code review feedback
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo 6a2432659a code review fixups 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo 4e0dbd23cf prevent races when reloading, fully shut down raft 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo ae7fc4695e fixups from code review
Revert "close raft long-lived connections"

This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.

reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo c0ad9a4627 add ability to upgrade/downgrade nomad agents tls configurations via sighup 2018-01-08 09:21:06 -05:00
Preetha 1712b03705
Merge branch 'master' into 0.8 2018-01-03 16:06:38 -06:00
Alex Dadgar bfc62ae41c bump version and remove generated structs 2017-12-19 17:10:52 -08:00
Alex Dadgar f0127afd93 generated files 2017-12-19 16:57:34 -08:00
Michael Schurter 714eb0b266 Services should not require a port
Fixes #3673
2017-12-19 15:50:23 -08:00
Kyle Havlovitz 1c07066064 Add autopilot functionality based on Consul's autopilot 2017-12-18 14:29:41 -08:00
Kyle Havlovitz b775fc7b33
Added support for v2 raft APIs and -raft-protocol option 2017-12-12 10:17:16 -06:00
Alex Dadgar d61ade8f02 remove generated structs 2017-12-11 17:51:41 -08:00
Alex Dadgar 8e63d545c4 generated assets 2017-12-11 17:30:37 -08:00
Michael Schurter cdcefd0908 Use the Service.Hash() method in agent service ids
The allocID and taskName parameters are useless for agents, but it's
still nice to reuse the same hash method for agent and task services.
This brings in the lowercase mode for the agent hash as well.
2017-12-11 16:50:15 -08:00
Michael Schurter 4f1002c1a8 Be more defensive in port checks 2017-12-08 12:27:57 -08:00
Michael Schurter d613e0aaf5 Move service hash logic to Service.Hash method 2017-12-08 12:03:43 -08:00
Michael Schurter b71edf846f Hash fields used in task service IDs
Fixes #3620

Previously we concatenated tags into task service IDs. This could break
deregistration of tag names that contained double //s like some Fabio
tags.

This change breaks service ID backward compatibility so on upgrade all
users services and checks will be removed and re-added with new IDs.

This change has the side effect of including all service fields in the
ID's hash, so we no longer have to track PortLabel and AddressMode
changes independently.
2017-12-08 12:03:43 -08:00
Michael Schurter 91282315d1 Prevent using port 0 with address_mode=driver 2017-12-08 12:03:43 -08:00
Michael Schurter 4b20441eef Validate port label for host address mode
Also skip getting an address for script checks which don't use them.

Fixed a weird invalid reserved port in a TaskRunner test helper as well
as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause
other tests to fail, but we were referencing an invalid PortLabel and
just not catching it before.
2017-12-08 12:03:43 -08:00
Michael Schurter 4347026f83 Test Consul from TaskRunner thoroughly
Rely less on the mockConsulServiceClient because the real
consul.ServiceClient needs all the testing it can get!
2017-12-08 12:03:00 -08:00
Michael Schurter 4ae115dc59 Allow custom ports for services and checks
Fixes #3380

Adds address_mode to checks (but no auto) and allows services and checks
to set literal port numbers when using address_mode=driver.

This allows SDNs, overlays, etc to advertise internal and host addresses
as well as do checks against either.
2017-12-08 12:03:00 -08:00
Michael Schurter 1dd5b3822c
Merge pull request #3608 from hashicorp/b-3342-windows-log-leak
Fix bug in log framer only affecting Windows
2017-12-08 10:59:26 -08:00
Chelsea Holland Komlo 2ea8e43214 code review fixups 2017-12-06 16:37:47 -05:00
Chelsea Holland Komlo a010db084b fix up basic test
add conversion for KillSignal for api/struct representation of task
2017-12-06 14:36:45 -05:00
Michael Schurter b66aa5b7f6
Merge pull request #3563 from hashicorp/b-snapshot-atomic
Atomic Snapshotting / Sticky Volume Migration
2017-12-05 09:16:33 -08:00
Alex Dadgar 0bec137561
Merge pull request #3555 from PagerDuty/fix-loop-on-sigpipe
Do not emit logs on SIGPIPE since logging service could be unavailable
2017-12-04 14:11:05 -08:00
Alex Dadgar ab67a98c13 Emit hostname as a label 2017-12-04 10:42:31 -08:00
Jens Herrmann 5680fcccc2 Fix typos in metric names. #3610 2017-12-01 15:24:14 +01:00
Michael Schurter 2cbde16b9b Add check for Windows ECONNRESET 2017-11-30 21:30:20 -08:00
Michael Schurter 3e8e3aac70 Add defensive check to safeguard from future #3342s
I hate adding "this should never happen" checks, but causing a tight
loop that OOMs Nomad is just too easy in this code otherwise.
2017-11-30 20:37:13 -08:00
Michael Schurter 29d86eb348 Fix race in framer and improperly returned err
Fixes #3342

Two bugs were fixed:

* Closing the StreamFramer's exitCh before setting the error means other
  goroutines blocked on exitCh closing could see the error as nil. This
  was *not* observered.
* parseFramerError on Windows would fall through and return an
  improperly captured nil err variable. There's no need for
  parseFramerError to be a closure which fixes the confusion.
2017-11-30 17:42:53 -08:00
Michael Schurter 5e975bbd0f Add comment and normalize err check ordering
as per PR comments
2017-11-29 17:26:11 -08:00
Michael Schurter d996c3a231 Check for error file when receiving snapshots 2017-11-29 17:26:11 -08:00
Michael Schurter ca946679f6 Destroy partially migrated alloc dirs
Test that snapshot errors don't return a valid tar currently fails.
2017-11-29 17:26:11 -08:00
Michael Lange 96403746b1 Add CORS headers to client fs endpoints 2017-11-21 11:22:42 -08:00
Preetha Appan 3592635ede Populate DisplayMessage in various http endpoints that return allocations, plus unit tests. 2017-11-17 14:53:26 -06:00
Alex Dadgar 05b1588cea Only publish metric when the task is running and dev mode publishes metrics 2017-11-15 13:21:06 -08:00
Max Timchenko 8b7e61d055 Do not emit logs on SIGPIPE since logging service could be unavailable
This should fix https://github.com/hashicorp/nomad/issues/3554
2017-11-15 18:01:41 +02:00
Chelsea Komlo 2dfda33703 Nomad agent reload TLS configuration on SIGHUP (#3479)
* Allow server TLS configuration to be reloaded via SIGHUP

* dynamic tls reloading for nomad agents

* code cleanup and refactoring

* ensure keyloader is initialized, add comments

* allow downgrading from TLS

* initalize keyloader if necessary

* integration test for tls reload

* fix up test to assert success on reloaded TLS configuration

* failure in loading a new TLS config should remain at current

Reload only the config if agent is already using TLS

* reload agent configuration before specific server/client

lock keyloader before loading/caching a new certificate

* introduce a get-or-set method for keyloader

* fixups from code review

* fix up linting errors

* fixups from code review

* add lock for config updates; improve copy of tls config

* GetCertificate only reloads certificates dynamically for the server

* config updates/copies should be on agent

* improve http integration test

* simplify agent reloading storing a local copy of config

* reuse the same keyloader when reloading

* Test that server and client get reloaded but keep keyloader

* Keyloader exposes GetClientCertificate as well for outgoing connections

* Fix spelling

* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Lange eb42e6d219 generated UI routes 2017-11-10 13:29:17 -08:00