open-nomad/client
Seth Hoenig f8596a3602 env_aws: use best-effort lookup table for CPU performance in EC2
Fixes #7681

Currently, the CPU fingerprinter on AWS reads the **current** CPU
speed from the `cpu MHz` field of `/proc/cpuinfo`.
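For illustration, here is a minimal Go sketch of what reading the current speed amounts to. It collects the `cpu MHz` value of each processor entry from `/proc/cpuinfo`-style text. The function name `currentMHz` is hypothetical, and this is not Nomad's or gopsutil's actual code. The point is that each value is an instantaneous reading, not a maximum.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// currentMHz returns the "cpu MHz" value of every processor entry found in
// /proc/cpuinfo-style text. These are instantaneous readings taken at the
// moment the file is read, not the maximum frequency of the CPU.
func currentMHz(cpuinfo string) ([]float64, error) {
	var speeds []float64
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		key, value, found := strings.Cut(sc.Text(), ":")
		if !found || strings.TrimSpace(key) != "cpu MHz" {
			continue // not a speed line
		}
		mhz, err := strconv.ParseFloat(strings.TrimSpace(value), 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", value, err)
		}
		speeds = append(speeds, mhz)
	}
	return speeds, sc.Err()
}

func main() {
	sample := "processor\t: 0\ncpu MHz\t\t: 799.998\nprocessor\t: 1\ncpu MHz\t\t: 2500.000\n"
	speeds, err := currentMHz(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(speeds) // [799.998 2500]
}
```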

This is because the max CPU frequency cannot be read from anything
on the EC2 instance itself. Normally on Linux one would look at
e.g. `/sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`
or perhaps parse the `CPU max MHz` value reported by `lscpu`, but
those values are not available.

Furthermore, no metadata about the CPU is made available in the
EC2 metadata service.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html

Since `gopsutil` cannot determine the max CPU speed, it falls back to
the current CPU speed, which can be anywhere between 0 and the true
max. This is particularly bad on large, powerful reserved instances,
which often idle at ~800 MHz while Nomad is fingerprinting (a mostly
IO-bound activity). Nomad then treats that idle reading as the max
speed, so the node advertises far less CPU than it actually has.
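To make the loss concrete, here is a back-of-the-envelope sketch, assuming node CPU capacity is modeled as cores × per-core MHz. The core count and the 3000 MHz rated speed are hypothetical; only the ~800 MHz idle figure comes from the text above.

```go
package main

import "fmt"

// totalCompute models node CPU capacity as cores × per-core MHz, a
// simplification of how a fingerprinted speed becomes schedulable CPU.
func totalCompute(cores int, mhz float64) int {
	return int(float64(cores) * mhz)
}

func main() {
	const cores = 64        // hypothetical instance size
	const idleMHz = 800.0   // reading taken while the CPU is idling
	const ratedMHz = 3000.0 // hypothetical rated (max) speed

	measured := totalCompute(cores, idleMHz)
	actual := totalCompute(cores, ratedMHz)
	lost := 1 - float64(measured)/float64(actual)

	// prints: advertised 51200 MHz of 192000 MHz (73% unused)
	fmt.Printf("advertised %d MHz of %d MHz (%.0f%% unused)\n", measured, actual, lost*100)
}
```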

Since the CPU specification cannot be obtained programmatically (at
least not without sudo), this change uses a best-effort lookup table
instead. The table was generated by going through every instance type
in the AWS documentation and copy-pasting the numbers.
https://aws.amazon.com/ec2/instance-types/
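A minimal sketch of the lookup-table idea, assuming the table maps an instance type name to a vCPU count and a per-core speed in MHz. The entries below are placeholders with made-up names and numbers, not data copied from the AWS instance-types page. `lookupCPU` is a hypothetical helper, not Nomad's implementation.

```go
package main

import "fmt"

// cpuSpec describes an instance type's CPU as published in vendor docs.
type cpuSpec struct {
	Cores int // vCPUs
	MHz   int // advertised per-core speed
}

// specs is a hypothetical excerpt: the keys and numbers are placeholders,
// not values taken from AWS documentation.
var specs = map[string]cpuSpec{
	"example.large":   {Cores: 2, MHz: 3100},
	"example.8xlarge": {Cores: 32, MHz: 3100},
}

// lookupCPU returns the per-core MHz and total compute (cores × MHz) for a
// known instance type; ok is false for types missing from the table.
func lookupCPU(instanceType string) (mhz, total int, ok bool) {
	s, ok := specs[instanceType]
	if !ok {
		return 0, 0, false
	}
	return s.MHz, s.Cores * s.MHz, true
}

func main() {
	for _, it := range []string{"example.large", "example.unknown"} {
		if mhz, total, ok := lookupCPU(it); ok {
			fmt.Printf("%s: %d MHz/core, %d MHz total\n", it, mhz, total)
		} else {
			fmt.Printf("%s: not in table\n", it)
		}
	}
}
```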

This approach is obviously not ideal, as future instance types will
need to be added to the table as AWS introduces them. However, using
the table should only ever be an improvement over the status quo,
since Nomad currently miscalculates available CPU resources on all
instance types.
2020-04-28 19:01:33 -06:00
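One way to keep the table from ever making things worse, sketched under the assumption that a missing entry should fall back to the old behavior: use the table value when the instance type is known, otherwise the measured speed. `effectiveMHz` and the `example.*` names are hypothetical.

```go
package main

import "fmt"

// effectiveMHz prefers the per-core speed from a lookup table keyed by
// instance type and falls back to the measured (current) speed when the
// type is unknown, e.g. a newly released type not yet in the table.
func effectiveMHz(instanceType string, table map[string]float64, measuredMHz float64) (mhz float64, fromTable bool) {
	if v, ok := table[instanceType]; ok && v > 0 {
		return v, true
	}
	return measuredMHz, false
}

func main() {
	table := map[string]float64{"example.large": 3100} // hypothetical entry
	fmt.Println(effectiveMHz("example.large", table, 800)) // 3100 true
	fmt.Println(effectiveMHz("example.new", table, 800))   // 800 false
}
```

Returning `fromTable` lets the caller log when an instance type is missing, which is a cheap signal that the table needs a new entry.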
allocdir connect: add unix socket to proxy grpc for envoy (#6232) 2019-09-03 08:43:38 -07:00
allochealth health tracker: account for group service checks 2020-03-22 12:38:37 -04:00
allocrunner csi: checkpoint volume claim garbage collection (#7782) 2020-04-23 11:06:23 -04:00
allocwatcher client/allocwatcher: fix dropped test error (#6592) 2019-10-31 08:29:25 -04:00
config command: use consistent CONSUL_HTTP_TOKEN name 2020-02-12 10:42:33 -06:00
consul docs: remove erroneous characters from comment 2020-03-30 13:26:48 -06:00
devicemanager csi: docstring and log message fixups (#7327) 2020-03-23 13:58:30 -04:00
dynamicplugins csi: dynamically update plugin registration (#7386) 2020-03-23 13:59:25 -04:00
fingerprint env_aws: use best-effort lookup table for CPU performance in EC2 2020-04-28 19:01:33 -06:00
interfaces Populate alloc stats API with device stats 2018-11-16 10:26:32 -05:00
lib ar: plumb client config for networking into the network hook 2019-07-31 01:04:06 -04:00
logmon update grpc 2020-03-03 08:39:54 -05:00
pluginmanager csi: checkpoint volume claim garbage collection (#7782) 2020-04-23 11:06:23 -04:00
servers client: drop unused DC field from servers list 2019-05-20 14:19:15 -07:00
state fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
stats Update gopsutil code 2020-03-15 09:37:05 +01:00
structs Harmonize go-msgpack/codec/codecgen 2020-04-28 17:12:31 -04:00
taskenv Add new setUpstreamsLocked function to avoid lock 2020-03-29 20:34:04 +02:00
testutil fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
vaultclient vendor: vault api and sdk 2020-03-21 17:57:48 +01:00
acl.go Audit config, seams for enterprise audit features 2020-03-23 13:47:42 -04:00
acl_test.go Audit config, seams for enterprise audit features 2020-03-23 13:47:42 -04:00
agent_endpoint.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
agent_endpoint_test.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
alloc_endpoint.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
alloc_endpoint_test.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
alloc_watcher_e2e_test.go tests: swap lib/freeport for tweaked helper/freeport 2019-12-09 08:37:32 -06:00
client.go csi: add node events to report progress mounting and unmounting volumes (#7547) 2020-03-31 17:13:52 -04:00
client_stats_endpoint.go Server side impl + touch ups 2018-02-15 13:59:02 -08:00
client_stats_endpoint_test.go tests: swap lib/freeport for tweaked helper/freeport 2019-12-09 08:37:32 -06:00
client_test.go client: enable nomad client to request and set SI tokens for tasks 2020-01-31 19:03:38 -06:00
csi_endpoint.go csi: make volume GC in job deregister safely async 2020-04-06 10:15:55 -04:00
csi_endpoint_test.go CSI: move node unmount to server-driven RPCs (#7596) 2020-04-02 16:04:56 -04:00
driver_manager_test.go tests: fix data race in client TestDriverManager_Fingerprint_Periodic 2019-05-21 09:49:56 -04:00
fingerprint_manager.go goimports until make check is happy 2019-01-23 06:27:14 -08:00
fingerprint_manager_test.go client/drivermananger: add driver manager 2018-12-18 22:55:18 -05:00
fs_endpoint.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
fs_endpoint_test.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
gc.go Plugins use parent loggers 2019-01-11 11:36:37 -08:00
gc_test.go tests: deflake TestAllocGarbageCollector_MakeRoomFor_MaxAllocs 2020-03-30 07:06:53 -04:00
node_updater.go client: use NewNodeEvent builder for consistency (#7559) 2020-03-31 10:02:16 -04:00
rpc.go CSI: move node unmount to server-driven RPCs (#7596) 2020-04-02 16:04:56 -04:00
rpc_test.go Simplify Bootstrap logic in tests 2020-03-02 13:47:43 -05:00
testing.go goimports until make check is happy 2019-01-23 06:27:14 -08:00
util.go client: defensive against getting stale alloc updates 2019-06-29 04:17:35 -05:00
util_test.go Update state with server 2018-10-16 16:53:29 -07:00