Commit Graph

62 Commits

Author SHA1 Message Date
hc-github-team-nomad-core e5fb6fe687
backport of commit 615e76ef3c23497f768ebd175f0c624d32aeece8 (#17993)
This pull request was automerged via backport-assistant
2023-07-19 13:31:14 -05:00
hashicorp-copywrite[bot] 005636afa0 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Hunter Morris dcaf99dcc1
client: Add AWS EC2 instance-life-cycle from metadata to client fingerprint (#12371) 2022-03-25 11:50:52 -04:00
Yoan Blanc ac0d5d8bd3
chore: bump golangci-lint from v1.24 to v1.39
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2021-04-03 09:50:23 +02:00
Joel May 13faf0d79e Allow client.cpu_total_compute to override attr.cpu.totalcompute 2021-01-07 15:31:11 -05:00
Seth Hoenig e693d15a5b env_aws: get ec2 cpu perf data from AWS API
Previously, Nomad was using a hand-made lookup table for looking
up EC2 CPU performance characteristics (core count + speed = ticks).

This data was incomplete and incorrect depending on region. The AWS
API has the correct data but requires API keys to use (i.e. should not
be queried directly from Nomad).

This change introduces a lookup table generated by a small command line
tool in Nomad's tools module which uses the Amazon AWS API.

Running the tool requires AWS_* environment variables set.
  $ # in nomad/tools/cpuinfo
  $ go run .

Going forward, Nomad can incorporate regeneration of the lookup table
somewhere in the CI pipeline so that we remain up-to-date on the latest
offerings from EC2.

Fixes #7830
2020-10-08 12:01:09 -05:00
Joel May 2adc5bdec7
fingerprinting: add AWS MAC and public-ipv6 (#8887) 2020-09-17 09:03:01 -04:00
Nick Ethier 4a44deaa5c CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Seth Hoenig 880c4e23d3 env_aws: combine 3 log lines into 1 2020-04-29 10:47:36 -06:00
Seth Hoenig 67303b666c
env_aws: downgrade log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:34:26 -06:00
Seth Hoenig 5ddc607701
env_aws: fixup log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:33:53 -06:00
Seth Hoenig f8596a3602 env_aws: use best-effort lookup table for CPU performance in EC2
Fixes #7681

The current behavior of the CPU fingerprinter in AWS is that it
reads the **current** speed from `/proc/cpuinfo` (`CPU MHz` field).

This is because the max CPU frequency is not available by reading
anything on the EC2 instance itself. Normally on Linux one would
look at e.g. `/sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`
or perhaps parse the values from the `CPU max MHz` field in
`/proc/cpuinfo`, but those values are not available.

Furthermore, no metadata about the CPU is made available in the
EC2 metadata service.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html

Since `go-psutil` cannot determine the max CPU speed it defaults to
the current CPU speed, which could be basically any number between
0 and the true max. This is particularly bad on large, powerful
reserved instances which often idle at ~800 MHz while Nomad does
its fingerprinting (typically IO bound), which Nomad then uses as
the max, which results in severe loss of available resources.

Since the CPU specification is unavailable programmatically (at least
not without sudo) use a best-effort lookup table. This table was
generated by going through every instance type in AWS documentation
and copy-pasting the numbers.
https://aws.amazon.com/ec2/instance-types/

This approach obviously is not ideal as future instance types will
need to be added as they are introduced to AWS. However, using the
table should only be an improvement over the status quo since right
now Nomad miscalculates available CPU resources on all instance types.
2020-04-28 19:01:33 -06:00
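A minimal sketch of the best-effort lookup table idea described in the commit above. The type name `cpuSpec`, the function `lookupTicks`, and the table entries are hypothetical placeholders, not Nomad's actual identifiers or real AWS figures; treating total compute as cores multiplied by MHz is also an assumption made for illustration.

```go
package main

import "fmt"

// cpuSpec is a hypothetical record of the advertised core count and
// clock speed for one instance type.
type cpuSpec struct {
	Cores int
	MHz   float64
}

// Ticks returns total compute as cores multiplied by MHz (an assumption
// made for this sketch).
func (s cpuSpec) Ticks() int {
	return int(float64(s.Cores) * s.MHz)
}

// exampleTable holds made-up entries; the instance type names and the
// numbers are placeholders, not real AWS data.
var exampleTable = map[string]cpuSpec{
	"example.small": {Cores: 2, MHz: 2500},
	"example.large": {Cores: 8, MHz: 3000},
}

// lookupTicks returns the total compute for a known instance type.
// The boolean is false when the type is missing from the table, so the
// caller can fall back to whatever the host fingerprint reported.
func lookupTicks(instanceType string) (int, bool) {
	spec, ok := exampleTable[instanceType]
	if !ok {
		return 0, false
	}
	return spec.Ticks(), true
}

func main() {
	for _, t := range []string{"example.small", "example.unknown"} {
		if ticks, ok := lookupTicks(t); ok {
			fmt.Printf("%s: %d MHz total\n", t, ticks)
		} else {
			fmt.Printf("%s: not in table, keeping host fingerprint\n", t)
		}
	}
}
```

Returning an explicit "not found" flag keeps the table best-effort: an instance type that is missing from it leaves the existing value untouched instead of overwriting it with zero.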
Mahmood Ali 7985b1893f fixup! tests: Add tests for EC2 Metadata imitation cases 2020-03-26 11:37:54 -04:00
Mahmood Ali 1d50379bc6 fingerprint: handle incomplete AWS imitation APIs
Fix a regression where we accidentally started treating non-AWS
environments as AWS environments, resulting in bad networking settings.

Two factors are at play:

First, in [1], we accidentally switched the ultimate AWS test from
checking `ami-id` to `instance-id`.  This means that nomad started
treating more environments as AWS; e.g. Hetzner implements `instance-id`
but not `ami-id`.

Second, some of these environments return empty values instead of
errors!  Hetzner returns an empty 200 response for `local-ipv4`, resulting
in a bad networking configuration.

This change fixes the situation by restoring the check to `ami-id` and
ensuring that we only set network configuration when the IP address is
non-empty.  Also, be more defensive around whitespace in responses.

[1] https://github.com/hashicorp/nomad/pull/6779
2020-03-26 11:23:15 -04:00
Danielle b006be623d
Update client/fingerprint/env_aws.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-12-16 14:48:52 +01:00
Danielle Lancashire 5a87b3ab4b
env_aws: Disable Retries and set Session cfg
Previously, Nomad used hand rolled HTTP requests to interact with the
EC2 metadata API. Recently however, we switched to using the AWS SDK for
this fingerprinting.

The default behaviour of the AWS SDK is to perform retries with
exponential backoff when a request fails. This is problematic for Nomad,
because interacting with the EC2 API is in our client start path.

Here we revert to our pre-existing behaviour of not performing retries
in the fast path: if the metadata service is unavailable, it's likely
that Nomad is not running in AWS.
2019-12-16 10:56:32 +01:00
Mahmood Ali 293276a457 fingerprint code refactor
Some code cleanup:

* Use a field for setting EC2 metadata instead of env-vars in testing;
but keep environment variables for backward compatibility reasons

* Update tests to use testify
2019-11-26 10:51:28 -05:00
Mahmood Ali 1e48f8e20d fingerprint: avoid api query if config overrides it 2019-11-26 10:51:28 -05:00
Mahmood Ali 5bb9089431 fingerprint: use ec2metadata package 2019-11-26 10:51:27 -05:00
Yorick Gersie 95f81f3eeb fix nil pointer in fingerprinting AWS env leading to crash
The HTTP client returns a nil response if an error has occurred. We first
need to check for an error before being able to check the HTTP response
code.
2019-04-19 11:07:13 +02:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Danielle Tomlinson 66c521ca17 client: Move fingerprint structs to pkg
This removes a cyclical dependency when importing client/structs from
dependencies of the plugin_loader (specifically, drivers), which arises
because client/config also depends on the plugin_loader.

It also better reflects the ownership of fingerprint structs, as they
are fairly internal to the fingerprint manager.
2018-12-01 17:10:39 +01:00
Alex Dadgar 8504505c0d client uses passed logger and fix fingerprinters 2018-10-16 16:53:30 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Josh Soref 6222bd564e spelling: verify 2018-03-11 19:13:32 +00:00
Josh Soref b67449796a spelling: added 2018-03-11 17:34:28 +00:00
Chelsea Holland Komlo b8e8064835 code review fixup 2018-01-31 18:34:03 -05:00
Chelsea Holland Komlo 7b53474a6e add applicable boolean to fingerprint response
public fields and remove getter functions
2018-01-31 13:21:45 -05:00
Chelsea Holland Komlo 7c19de797c create safe getters and setters for fingerprint response 2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo 9a8344333b refactor Fingerprint to request/response construct 2018-01-24 11:54:02 -05:00
Charlie Voiselle 969ddf9c2a Lowered to DEBUG from AD feedback 2017-11-16 14:13:03 -05:00
Charlie Voiselle 1197637251 Dropped loglevel for AWS fingerprinter env reads
Certain environments use WARN for serious logging; however, it's very
possible to have machines without some of the fingerprinted keys
(public-ipv4 and public-hostname specifically).  Setting log level to
INFO seems more consistent with this possibility.
2017-11-15 18:20:59 +00:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Charlie Voiselle ae466eaaa7 AMI ID is potentially non-unique
Changed the keys map to reflect that.
2017-08-09 12:53:54 -04:00
Alex Dadgar 56f9cf86df Speed up client startup 2017-07-20 22:34:24 -07:00
Alex Dadgar 9497991590 Updated AWS speeds and network_speed now overrides
This PR:

* Makes AWS network speeds more granular
* Makes `network_speed` an override and not a default
* Adds a default of 1000 MBits if no network link speed is detected.

Fixes #1985
2016-11-15 13:55:51 -08:00
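The precedence described in the commit above (a configured `network_speed` overrides detection, and 1000 MBits is the fallback) can be sketched as a single function. `effectiveSpeed` and its parameters are hypothetical names, and treating zero as "not set" or "not detected" is an assumption of this sketch.

```go
package main

import "fmt"

// defaultSpeedMBits is the fallback named in the commit message.
const defaultSpeedMBits = 1000

// effectiveSpeed picks the network speed in MBits. A configured value
// wins, then a detected link speed, then the default. Zero or negative
// values mean "not set" or "not detected" in this sketch.
func effectiveSpeed(configured, detected int) int {
	if configured > 0 {
		return configured
	}
	if detected > 0 {
		return detected
	}
	return defaultSpeedMBits
}

func main() {
	fmt.Println(effectiveSpeed(500, 10000)) // configured value overrides detection
	fmt.Println(effectiveSpeed(0, 10000))   // detected value is used
	fmt.Println(effectiveSpeed(0, 0))       // nothing known, fall back to the default
}
```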
Alex Dadgar 92f526d902 Run environmental fingerprinters after host fingerprinters and do an override 2016-11-07 12:21:50 -08:00
Sean Chittenden ec77a1869e
Test for errors 2016-06-16 14:43:46 -07:00
Alex Dadgar a85800188c Respond to comments 2016-01-26 14:55:38 -08:00
Alex Dadgar d5c77cd4a4 Update client fingerprinters 2016-01-26 10:08:01 -08:00
Abhishek Chanda cd51ee6430 Handle non 200 codes while getting env metadata 2015-12-22 05:23:32 +00:00
Alex Dadgar b943c6e278 Remove all calls to the default logger 2015-12-11 15:02:13 -08:00
Alex Dadgar b2daa5e2e6 Standardize log messages in fingerprinters to DEBUG 2015-11-24 11:06:51 -08:00
Guillaume Jacquet 4a3e709eef Fix AWS metadata url
Fix URL. It was printing an error message on startup:
```
2015/11/13 15:49:21 [ERR] fingerprint.env_aws: Error querying AWS Metadata URL, skipping
```

By the way is it safe to use latest? Is there a chance that Amazon decides to change the format of the metadata? It could be safer to use something like `http://169.254.169.254/2014-11-05/meta-data`
2015-11-13 11:03:05 -05:00
Alex Dadgar f9fd83c696 Merge fix 2015-11-05 13:46:02 -08:00
Kenjiro Nakayama 21f537339e Use const value for AWS metadata URL 2015-11-04 00:06:14 +09:00
Jeff Mitchell 959c175ca1 Update the location of cleanhttp 2015-10-22 14:21:07 -04:00
Jeff Mitchell cea5fd9081 Use cleanhttp for truly clean clients and transports. 2015-10-22 10:58:23 -04:00
Daniel Imfeld 9730df8411 Fix old comments and other syntax cleanup 2015-10-12 16:56:33 -05:00
Daniel Imfeld 46bbfc3549 isAWS should return false on GCE
GCE and AWS both expose metadata servers, and GCE's 404 response
includes the URL in the content, which matches the regex. So,
check the response code as well, and if a 4xx code comes back,
take that to mean it's not AWS.
2015-10-05 00:42:34 -05:00