website: add docs on scheduling

2015-09-19 12:02:48 -07:00 · 2015-09-19 12:02:48 -07:00 · 5d091a0536
parent 1a18a57368
commit 5d091a0536
4 changed files with 104 additions and 0 deletions
--- a/website/source/assets/images/eval-flow.png
+++ b/website/source/assets/images/eval-flow.png
--- a/website/source/assets/images/nomad-nouns.png
+++ b/website/source/assets/images/nomad-nouns.png
--- a/website/source/docs/internals/scheduling.html.md
+++ b/website/source/docs/internals/scheduling.html.md
@ -0,0 +1,94 @@
+---
+layout: "docs"
+page_title: "Scheduling"
+sidebar_current: "docs-internals-scheduling"
+description: |-
+  Learn about how schedulig works in Nomad.
+---
+
+# Scheduling
+
+Scheduling is a core function of Nomad. It is the process of assigning tasks
+from jobs to client machines. This process must respect the constraints as declared
+in the job, and optimize for resource utilization by bin packing. This page documents
+the details of how scheduling works in Nomad to help both users and developers
+build a mental model of how it works. The design is heavily inspired by Google's
+work on [Omega: flexible, scalable schedulers for large compute clusters](http://research.google.com/pubs/pub41684.html)
+
+~> **Advanced Topic!** This page covers technical details
+of Nomad. You don't need to understand these details to
+effectively use Nomad. The details are documented here for
+those who wish to learn about them without having to go
+spelunking through the source code.
+
+# Scheduling in Nomad
+
+![Data Model](/assets/images/nomad-nouns.png)
+
+There are four primary "nouns" in Nomad, these are jobs, nodes, allocations, and evaluations.
+Jobs are submitted by users and represent a _desired state_. A job is a declarative description
+of tasks to run which are bounded by constraints and require resources. Nodes are the servers
+in the clusters that tasks can be scheduled on. The mapping of tasks in a job to nodes is done
+using allocations. An allocation is used to declare that a set of tasks in a job should be run
+on a particular node. Scheduling is the process of determining the appropriate allocations and
+is done as part of an evaluation.
+
+An evaluation is created any time the external state, either desired or emergent, changes. The desired
+state is based on jobs, meaning the desired state changes if a new job is submitted, an
+existing job is updated, or a job is deregistered. The emergent state is based on the client
+nodes, and so we must handle the failure of any clients in the system. These events trigger
+the creation of a new evaluation, as Nomad must _evaluate_ the state of the world and reconcile
+it with the desired state.
+
+This diagram shows the flow of an evaluation through Nomad:
+
+![Evaluation Flow](/assets/images/eval-flow.png)
+
+The lifecycle of an evaluation beings with an event causing the evaluation to be
+created. Evaluations are created in the `pending` state and are enqueued into the
+evaluation broker. There is a single evaluation broker which runs on the leader server.
+The evaluation broker is used to manage the queue of pending evaluations, provide priority ordering,
+and ensure at least once delivery.
+
+Nomad servers run scheduling workers, defaulting to one per CPU core, which are used to
+process evaluations. The workers dequeue evaluations from the broker, and then invoke
+the appropriate schedule as specified by the job. Nomad ships with a `service` scheduler
+that optimizes for long-lived services, a `batch` scheduler that is used for fast placement
+of batch jobs, and a `core` scheduler which is used for internal maintenance. Nomad can
+be extended to support custom schedulers as well.
+
+Schedulers are responsible for processing an evaluation and generating an allocation _plan_.
+The plan is the set of allocations to evict, update, or create. The specific logic used to
+generate a plan may vary by scheduler, but generally the scheduler needs to first reconcile
+the desired state with the real state to determine what must be done. New allocations need
+to be placed and existing allocations may need to be updated, migrated, or stopped.
+
+Placing allocations is split into two distinct phases, feasibility
+checking and ranking. In the first phase the scheduler finds nodes that are
+feasible by filtering unhealthy nodes, those missing necessary drivers, and those
+failing the specified constraints.
+
+The second phase is ranking, where the scheduler scores feasible nodes to find the best fit.
+Scoring is primarily based on bin packing, which is used to optimize the resource utilization
+and density of applications, but is also augmented by affinity and anti-affinity rules.
+Once the scheduler has ranked enough nodes, the highest ranking node is selected and
+added to the allocation plan.
+
+When planning is complete, the scheduler submits the plan to the leader and
+gets added to the plan queue. The plan queue manages pending plans, provides priority
+ordering, and allows Nomad to handle concurrency races. Multiple schedulers are running
+in parallel without locking or reservations, making Nomad optimistically concurrent.
+As a result, schedulers might overlap work on the same node and cause resource
+over-subscription. The plan queue allows the leader node to protect against this and
+do partial or complete rejections of a plan.
+
+As the leader processes plans, it creates allocations when there is no conflict
+and otherwise informs the scheduler of a failure in the plan result. The plan result
+provides feedback to the scheduler, allowing it to terminate or explore alternate plans
+if the previous plan was partially or completely rejected.
+
+Once the scheduler has finished processing an evaluation, it updates the status of
+the evaluation and acknowledges delivery with the evaluation broker. This completes
+the lifecycle of an evaluation. Allocations that were created, modified or deleted
+as a result will be picked up by client nodes and will begin execution.
+
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@ -19,6 +19,10 @@

                        <li<%= sidebar_current("docs-internals-gossip") %>>
                        <a href="/docs/internals/gossip.html">Gossip Protocol</a>
+                        </li>
+
+                        <li<%= sidebar_current("docs-internals-scheduling") %>>
+                        <a href="/docs/internals/scheduling.html">Scheduling</a>
                        </li>

 						<li<%= sidebar_current("docs-internals-telemetry") %>>