PyO3: Architecture.md

This document roughly describes the high-level architecture of PyO3. If you want to become familiar with the codebase, you are in the right place!

Overview

PyO3 provides a bridge between Rust and Python, based on the Python C/API. Thus, PyO3 has low-level bindings of this API as its core. On top of that, we have higher-level bindings to operate on Python objects safely. Also, to define Python classes and functions in Rust code, we have the PyClass trait and a set of protocol traits (e.g., PyIterProtocol) for supporting object protocols (i.e., __dunder__ methods). Since implementing PyClass requires a lot of boilerplate, we have a proc-macro, #[pyclass].

To summarize, the PyO3 codebase has five main parts.

  1. Low-level bindings of the Python C/API.
  2. Bindings to Python objects.
  3. PyClass and related functionality.
  4. Protocol methods like __getitem__.
  5. Proc-macros that generate the glue code needed to define Python classes.

Low-level bindings of CPython API

src/ffi contains wrappers of the Python C/API.

We aim to provide straightforward Rust wrappers that mirror the file structure of cpython/Include.

However, some APIs are still missing, and we continue to refactor the module so that it fully mirrors CPython's file structure. The tracking issue is #1289, and contributions are welcome.

Bindings to Python Objects

src/types contains bindings to Python's built-in types, such as dict and list. For historical reasons, Python's object is called PyAny and lives in src/types/any.rs. Currently, PyAny is a straightforward wrapper of ffi::PyObject, like:

#[repr(transparent)]
pub struct PyAny(UnsafeCell<ffi::PyObject>);

Each built-in type is defined as a C struct. For example, dict is defined as:

typedef struct {
    /* Base object */
    PyObject ob_base;
    /* Number of items in the dictionary */
    Py_ssize_t ma_used;
    /* Dictionary version */
    uint64_t ma_version_tag;
    PyDictKeysObject *ma_keys;
    PyObject **ma_values;
} PyDictObject;

However, we cannot access the fields of such a structure when #[cfg(Py_LIMITED_API)] is set. Thus, all built-in objects are implemented as opaque types that wrap PyAny, like:

#[repr(transparent)]
pub struct PyDict(PyAny);

Note that PyAny itself is not a pointer. It is usually used behind a reference, &PyAny, which points to the object on the Python heap. This design choice may change (see the discussion in #1056).
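The layering above can be sketched without linking against Python at all. The following is a minimal illustration, not PyO3's actual code: FfiObject is a hypothetical stand-in for ffi::PyObject, the object is allocated on the Rust heap rather than the Python heap, and round_trip_demo is a name invented for this example. It shows that, because both wrappers are #[repr(transparent)], a raw object pointer can be viewed as &PyDict, then as &PyAny, and still yield the same address.

```rust
use std::cell::UnsafeCell;

/// Hypothetical stand-in for `ffi::PyObject` with simplified fields.
#[repr(C)]
pub struct FfiObject {
    pub ob_refcnt: usize,
    pub ob_type: usize, // placeholder for `*mut PyTypeObject`
}

/// Same layout as the FFI object, so `&PyAny` is just a typed pointer to it.
#[repr(transparent)]
pub struct PyAny(UnsafeCell<FfiObject>);

/// Opaque wrapper: no dict-specific fields are exposed.
#[repr(transparent)]
pub struct PyDict(PyAny);

impl PyAny {
    /// Raw pointer to the underlying object.
    pub fn as_ptr(&self) -> *mut FfiObject {
        self.0.get()
    }
}

impl AsRef<PyAny> for PyDict {
    fn as_ref(&self) -> &PyAny {
        &self.0
    }
}

/// Allocates a fake object and checks that the pointer survives both views.
pub fn round_trip_demo() -> bool {
    let raw: *mut FfiObject = Box::into_raw(Box::new(FfiObject {
        ob_refcnt: 1,
        ob_type: 0,
    }));
    let same = {
        // SAFETY: `raw` is non-null, aligned, and alive for this whole block.
        // Treating it as a dict is an assumption made only for illustration;
        // real code would check the object's type first.
        let dict: &PyDict = unsafe { &*(raw as *const PyDict) };
        let any: &PyAny = dict.as_ref();
        any.as_ptr() == raw
    };
    // SAFETY: `raw` came from `Box::into_raw` and no references to it remain.
    unsafe { drop(Box::from_raw(raw)) };
    same
}
```

Calling round_trip_demo() returns true, since neither wrapper adds any data or indirection of its own.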

Since implementing common traits for these types (e.g., AsPyPointer, AsRef<PyAny>, and Debug) takes a lot of boilerplate, we have some macros in src/types/mod.rs.
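To give a feel for what such a macro saves, here is a hedged sketch using macro_rules!. The macro name pyobject_wrapper, the simplified PyAny with a repr string field, and debug_demo are all assumptions made for this example. The real macros in src/types/mod.rs are more involved and may look quite different.

```rust
use std::fmt;

/// Simplified stand-in for `PyAny`; `repr` is a placeholder payload.
pub struct PyAny {
    repr: &'static str,
}

/// Hypothetical macro: declares a wrapper type and its common trait impls.
macro_rules! pyobject_wrapper {
    ($name:ident) => {
        #[repr(transparent)]
        pub struct $name(PyAny);

        impl AsRef<PyAny> for $name {
            fn as_ref(&self) -> &PyAny {
                &self.0
            }
        }

        impl fmt::Debug for $name {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                // Prints the wrapper's type name followed by the payload.
                write!(f, "{}({})", stringify!($name), self.0.repr)
            }
        }
    };
}

// One line per built-in type instead of repeating every impl by hand.
pyobject_wrapper!(PyDict);
pyobject_wrapper!(PyList);

/// Formats one value of each generated type with `{:?}`.
pub fn debug_demo() -> (String, String) {
    let d = PyDict(PyAny { repr: "{}" });
    let l = PyList(PyAny { repr: "[]" });
    (format!("{:?}", d), format!("{:?}", l))
}
```

Adding another wrapper type then costs a single macro invocation, which is the main reason to prefer a macro here over hand-written impls.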

PyClass

src/pycell.rs, src/pyclass.rs, and src/type_object.rs contain types and traits to make #[pyclass] work. Also, src/pyclass_init.rs and src/pyclass_slots.rs have related functionality.

To realize object-oriented programming in C, all Python objects must have the following two fields at the beginning.

#[repr(C)]
pub struct PyObject {
    pub ob_refcnt: usize,
    pub ob_type: *mut PyTypeObject,
    ...
}

Thanks to this guarantee, casting *mut A to *mut PyObject is valid if A is a Python object.

To ensure this guarantee, we have a wrapper struct PyCell<T> in src/pycell.rs which is roughly:

#[repr(C)]
pub struct PyCell<T: PyClass> {
    object: crate::ffi::PyObject,
    inner: T,
}

Thus, when converting a Rust struct to a Python object, we first allocate a PyCell on the Python heap and then move T into it. Also, PyCell provides RefCell-like methods to enforce Rust's borrow rules. See the PyCell API documentation for more.
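The layout guarantee can be demonstrated in isolation. The sketch below is an assumption-laden illustration rather than PyO3's implementation: PyTypeObject is an empty placeholder, the PyClass bound is dropped, the cell lives on the Rust heap via Box, and Counter and header_cast_demo are hypothetical names. It shows that a pointer to the whole cell can be cast to a pointer to the object header because the header is the first field of a #[repr(C)] struct.

```rust
/// Opaque placeholder for the real type object struct.
#[repr(C)]
pub struct PyTypeObject {
    _private: [u8; 0],
}

/// Simplified object header, mirroring the two leading fields.
#[repr(C)]
pub struct PyObject {
    pub ob_refcnt: usize,
    pub ob_type: *mut PyTypeObject,
}

/// Header first, user data second. `repr(C)` keeps the fields in this order.
#[repr(C)]
pub struct PyCell<T> {
    object: PyObject,
    inner: T,
}

/// Hypothetical user type that would carry `#[pyclass]` in real code.
pub struct Counter {
    pub count: u32,
}

/// Bumps the refcount through a header pointer, then reads both parts back.
pub fn header_cast_demo() -> (usize, u32) {
    let cell_ptr: *mut PyCell<Counter> = Box::into_raw(Box::new(PyCell {
        object: PyObject {
            ob_refcnt: 1,
            ob_type: std::ptr::null_mut(),
        },
        inner: Counter { count: 7 },
    }));

    // The header sits at offset 0, so this cast yields a valid header pointer.
    let obj_ptr = cell_ptr as *mut PyObject;

    // SAFETY: both pointers refer to the live allocation created above.
    let result = unsafe {
        (*obj_ptr).ob_refcnt += 1;
        ((*cell_ptr).object.ob_refcnt, (*cell_ptr).inner.count)
    };

    // SAFETY: `cell_ptr` came from `Box::into_raw` and is freed exactly once.
    unsafe { drop(Box::from_raw(cell_ptr)) };
    result
}
```

Here header_cast_demo() returns (2, 7): the write through the header pointer is visible through the cell pointer, and the user data after the header is untouched.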

PyCell<T> requires that T implements PyClass. This trait is somewhat complex and depends on many other traits, but the most important one is PyTypeObject in src/type_object.rs. PyTypeObject is also implemented for built-in types. Type objects are singletons, and every Python type has its own unique type object. For example, in the Python REPL, type({}) shows dict and type(type({})) shows type. T: PyTypeObject implies that T has a corresponding type object.
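The idea of a trait that ties a Rust type to a singleton type object can be sketched as follows. This is a simplified model under stated assumptions: HasTypeObject, TypeObject, and is_same_type are hypothetical names, and the type objects are plain Rust statics instead of objects owned by the Python runtime.

```rust
/// Simplified stand-in for a Python type object.
pub struct TypeObject {
    pub name: &'static str,
}

/// Hypothetical trait playing the role described for `PyTypeObject`.
pub trait HasTypeObject {
    /// Always returns the same object for a given implementing type.
    fn type_object() -> &'static TypeObject;
}

// One static per type: each static has a single, stable address.
static DICT_TYPE: TypeObject = TypeObject { name: "dict" };
static LIST_TYPE: TypeObject = TypeObject { name: "list" };

pub struct PyDict;
pub struct PyList;

impl HasTypeObject for PyDict {
    fn type_object() -> &'static TypeObject {
        &DICT_TYPE
    }
}

impl HasTypeObject for PyList {
    fn type_object() -> &'static TypeObject {
        &LIST_TYPE
    }
}

/// Type identity is pointer identity of the type objects.
pub fn is_same_type<A: HasTypeObject, B: HasTypeObject>() -> bool {
    std::ptr::eq(A::type_object(), B::type_object())
}
```

With these definitions, is_same_type::<PyDict, PyDict>() is true and is_same_type::<PyDict, PyList>() is false, which mirrors how type identity is decided by comparing type object pointers.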

Protocol methods

Python has built-in special methods, often called dunder methods, such as __iter__. In the Python C/API, they belong to what is called the abstract objects layer. We provide a way to implement those protocols by using #[pyproto] and specific traits, such as PyIterProtocol. src/class defines these traits. Each protocol method has a corresponding FFI function. For example, PyIterProtocol::__iter__ has pub unsafe extern "C" fn iter<T>(slf: *mut PyObject) -> *mut PyObject. When #[pyproto] finds that T implements PyIterProtocol::__iter__, it automatically sets iter<T> on the type object of T.
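The pattern of a generic extern "C" function installed into a type's slot can be sketched in plain Rust. Everything below except the shape of iter<T> and the CPython slot name tp_iter is an assumption for illustration: IterProtocol, TypeSlots, install_slots, SelfIterating, and slot_demo are hypothetical, and the real #[pyproto] machinery does considerably more (argument conversion, error handling, and so on).

```rust
/// Simplified object header.
#[repr(C)]
pub struct PyObject {
    pub ob_refcnt: usize,
}

/// C-compatible signature shared by unary slots such as `tp_iter`.
pub type UnaryFunc = unsafe extern "C" fn(*mut PyObject) -> *mut PyObject;

/// Hypothetical slot table standing in for part of a type object.
#[derive(Default)]
pub struct TypeSlots {
    pub tp_iter: Option<UnaryFunc>,
}

/// Hypothetical protocol trait a user type would implement.
pub trait IterProtocol {
    fn iter(slf: *mut PyObject) -> *mut PyObject;
}

/// Generic FFI shim: monomorphized once per implementing type `T`.
pub unsafe extern "C" fn iter<T: IterProtocol>(slf: *mut PyObject) -> *mut PyObject {
    T::iter(slf)
}

/// Example type whose iterator is the object itself.
pub struct SelfIterating;

impl IterProtocol for SelfIterating {
    fn iter(slf: *mut PyObject) -> *mut PyObject {
        slf
    }
}

/// Stores the shim for `T` in the slot table.
pub fn install_slots<T: IterProtocol>(slots: &mut TypeSlots) {
    slots.tp_iter = Some(iter::<T> as UnaryFunc);
}

/// Installs the slot, calls it through the function pointer, checks the result.
pub fn slot_demo() -> bool {
    let mut slots = TypeSlots::default();
    install_slots::<SelfIterating>(&mut slots);

    let mut obj = PyObject { ob_refcnt: 1 };
    let ptr: *mut PyObject = &mut obj;

    let slot = slots.tp_iter.expect("slot was installed above");
    // SAFETY: `ptr` points to `obj`, which outlives this call.
    let out = unsafe { slot(ptr) };
    out == ptr
}
```

The design point is that the interpreter only ever sees a plain C function pointer, while the Rust side keeps a typed trait method. The generic shim is the bridge between the two.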

Also, src/class/methods.rs has utilities for #[pyfunction] and src/class/impl_.rs has some internal tricks for making #[pyproto] flexible.

Proc-macros

pyo3-macros provides six proc-macro APIs: pymodule, pyproto, pyfunction, pyclass, pymethods, and #[derive(FromPyObject)]. pyo3-macros-backend has the actual implementations of these APIs. src/derive_utils.rs contains some utilities used in the code generated by these proc-macros, such as parsing function arguments.
