* Improve API docs regarding when we free memory, resolves#311
* Add chapter to guide about when we free memory, resolves#311
* Fix typos in documentation
Co-authored-by: David Hewitt <1939362+davidhewitt@users.noreply.github.com>
* Add links from guide to docs.rs
* Update guide/src/memory.md
Co-authored-by: David Hewitt <1939362+davidhewitt@users.noreply.github.com>
With the recent implementation of non-limited unicode APIs, we're
able to query Python's low-level state to access the raw bytes that
Python is using to store string objects.
This commit implements a safe Rust API for obtaining a view into
Python's internals and representing the raw bytes Python is using
to store strings.
Not only do we allow accessing what Python has stored internally,
but we also support coercing this data to a `Cow<str>`.
Closes#1776.
All symbols which are canonically defined in cpython/unicodeobject.h
and had been defined in our unicodeobject.rs have been moved to our
corresponding cpython/unicodeobject.rs file.
This module is only present in non-limited build configurations, so
we were able to drop the cfg annotations as part of moving the
definitions.
pyo3 doesn't currently define various Unicode bindings that allow the
retrieval of raw data from Python strings. Said bindings are a
prerequisite to possibly exposing this data in the Rust API (#1776).
Even if those high-level APIs never materialize, the FFI bindings are
necessary to enable consumers of the raw C API to utilize them.
This commit partially defines the FFI bindings as defined in
CPython's Include/cpython/unicodeobject.h file.
I used the latest CPython 3.9 Git commit for defining the order
of the symbols and the implementation of various inline preprocessor
macros. I tried to be as faithful as possible to the original
implementation, preserving intermediate `#define`s as inline functions.
Missing symbols have been annotated with `skipped` and symbols currently
defined in `src/ffi/unicodeobject.rs` have been annotated with `move`.
The `state` field of `PyASCIIObject` is a bitfield, which Rust doesn't
support. So we've provided accessor functions for retrieving these
fields' values. No accessor functions are present because you shouldn't
be touching these values from Rust code.
Tests of the bitfield APIs and macro implementations have been added.
The field wasn't defined previously. And the enum wasn't defined as
`[repr(C)]`.
This missing field could result in memory corruption if a Rust-allocated
`PyStatus` was passed to a Python API, which could perform an
out-of-bounds write. In my code, the out-of-bounds write corrupted a
variable on the stack, leading to a segfault due to illegal memory
access. However, this crash only occurred on Rust 1.54! So I initially
mis-attribted it as a compiler bug / regression. It appears that a
low-level Rust change in 1.54.0 changed the LLVM IR in such a way to
cause LLVM optimization passes to produce sufficiently different
assembly code, tickling the crash. See
https://github.com/rust-lang/rust/issues/87947 if you want to see
the wild goose chase I went on in Rust / LLVM land to potentially
pin this on a compiler bug.
Lessen learned: Rust crashes are almost certainly due to use of
`unsafe`.
GILGuard::acquire() cannot be called during multi-phase Python
interpreter initialization because it calls Py_IsInitialized(),
which doesn't report the interpreter as initialized until all
phases of initialization have completed.
PyOxidizer uses the multi-phase initialization API and needs to
interact with pyo3's high-level APIs (not the FFI bindings) after
partial interpreter initialization, before the interpreter is fully
initialized. Attempts to use GILGuard::acquire() result in a panic
due to the aforementioned Py_IsInitialized() check failing.
This commit refactors the GILGuard logic into a function that
obtains the actual GILGuard and another function to perform
checks before calling the aforementioned functions.
A new unsafe `Python::with_gil_unchecked()` has been defined
to acquire the GIL via the unchecked code path so we may obtain
a `Python` during multi-phase initialization (and possibly other
scenarios).