All symbols which are canonically defined in cpython/unicodeobject.h
and had been defined in our unicodeobject.rs have been moved to our
corresponding cpython/unicodeobject.rs file.
This module is only present in non-limited build configurations, so
we were able to drop the cfg annotations as part of moving the
definitions.
pyo3 doesn't currently define various Unicode bindings that allow the
retrieval of raw data from Python strings. Said bindings are a
prerequisite to possibly exposing this data in the Rust API (#1776).
Even if those high-level APIs never materialize, the FFI bindings are
necessary to enable consumers of the raw C API to utilize them.
This commit partially defines the FFI bindings as defined in
CPython's Include/cpython/unicodeobject.h file.
I used the latest CPython 3.9 Git commit for defining the order
of the symbols and the implementation of various inline preprocessor
macros. I tried to be as faithful as possible to the original
implementation, preserving intermediate `#define`s as inline functions.
Missing symbols have been annotated with `skipped` and symbols currently
defined in `src/ffi/unicodeobject.rs` have been annotated with `move`.
The `state` field of `PyASCIIObject` is a bitfield, which Rust doesn't
support. So we've provided accessor functions for retrieving these
fields' values. No accessor functions are present because you shouldn't
be touching these values from Rust code.
Tests of the bitfield APIs and macro implementations have been added.
The setter function will receive a NULL value on deletion requests.
This wasn't properly handled before, leading to a panic.
The new code raises AttributeError in this scenario instead.
A test for the behavior has been added. Documentation has also
been updated to reflect the behavior.
The field wasn't defined previously. And the enum wasn't defined as
`[repr(C)]`.
This missing field could result in memory corruption if a Rust-allocated
`PyStatus` was passed to a Python API, which could perform an
out-of-bounds write. In my code, the out-of-bounds write corrupted a
variable on the stack, leading to a segfault due to illegal memory
access. However, this crash only occurred on Rust 1.54! So I initially
mis-attribted it as a compiler bug / regression. It appears that a
low-level Rust change in 1.54.0 changed the LLVM IR in such a way to
cause LLVM optimization passes to produce sufficiently different
assembly code, tickling the crash. See
https://github.com/rust-lang/rust/issues/87947 if you want to see
the wild goose chase I went on in Rust / LLVM land to potentially
pin this on a compiler bug.
Lessen learned: Rust crashes are almost certainly due to use of
`unsafe`.
GILGuard::acquire() cannot be called during multi-phase Python
interpreter initialization because it calls Py_IsInitialized(),
which doesn't report the interpreter as initialized until all
phases of initialization have completed.
PyOxidizer uses the multi-phase initialization API and needs to
interact with pyo3's high-level APIs (not the FFI bindings) after
partial interpreter initialization, before the interpreter is fully
initialized. Attempts to use GILGuard::acquire() result in a panic
due to the aforementioned Py_IsInitialized() check failing.
This commit refactors the GILGuard logic into a function that
obtains the actual GILGuard and another function to perform
checks before calling the aforementioned functions.
A new unsafe `Python::with_gil_unchecked()` has been defined
to acquire the GIL via the unchecked code path so we may obtain
a `Python` during multi-phase initialization (and possibly other
scenarios).