Skip to content

Add set object functions to c-api#8002

Merged
youknowone merged 1 commit into
RustPython:mainfrom
bschoenmaeckers:c-api-sets
Jun 1, 2026
Merged

Add set object functions to c-api#8002
youknowone merged 1 commit into
RustPython:mainfrom
bschoenmaeckers:c-api-sets

Conversation

@bschoenmaeckers

@bschoenmaeckers bschoenmaeckers commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

I've used the skill from #8001 to implement the missing set c-api functions using an AI agent.

Summary by CodeRabbit

  • New Features

    • Extended set and frozenset support with operations for creation, element management (add, discard, clear), containment checks, and size queries.
  • Chores

    • Added itertools as a workspace dependency.

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds CPython-style C-API bindings for Python set and frozenset types to RustPython. It exposes a new setobject module with eight unsafe C-ABI functions (constructors, mutators, queries, and size retrieval), backed by visibility changes to five internal VM methods.

Changes

Set and FrozenSet C-API Support

Layer / File(s) Summary
C-API Module Setup
crates/capi/Cargo.toml, crates/capi/src/lib.rs
Added itertools workspace dependency and exposed new setobject module in the crate public surface.
VM Method Visibility Exposure
crates/vm/src/builtins/set.rs
Changed Rust visibility from private fn to pub fn for PySet::__contains__, discard, clear, pop and PyFrozenSet::__contains__ to enable C-API layer access.
Set/FrozenSet C-API Implementation
crates/capi/src/setobject.rs
Implemented eight C-ABI functions (PySet_New, PyFrozenSet_New, PySet_Add, PySet_Clear, PySet_Contains, PySet_Discard, PySet_Pop, PySet_Size) with type-check macros and disabled pyo3-based sanity tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • RustPython/RustPython#7871: Introduced the define_py_check! macro in capi/src/object.rs that is used to generate PySet_Check and PyFrozenSet_Check type-check helpers in this PR.

Suggested reviewers

  • youknowone
  • ShaharNaveh

Poem

🐰 Sets wrapped in C, oh what a sight,
With frozen friends and checks so tight,
From RustPython's heart they now take flight,
Through C-API gates, they shine so bright!
A rabbit's work, forever tight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.95% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add set object functions to c-api' directly and clearly describes the main change: adding C-API functions for set objects, which is evidenced by the new setobject.rs file with multiple PySet/PyFrozenSet C-ABI functions.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/capi/src/setobject.rs`:
- Around line 45-50: PySet_Add currently downcasts the `set` argument only to
`PySet`, rejecting valid `PyFrozenSet` inputs; update `PySet_Add` (in
`crates/capi/src/setobject.rs`) to accept either a `PySet` or a `PyFrozenSet`
(and subtypes) by detecting the concrete type after `with_vm` unwrap and calling
the appropriate mutator: call `PySet::add` for `PySet` and invoke the VM-side
frozenset-initialization/mutation path for `PyFrozenSet` (the same internal
routine used to populate brand-new frozensets) so that `set.add(key, vm)`
semantics are preserved; ensure you clear/reset any cached hash on the
`PyFrozenSet` after mutation so subsequent hash lookups are correct and preserve
existing `with_vm` error propagation.
- Around line 82-90: Replace the current two-step membership + removal in
PySet_Discard with a single discard call that returns whether an item was
removed: add a bool-returning helper method on PySet (e.g.,
PySet::discard_returning_bool) that wraps the existing
PySetInner::discard(...)->PyResult<bool>, then update PySet_Discard to call that
helper once and return its boolean result as c_int; keep using the existing
try_downcast_ref::<PySet> and the same key handling but remove the prior
__contains__ probe so lookup+removal happen in one operation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 20296950-c1ea-46f7-a7d7-1ad6bdea14cd

📥 Commits

Reviewing files that changed from the base of the PR and between 1ce37bf and fdfda26.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • crates/capi/Cargo.toml
  • crates/capi/src/lib.rs
  • crates/capi/src/setobject.rs
  • crates/vm/src/builtins/set.rs

Comment thread crates/capi/src/setobject.rs
Comment on lines +82 to +90
pub unsafe extern "C" fn PySet_Discard(set: *mut PyObject, key: *mut PyObject) -> c_int {
with_vm(|vm| {
let set = unsafe { &*set }.try_downcast_ref::<PySet>(vm)?;
let key = unsafe { &*key };
let had_item = set.__contains__(key, vm)?;
if had_item {
set.discard(key.to_owned(), vm)?;
}
Ok(had_item)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

CPython C API documentation for PySet_Discard: what are its return-value semantics, and is it defined as a single discard operation rather than a separate contains check followed by discard?

💡 Result:

PySet_Discard returns 1 if the element was found and removed, 0 if the element was not found (no action taken), and -1 if an error is encountered [1][2]. It does not raise a KeyError if the element is missing [1][2]. Regarding your second question, PySet_Discard is implemented as a single, atomic-like lookup-and-discard operation rather than separate "contains" and "discard" checks [3][4][5]. The underlying C implementation uses a function (typically named set_discard_entry in the CPython source) that performs a hash table lookup using the provided key [3][5]. If the key is found, it performs the removal in the same pass by updating the entry (replacing the key with a dummy tombstone, updating the set's size, and decrementing the reference count) [3][4][5]. This single-pass approach is more efficient and avoids the redundancy of performing two distinct lookups [5].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show the exact current implementation around the referenced lines
sed -n '60,120p' crates/capi/src/setobject.rs | nl -ba

# 2) Find the definition and any related helpers
rg -n "PySet_Discard|__contains__|fn discard|discard\(" crates/capi/src/setobject.rs crates/capi/src -S || true

# 3) Search for a bool-returning discard helper in the broader crate
rg -n "discard.*->.*bool|-> *Result<.*bool|discard.*bool|contains.*discard" crates/capi/src -S || true

# 4) Run clippy as requested by guidelines (may be slow; best-effort)
cargo clippy -q || true

Repository: RustPython/RustPython

Length of output: 108


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== setobject.rs around PySet_Discard =="
awk 'NR>=60 && NR<=120 {printf "%5d %s\n", NR, $0}' crates/capi/src/setobject.rs

echo
echo "== occurrences in crates/capi/src =="
rg -n "PySet_Discard|__contains__|\\bPySetInner\\b|\\bPySet\\b|fn +discard\\b|discard\\(" crates/capi/src -S

echo
echo "== look for discard helpers returning bool (best-effort) =="
rg -n "discard.*->.*bool|Result<.*bool>|fn +discard\\b" crates/capi/src -S

echo
echo "== best-effort cargo clippy (per guidelines) =="
cargo clippy -q || echo "cargo clippy failed (toolchain/build error in sandbox)"

Repository: RustPython/RustPython

Length of output: 4212


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find PySet impl block / method definitions
rg -n "impl +PySet|struct +PySet|pub +fn +discard\\b|fn +discard\\b|__contains__\\b" crates -S | head -n 200

echo "----"
# Narrow to files likely defining PySet methods: search for "__contains__" method on PySet
rg -n "PySet.*__contains__|__contains__\\(.*vm\\)" crates -S | head -n 200

echo "----"
# Search for discard method returning Result/Option/boolean
rg -n "PySet.*discard\\b|fn +discard\\b.*->|discard\\b.*->" crates -S | head -n 200

echo "----"
echo "== best-effort cargo clippy =="
cargo clippy -q

Repository: RustPython/RustPython

Length of output: 11361


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find PySet impl block / method definitions
rg -n "impl +PySet|struct +PySet|pub +fn +discard\\b|fn +discard\\b|__contains__\\b" crates -S | head -n 200

echo "----"
# Narrow to files likely defining PySet methods: search for "__contains__" method on PySet
rg -n "PySet.*__contains__|__contains__\\(.*vm\\)" crates -S | head -n 200

echo "----"
# Search for discard method returning Result/Option/boolean
rg -n "PySet.*discard\\b|fn +discard\\b.*->|discard\\b.*->" crates -S | head -n 200

echo "----"
echo "== best-effort cargo clippy =="
cargo clippy -q

Repository: RustPython/RustPython

Length of output: 11633


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== locate PySet methods / impls =="
rg -n "impl +PySet|struct +PySet|__contains__\\b|\\bdiscard\\b" crates -S | head -n 200

echo
echo "== locate __contains__ specifically =="
rg -n "PySet.*__contains__|__contains__\\(.*vm\\)" crates -S | head -n 200

echo
echo "== locate discard signatures =="
rg -n "PySet.*discard\\b|fn +discard\\b.*->|\\bdiscard\\b.*->" crates -S | head -n 200

echo
echo "== cargo clippy (best-effort) =="
cargo clippy -q || echo "cargo clippy failed (toolchain/build error in sandbox)"

Repository: RustPython/RustPython

Length of output: 13841


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Print relevant sections of PySet implementation in RustPython
echo "== crates/vm/src/builtins/set.rs: PySetInner::discard (around 320-360) =="
sed -n '300,380p' crates/vm/src/builtins/set.rs | nl -ba || true

echo
echo "== crates/vm/src/builtins/set.rs: PySet::discard (around 640-710) =="
sed -n '640,720p' crates/vm/src/builtins/set.rs | nl -ba || true

echo
echo "== crates/vm/src/builtins/set.rs: PySet::__contains__ (search + print nearby) =="
rg -n "pub fn __contains__\\b|fn __contains__\\b" crates/vm/src/builtins/set.rs
# Print a small window around each match (best-effort)
for ln in $(rg -n "pub fn __contains__\\b|fn __contains__\\b" crates/vm/src/builtins/set.rs | cut -d: -f1 | head -n 20); do
  echo
  echo "--- around line $ln ---"
  sed -n "$((ln-30)),$((ln+30))p" crates/vm/src/builtins/set.rs | head -n 120
done

echo
echo "== search for any public discard returning bool =="
rg -n "pub +fn +discard\\b.*bool|pub\\(crate\\)? +fn +discard\\b.*bool" crates/vm/src/builtins/set.rs

Repository: RustPython/RustPython

Length of output: 3946


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="crates/vm/src/builtins/set.rs"

echo "== PySetInner::discard around ~320-370 =="
sed -n '300,380p' "$FILE"

echo
echo "== PySet::discard around ~650-710 =="
sed -n '650,720p' "$FILE"

echo
echo "== PySetInner::contains around wherever it is (search) =="
rg -n "impl PySetInner|fn contains\\b|PySetInner.*contains\\b" "$FILE" | head -n 80

echo
echo "== show body around first PySetInner::contains match (best-effort) =="
ln="$(rg -n "fn contains\\b" "$FILE" | head -n 1 | cut -d: -f1)"
if [ -n "$ln" ]; then
  start=$((ln-40))
  end=$((ln+60))
  sed -n "${start},${end}p" "$FILE"
fi

Repository: RustPython/RustPython

Length of output: 7814


Avoid the extra membership probe in PySet_Discard.

Current PySet_Discard does __contains__ and then (conditionally) calls discard, repeating the set lookup and any __hash__/__eq__ side effects; CPython’s PySet_Discard performs lookup+removal in one operation and returns 1/0 accordingly. Add a bool-returning discard helper on PySet (wrapping the existing PySetInner::discard(...)->PyResult<bool>) and call it once here.

pub unsafe extern "C" fn PySet_Discard(set: *mut PyObject, key: *mut PyObject) -> c_int {
    with_vm(|vm| {
        let set = unsafe { &*set }.try_downcast_ref::<PySet>(vm)?;
        let key = unsafe { &*key };
        let had_item = set.__contains__(key, vm)?;
        if had_item {
            set.discard(key.to_owned(), vm)?;
        }
        Ok(had_item)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/capi/src/setobject.rs` around lines 82 - 90, Replace the current
two-step membership + removal in PySet_Discard with a single discard call that
returns whether an item was removed: add a bool-returning helper method on PySet
(e.g., PySet::discard_returning_bool) that wraps the existing
PySetInner::discard(...)->PyResult<bool>, then update PySet_Discard to call that
helper once and return its boolean result as c_int; keep using the existing
try_downcast_ref::<PySet> and the same key handling but remove the prior
__contains__ probe so lookup+removal happen in one operation.

@youknowone youknowone merged commit 885cf5c into RustPython:main Jun 1, 2026
26 checks passed
@bschoenmaeckers bschoenmaeckers deleted the c-api-sets branch June 1, 2026 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants