Add set object functions to c-api#8002
Conversation
📝 WalkthroughWalkthroughThis PR adds CPython-style C-API bindings for Python set and frozenset types to RustPython. It exposes a new ChangesSet and FrozenSet C-API Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/capi/src/setobject.rs`:
- Around line 45-50: PySet_Add currently downcasts the `set` argument only to
`PySet`, rejecting valid `PyFrozenSet` inputs; update `PySet_Add` (in
`crates/capi/src/setobject.rs`) to accept either a `PySet` or a `PyFrozenSet`
(and subtypes) by detecting the concrete type after `with_vm` unwrap and calling
the appropriate mutator: call `PySet::add` for `PySet` and invoke the VM-side
frozenset-initialization/mutation path for `PyFrozenSet` (the same internal
routine used to populate brand-new frozensets) so that `set.add(key, vm)`
semantics are preserved; ensure you clear/reset any cached hash on the
`PyFrozenSet` after mutation so subsequent hash lookups are correct and preserve
existing `with_vm` error propagation.
- Around line 82-90: Replace the current two-step membership + removal in
PySet_Discard with a single discard call that returns whether an item was
removed: add a bool-returning helper method on PySet (e.g.,
PySet::discard_returning_bool) that wraps the existing
PySetInner::discard(...)->PyResult<bool>, then update PySet_Discard to call that
helper once and return its boolean result as c_int; keep using the existing
try_downcast_ref::<PySet> and the same key handling but remove the prior
__contains__ probe so lookup+removal happen in one operation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 20296950-c1ea-46f7-a7d7-1ad6bdea14cd
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
crates/capi/Cargo.tomlcrates/capi/src/lib.rscrates/capi/src/setobject.rscrates/vm/src/builtins/set.rs
| pub unsafe extern "C" fn PySet_Discard(set: *mut PyObject, key: *mut PyObject) -> c_int { | ||
| with_vm(|vm| { | ||
| let set = unsafe { &*set }.try_downcast_ref::<PySet>(vm)?; | ||
| let key = unsafe { &*key }; | ||
| let had_item = set.__contains__(key, vm)?; | ||
| if had_item { | ||
| set.discard(key.to_owned(), vm)?; | ||
| } | ||
| Ok(had_item) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
CPython C API documentation for PySet_Discard: what are its return-value semantics, and is it defined as a single discard operation rather than a separate contains check followed by discard?
💡 Result:
PySet_Discard returns 1 if the element was found and removed, 0 if the element was not found (no action taken), and -1 if an error is encountered [1][2]. It does not raise a KeyError if the element is missing [1][2]. Regarding your second question, PySet_Discard is implemented as a single, atomic-like lookup-and-discard operation rather than separate "contains" and "discard" checks [3][4][5]. The underlying C implementation uses a function (typically named set_discard_entry in the CPython source) that performs a hash table lookup using the provided key [3][5]. If the key is found, it performs the removal in the same pass by updating the entry (replacing the key with a dummy tombstone, updating the set's size, and decrementing the reference count) [3][4][5]. This single-pass approach is more efficient and avoids the redundancy of performing two distinct lookups [5].
Citations:
- 1: https://docs.python.org/3/c-api/set.html
- 2: https://github.com/python/cpython/blob/master/Doc/c-api/set.rst
- 3: https://github.com/python/cpython/blob/main/Objects/setobject.c
- 4: https://github.com/python/cpython/blob/fd76eb547dd5d2c8307a89422049b6c3c80541ab/Objects/setobject.c
- 5: https://gopy.tamnd.com/docs/annotations/objects/setobject_c_detail
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Show the exact current implementation around the referenced lines
sed -n '60,120p' crates/capi/src/setobject.rs | nl -ba
# 2) Find the definition and any related helpers
rg -n "PySet_Discard|__contains__|fn discard|discard\(" crates/capi/src/setobject.rs crates/capi/src -S || true
# 3) Search for a bool-returning discard helper in the broader crate
rg -n "discard.*->.*bool|-> *Result<.*bool|discard.*bool|contains.*discard" crates/capi/src -S || true
# 4) Run clippy as requested by guidelines (may be slow; best-effort)
cargo clippy -q || trueRepository: RustPython/RustPython
Length of output: 108
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== setobject.rs around PySet_Discard =="
awk 'NR>=60 && NR<=120 {printf "%5d %s\n", NR, $0}' crates/capi/src/setobject.rs
echo
echo "== occurrences in crates/capi/src =="
rg -n "PySet_Discard|__contains__|\\bPySetInner\\b|\\bPySet\\b|fn +discard\\b|discard\\(" crates/capi/src -S
echo
echo "== look for discard helpers returning bool (best-effort) =="
rg -n "discard.*->.*bool|Result<.*bool>|fn +discard\\b" crates/capi/src -S
echo
echo "== best-effort cargo clippy (per guidelines) =="
cargo clippy -q || echo "cargo clippy failed (toolchain/build error in sandbox)"Repository: RustPython/RustPython
Length of output: 4212
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find PySet impl block / method definitions
rg -n "impl +PySet|struct +PySet|pub +fn +discard\\b|fn +discard\\b|__contains__\\b" crates -S | head -n 200
echo "----"
# Narrow to files likely defining PySet methods: search for "__contains__" method on PySet
rg -n "PySet.*__contains__|__contains__\\(.*vm\\)" crates -S | head -n 200
echo "----"
# Search for discard method returning Result/Option/boolean
rg -n "PySet.*discard\\b|fn +discard\\b.*->|discard\\b.*->" crates -S | head -n 200
echo "----"
echo "== best-effort cargo clippy =="
cargo clippy -qRepository: RustPython/RustPython
Length of output: 11361
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find PySet impl block / method definitions
rg -n "impl +PySet|struct +PySet|pub +fn +discard\\b|fn +discard\\b|__contains__\\b" crates -S | head -n 200
echo "----"
# Narrow to files likely defining PySet methods: search for "__contains__" method on PySet
rg -n "PySet.*__contains__|__contains__\\(.*vm\\)" crates -S | head -n 200
echo "----"
# Search for discard method returning Result/Option/boolean
rg -n "PySet.*discard\\b|fn +discard\\b.*->|discard\\b.*->" crates -S | head -n 200
echo "----"
echo "== best-effort cargo clippy =="
cargo clippy -qRepository: RustPython/RustPython
Length of output: 11633
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== locate PySet methods / impls =="
rg -n "impl +PySet|struct +PySet|__contains__\\b|\\bdiscard\\b" crates -S | head -n 200
echo
echo "== locate __contains__ specifically =="
rg -n "PySet.*__contains__|__contains__\\(.*vm\\)" crates -S | head -n 200
echo
echo "== locate discard signatures =="
rg -n "PySet.*discard\\b|fn +discard\\b.*->|\\bdiscard\\b.*->" crates -S | head -n 200
echo
echo "== cargo clippy (best-effort) =="
cargo clippy -q || echo "cargo clippy failed (toolchain/build error in sandbox)"Repository: RustPython/RustPython
Length of output: 13841
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Print relevant sections of PySet implementation in RustPython
echo "== crates/vm/src/builtins/set.rs: PySetInner::discard (around 320-360) =="
sed -n '300,380p' crates/vm/src/builtins/set.rs | nl -ba || true
echo
echo "== crates/vm/src/builtins/set.rs: PySet::discard (around 640-710) =="
sed -n '640,720p' crates/vm/src/builtins/set.rs | nl -ba || true
echo
echo "== crates/vm/src/builtins/set.rs: PySet::__contains__ (search + print nearby) =="
rg -n "pub fn __contains__\\b|fn __contains__\\b" crates/vm/src/builtins/set.rs
# Print a small window around each match (best-effort)
for ln in $(rg -n "pub fn __contains__\\b|fn __contains__\\b" crates/vm/src/builtins/set.rs | cut -d: -f1 | head -n 20); do
echo
echo "--- around line $ln ---"
sed -n "$((ln-30)),$((ln+30))p" crates/vm/src/builtins/set.rs | head -n 120
done
echo
echo "== search for any public discard returning bool =="
rg -n "pub +fn +discard\\b.*bool|pub\\(crate\\)? +fn +discard\\b.*bool" crates/vm/src/builtins/set.rsRepository: RustPython/RustPython
Length of output: 3946
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="crates/vm/src/builtins/set.rs"
echo "== PySetInner::discard around ~320-370 =="
sed -n '300,380p' "$FILE"
echo
echo "== PySet::discard around ~650-710 =="
sed -n '650,720p' "$FILE"
echo
echo "== PySetInner::contains around wherever it is (search) =="
rg -n "impl PySetInner|fn contains\\b|PySetInner.*contains\\b" "$FILE" | head -n 80
echo
echo "== show body around first PySetInner::contains match (best-effort) =="
ln="$(rg -n "fn contains\\b" "$FILE" | head -n 1 | cut -d: -f1)"
if [ -n "$ln" ]; then
start=$((ln-40))
end=$((ln+60))
sed -n "${start},${end}p" "$FILE"
fiRepository: RustPython/RustPython
Length of output: 7814
Avoid the extra membership probe in PySet_Discard.
Current PySet_Discard does __contains__ and then (conditionally) calls discard, repeating the set lookup and any __hash__/__eq__ side effects; CPython’s PySet_Discard performs lookup+removal in one operation and returns 1/0 accordingly. Add a bool-returning discard helper on PySet (wrapping the existing PySetInner::discard(...)->PyResult<bool>) and call it once here.
pub unsafe extern "C" fn PySet_Discard(set: *mut PyObject, key: *mut PyObject) -> c_int {
with_vm(|vm| {
let set = unsafe { &*set }.try_downcast_ref::<PySet>(vm)?;
let key = unsafe { &*key };
let had_item = set.__contains__(key, vm)?;
if had_item {
set.discard(key.to_owned(), vm)?;
}
Ok(had_item)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/capi/src/setobject.rs` around lines 82 - 90, Replace the current
two-step membership + removal in PySet_Discard with a single discard call that
returns whether an item was removed: add a bool-returning helper method on PySet
(e.g., PySet::discard_returning_bool) that wraps the existing
PySetInner::discard(...)->PyResult<bool>, then update PySet_Discard to call that
helper once and return its boolean result as c_int; keep using the existing
try_downcast_ref::<PySet> and the same key handling but remove the prior
__contains__ probe so lookup+removal happen in one operation.
I've used the skill from #8001 to implement the missing
setc-api functions using an AI agent.Summary by CodeRabbit
New Features
Chores
itertoolsas a workspace dependency.