From bea8d762584a5f3b216a15be200d0f6c80df80f1 Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Tue, 23 Dec 2025 12:14:56 +1000 Subject: [PATCH 01/11] Add CPython teaching materials for Record type project - CLAUDE.md: Teaching mode instructions and architecture overview - teaching-todo.md: 5-phase curriculum building toward Record type - teaching-notes.md: Detailed implementation notes (for Claude reference) The learning project implements a Record type (immutable named container) and BUILD_RECORD opcode, covering PyObject fundamentals, type slots, the evaluation loop, and build system integration. --- CLAUDE.md | 112 +++++++++++++++ teaching-notes.md | 356 ++++++++++++++++++++++++++++++++++++++++++++++ teaching-todo.md | 209 +++++++++++++++++++++++++++ 3 files changed, 677 insertions(+) create mode 100644 CLAUDE.md create mode 100644 teaching-notes.md create mode 100644 teaching-todo.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000000000..80615331d5e1b5 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,112 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Teaching Mode + +This repository is being used as a learning environment for CPython internals. The goal is to teach the user how CPython works, not to write code for them. + +**Behavior Guidelines:** +- Describe implementations and concepts, don't write code unless explicitly asked +- Ask questions to verify understanding ("What do you think ob_refcnt does?") +- Point to specific files and line numbers for the user to read +- When the user is stuck, give hints before giving answers +- Reference `teaching-todo.md` for the structured curriculum +- Reference `teaching-notes.md` for detailed research (student should not read this) +- Encourage use of `dis` module, GDB, and debug builds for exploration + +**The learning project:** Implementing a `Record` type and `BUILD_RECORD` opcode (~300 LoC). 
This comprehensive project covers: +- PyObject/PyVarObject fundamentals (custom struct, refcounting) +- Type slots (tp_repr, tp_hash, tp_dealloc, tp_getattro, sq_length, sq_item) +- The evaluation loop (BUILD_RECORD opcode in ceval.c) +- Build system integration + +A working solution exists on the `teaching-cpython-solution` branch for reference. + +## Build Commands + +```bash +# Debug build (required for learning - enables assertions and refcount tracking) +./configure --with-pydebug +make + +# Smoke test +./python.exe --version +./python.exe -c "print('hello')" + +# Run specific test +./python.exe -m test test_sys +``` + +After modifying opcodes or grammar: +```bash +make regen-all # Regenerate generated files +make # Rebuild +``` + +## Architecture Overview + +### The Object Model (start here) +- `Include/object.h` - PyObject, PyVarObject, Py_INCREF/DECREF +- `Include/cpython/object.h` - PyTypeObject (the "metaclass" of all types) +- `Objects/*.c` - Concrete type implementations + +### Core Data Structures +| Type | Header | Implementation | +|------|--------|----------------| +| int | `Include/cpython/longintrepr.h` | `Objects/longobject.c` | +| tuple | `Include/cpython/tupleobject.h` | `Objects/tupleobject.c` | +| list | `Include/cpython/listobject.h` | `Objects/listobject.c` | +| dict | `Include/cpython/dictobject.h` | `Objects/dictobject.c` | +| set | `Include/setobject.h` | `Objects/setobject.c` | + +### Execution Engine +- `Include/opcode.h` - Opcode definitions +- `Lib/opcode.py` - Python-side opcode definitions (source of truth) +- `Include/cpython/code.h` - Code object structure +- `Include/cpython/frameobject.h` - Frame object (execution context) +- `Python/ceval.c` - **The interpreter loop** - giant switch on opcodes, stack machine + +### Compiler Pipeline +- `Grammar/python.gram` - PEG grammar +- `Parser/` - Tokenizer and parser +- `Python/compile.c` - AST to bytecode +- `Python/symtable.c` - Symbol table building + +## Key Concepts for Teaching + 
+**Everything is a PyObject:** +```c +typedef struct { + Py_ssize_t ob_refcnt; // Reference count + PyTypeObject *ob_type; // Pointer to type object +} PyObject; +``` + +**The stack machine:** Bytecode operates on a value stack. `LOAD_FAST` pushes, `BINARY_ADD` pops two and pushes one, etc. + +**Type slots:** `PyTypeObject` has function pointers (tp_hash, tp_repr, tp_call) that define behavior. `len(x)` calls `x->ob_type->tp_as_sequence->sq_length`. + +## Useful Commands for Learning + +```bash +# Disassemble Python code +./python.exe -c "import dis; dis.dis(lambda: [1,2,3])" + +# Check reference count (debug build) +./python.exe -c "import sys; x = []; print(sys.getrefcount(x))" + +# Show total refcount after each statement (debug build) +./python.exe -X showrefcount + +# Run with GDB +gdb ./python.exe +(gdb) break _PyEval_EvalFrameDefault +(gdb) run -c "1 + 1" +``` + +## External Resources + +- Developer Guide: https://devguide.python.org/ +- CPython Internals Book: https://realpython.com/products/cpython-internals-book/ +- PEP 3155 (Qualified names): Understanding how names are resolved diff --git a/teaching-notes.md b/teaching-notes.md new file mode 100644 index 00000000000000..c39fd138525911 --- /dev/null +++ b/teaching-notes.md @@ -0,0 +1,356 @@ +# Teaching Notes (For Claude - Not for Student) + +This file contains detailed research and implementation guidance for teaching CPython internals through building a Record type. 
+ +--- + +## Phase 1: PyObject Fundamentals + +### PyObject Structure +**File:** `Include/object.h:105-109` +```c +typedef struct _object { + _PyObject_HEAD_EXTRA + Py_ssize_t ob_refcnt; + PyTypeObject *ob_type; +} PyObject; +``` + +### PyVarObject Structure +**File:** `Include/object.h:115-118` +```c +typedef struct { + PyObject ob_base; + Py_ssize_t ob_size; +} PyVarObject; +``` + +### Py_INCREF/DECREF +**File:** `Include/object.h:461-508` +- INCREF: Simply `op->ob_refcnt++` +- DECREF: Decrements, calls `_Py_Dealloc(op)` when reaches 0 +- Debug builds track `_Py_RefTotal` and detect negative refcounts + +### Teaching Questions - Answers + +**Q: What are the two fields every Python object has?** +A: `ob_refcnt` (reference count) and `ob_type` (pointer to type object) + +**Q: Why reference counting instead of tracing GC?** +A: Deterministic destruction (know exactly when objects die), simpler implementation, good cache locality. Downside: can't handle cycles (hence the cycle collector supplement). + +**Q: Should Record use PyObject or PyVarObject?** +A: PyVarObject - because Record has variable number of fields. The `ob_size` will store field count. 
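The refcounting-vs-tracing answer above can be demonstrated from Python before the student reads any C. The following is a minimal sketch, assuming a standard (non-free-threaded) CPython build; the `Node` class and all variable names are hypothetical and exist only for this demo.

```python
import gc
import weakref


class Node:
    """Plain container so instances can be weakly referenced and linked."""

    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()  # keep the cycle collector from running on its own during the demo
try:
    # Case 1: no cycle. Dropping the last reference frees the object at once.
    a = Node()
    a_alive = weakref.ref(a)
    del a
    freed_immediately = a_alive() is None

    # Case 2: a two-object cycle. The refcounts never reach zero on their own.
    b, c = Node(), Node()
    b.other, c.other = c, b
    b_alive = weakref.ref(b)
    del b, c
    survived_del = b_alive() is not None

    # The supplemental cycle collector is what finally reclaims the pair.
    gc.collect()
    freed_by_collector = b_alive() is None
finally:
    if was_enabled:
        gc.enable()
```

A useful follow-up question for the student: which of the three flags would change if CPython used only a tracing collector?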
+ +--- + +## Phase 2: Data Structures for Record + +### Tuple as Reference +**File:** `Include/cpython/tupleobject.h:5-11` +```c +typedef struct { + PyObject_VAR_HEAD + PyObject *ob_item[1]; // Flexible array member +} PyTupleObject; +``` + +### Record Memory Layout Design +```c +typedef struct { + PyObject_VAR_HEAD // includes ob_size = field count + Py_hash_t r_hash; // cached hash (-1 if not computed) + PyObject *r_names; // tuple of field names (strings) + PyObject *r_values[1]; // flexible array of values +} RecordObject; +``` + +**Design decisions:** +- `r_names` is a tuple (shared across records with same fields) +- `r_values` is inline for cache locality +- `r_hash` cached because immutable (like tuple) +- Use `ob_size` for field count + +**Q: Tradeoff tuple vs dict for names?** +A: Tuple is simpler and faster for small N. Dict would be O(1) lookup but more memory. For typical record sizes (2-10 fields), linear scan of tuple is fine. Could optimize with dict for large records. + +### Key Functions to Study +- `Objects/tupleobject.c:tuple_hash` (line ~350) - hash combining algorithm +- `Objects/tupleobject.c:tuple_richcompare` (line ~600) - element-by-element comparison +- `Objects/tupleobject.c:tuple_dealloc` (line ~250) - DECREF each element + +--- + +## Phase 3: Type Slots Implementation + +### Slot Reference Table + +| Slot | Signature | Purpose | +|------|-----------|---------| +| `tp_dealloc` | `void (*)(PyObject *)` | Release resources | +| `tp_repr` | `PyObject *(*)(PyObject *)` | `repr(obj)` | +| `tp_hash` | `Py_hash_t (*)(PyObject *)` | `hash(obj)` | +| `tp_richcompare` | `PyObject *(*)(PyObject *, PyObject *, int)` | Comparisons | +| `tp_getattro` | `PyObject *(*)(PyObject *, PyObject *)` | `obj.attr` | +| `sq_length` | `Py_ssize_t (*)(PyObject *)` | `len(obj)` | +| `sq_item` | `PyObject *(*)(PyObject *, Py_ssize_t)` | `obj[i]` | + +### Implementation Patterns + +**record_dealloc:** +```c +static void +record_dealloc(RecordObject *r) +{ + 
Py_ssize_t i, n = Py_SIZE(r); + PyObject_GC_UnTrack(r); // If GC tracked + Py_XDECREF(r->r_names); + for (i = 0; i < n; i++) { + Py_XDECREF(r->r_values[i]); + } + Py_TYPE(r)->tp_free((PyObject *)r); +} +``` + +**record_hash (based on tuple_hash):** +```c +static Py_hash_t +record_hash(RecordObject *r) +{ + if (r->r_hash != -1) + return r->r_hash; + + Py_hash_t hash = 0x345678L; + Py_ssize_t n = Py_SIZE(r); + Py_hash_t mult = 1000003L; + + for (Py_ssize_t i = 0; i < n; i++) { + Py_hash_t h = PyObject_Hash(r->r_values[i]); + if (h == -1) return -1; + hash = (hash ^ h) * mult; + mult += 82520L + n + n; + } + hash += 97531L; + if (hash == -1) + hash = -2; + r->r_hash = hash; + return hash; +} +``` + +**record_getattro:** +```c +static PyObject * +record_getattro(RecordObject *r, PyObject *name) +{ + // First check if it's a field name + if (PyUnicode_Check(name)) { + Py_ssize_t n = Py_SIZE(r); + for (Py_ssize_t i = 0; i < n; i++) { + PyObject *field = PyTuple_GET_ITEM(r->r_names, i); + if (PyUnicode_Compare(name, field) == 0) { + PyObject *val = r->r_values[i]; + Py_INCREF(val); + return val; + } + } + } + // Fall back to generic attribute lookup (for methods, etc.) + return PyObject_GenericGetAttr((PyObject *)r, name); +} +``` + +### Common Pitfalls +1. Forgetting to INCREF return values from sq_item/getattro +2. Not handling negative indices in sq_item +3. Forgetting to handle hash == -1 (error indicator) +4. 
Not calling PyObject_GC_UnTrack in dealloc if type is GC-tracked + +--- + +## Phase 4: Evaluation Loop + +### Key Locations +- `_PyEval_EvalFrameDefault`: `Python/ceval.c:1577` +- Stack macros: `Python/ceval.c:1391-1433` +- `BUILD_TUPLE`: `Python/ceval.c:2615-2630` +- `BUILD_MAP`: `Python/ceval.c:2648-2700` + +### BUILD_TUPLE Pattern +```c +case TARGET(BUILD_TUPLE): { + PyObject *tup = PyTuple_New(oparg); + if (tup == NULL) + goto error; + while (--oparg >= 0) { + PyObject *item = POP(); + PyTuple_SET_ITEM(tup, oparg, item); + } + PUSH(tup); + DISPATCH(); +} +``` + +### BUILD_RECORD Design +Stack layout: `[..., name1, val1, name2, val2, ..., nameN, valN]` +Oparg: N (number of fields) +Pops: 2*N items +Pushes: 1 record + +```c +case TARGET(BUILD_RECORD): { + Py_ssize_t n = oparg; + PyObject *names = PyTuple_New(n); + if (names == NULL) + goto error; + + // Collect names and values from stack (reverse order) + PyObject **values = PyMem_Malloc(n * sizeof(PyObject *)); + if (values == NULL) { + Py_DECREF(names); + goto error; + } + + for (Py_ssize_t i = n - 1; i >= 0; i--) { + PyObject *val = POP(); + PyObject *name = POP(); + if (!PyUnicode_Check(name)) { + // Error: field name must be string + // cleanup and goto error + } + PyTuple_SET_ITEM(names, i, name); // steals ref + values[i] = val; // we own this ref + } + + PyObject *record = PyRecord_New(names, values, n); + PyMem_Free(values); + if (record == NULL) { + Py_DECREF(names); + goto error; + } + + PUSH(record); + DISPATCH(); +} +``` + +--- + +## Phase 5: Implementation Details + +### File Structure + +**Include/recordobject.h:** +```c +#ifndef Py_RECORDOBJECT_H +#define Py_RECORDOBJECT_H + +#include "Python.h" + +typedef struct { + PyObject_VAR_HEAD + Py_hash_t r_hash; + PyObject *r_names; + PyObject *r_values[1]; +} RecordObject; + +PyAPI_DATA(PyTypeObject) PyRecord_Type; + +#define PyRecord_Check(op) PyObject_TypeCheck(op, &PyRecord_Type) + +PyAPI_FUNC(PyObject *) PyRecord_New(PyObject *names, PyObject 
**values, Py_ssize_t n); + +#endif +``` + +### Build Integration + +**Makefile.pre.in additions:** +```makefile +OBJECT_OBJS= \ + ... \ + Objects/recordobject.o +``` + +**Python/bltinmodule.c:** +Add to `_PyBuiltin_Init`: +```c +SETBUILTIN("Record", &PyRecord_Type); +``` + +### Opcode Number Selection +Check `Lib/opcode.py` for gaps. In 3.10: +- Gap at 35-48 +- Gap at 58 +- Could use 35 for BUILD_RECORD + +Add to `Lib/opcode.py`: +```python +def_op('BUILD_RECORD', 35) +hasconst.append(35) # or maybe not, depends on design +``` + +--- + +## Teaching Strategies + +### Phase 1 Approach +1. Have student grep for "typedef struct _object" +2. Ask them to explain each field before revealing +3. Demo with: `import sys; x = []; print(sys.getrefcount(x))` +4. Show how INCREF/DECREF work with print statements in debug build + +### Phase 2 Approach +1. Compare tuple and list side by side - why different structures? +2. Have student sketch Record layout before showing solution +3. Discuss space/time tradeoffs + +### Phase 3 Approach +1. Start with dealloc - "what happens when refcount hits 0?" +2. Work through repr next - visible feedback +3. Leave hash for after they understand the algorithm from tuple + +### Phase 4 Approach +1. Use GDB to step through BUILD_TUPLE +2. Print stack_pointer values before/after +3. Have student write pseudocode before C + +### Phase 5 Approach +1. Start with minimal working type (just dealloc + repr) +2. Add features incrementally, test each +3. 
Opcode last, after type fully works + +--- + +## Quick Reference: Key Line Numbers (3.10) + +| Item | File | Line | +|------|------|------| +| PyObject | Include/object.h | 105 | +| PyVarObject | Include/object.h | 115 | +| Py_INCREF | Include/object.h | 461 | +| Py_DECREF | Include/object.h | 477 | +| PyTypeObject | Include/cpython/object.h | 191 | +| PyTupleObject | Include/cpython/tupleobject.h | 5 | +| tuple_hash | Objects/tupleobject.c | ~350 | +| tuple_richcompare | Objects/tupleobject.c | ~600 | +| tuple_dealloc | Objects/tupleobject.c | ~250 | +| PyTuple_Type | Objects/tupleobject.c | ~750 | +| _PyEval_EvalFrameDefault | Python/ceval.c | 1577 | +| Stack macros | Python/ceval.c | 1391 | +| BUILD_TUPLE | Python/ceval.c | 2615 | +| BUILD_MAP | Python/ceval.c | 2648 | + +--- + +## Solution Branch Reference + +The `teaching-cpython-solution` branch contains: +- `Include/recordobject.h` - Complete header +- `Objects/recordobject.c` - Full implementation (~200 lines) +- Modified `Lib/opcode.py` - BUILD_RECORD definition +- Modified `Python/ceval.c` - BUILD_RECORD handler +- Modified build files +- Test script demonstrating all features + +Use this as reference when student gets stuck, but guide them to discover solutions themselves first. diff --git a/teaching-todo.md b/teaching-todo.md new file mode 100644 index 00000000000000..ef709f534c3f8b --- /dev/null +++ b/teaching-todo.md @@ -0,0 +1,209 @@ +# CPython Internals Learning Path + +A structured curriculum for understanding CPython's implementation by building a custom `Record` type and `BUILD_RECORD` opcode. 
+ +## The Project + +Build a lightweight immutable record type (like a simplified namedtuple): + +```python +r = Record(x=10, y=20, name="point") +r.x # 10 (attribute access) +r[0] # 10 (sequence protocol) +len(r) # 3 +hash(r) # hashable (can be dict key) +repr(r) # Record(x=10, y=20, name='point') +r == r2 # comparable +``` + +--- + +## Phase 1: The Object Model + +### 1.1 PyObject - The Universal Base +- [ ] Read `Include/object.h` - find `PyObject` struct (around line 105) +- [ ] Understand: What are the two fields every Python object has? +- [ ] Find where `Py_INCREF` and `Py_DECREF` are defined +- [ ] Question: Why does CPython use reference counting instead of tracing GC? + +### 1.2 PyVarObject - Variable-Length Objects +- [ ] Find `PyVarObject` in the headers +- [ ] Question: What's the difference between PyObject and PyVarObject? +- [ ] Which built-in types use PyVarObject? (hint: list, tuple, but not dict) +- [ ] **For Record:** Should Record use PyObject or PyVarObject? Why? + +### 1.3 Type Objects +- [ ] Read `Include/cpython/object.h` - find `PyTypeObject` (around line 191) +- [ ] Identify the "slots" - tp_hash, tp_repr, tp_dealloc, tp_getattro, etc. +- [ ] Question: How does Python know what `len(x)` should call for a given type? +- [ ] Find where `PyTuple_Type` is defined in `Objects/tupleobject.c` - study its structure + +--- + +## Phase 2: Concrete Data Structures + +### 2.1 Tuples (Our Reference Implementation) +- [ ] Read `Include/cpython/tupleobject.h` and `Objects/tupleobject.c` +- [ ] Study the PyTupleObject struct - how does it store elements? +- [ ] Find `tuple_hash` - how does tuple compute its hash? +- [ ] Find `tuple_richcompare` - how does equality work? +- [ ] **For Record:** Our Record will store values like tuple, but add field names + +### 2.2 Dictionaries (For Field Name Lookup) +- [ ] Read `Include/cpython/dictobject.h` - understand PyDictObject basics +- [ ] Question: How would we map field names to indices efficiently? 
+- [ ] Study `PyDict_GetItem` - how to look up a key + +### 2.3 Designing Record's Memory Layout +- [ ] Sketch the RecordObject struct: + - What fields do we need? (field names, values, cached hash?) + - Should we store field names per-instance or share them? +- [ ] Question: What's the tradeoff between storing names as tuple vs dict? + +--- + +## Phase 3: Type Slots Deep Dive + +### 3.1 Essential Slots for Record +- [ ] `tp_dealloc` - Study tuple's dealloc. What must we DECREF? +- [ ] `tp_repr` - Study tuple's repr. How do we build the output string? +- [ ] `tp_hash` - Study tuple's hash. What makes a good hash for immutable containers? +- [ ] `tp_richcompare` - Study tuple's compare. Handle Py_EQ at minimum + +### 3.2 Sequence Protocol (for indexing) +- [ ] Find `PySequenceMethods` in headers +- [ ] Study `sq_length` - returns `Py_ssize_t` +- [ ] Study `sq_item` - takes index, returns item (with INCREF!) +- [ ] **For Record:** Implement these so `r[0]` and `len(r)` work + +### 3.3 Attribute Access (for field names) +- [ ] Study `tp_getattro` - how does attribute lookup work? +- [ ] Look at how namedtuple does it (it's in Python, but concept applies) +- [ ] **For Record:** Map `r.fieldname` to the correct value + +### 3.4 Constructor +- [ ] Study `tp_new` vs `tp_init` - what's the difference? +- [ ] For immutable types, which one do we need? +- [ ] **For Record:** Design the C function signature for creating records + +--- + +## Phase 4: The Evaluation Loop + +### 4.1 ceval.c Overview +- [ ] Open `Python/ceval.c` - find `_PyEval_EvalFrameDefault` (around line 1577) +- [ ] Understand the main dispatch loop structure +- [ ] Find the stack macros: `PUSH()`, `POP()`, `TOP()`, `PEEK()` (around line 1391) + +### 4.2 Study Similar Opcodes +- [ ] Find `BUILD_TUPLE` implementation - how does it pop N items and push a tuple? +- [ ] Find `BUILD_MAP` implementation - how does it handle key-value pairs? +- [ ] Question: What error handling pattern do these opcodes use? 
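Before opening `ceval.c`, it can help to see how these two opcodes get their operand. The following is a small sketch using the standard `dis` module; the helper `build_ops` and the two sample functions are hypothetical names made up for this exercise.

```python
import dis


def make_tuple(a, b):
    return (a, b)


def make_map(k, v):
    return {k: v}


def build_ops(func):
    """Return (opname, oparg) for every BUILD_* instruction in func."""
    return [
        (ins.opname, ins.arg)
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("BUILD_")
    ]


tuple_ops = build_ops(make_tuple)
map_ops = build_ops(make_map)
print(tuple_ops, map_ops)
```

Note what the oparg counts in each case: items for `BUILD_TUPLE`, key-value pairs for `BUILD_MAP`. That distinction feeds directly into the stack layout decision in 4.3.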
+ +### 4.3 Design BUILD_RECORD +- [ ] Decide on stack layout: `BUILD_RECORD n` where n is field count +- [ ] Stack before: `[..., name1, val1, name2, val2, ...]` (or different order?) +- [ ] Stack after: `[..., record]` +- [ ] What validation do we need? (names must be strings, no duplicates?) + +--- + +## Phase 5: Implementation + +### 5.1 Create the Header File +- [ ] Create `Include/recordobject.h` +- [ ] Define `RecordObject` struct +- [ ] Declare `PyRecord_Type` +- [ ] Declare `PyRecord_New()` constructor function + +### 5.2 Implement the Type +- [ ] Create `Objects/recordobject.c` +- [ ] Implement `record_dealloc` +- [ ] Implement `record_repr` +- [ ] Implement `record_hash` +- [ ] Implement `record_richcompare` +- [ ] Implement `record_length` (sq_length) +- [ ] Implement `record_item` (sq_item) +- [ ] Implement `record_getattro` (attribute access by name) +- [ ] Define `PyRecord_Type` with all slots filled +- [ ] Implement `PyRecord_New()` - the C API constructor + +### 5.3 Add the Opcode +- [ ] Add `BUILD_RECORD` to `Lib/opcode.py` (pick unused number, needs argument) +- [ ] Run `make regen-opcode` and `make regen-opcode-targets` +- [ ] Implement `BUILD_RECORD` handler in `Python/ceval.c` + +### 5.4 Build System Integration +- [ ] Add `recordobject.c` to the build (Makefile.pre.in or setup.py) +- [ ] Add header to appropriate include lists +- [ ] Register type in Python initialization + +### 5.5 Build and Test +- [ ] Run `make` - fix any compilation errors +- [ ] Test basic creation via C API +- [ ] Test via manual bytecode or compiler modification +- [ ] Verify all operations: indexing, len, hash, repr, equality, attribute access + +--- + +## Verification Checklist + +After implementation, verify each feature: + +```python +# Creation (via whatever mechanism we build) +r = Record(x=10, y=20) + +# Repr +assert repr(r) == "Record(x=10, y=20)" + +# Indexing (sequence protocol) +assert r[0] == 10 +assert r[1] == 20 +assert len(r) == 2 + +# Attribute access 
+assert r.x == 10 +assert r.y == 20 + +# Hashing (for use as dict key) +d = {r: "value"} +assert d[r] == "value" + +# Equality +r2 = Record(x=10, y=20) +assert r == r2 + +# Immutability (should raise) +# r.x = 30 # AttributeError +# r[0] = 30 # TypeError +``` + +--- + +## Files We'll Create/Modify + +| File | Action | ~Lines | +|------|--------|--------| +| `Include/recordobject.h` | Create | 25 | +| `Objects/recordobject.c` | Create | 200 | +| `Lib/opcode.py` | Modify | 2 | +| `Python/ceval.c` | Modify | 30 | +| `Makefile.pre.in` | Modify | 5 | +| `Python/bltinmodule.c` | Modify | 10 | + +--- + +## How to Use This Guide + +1. Read the specified files - don't skim, trace through the code +2. Answer questions before moving on (write answers down) +3. Use `./python.exe -c "..."` to experiment +4. Use GDB when confused: `gdb ./python.exe` +5. The `dis` module shows bytecode: `import dis; dis.dis(func)` + +Debug build helpers: +```bash +./python.exe -c "import sys; print(sys.getrefcount(x))" +./python.exe -X showrefcount +``` From 47ff3581c41a6f2c4a1fa66392e8d6eb4dbda4f7 Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Tue, 23 Dec 2025 12:30:42 +1000 Subject: [PATCH 02/11] Add implementation lessons learned to teaching notes Document practical issues encountered during Record type implementation: - Hash constants not publicly exposed (copy from tupleobject.c) - GC trashcan macros for deep recursion protection - PyUnicode_Compare error handling - Opcode number selection (166 after DICT_UPDATE) - Python-level constructor (tp_new) for testing - Type initialization order in object.c and bltinmodule.c - Correct allocator (PyObject_GC_NewVar) - tp_basicsize calculation for flexible array members Also added testing checklist for verifying implementation. 
--- teaching-notes.md | 178 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 174 insertions(+), 4 deletions(-) diff --git a/teaching-notes.md b/teaching-notes.md index c39fd138525911..ec9785ea61828b 100644 --- a/teaching-notes.md +++ b/teaching-notes.md @@ -347,10 +347,180 @@ hasconst.append(35) # or maybe not, depends on design The `teaching-cpython-solution` branch contains: - `Include/recordobject.h` - Complete header -- `Objects/recordobject.c` - Full implementation (~200 lines) -- Modified `Lib/opcode.py` - BUILD_RECORD definition +- `Objects/recordobject.c` - Full implementation (~490 lines) +- Modified `Lib/opcode.py` - BUILD_RECORD opcode 166 - Modified `Python/ceval.c` - BUILD_RECORD handler -- Modified build files -- Test script demonstrating all features +- Modified `Makefile.pre.in` - Build integration +- Modified `Objects/object.c` - Type initialization +- Modified `Python/bltinmodule.c` - Builtin registration Use this as reference when student gets stuck, but guide them to discover solutions themselves first. + +--- + +## Lessons Learned During Implementation + +These are practical issues encountered during actual implementation: + +### 1. Hash Constants Not Publicly Exposed + +The xxHash-style hash constants (`_PyHASH_XXPRIME_1`, `_PyHASH_XXPRIME_2`, `_PyHASH_XXPRIME_5`) used by `tupleobject.c` are defined locally in that file, not in a header. 
Had to copy them into `recordobject.c`: + +```c +#if SIZEOF_PY_UHASH_T > 4 +#define _PyHASH_XXPRIME_1 ((Py_uhash_t)11400714785074694791ULL) +#define _PyHASH_XXPRIME_2 ((Py_uhash_t)14029467366897019727ULL) +#define _PyHASH_XXPRIME_5 ((Py_uhash_t)2870177450012600261ULL) +#define _PyHASH_XXROTATE(x) ((x << 31) | (x >> 33)) +#else +#define _PyHASH_XXPRIME_1 ((Py_uhash_t)2654435761UL) +#define _PyHASH_XXPRIME_2 ((Py_uhash_t)2246822519UL) +#define _PyHASH_XXPRIME_5 ((Py_uhash_t)374761393UL) +#define _PyHASH_XXROTATE(x) ((x << 13) | (x >> 19)) +#endif +``` + +**Teaching point:** Internal implementation details are often not exposed. Look at how existing code solves the same problem. + +### 2. GC Trashcan for Deep Recursion + +Deallocation needs `Py_TRASHCAN_BEGIN`/`Py_TRASHCAN_END` to prevent stack overflow when destroying deeply nested structures: + +```c +static void +record_dealloc(RecordObject *r) +{ + PyObject_GC_UnTrack(r); + Py_TRASHCAN_BEGIN(r, record_dealloc) + + // ... cleanup code ... + + Py_TRASHCAN_END +} +``` + +**Teaching point:** Check how `tuple_dealloc` and `list_dealloc` handle this. + +### 3. PyUnicode_Compare Error Handling + +`PyUnicode_Compare` returns -1 and sets an exception on failure (e.g., if an argument is not a str). Since -1 is also the normal "less than" result, must check `PyErr_Occurred()`: + +```c +int cmp = PyUnicode_Compare(name, field); +if (cmp == 0) { + // found match +} +if (PyErr_Occurred()) { + return NULL; +} +``` + +### 4. Opcode Number Selection + +Originally planned opcode 35, but ended up using 166 (right after `DICT_UPDATE` at 165). In 3.10, opcodes below `HAVE_ARGUMENT` (90) take no oparg, so 35 could not carry the field count; some of the remaining gaps exist for historical reasons or are reserved. + +**Teaching point:** Look at recent additions to see where new opcodes go. `Lib/opcode.py` is the source of truth. + +### 5. 
Python-Level Constructor (tp_new) + +To test the Record type without emitting bytecode, need a Python constructor: + +```c +static PyObject * +record_new(PyTypeObject *type, PyObject *args, PyObject *kwds) +{ + // Record() only accepts keyword arguments + if (PyTuple_GET_SIZE(args) != 0) { + PyErr_SetString(PyExc_TypeError, + "Record() takes no positional arguments"); + return NULL; + } + // ... iterate kwds, build names tuple and values array +} +``` + +This allows: `Record(x=10, y=20)` without needing compiler changes. + +### 6. Type Initialization Order + +The type must be initialized in `_PyTypes_Init()` in `Objects/object.c`: + +```c +INIT_TYPE(PyRecord_Type); +``` + +And exposed as builtin in `Python/bltinmodule.c`: + +```c +SETBUILTIN("Record", &PyRecord_Type); +``` + +### 7. Correct Allocator for GC-Tracked VarObjects + +Use `PyObject_GC_NewVar` for variable-size objects that participate in GC: + +```c +record = PyObject_GC_NewVar(RecordObject, &PyRecord_Type, n); +``` + +Not `PyObject_NewVar` (which doesn't set up GC tracking). + +### 8. tp_basicsize Calculation + +For variable-size objects, `tp_basicsize` should be the size WITHOUT the variable part: + +```c +sizeof(RecordObject) - sizeof(PyObject *) /* tp_basicsize */ +sizeof(PyObject *) /* tp_itemsize */ +``` + +The `- sizeof(PyObject *)` accounts for the `r_values[1]` flexible array member. 
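The same basicsize/itemsize split can be observed on the built-in tuple without writing any C. This is a rough sketch, assuming CPython, where `tuple` relies on the default `object.__sizeof__`; the helper `var_object_size` is a hypothetical name introduced here.

```python
import struct


def var_object_size(tp, n_items):
    """Bytes for an instance of variable-size type tp holding n_items elements.

    Mirrors the C-level formula tp_basicsize + n * tp_itemsize. The GC header
    that precedes GC-tracked objects is not included.
    """
    return tp.__basicsize__ + n_items * tp.__itemsize__


pointer_size = struct.calcsize("P")  # sizeof(void *) on this platform

empty = ()
triple = (1, 2, 3)

print("basicsize:", tuple.__basicsize__, "itemsize:", tuple.__itemsize__)
print("empty:", empty.__sizeof__(), "triple:", triple.__sizeof__())
```

`sys.getsizeof` would report a larger number for the same tuples because it adds the GC header, which is a good prompt for discussing why lesson 7 insists on `PyObject_GC_NewVar`.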
+ +--- + +## Testing Checklist + +After implementation, verify: + +```python +# Basic construction +r = Record(x=10, y=20) + +# repr +assert repr(r) == "Record(x=10, y=20)" + +# Attribute access +assert r.x == 10 +assert r.y == 20 + +# Indexing +assert r[0] == 10 +assert r[-1] == 20 +assert len(r) == 2 + +# Hashing (usable as dict key) +d = {r: "value"} +r2 = Record(x=10, y=20) +assert d[r2] == "value" + +# Equality +assert r == r2 +assert r != Record(x=10, y=30) +assert r != Record(a=10, b=20) # different names + +# Error handling +try: + r[5] +except IndexError: + pass + +try: + Record() # no kwargs +except TypeError: + pass + +try: + Record(1, 2, 3) # positional args +except TypeError: + pass +``` From 30268e21096e5ade8e76a93c880f66f4f5e914fe Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Wed, 24 Dec 2025 12:28:01 +1000 Subject: [PATCH 03/11] starting coding next --- .claude/commands/checkpoint.md | 11 + ...ct.c_0cad5e07ee2d8f6f04e3931237fa6ce1.prob | 1 + summary-sheet.md | 151 ++++++++++++ teaching-todo.md | 214 ++++++++++++++---- 4 files changed, 336 insertions(+), 41 deletions(-) create mode 100644 .claude/commands/checkpoint.md create mode 100644 Objects/.cph/.object.c_0cad5e07ee2d8f6f04e3931237fa6ce1.prob create mode 100644 summary-sheet.md diff --git a/.claude/commands/checkpoint.md b/.claude/commands/checkpoint.md new file mode 100644 index 00000000000000..791c9b58e0bf32 --- /dev/null +++ b/.claude/commands/checkpoint.md @@ -0,0 +1,11 @@ +--- +description: Update teaching-todo.md with progress and notes from our session +--- + +Review the conversation and update the learning progress in @teaching-todo.md: + +1. Mark completed items with `[x]` +2. Mark partially completed items with `[~]` +3. Add **Notes from session:** sections under completed topics with key learnings, Q&A, and decisions made + +Focus on what we actually discussed and learned - don't mark items complete unless we covered them. 
diff --git a/Objects/.cph/.object.c_0cad5e07ee2d8f6f04e3931237fa6ce1.prob b/Objects/.cph/.object.c_0cad5e07ee2d8f6f04e3931237fa6ce1.prob new file mode 100644 index 00000000000000..ad6f0acbc32b50 --- /dev/null +++ b/Objects/.cph/.object.c_0cad5e07ee2d8f6f04e3931237fa6ce1.prob @@ -0,0 +1 @@ +{"name":"Local: object","url":"/Users/nickvandermerwe/repos/cpython/Objects/object.c","tests":[{"id":1766539391971,"input":"","output":""}],"interactive":false,"memoryLimit":1024,"timeLimit":3000,"srcPath":"/Users/nickvandermerwe/repos/cpython/Objects/object.c","group":"local","local":true} \ No newline at end of file diff --git a/summary-sheet.md b/summary-sheet.md new file mode 100644 index 00000000000000..97aa1c9b0439ce --- /dev/null +++ b/summary-sheet.md @@ -0,0 +1,151 @@ +# CPython Internals Summary Sheet + +## Object Model + +``` +PyObject (fixed size) PyVarObject (variable size) +┌─────────────────┐ ┌─────────────────┐ +│ ob_refcnt │ │ ob_refcnt │ +│ ob_type ────────┼──→ type │ ob_type │ +└─────────────────┘ │ ob_size │ ← element count + ├─────────────────┤ + │ items[0] │ + │ items[1] │ ← flexible array + │ ... │ + └─────────────────┘ +``` + +## Reference Counting + +``` +Py_INCREF(obj) → obj->ob_refcnt++ +Py_DECREF(obj) → obj->ob_refcnt--; if (0) tp_dealloc(obj) +``` + +Rules: +- INCREF when you store a reference +- DECREF when you release it +- INCREF before returning a PyObject* (caller owns it) +- Forget INCREF → use-after-free +- Forget DECREF → memory leak + +Debug builds track `_Py_RefTotal` for leak detection. 
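The store/release rules above can be watched from Python with `sys.getrefcount`. This is a minimal sketch, assuming CPython; only the differences between readings are meaningful, because the absolute value includes temporary references and varies by version. Variable names are illustrative.

```python
import sys

x = []
base = sys.getrefcount(x)  # absolute value is version-dependent; compare deltas only

holder = [x, x]  # the new list stores two references to x
after_store = sys.getrefcount(x)

del holder  # destroying the list releases both references
after_release = sys.getrefcount(x)

print(after_store - base, after_release - base)
```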
+ +## Type Slots + +``` +obj->ob_type->tp_hash(obj) # hash(obj) +obj->ob_type->tp_repr(obj) # repr(obj) +obj->ob_type->tp_richcompare(obj, other, op) # obj == other +obj->ob_type->tp_getattro(obj, name) # obj.name +obj->ob_type->tp_as_sequence->sq_length(obj) # len(obj) +obj->ob_type->tp_as_sequence->sq_item(obj, i) # obj[i] +``` + +NULL slot → TypeError: type doesn't support operation + +Prefixes: `tp_` (type), `sq_` (sequence), `nb_` (number), `mp_` (mapping) + +## RecordObject Design + +``` +┌─────────────────┐ +│ ob_refcnt │ +│ ob_type │ +│ ob_size = n │ ← field count +├─────────────────┤ +│ names ──────────┼──→ ("x", "y", "z") ← shared tuple +├─────────────────┤ +│ values[0] │ +│ values[1] │ ← flexible array +│ values[2] │ +└─────────────────┘ +``` + +```c +typedef struct { + PyObject_VAR_HEAD + PyObject *names; // tuple of field names + PyObject *values[1]; // flexible array +} RecordObject; +``` + +## Slots to Implement + +| Slot | Purpose | Key Pattern | +|------|---------|-------------| +| `tp_dealloc` | Destructor | DECREF names + each value, then tp_free | +| `tp_repr` | `repr(r)` | PyUnicodeWriter, PyObject_Repr per value | +| `tp_hash` | `hash(r)` | Combine element hashes (xxHash), -1 → -2 | +| `tp_richcompare` | `r == r2` | Compare names AND values, Py_NotImplemented for < > | +| `tp_getattro` | `r.x` | Search names, return values[i], fallback GenericGetAttr | +| `tp_new` | `Record(x=1)` | Parse kwargs, call PyRecord_New | +| `sq_length` | `len(r)` | Return Py_SIZE(self) | +| `sq_item` | `r[i]` | Bounds check, INCREF, return | + +## Evaluation Loop + +``` +_PyEval_EvalFrameDefault() { + for (;;) { + switch (opcode) { + case BUILD_TUPLE: ... + case BUILD_RECORD: ... 
← we add this + } + } +} +``` + +Stack macros: +- `POP()` - pop and take ownership +- `PUSH(obj)` - push onto stack +- `PEEK(n)` - read without popping +- `STACK_SHRINK(n)` - drop n items + +## BUILD_RECORD Design + +``` +Stack before: [..., names_tuple, val0, val1, val2] + ↑ └─── oparg=3 ───┘ + └── PEEK(oparg+1) + +Stack after: [..., record] +``` + +```c +case TARGET(BUILD_RECORD): { + PyObject *names = PEEK(oparg + 1); + PyObject **values = &PEEK(oparg); + PyObject *rec = PyRecord_New(names, values, oparg); // steals refs + if (rec == NULL) goto error; + STACK_SHRINK(oparg + 1); + PUSH(rec); + DISPATCH(); +} +``` + +## Reference Stealing vs Copying + +``` +BUILD_TUPLE: POP → SET_ITEM (steals) → no DECREF needed +BUILD_MAP: PEEK → SetItem (copies) → must DECREF after +BUILD_RECORD: PEEK → PyRecord_New (steals) → no DECREF needed +``` + +## Constructors + +``` +PyRecord_New(names, values, n) ← C API, called by opcode, steals refs +record_new(type, args, kwargs) ← Python API (tp_new), parses kwargs +``` + +## Files to Create/Modify + +``` +Include/recordobject.h ← struct + declarations +Objects/recordobject.c ← implementation +Python/ceval.c ← BUILD_RECORD case +Lib/opcode.py ← register opcode number +Makefile.pre.in ← add to build +Python/bltinmodule.c ← register type +``` diff --git a/teaching-todo.md b/teaching-todo.md index ef709f534c3f8b..032b4ed8c5baa7 100644 --- a/teaching-todo.md +++ b/teaching-todo.md @@ -21,33 +21,79 @@ r == r2 # comparable ## Phase 1: The Object Model ### 1.1 PyObject - The Universal Base -- [ ] Read `Include/object.h` - find `PyObject` struct (around line 105) -- [ ] Understand: What are the two fields every Python object has? -- [ ] Find where `Py_INCREF` and `Py_DECREF` are defined -- [ ] Question: Why does CPython use reference counting instead of tracing GC? +- [x] Read `Include/object.h` - find `PyObject` struct (around line 105) +- [x] Understand: What are the two fields every Python object has? 
+- [x] Find where `Py_INCREF` and `Py_DECREF` are defined +- [x] Question: Why does CPython use reference counting instead of tracing GC? + +**Notes from session:** +- Two fields: `ob_refcnt` (reference count) and `ob_type` (pointer to type object) +- `Py_INCREF`: increments `ob_refcnt` +- `Py_DECREF`: decrements `ob_refcnt`, calls `tp_dealloc` when it hits 0 +- Debug builds also track `_Py_RefTotal` (global count of all refs) for leak detection +- Why refcounting? + - Deterministic destruction (RAII-like) - resources freed immediately when last ref drops + - C extensions can easily participate (just INCREF/DECREF, no root registration) + - No GC pauses + - Tradeoff: can't handle cycles alone, so CPython has supplemental cyclic GC +- C++ `shared_ptr` has same cycle limitation - that's why `weak_ptr` exists ### 1.2 PyVarObject - Variable-Length Objects -- [ ] Find `PyVarObject` in the headers -- [ ] Question: What's the difference between PyObject and PyVarObject? -- [ ] Which built-in types use PyVarObject? (hint: list, tuple, but not dict) -- [ ] **For Record:** Should Record use PyObject or PyVarObject? Why? +- [x] Find `PyVarObject` in the headers +- [x] Question: What's the difference between PyObject and PyVarObject? +- [x] Which built-in types use PyVarObject? (hint: list, tuple, but not dict) +- [x] **For Record:** Should Record use PyObject or PyVarObject? Why? + +**Notes from session:** +- `PyVarObject` adds `ob_size` field - count of elements (not bytes!) 
+- Used with "flexible array member" pattern: struct ends with `item[1]`, allocate extra space +- `PyObject_HEAD` macro → embeds `PyObject ob_base` +- `PyObject_VAR_HEAD` macro → embeds `PyVarObject ob_base` (includes size) +- Macros are historical - used to expand to raw fields, now just embed the struct +- **Record decision:** Use `PyVarObject` because we store variable number of values ### 1.3 Type Objects -- [ ] Read `Include/cpython/object.h` - find `PyTypeObject` (around line 191) -- [ ] Identify the "slots" - tp_hash, tp_repr, tp_dealloc, tp_getattro, etc. -- [ ] Question: How does Python know what `len(x)` should call for a given type? -- [ ] Find where `PyTuple_Type` is defined in `Objects/tupleobject.c` - study its structure +- [x] Read `Include/cpython/object.h` - find `PyTypeObject` (around line 191) +- [x] Identify the "slots" - tp_hash, tp_repr, tp_dealloc, tp_getattro, etc. +- [x] Question: How does Python know what `len(x)` should call for a given type? +- [x] Find where `PyTuple_Type` is defined in `Objects/tupleobject.c` - study its structure + +**Notes from session:** +- `PyTypeObject` is a big struct of function pointers ("slots") that define type behavior +- Naming conventions: `tp_` (type), `ob_` (object), `sq_` (sequence), `nb_` (number), `mp_` (mapping) +- `len(x)` works by: `obj->ob_type->tp_as_sequence->sq_length` - if any pointer is NULL, raises TypeError +- This slot-based dispatch is what enables Python's duck typing - interpreter just checks "do you have this slot?" 
+- Slots are fast (2-3 pointer chases) vs method lookup (hashing, dict probe, MRO traversal) +- Adding new slots to PyTypeObject is an ABI-breaking change - done rarely and carefully +- `tp_vectorcall` added in 3.8, `tp_as_async` added in 3.5 (as sub-struct pointer to minimize slots added) +- Most slots established by Python 2.6 - ABI stability concerns limit additions +- New features often work around slots: decorators use function calls, context managers use method lookup --- ## Phase 2: Concrete Data Structures ### 2.1 Tuples (Our Reference Implementation) -- [ ] Read `Include/cpython/tupleobject.h` and `Objects/tupleobject.c` -- [ ] Study the PyTupleObject struct - how does it store elements? -- [ ] Find `tuple_hash` - how does tuple compute its hash? -- [ ] Find `tuple_richcompare` - how does equality work? -- [ ] **For Record:** Our Record will store values like tuple, but add field names +- [x] Read `Include/cpython/tupleobject.h` and `Objects/tupleobject.c` +- [x] Study the PyTupleObject struct - how does it store elements? +- [x] Find `tuple_hash` - how does tuple compute its hash? +- [x] Find `tuple_richcompare` - how does equality work? 
+- [x] **For Record:** Our Record will store values like tuple, but add field names + +**Notes from session:** +- `PyTupleObject` uses flexible array member: `ob_item[1]` with extra allocation +- `tuple_as_sequence` struct at ~line 804 holds function pointers for sequence ops +- Slots use 0 instead of NULL - same thing in C, just style preference (shorter in struct initializers) +- `tuplelength`: just calls `Py_SIZE(a)` to return ob_size +- `tupleitem`: **must INCREF before returning** - caller gets a "new reference" they must DECREF + - If you forget INCREF: use-after-free bug when something else DECREFs the object +- `sq_ass_item` is 0 (NULL) because tuple is immutable - no assignment allowed +- **tuple_hash**: uses xxHash algorithm (XXPRIME constants), combines hashes of all elements + - If any element is unhashable, returns -1 and propagates error + - `-1` is reserved for error - actual -1 hash gets changed to -2 (a CPython wart) +- **tuple_richcompare**: `op` parameter is Py_EQ, Py_NE, Py_LT, etc. + - Compares element by element until difference found + - For Record: compare both names AND values; different field names = not equal ### 2.2 Dictionaries (For Field Name Lookup) - [ ] Read `Include/cpython/dictobject.h` - understand PyDictObject basics @@ -55,57 +101,143 @@ r == r2 # comparable - [ ] Study `PyDict_GetItem` - how to look up a key ### 2.3 Designing Record's Memory Layout -- [ ] Sketch the RecordObject struct: +- [x] Sketch the RecordObject struct: - What fields do we need? (field names, values, cached hash?) - Should we store field names per-instance or share them? -- [ ] Question: What's the tradeoff between storing names as tuple vs dict? +- [x] Question: What's the tradeoff between storing names as tuple vs dict? 
+ +**Notes from session:** +- Designed struct: + ```c + typedef struct { + PyObject_VAR_HEAD // refcnt, type, ob_size (field count) + PyObject *names; // tuple of field name strings + PyObject *values[1]; // flexible array of values + } RecordObject; + ``` +- **Design decision:** Store names as separate tuple (not inline per-value) + - Pros: Memory efficient when many records share same field names + - Cons: Extra pointer per record, slightly more complex + - Even with inline names, Python interns short strings, so same pointer - but still N extra pointers vs 1 +- **Dealloc must DECREF:** + - `names` (the tuple) + - Each `values[i]` (the stored objects) + - NOT the VAR_HEAD fields (ob_refcnt is ours, ob_type is static, ob_size is just int) --- ## Phase 3: Type Slots Deep Dive ### 3.1 Essential Slots for Record -- [ ] `tp_dealloc` - Study tuple's dealloc. What must we DECREF? -- [ ] `tp_repr` - Study tuple's repr. How do we build the output string? -- [ ] `tp_hash` - Study tuple's hash. What makes a good hash for immutable containers? -- [ ] `tp_richcompare` - Study tuple's compare. Handle Py_EQ at minimum +- [x] `tp_dealloc` - Study tuple's dealloc. What must we DECREF? +- [x] `tp_repr` - Study tuple's repr. How do we build the output string? +- [x] `tp_hash` - Study tuple's hash. What makes a good hash for immutable containers? +- [x] `tp_richcompare` - Study tuple's compare. Handle Py_EQ at minimum + +**Notes from session:** +- **tp_repr**: Uses `PyUnicodeWriter` to build strings piece by piece + - `PyUnicodeWriter_WriteUTF8()` for literal strings, `PyUnicodeWriter_WriteStr()` for PyObjects + - Call `PyObject_Repr()` on each value (returns new ref - must DECREF!) 
+ - `Py_ReprEnter`/`Py_ReprLeave` for cycle detection (prevents infinite recursion on self-referential structures) +- **tp_hash**: xxHash algorithm, combine element hashes + - Return -1 for error (propagate from unhashable elements) + - If actual hash is -1, change to -2 (CPython convention) +- **tp_richcompare**: For Record, compare names tuple AND values; return Py_NotImplemented for ordering ops (< > <= >=) ### 3.2 Sequence Protocol (for indexing) -- [ ] Find `PySequenceMethods` in headers -- [ ] Study `sq_length` - returns `Py_ssize_t` -- [ ] Study `sq_item` - takes index, returns item (with INCREF!) -- [ ] **For Record:** Implement these so `r[0]` and `len(r)` work +- [x] Find `PySequenceMethods` in headers +- [x] Study `sq_length` - returns `Py_ssize_t` +- [x] Study `sq_item` - takes index, returns item (with INCREF!) +- [~] **For Record:** Implement these so `r[0]` and `len(r)` work *(design understood, implementation pending)* + +**Notes from session:** +- `PySequenceMethods` contains: `sq_length`, `sq_concat`, `sq_repeat`, `sq_item`, `sq_ass_item`, `sq_contains`, etc. +- For Record, we need: + ```c + static PySequenceMethods record_as_sequence = { + (lenfunc)record_length, /* sq_length */ + 0, /* sq_concat */ + 0, /* sq_repeat */ + (ssizeargfunc)record_item, /* sq_item */ + // rest NULL - Record is immutable + }; + ``` +- Pattern for `record_item`: bounds check, INCREF, return ### 3.3 Attribute Access (for field names) -- [ ] Study `tp_getattro` - how does attribute lookup work? +- [x] Study `tp_getattro` - how does attribute lookup work? 
- [ ] Look at how namedtuple does it (it's in Python, but concept applies) -- [ ] **For Record:** Map `r.fieldname` to the correct value +- [x] **For Record:** Map `r.fieldname` to the correct value + +**Notes from session:** +- `tp_getattr` (old, char*) vs `tp_getattro` (new, PyObject*) - use getattro +- Implementation: loop through `self->names`, compare with `name` using `PyUnicode_Compare` +- If found, INCREF and return `self->values[i]` +- If not found, fall back to `PyObject_GenericGetAttr()` for `__class__`, `__doc__`, etc. +- O(n) search is fine for small field counts; could use dict for optimization but not needed ### 3.4 Constructor -- [ ] Study `tp_new` vs `tp_init` - what's the difference? -- [ ] For immutable types, which one do we need? -- [ ] **For Record:** Design the C function signature for creating records +- [x] Study `tp_new` vs `tp_init` - what's the difference? +- [x] For immutable types, which one do we need? +- [x] **For Record:** Design the C function signature for creating records + +**Notes from session:** +- `tp_new`: allocates AND initializes, returns new object +- `tp_init`: mutates already-created object +- For immutable types, use `tp_new` only (no mutation after creation) +- **Two constructors needed:** + - `PyRecord_New(names, values, n)` - C API for BUILD_RECORD opcode, steals references + - `record_new(type, args, kwargs)` - Python API (tp_new slot) for `Record(x=1, y=2)`, parses kwargs +- C API: `PyTuple_New` allocates, `PyTuple_Pack` is convenience wrapper +- For Record, only need `PyRecord_New` (one caller), `record_new` wraps it for Python --- ## Phase 4: The Evaluation Loop ### 4.1 ceval.c Overview -- [ ] Open `Python/ceval.c` - find `_PyEval_EvalFrameDefault` (around line 1577) -- [ ] Understand the main dispatch loop structure -- [ ] Find the stack macros: `PUSH()`, `POP()`, `TOP()`, `PEEK()` (around line 1391) +- [x] Open `Python/ceval.c` - find `_PyEval_EvalFrameDefault` (around line 1577) +- [x] Understand the main 
dispatch loop structure +- [x] Find the stack macros: `PUSH()`, `POP()`, `TOP()`, `PEEK()` (around line 1391) ### 4.2 Study Similar Opcodes -- [ ] Find `BUILD_TUPLE` implementation - how does it pop N items and push a tuple? -- [ ] Find `BUILD_MAP` implementation - how does it handle key-value pairs? -- [ ] Question: What error handling pattern do these opcodes use? +- [x] Find `BUILD_TUPLE` implementation - how does it pop N items and push a tuple? +- [x] Find `BUILD_MAP` implementation - how does it handle key-value pairs? +- [x] Question: What error handling pattern do these opcodes use? + +**Notes from session:** +- Main loop is giant switch on opcodes in `_PyEval_EvalFrameDefault` +- `oparg` = argument encoded in bytecode (e.g., field count) +- Stack macros: `POP()`, `PUSH()`, `PEEK(n)`, `STACK_SHRINK(n)` +- **BUILD_TUPLE**: POPs n items, uses `PyTuple_SET_ITEM` which **steals** references (no DECREF needed) +- **BUILD_MAP**: Uses PEEK to read items, `PyDict_SetItem` which INCREFs, then POPs and DECREFs separately +- Error pattern: `if (result == NULL) goto error;` +- `DISPATCH()` jumps to next opcode ### 4.3 Design BUILD_RECORD -- [ ] Decide on stack layout: `BUILD_RECORD n` where n is field count -- [ ] Stack before: `[..., name1, val1, name2, val2, ...]` (or different order?) -- [ ] Stack after: `[..., record]` +- [x] Decide on stack layout: `BUILD_RECORD n` where n is field count +- [x] Stack before: `[..., name1, val1, name2, val2, ...]` (or different order?) +- [x] Stack after: `[..., record]` - [ ] What validation do we need? (names must be strings, no duplicates?) 
+**Notes from session:** +- **Stack layout decision:** `[..., names_tuple, val0, val1, val2]` (Option B) + - names_tuple is pre-built (by BUILD_TUPLE or LOAD_CONST) + - Simpler opcode, names can be shared across records +- `PyRecord_New` will steal references (like BUILD_TUPLE, not BUILD_MAP) +- Implementation sketch: + ```c + case TARGET(BUILD_RECORD): { + PyObject *names = PEEK(oparg + 1); + PyObject **values = &PEEK(oparg); + PyObject *rec = PyRecord_New(names, values, oparg); // steals refs + if (rec == NULL) goto error; + STACK_SHRINK(oparg + 1); + PUSH(rec); + DISPATCH(); + } + ``` + --- ## Phase 5: Implementation From 0c36250d5221c0ddd28697d4c87b899496caceda Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Wed, 24 Dec 2025 15:59:54 +1000 Subject: [PATCH 04/11] boilerplate --- Include/Python.h | 1 + Include/cpython/recordobject.h | 12 ++++ Include/recordobject.h | 22 ++++++ Makefile.pre.in | 3 + Objects/recordobject.c | 124 +++++++++++++++++++++++++++++++++ Python/bltinmodule.c | 1 + 6 files changed, 163 insertions(+) create mode 100644 Include/cpython/recordobject.h create mode 100644 Include/recordobject.h create mode 100644 Objects/recordobject.c diff --git a/Include/Python.h b/Include/Python.h index d3186c32e35500..5f058a276e445c 100644 --- a/Include/Python.h +++ b/Include/Python.h @@ -89,6 +89,7 @@ #include "rangeobject.h" #include "memoryobject.h" #include "tupleobject.h" +#include "recordobject.h" #include "listobject.h" #include "dictobject.h" #include "cpython/odictobject.h" diff --git a/Include/cpython/recordobject.h b/Include/cpython/recordobject.h new file mode 100644 index 00000000000000..d3fede4e5f1a1f --- /dev/null +++ b/Include/cpython/recordobject.h @@ -0,0 +1,12 @@ +#ifndef Py_CPYTHON_RECORDOBJECT_H +# error "this header file must not be included directly" +#endif + +typedef struct { + PyObject_VAR_HEAD + PyObject *names; + /* ob_item contains space for 'ob_size' elements. 
+ Items must normally not be NULL, except during construction when + the record is not yet visible outside the function that builds it. */ + PyObject *ob_item[1]; +} PyRecordObject; diff --git a/Include/recordobject.h b/Include/recordobject.h new file mode 100644 index 00000000000000..72f9dbab75f73e --- /dev/null +++ b/Include/recordobject.h @@ -0,0 +1,22 @@ +/* Record object interface */ + +#ifndef Py_RECORDOBJECT_H +#define Py_RECORDOBJECT_H +#ifdef __cplusplus +extern "C" { +#endif + +PyAPI_DATA(PyTypeObject) PyRecord_Type; +PyAPI_FUNC(PyObject *) PyRecord_New(Py_ssize_t size); +#define PyRecord_Check(op) Py_IS_TYPE(op, &PyRecord_Type) + +#ifndef Py_LIMITED_API +# define Py_CPYTHON_RECORDOBJECT_H +# include "cpython/recordobject.h" +# undef Py_CPYTHON_RECORDOBJECT_H +#endif + +#ifdef __cplusplus +} +#endif +#endif /* !Py_RECORDOBJECT_H */ diff --git a/Makefile.pre.in b/Makefile.pre.in index fa99dd86c416ed..c18b5f647f3f08 100644 --- a/Makefile.pre.in +++ b/Makefile.pre.in @@ -438,6 +438,7 @@ OBJECT_OBJS= \ Objects/sliceobject.o \ Objects/structseq.o \ Objects/tupleobject.o \ + Objects/recordobject.o \ Objects/typeobject.o \ Objects/unicodeobject.o \ Objects/unicodectype.o \ @@ -1100,6 +1101,7 @@ PYTHON_HEADERS= \ $(srcdir)/Include/traceback.h \ $(srcdir)/Include/tracemalloc.h \ $(srcdir)/Include/tupleobject.h \ + $(srcdir)/Include/recordobject.h \ $(srcdir)/Include/unicodeobject.h \ $(srcdir)/Include/warnings.h \ $(srcdir)/Include/weakrefobject.h \ @@ -1138,6 +1140,7 @@ PYTHON_HEADERS= \ $(srcdir)/Include/cpython/sysmodule.h \ $(srcdir)/Include/cpython/traceback.h \ $(srcdir)/Include/cpython/tupleobject.h \ + $(srcdir)/Include/cpython/recordobject.h \ $(srcdir)/Include/cpython/unicodeobject.h \ \ $(srcdir)/Include/internal/pycore_abstract.h \ diff --git a/Objects/recordobject.c b/Objects/recordobject.c new file mode 100644 index 00000000000000..c1ff1ba62fd820 --- /dev/null +++ b/Objects/recordobject.c @@ -0,0 +1,124 @@ + +/* Record object implementation */ + 
+#include "Python.h" +#include "pycore_abstract.h" // _PyIndex_Check() +#include "pycore_gc.h" // _PyObject_GC_IS_TRACKED() +#include "pycore_initconfig.h" // _PyStatus_OK() +#include "pycore_object.h" // _PyObject_GC_TRACK() +#include "recordobject.h" + +PyAPI_FUNC(PyObject *) PyRecord_New(Py_ssize_t size) +{ + return NULL; +} + +static void +record_dealloc(PyRecordObject *op) +{ + // TODO +} + +static PyObject * +record_repr(PyRecordObject *v) +{ + // TODO + return NULL; +} + +static Py_ssize_t +record_length(PyRecordObject *a) +{ + return Py_SIZE(a); +} + +static PyObject * +recorditem(PyRecordObject *a, Py_ssize_t i) +{ + if (i < 0 || i >= Py_SIZE(a)) { + PyErr_SetString(PyExc_IndexError, "record index out of range"); + return NULL; + } + Py_INCREF(a->ob_item[i]); + return a->ob_item[i]; +} + + +static PySequenceMethods record_as_sequence = { + (lenfunc)record_length, /* sq_length */ + 0, /* sq_concat */ + 0, /* sq_repeat */ + (ssizeargfunc)recorditem, /* sq_item */ + 0, /* sq_slice */ + 0, /* sq_ass_item */ + 0, /* sq_ass_slice */ + 0, /* sq_contains */ +}; + +static Py_hash_t +record_hash(PyRecordObject *v) +{ + return -1; +} + +PyObject * +record_getattro(PyObject *obj, PyObject *name) +{ + // TODO + return NULL; +} + +static PyObject * +record_rich_compare(PyObject *v, PyObject *w, int op) +{ + return NULL; +} + +static PyObject * +record_new(PyTypeObject *type, PyObject *args, PyObject *kwargs) +{ + return NULL; +} + +PyTypeObject PyRecord_Type = { + PyVarObject_HEAD_INIT(&PyType_Type, 0) + "record", + sizeof(PyRecordObject) - sizeof(PyObject *), + sizeof(PyObject *), + (destructor)record_dealloc, /* tp_dealloc */ + 0, /* tp_vectorcall_offset */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_as_async */ + (reprfunc)record_repr, /* tp_repr */ + 0, /* tp_as_number */ + &record_as_sequence, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + (hashfunc)record_hash, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + record_getattro, /* tp_getattro */ + 0, 
/* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + 0, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + record_rich_compare, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + record_new, /* tp_new */ + PyObject_GC_Del, /* tp_free */ + 0, +}; \ No newline at end of file diff --git a/Python/bltinmodule.c b/Python/bltinmodule.c index 6ea20bfc5f7c94..5899a5e6a506f4 100644 --- a/Python/bltinmodule.c +++ b/Python/bltinmodule.c @@ -3036,6 +3036,7 @@ _PyBuiltin_Init(PyInterpreterState *interp) SETBUILTIN("str", &PyUnicode_Type); SETBUILTIN("super", &PySuper_Type); SETBUILTIN("tuple", &PyTuple_Type); + SETBUILTIN("record", &PyRecord_Type); SETBUILTIN("type", &PyType_Type); SETBUILTIN("zip", &PyZip_Type); debug = PyBool_FromLong(config->optimization_level == 0); From 548afb716c03eee0676ebd9d547d1c19dc233641 Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Wed, 24 Dec 2025 16:11:46 +1000 Subject: [PATCH 05/11] first impl --- Objects/recordobject.c | 40 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/Objects/recordobject.c b/Objects/recordobject.c index c1ff1ba62fd820..ca5c96a245eea4 100644 --- a/Objects/recordobject.c +++ b/Objects/recordobject.c @@ -8,9 +8,47 @@ #include "pycore_object.h" // _PyObject_GC_TRACK() #include "recordobject.h" +static inline void +record_gc_track(PyRecordObject *op) +{ + _PyObject_GC_TRACK(op); +} + +static PyRecordObject * +record_alloc(Py_ssize_t size) +{ + PyRecordObject *op; + if (size < 0) { + PyErr_BadInternalCall(); + return NULL; + } + + { + /* Check for overflow */ + if ((size_t)size > ((size_t)PY_SSIZE_T_MAX - (sizeof(PyRecordObject) - + sizeof(PyObject *))) / 
sizeof(PyObject *)) { + return (PyRecordObject *)PyErr_NoMemory(); + } + op = PyObject_GC_NewVar(PyRecordObject, &PyRecord_Type, size); + if (op == NULL) + return NULL; + } + return op; +} + PyAPI_FUNC(PyObject *) PyRecord_New(Py_ssize_t size) { - return NULL; + PyRecordObject *op; + op = record_alloc(size); + if (op == NULL) { + return NULL; + } + op->names = NULL; + for (Py_ssize_t i = 0; i < size; i++) { + op->ob_item[i] = NULL; + } + record_gc_track(op); + return (PyObject *) op; } static void From 4ac2d51c64163c48482050e2b72722e8a503bd0e Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Thu, 25 Dec 2025 09:59:20 +1000 Subject: [PATCH 06/11] progress --- Objects/recordobject.c | 84 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 82 insertions(+), 2 deletions(-) diff --git a/Objects/recordobject.c b/Objects/recordobject.c index ca5c96a245eea4..bcdcbd82844932 100644 --- a/Objects/recordobject.c +++ b/Objects/recordobject.c @@ -54,13 +54,93 @@ PyAPI_FUNC(PyObject *) PyRecord_New(Py_ssize_t size) static void record_dealloc(PyRecordObject *op) { - // TODO + Py_ssize_t len = Py_SIZE(op); + PyObject_GC_UnTrack(op); + Py_TRASHCAN_BEGIN(op, record_dealloc) + Py_XDECREF(op->names); + if (len > 0) { + Py_ssize_t i = len; + while (--i >= 0) { + Py_XDECREF(op->ob_item[i]); + } + } + Py_TYPE(op)->tp_free((PyObject *)op); + Py_TRASHCAN_END } static PyObject * record_repr(PyRecordObject *v) { - // TODO + Py_ssize_t i, n; + _PyUnicodeWriter writer; + + n = Py_SIZE(v); + if (n == 0) + return PyUnicode_FromString("record()"); + + /* While not mutable, it is still possible to end up with a cycle in a + record through an object that stores itself within a record (and thus + infinitely asks for the repr of itself). This should only be + possible within a type. */ + i = Py_ReprEnter((PyObject *)v); + if (i != 0) { + return i > 0 ? 
PyUnicode_FromString("record(...)") : NULL; + } + + _PyUnicodeWriter_Init(&writer); + writer.overallocate = 1; + if (Py_SIZE(v) > 1) { + /* "(" + "1" + ", 2" * (len - 1) + ")" */ + writer.min_length = 1 + 1 + (2 + 1) * (Py_SIZE(v) - 1) + 1; + } + else { + /* "(1,)" */ + writer.min_length = 4; + } + + if (_PyUnicodeWriter_WriteASCIIString(&writer, "record(", 7) < 0) + goto error; + + /* Do repr() on each element. */ + for (i = 0; i < n; ++i) { + PyObject *s; + PyObject *name; + + if (i > 0) { + if (_PyUnicodeWriter_WriteASCIIString(&writer, ", ", 2) < 0) + goto error; + } + + name = PyTuple_GET_ITEM(v->names, i); + s = PyObject_Repr(v->ob_item[i]); + if (s == NULL) + goto error; + + if (_PyUnicodeWriter_WriteStr(&writer, name) < 0) { + Py_DECREF(s); + goto error; + } + if (_PyUnicodeWriter_WriteChar(&writer, '=') < 0) { + Py_DECREF(s); + goto error; + } + if (_PyUnicodeWriter_WriteStr(&writer, s) < 0) { + Py_DECREF(s); + goto error; + } + Py_DECREF(s); + } + + writer.overallocate = 0; + if (_PyUnicodeWriter_WriteChar(&writer, ')') < 0) + goto error; + + Py_ReprLeave((PyObject *)v); + return _PyUnicodeWriter_Finish(&writer); + +error: + _PyUnicodeWriter_Dealloc(&writer); + Py_ReprLeave((PyObject *)v); return NULL; } From b1e0ab421c8077cb0abf12b76668b9600cdf3fd9 Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Thu, 25 Dec 2025 10:32:31 +1000 Subject: [PATCH 07/11] constructor --- Objects/recordobject.c | 42 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/Objects/recordobject.c b/Objects/recordobject.c index bcdcbd82844932..6f76e477500fe5 100644 --- a/Objects/recordobject.c +++ b/Objects/recordobject.c @@ -29,7 +29,7 @@ record_alloc(Py_ssize_t size) sizeof(PyObject *))) / sizeof(PyObject *)) { return (PyRecordObject *)PyErr_NoMemory(); } - op = PyObject_GC_NewVar(PyRecordObject, &PyRecord_Type, size); + op = PyObject_NewVar(PyRecordObject, &PyRecord_Type, size); if (op == NULL) return NULL; } @@ -47,7 +47,6 @@ 
PyAPI_FUNC(PyObject *) PyRecord_New(Py_ssize_t size) for (Py_ssize_t i = 0; i < size; i++) { op->ob_item[i] = NULL; } - record_gc_track(op); return (PyObject *) op; } @@ -55,7 +54,6 @@ static void record_dealloc(PyRecordObject *op) { Py_ssize_t len = Py_SIZE(op); - PyObject_GC_UnTrack(op); Py_TRASHCAN_BEGIN(op, record_dealloc) Py_XDECREF(op->names); if (len > 0) { @@ -176,6 +174,7 @@ static PySequenceMethods record_as_sequence = { static Py_hash_t record_hash(PyRecordObject *v) { + // TODO return -1; } @@ -189,13 +188,46 @@ record_getattro(PyObject *obj, PyObject *name) static PyObject * record_rich_compare(PyObject *v, PyObject *w, int op) { + // TODO return NULL; } static PyObject * record_new(PyTypeObject *type, PyObject *args, PyObject *kwargs) { - return NULL; + PyObject *return_value = NULL; + PyObject *keys_list, *values_list; + Py_ssize_t n, i; + + if (PyTuple_GET_SIZE(args) != 0) { + PyErr_SetString(PyExc_TypeError, "record() takes no positional arguments"); + return NULL; + } + + if (kwargs == NULL || PyDict_GET_SIZE(kwargs) == 0) { + PyRecordObject *rec = (PyRecordObject*) PyRecord_New(0); + if (rec == NULL) return NULL; + rec->names = PyTuple_New(0); + return (PyObject*)rec; + } + + n = PyDict_GET_SIZE(kwargs); + PyRecordObject *rec = (PyRecordObject*) PyRecord_New(n); + if (rec == NULL) return NULL; + + keys_list = PyDict_Keys(kwargs); + rec->names = PyList_AsTuple(keys_list); + Py_DECREF(keys_list); + + values_list = PyDict_Values(kwargs); + for (i = 0; i < n; i++) { + PyObject *val = PyList_GET_ITEM(values_list, i); + Py_INCREF(val); + rec->ob_item[i] = val; + } + Py_DECREF(values_list); + + return (PyObject*) rec; } PyTypeObject PyRecord_Type = { @@ -237,6 +269,6 @@ PyTypeObject PyRecord_Type = { 0, /* tp_init */ 0, /* tp_alloc */ record_new, /* tp_new */ - PyObject_GC_Del, /* tp_free */ + PyObject_Free, /* tp_free */ 0, }; \ No newline at end of file From 92d359829dcad26306ab6cabc1768ccba787f5b9 Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Thu, 
25 Dec 2025 10:50:22 +1000 Subject: [PATCH 08/11] progress --- Objects/recordobject.c | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/Objects/recordobject.c b/Objects/recordobject.c index 6f76e477500fe5..be31d4f69b5668 100644 --- a/Objects/recordobject.c +++ b/Objects/recordobject.c @@ -181,8 +181,32 @@ record_hash(PyRecordObject *v) PyObject * record_getattro(PyObject *obj, PyObject *name) { - // TODO - return NULL; + if (name == NULL || PyUnicode_GetLength(name) == 0) { + PyErr_SetString(PyExc_TypeError, "record name cannot be zero length"); + return NULL; + } + + PyRecordObject* tmp = (PyRecordObject*) obj; + PyObject *ret, *val; + Py_ssize_t n, i; + + n = PyTuple_GET_SIZE(tmp->names); + i = 0; + for (i = 0; i < n; i++) { + val = PyTuple_GET_ITEM(tmp->names, i); + if (val == NULL) { + PyErr_SetString(PyExc_TypeError, "invalid name inside of record object"); + return NULL; + } + + if (PyUnicode_Compare(name, val) == 0) { + ret = tmp->ob_item[i]; + Py_INCREF(ret); + return ret; + } + } + + return PyObject_GenericGetAttr(obj, name); } static PyObject * From 92afde286d5364206202251328168664ce223eff Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Thu, 25 Dec 2025 12:02:25 +1000 Subject: [PATCH 09/11] class is done --- CLAUDE.md | 8 ++++ Objects/recordobject.c | 101 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 105 insertions(+), 4 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 80615331d5e1b5..43179c0d19cc81 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -15,6 +15,14 @@ This repository is being used as a learning environment for CPython internals. 
T - Reference `teaching-notes.md` for detailed research (student should not read this) - Encourage use of `dis` module, GDB, and debug builds for exploration +**IMPORTANT - Don't write code for the student:** +- Give hints, not implementations +- Point to similar code in tupleobject.c as reference +- Explain what needs to happen, let them write it +- If they're stuck on API: name the function/macro, don't write the call +- Only write code if explicitly asked ("write this for me") +- 2-3 line snippets for syntax are OK; 10+ line functions are NOT + **The learning project:** Implementing a `Record` type and `BUILD_RECORD` opcode (~300 LoC). This comprehensive project covers: - PyObject/PyVarObject fundamentals (custom struct, refcounting) - Type slots (tp_repr, tp_hash, tp_dealloc, tp_getattro, sq_length, sq_item) diff --git a/Objects/recordobject.c b/Objects/recordobject.c index be31d4f69b5668..6ec0279273ddd4 100644 --- a/Objects/recordobject.c +++ b/Objects/recordobject.c @@ -171,11 +171,50 @@ static PySequenceMethods record_as_sequence = { 0, /* sq_contains */ }; +#if SIZEOF_PY_UHASH_T > 4 +#define _PyHASH_XXPRIME_1 ((Py_uhash_t)11400714785074694791ULL) +#define _PyHASH_XXPRIME_2 ((Py_uhash_t)14029467366897019727ULL) +#define _PyHASH_XXPRIME_5 ((Py_uhash_t)2870177450012600261ULL) +#define _PyHASH_XXROTATE(x) ((x << 31) | (x >> 33)) /* Rotate left 31 bits */ +#else +#define _PyHASH_XXPRIME_1 ((Py_uhash_t)2654435761UL) +#define _PyHASH_XXPRIME_2 ((Py_uhash_t)2246822519UL) +#define _PyHASH_XXPRIME_5 ((Py_uhash_t)374761393UL) +#define _PyHASH_XXROTATE(x) ((x << 13) | (x >> 19)) /* Rotate left 13 bits */ +#endif + static Py_hash_t record_hash(PyRecordObject *v) { - // TODO - return -1; + Py_ssize_t i, len = Py_SIZE(v); + PyObject **item = v->ob_item; + PyObject *names = v->names; + + Py_uhash_t acc = _PyHASH_XXPRIME_5; + for (i = 0; i < len; i++) { + Py_uhash_t lane = PyObject_Hash(item[i]); + if (lane == (Py_uhash_t)-1) { + return -1; + } + acc += lane * 
_PyHASH_XXPRIME_2; + acc = _PyHASH_XXROTATE(acc); + acc *= _PyHASH_XXPRIME_1; + } + Py_uhash_t lane = PyObject_Hash(names); + if (lane == (Py_uhash_t)-1) { + return -1; + } + acc += lane * _PyHASH_XXPRIME_2; + acc = _PyHASH_XXROTATE(acc); + acc *= _PyHASH_XXPRIME_1; + + /* Add input length, mangled to keep the historical value of hash(()). */ + acc += len ^ (_PyHASH_XXPRIME_5 ^ 3527539UL); + + if (acc == (Py_uhash_t)-1) { + return 1546275796; + } + return acc; } PyObject * @@ -212,8 +251,62 @@ record_getattro(PyObject *obj, PyObject *name) static PyObject * record_rich_compare(PyObject *v, PyObject *w, int op) { - // TODO - return NULL; + PyRecordObject *vr, *wr; + Py_ssize_t i; + Py_ssize_t vlen, wlen; + + if (!PyRecord_Check(v) || !PyRecord_Check(w)) + Py_RETURN_NOTIMPLEMENTED; + + vr = (PyRecordObject *)v; + wr = (PyRecordObject *)w; + + vlen = Py_SIZE(vr); + wlen = Py_SIZE(wr); + + /* Note: the corresponding code for lists has an "early out" test + * here when op is EQ or NE and the lengths differ. That pays there, + * but Tim was unable to find any real code where EQ/NE tuple + * compares don't have the same length, so testing for it here would + * have cost without benefit. + */ + + /* Search for the first index where items are different. + * Note that because tuples are immutable, it's safe to reuse + * vlen and wlen across the comparison calls. 
+ */ + int k = PyObject_RichCompareBool(vr->names, wr->names, Py_EQ); + if (k < 0) + return NULL; + if (!k) { + if (op == Py_EQ) Py_RETURN_FALSE; + if (op == Py_NE) Py_RETURN_TRUE; + Py_RETURN_NOTIMPLEMENTED; + } + for (i = 0; i < vlen && i < wlen; i++) { + int k = PyObject_RichCompareBool(vr->ob_item[i], + wr->ob_item[i], Py_EQ); + if (k < 0) + return NULL; + if (!k) + break; + } + + if (i >= vlen || i >= wlen) { + /* No more items to compare -- compare sizes */ + Py_RETURN_RICHCOMPARE(vlen, wlen, op); + } + + /* We have an item that differs -- shortcuts for EQ/NE */ + if (op == Py_EQ) { + Py_RETURN_FALSE; + } + if (op == Py_NE) { + Py_RETURN_TRUE; + } + + /* Ordering comparisons between differing items are not supported */ + Py_RETURN_NOTIMPLEMENTED; } static PyObject * From fda320202112c2112f5cc868910f059a0f76e7c7 Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Thu, 25 Dec 2025 13:46:28 +1000 Subject: [PATCH 10/11] linking all the compiler stuff together. quite a lot of shotgun surgery --- Grammar/python.gram | 9 +- Include/internal/pycore_ast.h | 23 +- Include/internal/pycore_ast_state.h | 1 + Include/opcode.h | 1 + Lib/opcode.py | 1 + Parser/Python.asdl | 1 + Parser/parser.c | 694 ++++++++++++++++------------ Parser/pegen.c | 36 ++ Parser/pegen.h | 4 + Python/Python-ast.c | 135 ++++++ Python/ast.c | 4 + Python/ast_opt.c | 4 + Python/ast_unparse.c | 24 + Python/ceval.c | 18 + Python/compile.c | 35 ++ Python/opcode_targets.h | 2 +- Python/symtable.c | 4 + 17 files changed, 703 insertions(+), 293 deletions(-) diff --git a/Grammar/python.gram b/Grammar/python.gram index de582d2b82ae04..4ed8a5a87c5137 100644 --- a/Grammar/python.gram +++ b/Grammar/python.gram @@ -696,7 +696,7 @@ atom[expr_ty]: | NUMBER | &'(' (tuple | group | genexp) | &'[' (list | listcomp) - | &'{' (dict | set | dictcomp | setcomp) + | &'{' (dict | set | dictcomp | setcomp | record) | '...' 
{ _PyAST_Constant(Py_Ellipsis, NULL, EXTRA) } strings[expr_ty] (memo): a=STRING+ { _PyPegen_concatenate_strings(p, a) } @@ -725,6 +725,13 @@ dict[expr_ty]: CHECK(asdl_expr_seq*, _PyPegen_get_values(p, a)), EXTRA) } | '{' invalid_double_starred_kvpairs '}' +record[expr_ty]: + | '{' '|' a=[kwargs] '|' '}' { + _PyAST_Record( + CHECK(asdl_expr_seq*, _PyPegen_get_record_keys(p, a)), + CHECK(asdl_expr_seq*, _PyPegen_get_record_values(p, a)), + EXTRA) } + | '{' '|' invalid_kwarg '|' '}' dictcomp[expr_ty]: | '{' a=kvpair b=for_if_clauses '}' { _PyAST_DictComp(a->key, a->value, b, EXTRA) } diff --git a/Include/internal/pycore_ast.h b/Include/internal/pycore_ast.h index ebb6a90087bb52..3086a6c8868ac1 100644 --- a/Include/internal/pycore_ast.h +++ b/Include/internal/pycore_ast.h @@ -330,13 +330,14 @@ struct _stmt { }; enum _expr_kind {BoolOp_kind=1, NamedExpr_kind=2, BinOp_kind=3, UnaryOp_kind=4, - Lambda_kind=5, IfExp_kind=6, Dict_kind=7, Set_kind=8, - ListComp_kind=9, SetComp_kind=10, DictComp_kind=11, - GeneratorExp_kind=12, Await_kind=13, Yield_kind=14, - YieldFrom_kind=15, Compare_kind=16, Call_kind=17, - FormattedValue_kind=18, JoinedStr_kind=19, Constant_kind=20, - Attribute_kind=21, Subscript_kind=22, Starred_kind=23, - Name_kind=24, List_kind=25, Tuple_kind=26, Slice_kind=27}; + Lambda_kind=5, IfExp_kind=6, Dict_kind=7, Record_kind=8, + Set_kind=9, ListComp_kind=10, SetComp_kind=11, + DictComp_kind=12, GeneratorExp_kind=13, Await_kind=14, + Yield_kind=15, YieldFrom_kind=16, Compare_kind=17, + Call_kind=18, FormattedValue_kind=19, JoinedStr_kind=20, + Constant_kind=21, Attribute_kind=22, Subscript_kind=23, + Starred_kind=24, Name_kind=25, List_kind=26, Tuple_kind=27, + Slice_kind=28}; struct _expr { enum _expr_kind kind; union { @@ -377,6 +378,11 @@ struct _expr { asdl_expr_seq *values; } Dict; + struct { + asdl_expr_seq *keys; + asdl_expr_seq *values; + } Record; + struct { asdl_expr_seq *elts; } Set; @@ -729,6 +735,9 @@ expr_ty _PyAST_IfExp(expr_ty test, expr_ty 
body, expr_ty orelse, int lineno, expr_ty _PyAST_Dict(asdl_expr_seq * keys, asdl_expr_seq * values, int lineno, int col_offset, int end_lineno, int end_col_offset, PyArena *arena); +expr_ty _PyAST_Record(asdl_expr_seq * keys, asdl_expr_seq * values, int lineno, + int col_offset, int end_lineno, int end_col_offset, + PyArena *arena); expr_ty _PyAST_Set(asdl_expr_seq * elts, int lineno, int col_offset, int end_lineno, int end_col_offset, PyArena *arena); expr_ty _PyAST_ListComp(expr_ty elt, asdl_comprehension_seq * generators, int diff --git a/Include/internal/pycore_ast_state.h b/Include/internal/pycore_ast_state.h index 882cd09c00628d..ecba08463e7f6e 100644 --- a/Include/internal/pycore_ast_state.h +++ b/Include/internal/pycore_ast_state.h @@ -122,6 +122,7 @@ struct ast_state { PyObject *RShift_singleton; PyObject *RShift_type; PyObject *Raise_type; + PyObject *Record_type; PyObject *Return_type; PyObject *SetComp_type; PyObject *Set_type; diff --git a/Include/opcode.h b/Include/opcode.h index 52039754bd88ea..a9146bda8b63a8 100644 --- a/Include/opcode.h +++ b/Include/opcode.h @@ -135,6 +135,7 @@ extern "C" { #define SET_UPDATE 163 #define DICT_MERGE 164 #define DICT_UPDATE 165 +#define BUILD_RECORD 166 #ifdef NEED_OPCODE_JUMP_TABLES static uint32_t _PyOpcode_RelativeJump[8] = { 0U, diff --git a/Lib/opcode.py b/Lib/opcode.py index 37e88e92df70ec..7b471dc093a1fb 100644 --- a/Lib/opcode.py +++ b/Lib/opcode.py @@ -147,6 +147,7 @@ def jabs_op(name, op): def_op('BUILD_LIST', 103) # Number of list items def_op('BUILD_SET', 104) # Number of set items def_op('BUILD_MAP', 105) # Number of dict entries +def_op('BUILD_RECORD', 166) # Number of record fields name_op('LOAD_ATTR', 106) # Index in name list def_op('COMPARE_OP', 107) # Comparison operator hascompare.append(107) diff --git a/Parser/Python.asdl b/Parser/Python.asdl index 32fdc01a7e0e6e..669b64d3652102 100644 --- a/Parser/Python.asdl +++ b/Parser/Python.asdl @@ -61,6 +61,7 @@ module Python | Lambda(arguments args, 
expr body) | IfExp(expr test, expr body, expr orelse) | Dict(expr* keys, expr* values) + | Record(expr* keys, expr* values) | Set(expr* elts) | ListComp(expr elt, comprehension* generators) | SetComp(expr elt, comprehension* generators) diff --git a/Parser/parser.c b/Parser/parser.c index b1d8427761097d..9a6913f1d4d1cd 100644 --- a/Parser/parser.c +++ b/Parser/parser.c @@ -225,283 +225,284 @@ static char *soft_keywords[] = { #define set_type 1149 #define setcomp_type 1150 #define dict_type 1151 -#define dictcomp_type 1152 -#define double_starred_kvpairs_type 1153 -#define double_starred_kvpair_type 1154 -#define kvpair_type 1155 -#define for_if_clauses_type 1156 -#define for_if_clause_type 1157 -#define yield_expr_type 1158 -#define arguments_type 1159 -#define args_type 1160 -#define kwargs_type 1161 -#define starred_expression_type 1162 -#define kwarg_or_starred_type 1163 -#define kwarg_or_double_starred_type 1164 -#define star_targets_type 1165 -#define star_targets_list_seq_type 1166 -#define star_targets_tuple_seq_type 1167 -#define star_target_type 1168 -#define target_with_star_atom_type 1169 -#define star_atom_type 1170 -#define single_target_type 1171 -#define single_subscript_attribute_target_type 1172 -#define del_targets_type 1173 -#define del_target_type 1174 -#define del_t_atom_type 1175 -#define t_primary_type 1176 // Left-recursive -#define t_lookahead_type 1177 -#define invalid_arguments_type 1178 -#define invalid_kwarg_type 1179 -#define expression_without_invalid_type 1180 -#define invalid_legacy_expression_type 1181 -#define invalid_expression_type 1182 -#define invalid_named_expression_type 1183 -#define invalid_assignment_type 1184 -#define invalid_ann_assign_target_type 1185 -#define invalid_del_stmt_type 1186 -#define invalid_block_type 1187 -#define invalid_comprehension_type 1188 -#define invalid_dict_comprehension_type 1189 -#define invalid_parameters_type 1190 -#define invalid_parameters_helper_type 1191 -#define 
invalid_lambda_parameters_type 1192 -#define invalid_lambda_parameters_helper_type 1193 -#define invalid_star_etc_type 1194 -#define invalid_lambda_star_etc_type 1195 -#define invalid_double_type_comments_type 1196 -#define invalid_with_item_type 1197 -#define invalid_for_target_type 1198 -#define invalid_group_type 1199 -#define invalid_import_from_targets_type 1200 -#define invalid_with_stmt_type 1201 -#define invalid_with_stmt_indent_type 1202 -#define invalid_try_stmt_type 1203 -#define invalid_except_stmt_type 1204 -#define invalid_finally_stmt_type 1205 -#define invalid_except_stmt_indent_type 1206 -#define invalid_match_stmt_type 1207 -#define invalid_case_block_type 1208 -#define invalid_as_pattern_type 1209 -#define invalid_class_pattern_type 1210 -#define invalid_class_argument_pattern_type 1211 -#define invalid_if_stmt_type 1212 -#define invalid_elif_stmt_type 1213 -#define invalid_else_stmt_type 1214 -#define invalid_while_stmt_type 1215 -#define invalid_for_stmt_type 1216 -#define invalid_def_raw_type 1217 -#define invalid_class_def_raw_type 1218 -#define invalid_double_starred_kvpairs_type 1219 -#define invalid_kvpair_type 1220 -#define _loop0_1_type 1221 -#define _loop0_2_type 1222 -#define _loop0_4_type 1223 -#define _gather_3_type 1224 -#define _loop0_6_type 1225 -#define _gather_5_type 1226 -#define _loop0_8_type 1227 -#define _gather_7_type 1228 -#define _loop0_10_type 1229 -#define _gather_9_type 1230 -#define _loop1_11_type 1231 -#define _loop0_13_type 1232 -#define _gather_12_type 1233 -#define _tmp_14_type 1234 -#define _tmp_15_type 1235 -#define _tmp_16_type 1236 -#define _tmp_17_type 1237 -#define _tmp_18_type 1238 -#define _tmp_19_type 1239 -#define _tmp_20_type 1240 -#define _tmp_21_type 1241 -#define _loop1_22_type 1242 -#define _tmp_23_type 1243 -#define _tmp_24_type 1244 -#define _loop0_26_type 1245 -#define _gather_25_type 1246 -#define _loop0_28_type 1247 -#define _gather_27_type 1248 -#define _tmp_29_type 1249 -#define _tmp_30_type 
1250 -#define _loop0_31_type 1251 -#define _loop1_32_type 1252 -#define _loop0_34_type 1253 -#define _gather_33_type 1254 -#define _tmp_35_type 1255 -#define _loop0_37_type 1256 -#define _gather_36_type 1257 -#define _tmp_38_type 1258 -#define _loop0_40_type 1259 -#define _gather_39_type 1260 -#define _loop0_42_type 1261 -#define _gather_41_type 1262 -#define _loop0_44_type 1263 -#define _gather_43_type 1264 -#define _loop0_46_type 1265 -#define _gather_45_type 1266 -#define _tmp_47_type 1267 -#define _loop1_48_type 1268 -#define _tmp_49_type 1269 -#define _loop1_50_type 1270 -#define _loop0_52_type 1271 -#define _gather_51_type 1272 -#define _tmp_53_type 1273 -#define _tmp_54_type 1274 -#define _tmp_55_type 1275 -#define _tmp_56_type 1276 -#define _loop0_58_type 1277 -#define _gather_57_type 1278 -#define _loop0_60_type 1279 -#define _gather_59_type 1280 -#define _tmp_61_type 1281 -#define _loop0_63_type 1282 -#define _gather_62_type 1283 -#define _loop0_65_type 1284 -#define _gather_64_type 1285 -#define _tmp_66_type 1286 -#define _tmp_67_type 1287 -#define _tmp_68_type 1288 -#define _tmp_69_type 1289 -#define _loop0_70_type 1290 -#define _loop0_71_type 1291 -#define _loop0_72_type 1292 -#define _loop1_73_type 1293 -#define _loop0_74_type 1294 -#define _loop1_75_type 1295 -#define _loop1_76_type 1296 -#define _loop1_77_type 1297 -#define _loop0_78_type 1298 -#define _loop1_79_type 1299 -#define _loop0_80_type 1300 -#define _loop1_81_type 1301 -#define _loop0_82_type 1302 -#define _loop1_83_type 1303 -#define _loop1_84_type 1304 -#define _tmp_85_type 1305 -#define _loop1_86_type 1306 -#define _loop0_88_type 1307 -#define _gather_87_type 1308 -#define _loop1_89_type 1309 -#define _loop0_90_type 1310 -#define _loop0_91_type 1311 -#define _loop0_92_type 1312 -#define _loop1_93_type 1313 -#define _loop0_94_type 1314 -#define _loop1_95_type 1315 -#define _loop1_96_type 1316 -#define _loop1_97_type 1317 -#define _loop0_98_type 1318 -#define _loop1_99_type 1319 -#define 
_loop0_100_type 1320 -#define _loop1_101_type 1321 -#define _loop0_102_type 1322 -#define _loop1_103_type 1323 -#define _loop1_104_type 1324 -#define _loop1_105_type 1325 -#define _loop1_106_type 1326 -#define _tmp_107_type 1327 -#define _loop0_109_type 1328 -#define _gather_108_type 1329 -#define _tmp_110_type 1330 -#define _tmp_111_type 1331 -#define _tmp_112_type 1332 -#define _tmp_113_type 1333 -#define _loop1_114_type 1334 -#define _tmp_115_type 1335 -#define _tmp_116_type 1336 -#define _tmp_117_type 1337 -#define _loop0_119_type 1338 -#define _gather_118_type 1339 -#define _loop1_120_type 1340 -#define _loop0_121_type 1341 -#define _loop0_122_type 1342 -#define _loop0_124_type 1343 -#define _gather_123_type 1344 -#define _tmp_125_type 1345 -#define _loop0_127_type 1346 -#define _gather_126_type 1347 -#define _loop0_129_type 1348 -#define _gather_128_type 1349 -#define _loop0_131_type 1350 -#define _gather_130_type 1351 -#define _loop0_133_type 1352 -#define _gather_132_type 1353 -#define _loop0_134_type 1354 -#define _loop0_136_type 1355 -#define _gather_135_type 1356 -#define _loop1_137_type 1357 -#define _tmp_138_type 1358 -#define _loop0_140_type 1359 -#define _gather_139_type 1360 -#define _tmp_141_type 1361 -#define _tmp_142_type 1362 -#define _tmp_143_type 1363 -#define _tmp_144_type 1364 -#define _tmp_145_type 1365 -#define _tmp_146_type 1366 -#define _tmp_147_type 1367 -#define _tmp_148_type 1368 -#define _loop0_149_type 1369 -#define _loop0_150_type 1370 -#define _loop0_151_type 1371 -#define _tmp_152_type 1372 -#define _tmp_153_type 1373 -#define _tmp_154_type 1374 -#define _tmp_155_type 1375 -#define _loop0_156_type 1376 -#define _loop1_157_type 1377 -#define _loop0_158_type 1378 -#define _loop1_159_type 1379 -#define _tmp_160_type 1380 -#define _tmp_161_type 1381 -#define _tmp_162_type 1382 -#define _loop0_164_type 1383 -#define _gather_163_type 1384 -#define _loop0_166_type 1385 -#define _gather_165_type 1386 -#define _loop0_168_type 1387 
-#define _gather_167_type 1388 -#define _loop0_170_type 1389 -#define _gather_169_type 1390 -#define _tmp_171_type 1391 -#define _tmp_172_type 1392 -#define _tmp_173_type 1393 -#define _tmp_174_type 1394 -#define _tmp_175_type 1395 -#define _tmp_176_type 1396 -#define _tmp_177_type 1397 -#define _tmp_178_type 1398 -#define _loop0_180_type 1399 -#define _gather_179_type 1400 -#define _tmp_181_type 1401 -#define _tmp_182_type 1402 -#define _tmp_183_type 1403 -#define _tmp_184_type 1404 -#define _tmp_185_type 1405 -#define _tmp_186_type 1406 -#define _tmp_187_type 1407 -#define _tmp_188_type 1408 -#define _tmp_189_type 1409 -#define _tmp_190_type 1410 -#define _tmp_191_type 1411 -#define _tmp_192_type 1412 -#define _tmp_193_type 1413 -#define _tmp_194_type 1414 -#define _tmp_195_type 1415 -#define _tmp_196_type 1416 -#define _tmp_197_type 1417 -#define _tmp_198_type 1418 -#define _tmp_199_type 1419 -#define _tmp_200_type 1420 -#define _tmp_201_type 1421 -#define _tmp_202_type 1422 -#define _tmp_203_type 1423 -#define _tmp_204_type 1424 -#define _tmp_205_type 1425 -#define _tmp_206_type 1426 -#define _tmp_207_type 1427 -#define _tmp_208_type 1428 +#define record_type 1152 +#define dictcomp_type 1153 +#define double_starred_kvpairs_type 1154 +#define double_starred_kvpair_type 1155 +#define kvpair_type 1156 +#define for_if_clauses_type 1157 +#define for_if_clause_type 1158 +#define yield_expr_type 1159 +#define arguments_type 1160 +#define args_type 1161 +#define kwargs_type 1162 +#define starred_expression_type 1163 +#define kwarg_or_starred_type 1164 +#define kwarg_or_double_starred_type 1165 +#define star_targets_type 1166 +#define star_targets_list_seq_type 1167 +#define star_targets_tuple_seq_type 1168 +#define star_target_type 1169 +#define target_with_star_atom_type 1170 +#define star_atom_type 1171 +#define single_target_type 1172 +#define single_subscript_attribute_target_type 1173 +#define del_targets_type 1174 +#define del_target_type 1175 +#define 
del_t_atom_type 1176 +#define t_primary_type 1177 // Left-recursive +#define t_lookahead_type 1178 +#define invalid_arguments_type 1179 +#define invalid_kwarg_type 1180 +#define expression_without_invalid_type 1181 +#define invalid_legacy_expression_type 1182 +#define invalid_expression_type 1183 +#define invalid_named_expression_type 1184 +#define invalid_assignment_type 1185 +#define invalid_ann_assign_target_type 1186 +#define invalid_del_stmt_type 1187 +#define invalid_block_type 1188 +#define invalid_comprehension_type 1189 +#define invalid_dict_comprehension_type 1190 +#define invalid_parameters_type 1191 +#define invalid_parameters_helper_type 1192 +#define invalid_lambda_parameters_type 1193 +#define invalid_lambda_parameters_helper_type 1194 +#define invalid_star_etc_type 1195 +#define invalid_lambda_star_etc_type 1196 +#define invalid_double_type_comments_type 1197 +#define invalid_with_item_type 1198 +#define invalid_for_target_type 1199 +#define invalid_group_type 1200 +#define invalid_import_from_targets_type 1201 +#define invalid_with_stmt_type 1202 +#define invalid_with_stmt_indent_type 1203 +#define invalid_try_stmt_type 1204 +#define invalid_except_stmt_type 1205 +#define invalid_finally_stmt_type 1206 +#define invalid_except_stmt_indent_type 1207 +#define invalid_match_stmt_type 1208 +#define invalid_case_block_type 1209 +#define invalid_as_pattern_type 1210 +#define invalid_class_pattern_type 1211 +#define invalid_class_argument_pattern_type 1212 +#define invalid_if_stmt_type 1213 +#define invalid_elif_stmt_type 1214 +#define invalid_else_stmt_type 1215 +#define invalid_while_stmt_type 1216 +#define invalid_for_stmt_type 1217 +#define invalid_def_raw_type 1218 +#define invalid_class_def_raw_type 1219 +#define invalid_double_starred_kvpairs_type 1220 +#define invalid_kvpair_type 1221 +#define _loop0_1_type 1222 +#define _loop0_2_type 1223 +#define _loop0_4_type 1224 +#define _gather_3_type 1225 +#define _loop0_6_type 1226 +#define _gather_5_type 
1227 +#define _loop0_8_type 1228 +#define _gather_7_type 1229 +#define _loop0_10_type 1230 +#define _gather_9_type 1231 +#define _loop1_11_type 1232 +#define _loop0_13_type 1233 +#define _gather_12_type 1234 +#define _tmp_14_type 1235 +#define _tmp_15_type 1236 +#define _tmp_16_type 1237 +#define _tmp_17_type 1238 +#define _tmp_18_type 1239 +#define _tmp_19_type 1240 +#define _tmp_20_type 1241 +#define _tmp_21_type 1242 +#define _loop1_22_type 1243 +#define _tmp_23_type 1244 +#define _tmp_24_type 1245 +#define _loop0_26_type 1246 +#define _gather_25_type 1247 +#define _loop0_28_type 1248 +#define _gather_27_type 1249 +#define _tmp_29_type 1250 +#define _tmp_30_type 1251 +#define _loop0_31_type 1252 +#define _loop1_32_type 1253 +#define _loop0_34_type 1254 +#define _gather_33_type 1255 +#define _tmp_35_type 1256 +#define _loop0_37_type 1257 +#define _gather_36_type 1258 +#define _tmp_38_type 1259 +#define _loop0_40_type 1260 +#define _gather_39_type 1261 +#define _loop0_42_type 1262 +#define _gather_41_type 1263 +#define _loop0_44_type 1264 +#define _gather_43_type 1265 +#define _loop0_46_type 1266 +#define _gather_45_type 1267 +#define _tmp_47_type 1268 +#define _loop1_48_type 1269 +#define _tmp_49_type 1270 +#define _loop1_50_type 1271 +#define _loop0_52_type 1272 +#define _gather_51_type 1273 +#define _tmp_53_type 1274 +#define _tmp_54_type 1275 +#define _tmp_55_type 1276 +#define _tmp_56_type 1277 +#define _loop0_58_type 1278 +#define _gather_57_type 1279 +#define _loop0_60_type 1280 +#define _gather_59_type 1281 +#define _tmp_61_type 1282 +#define _loop0_63_type 1283 +#define _gather_62_type 1284 +#define _loop0_65_type 1285 +#define _gather_64_type 1286 +#define _tmp_66_type 1287 +#define _tmp_67_type 1288 +#define _tmp_68_type 1289 +#define _tmp_69_type 1290 +#define _loop0_70_type 1291 +#define _loop0_71_type 1292 +#define _loop0_72_type 1293 +#define _loop1_73_type 1294 +#define _loop0_74_type 1295 +#define _loop1_75_type 1296 +#define _loop1_76_type 1297 
+#define _loop1_77_type 1298 +#define _loop0_78_type 1299 +#define _loop1_79_type 1300 +#define _loop0_80_type 1301 +#define _loop1_81_type 1302 +#define _loop0_82_type 1303 +#define _loop1_83_type 1304 +#define _loop1_84_type 1305 +#define _tmp_85_type 1306 +#define _loop1_86_type 1307 +#define _loop0_88_type 1308 +#define _gather_87_type 1309 +#define _loop1_89_type 1310 +#define _loop0_90_type 1311 +#define _loop0_91_type 1312 +#define _loop0_92_type 1313 +#define _loop1_93_type 1314 +#define _loop0_94_type 1315 +#define _loop1_95_type 1316 +#define _loop1_96_type 1317 +#define _loop1_97_type 1318 +#define _loop0_98_type 1319 +#define _loop1_99_type 1320 +#define _loop0_100_type 1321 +#define _loop1_101_type 1322 +#define _loop0_102_type 1323 +#define _loop1_103_type 1324 +#define _loop1_104_type 1325 +#define _loop1_105_type 1326 +#define _loop1_106_type 1327 +#define _tmp_107_type 1328 +#define _loop0_109_type 1329 +#define _gather_108_type 1330 +#define _tmp_110_type 1331 +#define _tmp_111_type 1332 +#define _tmp_112_type 1333 +#define _tmp_113_type 1334 +#define _loop1_114_type 1335 +#define _tmp_115_type 1336 +#define _tmp_116_type 1337 +#define _tmp_117_type 1338 +#define _loop0_119_type 1339 +#define _gather_118_type 1340 +#define _loop1_120_type 1341 +#define _loop0_121_type 1342 +#define _loop0_122_type 1343 +#define _loop0_124_type 1344 +#define _gather_123_type 1345 +#define _tmp_125_type 1346 +#define _loop0_127_type 1347 +#define _gather_126_type 1348 +#define _loop0_129_type 1349 +#define _gather_128_type 1350 +#define _loop0_131_type 1351 +#define _gather_130_type 1352 +#define _loop0_133_type 1353 +#define _gather_132_type 1354 +#define _loop0_134_type 1355 +#define _loop0_136_type 1356 +#define _gather_135_type 1357 +#define _loop1_137_type 1358 +#define _tmp_138_type 1359 +#define _loop0_140_type 1360 +#define _gather_139_type 1361 +#define _tmp_141_type 1362 +#define _tmp_142_type 1363 +#define _tmp_143_type 1364 +#define _tmp_144_type 1365 
+#define _tmp_145_type 1366 +#define _tmp_146_type 1367 +#define _tmp_147_type 1368 +#define _tmp_148_type 1369 +#define _loop0_149_type 1370 +#define _loop0_150_type 1371 +#define _loop0_151_type 1372 +#define _tmp_152_type 1373 +#define _tmp_153_type 1374 +#define _tmp_154_type 1375 +#define _tmp_155_type 1376 +#define _loop0_156_type 1377 +#define _loop1_157_type 1378 +#define _loop0_158_type 1379 +#define _loop1_159_type 1380 +#define _tmp_160_type 1381 +#define _tmp_161_type 1382 +#define _tmp_162_type 1383 +#define _loop0_164_type 1384 +#define _gather_163_type 1385 +#define _loop0_166_type 1386 +#define _gather_165_type 1387 +#define _loop0_168_type 1388 +#define _gather_167_type 1389 +#define _loop0_170_type 1390 +#define _gather_169_type 1391 +#define _tmp_171_type 1392 +#define _tmp_172_type 1393 +#define _tmp_173_type 1394 +#define _tmp_174_type 1395 +#define _tmp_175_type 1396 +#define _tmp_176_type 1397 +#define _tmp_177_type 1398 +#define _tmp_178_type 1399 +#define _loop0_180_type 1400 +#define _gather_179_type 1401 +#define _tmp_181_type 1402 +#define _tmp_182_type 1403 +#define _tmp_183_type 1404 +#define _tmp_184_type 1405 +#define _tmp_185_type 1406 +#define _tmp_186_type 1407 +#define _tmp_187_type 1408 +#define _tmp_188_type 1409 +#define _tmp_189_type 1410 +#define _tmp_190_type 1411 +#define _tmp_191_type 1412 +#define _tmp_192_type 1413 +#define _tmp_193_type 1414 +#define _tmp_194_type 1415 +#define _tmp_195_type 1416 +#define _tmp_196_type 1417 +#define _tmp_197_type 1418 +#define _tmp_198_type 1419 +#define _tmp_199_type 1420 +#define _tmp_200_type 1421 +#define _tmp_201_type 1422 +#define _tmp_202_type 1423 +#define _tmp_203_type 1424 +#define _tmp_204_type 1425 +#define _tmp_205_type 1426 +#define _tmp_206_type 1427 +#define _tmp_207_type 1428 +#define _tmp_208_type 1429 static mod_ty file_rule(Parser *p); static mod_ty interactive_rule(Parser *p); @@ -655,6 +656,7 @@ static expr_ty genexp_rule(Parser *p); static expr_ty set_rule(Parser 
*p); static expr_ty setcomp_rule(Parser *p); static expr_ty dict_rule(Parser *p); +static expr_ty record_rule(Parser *p); static expr_ty dictcomp_rule(Parser *p); static asdl_seq* double_starred_kvpairs_rule(Parser *p); static KeyValuePair* double_starred_kvpair_rule(Parser *p); @@ -14777,7 +14779,7 @@ slice_rule(Parser *p) // | NUMBER // | &'(' (tuple | group | genexp) // | &'[' (list | listcomp) -// | &'{' (dict | set | dictcomp | setcomp) +// | &'{' (dict | set | dictcomp | setcomp | record) // | '...' static expr_ty atom_rule(Parser *p) @@ -15001,26 +15003,26 @@ atom_rule(Parser *p) D(fprintf(stderr, "%*c%s atom[%d-%d]: %s failed!\n", p->level, ' ', p->error_indicator ? "ERROR!" : "-", _mark, p->mark, "&'[' (list | listcomp)")); } - { // &'{' (dict | set | dictcomp | setcomp) + { // &'{' (dict | set | dictcomp | setcomp | record) if (p->error_indicator) { p->level--; return NULL; } - D(fprintf(stderr, "%*c> atom[%d-%d]: %s\n", p->level, ' ', _mark, p->mark, "&'{' (dict | set | dictcomp | setcomp)")); + D(fprintf(stderr, "%*c> atom[%d-%d]: %s\n", p->level, ' ', _mark, p->mark, "&'{' (dict | set | dictcomp | setcomp | record)")); void *_tmp_113_var; if ( _PyPegen_lookahead_with_int(1, _PyPegen_expect_token, p, 25) // token='{' && - (_tmp_113_var = _tmp_113_rule(p)) // dict | set | dictcomp | setcomp + (_tmp_113_var = _tmp_113_rule(p)) // dict | set | dictcomp | setcomp | record ) { - D(fprintf(stderr, "%*c+ atom[%d-%d]: %s succeeded!\n", p->level, ' ', _mark, p->mark, "&'{' (dict | set | dictcomp | setcomp)")); + D(fprintf(stderr, "%*c+ atom[%d-%d]: %s succeeded!\n", p->level, ' ', _mark, p->mark, "&'{' (dict | set | dictcomp | setcomp | record)")); _res = _tmp_113_var; goto done; } p->mark = _mark; D(fprintf(stderr, "%*c%s atom[%d-%d]: %s failed!\n", p->level, ' ', - p->error_indicator ? "ERROR!" : "-", _mark, p->mark, "&'{' (dict | set | dictcomp | setcomp)")); + p->error_indicator ? "ERROR!" 
: "-", _mark, p->mark, "&'{' (dict | set | dictcomp | setcomp | record)")); } { // '...' if (p->error_indicator) { @@ -15748,6 +15750,111 @@ dict_rule(Parser *p) return _res; } +// record: '{' '|' kwargs? '|' '}' | '{' '|' invalid_kwarg '|' '}' +static expr_ty +record_rule(Parser *p) +{ + if (p->level++ == MAXSTACK) { + p->error_indicator = 1; + PyErr_NoMemory(); + } + if (p->error_indicator) { + p->level--; + return NULL; + } + expr_ty _res = NULL; + int _mark = p->mark; + if (p->mark == p->fill && _PyPegen_fill_token(p) < 0) { + p->error_indicator = 1; + p->level--; + return NULL; + } + int _start_lineno = p->tokens[_mark]->lineno; + UNUSED(_start_lineno); // Only used by EXTRA macro + int _start_col_offset = p->tokens[_mark]->col_offset; + UNUSED(_start_col_offset); // Only used by EXTRA macro + { // '{' '|' kwargs? '|' '}' + if (p->error_indicator) { + p->level--; + return NULL; + } + D(fprintf(stderr, "%*c> record[%d-%d]: %s\n", p->level, ' ', _mark, p->mark, "'{' '|' kwargs? '|' '}'")); + Token * _literal; + Token * _literal_1; + Token * _literal_2; + Token * _literal_3; + void *a; + if ( + (_literal = _PyPegen_expect_token(p, 25)) // token='{' + && + (_literal_1 = _PyPegen_expect_token(p, 18)) // token='|' + && + (a = kwargs_rule(p), !p->error_indicator) // kwargs? + && + (_literal_2 = _PyPegen_expect_token(p, 18)) // token='|' + && + (_literal_3 = _PyPegen_expect_token(p, 26)) // token='}' + ) + { + D(fprintf(stderr, "%*c+ record[%d-%d]: %s succeeded!\n", p->level, ' ', _mark, p->mark, "'{' '|' kwargs? 
'|' '}'")); + Token *_token = _PyPegen_get_last_nonnwhitespace_token(p); + if (_token == NULL) { + p->level--; + return NULL; + } + int _end_lineno = _token->end_lineno; + UNUSED(_end_lineno); // Only used by EXTRA macro + int _end_col_offset = _token->end_col_offset; + UNUSED(_end_col_offset); // Only used by EXTRA macro + _res = _PyAST_Record ( CHECK ( asdl_expr_seq * , _PyPegen_get_record_keys ( p , a ) ) , CHECK ( asdl_expr_seq * , _PyPegen_get_record_values ( p , a ) ) , EXTRA ); + if (_res == NULL && PyErr_Occurred()) { + p->error_indicator = 1; + p->level--; + return NULL; + } + goto done; + } + p->mark = _mark; + D(fprintf(stderr, "%*c%s record[%d-%d]: %s failed!\n", p->level, ' ', + p->error_indicator ? "ERROR!" : "-", _mark, p->mark, "'{' '|' kwargs? '|' '}'")); + } + { // '{' '|' invalid_kwarg '|' '}' + if (p->error_indicator) { + p->level--; + return NULL; + } + D(fprintf(stderr, "%*c> record[%d-%d]: %s\n", p->level, ' ', _mark, p->mark, "'{' '|' invalid_kwarg '|' '}'")); + Token * _literal; + Token * _literal_1; + Token * _literal_2; + Token * _literal_3; + void *invalid_kwarg_var; + if ( + (_literal = _PyPegen_expect_token(p, 25)) // token='{' + && + (_literal_1 = _PyPegen_expect_token(p, 18)) // token='|' + && + (invalid_kwarg_var = invalid_kwarg_rule(p)) // invalid_kwarg + && + (_literal_2 = _PyPegen_expect_token(p, 18)) // token='|' + && + (_literal_3 = _PyPegen_expect_token(p, 26)) // token='}' + ) + { + D(fprintf(stderr, "%*c+ record[%d-%d]: %s succeeded!\n", p->level, ' ', _mark, p->mark, "'{' '|' invalid_kwarg '|' '}'")); + _res = _PyPegen_dummy_name(p, _literal, _literal_1, invalid_kwarg_var, _literal_2, _literal_3); + goto done; + } + p->mark = _mark; + D(fprintf(stderr, "%*c%s record[%d-%d]: %s failed!\n", p->level, ' ', + p->error_indicator ? "ERROR!" 
: "-", _mark, p->mark, "'{' '|' invalid_kwarg '|' '}'")); + } + _res = NULL; + done: + p->level--; + return _res; +} + // dictcomp: '{' kvpair for_if_clauses '}' | invalid_dict_comprehension static expr_ty dictcomp_rule(Parser *p) @@ -29084,7 +29191,7 @@ _tmp_112_rule(Parser *p) return _res; } -// _tmp_113: dict | set | dictcomp | setcomp +// _tmp_113: dict | set | dictcomp | setcomp | record static void * _tmp_113_rule(Parser *p) { @@ -29174,6 +29281,25 @@ _tmp_113_rule(Parser *p) D(fprintf(stderr, "%*c%s _tmp_113[%d-%d]: %s failed!\n", p->level, ' ', p->error_indicator ? "ERROR!" : "-", _mark, p->mark, "setcomp")); } + { // record + if (p->error_indicator) { + p->level--; + return NULL; + } + D(fprintf(stderr, "%*c> _tmp_113[%d-%d]: %s\n", p->level, ' ', _mark, p->mark, "record")); + expr_ty record_var; + if ( + (record_var = record_rule(p)) // record + ) + { + D(fprintf(stderr, "%*c+ _tmp_113[%d-%d]: %s succeeded!\n", p->level, ' ', _mark, p->mark, "record")); + _res = record_var; + goto done; + } + p->mark = _mark; + D(fprintf(stderr, "%*c%s _tmp_113[%d-%d]: %s failed!\n", p->level, ' ', + p->error_indicator ? "ERROR!" 
: "-", _mark, p->mark, "record")); + } _res = NULL; done: p->level--; diff --git a/Parser/pegen.c b/Parser/pegen.c index f440db4a1c9e37..ad6093dce897c6 100644 --- a/Parser/pegen.c +++ b/Parser/pegen.c @@ -215,6 +215,8 @@ _PyPegen_get_expr_name(expr_ty e) return "dict comprehension"; case Dict_kind: return "dict literal"; + case Record_kind: + return "record literal"; case Set_kind: return "set display"; case JoinedStr_kind: @@ -1924,6 +1926,40 @@ _PyPegen_get_keys(Parser *p, asdl_seq *seq) return new_seq; } +asdl_expr_seq * +_PyPegen_get_record_keys(Parser *p, asdl_seq *seq) +{ + Py_ssize_t len = asdl_seq_LEN(seq); + asdl_expr_seq *keys = _Py_asdl_expr_seq_new(len, p->arena); + if (!keys) return NULL; + + for (Py_ssize_t i = 0; i < len; i++) { + KeywordOrStarred *item = asdl_seq_GET_UNTYPED(seq, i); + keyword_ty kw = item->element; + // Create a Name expression from the identifier + expr_ty name = _PyAST_Name(kw->arg, Load, + kw->lineno, kw->col_offset, + kw->end_lineno, kw->end_col_offset, p->arena); + asdl_seq_SET(keys, i, name); + } + return keys; +} + +asdl_expr_seq * +_PyPegen_get_record_values(Parser *p, asdl_seq *seq) +{ + Py_ssize_t len = asdl_seq_LEN(seq); + asdl_expr_seq *values = _Py_asdl_expr_seq_new(len, p->arena); + if (!values) return NULL; + + for (Py_ssize_t i = 0; i < len; i++) { + KeywordOrStarred *item = asdl_seq_GET_UNTYPED(seq, i); + keyword_ty kw = item->element; + asdl_seq_SET(values, i, kw->value); + } + return values; +} + /* Extracts all values from an asdl_seq* of KeyValuePair*'s */ asdl_expr_seq * _PyPegen_get_values(Parser *p, asdl_seq *seq) diff --git a/Parser/pegen.h b/Parser/pegen.h index 118fbc7b3b78a2..fa1ff9bafb101c 100644 --- a/Parser/pegen.h +++ b/Parser/pegen.h @@ -267,7 +267,11 @@ asdl_int_seq *_PyPegen_get_cmpops(Parser *p, asdl_seq *); asdl_expr_seq *_PyPegen_get_exprs(Parser *, asdl_seq *); expr_ty _PyPegen_set_expr_context(Parser *, expr_ty, expr_context_ty); KeyValuePair *_PyPegen_key_value_pair(Parser *, expr_ty, 
expr_ty); + asdl_expr_seq *_PyPegen_get_keys(Parser *, asdl_seq *); +asdl_expr_seq *_PyPegen_get_record_keys(Parser *, asdl_seq *); +asdl_expr_seq *_PyPegen_get_record_values(Parser *, asdl_seq *); + asdl_expr_seq *_PyPegen_get_values(Parser *, asdl_seq *); KeyPatternPair *_PyPegen_key_pattern_pair(Parser *, expr_ty, pattern_ty); asdl_expr_seq *_PyPegen_get_pattern_keys(Parser *, asdl_seq *); diff --git a/Python/Python-ast.c b/Python/Python-ast.c index 2f84cad7749dd8..d15274ec5ad43f 100644 --- a/Python/Python-ast.c +++ b/Python/Python-ast.c @@ -136,6 +136,7 @@ void _PyAST_Fini(PyInterpreterState *interp) Py_CLEAR(state->RShift_singleton); Py_CLEAR(state->RShift_type); Py_CLEAR(state->Raise_type); + Py_CLEAR(state->Record_type); Py_CLEAR(state->Return_type); Py_CLEAR(state->SetComp_type); Py_CLEAR(state->Set_type); @@ -544,6 +545,10 @@ static const char * const Dict_fields[]={ "keys", "values", }; +static const char * const Record_fields[]={ + "keys", + "values", +}; static const char * const Set_fields[]={ "elts", }; @@ -1302,6 +1307,7 @@ init_types(struct ast_state *state) " | Lambda(arguments args, expr body)\n" " | IfExp(expr test, expr body, expr orelse)\n" " | Dict(expr* keys, expr* values)\n" + " | Record(expr* keys, expr* values)\n" " | Set(expr* elts)\n" " | ListComp(expr elt, comprehension* generators)\n" " | SetComp(expr elt, comprehension* generators)\n" @@ -1357,6 +1363,10 @@ init_types(struct ast_state *state) 2, "Dict(expr* keys, expr* values)"); if (!state->Dict_type) return 0; + state->Record_type = make_type(state, "Record", state->expr_type, + Record_fields, 2, + "Record(expr* keys, expr* values)"); + if (!state->Record_type) return 0; state->Set_type = make_type(state, "Set", state->expr_type, Set_fields, 1, "Set(expr* elts)"); if (!state->Set_type) return 0; @@ -2733,6 +2743,24 @@ _PyAST_Dict(asdl_expr_seq * keys, asdl_expr_seq * values, int lineno, int return p; } +expr_ty +_PyAST_Record(asdl_expr_seq * keys, asdl_expr_seq * values, int lineno, 
int + col_offset, int end_lineno, int end_col_offset, PyArena *arena) +{ + expr_ty p; + p = (expr_ty)_PyArena_Malloc(arena, sizeof(*p)); + if (!p) + return NULL; + p->kind = Record_kind; + p->v.Record.keys = keys; + p->v.Record.values = values; + p->lineno = lineno; + p->col_offset = col_offset; + p->end_lineno = end_lineno; + p->end_col_offset = end_col_offset; + return p; +} + expr_ty _PyAST_Set(asdl_expr_seq * elts, int lineno, int col_offset, int end_lineno, int end_col_offset, PyArena *arena) @@ -4294,6 +4322,22 @@ ast2obj_expr(struct ast_state *state, void* _o) goto failed; Py_DECREF(value); break; + case Record_kind: + tp = (PyTypeObject *)state->Record_type; + result = PyType_GenericNew(tp, NULL, NULL); + if (!result) goto failed; + value = ast2obj_list(state, (asdl_seq*)o->v.Record.keys, ast2obj_expr); + if (!value) goto failed; + if (PyObject_SetAttr(result, state->keys, value) == -1) + goto failed; + Py_DECREF(value); + value = ast2obj_list(state, (asdl_seq*)o->v.Record.values, + ast2obj_expr); + if (!value) goto failed; + if (PyObject_SetAttr(result, state->values, value) == -1) + goto failed; + Py_DECREF(value); + break; case Set_kind: tp = (PyTypeObject *)state->Set_type; result = PyType_GenericNew(tp, NULL, NULL); @@ -8348,6 +8392,94 @@ obj2ast_expr(struct ast_state *state, PyObject* obj, expr_ty* out, PyArena* if (*out == NULL) goto failed; return 0; } + tp = state->Record_type; + isinstance = PyObject_IsInstance(obj, tp); + if (isinstance == -1) { + return 1; + } + if (isinstance) { + asdl_expr_seq* keys; + asdl_expr_seq* values; + + if (_PyObject_LookupAttr(obj, state->keys, &tmp) < 0) { + return 1; + } + if (tmp == NULL) { + PyErr_SetString(PyExc_TypeError, "required field \"keys\" missing from Record"); + return 1; + } + else { + int res; + Py_ssize_t len; + Py_ssize_t i; + if (!PyList_Check(tmp)) { + PyErr_Format(PyExc_TypeError, "Record field \"keys\" must be a list, not a %.200s", _PyType_Name(Py_TYPE(tmp))); + goto failed; + } + len = 
PyList_GET_SIZE(tmp); + keys = _Py_asdl_expr_seq_new(len, arena); + if (keys == NULL) goto failed; + for (i = 0; i < len; i++) { + expr_ty val; + PyObject *tmp2 = PyList_GET_ITEM(tmp, i); + Py_INCREF(tmp2); + if (Py_EnterRecursiveCall(" while traversing 'Record' node")) { + goto failed; + } + res = obj2ast_expr(state, tmp2, &val, arena); + Py_LeaveRecursiveCall(); + Py_DECREF(tmp2); + if (res != 0) goto failed; + if (len != PyList_GET_SIZE(tmp)) { + PyErr_SetString(PyExc_RuntimeError, "Record field \"keys\" changed size during iteration"); + goto failed; + } + asdl_seq_SET(keys, i, val); + } + Py_CLEAR(tmp); + } + if (_PyObject_LookupAttr(obj, state->values, &tmp) < 0) { + return 1; + } + if (tmp == NULL) { + PyErr_SetString(PyExc_TypeError, "required field \"values\" missing from Record"); + return 1; + } + else { + int res; + Py_ssize_t len; + Py_ssize_t i; + if (!PyList_Check(tmp)) { + PyErr_Format(PyExc_TypeError, "Record field \"values\" must be a list, not a %.200s", _PyType_Name(Py_TYPE(tmp))); + goto failed; + } + len = PyList_GET_SIZE(tmp); + values = _Py_asdl_expr_seq_new(len, arena); + if (values == NULL) goto failed; + for (i = 0; i < len; i++) { + expr_ty val; + PyObject *tmp2 = PyList_GET_ITEM(tmp, i); + Py_INCREF(tmp2); + if (Py_EnterRecursiveCall(" while traversing 'Record' node")) { + goto failed; + } + res = obj2ast_expr(state, tmp2, &val, arena); + Py_LeaveRecursiveCall(); + Py_DECREF(tmp2); + if (res != 0) goto failed; + if (len != PyList_GET_SIZE(tmp)) { + PyErr_SetString(PyExc_RuntimeError, "Record field \"values\" changed size during iteration"); + goto failed; + } + asdl_seq_SET(values, i, val); + } + Py_CLEAR(tmp); + } + *out = _PyAST_Record(keys, values, lineno, col_offset, end_lineno, + end_col_offset, arena); + if (*out == NULL) goto failed; + return 0; + } tp = state->Set_type; isinstance = PyObject_IsInstance(obj, tp); if (isinstance == -1) { @@ -11735,6 +11867,9 @@ astmodule_exec(PyObject *m) if (PyModule_AddObjectRef(m, "Dict", 
state->Dict_type) < 0) { return -1; } + if (PyModule_AddObjectRef(m, "Record", state->Record_type) < 0) { + return -1; + } if (PyModule_AddObjectRef(m, "Set", state->Set_type) < 0) { return -1; } diff --git a/Python/ast.c b/Python/ast.c index 2113124dbd51c2..67e98b612c0876 100644 --- a/Python/ast.c +++ b/Python/ast.c @@ -351,6 +351,10 @@ validate_expr(struct validator *state, expr_ty exp, expr_context_ty ctx) case Tuple_kind: ret = validate_exprs(state, exp->v.Tuple.elts, ctx, 0); break; + case Record_kind: + ret = validate_exprs(state, exp->v.Record.keys, Load, 0); + ret = ret && validate_exprs(state, exp->v.Record.values, Load, 0); + break; case NamedExpr_kind: ret = validate_expr(state, exp->v.NamedExpr.value, Load); break; diff --git a/Python/ast_opt.c b/Python/ast_opt.c index c1fdea3a88c9cb..ff1a9dd68cf754 100644 --- a/Python/ast_opt.c +++ b/Python/ast_opt.c @@ -520,6 +520,10 @@ astfold_expr(expr_ty node_, PyArena *ctx_, _PyASTOptimizeState *state) CALL_SEQ(astfold_expr, expr, node_->v.Dict.keys); CALL_SEQ(astfold_expr, expr, node_->v.Dict.values); break; + case Record_kind: + CALL_SEQ(astfold_expr, expr, node_->v.Record.keys); + CALL_SEQ(astfold_expr, expr, node_->v.Record.values); + break; case Set_kind: CALL_SEQ(astfold_expr, expr, node_->v.Set.elts); break; diff --git a/Python/ast_unparse.c b/Python/ast_unparse.c index 126e9047d58d64..ccc0a876d9b811 100644 --- a/Python/ast_unparse.c +++ b/Python/ast_unparse.c @@ -331,6 +331,28 @@ append_ast_dict(_PyUnicodeWriter *writer, expr_ty e) APPEND_STR_FINISH("}"); } +static int +append_ast_record(_PyUnicodeWriter *writer, expr_ty e) +{ + Py_ssize_t i, value_count; + expr_ty key_node; + + APPEND_STR("{|"); + value_count = asdl_seq_LEN(e->v.Record.values); + + for (i = 0; i < value_count; i++) { + APPEND_STR_IF(i > 0, ", "); + key_node = (expr_ty)asdl_seq_GET(e->v.Record.keys, i); + if (key_node != NULL) { + APPEND_EXPR(key_node, PR_TEST); + APPEND_STR("="); + APPEND_EXPR((expr_ty)asdl_seq_GET(e->v.Record.values, i), 
PR_TEST); + } + } + + APPEND_STR_FINISH("|}"); +} + static int append_ast_set(_PyUnicodeWriter *writer, expr_ty e) { @@ -867,6 +889,8 @@ append_ast_expr(_PyUnicodeWriter *writer, expr_ty e, int level) return append_ast_ifexp(writer, e, level); case Dict_kind: return append_ast_dict(writer, e); + case Record_kind: + return append_ast_record(writer, e); case Set_kind: return append_ast_set(writer, e); case GeneratorExp_kind: diff --git a/Python/ceval.c b/Python/ceval.c index 9f4ef6be0e1f2a..d46fed5be0f2ee 100644 --- a/Python/ceval.c +++ b/Python/ceval.c @@ -3164,6 +3164,24 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, PyFrameObject *f, int throwflag) DISPATCH(); } + case TARGET(BUILD_RECORD): { + PyRecordObject *rec = (PyRecordObject*) PyRecord_New(oparg); + if (rec == NULL) + goto error; + + PyObject *item = NULL; + while (--oparg >= 0) { + item = POP(); + rec->ob_item[oparg] = item; + } + + item = POP(); + rec->names = item; + + PUSH((PyObject *)rec); + DISPATCH(); + } + case TARGET(BUILD_LIST): { PyObject *list = PyList_New(oparg); if (list == NULL) diff --git a/Python/compile.c b/Python/compile.c index 80caa8fab1f703..dac3b5618e7eca 100644 --- a/Python/compile.c +++ b/Python/compile.c @@ -1082,6 +1082,8 @@ stack_effect(int opcode, int oparg, int jump) case BUILD_SET: case BUILD_STRING: return 1-oparg; + case BUILD_RECORD: + return -oparg; case BUILD_MAP: return 1 - 2*oparg; case BUILD_CONST_KEY_MAP: @@ -4080,6 +4082,34 @@ compiler_dict(struct compiler *c, expr_ty e) return 1; } +static int +compiler_record(struct compiler *c, expr_ty e) +{ + Py_ssize_t i, n; + PyObject *keys; + PyObject* key; + + n = asdl_seq_LEN(e->v.Record.values); + keys = PyTuple_New(n); + if (keys == NULL) { + return 0; + } + for (i = 0; i < n; i++) { + expr_ty key_expr = ((expr_ty) asdl_seq_GET(e->v.Record.keys, i)); + key = key_expr->v.Name.id; + Py_INCREF(key); + PyTuple_SET_ITEM(keys, i, key); + } + ADDOP_LOAD_CONST_NEW(c, keys); + + for (i = 0; i < n; i++) { + VISIT(c, expr,
(expr_ty)asdl_seq_GET(e->v.Record.values, i)); + } + + ADDOP_I(c, BUILD_RECORD, n); + return 1; +} + static int compiler_compare(struct compiler *c, expr_ty e) { @@ -4131,6 +4161,8 @@ infer_type(expr_ty e) case List_kind: case ListComp_kind: return &PyList_Type; + case Record_kind: + return &PyRecord_Type; case Dict_kind: case DictComp_kind: return &PyDict_Type; @@ -4160,6 +4192,7 @@ check_caller(struct compiler *c, expr_ty e) case List_kind: case ListComp_kind: case Dict_kind: + case Record_kind: case DictComp_kind: case Set_kind: case SetComp_kind: @@ -5207,6 +5240,8 @@ compiler_visit_expr1(struct compiler *c, expr_ty e) return compiler_ifexp(c, e); case Dict_kind: return compiler_dict(c, e); + case Record_kind: + return compiler_record(c, e); case Set_kind: return compiler_set(c, e); case GeneratorExp_kind: diff --git a/Python/opcode_targets.h b/Python/opcode_targets.h index 951f8f8a5569f7..b76099c60dcc57 100644 --- a/Python/opcode_targets.h +++ b/Python/opcode_targets.h @@ -165,7 +165,7 @@ static void *opcode_targets[256] = { &&TARGET_SET_UPDATE, &&TARGET_DICT_MERGE, &&TARGET_DICT_UPDATE, - &&_unknown_opcode, + &&TARGET_BUILD_RECORD, &&_unknown_opcode, &&_unknown_opcode, &&_unknown_opcode, diff --git a/Python/symtable.c b/Python/symtable.c index 07f9d1132c797e..f2bd91fad150a5 100644 --- a/Python/symtable.c +++ b/Python/symtable.c @@ -1629,6 +1629,10 @@ symtable_visit_expr(struct symtable *st, expr_ty e) VISIT_SEQ_WITH_NULL(st, expr, e->v.Dict.keys); VISIT_SEQ(st, expr, e->v.Dict.values); break; + case Record_kind: + VISIT_SEQ_WITH_NULL(st, expr, e->v.Record.keys); + VISIT_SEQ(st, expr, e->v.Record.values); + break; case Set_kind: VISIT_SEQ(st, expr, e->v.Set.elts); break; From 0ab33f4dc2ecbb154abf63fe0a1b5a9f0dd21bb6 Mon Sep 17 00:00:00 2001 From: zyros-dev Date: Thu, 25 Dec 2025 13:49:52 +1000 Subject: [PATCH 11/11] done! 
--- teaching-todo.md | 174 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 139 insertions(+), 35 deletions(-) diff --git a/teaching-todo.md b/teaching-todo.md index 032b4ed8c5baa7..7f215b186ad841 100644 --- a/teaching-todo.md +++ b/teaching-todo.md @@ -148,7 +148,7 @@ r == r2 # comparable - [x] Find `PySequenceMethods` in headers - [x] Study `sq_length` - returns `Py_ssize_t` - [x] Study `sq_item` - takes index, returns item (with INCREF!) -- [~] **For Record:** Implement these so `r[0]` and `len(r)` work *(design understood, implementation pending)* +- [x] **For Record:** Implement these so `r[0]` and `len(r)` work **Notes from session:** - `PySequenceMethods` contains: `sq_length`, `sq_concat`, `sq_repeat`, `sq_item`, `sq_ass_item`, `sq_contains`, etc. @@ -243,38 +243,132 @@ r == r2 # comparable ## Phase 5: Implementation ### 5.1 Create the Header File -- [ ] Create `Include/recordobject.h` -- [ ] Define `RecordObject` struct -- [ ] Declare `PyRecord_Type` -- [ ] Declare `PyRecord_New()` constructor function +- [x] Create `Include/recordobject.h` +- [x] Define `RecordObject` struct +- [x] Declare `PyRecord_Type` +- [x] Declare `PyRecord_New()` constructor function + +**Notes from session:** +- Two-level header structure: `Include/recordobject.h` (public) includes `Include/cpython/recordobject.h` (internal) +- Must add `#include "recordobject.h"` to `Include/Python.h` for it to be visible +- `PyRecord_Check` macro simplified to `Py_IS_TYPE(op, &PyRecord_Type)` (no subclass flag) ### 5.2 Implement the Type -- [ ] Create `Objects/recordobject.c` -- [ ] Implement `record_dealloc` -- [ ] Implement `record_repr` -- [ ] Implement `record_hash` -- [ ] Implement `record_richcompare` -- [ ] Implement `record_length` (sq_length) -- [ ] Implement `record_item` (sq_item) -- [ ] Implement `record_getattro` (attribute access by name) -- [ ] Define `PyRecord_Type` with all slots filled -- [ ] Implement `PyRecord_New()` - the C API constructor +- [x] Create 
`Objects/recordobject.c` +- [x] Implement `record_dealloc` +- [x] Implement `record_repr` +- [x] Implement `record_hash` +- [x] Implement `record_richcompare` +- [x] Implement `record_length` (sq_length) +- [x] Implement `record_item` (sq_item) +- [x] Implement `record_getattro` (attribute access by name) +- [x] Define `PyRecord_Type` with all slots filled +- [x] Implement `PyRecord_New()` - the C API constructor + +**Notes from session:** +- Started by copying tupleobject.c, then stripped out complexity (clinic, free lists, etc.) +- **GC gotcha:** Originally had GC alloc/track/untrack but no `Py_TPFLAGS_HAVE_GC` or `tp_traverse` - caused segfault. Fixed by removing GC entirely (simpler for learning) +- **record_dealloc:** `Py_XDECREF(names)` + XDECREF each value, use XDECREF (handles NULL safely) +- **record_repr:** Use `_PyUnicodeWriter`, format as `record(x=1, y=2)`, don't repr the field names (just write them directly) +- **record_hash:** Copied xxHash pattern from tuple, hash names tuple + all values, `-1` → `-2` for actual -1 hash +- **record_getattro:** Loop through names with `PyUnicode_Compare`, return value with INCREF, fallback to `PyObject_GenericGetAttr` +- **record_richcompare:** Compare names first (if different, not equal), then values; return `Py_NotImplemented` for ordering +- **record_new (tp_new):** Parse kwargs with `PyDict_Keys`/`PyDict_Values`, convert keys to tuple for names ### 5.3 Add the Opcode -- [ ] Add `BUILD_RECORD` to `Lib/opcode.py` (pick unused number, needs argument) -- [ ] Run `make regen-opcode` and `make regen-opcode-targets` -- [ ] Implement `BUILD_RECORD` handler in `Python/ceval.c` +- [x] Add `BUILD_RECORD` to `Lib/opcode.py` (pick unused number, needs argument) +- [x] Run `make regen-opcode` and `make regen-opcode-targets` +- [x] Implement `BUILD_RECORD` handler in `Python/ceval.c` + +**Notes from session:** +- Used opcode 166 (was available) +- Stack layout: `[names_tuple, val0, val1, ...]` - pop values in reverse, then 
pop names +- Must add stack effect to compile.c: `case BUILD_RECORD: return -oparg;` +- Regenerate with `./python.exe ./Tools/scripts/generate_opcode_h.py` and `./python.exe ./Python/makeopcodetargets.py` ### 5.4 Build System Integration -- [ ] Add `recordobject.c` to the build (Makefile.pre.in or setup.py) -- [ ] Add header to appropriate include lists -- [ ] Register type in Python initialization +- [x] Add `recordobject.c` to the build (Makefile.pre.in or setup.py) +- [x] Add header to appropriate include lists +- [x] Register type in Python initialization + +**Notes from session:** +- Add `Objects/recordobject.o` to `OBJECT_OBJS` in Makefile.pre.in +- Add both headers to `PYTHON_HEADERS` section +- Register with `SETBUILTIN("record", &PyRecord_Type)` in `Python/bltinmodule.c` ### 5.5 Build and Test -- [ ] Run `make` - fix any compilation errors -- [ ] Test basic creation via C API -- [ ] Test via manual bytecode or compiler modification -- [ ] Verify all operations: indexing, len, hash, repr, equality, attribute access +- [x] Run `make` - fix any compilation errors +- [x] Test basic creation via C API +- [x] Test via manual bytecode or compiler modification +- [x] Verify all operations: indexing, len, hash, repr, equality, attribute access + +**Notes from session:** +- All tests pass: + ```python + r = record(x=1, y=2) + print(r) # record(x=1, y=2) + print(r.x) # 1 + print(r[0]) # 1 + print(len(r)) # 2 + print(hash(r)) # works + r2 = record(x=1, y=2) + print(r == r2) # True + ``` + +--- + +## Phase 6: Literal Syntax (Bonus) + +### 6.1 Grammar Extension +- [x] Add `record` rule to `Grammar/python.gram` +- [x] Choose syntax: `{| x=1, y=2 |}` +- [x] Add to `atom` rule with lookahead `&'{'` + +**Notes from session:** +- Reused `kwargs` from function call parsing for `x=1, y=2` syntax +- Had to use separate tokens `'{' '|'` not `'{|'` (multi-char tokens don't exist) +- Grammar rule: `| '{' '|' a=[kwargs] '|' '}' { _PyAST_Record(...) 
}` + +### 6.2 AST Node +- [x] Add `Record(expr* keys, expr* values)` to `Parser/Python.asdl` +- [x] Run `make regen-ast` to generate `_PyAST_Record` + +**Notes from session:** +- Modeled after `Dict(expr* keys, expr* values)` +- Regeneration creates `_PyAST_Record` function and `Record_kind` enum + +### 6.3 Parser Helpers +- [x] Create `_PyPegen_get_record_keys` in `Parser/pegen.c` +- [x] Create `_PyPegen_get_record_values` in `Parser/pegen.c` + +**Notes from session:** +- `kwargs` returns `KeywordOrStarred*` items, not `KeyValuePair*` like dict +- Had to unwrap each to get `keyword_ty`, then extract `.arg` (identifier) and `.value` (expression) +- For keys, create `Name` expression from the identifier + +### 6.4 Compiler Integration +- [x] Add `compiler_record` function in `Python/compile.c` +- [x] Add `case Record_kind:` to `compiler_visit_expr` +- [x] Add `case Record_kind:` to ALL switches that handle `Dict_kind`: + - `Python/ast.c` (2 switches!) + - `Python/ast_opt.c` + - `Python/ast_unparse.c` + - `Python/symtable.c` + - `Python/compile.c` (3 switches!) + +**Notes from session:** +- **Key lesson:** Every switch on expression kinds needs the new case - easy to miss some! 
+- `compiler_record`: Build names tuple as constant, VISIT each value, emit BUILD_RECORD +- Extracting keys: `e->v.Record.keys` contains `expr_ty` Name nodes, access with `key_expr->v.Name.id` + +### 6.5 Final Result +- [x] Literal syntax `{| x=1, y=2 |}` works +- [x] Emits BUILD_RECORD opcode +- [x] All features work with literal syntax + +**Notes from session:** +- Full pipeline working: Grammar → Parser → AST → Compiler → Bytecode → Execution +- Test: `r = {|x=1, y=2|}; print(r.x)` → `1` --- @@ -313,16 +407,26 @@ assert r == r2 --- -## Files We'll Create/Modify - -| File | Action | ~Lines | -|------|--------|--------| -| `Include/recordobject.h` | Create | 25 | -| `Objects/recordobject.c` | Create | 200 | -| `Lib/opcode.py` | Modify | 2 | -| `Python/ceval.c` | Modify | 30 | -| `Makefile.pre.in` | Modify | 5 | -| `Python/bltinmodule.c` | Modify | 10 | +## Files Created/Modified + +| File | Action | Notes | +|------|--------|-------| +| `Include/recordobject.h` | Create | Public API header | +| `Include/cpython/recordobject.h` | Create | Internal struct definition | +| `Include/Python.h` | Modify | Add include for recordobject.h | +| `Objects/recordobject.c` | Create | ~300 lines - type implementation | +| `Lib/opcode.py` | Modify | Add BUILD_RECORD = 166 | +| `Python/ceval.c` | Modify | BUILD_RECORD handler | +| `Python/compile.c` | Modify | compiler_record, stack effect, expr switches | +| `Makefile.pre.in` | Modify | Add recordobject.o and headers | +| `Python/bltinmodule.c` | Modify | SETBUILTIN for record | +| `Grammar/python.gram` | Modify | record rule, atom lookahead | +| `Parser/Python.asdl` | Modify | Record AST node | +| `Parser/pegen.c` | Modify | Helper functions for record keys/values | +| `Python/ast.c` | Modify | Record_kind in validation switches | +| `Python/ast_opt.c` | Modify | Record_kind case | +| `Python/ast_unparse.c` | Modify | append_ast_record | +| `Python/symtable.c` | Modify | Record_kind case | ---
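
**Sketch (not part of the patch):** The stack layout noted in 5.3 (`[names_tuple, val0, val1, ...]`) and the `-oparg` stack effect can be modelled in plain Python. This is a minimal simulation that assumes a Python list stands in for the value stack and a `(names, values)` pair stands in for the record object; `build_record` and `stack_effect` are hypothetical helper names for illustration, not CPython functions.

```python
def build_record(stack, oparg):
    """Model of the BUILD_RECORD handler.

    Expects the stack to end with: names_tuple, val0, ..., val(oparg-1).
    """
    values = [None] * oparg
    # Pop values in reverse so that values[i] lines up with names[i].
    for i in range(oparg - 1, -1, -1):
        values[i] = stack.pop()
    # The names tuple sits directly below the values.
    names = stack.pop()
    # Push a stand-in for the record object.
    stack.append((names, tuple(values)))


def stack_effect(oparg):
    # Pops oparg values plus one names tuple, pushes one record.
    return 1 - (oparg + 1)


stack = ["below", ("x", "y"), 1, 2]
depth_before = len(stack)
build_record(stack, 2)
depth_change = len(stack) - depth_before
```

The net change in depth equals `-oparg`, which matches the `case BUILD_RECORD: return -oparg;` entry added to `stack_effect` in `Python/compile.c`.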