Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
72 commits
Select commit Hold shift + click to select a range
6a99107
Parsing VM initial structure
gvanrossum May 26, 2020
8ffc315
Hook things up so we can test it
gvanrossum May 27, 2020
a9ba946
Add debugging printf()s in anger; fix bugs
gvanrossum May 27, 2020
8635a4c
Implement actions
gvanrossum May 27, 2020
1a34aeb
Support optional tokens
gvanrossum May 27, 2020
b4a2292
Support optional rules
gvanrossum May 27, 2020
b1f2a03
Add vmreadme.md; OP_SUCCESS has an argument
gvanrossum May 28, 2020
50aa868
Do optional items differently (with a postfix op)
gvanrossum May 28, 2020
320266b
Compute start/end line/col numbers; add some ideas to vmreadme.md
gvanrossum May 28, 2020
5bf3f9c
Tighten the code; add some speculation to vmreadme
gvanrossum May 28, 2020
816db75
Add OP_NOOP; add enums for rules & actions
gvanrossum May 28, 2020
d2a980f
Implement loops
gvanrossum May 29, 2020
2befb12
Add a few more rules to the grammar
gvanrossum May 29, 2020
175a127
Drop debug printf()s, more flexibility in parse_string()
gvanrossum May 29, 2020
f3f1665
Add memoization, some debug niceties
gvanrossum May 29, 2020
bda3517
Inline helper functions
gvanrossum May 29, 2020
881d756
Explain OP_OPTIONAL better
gvanrossum May 29, 2020
94cfb95
Skeleton of code generator
pablogsal May 29, 2020
4ba6c61
Simplify structure of OP_SUCCESS
gvanrossum May 29, 2020
db628e1
Move opcodes around
gvanrossum May 29, 2020
7798629
Add a 'grammar' for operations
gvanrossum May 29, 2020
412c741
Move generated part of vm.h into vmparse.h
gvanrossum May 29, 2020
020fbd1
Merge branch 'pegenvm_generator' into pegenvm
lysnikolaou May 29, 2020
8c7cffc
Merge branch 'master' into pegenvm
lysnikolaou May 29, 2020
ef1fabd
Clean skeleton of vm_generator
pablogsal May 30, 2020
b497b23
Merge branch 'pegenvm' of github.com:we-like-parsers/cpython into peg…
lysnikolaou May 30, 2020
8eab7e0
Better formatting of generated file; remove unneeded indentation
lysnikolaou May 30, 2020
4a5f823
Add OP_LOOP_COLLECT_NONEMPTY -- used for a+
gvanrossum May 30, 2020
3cee73c
Expand description of root rules
gvanrossum May 30, 2020
9b96df0
Initial support for repeat_0
pablogsal May 30, 2020
2f44ee9
Fix name rules for repeat0 nodes
pablogsal May 30, 2020
6dc7092
Eliminate OP_LOOP_START
gvanrossum May 30, 2020
9e7e12e
Do fewer reallocs (at the cost of an extra int per frame)
gvanrossum May 30, 2020
52f5a75
Speculate how to implement a.b+
gvanrossum May 30, 2020
3b93237
Make memo rule types distinct from token types
gvanrossum May 30, 2020
33522ae
Fix small issues in vmreadme.pm
gvanrossum May 30, 2020
cbd45a5
Add generation of root rules (very coarssely)
gvanrossum May 31, 2020
1a6531d
Add enum for rule types (R_)
gvanrossum May 31, 2020
0226dd5
Generate actions (primitively)
gvanrossum May 31, 2020
c64ff2f
Implement code generation for keywords
lysnikolaou May 31, 2020
e2c4a36
Refactor add_opcode to optionally accept a second argument
lysnikolaou May 31, 2020
21d8b83
Translate item names in actions; use the generated vmparse.h!
gvanrossum May 31, 2020
6fe4f0e
Fix mypy (in vm_generator)
gvanrossum May 31, 2020
6205002
Avoid name conflict for 'f'
gvanrossum May 31, 2020
f72c7b6
Generate code for repeat1 loops
gvanrossum Jun 1, 2020
fc3d4c4
Implement delimited loops (b.a+)
gvanrossum Jun 1, 2020
6c50468
Generate code for delimited loop
gvanrossum Jun 1, 2020
a10babb
Implement soft keywords (hand-written and code generation) (#129)
lysnikolaou Jun 1, 2020
b13169b
Update generated vmparse.h
gvanrossum Jun 1, 2020
c2a7cf6
Fix code generation for if_stmt
gvanrossum Jun 1, 2020
a862a69
Implement lookahead ops
gvanrossum Jun 1, 2020
ab863df
Generate code for lookaheads (only one token supported!)
gvanrossum Jun 1, 2020
1be4b62
Implement left-recursion (with hand-coded vmparse.h)
gvanrossum Jun 2, 2020
b53c2a4
Code generation for left-recursive rules
gvanrossum Jun 2, 2020
7a59aa0
Allow specifying different grammars
gvanrossum Jun 2, 2020
cac4149
Generate code for 'cut'
gvanrossum Jun 2, 2020
180dfff
Support groups and optional in code generator
gvanrossum Jun 2, 2020
e01e643
There's no need to special-case -> in actions
gvanrossum Jun 2, 2020
13c3bbb
Treat TYPE_COMMENT as a token (since it is)
gvanrossum Jun 3, 2020
65304e6
Generate code for Grammar/parser.gram
gvanrossum Jun 3, 2020
10f7be1
Group every opcode with its argument (#131)
pablogsal Jun 3, 2020
a9a4115
Add vm target to pegen script to generate the vm parser (#130)
lysnikolaou Jun 3, 2020
5bb2f57
Selective memoization
gvanrossum Jun 3, 2020
ba8783b
Don't call is_memoized in OP_RETURN_LEFT_REC
gvanrossum Jun 3, 2020
bccd5c8
Different way of doing left-recursion
gvanrossum Jun 5, 2020
2a138cc
Merge remote-tracking branch 'upstream/master' into pegenvm
gvanrossum Aug 16, 2020
2394b96
Remove leftover conflict markers
gvanrossum Aug 16, 2020
be34499
Fix deps for vm.o
gvanrossum Aug 16, 2020
b9394e4
Fix includes for vm.c
gvanrossum Aug 16, 2020
b64113b
Regenerated vmparse.h
gvanrossum Aug 16, 2020
9afa67e
Merge remote-tracking branch 'pegen/pegenvm'
pablogsal Aug 1, 2022
a65604e
bpo-40222: Mark exception table function in the dis module as private
pablogsal Aug 13, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add vmreadme.md; OP_SUCCESS has an argument
  • Loading branch information
gvanrossum committed May 28, 2020
commit b1f2a03d157418935ea174fc1eca942fa03f3706
4 changes: 2 additions & 2 deletions Parser/pegen/vm.h
Original file line number Diff line number Diff line change
Expand Up @@ -3,29 +3,29 @@ typedef enum _opcodes {
OP_NUMBER,
OP_STRING,
OP_CUT,
OP_SUCCESS,
OP_FAILURE,
// The rest have an argument
OP_TOKEN,
OP_TOKEN_OPTIONAL,
OP_RULE,
OP_RULE_OPTIONAL,
OP_RETURN,
OP_SUCCESS,
} Opcode;

char *opcode_names[] = {
"OP_NAME",
"OP_NUMBER",
"OP_STRING",
"OP_CUT",
"OP_SUCCESS",
"OP_FAILURE",
// The rest have an argument
"OP_TOKEN",
"OP_TOKEN_OPTIONAL",
"OP_RULE",
"OP_RULE_OPTIONAL",
"OP_RETURN",
"OP_SUCCESS",
};

typedef struct _rule {
Expand Down
116 changes: 116 additions & 0 deletions Parser/pegen/vmreadme.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
Pegen Virtual Machine
=====================

The Pegen VM is an alternative for the recursive-descent Pegen parser.
The idea is that the grammar (including actions) is identical, but
execution does not use the C stack frame. The hope is that this might
be faster, but this is far from given. If it is clear that it will
*not* be faster, we should stop working on this project.

The runtime uses the same `Parser` structure as the recursive-descent
Pegen parser, and the same helper functions
(e.g., `_PyPegen_singleton_seq`).

The VM uses a stack to hold state during parsing. The grammar is
represented by a few read-only tables. This design may be familiar
from that of CPython's "ceval" VM. The actions are represented by a
function containing a giant switch with one case for each action.

The grammar tables and the action function are meant to be generated
by a parser generator similar to the current one. Because of the
actions, the generator needs to generate C code.

The primary VM state is a stack of `Frame` structures. Each frame
represents a particular attempt to parse a rule at a given point in
the input. The only state separate from the stack is a pointer to the
`Parser` structure.

The main state in a frame is as follows:

- `Rule *rule` -- points to the rule being parsed
- `int mark` -- the input position at the start of the rule invocation
- `int ialt` -- indicates which alternative is currently being tried
- `int iop` -- indicates where we are in the current alternative
- `int cut` -- whether a "cut" was executed in the current alternative

(We'll need state related to line numbers and column offsets as well.)

Note that `rule` and `mark` don't change after the frame is initialized.

In addition each frame has a "value stack" where successfully
recognized tokens and rules are stored. This uses:

- `int ival` -- number of values stored so far
- `void *vals[]` -- values stored (the type is `Token *` or an AST node type)

A `Rule` structure has the following fields:

- `char *name` -- rule name, for debugging (e.g., `"start"`)
- `int alts[]` -- index into `opcodes` array for each alternative,
terminated by `-1`
- `int opcodes[]` -- array of opcodes and their arguments

All rules are collected in a single array; the index in this array
is used by operations that reference other rules.

The `opcodes` array is a sequence of operation codes and arguments.
Some opcodes (e.g., `OP_TOKEN`) are followed by an argument; others
(e.g., `OP_NAME`) are not. Both are representable as integers.

Opcodes
-------

Most operations can succeed or fail, and produce a vaue if they
succeed.

If an operation succeeds, the value is appended to the frame's values
stack, and the VM proceeds to the next opcode.

If an operation fails, the VM resets the input to the frame's mark,
and then proceeds to the next alternative of the frame's rule, if
there is one, and the frame's `cut` flag is not set. If the frame's
`cut` flag is set, or if its rule has no more alternatives, the frame
is popped off the frame stack and the VM proceeds with failure there.

Some operations manipulate other frame fields.

Calls into the support runtime can also produce *errors* -- when an
error is detected, the VM exits, immediately returning `NULL`.

The following opcodes take no argument.

- `OP_NAME` -- call `_PyPegen_name_token()`; fail if it returns
`NULL`, otherwise succeeds with the return value.

- `OP_NUMBER` -- call `_PyPegen_number_token()`; same as `OP_NAME`.

- `OP_STRING` -- call `_PyPegen_string_token()`; same as `OP_NAME`.

- `OP_CUT` -- set the frame's `cut` flag; always succeeds, without a
value.

- `OP_FAILURE` -- report a syntax error and exit the VM.

These opcodes are followed by a single integer argument.

- `OP_TOKEN` -- call `_PyPegen_expect_token()` with the argument;
processing is the same as for `OP_NAME`.

- `OP_TOKEN_OPTIONAL` -- like `OP_TOKEN` but instead of failing,
succeed with a `NULL` value.

- `OP_RULE` -- push a new frame onto the stack, initializing it with
the given rule, the current input position (mark), at the first
alternative and opcode. Then proceed to the first operation of the
new frame.

- `OP_RULE_OPTIONAL` -- this is to `OP_RULE` as `OP_TOKEN_OPTIONAL` is
to `OP_TOKEN`.

- `OP_RETURN` -- call the action given by the argument, then pop the
frame off the stack. Execution then proceeds (in the frame newly
revealed by that pop operation) as if the previous operation
succeeded or failed with the return value of the action.

- `OP_SUCCESS` -- call the action given by the argument, and exit the
VM with its return value as result.