CHANGELOG.md
3 lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ language runtime. The main focus is on user-observable behavior of the engine.

 ## Version 22.3.0
 * Rename GraalPython to GraalPy. This change also updates the launchers we ship to include symlinks from `python` and `python3` to `graalpy`, for better integration with other tools.
+* New interpreter backend based on interpreting bytecode. This change should bring better startup performance and memory footprint while retaining good JIT-compiled performance. There is no support for GraalVM instrumentation tools on the bytecode backend yet, so using one of the instrumentation options (e.g. `--inspect`) falls back on the AST backend.
+* New parser generated from CPython's new PEG grammar definition. It brings better compatibility and enables us to implement the `ast` module.
+* Added support for tracing API (`sys.settrace`) which makes `pdb` and related tools work on GraalPy.
 * Updated our pip support to automatically choose the best version for known packages. You can use `pip install pandas`, and pip will select the versions of pandas and numpy that we test in the GraalPy CI.
docs/contributor/MISSING.md
8 lines changed: 1 addition & 7 deletions
@@ -36,7 +36,6 @@ This is just a snapshot as of 2021-07-29.

 #### These we should re-implement
 * **_codecs_cn, _codecs_hk, _codecs_iso2022, _codecs_jp, _codecs_kr, _codecs_tw, _multibytecodec**: We can just use our own codecs
-* **_ctypes, _ctypes_test**: Work in progress
 * **_string**: Empty right now, but its only two methods that we can re-implement
 * **_tracemalloc**: Memory allocation tracing, we should substitute with the Truffle instrument.
 * **_uuid**: Can be implemented ourselves, is just 1 function
@@ -47,25 +46,20 @@ This is just a snapshot as of 2021-07-29.
 * **parser**: We need to implement this for our parser

 ### Incompleteness on our part:
-* **_ast**: Used in various places, including the help system. Would be nice to support, ours is an empty shell
-* **_contextvars**: Very incomplete
-* **_multiprocessing**: Work in progress
+* **_contextvars**: Work in progress
 * **_signal**: Work in progress
 * **mmap**: We use this as a mixture from the C module, Python, and Java code. Needs major optimizations.
 * **resource**: This is about resources, there should be Truffle APIs for this (there are issues open)
 * **unicodedata**: A bit incomplete, but not difficult. Maybe should use a Java ICU library

 ### Basically complete or easy to make so
-* **_collections**: We've mostly implemented this in Python (a lot is taken from PyPy), but should intrinsify the module in Java for better performance
 * **_md5**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **_random**
 * **_sha1**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **_sha256**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **_sha512**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **binascii**: Just missing a few methods
-* **codecs**
 * **functools**: Missing a few functions, we mostly implemented it in Python, but should intrinsify the module in Java for better performance
 * **itertools**: We mostly just implement all this in Python (a lot is taken from PyPy), but should intrinsify the module in Java for better performance
 * **locale**: Partially Truffle APIs, should probably use more to play nice for embedders
 * **readline**: We re-implemented this in terms of JLine used in our launcher
-* **zipimport**: We have reimplemented this, but Python 3.8 is moving to a pure-Python impl that we can use
 This guide elaborates on how Python files are parsed on the GraalVM Python runtime.

-## Parser Performance
-
-#### Loading code from serialized `.pyc` files is faster than parsing the `.py` file using ANTLR.
-
-Creating the abstract syntax tree (AST) for a Python source has two phases.
-The first one creates a simple syntax tree (SST) and a scope tree.
-The second phase transforms the SST to the [Truffle Language Implementation framework](https://github.com/oracle/graal/blob/master/truffle/docs/README.md) tree.
-
-For the transformation, the scope tree it needed.
-The scope tree contains scope locations for variable and function definitions, and information about scopes.
-The simple syntax tree contains nodes mirroring the source.
-Comparing the SST and the Language Implementation framework tree, the SST is much smaller.
-It contains just the nodes representing the source in a simple way.
-One SST node is usually translated to many the Language Implementation framework nodes.
-
-The simple syntax tree can be created in two ways: with ANTLR parsing, or deserialization from an appropriate `*.pyc` file.
-If there is no appropriate `.pyc` file for a source, then the source is parsed with ANTLR.
-If the Python standard import logic finds an appropriate `.pyc` file, it will just trigger deserialization of the SST and scope tree from it.
-
-The deserialization is much faster than source parsing with ANTLR and needs only roughly 30% of the time that ANTLR needs.
-Of course, the first import of a new file is a little bit slower -- besides parsing with ANTLR, the Python standard library import logic serializes the resulting code object to a `.pyc` file, which in our case means
-the SST and scope tree are serialized such a file.
-
-
 ## Creating and Managing pyc Files

 #### `.pyc` files are created automatically by the GraalVM Python runtime when no or an invalid `.pyc` file is found matching the desired `.py` file.
@@ -48,7 +24,7 @@ The hashcode is generated only based on the Python source by calling `source.has
 The `.pyc` files are also regenerated if a magic number in the Python parser is changed.
 The magic number is hard-coded in the Python source and can not be changed by the user (unless of course that user has access to the bytecode of Python).

-The developers of GraalVM's Python runtime change the magic number when the format of SST or scope tree binary data is altered.
+The developers of GraalVM's Python runtime change the magic number when the bytecode format changes.
 This is an implementation detail, so the magic number does not have to correspond to the version of GraalVM's Python runtime (just like in CPython).
 The magic number of pyc is a function of the concrete Python runtime Java code that is running.

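Review note: the magic-number behavior described in this hunk can be observed from guest code. The following is a minimal sketch, assuming the runtime follows the CPython convention that a `.pyc` file begins with the bytes of `importlib.util.MAGIC_NUMBER`; the helper name `pyc_magic_matches` and the file name `example.py` are hypothetical and not part of this change.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_magic_matches(source_text: str) -> bool:
    """Compile source_text to a .pyc file and check that the file header
    starts with the magic number of the running Python runtime.

    Assumption: CPython-compatible .pyc layout (magic number first).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")  # hypothetical file name
        with open(src, "w", encoding="utf-8") as f:
            f.write(source_text)

        # py_compile.compile returns the path of the written .pyc file.
        pyc = py_compile.compile(
            src, cfile=os.path.join(tmp, "example.pyc"), doraise=True
        )

        with open(pyc, "rb") as f:
            header = f.read(len(importlib.util.MAGIC_NUMBER))

    return header == importlib.util.MAGIC_NUMBER
```

A `.pyc` file whose header does not match the running runtime's magic number is treated as invalid and regenerated, which is why a format change only needs a new magic number.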
@@ -76,29 +52,7 @@ top_folder
 By default the `__pycache__` directory is created on the same directory level as a source code file and in this directory all `.pyc` files from the same directory are stored.
 This folder may store `.pyc` files created with different versions of Python (including, e.g., CPython), so the user may see files ending in `*.cpython3-6.pyc` for example.

-The current implementation also includes a copy of the original source text in the `.pyc` file.
-This is a minor performance optimization so you can create a `Source` object with the path to the original source file, but you do not need to read the original `*.py` file, which speeds up the process obtaining the Language Implementation framework tree (just one file is read).
-The structure of a `.graalpy.pyc` file is this:
-```python
-MAGIC_NUMBER
-source text
-binary data - scope tree
-binary data - simple syntax tree
-```
-
-Note that the `.pyc` files are not an effective means to hide Python library source code from guest code, since the original source can still be recovered.
-Even if the source were omitted, the syntax tree contains enough information to decompile into source code easily.
-
-The serialized SST and scope tree are stored in a Python `code` object as well, as the content of the attribute `co_code` (which contains bytecode on CPython). For example:
-#### `.pyc` files are largely managed automatically by the runtime in a manner compatible to CPython. Like on CPython there are options to specify their location, and if they should be written at all, and both of these options can be changed by guest code.
+`.pyc` files are largely managed automatically by the runtime in a manner compatible to CPython. Like on CPython there are options to specify their location, and if they should be written at all, and both of these options can be changed by guest code.

 The creation of `*.pyc` files can be controlled in the same ways as on CPython
@@ -134,10 +88,6 @@ files must be removed by the embedder as required.

 ## Security Considerations

-The serialization of SST and scope tree is hand-written and during deserialization, it is not possible to load classes other than SST Nodes.
-Java serialization or other frameworks are not used to serialize Java objects.
-The main reason is performance, but this has the effect that no class loading can be forced by a maliciously crafted `.pyc` file.
-
 All file operations (obtaining the data, timestamps, and writing `pyc` files)
 are done through the [FileSystem API](https://www.graalvm.org/sdk/javadoc/org/graalvm/polyglot/io/FileSystem.html). Embedders can modify all of these operations by means of custom (e.g., read-only) `FileSystem` implementations.
 The embedder can also effectively disable the creation of `.pyc` files by disabling I/O permissions for GraalVM's Python runtime.
 The standard python debugger `pdb` is supported on GraalVM. Refer to the offical [PDB documentation](https://docs.python.org/3/library/pdb.html) for usage.

-The standard Python built-in `breakpoint()` will work using the [GraalVM's Chrome Inspector](https://github.com/oracle/graal/blob/master/docs/tools/chrome-debugger.md) implementation.
-You can inspect variables, set watch expressions, interactively evaluate code snippets, etc.
-However, this only works if you pass `--inspect` or some other inspect option. Otherwise, `pdb` is triggered as on CPython (and does not currently work).
+### Chrome Inspector
+To enable [GraalVM's Chrome Inspector](https://github.com/oracle/graal/blob/master/docs/tools/chrome-debugger.md) debugger, pass the `--inspect` option to the `graalpy` launcher.
+The built-in `breakpoint()` function will work using the Chrome Inspector implementation when `--inspect` is passed.
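
Review note: the changelog entry in this change credits the tracing API (`sys.settrace`) for making `pdb` work. The following is a minimal sketch of that API using only the standard `sys` module; the function names `trace_calls`, `helper`, and `work` are hypothetical and exist only for illustration. It records which Python functions are entered while a callable runs, which is the same hook a debugger such as `pdb` builds on.

```python
import sys


def trace_calls(func, *args):
    """Run func(*args) and return the names of the Python functions
    that were entered while it was running."""
    called = []

    def tracer(frame, event, arg):
        # The global trace function receives a "call" event for every
        # new Python frame; the function name is on the frame's code object.
        if event == "call":
            called.append(frame.f_code.co_name)
        return None  # no local (per-line) tracing for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        # Always restore the previous trace function (usually None).
        sys.settrace(previous)
    return called


def helper(x):
    return x + 1


def work(x):
    return helper(x) * 2
```

Calling `trace_calls(work, 1)` on CPython yields `['work', 'helper']`; with the tracing support added in this change, the same result would be expected on GraalPy.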