
Commit d3dd112

Merge branch 'master'
2 parents 7ff8dd4 + f85e33b commit d3dd112


110 files changed, +2976 -631 lines changed

Some content is hidden: large commits have some content hidden by default.


CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ language runtime. The main focus is on user-observable behavior of the engine.
 
 ## Version 22.3.0
 * Rename GraalPython to GraalPy. This change also updates the launchers we ship to include symlinks from `python` and `python3` to `graalpy`, for better integration with other tools.
+* New interpreter backend based on interpreting bytecode. This change should bring better startup performance and memory footprint while retaining good JIT-compiled performance. There is no support for GraalVM instrumentation tools on the bytecode backend yet, so using one of the instrumentation options (e.g. `--inspect`) falls back on the AST backend.
+* New parser generated from CPython's new PEG grammar definition. It brings better compatibility and enables us to implement the `ast` module.
+* Added support for tracing API (`sys.settrace`) which makes `pdb` and related tools work on GraalPy.
 * Updated our pip support to automatically choose the best version for known packages. You can use `pip install pandas`, and pip will select the versions of pandas and numpy that we test in the GraalPy CI.
 
 ## Version 22.2.0
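The changelog entry about `sys.settrace` refers to CPython's standard tracing hook, which `pdb` is built on. The sketch below shows what a trace function receives. It assumes GraalPy reports the same `call`, `line`, and `return` events that CPython documents. The names `traced_add` and `tracer` are illustrative only and are not part of GraalPy.

```python
import sys

def traced_add(x, y):
    total = x + y
    return total

events = []

def tracer(frame, event, arg):
    # Global trace function: called with event "call" whenever a new frame starts.
    if frame.f_code.co_name != "traced_add":
        return None  # returning None means "do not trace this frame"
    events.append(event)
    # Returning a function installs it as the frame's local trace function,
    # so it also receives that frame's "line" and "return" events.
    return tracer

sys.settrace(tracer)
try:
    result = traced_add(2, 3)
finally:
    sys.settrace(None)  # always uninstall the tracer again

print(result, events)
```

Debuggers such as `pdb` use this same mechanism. They decide at each `line` event whether to stop, which is why tracing support is what makes them work.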

README.md

Lines changed: 7 additions & 0 deletions
@@ -29,6 +29,13 @@ and interop with other languages, you can use the bundled releases from
 git clone https://github.com/graalvm/mx.git
 export PATH=$PWD/mx:$PATH
 ```
+* LabsJDK
+
+The following command will download and install JDKs to build GraalVM upon. If successful, it will print the path to set into your JAVA_HOME.
+```shell
+mx fetch-jdk
+```
+
 
 #### Building
 

bisect-benchmark.ini

Lines changed: 3 additions & 1 deletion
@@ -2,7 +2,9 @@
 # This is the configuration file for bisecting benchmark jobs in the CI.
 # Usage:
 # - Create a temporary branch based on the main branch (or the bad commit)
-# - Fill in this configuration file, commit the changes and push it
+# - Fill in this configuration file, preferably using the automated script
+# graalpython-apptests/scripts/create-bisect-config
+# - Commit and push the file
 # - The push command output should give you a link to create a PR. Open it, but
 # don't create a PR. Instead, you should execute the job on your commit using
 # "Actions->Schedule CI jobs" in the commit list. You may need to wait a bit

ci.jsonnet

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-{ "overlay": "17edbb52412c55f5a3903eededc75bf33371bd49" }
+{ "overlay": "55b5a0614a8d46864bb0197b92d2e20033e58ed7" }

docs/contributor/IMPLEMENTATION_DETAILS.md

Lines changed: 0 additions & 65 deletions
@@ -1,70 +1,5 @@
 # Implementation Details
 
-## Abstract Operations on Python Objects
-
-Many generic operations on Python objects in CPython are defined in the header
-files `object.h` and `abstract.h`. These operations are widely used and their
-interplay and intricacies are the cause for the conversion, error message, and
-control flow bugs when not mimicked correctly. Our current approach is to
-provide many of these abstract operations as part of the `PythonObjectLibrary`.
-
-### Common operations in the PythonObjectLibrary
-
-The code has evolved over time, so not all built-in nodes are prime examples of
-messages that should be used from the PythonObjectLibrary. We are refactoring
-this as we go, but here are a few examples for things you can (or should soon be
-able to) use the PythonObjectLibrary for:
-
-- casting and coercion to `java.lang.String`, array-sized Java `int`, Python
-  index, fileno, `double`, filesystem path, iterator, and more
-- reading the class of an object
-- accessing the `__dict__` attribute of an object
-- hashing objects and testing for equality
-- testing for truthy-ness
-- getting the length
-- testing for abstract types such as `mapping`, `sequence`, `callable`
-- invoking methods or executing callables
-- access objects through the buffer protocol
-
-### PythonObjectLibrary functions with and without state
-
-Usually, there are at least two messages for each operation - one that takes a
-`ThreadState` argument, and one that doesn't. The intent is to allow passing of
-exception state and caller information similar to how we do it with the `PFrame`
-argument even across library messages, which cannot take a VirtualFrame.
-
-All nodes that are used in message implementations must allow uncached
-usage. Often (e.g. in the case of the generic `CallNode`) they offer execute
-methods with and without frames. If a `ThreadState` was passed to the message, a
-frame to pass to the node can be reconstructed using
-`PArguments.frameForCall(threadState)`. Here's an example:
-
-```java
-@ExportMessage
-long messageWithState(ThreadState state,
-        @Cached CallNode callNode) {
-    Object callable = ...
-
-    if (state != null) {
-        return callNode.execute(PArguments.frameForCall(state), callable, arguments);
-    } else {
-        return callNode.execute(callable, arguments);
-    }
-}
-```
-
-*Note*: It is **always** preferable to call an `execute` method with a
-`VirtualFrame` when both one with and without exist! The reason is that this
-avoids materialization of the frame state in more cases, as described on the
-section on Python's global thread state above.
-
-### Other libraries in the codebase
-
-Accessing hashing storages (the storage for `dict`, `set`, and `frozenset`)
-should be done via the `HashingStorageLibrary`. We are in the process of
-creating a `SequenceStorageLibrary` for sequence types (`tuple`, `list`) to
-replace the `SequenceStorageNodes` collection of classes.
-
 ## Python Global Thread State
 
 In CPython, each stack frame is allocated on the heap, and there's a global

docs/contributor/MISSING.md

Lines changed: 1 addition & 7 deletions
@@ -36,7 +36,6 @@ This is just a snapshot as of 2021-07-29.
 
 #### These we should re-implement
 * **_codecs_cn, _codecs_hk, _codecs_iso2022, _codecs_jp, _codecs_kr, _codecs_tw, _multibytecodec**: We can just use our own codecs
-* **_ctypes, _ctypes_test**: Work in progress
 * **_string**: Empty right now, but it's only two methods that we can re-implement
 * **_tracemalloc**: Memory allocation tracing, we should substitute with the Truffle instrument.
 * **_uuid**: Can be implemented ourselves, is just 1 function
@@ -47,25 +46,20 @@ This is just a snapshot as of 2021-07-29.
 * **parser**: We need to implement this for our parser
 
 ### Incompleteness on our part:
-* **_ast**: Used in various places, including the help system. Would be nice to support, ours is an empty shell
-* **_contextvars**: Very incomplete
-* **_multiprocessing**: Work in progress
+* **_contextvars**: Work in progress
 * **_signal**: Work in progress
 * **mmap**: We use this as a mixture from the C module, Python, and Java code. Needs major optimizations.
 * **resource**: This is about resources, there should be Truffle APIs for this (there are issues open)
 * **unicodedata**: A bit incomplete, but not difficult. Maybe should use a Java ICU library
 
 ### Basically complete or easy to make so
-* **_collections**: We've mostly implemented this in Python (a lot is taken from PyPy), but should intrinsify the module in Java for better performance
 * **_md5**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **_random**
 * **_sha1**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **_sha256**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **_sha512**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **binascii**: Just missing a few methods
-* **codecs**
 * **functools**: Missing a few functions, we mostly implemented it in Python, but should intrinsify the module in Java for better performance
 * **itertools**: We mostly just implement all this in Python (a lot is taken from PyPy), but should intrinsify the module in Java for better performance
 * **locale**: Partially Truffle APIs, should probably use more to play nice for embedders
 * **readline**: We re-implemented this in terms of JLine used in our launcher
-* **zipimport**: We have reimplemented this, but Python 3.8 is moving to a pure-Python impl that we can use
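The module lists above are a dated snapshot. A quick way to check the interpreter you actually run is to ask the import system whether each module can be located. The sketch below uses the standard `importlib.util.find_spec`. The selection of names in `MODULES` is illustrative, and `probe` is a hypothetical helper. Finding a module only shows that it exists, not how complete it is.

```python
import importlib.util

# Illustrative selection of module names taken from the MISSING.md lists above.
MODULES = [
    "_ctypes", "_contextvars", "_multiprocessing", "_signal", "mmap",
    "resource", "unicodedata", "_collections", "codecs", "zipimport",
]

def probe(names):
    """Map each name to True if the import system can locate that module."""
    # find_spec() locates a top-level module without importing it; None means "not found".
    return {name: importlib.util.find_spec(name) is not None for name in names}

availability = probe(MODULES)
for name, found in availability.items():
    print(f"{name:20} {'found' if found else 'missing'}")
```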

docs/user/ParserDetails.md

Lines changed: 2 additions & 52 deletions
@@ -8,30 +8,6 @@ permalink: /reference-manual/python/ParserDetails/
 
 This guide elaborates on how Python files are parsed on the GraalVM Python runtime.
 
-## Parser Performance
-
-#### Loading code from serialized `.pyc` files is faster than parsing the `.py` file using ANTLR.
-
-Creating the abstract syntax tree (AST) for a Python source has two phases.
-The first one creates a simple syntax tree (SST) and a scope tree.
-The second phase transforms the SST to the [Truffle Language Implementation framework](https://github.com/oracle/graal/blob/master/truffle/docs/README.md) tree.
-
-For the transformation, the scope tree it needed.
-The scope tree contains scope locations for variable and function definitions, and information about scopes.
-The simple syntax tree contains nodes mirroring the source.
-Comparing the SST and the Language Implementation framework tree, the SST is much smaller.
-It contains just the nodes representing the source in a simple way.
-One SST node is usually translated to many the Language Implementation framework nodes.
-
-The simple syntax tree can be created in two ways: with ANTLR parsing, or deserialization from an appropriate `*.pyc` file.
-If there is no appropriate `.pyc` file for a source, then the source is parsed with ANTLR.
-If the Python standard import logic finds an appropriate `.pyc` file, it will just trigger deserialization of the SST and scope tree from it.
-
-The deserialization is much faster than source parsing with ANTLR and needs only roughly 30% of the time that ANTLR needs.
-Of course, the first import of a new file is a little bit slower -- besides parsing with ANTLR, the Python standard library import logic serializes the resulting code object to a `.pyc` file, which in our case means
-the SST and scope tree are serialized such a file.
-
-
 ## Creating and Managing pyc Files
 
 #### `.pyc` files are created automatically by the GraalVM Python runtime when no or an invalid `.pyc` file is found matching the desired `.py` file.
@@ -48,7 +24,7 @@ The hashcode is generated only based on the Python source by calling `source.has
 The `.pyc` files are also regenerated if a magic number in the Python parser is changed.
 The magic number is hard-coded in the Python source and cannot be changed by the user (unless of course that user has access to the bytecode of Python).
 
-The developers of GraalVM's Python runtime change the magic number when the format of SST or scope tree binary data is altered.
+The developers of GraalVM's Python runtime change the magic number when the bytecode format changes.
 This is an implementation detail, so the magic number does not have to correspond to the version of GraalVM's Python runtime (just like in CPython).
 The magic number of pyc is a function of the concrete Python runtime Java code that is running.
 
@@ -76,29 +52,7 @@ top_folder
 By default the `__pycache__` directory is created on the same directory level as a source code file and in this directory all `.pyc` files from the same directory are stored.
 This folder may store `.pyc` files created with different versions of Python (including, e.g., CPython), so the user may see files ending in `*.cpython-36.pyc` for example.
 
-The current implementation also includes a copy of the original source text in the `.pyc` file.
-This is a minor performance optimization so you can create a `Source` object with the path to the original source file, but you do not need to read the original `*.py` file, which speeds up the process obtaining the Language Implementation framework tree (just one file is read).
-The structure of a `.graalpy.pyc` file is this:
-```python
-MAGIC_NUMBER
-source text
-binary data - scope tree
-binary data - simple syntax tree
-```
-
-Note that the `.pyc` files are not an effective means to hide Python library source code from guest code, since the original source can still be recovered.
-Even if the source were omitted, the syntax tree contains enough information to decompile into source code easily.
-
-The serialized SST and scope tree are stored in a Python `code` object as well, as the content of the attribute `co_code` (which contains bytecode on CPython). For example:
-```python
->>> def add(x, y):
-...     return x+y
-...
->>> add.__code__.co_code
-b'\x01\x00\x00\x02[]K\xbf\xd1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 ...'
-```
-
-#### `.pyc` files are largely managed automatically by the runtime in a manner compatible to CPython. Like on CPython there are options to specify their location, and if they should be written at all, and both of these options can be changed by guest code.
+`.pyc` files are largely managed automatically by the runtime in a manner compatible with CPython. Like on CPython there are options to specify their location, and if they should be written at all, and both of these options can be changed by guest code.
 
 The creation of `*.pyc` files can be controlled in the same ways as on CPython
 (cf. https://docs.python.org/3/using/cmdline.html):
@@ -134,10 +88,6 @@ files must be removed by the embedder as required.
 
 ## Security Considerations
 
-The serialization of SST and scope tree is hand-written and during deserialization, it is not possible to load classes other than SST Nodes.
-Java serialization or other frameworks are not used to serialize Java objects.
-The main reason is performance, but this has the effect that no class loading can be forced by a maliciously crafted `.pyc` file.
-
 All file operations (obtaining the data, timestamps, and writing `pyc` files)
 are done through the [FileSystem API](https://www.graalvm.org/sdk/javadoc/org/graalvm/polyglot/io/FileSystem.html). Embedders can modify all of these operations by means of custom (e.g., read-only) `FileSystem` implementations.
 The embedder can also effectively disable the creation of `.pyc` files by disabling I/O permissions for GraalVM's Python runtime.
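The updated ParserDetails guide covers the cache location, the switch that turns `.pyc` writing off, and the magic-number check. All three have standard counterparts in CPython's `sys` and `importlib.util` modules. The sketch below assumes GraalPy exposes them the same way. The actual cache tag and magic number values are implementation-specific. The paths `top_folder/module.py` and `example_module.py` are placeholders, and the helpers `pyc_settings` and `pyc_magic` are hypothetical.

```python
import importlib.util
import os
import py_compile
import sys
import tempfile

def pyc_settings(source_path):
    """Collect the standard settings that decide whether and where .pyc files go."""
    return {
        # True under -B or PYTHONDONTWRITEBYTECODE; guest code may also reassign it.
        "dont_write_bytecode": sys.dont_write_bytecode,
        # Set by -X pycache_prefix=PATH or PYTHONPYCACHEPREFIX (Python 3.8+), else None.
        "pycache_prefix": sys.pycache_prefix,
        # Implementation-specific tag that appears in cached file names.
        "cache_tag": sys.implementation.cache_tag,
        # Path the standard import logic would use for this source file.
        "cache_path": importlib.util.cache_from_source(source_path),
    }

def pyc_magic(source_text):
    """Byte-compile source_text into a temporary .pyc file and return its first 4 bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example_module.py")
        with open(src, "w", encoding="utf-8") as f:
            f.write(source_text)
        # An explicit cfile keeps the output in tmp regardless of pycache_prefix.
        pyc = py_compile.compile(src, cfile=os.path.join(tmp, "example_module.pyc"), doraise=True)
        with open(pyc, "rb") as f:
            return f.read(4)  # the .pyc header starts with the magic number

settings = pyc_settings(os.path.join("top_folder", "module.py"))
magic = pyc_magic("def add(x, y):\n    return x + y\n")
print(settings)
print(magic == importlib.util.MAGIC_NUMBER)
```

If the first four bytes of a cached file differ from `importlib.util.MAGIC_NUMBER`, the standard import logic treats the file as stale. It then recompiles from source and, unless bytecode writing is disabled, rewrites the cached file.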

docs/user/Tooling.md

Lines changed: 8 additions & 15 deletions
@@ -5,25 +5,18 @@ link_title: Tooling Support for Python
 permalink: /reference-manual/python/Tooling/
 ---
 # Tooling Support for Python
-
-GraalVM's Python runtime is incomplete and cannot launch the standard Python debugger `pdb`.
-However, it can run the tools that GraalVM provides.
-The `graalpy --help:tools` command will give you more information about tools currently supported on Python.
+GraalVM Python runtime can run many standard Python tools as well as tools from the GraalVM ecosystem.
+The `graalpy --help:tools` command will give you more information about GraalVM tools currently supported on Python.
 
 ## Debugger
+The built-in `breakpoint()` function will use `pdb` by default.
 
-To enable debugging, pass the `--inspect` option to the `graalpy` launcher.
-For example:
-```shell
-graalpy --inspect -c "breakpoint(); import os; os.exit()"
-Debugger listening on port 9229.
-To start debugging, open the following URL in Chrome:
-chrome-devtools://devtools/bundled/js_app.html?ws=127.0.1.1:9229/76fcb6dd-35267eb09c3
-```
+### PDB
+The standard Python debugger `pdb` is supported on GraalVM. Refer to the official [PDB documentation](https://docs.python.org/3/library/pdb.html) for usage.
 
-The standard Python built-in `breakpoint()` will work using the [GraalVM's Chrome Inspector](https://github.com/oracle/graal/blob/master/docs/tools/chrome-debugger.md) implementation.
-You can inspect variables, set watch expressions, interactively evaluate code snippets, etc.
-However, this only works if you pass `--inspect` or some other inspect option. Otherwise, `pdb` is triggered as on CPython (and does not currently work).
+### Chrome Inspector
+To enable [GraalVM's Chrome Inspector](https://github.com/oracle/graal/blob/master/docs/tools/chrome-debugger.md) debugger, pass the `--inspect` option to the `graalpy` launcher.
+The built-in `breakpoint()` function will work using the Chrome Inspector implementation when `--inspect` is passed.
 
 ## Code Coverage
 
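On CPython, `breakpoint()` does not call `pdb` directly. It forwards its arguments to `sys.breakpointhook()`, whose default implementation (also kept in `sys.__breakpointhook__`) starts `pdb.set_trace()` unless the `PYTHONBREAKPOINT` environment variable says otherwise. The sketch below illustrates that dispatch, assuming GraalPy follows the same mechanism when `--inspect` is not passed. `recording_hook` is a hypothetical stand-in that lets the example run without opening an interactive debugger.

```python
import sys

calls = []

def recording_hook(*args, **kwargs):
    # Hypothetical stand-in for pdb.set_trace(): record the call instead of stopping.
    calls.append((args, kwargs))

original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    # breakpoint() passes its positional and keyword arguments to sys.breakpointhook.
    breakpoint("checkpoint", verbose=True)
finally:
    sys.breakpointhook = original_hook  # put the default, pdb-based hook back

print(calls)
```

Because the hook is an ordinary attribute of `sys`, tools can redirect `breakpoint()` without changing the code that calls it.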

graalpython/com.oracle.graal.python.shell/src/com/oracle/graal/python/shell/GraalPythonMain.java

Lines changed: 7 additions & 5 deletions
@@ -104,6 +104,8 @@ public static void main(String[] args) {
 
     boolean useASTInterpreter = false;
 
+    private String toolInstrumentWarning = null;
+
     protected static void setStartupTime() {
         if (GraalPythonMain.startupNanoTime == -1) {
             GraalPythonMain.startupNanoTime = System.nanoTime();
@@ -124,7 +126,6 @@ protected List<String> preprocessArguments(List<String> givenArgs, Map<String, S
         List<String> subprocessArgs = new ArrayList<>();
         programArgs = new ArrayList<>();
         boolean posixBackendSpecified = false;
-        boolean toolInstrumentWarning = true;
         for (Iterator<String> argumentIterator = arguments.iterator(); argumentIterator.hasNext();) {
             String arg = argumentIterator.next();
             if (arg.startsWith("-")) {
@@ -203,10 +204,7 @@ protected List<String> preprocessArguments(List<String> givenArgs, Map<String, S
                 if (argStartsWith(arg, "--agentscript", "--coverage", "--cpusampler", "--cputracer", "--dap",
                                 "--heap.", "--heapmonitor", "--insight", "--inspect", "--lsp", "--memtracer", "--sandbox.")) {
                     useASTInterpreter = true;
-                    if (toolInstrumentWarning) {
-                        toolInstrumentWarning = false;
-                        System.out.println("WARNING: Switching to AST interpreter due to instruments option " + arg);
-                    }
+                    toolInstrumentWarning = "WARNING: Switching to AST interpreter due to instruments option " + arg;
                 } else if (arg.startsWith("--llvm.") ||
                                 matchesPythonOption(arg, "CoreHome") ||
                                 matchesPythonOption(arg, "StdLibHome") ||
@@ -620,6 +618,10 @@ protected void launch(Builder contextBuilder) {
             contextBuilder.option("python.PyCachePrefix", "/dev/null");
             contextBuilder.option("python.EnableBytecodeInterpreter", "false");
             contextBuilder.option("python.DisableFrozenModules", "true");
+            if (toolInstrumentWarning != null) {
+                System.out.println(toolInstrumentWarning);
+                toolInstrumentWarning = null;
+            }
         } else {
             contextBuilder.option("python.DontWriteBytecodeFlag", Boolean.toString(dontWriteBytecode));
             if (cachePrefix != null) {
