This is the research artifact for the paper "Co-Evolution of Types and Dependencies: Towards Repository-Level Type Inference for Python Code", submitted to FSE 2026.
- Python version == 3.10.0. Although our method is not strictly dependent on the Python version, the behavior of the `ast` module can vary slightly across versions. To reproduce our experiments and avoid unexpected behavior, we recommend using Python 3.10.0, the same version used in our experimental environment.
- The Python libraries we depend on are listed in `requirements.txt`; run `pip install -r requirements.txt` to install them.
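As a quick sanity check of the pinned interpreter, the snippet below (illustrative only, not part of the artifact) parses an annotated function with the `ast` module and reads back its annotations; annotation handling is one of the areas where `ast` behavior differs across Python versions (`ast.unparse`, for instance, only exists since 3.9).

```python
import ast
import sys

# Parse a small annotated function and recover its type annotations.
source = "def f(x: int) -> str: ..."
tree = ast.parse(source)
func = tree.body[0]

print(sys.version_info[:3])                       # ideally (3, 10, 0)
print(ast.unparse(func.args.args[0].annotation))  # int
print(ast.unparse(func.returns))                  # str
```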
Run the following command to extract the dataset.
```shell
tar -xzvf dataset.tar.gz -C ./data
```

If you wish to use your own Large Language Model (LLM) to reproduce our experiments, please modify the corresponding configurations in `src/type_llm/utils/config.py`, including:
- `BASE_URL`
- `MODEL`
- `API_KEY`
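A hypothetical shape of these settings in `src/type_llm/utils/config.py` (the variable names come from the list above; the values are placeholders you must replace with your own provider's details):

```python
# Placeholder values for illustration only -- substitute your own.
BASE_URL = "https://api.example.com/v1"  # your LLM provider's endpoint
MODEL = "your-model-name"                # model identifier to query
API_KEY = "your-api-key"                 # credential for the provider
```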
If you do not wish to use your own LLM, you can also use our intermediate results (the conversation logs with the LLM) to reproduce the experiments on the example project:
```shell
tar -xzvf LLM_Results.tar.gz -C ./data/intermediate
```

Run the following command to navigate to the source code directory and set the `PYTHONPATH`.
```shell
cd src; export PYTHONPATH=.
```

*If you are using your own LLM, you can edit the value of `projects` in `src/type_llm/utils/config.py` and follow the same procedure to reproduce our experiments on other projects.*
First, we invoke PyAnalyzer to analyze the mapping relationships between variable references and definitions.
```shell
python type_llm/preprocessing/call_PyAnalyzer.py
```

Then, based on these mappings, we construct the initial EDG.
```shell
python type_llm/methods/full_LARRY/PA2EG.py
python type_llm/methods/full_LARRY/Entity_Graph.py
```

The partially annotated repositories `pre_commit_hooks_1` through `pre_commit_hooks_9` and the fully annotated repository `pre_commit_hooks` generated in each iteration are saved in the `data/intermediate/validation` directory.
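The step from resolved references to a dependency graph can be sketched roughly as follows. The entity names and the graph representation here are hypothetical, for illustration only; the artifact's own scripts define the actual EDG format.

```python
from collections import defaultdict

# Hypothetical (reference site, defining entity) pairs, as a resolver
# like PyAnalyzer might report them.
ref_to_def = [
    ("mod.f", "mod.helper"),  # f calls helper
    ("mod.f", "mod.CONST"),   # f reads CONST
    ("mod.g", "mod.f"),       # g calls f
]

# Build an adjacency map: each entity points to the entities it depends on.
edg = defaultdict(set)
for user, definition in ref_to_def:
    edg[user].add(definition)

print(dict(edg))
```

Such a graph lets type information propagate along dependency edges, e.g. inferring `mod.g`'s types only after `mod.f`'s are known.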
We use the metrics from TypyBench to evaluate the accuracy of the generated type annotations.
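As a rough intuition for this kind of evaluation, the sketch below computes a naive exact-match accuracy over predicted annotations. This is a simplification I am adding for illustration; TypyBench's actual metrics and type-normalization rules live in its own code, which the commands below invoke.

```python
def exact_match_accuracy(predicted: dict, ground_truth: dict) -> float:
    """Fraction of annotation slots whose predicted type matches the
    ground truth after trivial whitespace/case normalization."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for slot, gt in ground_truth.items()
        if predicted.get(slot, "").replace(" ", "").lower()
        == gt.replace(" ", "").lower()
    )
    return hits / len(ground_truth)

gt = {"f.x": "int", "f.<ret>": "Optional[str]"}
pred = {"f.x": "int", "f.<ret>": "optional[str]"}
print(exact_match_accuracy(pred, gt))  # 1.0
```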
First, build the benchmark dataset.
```shell
python type_llm/evaluation/build_evaluation.py
```

Then, we use the code from TypyBench to calculate the accuracy of the type annotations.
```shell
python -m type_llm.evaluation.typybench.evaluation -n pre_commit_hooks -d ../data/evaluation/projects -p ../data/evaluation/results/PyTIR
```

Finally, we consolidate the accuracy data into the `data/evaluation/EvalResults` directory.
```shell
python type_llm/evaluation/merge_csv.py
```

We use MyPy to check for type errors in the repository after adding the type annotations.
```shell
python type_llm/evaluation/mypy_check.py
```

Other relevant statistical data can be found in the `statistic_data` directory.
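For intuition, a programmatic version of such a check might look like the sketch below, assuming the `mypy` package from `requirements.txt` is installed. The artifact's own `mypy_check.py` is the authoritative script; this only illustrates the idea of type-checking an annotated file.

```python
import tempfile
import textwrap

# A file with annotations that contain a deliberate type error.
code = textwrap.dedent("""
    def add(x: int, y: int) -> int:
        return x + y

    add("1", 2)  # error: "str" passed where "int" is expected
""")

try:
    from mypy import api

    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
    stdout, stderr, exit_status = api.run([f.name])
    # exit_status is 1 when type errors are found, 0 when the file is clean.
    print(stdout)
except ImportError:
    exit_status = None
    print("mypy is not installed; run `pip install -r requirements.txt` first")
```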