dessertlab/EAE


EAE Reproducibility Repository

This repository contains the data artifacts and model outputs used for our paper on document-level Event Argument Extraction (EAE) on the MAVEN-ARG benchmark:

Schema-Constrained Document-Level Event Argument Extraction with Lightweight LLM Fine-Tuning
Pouya Sattari, Roberto Pietrantuono, Antonio Guerriero
ECML PKDD 2026, Research Track. Naples, Italy

The goal of this repository is reproducibility. It provides the preprocessed dataset files used in our experiments, together with per-model prompts, raw generations, predictions, evaluation summaries, submission files, and example training/inference notebooks.


Repository Structure

.
├── README.md
├── .gitignore
├── data/
│   ├── README.md
│   └── maven_arg_preprocessed/
│       ├── train_preprocessed.jsonl
│       ├── valid_preprocessed.jsonl
│       └── test_preprocessed.jsonl
└── artifacts/
    ├── llama/
    │   ├── label2role.json
    │   ├── 1-notebooks/
    │   ├── prompts/
    │   ├── generations/
    │   ├── predictions/
    │   └── submissions/
    ├── mistral_nemo/
    │   ├── label2role.json
    │   ├── 1-notebooks/
    │   ├── prompts/
    │   ├── generations/
    │   ├── predictions/
    │   ├── metrics/
    │   └── submissions/
    ├── phi4/
    │   ├── label2role.json
    │   ├── 1-notebooks/
    │   ├── prompts/
    │   ├── generations/
    │   ├── predictions/
    │   └── metrics/
    └── qwen3/
        ├── label2role.json
        ├── 1-notebooks/
        ├── prompts/
        ├── generations/
        ├── predictions/
        ├── metrics/
        └── submissions/
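
After cloning, the layout above can be checked with a short script. The following is a minimal sketch, not part of the repository: the `EXPECTED` table simply transcribes the tree shown above, and the function name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Subdirectories per model, transcribed from the tree above.
# Note that llama/ has no metrics/ and phi4/ has no submissions/.
COMMON = ["1-notebooks", "prompts", "generations", "predictions"]
EXPECTED = {
    "llama": COMMON + ["submissions"],
    "mistral_nemo": COMMON + ["metrics", "submissions"],
    "phi4": COMMON + ["metrics"],
    "qwen3": COMMON + ["metrics", "submissions"],
}
SPLITS = ["train", "valid", "test"]


def missing_entries(root: Path) -> list[str]:
    """Return the relative paths from the tree that are absent under root."""
    missing = []
    data_dir = root / "data" / "maven_arg_preprocessed"
    for split in SPLITS:
        path = data_dir / f"{split}_preprocessed.jsonl"
        if not path.is_file():
            missing.append(str(path.relative_to(root)))
    for model, subdirs in EXPECTED.items():
        model_dir = root / "artifacts" / model
        mapping = model_dir / "label2role.json"
        if not mapping.is_file():
            missing.append(str(mapping.relative_to(root)))
        for sub in subdirs:
            path = model_dir / sub
            if not path.is_dir():
                missing.append(str(path.relative_to(root)))
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    problems = missing_entries(Path("."))
    print("layout OK" if not problems else "missing:\n" + "\n".join(problems))
```

An empty result means every file and directory listed in the tree is present; it says nothing about the contents of those directories.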

Contents

data/

This directory contains the preprocessed MAVEN-ARG dataset files used in our experiments:

  • train_preprocessed.jsonl
  • valid_preprocessed.jsonl
  • test_preprocessed.jsonl

These files were derived from the original MAVEN-ARG dataset.

The original dataset can be obtained from the official repository:

https://github.com/THU-KEG/MAVEN-Argument

See data/README.md for additional details.
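
The three splits are JSON Lines files, so they can be read one record per line. The sketch below makes no assumption about field names (see data/README.md for the actual schema); it only assumes each line is a JSON object, and it reports the record count and the top-level keys it finds. The helper names `read_jsonl` and `summarize_split` are illustrative, not part of the repository.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize_split(path: Path) -> tuple[int, list[str]]:
    """Return the record count and the sorted union of top-level keys."""
    count = 0
    keys: set[str] = set()
    for record in read_jsonl(path):
        count += 1
        keys.update(record)  # assumes each record is a JSON object
    return count, sorted(keys)


if __name__ == "__main__":
    # Run from the repository root.
    data_dir = Path("data/maven_arg_preprocessed")
    for split in ("train", "valid", "test"):
        path = data_dir / f"{split}_preprocessed.jsonl"
        if path.exists():
            n, keys = summarize_split(path)
            print(f"{split}: {n} records, keys={keys}")
        else:
            print(f"{split}: {path} not found")
```

Reading line by line keeps memory use low, which matters because some of the files in this repository are relatively large.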


artifacts/

This directory contains the experiment artifacts for each evaluated model.

Each model directory includes the resources needed to reproduce the reported results.

Subdirectories may include:

  • 1-notebooks/
    Example notebooks demonstrating the training and inference workflow used for each model.

  • prompts/
    Prompt-formatted inputs used during model inference.

  • generations/
    Raw model outputs generated during inference.

  • predictions/
    Final prediction files used for evaluation or submission.

  • metrics/
    Evaluation summaries or metric snapshots recorded during experiments.

  • submissions/
    Archived submission files used for official evaluation.

  • label2role.json
    Event-type to role-set mappings used during prompting and argument filtering.
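
As an illustration of how an event-type to role-set mapping can be used for argument filtering, here is a minimal sketch. It is not the code used in the paper. It assumes that label2role.json is a JSON object mapping each event type to a list of role names, and that a prediction is a dictionary from role name to argument strings; both shapes, the function names, and the event type and roles in the usage example are hypothetical.

```python
import json
from pathlib import Path


def load_label2role(path: Path) -> dict[str, set[str]]:
    """Load the mapping, assuming the form {event_type: [role, ...]}."""
    with path.open(encoding="utf-8") as handle:
        raw = json.load(handle)
    return {event_type: set(roles) for event_type, roles in raw.items()}


def filter_arguments(
    event_type: str,
    arguments: dict[str, list[str]],
    label2role: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Keep only the roles that the schema allows for this event type."""
    allowed = label2role.get(event_type, set())
    return {role: values for role, values in arguments.items() if role in allowed}


if __name__ == "__main__":
    # Hypothetical event type and roles, for illustration only.
    schema = {"Attack": {"Agent", "Patient"}}
    predicted = {"Agent": ["the army"], "Weather": ["rain"]}
    print(filter_arguments("Attack", predicted, schema))
```

In this sketch an event type that is missing from the mapping yields no arguments at all; whether the original pipeline treats unknown event types that way is not stated here.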


Models Included

Artifacts are provided for the following model families:

  • Llama (Llama-3.1-8B)
  • Mistral-Nemo (12B)
  • Phi-4 (14B)
  • Qwen3 (14B)

Each model directory contains the prompts, outputs, and notebooks for that model, together with the evaluation artifacts (metrics/ and/or submissions/) listed for it under Repository Structure.


Purpose

This repository supports reproducibility by providing:

  • the processed dataset used in the experiments
  • prompt inputs used for model inference
  • raw model generations
  • final predictions used for evaluation
  • evaluation summaries and submission artifacts
  • example notebooks demonstrating the training and inference pipelines

Notes

  • This repository is organized as a reproducibility artifact repository, not as a full training codebase.
  • Only the preprocessed dataset files used in the experiments are included here.
  • The original raw MAVEN-ARG dataset should be obtained from the official source.
  • Some files are relatively large because they contain prompt datasets and model outputs.

Paper

This repository accompanies the paper:

Schema-Constrained Document-Level Event Argument Extraction with Lightweight LLM Fine-Tuning
Pouya Sattari, Roberto Pietrantuono, Antonio Guerriero
ECML PKDD 2026, Research Track. Naples, Italy


License and Dataset Usage

Please refer to the original MAVEN-ARG repository for dataset licensing and usage conditions:

https://github.com/THU-KEG/MAVEN-Argument
