Knowledge Graph RAG Demo

Overview

This repository contains sample code for building a Knowledge Graph + RAG (Retrieval-Augmented Generation) question-answering system on top of a Neo4j database.

The system provides two main features:

Knowledge Graph QA: Answer natural-language questions over the graph data using an LLM
Episode Retriever: Search for relevant video episodes based on user queries

Use this as a reference or starting point for your own KG + RAG experiments.

Update

2026-01-22: The Knowledge Graph dataset has been released on Hugging Face.
2026-03-27: Additional Knowledge Graph datasets have been added to Hugging Face, and the demo code has been updated accordingly.

Quick Start (Docker Compose)

The fastest way to get started is using Docker Compose.

1. Clone this repository

git clone https://github.com/FujitsuResearch/Knowledge_Data.git
cd Knowledge_Data

2. Download the Knowledge Graph Dataset

Clone the dataset from Hugging Face into the neo4j_import/ directory. Replace XXXXX in the commands with one of the dataset names listed in the table below:

cd neo4j_import
git lfs install
git clone https://huggingface.co/datasets/Fujitsu/XXXXX
cd XXXXX
unzip "*.zip"
cd ../..

| Dataset (`XXXXX`) | Type | Note |
| --- | --- | --- |
| `FieldWork_Knowledge_Dataset` | Knowledge Graph for Vision Analytics | You need to accept the terms of use on the Hugging Face dataset page before cloning. Also apply on the FieldWorkArena page at the same time. |
| `UbuntuRCA` | Knowledge Graph for Root Cause Analysis | - |
| `WindowsRCA` | Knowledge Graph for Root Cause Analysis | - |
| `WindowsRCA_JP` | Knowledge Graph for Root Cause Analysis | - |
| `ManufacturingRCA` | Knowledge Graph for Root Cause Analysis | - |
| `ManufacturingRCA_JP` | Knowledge Graph for Root Cause Analysis | - |
| `ForQA` | Knowledge Graph for Question & Answer | - |
| `ForQA_JP` | Knowledge Graph for Question & Answer | - |

Note

Some .graphml files are compressed as .zip archives. The unzip command above extracts them.

3. Set up environment variables

cp .env.sample .env

Edit .env with your settings:

OPENAI_API_KEY=your-openai-api-key
NEO4J_URI=bolt://localhost:7488
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
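
A missing or empty setting in .env tends to surface only later as a connection or authentication error. As a rough sketch (not part of this repository), the following checks that the four keys shown above are present and non-empty. The function names `parse_env` and `missing_keys` are hypothetical, and the parser only handles simple `KEY=VALUE` lines.

```python
# Keys taken from the sample settings above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as URIs stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when all four settings are filled in.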

4. Install Python dependencies

We recommend using the Python version specified in .python-version (3.12.5). Then create a virtual environment and install the dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

5. Start Neo4j

docker compose up -d

Neo4j can take a few minutes to start. To check whether it is ready, run:

docker compose logs neo4j | grep "Started."
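
If you would rather wait for startup from a script than re-run the log check by hand, one option is to poll the Bolt port until it accepts connections. The sketch below is an illustration only: `wait_for_port` is a hypothetical helper, and an open port does not guarantee that Neo4j has finished initializing, so the log check above remains the more reliable signal.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the sample NEO4J_URI from step 3, the call would be `wait_for_port("localhost", 7488)`. Adjust the port if your .env differs.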

You can also access the Neo4j Browser at http://localhost:7588 to confirm it's running.

Tip

To add new datasets to Neo4j, place them in neo4j_import/ and add a volume mount in docker-compose.yml:

volumes:
  - ./neo4j_import/YourNewDataset:/var/lib/neo4j/import/YourNewDataset

Then restart Neo4j with docker compose down && docker compose up -d.

Note

The Neo4j Docker image changes ownership of mounted directories to UID 7474 (the neo4j user). If you need to modify files in neo4j_import/ after starting Neo4j, you may need to use sudo or restore ownership:

sudo chown -R $(id -u):$(id -g) neo4j_import/

6. Import Knowledge Graph data

cd script
bash run_clear.sh    # Clear existing data
bash run_import.sh   # Import GraphML file

By default, this imports kg_factory_incident_count.graphml. To import a different file, edit script/run_import.sh:

# Example (kg_factory):
FILE_PATH="FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml"
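
Before pointing run_import.sh at a different file, it can help to confirm that the file is well-formed GraphML and to see roughly how large it is. The following sketch counts node and edge elements using only the standard library. It assumes the file uses the standard GraphML XML namespace, and `summarize_graphml` is a hypothetical name, not a script in this repository.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the GraphML specification.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def summarize_graphml(source) -> dict[str, int]:
    """Count <node> and <edge> elements in a GraphML file (path or binary file object)."""
    counts = {"nodes": 0, "edges": 0}
    # iterparse processes elements as they are read; clearing each one keeps memory use low.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == GRAPHML_NS + "node":
            counts["nodes"] += 1
            elem.clear()
        elif elem.tag == GRAPHML_NS + "edge":
            counts["edges"] += 1
            elem.clear()
    return counts
```

Pass the path of the unzipped .graphml file under neo4j_import/. A parse error at this stage usually means the file is still zipped or was only partially downloaded by Git LFS.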

Note

The FieldWork_Knowledge_Dataset contains multiple domains (factory, retail, warehouse, etc.). See the dataset repository for the full list of available .graphml files.

Tip

You can visualize the imported graph in the Neo4j Browser at http://localhost:7588. Try running the following Cypher query to see the graph structure:

MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 100

7. Run the demos

Demo 1: Knowledge Graph QA (For all knowledge graph datasets) - Ask questions about the knowledge graph:

bash run_kg_rag.sh

Demo 2: Episode Retriever (For FieldWork_Knowledge_Dataset) - Search for relevant video episodes (uses imported data):

bash run_episode_retriever.sh

This will search for episodes matching the query and output time ranges like:

Episode 1: 0.0s - 30.0s (relevance: 0.85)
Episode 2: 30.0s - 60.0s (relevance: 0.72)
...
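
To pass these time ranges to another tool, the printed lines can be parsed back into structured values. This is a minimal sketch that assumes the output matches the sample format shown above exactly. The real script's output may contain additional lines or differ in detail, and `Episode` and `parse_episodes` are hypothetical names.

```python
import re
from typing import NamedTuple


class Episode(NamedTuple):
    index: int
    start: float      # seconds
    end: float        # seconds
    relevance: float


# Matches lines shaped like: "Episode 1: 0.0s - 30.0s (relevance: 0.85)"
EPISODE_RE = re.compile(
    r"Episode\s+(\d+):\s*([\d.]+)s\s*-\s*([\d.]+)s\s*\(relevance:\s*([\d.]+)\)"
)


def parse_episodes(output: str) -> list[Episode]:
    """Extract every episode line from the text; lines that do not match are ignored."""
    return [
        Episode(int(i), float(s), float(e), float(r))
        for i, s, e, r in EPISODE_RE.findall(output)
    ]
```

Each `Episode` then carries the start and end offsets needed to cut the corresponding segment out of the source video.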

Tip

Edit the QUERY variable in each script to try different queries:

# In run_kg_rag.sh
QUERY="What is the person in the video doing?"            # For `FieldWork_Knowledge_Dataset`
QUERY="I/O and CRC errors occurred."                      # For `UbuntuRCA`

# In run_episode_retriever.sh  
QUERY="What safety issues occurred in the factory?"       # For `FieldWork_Knowledge_Dataset`

Project Structure

Knowledge_Data/
├── docker-compose.yml      # Neo4j container configuration
├── neo4j_import/           # Mounted to Neo4j's import directory
│   └── FieldWork_Knowledge_Dataset/  # Clone dataset here (Example for "FieldWork_Knowledge_Dataset")
│       └── ...             # e.g., kg_factory/kg_factory_incident_count.graphml
├── script/
│   ├── run_clear.sh        # Clear Neo4j database
│   ├── run_import.sh       # Import GraphML file
│   ├── run_kg_rag.sh       # Run Knowledge Graph QA demo
│   └── run_episode_retriever.sh  # Run Episode Retriever demo
└── .env                    # Environment variables

Script Reference

`kg_rag.py`

Answers natural-language questions based on the knowledge graph data using an LLM.

python3 kg_rag.py --query "What is the person in the video doing?"

`episode_retriever.py`

Searches for relevant video episodes from the knowledge graph. The retrieved time ranges can be used to extract specific video segments for further analysis (e.g., as input to a Video-LLM).

python3 episode_retriever.py \
    --graph_file "FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml" \
    --clear_db \
    --query "What safety issues occurred in the factory?" \
    --top_k 10 \
    --threshold 0.5 \
    --episode_duration 30 \
    --verbose

| Option | Description | Default |
| --- | --- | --- |
| `--query` | Search query (required) | - |
| `--graph_file` | Path to GraphML file to import before searching | None |
| `--clear_db` | Clear the database before importing | False |
| `--top_k` | Maximum number of episodes to retrieve | 10 |
| `--threshold` | Relevance threshold (0.0-1.0) | 0.0 |
| `--episode_duration` | Duration of each episode in seconds (use 30 for kg_factory) | 10.0 |
| `--verbose` | Display detailed output | False |
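
To make the options and defaults above concrete, here is an illustrative sketch of an equivalent argument parser, together with one plausible way `--threshold` and `--top_k` could combine. This is not the source of episode_retriever.py: `build_parser` and `select_episodes` are hypothetical, and the filtering rule (keep scores at or above the threshold, then take the best `top_k`) is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser(description="Episode retriever options (illustrative)")
    p.add_argument("--query", required=True, help="Search query")
    p.add_argument("--graph_file", default=None, help="GraphML file to import before searching")
    p.add_argument("--clear_db", action="store_true", help="Clear the database before importing")
    p.add_argument("--top_k", type=int, default=10, help="Maximum number of episodes")
    p.add_argument("--threshold", type=float, default=0.0, help="Relevance threshold (0.0-1.0)")
    p.add_argument("--episode_duration", type=float, default=10.0, help="Episode length in seconds")
    p.add_argument("--verbose", action="store_true", help="Display detailed output")
    return p


def select_episodes(scored, top_k, threshold):
    """Assumed rule: keep (episode, score) pairs at or above threshold, best first, at most top_k."""
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

Under this assumed rule, raising `--threshold` can return fewer than `--top_k` episodes, while the default of 0.0 leaves `--top_k` as the only limit.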

Manual Neo4j Installation (Alternative)

If you prefer to install Neo4j manually instead of using Docker, follow these steps:

Install Neo4j

Install Neo4j into your environment.
See Neo4j Installation Manual for details.
Install the Neo4j APOC plugin by following the APOC Installation Manual.

Note

This code has been tested and verified to work with the following version:
apoc-2025.02.0-core.jar

Configure /etc/neo4j/neo4j.conf to allow APOC procedures:

dbms.security.procedures.unrestricted=apoc.*
dbms.security.procedures.allowlist=apoc.*

After editing the configuration, restart Neo4j for the changes to take effect.

Importing GraphML Files

Important

GraphML files must be placed in Neo4j's import directory.

Due to Neo4j's security settings, the APOC import function can only access files within the designated import directory.

Default import directory locations:

Linux: /var/lib/neo4j/import/
macOS (Homebrew): /usr/local/var/neo4j/import/

Example:

sudo cp neo4j_import/FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml /var/lib/neo4j/import/

Tip

Alternative: Allow arbitrary file paths via APOC configuration

If you prefer to import files from any location:

Create or edit /etc/neo4j/apoc.conf:

apoc.import.file.enabled=true
apoc.import.file.use_neo4j_config=false

Add to /etc/neo4j/neo4j.conf:

dbms.security.allow_csv_import_from_file_urls=true

Restart Neo4j:
```
sudo systemctl restart neo4j
```

⚠️ Security Warning: This allows Neo4j to access any file on the system.

Inquiries and Support (`FieldWork_Knowledge_Dataset`)

To submit an inquiry regarding FieldWork_Knowledge_Dataset, please follow these steps:

Visit our page
Click the "Inquiry" button at the bottom.
Fill out the form completely and accurately.

It may take a few business days to receive a reply.

Inquiries and Support (others)

To submit an inquiry regarding other datasets or demo code, please contact:

Tatsuya Kikuzuki: kikuzuki{at}fujitsu.com

License

See LICENSE for details.
