
FujitsuResearch/Knowledge_Data


Knowledge Graph RAG Demo

Overview

This repository contains sample code for building a Knowledge Graph + RAG (Retrieval-Augmented Generation) question-answering system on top of a Neo4j database.

The system provides two main functionalities:

  • Knowledge Graph QA: Answer natural language questions using graph data with an LLM
  • Episode Retriever: Search for relevant video episodes based on user queries

Use this as a reference or starting point for your own KG + RAG experiments.

Update

  • 2026-01-22: The Knowledge Graph dataset has been released on Hugging Face.
  • 2026-03-27: Additional Knowledge Graph datasets were added to Hugging Face, and the demo code was updated accordingly.

Quick Start (Docker Compose)

The fastest way to get started is using Docker Compose.

1. Clone this repository

git clone https://github.com/FujitsuResearch/Knowledge_Data.git
cd Knowledge_Data

2. Download the Knowledge Graph Dataset

Clone a dataset from Hugging Face into the neo4j_import/ directory. Replace XXXXX in the commands with one of the dataset names listed in the table below:

cd neo4j_import
git lfs install
git clone https://huggingface.co/datasets/Fujitsu/XXXXX
cd XXXXX
unzip "*.zip"
cd ../..

| XXXXX | Type | Note |
|---|---|---|
| FieldWork_Knowledge_Dataset | Knowledge Graph for Vision Analytics | You need to accept the terms of use on the Hugging Face dataset page before cloning. Also apply on the FieldWorkArena page at the same time. |
| UbuntuRCA | Knowledge Graph for Root Cause Analysis | - |
| WindowsRCA | Knowledge Graph for Root Cause Analysis | - |
| WindowsRCA_JP | Knowledge Graph for Root Cause Analysis | - |
| ManufacturingRCA | Knowledge Graph for Root Cause Analysis | - |
| ManufacturingRCA_JP | Knowledge Graph for Root Cause Analysis | - |
| ForQA | Knowledge Graph for Question & Answer | - |
| ForQA_JP | Knowledge Graph for Question & Answer | - |

Note

Some .graphml files are distributed as .zip archives; the unzip command above extracts them.

3. Set up environment variables

cp .env.sample .env

Edit .env with your settings:

OPENAI_API_KEY=your-openai-api-key
NEO4J_URI=bolt://localhost:7488
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
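Before running the demos, you may want to confirm that .env is complete. The following standard-library sketch parses simple KEY=VALUE lines and reports keys that are missing or still hold the placeholder values shown above. The helper names are hypothetical and not part of this repository, which may load .env differently (for example with a library such as python-dotenv).

```python
from pathlib import Path

# Keys from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")
# Placeholder values from the example above that still need replacing.
PLACEHOLDERS = {"your-openai-api-key", "your-password"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values like bolt://host:port stay intact.
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="value".
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def check_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means all keys look set."""
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "")
        if not value:
            problems.append(f"{key} is missing")
        elif value in PLACEHOLDERS:
            problems.append(f"{key} still has its placeholder value")
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        for problem in check_env(parse_env(env_path.read_text())):
            print(problem)
```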

4. Install Python dependencies

We recommend using the Python version specified in .python-version (3.12.5). Then create a virtual environment and install the Python dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

5. Start Neo4j

docker compose up -d

Neo4j may take a few minutes to start. To check whether it is ready, run:

docker compose logs neo4j | grep "Started."

You can also access the Neo4j Browser at http://localhost:7588 to confirm it's running.
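If you prefer to wait from a script instead of grepping the logs, the sketch below polls the Bolt port taken from NEO4J_URI until it accepts TCP connections. An open port only shows that the server is listening, not that the database has finished initializing, so treat it as a rough readiness check. The function names are hypothetical; the 7687 fallback is Neo4j's standard Bolt port and is used only when the URI omits a port.

```python
import socket
import time
from urllib.parse import urlparse


def bolt_address(uri: str) -> tuple[str, int]:
    """Split a URI such as bolt://localhost:7488 into (host, port)."""
    parsed = urlparse(uri)
    # Fall back to Neo4j's standard Bolt port if the URI has none.
    return parsed.hostname or "localhost", parsed.port or 7687


def wait_for_port(host: str, port: int, timeout: float = 180.0, interval: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port(*bolt_address("bolt://localhost:7488"))` blocks for up to three minutes and returns True as soon as the container's Bolt port is reachable.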

Tip

To add new datasets to Neo4j, place them in neo4j_import/ and add a volume mount in docker-compose.yml:

volumes:
  - ./neo4j_import/YourNewDataset:/var/lib/neo4j/import/YourNewDataset

Then restart Neo4j with docker compose down && docker compose up -d.
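When several datasets live under neo4j_import/, writing the mount lines by hand gets repetitive. This hypothetical helper emits one entry per dataset directory, following the pattern shown above (host path ./neo4j_import/<name>, container path /var/lib/neo4j/import/<name>); paste its output under the Neo4j service's volumes: key.

```python
from pathlib import Path

# Container-side import directory from the volume example above.
CONTAINER_IMPORT_DIR = "/var/lib/neo4j/import"


def volume_entries(dataset_names: list[str]) -> list[str]:
    """Build docker-compose volume lines, one per dataset directory, sorted by name."""
    return [
        f"  - ./neo4j_import/{name}:{CONTAINER_IMPORT_DIR}/{name}"
        for name in sorted(dataset_names)
    ]


def local_datasets(import_root: str = "neo4j_import") -> list[str]:
    """List dataset directories under neo4j_import/, skipping hidden ones."""
    root = Path(import_root)
    if not root.is_dir():
        return []
    return [p.name for p in root.iterdir() if p.is_dir() and not p.name.startswith(".")]


if __name__ == "__main__":
    print("volumes:")
    print("\n".join(volume_entries(local_datasets())))
```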

Note

The Neo4j Docker image changes ownership of mounted directories to UID 7474 (the neo4j user). If you need to modify files in neo4j_import/ after starting Neo4j, you may need to use sudo or restore ownership:

sudo chown -R $(id -u):$(id -g) neo4j_import/

6. Import Knowledge Graph data

cd script
bash run_clear.sh    # Clear existing data
bash run_import.sh   # Import GraphML file

By default, this imports kg_factory_incident_count.graphml. To import a different file, edit script/run_import.sh:

# Example (kg_factory):
FILE_PATH="FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml"

Note

The FieldWork_Knowledge_Dataset contains multiple domains (factory, retail, warehouse, etc.). See the dataset repository for the full list of available .graphml files.
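Before picking a file, it can help to see roughly how large it is and which attributes it declares. GraphML is plain XML, so a standard-library parser is enough for a quick look. The sketch below is a hypothetical helper, not part of the repository: it counts node and edge elements and lists the attribute names declared in GraphML key elements, ignoring XML namespaces so it works whether or not a file declares the standard GraphML namespace. The embedded sample is a tiny made-up graph used only for demonstration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# A tiny made-up GraphML document for demonstration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="name" attr.type="string"/>
  <key id="d1" for="edge" attr.name="type" attr.type="string"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="d0">worker</data></node>
    <node id="n1"><data key="d0">forklift</data></node>
    <edge source="n0" target="n1"><data key="d1">operates</data></edge>
  </graph>
</graphml>"""


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as {http://graphml.graphdrawing.org/xmlns}."""
    return tag.rsplit("}", 1)[-1]


def summarize_graphml(source) -> dict:
    """Count nodes/edges and list declared attribute names per domain (node, edge, ...).

    `source` is a file path or a binary file-like object. iterparse streams the
    file, and node/edge elements are cleared once counted to limit memory use.
    """
    counts = Counter()
    keys = defaultdict(list)
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name in ("node", "edge"):
            counts[name] += 1
            elem.clear()
        elif name == "key":
            keys[elem.get("for", "all")].append(elem.get("attr.name", elem.get("id")))
    return {"nodes": counts["node"], "edges": counts["edge"], "keys": dict(keys)}


if __name__ == "__main__":
    print(summarize_graphml(io.BytesIO(SAMPLE)))
```

To inspect a real file, pass its path instead, for example `summarize_graphml("neo4j_import/FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml")`.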

Tip

You can visualize the imported graph in the Neo4j Browser at http://localhost:7588. Try running the following Cypher query to see the graph structure:

MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 100

7. Run the demos

Demo 1: Knowledge Graph QA (for all Knowledge Graph datasets). Ask questions about the knowledge graph:

bash run_kg_rag.sh

Demo 2: Episode Retriever (for FieldWork_Knowledge_Dataset). Search the imported data for relevant video episodes:

bash run_episode_retriever.sh

This will search for episodes matching the query and output time ranges like:

Episode 1: 0.0s - 30.0s (relevance: 0.85)
Episode 2: 30.0s - 60.0s (relevance: 0.72)
...
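Each output line pairs an episode with a time window and a relevance score. One plausible way these windows relate to the --episode_duration, --threshold, and --top_k options described in the Script Reference is sketched below: episode i covers [i × duration, (i + 1) × duration), scores below the threshold are dropped (the sketch treats the threshold as inclusive), and the top_k highest-scoring episodes are kept. This is an illustrative assumption, not the implementation in episode_retriever.py; the scores are taken as given, and the demo values are made up.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    number: int       # 1-based episode index on the video timeline
    start: float      # seconds
    end: float        # seconds
    relevance: float


def select_episodes(scores: dict[int, float], episode_duration: float = 10.0,
                    threshold: float = 0.0, top_k: int = 10) -> list[Episode]:
    """Turn per-episode relevance scores into ranked time windows.

    `scores` maps a 0-based episode index to a relevance score in [0, 1].
    Defaults mirror the option defaults listed in the Script Reference.
    """
    kept = [(idx, s) for idx, s in scores.items() if s >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)  # most relevant first
    return [
        Episode(idx + 1, idx * episode_duration, (idx + 1) * episode_duration, s)
        for idx, s in kept[:top_k]
    ]


def format_episode(ep: Episode) -> str:
    """Render an episode in the same style as the sample output above."""
    return f"Episode {ep.number}: {ep.start:.1f}s - {ep.end:.1f}s (relevance: {ep.relevance:.2f})"


if __name__ == "__main__":
    demo_scores = {0: 0.85, 1: 0.72, 2: 0.31}  # made-up scores
    for ep in select_episodes(demo_scores, episode_duration=30, threshold=0.5):
        print(format_episode(ep))
```

With these made-up scores and a 30-second duration, the third episode falls below the 0.5 threshold, so only the first two lines of the sample output are printed.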

Tip

Edit the QUERY variable in each script to try different queries:

# In run_kg_rag.sh
QUERY="What is the person in the video doing?"            # For `FieldWork_Knowledge_Dataset`
QUERY="I/O and CRC errors occurred."                      # For `UbuntuRCA`

# In run_episode_retriever.sh  
QUERY="What safety issues occurred in the factory?"       # For `FieldWork_Knowledge_Dataset`

Project Structure

Knowledge_Data/
├── docker-compose.yml      # Neo4j container configuration
├── neo4j_import/           # Mounted to Neo4j's import directory
│   └── FieldWork_Knowledge_Dataset/  # Clone dataset here (Example for "FieldWork_Knowledge_Dataset")
│       └── ...             # e.g., kg_factory/kg_factory_incident_count.graphml
├── script/
│   ├── run_clear.sh        # Clear Neo4j database
│   ├── run_import.sh       # Import GraphML file
│   ├── run_kg_rag.sh       # Run Knowledge Graph QA demo
│   └── run_episode_retriever.sh  # Run Episode Retriever demo
└── .env                    # Environment variables

Script Reference

kg_rag.py

Answers natural-language questions with an LLM, using the knowledge graph data as context.

python3 kg_rag.py --query "What is the person in the video doing?"
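The general Knowledge Graph RAG pattern is: retrieve facts related to the question from the graph, serialize them as text, and pass them to the LLM together with the question. The sketch below illustrates only the last two steps, using made-up triples. The `Triple` type, `build_prompt`, and the prompt wording are hypothetical and not taken from kg_rag.py; retrieval from Neo4j and the LLM call itself (for example via the OpenAI API key configured in .env) are omitted.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def triples_to_context(triples: list[Triple], max_facts: int = 50) -> str:
    """Serialize retrieved facts one per line, dropping duplicates but keeping order."""
    unique = list(dict.fromkeys(triples))[:max_facts]
    return "\n".join(f"- ({t.subject}) -[{t.relation}]-> ({t.obj})" for t in unique)


def build_prompt(question: str, triples: list[Triple]) -> str:
    """Combine graph facts and the user question into a single grounded prompt."""
    context = triples_to_context(triples) or "(no related facts found)"
    return (
        "Answer the question using only the facts from the knowledge graph below.\n"
        "If the facts are insufficient, say so.\n\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Made-up facts for illustration only.
    facts = [
        Triple("Person_1", "PERFORMS", "Lifting a box"),
        Triple("Person_1", "LOCATED_IN", "Warehouse"),
    ]
    print(build_prompt("What is the person in the video doing?", facts))
```

Instructing the model to answer only from the listed facts is a common way to keep answers grounded in the graph rather than in the model's general knowledge.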

episode_retriever.py

Searches for relevant video episodes from the knowledge graph. The retrieved time ranges can be used to extract specific video segments for further analysis (e.g., as input to a Video-LLM).

python3 episode_retriever.py \
    --graph_file "FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml" \
    --clear_db \
    --query "What safety issues occurred in the factory?" \
    --top_k 10 \
    --threshold 0.5 \
    --episode_duration 30 \
    --verbose

| Option | Description | Default |
|---|---|---|
| --query | Search query (required) | - |
| --graph_file | Path to a GraphML file to import before searching | None |
| --clear_db | Clear the database before importing | False |
| --top_k | Maximum number of episodes to retrieve | 10 |
| --threshold | Relevance threshold (0.0-1.0) | 0.0 |
| --episode_duration | Duration of each episode in seconds (use 30 for kg_factory) | 10.0 |
| --verbose | Display detailed output | False |
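The retrieved time ranges can be cut out of the source video for downstream use, such as feeding a Video-LLM. A minimal sketch, assuming the output line format shown in the Run the demos section and an ffmpeg binary on your PATH: it parses the "Episode N: start - end" lines and builds ffmpeg argument lists (-ss before -i seeks in the input, -t limits the output duration, and -c copy avoids re-encoding, so cuts snap to keyframes). The video file name and output naming scheme are hypothetical, and the commands are only built and printed, not executed.

```python
import re
import shlex
from typing import NamedTuple

# Matches lines like "Episode 1: 0.0s - 30.0s (relevance: 0.85)".
EPISODE_LINE = re.compile(
    r"Episode\s+(\d+):\s*([\d.]+)s\s*-\s*([\d.]+)s\s*\(relevance:\s*([\d.]+)\)"
)


class Segment(NamedTuple):
    number: int
    start: float
    end: float
    relevance: float


def parse_episodes(text: str) -> list[Segment]:
    """Extract episode time ranges from the retriever's printed output."""
    return [
        Segment(int(n), float(s), float(e), float(r))
        for n, s, e, r in EPISODE_LINE.findall(text)
    ]


def ffmpeg_cut_args(video: str, seg: Segment, out_dir: str = ".") -> list[str]:
    """Build an ffmpeg command that copies [start, end) of `video` into its own file."""
    return [
        "ffmpeg", "-y",                       # -y: overwrite existing output files
        "-ss", f"{seg.start:.3f}",            # seek in the input
        "-i", video,
        "-t", f"{seg.end - seg.start:.3f}",   # output duration in seconds
        "-c", "copy",                         # stream copy, no re-encoding
        f"{out_dir}/episode_{seg.number:03d}.mp4",
    ]


if __name__ == "__main__":
    sample = ("Episode 1: 0.0s - 30.0s (relevance: 0.85)\n"
              "Episode 2: 30.0s - 60.0s (relevance: 0.72)")
    for seg in parse_episodes(sample):
        # Pass the list to subprocess.run(...) to actually cut; here it is only printed.
        print(shlex.join(ffmpeg_cut_args("factory_video.mp4", seg)))
```

If frame-accurate cuts matter more than speed, dropping -c copy makes ffmpeg re-encode, at the cost of longer processing.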

Manual Neo4j Installation (Alternative)

If you prefer to install Neo4j manually instead of using Docker, follow these steps:

Install Neo4j

  1. Install Neo4j into your environment.
    See Neo4j Installation Manual for details.

  2. Install the Neo4j APOC plugin by following the APOC Installation Manual.

Note

This code has been tested and verified to work with the following version:
apoc-2025.02.0-core.jar

  3. Configure /etc/neo4j/neo4j.conf to allow APOC procedures:
dbms.security.procedures.unrestricted=apoc.*
dbms.security.procedures.allowlist=apoc.*

After editing the configuration, restart Neo4j for the changes to take effect.
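To double-check the edit before restarting, the hypothetical sketch below reads neo4j.conf and reports, for each of the two settings above, whether the literal entry apoc.* appears among its comma-separated values. It skips commented-out lines and keeps the last occurrence of a repeated key; it does not claim to reproduce how Neo4j itself resolves duplicates. The /etc/neo4j/neo4j.conf path matches the one above but may differ on your system.

```python
from pathlib import Path

# The two settings from the configuration step above.
REQUIRED = ("dbms.security.procedures.unrestricted", "dbms.security.procedures.allowlist")


def read_conf(text: str) -> dict[str, str]:
    """Parse key=value settings, skipping # comments; the last occurrence of a key wins here."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def apoc_enabled(settings: dict[str, str]) -> dict[str, bool]:
    """For each required setting, report whether 'apoc.*' is one of its comma-separated values."""
    return {
        key: "apoc.*" in [v.strip() for v in settings.get(key, "").split(",")]
        for key in REQUIRED
    }


if __name__ == "__main__":
    conf = Path("/etc/neo4j/neo4j.conf")
    if conf.exists():
        print(apoc_enabled(read_conf(conf.read_text())))
```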

Importing GraphML Files

Important

GraphML files must be placed in Neo4j's import directory.

Due to Neo4j's security settings, the APOC import function can only access files within the designated import directory.

Default import directory locations:

  • Linux: /var/lib/neo4j/import/
  • macOS (Homebrew): /usr/local/var/neo4j/import/

Example:

sudo cp neo4j_import/FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml /var/lib/neo4j/import/
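Because the APOC import function can only read from the import directory, a file outside it fails at import time. The sketch below picks the default directory listed above for the current platform and checks whether a file path, after resolving symlinks and "..", lies inside it. The function names are hypothetical, and the defaults only cover the Linux and macOS (Homebrew) locations listed above; pass your own directory if your installation differs.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Default import directories from the list above.
DEFAULT_IMPORT_DIRS = {
    "linux": "/var/lib/neo4j/import",
    "darwin": "/usr/local/var/neo4j/import",  # macOS (Homebrew)
}


def default_import_dir(platform: str = sys.platform) -> Path | None:
    """Return the listed default for this platform, or None if none is listed."""
    for prefix, path in DEFAULT_IMPORT_DIRS.items():
        if platform.startswith(prefix):
            return Path(path)
    return None


def is_importable(file_path: str, import_dir: Path) -> bool:
    """True if file_path, once resolved, is inside import_dir (Python 3.9+ for is_relative_to)."""
    return Path(file_path).resolve().is_relative_to(import_dir.resolve())
```

For example, `is_importable("/var/lib/neo4j/import/kg_factory_incident_count.graphml", default_import_dir("linux"))` returns True, while a path that escapes the directory via ".." returns False.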

Tip

Alternative: Allow arbitrary file paths via APOC configuration

If you prefer to import files from any location:

  1. Create or edit /etc/neo4j/apoc.conf:

    apoc.import.file.enabled=true
    apoc.import.file.use_neo4j_config=false
    
  2. Add to /etc/neo4j/neo4j.conf:

    dbms.security.allow_csv_import_from_file_urls=true
    
  3. Restart Neo4j:

    sudo systemctl restart neo4j

⚠️ Security Warning: This allows Neo4j to access any file on the system.


Inquiries and Support (FieldWork_Knowledge_Dataset)

To submit an inquiry regarding FieldWork_Knowledge_Dataset, please follow these steps:

  1. Visit our page
  2. Click the "Inquiry" button at the bottom.
  3. Fill out the form completely and accurately.

It may take a few business days to receive a reply.

Inquiries and Support (others)

To submit an inquiry regarding other datasets or demo code, please contact:

Tatsuya Kikuzuki: kikuzuki{at}fujitsu.com

License

See LICENSE for details.

About

A public dataset of "Usable Knowledge" generated using Fujitsu's Knowledge Graph Enhanced RAG (Retrieval-Augmented Generation).
