Custom Graph Schema

Custom Graph Schema - Complete Guide

This guide walks you through creating custom graph databases with RoboSystems using the schema.json template. Learn how to design, implement, and query your own graph structures for any domain.

Overview

The Custom Graph Schema demo demonstrates how to create graph databases with custom node types, properties, and relationships. This approach enables:

Custom Data Models: Define any graph structure for your domain
Flexible Schema Design: Nodes and relationships tailored to your needs
Reusable Templates: schema.json as a copy-and-customize starting point
Type-Safe Schemas: Validated property types and required fields
Graph-Native Queries: Leverage Cypher for powerful data analysis
AI-Powered Analysis: Query demo graph data using natural language through any MCP-compatible AI tool

Example Domain - People, Companies, and Projects:

3 Node Types: Person, Company, Project
3 Relationship Types: Employment, Project Participation, Sponsorship
75 Generated Entities: 50 people, 10 companies, and 15 projects, plus the relationships between them
Interactive Queries: Explore collaboration patterns, team structures, and more

The schema.json file is the official RoboSystems template for custom graph schemas - copy it and customize for your own use cases!

Prerequisites

Before starting, ensure you have:

Docker running locally
RoboSystems development environment set up
Services started with just start

Quick Start

The fastest way to run the complete demo:

# Ensure RoboSystems is running
just start

# Run complete workflow
just demo-custom-graph

What this does:

Creates user account and API key
Creates a new graph database using schema.json
Generates sample data (people, companies, projects)
Uploads and ingests data into the graph
Runs verification queries and prints the results as formatted tables

First run: Takes ~1-2 minutes to complete all steps.

Subsequent runs: Reuses credentials and graph (~20 seconds).

Command syntax: just demo-custom-graph [flags] [base_url]

Flags are comma-separated: new-user,new-graph,skip-queries
Base URL defaults to http://localhost:8000
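As an illustration of the command syntax above, here is one way the arguments could be interpreted. The flag names and the default URL come from this guide. The `parse_demo_args` function is hypothetical, and the actual justfile recipe may parse its arguments differently.

```python
from typing import Optional, Set, Tuple

# Hypothetical: flag names come from this guide; the parsing logic is an
# illustration, not the actual justfile recipe.
KNOWN_FLAGS = {"new-user", "new-graph", "skip-queries"}
DEFAULT_BASE_URL = "http://localhost:8000"


def parse_demo_args(flags: str = "", base_url: Optional[str] = None) -> Tuple[Set[str], str]:
    """Split a comma-separated flag string and fall back to the default base URL."""
    selected = {flag.strip() for flag in flags.split(",") if flag.strip()}
    unknown = selected - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"Unknown flag(s): {', '.join(sorted(unknown))}")
    return selected, base_url or DEFAULT_BASE_URL


flags, url = parse_demo_args("new-user,new-graph")
print(sorted(flags), url)  # ['new-graph', 'new-user'] http://localhost:8000
```

Because the flags are collected into a set, their order does not matter. `new-graph,new-user` and `new-user,new-graph` select the same steps.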

Quick Start Options

# Start fresh with new user and graph
just demo-custom-graph new-user,new-graph

# Create new graph (keep existing user)
just demo-custom-graph new-graph

# Skip verification queries
just demo-custom-graph skip-queries

# Combine multiple flags
just demo-custom-graph new-user,new-graph,skip-queries

The Schema Template: `schema.json`

Location: examples/custom_graph_demo/schema.json

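This is the official RoboSystems template for creating custom graph schemas. It demonstrates best practices for schema design and serves as your starting point for any custom graph database. The excerpt below is abridged: it shows only the Person node and one relationship. The full file also defines the Company and Project nodes and the remaining relationships.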

Schema Structure

{
  "name": "custom_graph_demo",
  "version": "1.0.0",
  "description": "People, companies, and projects schema",
  "extends": "base",
  "nodes": [
    {
      "name": "Person",
      "properties": [
        {"name": "identifier", "type": "STRING", "is_primary_key": true},
        {"name": "name", "type": "STRING", "is_required": true},
        {"name": "age", "type": "INT64"},
        {"name": "title", "type": "STRING"}
      ]
    }
  ],
  "relationships": [
    {
      "name": "PERSON_WORKS_FOR_COMPANY",
      "from_node": "Person",
      "to_node": "Company",
      "properties": [
        {"name": "role", "type": "STRING"}
      ]
    }
  ],
  "metadata": {
    "domain": "custom_graph_demo"
  }
}

Property Types

Supported data types in your schema:

STRING - Text values
INT64 - 64-bit integers
DOUBLE - Floating point numbers
BOOLEAN - True/false values
DATE - Date values, written as ISO-format strings (e.g. 2024-01-15)

Schema Attributes

is_primary_key: true - Unique identifier for the node
is_required: true - Field must have a value
extends: "base" - Inherit base schema properties (identifier, timestamps)
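You can catch schema mistakes locally before creating a graph. The sketch below checks a schema.json-style dictionary against the rules described in this guide:

- every property type must appear in the list above;
- each node has exactly one primary key (following the "Always Include Primary Keys" practice);
- relationship endpoints must name defined nodes.

This is illustrative, not RoboSystems' own validation. It only knows about nodes declared in the file itself, so it would flag nodes inherited through `"extends": "base"` as undefined. For the same reason it would flag the abridged excerpt above.

```python
from typing import Any, Dict, List

# Types listed in this guide's "Property Types" section.
SUPPORTED_TYPES = {"STRING", "INT64", "DOUBLE", "BOOLEAN", "DATE"}


def validate_schema(schema: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a schema.json-style dict (empty list if none)."""
    problems: List[str] = []
    node_names = set()

    for node in schema.get("nodes", []):
        name = node.get("name", "<unnamed>")
        node_names.add(name)
        props = node.get("properties", [])
        # Best practice in this guide: exactly one primary key per node.
        pk_count = sum(1 for p in props if p.get("is_primary_key"))
        if pk_count != 1:
            problems.append(f"node {name}: expected 1 primary key, found {pk_count}")
        for p in props:
            if p.get("type") not in SUPPORTED_TYPES:
                problems.append(f"node {name}.{p.get('name')}: unsupported type {p.get('type')!r}")

    for rel in schema.get("relationships", []):
        rel_name = rel.get("name", "<unnamed>")
        # Both endpoints must be node types declared in this same file.
        for end in ("from_node", "to_node"):
            if rel.get(end) not in node_names:
                problems.append(f"relationship {rel_name}: {end} {rel.get(end)!r} is not a defined node")
        for p in rel.get("properties", []):
            if p.get("type") not in SUPPORTED_TYPES:
                problems.append(f"relationship {rel_name}.{p.get('name')}: unsupported type {p.get('type')!r}")

    return problems


schema = {
    "nodes": [
        {"name": "Person", "properties": [
            {"name": "identifier", "type": "STRING", "is_primary_key": True},
            {"name": "age", "type": "INT64"},
        ]},
    ],
    "relationships": [
        {"name": "PERSON_WORKS_FOR_COMPANY", "from_node": "Person", "to_node": "Company"},
    ],
}
for problem in validate_schema(schema):
    print(problem)
# relationship PERSON_WORKS_FOR_COMPANY: to_node 'Company' is not a defined node
```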

Step-by-Step Walkthrough

The just demo-custom-graph command runs all 5 steps automatically. This section explains what happens during each step.

Step 1: Setup Credentials (`01_setup_credentials.py`)

What happens automatically:

Creates new user in PostgreSQL database
Generates API key for authentication
Stores credentials locally in examples/credentials/config.json

Control via flags:

just demo-custom-graph new-user  # Force new credentials

Manual execution (if needed):

uv run examples/custom_graph_demo/01_setup_credentials.py
uv run examples/custom_graph_demo/01_setup_credentials.py --force  # Force new

Step 2: Create Graph Database (`02_create_graph.py`)

What happens automatically:

Reads schema.json from the demo directory
Creates new Ladybug graph database with custom schema
Registers graph with user account
Stores graph_id in credentials/config.json

Control via flags:

just demo-custom-graph new-graph  # Force new graph

Manual execution (if needed):

uv run examples/custom_graph_demo/02_create_graph.py
uv run examples/custom_graph_demo/02_create_graph.py --reuse  # Reuse existing

Customizing the Schema:

The script loads schema.json from the same directory. To use your own schema:

# In 02_create_graph.py:
def build_custom_schema_definition() -> CustomSchemaDefinition:
    schema_file = Path(__file__).parent / "schema.json"
    # Change to: schema_file = Path(__file__).parent / "my_schema.json"

Step 3: Generate Data (`03_generate_data.py`)

What happens automatically:

Generates sample data matching the schema structure
Creates Parquet files in examples/custom_graph_demo/data/ directory
Includes: Person, Company, Project nodes and their relationships
Validates all required properties are present

Generated data includes:

50 People with realistic names, ages, titles, and interests
10 Companies across various industries and locations
15 Projects with budgets, statuses, and timelines
Employment relationships (Person → Company)
Project participation (Person → Project)
Sponsorship relationships (Company → Project)
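The Step 3 check that "all required properties are present" can be pictured as follows. This sketch is not the code in `03_generate_data.py`, and `find_missing_required` is a hypothetical name. It works under these assumptions:

- both `is_required` and `is_primary_key` properties count as mandatory;
- a missing key or a `None` value counts as absent;
- it runs on plain row dictionaries before they are written to Parquet.

```python
from typing import Any, Dict, List, Tuple


def find_missing_required(rows: List[Dict[str, Any]], node_schema: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (row_index, property_name) for every mandatory property that is absent or None."""
    # Assumption: primary keys are treated as required too.
    required = [
        p["name"]
        for p in node_schema.get("properties", [])
        if p.get("is_required") or p.get("is_primary_key")
    ]
    missing = []
    for i, row in enumerate(rows):
        for name in required:
            if row.get(name) is None:
                missing.append((i, name))
    return missing


person_schema = {
    "name": "Person",
    "properties": [
        {"name": "identifier", "type": "STRING", "is_primary_key": True},
        {"name": "name", "type": "STRING", "is_required": True},
        {"name": "age", "type": "INT64"},
    ],
}
rows = [
    {"identifier": "p-1", "name": "Alice Johnson", "age": 34},
    {"identifier": "p-2", "age": 41},  # "name" is missing
]
print(find_missing_required(rows, person_schema))  # [(1, 'name')]
```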

Manual execution (if needed):

uv run examples/custom_graph_demo/03_generate_data.py
uv run examples/custom_graph_demo/03_generate_data.py --count 100  # More data
uv run examples/custom_graph_demo/03_generate_data.py --regenerate  # Force regenerate

Step 4: Upload and Ingest (`04_upload_ingest.py`)

What happens automatically:

Upload: Files uploaded to S3 (LocalStack in development)
Stage: Data loaded into DuckDB staging tables
Validate: Automatic data quality checks
Ingest: DuckDB → Ladybug graph database via extension
Verify: Counts verified, relationships checked
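The Verify stage amounts to comparing the counts you uploaded with the counts the graph reports afterwards, such as those from the `summary` preset. Here is a minimal sketch of that comparison. It assumes both sides have already been collected into label-to-count dictionaries. `compare_counts` is a hypothetical helper, and the sketch does not show how `04_upload_ingest.py` actually gathers or compares counts.

```python
from typing import Dict, Tuple


def compare_counts(expected: Dict[str, int], actual: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {label: (expected, actual)} for every label whose counts differ."""
    labels = set(expected) | set(actual)
    return {
        label: (expected.get(label, 0), actual.get(label, 0))
        for label in sorted(labels)
        if expected.get(label, 0) != actual.get(label, 0)
    }


# Expected counts match the demo's generated data; the "actual" numbers are made up
# to show what a mismatch report looks like.
expected = {"Person": 50, "Company": 10, "Project": 15}
actual = {"Person": 50, "Company": 10, "Project": 14}
print(compare_counts(expected, actual))  # {'Project': (15, 14)}
```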

Manual execution (if needed):

uv run examples/custom_graph_demo/04_upload_ingest.py

Step 5: Query the Graph (`05_query_graph.py`)

What happens automatically:

Executes all preset queries
Displays results in formatted Rich tables
Shows node counts, relationships, and analysis queries

Control via flags:

just demo-custom-graph skip-queries  # Skip this step

Manual execution (if needed):

# Run all presets
uv run examples/custom_graph_demo/05_query_graph.py --all

# Run specific preset
uv run examples/custom_graph_demo/05_query_graph.py --preset people

# Interactive query mode
uv run examples/custom_graph_demo/05_query_graph.py

Available Preset Queries

The demo includes 10 preset queries demonstrating common graph patterns. Seven of them are shown below, followed by the full list of preset names:

1. Summary - Node Counts

View the overall graph structure:

uv run examples/custom_graph_demo/05_query_graph.py --preset summary

Output:

         Overview of graph structure
┏━━━━━━━━━┳━━━━━━━━┓
┃ label   ┃ count  ┃
┡━━━━━━━━━╇━━━━━━━━┩
│ Person  │ 50     │
│ Company │ 10     │
│ Project │ 15     │
└─────────┴────────┘

2. People - Employment Information

View all people with their roles and companies:

MATCH (p:Person)-[:PERSON_WORKS_FOR_COMPANY]->(c:Company)
RETURN
  p.name,
  p.title,
  c.name AS company,
  p.interests
ORDER BY p.name

3. Companies - Team Overview

View companies with team sizes and sponsored projects:

MATCH (c:Company)
OPTIONAL MATCH (c)<-[:PERSON_WORKS_FOR_COMPANY]-(p:Person)
OPTIONAL MATCH (c)-[:COMPANY_SPONSORS_PROJECT]->(proj:Project)
RETURN
  c.name,
  c.industry,
  c.location,
  count(DISTINCT p) AS team_members,
  count(DISTINCT proj) AS sponsored_projects
ORDER BY team_members DESC

4. Projects - Active Projects

View active projects with team sizes and sponsors:

MATCH (proj:Project)
WHERE proj.status = 'active'
OPTIONAL MATCH (proj)<-[:PERSON_WORKS_ON_PROJECT]-(p:Person)
OPTIONAL MATCH (proj)<-[:COMPANY_SPONSORS_PROJECT]-(c:Company)
RETURN
  proj.name,
  proj.budget,
  count(DISTINCT p) AS team_size,
  collect(DISTINCT c.name) AS sponsors
ORDER BY proj.budget DESC

5. Employment - Who Works Where

See all employment relationships:

MATCH (p:Person)-[:PERSON_WORKS_FOR_COMPANY]->(c:Company)
RETURN p.name AS person, c.name AS company, c.industry
ORDER BY c.name, p.name

6. Project Teams - Team Composition

View project teams with their members:

MATCH (p:Person)-[:PERSON_WORKS_ON_PROJECT]->(proj:Project)
MATCH (proj)<-[:COMPANY_SPONSORS_PROJECT]-(c:Company)
RETURN
  proj.name AS project,
  proj.status,
  proj.budget,
  collect(DISTINCT p.name) AS team_members,
  collect(DISTINCT c.name) AS sponsors
ORDER BY proj.name

7. Cross-Company Collaboration

Discover cross-company project collaborations:

MATCH (p1:Person)-[:PERSON_WORKS_FOR_COMPANY]->(c1:Company),
      (p2:Person)-[:PERSON_WORKS_FOR_COMPANY]->(c2:Company),
      (p1)-[:PERSON_WORKS_ON_PROJECT]->(proj:Project),
      (p2)-[:PERSON_WORKS_ON_PROJECT]->(proj)
WHERE c1.identifier <> c2.identifier AND p1.identifier < p2.identifier
RETURN
  proj.name AS project,
  c1.name AS company_a,
  c2.name AS company_b,
  count(*) AS cross_company_pairs
ORDER BY cross_company_pairs DESC
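To see what the two `WHERE` conditions in the cross-company query do, here is the same logic applied to a small in-memory dataset. The people, companies, and projects are hypothetical sample data, not demo output. `combinations()` over sorted IDs plays the role of `p1.identifier < p2.identifier`, and the employer comparison plays the role of `c1.identifier <> c2.identifier`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: person -> employer, and project -> participating people.
employer = {"p1": "Acme", "p2": "Globex", "p3": "Acme"}
project_members = {
    "Cloud Migration": ["p1", "p2", "p3"],
    "Data Lake": ["p1", "p3"],
}

pair_counts = Counter()
for project, members in project_members.items():
    # Each unordered pair of people is visited once, like p1.identifier < p2.identifier.
    for a, b in combinations(sorted(members), 2):
        # Skip pairs from the same company, like c1.identifier <> c2.identifier.
        if employer[a] != employer[b]:
            pair_counts[(project, employer[a], employer[b])] += 1

for (project, company_a, company_b), n in pair_counts.most_common():
    print(project, company_a, company_b, n)
```

With this data the same company pair appears twice, once as (Acme, Globex) and once as (Globex, Acme). That happens because pairs are ordered by person, not by company, and the Cypher query behaves the same way. Sorting the two company names before counting would merge such rows.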

All Available Presets

summary - Node and relationship counts
people - People with employment info
companies - Companies with team sizes
projects - Active projects overview
employment - Employment relationships
project_teams - Project team composition
cross_company - Cross-company collaboration
company_network - Company collaboration network
person_network - Person collaboration network
industries - Companies by industry

Interactive Query Mode

After running the demo, explore your graph data interactively:

uv run examples/custom_graph_demo/05_query_graph.py

This launches an interactive session where you can:

Run Preset Queries by Name:

> people
> projects
> cross_company

Execute Custom Cypher Queries:

> MATCH (p:Person) WHERE p.age > 40 RETURN p.name, p.age, p.title ORDER BY p.age DESC

List Available Presets:

> presets

Exit the Session:

> quit

Features:

Formatted Tables: All results are displayed as Rich-formatted tables
Instant Feedback: See results immediately after each query
Explore Freely: Test different queries without rerunning the script
Learning Tool: A convenient way to practice Cypher query patterns
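The commands above suggest a simple dispatch rule:

- `quit` ends the session;
- `presets` lists the available presets;
- a preset name runs that preset's query;
- anything else is sent as raw Cypher.

Here is a hypothetical sketch of that routing. The `route` function and the one-entry `PRESETS` table are illustrative, the query text is the `employment` preset shown earlier, and `05_query_graph.py` may handle input differently.

```python
from typing import Dict, Tuple

# Hypothetical preset table; the query text is the "employment" preset from this guide.
PRESETS: Dict[str, str] = {
    "employment": (
        "MATCH (p:Person)-[:PERSON_WORKS_FOR_COMPANY]->(c:Company) "
        "RETURN p.name AS person, c.name AS company, c.industry "
        "ORDER BY c.name, p.name"
    ),
}


def route(line: str, presets: Dict[str, str]) -> Tuple[str, str]:
    """Classify one line of input as ('empty'|'quit'|'list'|'query', payload)."""
    text = line.strip()
    if not text:
        return ("empty", "")
    if text == "quit":
        return ("quit", "")
    if text == "presets":
        return ("list", ", ".join(sorted(presets)))
    # A known preset name expands to its Cypher; anything else is treated as raw Cypher.
    return ("query", presets.get(text, text))


print(route("employment", PRESETS)[0])  # query
print(route("MATCH (p:Person) RETURN p.name", PRESETS))
```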

Accessing with MCP Client (For AI Agents)

You can access the demo custom graph through any MCP-compatible AI tool (Claude Desktop, Claude Code, Cursor, Cline, etc.) using the MCP protocol.

Set Up the MCP Client:

Run just demo-custom-graph to create credentials automatically (your API key is saved to examples/credentials/config.json)
Get your API key and graph ID from the credentials file:

cat examples/credentials/config.json | grep -E "api_key|graph_id"

Add to your MCP tool config. For Claude Desktop:

{
  "mcpServers": {
    "robosystems": {
      "command": "npx",
      "args": ["-y", "@robosystems/mcp"],
      "env": {
        "ROBOSYSTEMS_API_URL": "http://localhost:8000",
        "ROBOSYSTEMS_API_KEY": "rfsabc123xyz...",
        "ROBOSYSTEMS_GRAPH_ID": "your-graph-id"
      }
    }
  }
}

Important: Replace rfsabc123xyz... with your actual API key and your-graph-id with your actual graph ID from the credentials file.

Restart your MCP-compatible AI tool
The MCP server provides these tools:
- get-graph-schema - View available node and relationship types
- read-graph-cypher - Run Cypher queries
- discover-properties - Explore node properties
- get-example-queries - Get sample queries
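You can also generate the Claude Desktop config shown above instead of editing it by hand. This minimal sketch builds the same `mcpServers` structure from an API key and graph ID. It assumes you have already read those values from `examples/credentials/config.json`. `build_mcp_config` is a hypothetical helper, and merging its output into an existing config file is left to you.

```python
import json
from typing import Any, Dict


def build_mcp_config(api_key: str, graph_id: str, api_url: str = "http://localhost:8000") -> Dict[str, Any]:
    """Return the "mcpServers" entry from this guide with the given credentials filled in."""
    return {
        "mcpServers": {
            "robosystems": {
                "command": "npx",
                "args": ["-y", "@robosystems/mcp"],
                "env": {
                    "ROBOSYSTEMS_API_URL": api_url,
                    "ROBOSYSTEMS_API_KEY": api_key,
                    "ROBOSYSTEMS_GRAPH_ID": graph_id,
                },
            }
        }
    }


# Placeholder values: substitute the api_key and graph_id from your credentials file.
print(json.dumps(build_mcp_config("rfsabc123xyz...", "your-graph-id"), indent=2))
```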

Example MCP Usage:

You: Show me all people who work on multiple projects in the demo

The AI will use:
1. get-graph-schema to understand Person and Project relationships
2. discover-properties to find relevant Person and Project properties
3. read-graph-cypher to query for multi-project contributors

You: Which companies collaborate on the same projects in the demo?

The AI will use:
1. get-graph-schema to understand the COMPANY_SPONSORS_PROJECT relationships
2. read-graph-cypher to find shared projects between companies
3. Present collaboration patterns with company names and project details

Creating Your Own Custom Schema

The schema.json file is your template. Here's how to create your own graph database schema:

Step 1: Copy the Template

cd examples/custom_graph_demo
cp schema.json my_custom_schema.json

Step 2: Define Your Node Types

Edit my_custom_schema.json and define your nodes:

{
  "name": "my_custom_graph",
  "version": "1.0.0",
  "description": "My custom domain graph",
  "extends": "base",
  "nodes": [
    {
      "name": "Product",
      "properties": [
        {"name": "identifier", "type": "STRING", "is_primary_key": true},
        {"name": "name", "type": "STRING", "is_required": true},
        {"name": "price", "type": "DOUBLE"},
        {"name": "category", "type": "STRING"},
        {"name": "in_stock", "type": "BOOLEAN"}
      ]
    },
    {
      "name": "Customer",
      "properties": [
        {"name": "identifier", "type": "STRING", "is_primary_key": true},
        {"name": "name", "type": "STRING", "is_required": true},
        {"name": "email", "type": "STRING"},
        {"name": "signup_date", "type": "STRING"}
      ]
    },
    {
      "name": "Order",
      "properties": [
        {"name": "identifier", "type": "STRING", "is_primary_key": true},
        {"name": "order_date", "type": "STRING", "is_required": true},
        {"name": "total_amount", "type": "DOUBLE"},
        {"name": "status", "type": "STRING"}
      ]
    }
  ]
}

Step 3: Define Your Relationships

Add a relationships array to the same file, alongside nodes:

{
  "relationships": [
    {
      "name": "CUSTOMER_PLACED_ORDER",
      "from_node": "Customer",
      "to_node": "Order",
      "properties": [
        {"name": "order_number", "type": "STRING"}
      ]
    },
    {
      "name": "ORDER_CONTAINS_PRODUCT",
      "from_node": "Order",
      "to_node": "Product",
      "properties": [
        {"name": "quantity", "type": "INT64"},
        {"name": "unit_price", "type": "DOUBLE"}
      ]
    }
  ]
}

Step 4: Update the Graph Creation Script

Edit 02_create_graph.py to use your schema:

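import json
from pathlib import Path

# CustomSchemaDefinition is imported (or defined) in 02_create_graph.py.


def build_custom_schema_definition() -> CustomSchemaDefinition:
    # Resolve the schema file relative to this script, not the working directory
    schema_file = Path(__file__).parent / "my_custom_schema.json"

    if not schema_file.exists():
        raise FileNotFoundError(f"Schema file not found: {schema_file}")

    with open(schema_file, encoding="utf-8") as f:
        schema_dict = json.load(f)

    return CustomSchemaDefinition.from_dict(schema_dict)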

Step 5: Generate Matching Data

Update 03_generate_data.py to generate data matching your schema:

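import random
import uuid
from pathlib import Path

import pandas as pd

# Generate 100 Product nodes with random sample values
products_data = []
for i in range(100):
    products_data.append({
        "identifier": str(uuid.uuid4()),
        "name": f"Product {i}",
        "price": round(random.uniform(10.0, 500.0), 2),
        "category": random.choice(["Electronics", "Clothing", "Books"]),
        "in_stock": random.choice([True, False]),
    })

# Save to Parquet (needs pyarrow or fastparquet); create the output directory first
output_dir = Path("data/nodes")
output_dir.mkdir(parents=True, exist_ok=True)
df = pd.DataFrame(products_data)
df.to_parquet(output_dir / "Product.parquet", index=False)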

Step 6: Run Your Custom Pipeline

uv run examples/custom_graph_demo/02_create_graph.py
uv run examples/custom_graph_demo/03_generate_data.py
uv run examples/custom_graph_demo/04_upload_ingest.py
uv run examples/custom_graph_demo/05_query_graph.py --all

Schema Design Best Practices

1. Use Meaningful Node Names

✅ Good: "Customer", "Product", "Order"
❌ Bad: "Node1", "Entity", "Thing"

2. Choose Appropriate Property Types

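{
  "properties": [
    {"name": "price", "type": "DOUBLE"},
    {"name": "quantity", "type": "INT64"},
    {"name": "active", "type": "BOOLEAN"},
    {"name": "created_at", "type": "STRING"}
  ]
}

DOUBLE suits prices used in numeric calculations. INT64 suits whole-number quantities, and BOOLEAN suits true/false flags such as active. STRING holds ISO date strings such as created_at. JSON does not allow comments, so keep annotations like these out of the schema file itself.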

3. Always Include Primary Keys

{
  "properties": [
    {"name": "identifier", "type": "STRING", "is_primary_key": true}
  ]
}

4. Mark Required Fields

{
  "properties": [
    {"name": "name", "type": "STRING", "is_required": true}
  ]
}

5. Use Descriptive Relationship Names

✅ Good: "CUSTOMER_PLACED_ORDER", "PERSON_WORKS_FOR_COMPANY"
❌ Bad: "HAS", "RELATES_TO", "LINKED"
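The good names above follow one pattern: source node, verb phrase, target node (PERSON_WORKS_FOR_COMPANY, CUSTOMER_PLACED_ORDER, ORDER_CONTAINS_PRODUCT). If you want to enforce that pattern in your own schemas, a lint check could look like the sketch below. It is an illustrative convention check, not a RoboSystems requirement. It also assumes single-word node names: a CamelCase name like LineItem would first need splitting into LINE_ITEM.

```python
from typing import Dict


def follows_naming_convention(rel: Dict[str, str]) -> bool:
    """True if rel["name"] looks like FROMNODE_VERB_TONODE, e.g. PERSON_WORKS_FOR_COMPANY."""
    name = rel["name"]
    prefix = rel["from_node"].upper() + "_"
    suffix = "_" + rel["to_node"].upper()
    # Require something (the verb) between the two node names, so "PERSON_COMPANY" fails.
    return (
        name.startswith(prefix)
        and name.endswith(suffix)
        and len(name) > len(prefix) + len(suffix)
    )


relationships = [
    {"name": "CUSTOMER_PLACED_ORDER", "from_node": "Customer", "to_node": "Order"},
    {"name": "ORDER_CONTAINS_PRODUCT", "from_node": "Order", "to_node": "Product"},
    {"name": "HAS", "from_node": "Customer", "to_node": "Order"},
]
for rel in relationships:
    print(rel["name"], follows_naming_convention(rel))
```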

6. Add Relationship Properties

{
  "name": "PERSON_WORKS_ON_PROJECT",
  "from_node": "Person",
  "to_node": "Project",
  "properties": [
    {"name": "hours_per_week", "type": "INT64"},
    {"name": "role", "type": "STRING"}
  ]
}

Understanding the Data Model

Demo Schema: People, Companies, Projects

Node Types:

Person: Individuals with names, ages, titles, interests
Company: Organizations with industries, locations, founding years
Project: Work initiatives with budgets, statuses, dates

Relationship Types:

PERSON_WORKS_FOR_COMPANY: Employment relationships with roles and start dates
PERSON_WORKS_ON_PROJECT: Project participation with hours and contributions
COMPANY_SPONSORS_PROJECT: Sponsorship with levels and budget commitments

Graph Traversal Example

(alice:Person {name: "Alice Johnson", title: "Software Engineer"})
  -[:PERSON_WORKS_FOR_COMPANY {role: "Senior Engineer"}]->(techcorp:Company {name: "TechCorp"})
(alice)-[:PERSON_WORKS_ON_PROJECT {hours_per_week: 40}]->(cloud:Project {name: "Cloud Migration"})
(techcorp)-[:COMPANY_SPONSORS_PROJECT {budget_committed: 500000}]->(cloud)

This represents: Alice works as a Senior Engineer at TechCorp, dedicating 40 hours/week to the Cloud Migration project, which TechCorp sponsors with $500k.

Visualizing with G.V()

G.V() is our recommended partner for exploring graph databases interactively.

Getting Started

Visit https://gdotv.com/ or download the desktop application
Connect to your custom graph database:
- Database Path: ./data/lbug-dbs/<graph_id>.lbug (get graph_id from credentials/config.json)
Enable "Fetch all edges between vertices" in settings
Run visualization queries to explore your data

Visualization Query Examples

// Visualize people and their companies
MATCH (p:Person)-[:PERSON_WORKS_FOR_COMPANY]->(c:Company)
RETURN p, c
LIMIT 20

// View project teams
MATCH (p:Person)-[:PERSON_WORKS_ON_PROJECT]->(proj:Project)
MATCH (proj)<-[:COMPANY_SPONSORS_PROJECT]-(c:Company)
RETURN p, proj, c
LIMIT 15

// Explore company network through shared projects
MATCH (c1:Company)-[:COMPANY_SPONSORS_PROJECT]->(proj:Project)
      <-[:COMPANY_SPONSORS_PROJECT]-(c2:Company)
WHERE c1.identifier <> c2.identifier
RETURN c1, proj, c2
LIMIT 10

Graph Database Benefits

1. Flexible Data Modeling

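In a relational database, adding a new kind of relationship often means designing a join table and migrating existing tables. In a graph, the model grows by addition:

Add new node types alongside existing ones
Add new relationship types between existing node types
Extend properties as needed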

2. Natural Relationship Representation

// Find colleagues who work on the same projects
MATCH (p1:Person)-[:PERSON_WORKS_ON_PROJECT]->(proj:Project)
      <-[:PERSON_WORKS_ON_PROJECT]-(p2:Person)
WHERE p1.identifier < p2.identifier
RETURN p1.name, p2.name, proj.name

3. Multi-Hop Queries

// Find companies whose employees work on the same projects
MATCH (c1:Company)<-[:PERSON_WORKS_FOR_COMPANY]-(p:Person)
      -[:PERSON_WORKS_ON_PROJECT]->(proj:Project)
      <-[:PERSON_WORKS_ON_PROJECT]-(p2:Person)
      -[:PERSON_WORKS_FOR_COMPANY]->(c2:Company)
WHERE c1.identifier <> c2.identifier
RETURN c1.name, c2.name, count(DISTINCT proj) AS shared_projects

4. Real-Time Analytics

No pre-computed views are needed. Queries run directly against the current graph data, for example:

// Real-time collaboration index: number of projects per person
MATCH (p:Person)-[:PERSON_WORKS_ON_PROJECT]->(proj:Project)
WITH p, count(DISTINCT proj) AS project_count
RETURN p.name, project_count
ORDER BY project_count DESC

Common Use Cases

The custom schema pattern works for many domains:

CRM Systems

Nodes: Customer, Contact, Opportunity, Account
Relationships: WORKS_FOR, OWNS, MANAGES

Supply Chain

Nodes: Supplier, Product, Warehouse, Order
Relationships: SUPPLIES, STORES, SHIPS_TO

Knowledge Graphs

Nodes: Concept, Document, Author, Topic
Relationships: MENTIONS, AUTHORED_BY, RELATED_TO

Social Networks

Nodes: User, Post, Group, Event
Relationships: FOLLOWS, POSTED, MEMBER_OF, ATTENDING

Project Management

Nodes: Task, Milestone, Resource, Team
Relationships: DEPENDS_ON, ASSIGNED_TO, PART_OF

Troubleshooting

Demo Fails: "Schema file not found"

Solution: Ensure schema.json exists in the demo directory:

ls examples/custom_graph_demo/schema.json

Demo Fails: "No credentials found"

Solution: Let the demo create credentials automatically, or force new:

just demo-custom-graph new-user

Demo Fails: "Invalid schema"

Solution: Validate your schema JSON format:

# Check for JSON syntax errors
python -m json.tool examples/custom_graph_demo/schema.json
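To run the same syntax check from Python, for example in a script that runs before `02_create_graph.py`, you can use the standard library's `json.JSONDecodeError`. It exposes the failing line and column. `check_json` is a hypothetical helper. Like `json.tool`, it only catches JSON syntax errors, not schema problems such as an undefined node type.

```python
import json
from typing import Optional


def check_json(text: str) -> Optional[str]:
    """Return 'line L, column C: message' for the first syntax error, or None if the text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError carries the failing position as lineno/colno (1-based).
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# A "//" comment is a common mistake: JSON has no comment syntax.
bad_schema = '{\n  "name": "price",  // numeric\n  "type": "DOUBLE"\n}'
print(check_json(bad_schema))
print(check_json('{"name": "price", "type": "DOUBLE"}'))  # None

# For a real file, pass its contents, e.g.:
# check_json(Path("examples/custom_graph_demo/schema.json").read_text(encoding="utf-8"))
```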

Connection Error

Solution: Ensure RoboSystems services are running:

just start
docker ps  # Verify containers are running

Import Errors

Solution: Install dev dependencies:

just install

Next Steps

Copy schema.json: Use it as your template for custom schemas
Design Your Schema: Define nodes and relationships for your domain
Generate Data: Create Parquet files matching your schema
Query Your Graph: Use Cypher to analyze your custom data
Visualize with G.V(): Explore your graph structure interactively
Learn Cypher: Cypher Manual

Resources

Demo Code: /examples/custom_graph_demo/ in the repository
Schema Template: examples/custom_graph_demo/schema.json
QUICKSTART.md: Detailed quickstart in the demo directory
G.V() Graph IDE: https://gdotv.com/
Cypher Docs: Cypher Manual
RoboSystems API: http://localhost:8000/docs

Support

GitHub Issues: robosystems/issues
Main README: robosystems/README.md
Development Guide: CLAUDE.md
