Skip to content

LLM-powered Excel automation with 110 atomic tools for creating, formatting, and analyzing spreadsheets through natural language

License

Notifications You must be signed in to change notification settings

Bread-Technologies/Bread_Excel_Agent

Repository files navigation

Excel Action Space πŸ“Š

LLM-powered Excel automation with 109 atomic tools

A comprehensive system that enables AI models to create, modify, and analyze Excel files through natural language. Built on a three-tier architecture combining openpyxl (formatting), pandas (analysis), and intelligent bridging between them.


Overview

What is this?

Excel Action Space defines a complete atomic action space that can represent ANY human Excel workflow as a sequence of LLM tool calls.

Think of this as defining the "arms and hands" an LLM needs to perform any Excel task a human could do.

Example transformation:

Human Workflow:
  Click cell A1 β†’ Type "Revenue" β†’ Bold it β†’ Color it blue β†’ Enter formula

LLM Tool Calls:
  openpyxl_worksheet_set_item("A1", "Revenue")
  β†’ openpyxl_cell_set_style("A1", font='{"bold": true}', fill='{"color": "0000FF"}')
  β†’ openpyxl_worksheet_set_item("A2", "=SUM(B2:B10)")

Every click, keystroke, and menu selection in Excel maps to a precise tool call. This project catalogs all 109 atomic operations needed to replicate human Excel expertise.

Core Concept

The Excel Action Space = A comprehensive catalog of every meaningful action a human can perform in Excel, mapped to programmatic tool calls an LLM can execute.

Key Principles:

  • Completeness: If a human can do it in Excel, we can represent it as tool calls
  • Atomicity: Each tool performs one fundamental operation (write cell, set color, add chart)
  • Composability: Complex workflows = sequences of simple atomic actions
  • Executable: All tools map directly to openpyxl/pandas/xlwings methods

Why This Matters

  1. Workflow Learning: LLMs can learn complex patterns (building DCF models, creating dashboards) from example sequences
  2. Complete Coverage: No Excel task is "out of reach" - 95%+ of real-world operations are represented
  3. Training Data Generation: Record expert Excel sessions β†’ Convert to tool call sequences β†’ Train LLMs
  4. Reproducibility: Any Excel workflow becomes a reproducible, version-controlled sequence

Vision: Enable LLMs to build investment banking-quality financial models by providing them the complete set of atomic operations humans use.

The Implementation: openpyxl + pandas

The action space is implemented using two battle-tested Python libraries that together provide comprehensive Excel capabilities:

openpyxl (74 tools) - The Formatting Layer:

  • Covers 95%+ of everyday Excel operations
  • File manipulation, cell formatting, charts, formulas, worksheet management
  • Everything you'd accomplish clicking through Excel's UI
  • Works without Excel installed (pure Python implementation)
  • Creates professional-looking spreadsheets programmatically

pandas (30 tools) - The Analysis Layer:

  • Advanced data operations beyond Excel's native capabilities
  • Bulk transformations on massive datasets (millions of rows)
  • Statistical analysis Excel can't perform efficiently
  • Complex aggregations, pivots, and merges
  • Transforms that would take hours manually

Bridge Tools (3 tools) - Seamless Integration:

  • Connect both worlds: use Excel formulas, analyze with pandas power
  • Calculate formulas via xlwings (leverages Excel's calculation engine)
  • Transfer data bidirectionally while maintaining integrity

What This Enables:

  • βœ… Everything desktop Excel can do (formatting, charts, pivot tables, data validation)
  • βœ… Advanced operations beyond Excel (complex statistical analysis, ML-ready transformations)
  • βœ… Professional financial models (DCF valuations, LBO models, Monte Carlo simulations)
  • βœ… 10-100x faster than manual clicking through Excel UI

Quick Start ⚑

Follow these steps:

Prerequisites

  • Python 3.8+
  • An LLM API key (OpenAI, Anthropic, Claude, or any LiteLLM-compatible provider)

Installation

  1. Clone and navigate to the project:

    git clone <your-repo-url>
    cd Excel_Agent
  2. Create virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt

Configuration

⚠️ IMPORTANT: Never commit API keys to version control!

  1. Copy the environment template:

    cp .env.example .env
  2. Edit .env with your credentials:

    # Open in your favorite editor
    nano .env
  3. Fill in your API details:

    OPENAI_API_BASE=https://api.openai.com/v1
    OPENAI_API_KEY=sk-your-actual-key-here
    OPENAI_MODEL_NAME=gpt-4
  4. The .env file is in .gitignore and will never be committed

Supported LLM Providers (via LiteLLM):

  • OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
  • Anthropic: claude-opus-4-6, claude-sonnet-4-5
  • Custom endpoints: Any OpenAI-compatible API

For other providers, see the LiteLLM documentation.

Run Your First Automation

Interactive Mode (recommended for learning):

python run.py
# Choose option 2 for interactive mode

Example commands to try:

> Create a spreadsheet with quarterly sales data and save it as sales.xlsx
> Load Apple_DCF_Model.xlsx and explain the key assumptions
> Create a financial model with 5-year revenue projections

Programmatic Usage:

from excel_action_space.integration.llm_client import LLMExcelAutomation

automation = LLMExcelAutomation()
automation.process_user_request(
    "Create a financial model with revenue projections for 2024-2028. "
    "Include rows for Revenue, COGS, Gross Profit, and Operating Expenses. "
    "Use formulas and format as currency. Save as financial_model.xlsx"
)

Start Hacking πŸš€

You're now set up! Here's how to explore:

  1. See available tools:

    • Open excel_action_space/tools/unified_definitions.json to see all 109 tools
    • Each tool has name, description, and parameters
  2. Extend functionality:

    • Add new openpyxl tools: excel_action_space/tools/openpyxl_executor.py
    • Add new pandas tools: excel_action_space/tools/pandas_executor.py
    • Add bridge operations: excel_action_space/tools/bridge_functions.py
  3. Test your changes:

    python run.py  # Interactive mode for quick testing
  4. Understand the flow:

    • User request β†’ LLM (via LiteLLM) β†’ Tool calls β†’ UnifiedExcelExecutor β†’ Excel files

Architecture Overview

Three-Tier System

109 tools organized into three complementary categories:

1. Openpyxl Tools (66 tools) - Formatting & Excel Features

  • File operations (create, load, save, close)
  • Cell operations (read, write, copy, access)
  • Formatting (fonts, colors, borders, alignment, number formats)
  • Charts (bar, line, pie, scatter, area + inspection)
  • Formulas (insert, copy, fill ranges efficiently)
  • Worksheet management (create, delete, copy sheets)
  • Excel features (data validation, protection, freezing, comments)
  • Prefix: openpyxl_*

2. Pandas Tools (30 tools) - Data Analysis & Transformation

  • File I/O (read Excel, CSV, write to Excel with multiple sheets)
  • DataFrame operations (filter, sort, group, pivot)
  • Data transformations (merge, join, aggregate)
  • Statistical analysis (describe, correlate, calculate)
  • DataFrame management (select, list, inspect)
  • Prefix: pandas_*

3. Bridge Tools (3 tools) - Seamless Integration

  • bridge_openpyxl_to_pandas - Load Excel with calculated formulas into pandas
  • bridge_pandas_to_openpyxl - Transfer DataFrame to openpyxl for formatting
  • bridge_sync_file - Synchronize data between both systems
  • Prefix: bridge_*

Critical Workflow Pattern ⚠️

Save-Before-Bridge Rule (enforced by error):

When using formulas with pandas analysis, you MUST follow this sequence:

1. Load or create workbook (openpyxl)
2. Add formulas and formatting
3. SAVE the workbook ← MANDATORY (openpyxl_workbook_save)
4. Bridge to pandas (bridge_openpyxl_to_pandas)
5. Analyze data with pandas

Why is this enforced?

  • The bridge uses xlwings to calculate Excel formulas
  • xlwings breaks openpyxl's file handle (unavoidable)
  • We reload from disk to fix the handle
  • Without saving first, all your changes are LOST

What happens if you skip the save?

Error: "CRITICAL: Workbook has UNSAVED changes. You MUST call 
openpyxl_workbook_save first to prevent data loss..."

The LLM will see this error and automatically correct the workflow.

How It Works

User Request
    ↓
LiteLLM (with 109 tools)
    ↓
Tool Call(s) Generated
    ↓
UnifiedExcelExecutor
    ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
↓                   ↓                   ↓
OpenpyxlExecutor    PandasExecutor    BridgeConnector
(74 tools)          (30 tools)         (3 tools)
    ↓                   ↓                   ↓
Excel Files        DataFrames         Synced Data

State Management:

  • Single openpyxl workbook active at a time
  • Multiple pandas DataFrames (tracked by ID: df_1, df_2, etc.)
  • Bridge coordinates between both systems
  • Modified flag tracks unsaved changes

Project Structure

Excel_Agent/
β”œβ”€β”€ excel_action_space/
β”‚   β”œβ”€β”€ tools/                      # Core tool implementation
β”‚   β”‚   β”œβ”€β”€ openpyxl_executor.py    # 66 openpyxl tools (formatting, charts, formulas)
β”‚   β”‚   β”œβ”€β”€ pandas_executor.py      # 30 pandas tools (analysis, transformations)
β”‚   β”‚   β”œβ”€β”€ bridge_functions.py     # 3 bridge tools (openpyxl ↔ pandas)
β”‚   β”‚   β”œβ”€β”€ unified_executor.py     # Routes tool calls to correct executor
β”‚   β”‚   └── unified_definitions.json # All 109 tool definitions (LiteLLM format)
β”‚   β”‚
β”‚   β”œβ”€β”€ integration/
β”‚   β”‚   └── llm_client.py           # LiteLLM integration + interactive mode
β”‚   β”‚
β”‚   β”œβ”€β”€ discovery/                  # Research & exploration
β”‚   β”‚   β”œβ”€β”€ research/               # Background on Python Excel libraries
β”‚   β”‚   └── exploration/            # Scripts that discovered tool definitions
β”‚   β”‚
β”‚   β”œβ”€β”€ workflows/                  # Example workflows (currently empty - contribute!)
β”‚   β”œβ”€β”€ tests/                      # Unit and integration tests
β”‚   β”œβ”€β”€ outputs/                    # Generated Excel files
β”‚   └── README.md                   # Detailed documentation
β”‚
β”œβ”€β”€ run.py                          # Quick entry point for testing
β”œβ”€β”€ requirements.txt                # All dependencies
β”œβ”€β”€ plan.md                         # Original project vision
└── README.md                       # This file

Key Files Explained:

  • unified_executor.py: Routes tool calls to the right executor (openpyxl/pandas/bridge)
  • openpyxl_executor.py: Implements all 66 openpyxl tools (the formatting/Excel features layer)
  • pandas_executor.py: Implements all 30 pandas tools (the analysis layer)
  • bridge_functions.py: Implements the 3 critical bridge tools (connects the layers)
  • unified_definitions.json: All 109 tool definitions in LiteLLM format (read this!)
  • llm_client.py: Connects LiteLLM to the executor, handles conversation flow

Development Guide

Adding New Tools

1. Add an Openpyxl Tool

File: excel_action_space/tools/openpyxl_executor.py

def _your_new_tool(self, param1, param2=None):
    """
    Your tool description
    
    Args:
        param1: Description
        param2: Optional description
    """
    if not self.workbook:
        return error("No workbook loaded")
    
    # Your implementation
    self.modified = True  # Mark as modified if changes made
    return success("Operation completed")

Then add to unified_definitions.json:

{
  "type": "function",
  "function": {
    "name": "openpyxl_your_new_tool",
    "description": "Clear description of what it does",
    "parameters": {
      "type": "object",
      "properties": {
        "param1": {
          "type": "string",
          "description": "What this parameter does"
        }
      },
      "required": ["param1"]
    }
  }
}

2. Add a Pandas Tool

File: excel_action_space/tools/pandas_executor.py

Similar process - add method, update unified_definitions.json.

3. Add a Bridge Tool

File: excel_action_space/tools/bridge_functions.py

Be careful with state management between openpyxl and pandas.

Testing Changes

  1. Interactive testing:

    python run.py
    # Option 2: Interactive mode
    > Test your new tool here
  2. Programmatic testing:

    automation = LLMExcelAutomation()
    automation.process_user_request("Use my new tool to...")
  3. Check state:

    state = automation.get_executor_state()
    print(state)

Understanding the Executor Flow

  1. User prompt β†’ LLMExcelAutomation.process_user_request()
  2. LiteLLM generates tool calls β†’ Based on 109 tool definitions
  3. UnifiedExcelExecutor routes β†’ Checks tool prefix (openpyxl_, pandas_, bridge_)
  4. Individual executor executes β†’ Returns {'status': 'success'/'error', 'result': ...}
  5. Result fed back to LLM β†’ LLM can call more tools or respond
  6. Loop until complete β†’ Max 100 iterations

Key Design Principles

  1. Atomic Operations - Each tool does one thing well
  2. Explicit State - Track modified flag, filenames, DataFrame IDs
  3. Fail-Safe - Enforce save-before-bridge to prevent data loss
  4. LLM-Friendly - Clear error messages that guide the LLM
  5. Composable - Complex workflows = sequences of simple tools

Documentation

  • This README - Quick start and overview
  • excel_action_space/README.md - Detailed system architecture
  • excel_action_space/tools/unified_definitions.json - Complete tool catalog
  • plan.md - Original project vision and research
  • excel_action_space/OPTION2_IMPLEMENTATION_COMPLETE.md - Save-before-bridge implementation details
  • excel_action_space/discovery/ - Research on Python Excel libraries

Contributing

Contributions welcome! Areas of interest:

  1. Example workflows - Add to excel_action_space/workflows/examples/
  2. Test coverage - Add to excel_action_space/tests/
  3. New tools - Extend executor functionality
  4. Documentation - Improve guides and examples
  5. Bug fixes - Especially around edge cases

Contribution Guidelines

  1. Ensure you're on the pandas branch
  2. Test changes in interactive mode
  3. Update unified_definitions.json if adding tools
  4. Follow existing code patterns
  5. Add clear docstrings

License

MIT License - See project repository for details


Credits

Built with comprehensive research into Python Excel libraries and real-world financial modeling requirements. See excel_action_space/discovery/research/ for background research.

Key Technologies:


Support

For bugs or feature requests, use the project's issue tracker.


Now go build something awesome! πŸš€

About

LLM-powered Excel automation with 110 atomic tools for creating, formatting, and analyzing spreadsheets through natural language

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages