LLM-powered Excel automation with 109 atomic tools
A comprehensive system that enables AI models to create, modify, and analyze Excel files through natural language. Built on a three-tier architecture combining openpyxl (formatting), pandas (analysis), and intelligent bridging between them.
Excel Action Space defines a complete atomic action space that can represent ANY human Excel workflow as a sequence of LLM tool calls.
Think of this as defining the "arms and hands" an LLM needs to perform any Excel task a human could do.
Example transformation:
Human Workflow:
Click cell A1 β Type "Revenue" β Bold it β Color it blue β Enter formula
LLM Tool Calls:
openpyxl_worksheet_set_item("A1", "Revenue")
β openpyxl_cell_set_style("A1", font='{"bold": true}', fill='{"color": "0000FF"}')
β openpyxl_worksheet_set_item("A2", "=SUM(B2:B10)")
Every click, keystroke, and menu selection in Excel maps to a precise tool call. This project catalogs all 109 atomic operations needed to replicate human Excel expertise.
The Excel Action Space = A comprehensive catalog of every meaningful action a human can perform in Excel, mapped to programmatic tool calls an LLM can execute.
Key Principles:
- Completeness: If a human can do it in Excel, we can represent it as tool calls
- Atomicity: Each tool performs one fundamental operation (write cell, set color, add chart)
- Composability: Complex workflows = sequences of simple atomic actions
- Executable: All tools map directly to openpyxl/pandas/xlwings methods
- Workflow Learning: LLMs can learn complex patterns (building DCF models, creating dashboards) from example sequences
- Complete Coverage: No Excel task is "out of reach" - 95%+ of real-world operations are represented
- Training Data Generation: Record expert Excel sessions β Convert to tool call sequences β Train LLMs
- Reproducibility: Any Excel workflow becomes a reproducible, version-controlled sequence
Vision: Enable LLMs to build investment banking-quality financial models by providing them the complete set of atomic operations humans use.
The action space is implemented using two battle-tested Python libraries that together provide comprehensive Excel capabilities:
openpyxl (74 tools) - The Formatting Layer:
- Covers 95%+ of everyday Excel operations
- File manipulation, cell formatting, charts, formulas, worksheet management
- Everything you'd accomplish clicking through Excel's UI
- Works without Excel installed (pure Python implementation)
- Creates professional-looking spreadsheets programmatically
pandas (30 tools) - The Analysis Layer:
- Advanced data operations beyond Excel's native capabilities
- Bulk transformations on massive datasets (millions of rows)
- Statistical analysis Excel can't perform efficiently
- Complex aggregations, pivots, and merges
- Transforms that would take hours manually
Bridge Tools (3 tools) - Seamless Integration:
- Connect both worlds: use Excel formulas, analyze with pandas power
- Calculate formulas via xlwings (leverages Excel's calculation engine)
- Transfer data bidirectionally while maintaining integrity
What This Enables:
- β Everything desktop Excel can do (formatting, charts, pivot tables, data validation)
- β Advanced operations beyond Excel (complex statistical analysis, ML-ready transformations)
- β Professional financial models (DCF valuations, LBO models, Monte Carlo simulations)
- β 10-100x faster than manual clicking through Excel UI
Follow these steps:
- Python 3.8+
- An LLM API key (OpenAI, Anthropic, Claude, or any LiteLLM-compatible provider)
-
Clone and navigate to the project:
git clone <your-repo-url> cd Excel_Agent
-
Create virtual environment:
python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install dependencies:
pip install -r requirements.txt
-
Copy the environment template:
cp .env.example .env
-
Edit
.envwith your credentials:# Open in your favorite editor nano .env -
Fill in your API details:
OPENAI_API_BASE=https://api.openai.com/v1 OPENAI_API_KEY=sk-your-actual-key-here OPENAI_MODEL_NAME=gpt-4
-
The
.envfile is in.gitignoreand will never be committed
Supported LLM Providers (via LiteLLM):
- OpenAI:
gpt-4,gpt-4-turbo,gpt-3.5-turbo - Anthropic:
claude-opus-4-6,claude-sonnet-4-5 - Custom endpoints: Any OpenAI-compatible API
For other providers, see the LiteLLM documentation.
Interactive Mode (recommended for learning):
python run.py
# Choose option 2 for interactive modeExample commands to try:
> Create a spreadsheet with quarterly sales data and save it as sales.xlsx
> Load Apple_DCF_Model.xlsx and explain the key assumptions
> Create a financial model with 5-year revenue projections
Programmatic Usage:
from excel_action_space.integration.llm_client import LLMExcelAutomation
automation = LLMExcelAutomation()
automation.process_user_request(
"Create a financial model with revenue projections for 2024-2028. "
"Include rows for Revenue, COGS, Gross Profit, and Operating Expenses. "
"Use formulas and format as currency. Save as financial_model.xlsx"
)You're now set up! Here's how to explore:
-
See available tools:
- Open
excel_action_space/tools/unified_definitions.jsonto see all 109 tools - Each tool has name, description, and parameters
- Open
-
Extend functionality:
- Add new openpyxl tools:
excel_action_space/tools/openpyxl_executor.py - Add new pandas tools:
excel_action_space/tools/pandas_executor.py - Add bridge operations:
excel_action_space/tools/bridge_functions.py
- Add new openpyxl tools:
-
Test your changes:
python run.py # Interactive mode for quick testing -
Understand the flow:
- User request β LLM (via LiteLLM) β Tool calls β UnifiedExcelExecutor β Excel files
109 tools organized into three complementary categories:
- File operations (create, load, save, close)
- Cell operations (read, write, copy, access)
- Formatting (fonts, colors, borders, alignment, number formats)
- Charts (bar, line, pie, scatter, area + inspection)
- Formulas (insert, copy, fill ranges efficiently)
- Worksheet management (create, delete, copy sheets)
- Excel features (data validation, protection, freezing, comments)
- Prefix:
openpyxl_*
- File I/O (read Excel, CSV, write to Excel with multiple sheets)
- DataFrame operations (filter, sort, group, pivot)
- Data transformations (merge, join, aggregate)
- Statistical analysis (describe, correlate, calculate)
- DataFrame management (select, list, inspect)
- Prefix:
pandas_*
bridge_openpyxl_to_pandas- Load Excel with calculated formulas into pandasbridge_pandas_to_openpyxl- Transfer DataFrame to openpyxl for formattingbridge_sync_file- Synchronize data between both systems- Prefix:
bridge_*
Save-Before-Bridge Rule (enforced by error):
When using formulas with pandas analysis, you MUST follow this sequence:
1. Load or create workbook (openpyxl)
2. Add formulas and formatting
3. SAVE the workbook β MANDATORY (openpyxl_workbook_save)
4. Bridge to pandas (bridge_openpyxl_to_pandas)
5. Analyze data with pandas
Why is this enforced?
- The bridge uses xlwings to calculate Excel formulas
- xlwings breaks openpyxl's file handle (unavoidable)
- We reload from disk to fix the handle
- Without saving first, all your changes are LOST
What happens if you skip the save?
Error: "CRITICAL: Workbook has UNSAVED changes. You MUST call
openpyxl_workbook_save first to prevent data loss..."
The LLM will see this error and automatically correct the workflow.
User Request
β
LiteLLM (with 109 tools)
β
Tool Call(s) Generated
β
UnifiedExcelExecutor
β
βββββββββββββββββββββΌββββββββββββββββββββ
β β β
OpenpyxlExecutor PandasExecutor BridgeConnector
(74 tools) (30 tools) (3 tools)
β β β
Excel Files DataFrames Synced Data
State Management:
- Single openpyxl workbook active at a time
- Multiple pandas DataFrames (tracked by ID: df_1, df_2, etc.)
- Bridge coordinates between both systems
- Modified flag tracks unsaved changes
Excel_Agent/
βββ excel_action_space/
β βββ tools/ # Core tool implementation
β β βββ openpyxl_executor.py # 66 openpyxl tools (formatting, charts, formulas)
β β βββ pandas_executor.py # 30 pandas tools (analysis, transformations)
β β βββ bridge_functions.py # 3 bridge tools (openpyxl β pandas)
β β βββ unified_executor.py # Routes tool calls to correct executor
β β βββ unified_definitions.json # All 109 tool definitions (LiteLLM format)
β β
β βββ integration/
β β βββ llm_client.py # LiteLLM integration + interactive mode
β β
β βββ discovery/ # Research & exploration
β β βββ research/ # Background on Python Excel libraries
β β βββ exploration/ # Scripts that discovered tool definitions
β β
β βββ workflows/ # Example workflows (currently empty - contribute!)
β βββ tests/ # Unit and integration tests
β βββ outputs/ # Generated Excel files
β βββ README.md # Detailed documentation
β
βββ run.py # Quick entry point for testing
βββ requirements.txt # All dependencies
βββ plan.md # Original project vision
βββ README.md # This file
Key Files Explained:
unified_executor.py: Routes tool calls to the right executor (openpyxl/pandas/bridge)openpyxl_executor.py: Implements all 66 openpyxl tools (the formatting/Excel features layer)pandas_executor.py: Implements all 30 pandas tools (the analysis layer)bridge_functions.py: Implements the 3 critical bridge tools (connects the layers)unified_definitions.json: All 109 tool definitions in LiteLLM format (read this!)llm_client.py: Connects LiteLLM to the executor, handles conversation flow
File: excel_action_space/tools/openpyxl_executor.py
def _your_new_tool(self, param1, param2=None):
"""
Your tool description
Args:
param1: Description
param2: Optional description
"""
if not self.workbook:
return error("No workbook loaded")
# Your implementation
self.modified = True # Mark as modified if changes made
return success("Operation completed")Then add to unified_definitions.json:
{
"type": "function",
"function": {
"name": "openpyxl_your_new_tool",
"description": "Clear description of what it does",
"parameters": {
"type": "object",
"properties": {
"param1": {
"type": "string",
"description": "What this parameter does"
}
},
"required": ["param1"]
}
}
}File: excel_action_space/tools/pandas_executor.py
Similar process - add method, update unified_definitions.json.
File: excel_action_space/tools/bridge_functions.py
Be careful with state management between openpyxl and pandas.
-
Interactive testing:
python run.py # Option 2: Interactive mode > Test your new tool here
-
Programmatic testing:
automation = LLMExcelAutomation() automation.process_user_request("Use my new tool to...")
-
Check state:
state = automation.get_executor_state() print(state)
- User prompt β
LLMExcelAutomation.process_user_request() - LiteLLM generates tool calls β Based on 109 tool definitions
- UnifiedExcelExecutor routes β Checks tool prefix (openpyxl_, pandas_, bridge_)
- Individual executor executes β Returns
{'status': 'success'/'error', 'result': ...} - Result fed back to LLM β LLM can call more tools or respond
- Loop until complete β Max 100 iterations
- Atomic Operations - Each tool does one thing well
- Explicit State - Track modified flag, filenames, DataFrame IDs
- Fail-Safe - Enforce save-before-bridge to prevent data loss
- LLM-Friendly - Clear error messages that guide the LLM
- Composable - Complex workflows = sequences of simple tools
- This README - Quick start and overview
excel_action_space/README.md- Detailed system architectureexcel_action_space/tools/unified_definitions.json- Complete tool catalogplan.md- Original project vision and researchexcel_action_space/OPTION2_IMPLEMENTATION_COMPLETE.md- Save-before-bridge implementation detailsexcel_action_space/discovery/- Research on Python Excel libraries
Contributions welcome! Areas of interest:
- Example workflows - Add to
excel_action_space/workflows/examples/ - Test coverage - Add to
excel_action_space/tests/ - New tools - Extend executor functionality
- Documentation - Improve guides and examples
- Bug fixes - Especially around edge cases
- Ensure you're on the
pandasbranch - Test changes in interactive mode
- Update
unified_definitions.jsonif adding tools - Follow existing code patterns
- Add clear docstrings
MIT License - See project repository for details
Built with comprehensive research into Python Excel libraries and real-world financial modeling requirements. See excel_action_space/discovery/research/ for background research.
Key Technologies:
- openpyxl - Excel file manipulation
- pandas - Data analysis
- xlwings - Excel automation
- LiteLLM - LLM integration
For bugs or feature requests, use the project's issue tracker.
Now go build something awesome! π