This document provides instructions and guidelines for AI agents working with the unstructuredDataHandler repository.
unstructuredDataHandler is a Python-based core project for the Software Development Life Cycle (SDLC) that provides AI/ML capabilities for software development workflows. The repository contains modules for LLM clients, intelligent agents, memory management, prompt engineering, document retrieval, skill execution, and various utilities.
- Primary Language: Python 3.10-3.12
- Secondary Languages: TypeScript (for Azure pipelines), Shell scripts
- Project Type: AI/ML library and tooling for SDLC workflows
Use the reproducible test script. It creates .venv_ci and pins pytest for reliable runs.
./scripts/run-tests.sh
Alternatively, create and activate your own virtual environment, then install dev dependencies.
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements-dev.txt
You must set the PYTHONPATH to the root of the repository for imports to work correctly.
export PYTHONPATH=.
Alternatively, you can prefix your commands with PYTHONPATH=. as follows:
PYTHONPATH=. python -m pytest
Preferred (isolated venv):
./scripts/run-tests.sh # Full test run
./scripts/run-tests.sh test/unit -k deepagent # Narrow selection
Alternative (local venv):
PYTHONPATH=. python -m pytest test/ -v
PYTHONPATH=. python -m pytest test/ --cov=src/ --cov-report=xml
PYTHONPATH=. python -m pytest test/unit/ -v
PYTHONPATH=. python -m pytest test/integration/ -v
PYTHONPATH=. python -m pytest test/e2e/ -v
# Run pylint
python -m pylint src/ --exit-zero
# Run mypy
python -m mypy src/ --ignore-missing-imports --exclude="src/llm/router.py"
Note on mypy: The exclusion for src/llm/router.py is necessary to avoid conflicts with src/fallback/router.py.
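The mypy note above can be checked mechanically. The following is a minimal sketch, assuming the conflict comes from both files sharing the module name router (the note does not spell out the exact cause). The function name find_duplicate_module_names is hypothetical and not part of the repository.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_module_names(root: str) -> dict[str, list[str]]:
    """Map each module name that occurs more than once under root to its paths."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*.py")):
        # __init__.py is expected in every package, so it is not a conflict.
        if path.name == "__init__.py":
            continue
        by_name[path.stem].append(path.as_posix())
    # Keep only names shared by two or more files.
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for name, paths in find_duplicate_module_names("src").items():
        print(f"{name}: {', '.join(paths)}")
```

Running a check like this before adding a module may help spot a name that already exists elsewhere under src/.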
The core logic is in the src/ directory, which is organized into the following modules:
- src/agents/: Agent classes (planner, executor, base agent)
- src/memory/: Short-term and long-term memory modules
- src/pipelines/: Chat flows, document processing, task routing
- src/retrieval/: Vector search and document lookup
- src/skills/: Web search, code execution capabilities
- src/vision_audio/: Multimodal processing (image/audio)
- src/prompt_engineering/: Template management, few-shot, chaining
- src/llm/: OpenAI, Anthropic, custom LLM routing
- src/fallback/: Recovery logic when LLMs fail
- src/guardrails/: PII filters, output validation, safety
- src/handlers/: Input/output processing, error management
- src/utils/: Logging, caching, rate limiting, tokens
Other important directories:
- config/: YAML configurations for models, prompts, logging
- data/: Prompts, embeddings, dynamic content
- examples/: Minimal scripts demonstrating key features
- test/: Unit, integration, smoke, and e2e tests
- Install dependencies before making changes.
- Set the PYTHONPATH for all commands.
- Run tests (PYTHONPATH=. python -m pytest test/ -v) to validate the current state before making changes.
- Configure the agent by editing config/model_config.yaml before running it.
- Ensure new Python modules have proper __init__.py files.
- Follow the branch naming convention: dev/<alias>/<feature>.
- Fill out the PR template when submitting a pull request. The template is located at .github/PULL_REQUEST_TEMPLATE.md.
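Two of the guidelines above, the __init__.py requirement and the dev/<alias>/<feature> branch convention, lend themselves to a quick pre-submit check. The sketch below is illustrative only: the helper names and the regular expression are assumptions, and the repository may apply stricter rules to alias or feature names.

```python
import re
from pathlib import Path

# Assumed reading of dev/<alias>/<feature>: exactly two non-empty segments
# after "dev/", each free of slashes and whitespace.
BRANCH_PATTERN = re.compile(r"dev/[^/\s]+/[^/\s]+")


def is_valid_branch(name: str) -> bool:
    """Return True if name matches the assumed dev/<alias>/<feature> shape."""
    return BRANCH_PATTERN.fullmatch(name) is not None


def packages_missing_init(root: str) -> list[str]:
    """Return directories under root that hold .py files but no __init__.py."""
    missing = []
    for directory in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        has_python = any(
            f.suffix == ".py" for f in directory.iterdir() if f.is_file()
        )
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory.as_posix())
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    print(is_valid_branch("dev/alice/add-retriever"))
    print(packages_missing_init("src"))
```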
Do not:
- Run tests without setting PYTHONPATH.
- Assume requirements.txt contains dependencies (dev dependencies are installed from requirements-dev.txt).
- Create modules named "router" (conflicts with existing router.py files).
- Modify Azure pipeline scripts (build/azure-pipelines/) without TypeScript knowledge.