101 changes: 71 additions & 30 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,70 +1,111 @@
<div align="center">

```
___ _ ___
/ __\ ___ __| | ___ /\/\ / \
/ / / _ \ / _` | / _ \ / \ / /\ /
/ /___ | (_) || (_| || __/ / /\/\ \ / /_//
\____/ \___/ \__,_| \___| \/ \//___,'
```
# codemd

Transform code repositories into markdown-formatted strings ready for LLM prompting. Easily convert your entire codebase into a format that's optimal for code-related prompts with LLMs like GPT-4, Claude, etc.
Ver. 0.0.2
````

# CodeMD

🚀 Transform code repositories into markdown-formatted strings ready for LLM prompting

[![Tests](https://github.com/dotpyu/codemd/actions/workflows/tests.yml/badge.svg)](https://github.com/dotpyu/codemd/actions/workflows/tests.yml)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

</div>

## 📝 Overview

CodeMD helps you convert your entire codebase into a format that's optimal for code-related prompts with Large Language Models (LLMs) like GPT-4, Claude, and others. It automatically processes your code files and outputs them in a clean, markdown-formatted structure that's perfect for LLM interactions.

## ✨ Features

- 🔍 **Smart Directory Scanning**: Recursively scans directories for code files
- 🎯 **Flexible Configuration**:
- Configurable file extensions
- File and pattern exclusion support
- Custom .gitignore support
- 📊 **Intelligent Output**:
- Markdown-formatted code blocks
- Preserved directory structure
- Repository structure visualization
- Token count estimation (with tiktoken)
- 📋 **Convenience**:
- Simple command-line interface
- Direct copy-to-clipboard support
- Multiple output options

### 🎉 Recent Updates

## Features
- Recursively scans directories for code files
- Configurable file extensions
- File and pattern exclusion support
- Markdown-formatted output
- Preserves directory structure in headers
- Simple command-line interface
- Token count estimation (tiktoken package required)
- Direct copy to clipboard
- ⭐ **NEW**: Repository structure visualization (disable with `--no-structure`)
- ⭐ **NEW**: Automatic .gitignore support
- Uses project's .gitignore by default
- Custom .gitignore files via `--gitignore`
- Disable with `--ignore-gitignore`

## Installation
Install from pip
## 🚀 Installation
```bash
pip install codemd
```
or install from source

or install from source!

```bash
git clone https://github.com/dotpyu/codemd.git
cd codemd
pip install -e .
```

## Usage
Basic usage:
## 📖 Usage

### Command Line Interface

**Basic Usage:**
```bash
codemd /path/to/your/code
```

With custom extensions and output file:
**Custom Extensions and Output:**
```bash
codemd /path/to/your/code -e py,java,sql -o output.md
```

Exclude specific patterns or files:
**Pattern Exclusion:**
```bash
codemd /path/to/your/code \
--exclude-patterns "test_,debug_" \
--exclude-extensions "test.py,spec.js"
```

As a Python package:
```python
from codemd import CodeScanner
**.gitignore Configuration:**
```bash
# Use custom gitignore files
codemd /path/to/your/code --gitignore .gitignore .custom-ignore

scanner = CodeScanner(
extensions={'py', 'java'},
exclude_patterns={'test_'},
exclude_extensions={'test.py'}
)
markdown_string = scanner.scan_directory('./my_project')
# Disable gitignore processing
codemd /path/to/your/code --ignore-gitignore
```
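
The flags above imply an order of precedence between the default `.gitignore`, the files passed to `--gitignore`, and `--ignore-gitignore`. The sketch below shows one plausible way to resolve them. It is an assumption for illustration, not the scanner's actual logic: `resolve_gitignore_files` is a hypothetical helper, and the rules that `--ignore-gitignore` wins and that explicit files replace the default are guesses based on the flag descriptions.

```python
from pathlib import Path
from typing import List, Optional


def resolve_gitignore_files(
    directory: Path,
    gitignore: Optional[List[str]] = None,
    ignore_gitignore: bool = False,
) -> List[Path]:
    """Hypothetical helper: choose which ignore files apply to a scan."""
    # Assumption: --ignore-gitignore disables all ignore-file processing.
    if ignore_gitignore:
        return []
    # Assumption: files given via --gitignore replace the project default.
    if gitignore:
        return [Path(p) for p in gitignore]
    # Default: use the project's own .gitignore if it exists.
    default = directory / ".gitignore"
    return [default] if default.is_file() else []
```

With this sketch, a directory without a `.gitignore` and no flags yields an empty list, so the scan would proceed with only the extension and pattern filters.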

## 🤝 Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

Distributed under the Apache 2.0 License. See `LICENSE` for more information.

## License
Distributed under the Apache 2 License. See `LICENSE` for more information.
---
<div align="center">
Made with ❤️ by Peilin
</div>
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "codemd"
-version = "0.0.1"
+version = "0.0.2"
authors = [
{ name = "Peilin Yu", email = "peilin_yu@brown.edu" },
]
3 changes: 2 additions & 1 deletion requirements.txt
@@ -1,2 +1,3 @@
pathlib>=1.0.1
-typing-extensions>=4.0.0
+typing-extensions>=4.0.0
+pathspec>=0.11.0
2 changes: 1 addition & 1 deletion setup.py
@@ -10,7 +10,7 @@

setup(
name="codemd",
-version="0.0.1",
+version="0.0.2",
author="Peilin Yu",
author_email="peilin_yu@brown.edu",
description="Transform code repositories into markdown-formatted strings ready for LLM prompting",
2 changes: 1 addition & 1 deletion src/codemd/__init__.py
@@ -1,5 +1,5 @@
from .cli import main
from .scanner import CodeScanner

-__version__ = "0.0.1"
+__version__ = "0.0.2"
__all__ = ["CodeScanner", "main"]
92 changes: 36 additions & 56 deletions src/codemd/cli.py
@@ -38,6 +38,15 @@

# Non-recursive scan with custom output
codemd /path/to/code --no-recursive -o custom.md

# Disable structure output
codemd /path/to/code --no-structure

# Use specific gitignore files
codemd /path/to/code --gitignore .gitignore .custom-ignore

# Disable gitignore processing
codemd /path/to/code --ignore-gitignore
"""


@@ -50,55 +59,37 @@ def parse_arguments() -> argparse.Namespace:
epilog=EPILOG
)

parser.add_argument(
'directory',
type=str,
help='Directory to scan'
)
parser.add_argument('directory', type=str, help='Directory to scan')
parser.add_argument('-e', '--extensions', type=str, default='py,java,js,cpp,c,h,hpp',
help='Comma-separated list of file extensions to include (without dots)')
parser.add_argument('--exclude-patterns', type=str, default='',
help='Comma-separated list of patterns to exclude (e.g., test_,debug_)')
parser.add_argument('--exclude-extensions', type=str, default='',
help='Comma-separated list of file patterns to exclude (e.g., test.py,spec.js)')
parser.add_argument('-o', '--output', type=str, default=None,
help='Output file path (if not specified, prints to stdout)')
parser.add_argument('--no-recursive', action='store_true',
help='Disable recursive directory scanning')
parser.add_argument('-v', '--verbose', action='store_true',
help='Enable verbose output')
parser.add_argument('--no-structure', action='store_true',
help='Disable repository structure output')

parser.add_argument(
'-e', '--extensions',
'--gitignore',
type=str,
default='py,java,js,cpp,c,h,hpp',
help='Comma-separated list of file extensions to include (without dots)'
nargs='+',
help='Specify one or more .gitignore files to use'
)

parser.add_argument(
'--exclude-patterns',
type=str,
default='',
help='Comma-separated list of patterns to exclude (e.g., test_,debug_)'
)

parser.add_argument(
'--exclude-extensions',
type=str,
default='',
help='Comma-separated list of file patterns to exclude (e.g., test.py,spec.js)'
)

parser.add_argument(
'-o', '--output',
type=str,
default=None,
help='Output file path (if not specified, prints to stdout)'
)

parser.add_argument(
'--no-recursive',
action='store_true',
help='Disable recursive directory scanning'
)

parser.add_argument(
'-v', '--verbose',
'--ignore-gitignore',
action='store_true',
help='Enable verbose output'
help='Disable .gitignore processing'
)

return parser.parse_args()
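
Review note: the new `--gitignore` option uses `nargs='+'`, so it collects every following value into a list. The sketch below reproduces only the three relevant arguments from this hunk to show what the parsed namespace looks like; the `prog` name and the sample argument lists are illustrative, not taken from the PR.

```python
import argparse

# Reduced parser: only the arguments needed to show the new flags.
parser = argparse.ArgumentParser(prog="codemd")
parser.add_argument("directory", type=str, help="Directory to scan")
parser.add_argument("--gitignore", type=str, nargs="+",
                    help="Specify one or more .gitignore files to use")
parser.add_argument("--ignore-gitignore", action="store_true",
                    help="Disable .gitignore processing")

# Passing an explicit argv list keeps the example independent of sys.argv.
args = parser.parse_args(
    ["/path/to/code", "--gitignore", ".gitignore", ".custom-ignore"]
)
print(args.gitignore)         # ['.gitignore', '.custom-ignore']
print(args.ignore_gitignore)  # False

# When the flag is omitted, argparse leaves the attribute as None.
defaults = parser.parse_args(["/path/to/code"])
print(defaults.gitignore)     # None
```

Because `nargs='+'` is greedy, the directory should come before `--gitignore` on the command line; otherwise the path is consumed as another ignore file and argparse reports the positional as missing.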


def str_to_set(s: str) -> Set[str]:
    """Convert comma-separated string to set of strings."""
    return {item.strip() for item in s.split(',') if item.strip()}
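
Review note: `str_to_set` is what turns the comma-separated values of `-e`, `--exclude-patterns`, and `--exclude-extensions` into sets. The function body below is copied from this file; the sample inputs are made up to show how whitespace and empty items are handled.

```python
from typing import Set


def str_to_set(s: str) -> Set[str]:
    """Convert comma-separated string to set of strings."""
    return {item.strip() for item in s.split(',') if item.strip()}


# Surrounding whitespace is stripped and the trailing empty item is dropped.
print(sorted(str_to_set("py, java ,sql,")))  # ['java', 'py', 'sql']

# The default '' used by the exclude options yields an empty set.
print(str_to_set(""))                        # set()
```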
@@ -212,12 +203,11 @@ def format_token_info(token_count: int, model_name: str) -> str:

def main() -> int:
print(BANNER)
-print("Version 0.0.1")
+print("Version 0.0.2")
print("Transform your code into LLM-ready prompts\n")

try:
args = parse_arguments()

directory = Path(args.directory)
output_file = Path(args.output) if args.output else None

@@ -232,27 +222,17 @@ def main() -> int:
exclude_patterns = str_to_set(args.exclude_patterns)
exclude_extensions = str_to_set(args.exclude_extensions)

if args.verbose:
print("Configuration:")
print(f" Directory: {directory}")
print(f" Extensions: {', '.join(sorted(extensions))}")
if exclude_patterns:
print(f" Exclude patterns: {', '.join(sorted(exclude_patterns))}")
if exclude_extensions:
print(f" Exclude extensions: {', '.join(sorted(exclude_extensions))}")
if output_file:
print(f" Output: {output_file}")
else:
print(" Output: stdout")
print(f" Recursive: {not args.no_recursive}")
print("\nProcessing files...")

scanner = CodeScanner(
extensions=extensions,
exclude_patterns=exclude_patterns,
exclude_extensions=exclude_extensions
exclude_extensions=exclude_extensions,
gitignore_files=args.gitignore,
ignore_gitignore=args.ignore_gitignore
)

scanner.no_structure = args.no_structure


try:
content = scanner.scan_directory(
directory,