
rishabhcli/AegisRedact


# Share-Safe Toolkit

**Privacy-First Redaction PWA**

A cross-platform, browser-only Progressive Web App for safely redacting sensitive information from documents, images, and text files before sharing. All processing happens locally in your browser; nothing is uploaded.

![Version](https://img.shields.io/badge/version-1.1.0-blue) ![License](https://img.shields.io/badge/license-MIT-green)

---

## 🆕 What's New in v1.1

**Format Expansion Release** - We've extended support beyond PDFs and images!

- ✨ **Plain Text & Markdown** (.txt, .md) - Redact configuration files, logs, and notes
- ✨ **CSV/TSV Support** (.csv, .tsv) - Redact spreadsheet data with column-level operations
- 🏗️ **Format Abstraction Layer** - Extensible architecture for adding new formats
- 📚 **Comprehensive Documentation** - See [docs/FORMATS.md](docs/FORMATS.md) for details

**Coming Soon:** Office documents (.docx, .xlsx, .pptx), Rich Text (.rtf, .html), E-books (.epub)

---

## Features

### Core Capabilities

- **Multi-Format Support** 🆕
  - **PDFs**: Text-based and scanned documents with OCR
  - **Images**: JPEG, PNG, WebP, GIF, BMP with automatic EXIF removal
  - **Plain Text**: .txt and .md files with line-based redaction
  - **CSV/TSV**: Spreadsheet data with cell and column-level redaction
  - See [docs/FORMATS.md](docs/FORMATS.md) for full format documentation
- **Intelligent PII Detection**
  - Automatic detection: Emails, phone numbers, SSNs, payment card numbers
  - Regex + Luhn validation for accuracy
  - Optional ML-based Named Entity Recognition (NER)
  - Hybrid detection merging for best results
- **Flexible Redaction**
  - Automatic detection suggestions
  - Manual redaction boxes (draw with mouse/touch)
  - Column-based redaction for CSV files
  - Non-reversible black-box rendering
- **Security & Privacy**
  - 100% client-side processing
  - No server uploads
  - No tracking or analytics
  - Automatic metadata removal (EXIF, GPS)
  - Flattened exports (no hidden layers or selectable text)
- **Modern PWA**
  - Installable on desktop & mobile
  - Offline support via Service Worker
  - Fast, responsive interface
  - WCAG 2.2 accessible

---

## Security Rationale

### Why Black Boxes + Flattening?

**Never use blur or pixelation for redaction.** These techniques are reversible:

- **Research evidence**: Studies including work from the PoPETs conference demonstrate that blurred text can be recovered using deconvolution techniques
- **Real-world failures**: Multiple high-profile incidents where pixelated data was recovered (see Bishop Fox security advisories)

**Our approach**:

1. **Solid black rectangles**: Completely opaque fills (no transparency, no blur)
2. **Flattening**:
   - PDFs are rasterized to images and embedded in a fresh PDF document
   - Images are re-encoded through canvas, stripping all metadata
   - No hidden layers, selectable text, or recoverable data

### Metadata Removal

- **Canvas re-encode**: `toBlob()` creates a fresh image without EXIF data
- **Expected behavior**: GPS coordinates, camera info, and orientation tags are intentionally removed
- **PDF metadata**: Generated PDFs only include safe, user-specified metadata

---

## Tech Stack

### Core Libraries

- **Vite** (^5.4.11): Fast build tool with ESM support
- **pdfjs-dist** (^4.8.69): PDF rendering and text extraction
- **pdf-lib** (^1.17.1): PDF generation with embedded images
- **tesseract.js** (^5.1.1): Optional OCR for scanned documents
- **browser-fs-access** (^0.35.0): Cross-platform file access (File System Access API + fallback)
- **Workbox** (^7.1.1): Service Worker generation for PWA

### Why No Heavy AI?

- Pattern detection uses **regex + Luhn validation** (fast, accurate, privacy-preserving)
- No paid APIs required
- Fully offline-capable
- No external dependencies for core functionality

---

## Installation & Development

### Prerequisites

- Node.js 18+ and npm

### Setup

```bash
# Install dependencies
npm install

# Run development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview
```

### Testing

```bash
# Run unit tests
npm test

# Run tests with UI
npm run test:ui

# Generate coverage report
npm run test:coverage
```

---

## Deployment

### Static Hosting

The app is a static site and can be deployed to any HTTPS host:

- **Netlify**: Drag `dist/` folder to Netlify drop zone
- **Vercel**: `vercel --prod`
- **Cloudflare Pages**: Connect repo and deploy
- **GitHub Pages**: Push `dist/` to `gh-pages` branch

### Requirements

- **HTTPS**: Required for Service Worker and File System Access API
- **Modern browser**: Chrome/Edge 86+, Safari 15.4+, Firefox 105+

---

## Project Structure

```
share-safe/
├── public/
│   ├── icons/                 # PWA icons
│   └── manifest.webmanifest   # PWA manifest
├── src/
│   ├── lib/
│   │   ├── detect/            # PII detection (patterns, Luhn)
│   │   ├── pdf/               # PDF processing
│   │   ├── images/            # Image processing & EXIF removal
│   │   ├── fs/                # File I/O utilities
│   │   └── pwa/               # Service Worker registration
│   ├── ui/
│   │   ├── components/        # UI components
│   │   └── App.ts             # Main application
│   ├── main.ts                # Entry point
│   └── styles.css             # Styles
├── tests/
│   ├── unit/                  # Unit tests
│   └── e2e.spec.ts            # E2E test stubs
├── vite.config.ts             # Vite configuration
├── workbox.config.mjs         # Service Worker config
└── package.json
```

---

## Usage

### Basic Workflow

1. **Load Files**: Drag & drop or click to select PDFs/images
2. **Configure Detection**: Toggle email, phone, SSN, card number detection
3. **Review Suggestions**: Approve auto-detected sensitive data
4. **Manual Redaction**: Draw custom boxes with mouse/touch
5. **Export**: Download redacted file with `-redacted` suffix

### Keyboard Shortcuts

- `Tab` / `Shift+Tab`: Navigate UI elements
- `Enter` / `Space`: Activate buttons
- `Delete`: Remove selected redaction box
- `+` / `-`: Zoom in/out on canvas

### Accessibility

- WCAG 2.2 compliant
- Full keyboard navigation
- ARIA labels and roles
- High contrast mode support
- Reduced motion support
- Touch-friendly targets (44x44px minimum)

---

## Detection Patterns

### Email Addresses

- Pattern: RFC 5322 simplified
- Example: `user@example.com`

### Phone Numbers (E.164)

- Pattern: `+` followed by 1-15 digits
- Example: `+14155552671`

### US Social Security Numbers

- Formats: `XXX-XX-XXXX` or `XXXXXXXXX`
- Note: SSA randomized allocation in 2011; geography not inferred

### Payment Card Numbers

- Method: Luhn algorithm validation
- Length: 13-19 digits
- Formats: With/without spaces or dashes
- Reduces false positives significantly

---

## Known Limitations

### Current Version (1.0.0)

1. **Single-page redaction**: Manual boxes apply to current page only (multi-page tracking coming in v1.1)
2. **OCR performance**: Tesseract.js can be slow on high-resolution pages
3. **Icon placeholders**: Replace `public/icons/` with properly designed icons for production
4. **Pattern detection**:
   - US-centric patterns (SSN format)
   - E.164 may match non-phone number sequences
   - Regex patterns optimized for speed over perfect recall

### Browser Compatibility

| Feature | Chrome/Edge | Firefox | Safari |
|---------|-------------|---------|--------|
| Basic functionality | ✅ 86+ | ✅ 105+ | ✅ 15.4+ |
| File System Access | ✅ | ❌ (download fallback) | ⚠️ 15.4+ (limited) |
| PWA Install | ✅ | ⚠️ (Android only) | ✅ (iOS 16.4+) |

---

## Security Best Practices

### For Users

1. **Verify redactions**: Always review auto-detected areas before export
2. **Check manually**: Use zoom to inspect redactions at 200%+
3. **Test with dummy data**: Practice workflow before redacting real documents
4. **Keep originals secure**: This tool creates redacted copies; manage originals appropriately

### For Developers

1. **Update dependencies**: Regularly check for security updates
2. **CSP headers**: Deploy with Content Security Policy
3. **Subresource Integrity**: Use SRI for CDN resources (if any)
4. **Audit regex**: Avoid catastrophic backtracking (ReDoS)

---

## Roadmap

### v1.1 (Planned)

- [ ] Multi-page redaction tracking
- [ ] Batch export for multiple files
- [ ] Custom pattern definitions
- [ ] Undo/redo for redactions
- [ ] Export audit log (redaction summary)

### v1.2 (Future)

- [ ] International pattern libraries (IBAN, passport numbers, etc.)
- [ ] Collaborative redaction (encrypted sharing)
- [ ] License verification (optional Gumroad integration)
- [ ] Advanced OCR (on-device ML for better accuracy)

---

## Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Code Style

- TypeScript strict mode
- ESLint + Prettier (config coming soon)
- Test coverage for new features

---

## License

MIT License - see [LICENSE](LICENSE) file for details

---

## References & Research

### Redaction Security

- **Blur/pixelation reversibility**: PoPETs conference proceedings on deconvolution attacks
- **Bishop Fox advisories**: Case studies of failed pixelation redaction
- **NIST guidelines**: Digital redaction best practices (SP 800-88)

### Standards & APIs

- **E.164**: ITU-T international phone number format
- **Luhn algorithm**: ISO/IEC 7812-1 payment card validation
- **PDF specification**: ISO 32000-2 (PDF 2.0)
- **File System Access API**: W3C specification

### Libraries

- [PDF.js](https://mozilla.github.io/pdf.js/): Mozilla's PDF rendering engine
- [pdf-lib](https://pdf-lib.js.org/): PDF generation and manipulation
- [Tesseract.js](https://tesseract.projectnaptha.com/): Browser-based OCR
- [Workbox](https://developer.chrome.com/docs/workbox/): Google's Service Worker toolkit

---

## Support

- **Documentation**:
  - [Supported Formats Guide](docs/FORMATS.md) - Complete format documentation
  - [Format Handler Development](docs/FORMAT_HANDLER_GUIDE.md) - Guide for developers
  - Additional docs in `docs/` directory
- **Issues**: [GitHub Issues](https://github.com/yourusername/share-safe/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/share-safe/discussions)

---

## Acknowledgments

Built with guidance from:

- OWASP security best practices
- W3C PWA guidelines
- Mozilla PDF.js documentation
- Accessibility guidelines (WCAG 2.2)

---

**Made with security and privacy in mind. Share safely!** 🔒
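
For readers curious how the "regex + Luhn validation" approach listed under Detection Patterns can work, here is a minimal sketch. It is an illustration only, not the code in `src/lib/detect/`: the names `luhnCheck`, `findCardNumbers`, and `CARD_CANDIDATE` are hypothetical, and the candidate regex is one assumed way to match 13-19 digits with optional spaces or dashes.

```typescript
// Hypothetical helpers for illustration; not taken from the repository.

/** Returns true when a string of ASCII digits passes the Luhn checksum. */
function luhnCheck(digits: string): boolean {
  if (digits.length === 0) return false;
  let sum = 0;
  let doubleNext = false; // every second digit from the right is doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (d < 0 || d > 9) return false;  // reject non-digit input
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Assumed candidate pattern: 13-19 digits, each optionally preceded by
// a single space or dash.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

/** Returns the candidate substrings of `text` whose digits pass Luhn. */
function findCardNumbers(text: string): string[] {
  const candidates = text.match(CARD_CANDIDATE) || [];
  return candidates.filter((candidate) =>
    luhnCheck(candidate.replace(/[ -]/g, ""))
  );
}
```

Running the regex first and the checksum second keeps the scan cheap while discarding most digit runs that merely look like card numbers, such as order IDs or tracking numbers.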
