Structural signals for machine interpretability
DocSignals is a React-based tool for analyzing structural and semantic signals in HTML documents as they are traversed by machines.
It exposes measurable signals about document consistency, segmentation, and semantic explicitness — without scoring, ranking, or content evaluation.
DocSignals is a document analysis tool that:
- measures structural properties of HTML documents
- detects semantic signals encoded in markup
- makes no quality judgments
- separates measurement from interpretation
It does not evaluate:
- content quality
- SEO performance
- accessibility compliance
- ranking potential
DocSignals focuses solely on document structure and explicitness.
- URL Analysis — Enter any URL to analyze its document structure
- Multiple Fetch Samples — Configurable number of fetches to detect structural differences
- Analysis History — Recent analyses are stored locally for comparison
- Export — Results can be exported as JSON or CSV
- Help Documentation — Built-in help page with detailed explanations
- React 18 (UI framework)
- TypeScript (static typing)
- Vite (build and dev tooling)
- Tailwind CSS v4 (utility-first styling)
- React Router (client-side routing)
# Install dependencies
npm install
# Start development server
npm run dev
# Build for production
npm run build
# Preview production build
npm run preview
npm install
Copy the example environment file and set your API key:
cp .env .env.local
Edit .env.local:
VITE_PROXY_URL=/proxy
VITE_PROXY_API_KEY=your-secret-api-key
The proxy worker is located in worker/proxy.js. It requires:
- API Key authentication
- CORS origin validation
- Rate limiting (10 requests/minute per IP)
- SSRF protection
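The rate limit listed above (10 requests per minute per IP) could be enforced with a fixed-window counter. The following is a minimal sketch, not the code in worker/proxy.js; the class name, the in-memory Map, and the injectable `now` argument are assumptions made for illustration.

```typescript
// Hypothetical fixed-window rate limiter; not taken from worker/proxy.js.
type WindowState = { windowStart: number; count: number };

class FixedWindowRateLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  // Defaults mirror the documented limit: 10 requests per 60 seconds.
  constructor(limit: number = 10, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request from `ip` at time `now` (in ms) is allowed.
  allow(ip: string, now: number = Date.now()): boolean {
    const state = this.windows.get(ip);
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      // First request from this IP, or the previous window has expired.
      this.windows.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    if (state.count < this.limit) {
      state.count += 1;
      return true;
    }
    return false;
  }
}
```

In-memory state like this is local to a single running instance, so a deployed worker would likely need shared storage to apply the limit consistently.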
cd worker
# Install wrangler if needed
npm install -g wrangler
# Login to Cloudflare
wrangler login
# Set the API key secret (must match VITE_PROXY_API_KEY)
wrangler secret put PROXY_API_KEY
# Deploy
wrangler deploy
After deployment, update the proxy URL in src/fetch.ts if using a different worker URL.
Edit worker/proxy.js to add your production domain to ALLOWED_ORIGINS:
const ALLOWED_ORIGINS = [
'https://your-domain.com',
'http://localhost:5173',
'http://localhost:4173',
];
DocSignals requires a proxy to fetch external URLs (to avoid CORS restrictions).
The Vite dev server includes a built-in proxy at /proxy with:
- Rate limiting (10 requests/minute)
- SSRF protection (blocks private/local IPs)
No additional setup required for local development.
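To illustrate what blocking private/local IPs can involve, here is a minimal sketch of a target check. It is not the dev proxy's actual code: the function names are hypothetical, and the check only covers localhost and literal IPv4 addresses. IPv6 ranges and DNS names that resolve to private addresses are deliberately left out.

```typescript
// Hypothetical helper: true when `hostname` is localhost or a literal IPv4
// address in a private, loopback, or link-local range.
function isBlockedHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (match === null) return false; // not a literal IPv4 address; no DNS lookup here

  const [a, b] = match.slice(1).map(Number);
  return (
    a === 0 ||                            // 0.0.0.0/8
    a === 10 ||                           // 10.0.0.0/8
    a === 127 ||                          // 127.0.0.0/8 (loopback)
    (a === 169 && b === 254) ||           // 169.254.0.0/16 (link-local)
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}

// Hypothetical entry point: accepts only http(s) URLs whose host is not blocked.
function isAllowedTarget(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input is rejected
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  return !isBlockedHost(url.hostname);
}
```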
For production, deploy the Cloudflare Worker in worker/proxy.js.
The worker includes:
- API Key Authentication: Requests must include the X-Proxy-Key header
- CORS Validation: Only configured origins are allowed
- Rate Limiting: 10 requests per minute per IP
- SSRF Protection: Blocks private IPv4/IPv6 addresses
See the Setup section above for deployment instructions.
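On the client side, a request through the proxy might be assembled roughly as follows. Only the X-Proxy-Key header name comes from the list above; the `url` query parameter, the function name, and the `origin` argument used to resolve a relative proxy path are assumptions, not the worker's documented interface.

```typescript
// Hypothetical request builder. The `url` query parameter is an assumption
// about how the target address is passed to the proxy.
interface ProxyRequest {
  url: string;
  headers: Record<string, string>;
}

function buildProxyRequest(
  proxyUrl: string,  // e.g. the value of VITE_PROXY_URL
  apiKey: string,    // e.g. the value of VITE_PROXY_API_KEY
  targetUrl: string, // the document to analyze
  origin: string     // base for resolving a relative proxy path such as "/proxy"
): ProxyRequest {
  const endpoint = new URL(proxyUrl, origin);
  endpoint.searchParams.set("url", targetUrl); // percent-encodes the target
  return {
    url: endpoint.toString(),
    headers: { "X-Proxy-Key": apiKey },
  };
}
```

The result can be passed to `fetch(request.url, { headers: request.headers })`.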
Measured Values are raw observations derived directly from document analysis. They are factual, reproducible, and interpretation-free.
Examples include:
Structure
- Fetches performed
- Structural differences across fetches
- DOM node count
- Maximum DOM depth
- Top-level sections
- Shadow DOM hosts
Semantics
- Heading structure (h1 count, level gaps)
- Top-level landmark usage
- Generic container ratio (div/span)
- Non-descriptive links
- List structures
- Table header markup
- Language declaration
- Machine-readable time elements
These values describe what exists, not whether it is good or bad.
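As an illustration of how such values can be derived, the sketch below computes node count, maximum depth, and a generic container ratio over a simplified element tree. The `ElementNode` type is a stand-in for a parsed DOM, not DocSignals' actual types. Counting the root as depth 1 and defining the ratio as div and span elements over all elements are also assumptions.

```typescript
// Simplified stand-in for a parsed DOM element; not DocSignals' actual types.
interface ElementNode {
  tag: string;
  children: ElementNode[];
}

interface StructureMetrics {
  nodeCount: number;
  maxDepth: number;              // root counts as depth 1 (assumption)
  genericContainerRatio: number; // share of elements that are div or span (assumption)
}

function measure(root: ElementNode): StructureMetrics {
  let nodeCount = 0;
  let maxDepth = 0;
  let generic = 0;

  // Iterative traversal avoids call-stack limits on deeply nested documents.
  const stack: Array<{ node: ElementNode; depth: number }> = [{ node: root, depth: 1 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop()!;
    nodeCount += 1;
    maxDepth = Math.max(maxDepth, depth);
    if (node.tag === "div" || node.tag === "span") generic += 1;
    for (const child of node.children) {
      stack.push({ node: child, depth: depth + 1 });
    }
  }

  return { nodeCount, maxDepth, genericContainerRatio: generic / nodeCount };
}
```

The function only reports numbers; it attaches no judgment to them.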
Interpretation translates measured values into contextual implications for machine readers.
Interpretations are:
- cautious
- modal (using may, can, require)
- non-judgmental
Example:
Measured: 29 levels of DOM nesting
Interpretation: Machines may need to traverse multiple layers to infer context.
Interpretation does not prescribe fixes or optimizations. It explains what additional inference or traversal may be required.
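One way to keep measurement and interpretation separate in code is to give them distinct types and make interpretation a pure function of a measurement. This is a sketch only: the type names, the `maxDomDepth` signal name, and the threshold of 15 are illustrative assumptions, not values DocSignals uses.

```typescript
// Hypothetical types; field names and the threshold are illustrative assumptions.
interface Measurement {
  readonly signal: string;
  readonly value: number;
}

interface Interpretation {
  readonly signal: string;
  readonly text: string; // modal wording only: may, can, require
}

// Returns an interpretation, or null when the measurement needs no comment.
// The measurement itself is never modified or scored.
function interpretDepth(m: Measurement, deepThreshold: number = 15): Interpretation | null {
  if (m.signal !== "maxDomDepth" || m.value < deepThreshold) return null;
  return {
    signal: m.signal,
    text: "Machines may need to traverse multiple layers to infer context.",
  };
}
```

Calling `interpretDepth({ signal: "maxDomDepth", value: 29 })` yields the sentence from the example above, while the raw value stays available unchanged.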
DocSignals treats Shadow DOM as a first-class structural reality.
If a document contains Shadow DOM:
- it is measured
- it is reported
- it is interpreted once, without special weighting
Shadow DOM is not flagged as a problem. It is acknowledged as part of modern document structures that machines must explicitly traverse.
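To show what explicit traversal means in practice, the sketch below counts shadow hosts in a simplified tree, descending into each shadow root as well as the regular children. The `TreeNode` type and its `shadowRoot` field are stand-ins chosen for illustration, not DocSignals' internal representation.

```typescript
// Simplified stand-in for an element that may host a shadow root.
interface TreeNode {
  tag: string;
  children: TreeNode[];
  shadowRoot?: TreeNode[]; // children of an attached shadow root, if any
}

function countShadowHosts(root: TreeNode): number {
  let hosts = 0;
  const stack: TreeNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.shadowRoot !== undefined) {
      hosts += 1;
      stack.push(...node.shadowRoot); // shadow trees are only reached when entered explicitly
    }
    stack.push(...node.children);
  }
  return hosts;
}
```

A traversal that followed only `children` would miss hosts nested inside another component's shadow tree.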
DocSignals is built around a few strict principles:
- Measurement before interpretation
- No scoring systems
- No hidden heuristics
- No “AI-ready” claims
- No SEO framing
If a signal cannot be measured reliably, it is excluded.
Results can be exported as:
- JSON
- CSV
This allows further processing, comparison, or integration into other analysis workflows.
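For readers building on the exports, the sketch below shows one way flat result rows could be serialized to both formats. The `ResultRow` shape and the function names are hypothetical; the actual export schema is not specified here.

```typescript
// Hypothetical flat result row; the real export schema may differ.
type ResultRow = Record<string, string | number | boolean>;

function toJson(rows: ResultRow[]): string {
  return JSON.stringify(rows, null, 2);
}

// Quotes a field when it contains a comma, double quote, or line break,
// doubling any embedded quotes (RFC 4180 style).
function escapeCsv(value: string | number | boolean): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Uses the keys of the first row as the column order for every row.
function toCsv(rows: ResultRow[]): string {
  if (rows.length === 0) return "";
  const columns = Object.keys(rows[0]);
  const lines = [columns.map(escapeCsv).join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => escapeCsv(row[column] ?? "")).join(","));
  }
  return lines.join("\r\n");
}
```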
DocSignals is built for:
- developers interested in document structure
- engineers working with crawlers, parsers, or agents
- people thinking about how machines interpret HTML beyond visual rendering
It assumes familiarity with HTML and DOM concepts.
DocSignals analyzes static document structure as fetched. It does not:
- execute JavaScript beyond initial rendering
- simulate user interaction
- infer intent
- evaluate correctness of content
Its purpose is to make structural reality visible, not to judge it.
src/
├── analysis/ # Document analysis logic
│ ├── index.ts # Main analysis entry point
│ ├── compare.ts # Comparison between fetches
│ ├── semantics.ts # Semantic signal detection
│ └── types.ts # TypeScript types
├── components/ # React components
│ ├── Dashboard.tsx
│ ├── HeaderBar.tsx
│ ├── HomePage.tsx
│ ├── HelpPage.tsx
│ └── ...
├── utils/ # Utility functions
├── App.tsx # Main app with routing
└── main.tsx # Entry point
DocSignals is currently at v0.1.0. The scope is intentionally limited and stable.
Future versions may expand measurement coverage, but the core philosophy will remain unchanged.
Built by r0b0tan/cbauerdev
DocSignals evaluates document structure, not content quality or ranking.