papers — Empirical Multi-Vertical Research on LLM Citations of Brazilian Companies

Platform for collection, persistence, and statistical analysis of how LLMs cite Brazilian companies across 4 economic sectors.

Longitudinal study (target: 90+ days, ~25,920 observations) focused on citation patterns, visibility, and source attribution by generative search engines (Generative Engine Optimization — GEO).

v2.0.0-reboot (2026-04-23)

Following Paper 4 ("Three Ways to Fail to Conclude", doi.org/10.5281/zenodo.19712217), which documented a triple methodological failure of v1 (H1 RAG underpower, H2 fictitious-probe design-null, H3 asymmetric instrumentation), the codebase was rebooted across 5 waves. The v2 infrastructure closes each failure mode with hardened algorithms, a balanced 128-entity cohort, a 192-query balanced battery, and a pre-registered decision rule. 78/78 tests passing.

Canonical pillars:

NER v2 — NFC+NFKD dual-pass, word-boundary matching, alias and stop-context tables. Dry-run on 2,000 rows: -45% false positives. src/analysis/entity_extraction.py (24 tests)
Cluster-robust CR1 — sandwich estimator with cross-group covariance. src/analysis/cluster_robust.py (6 tests)
Null simulation (Monte Carlo) — empirical null distribution replaces the arbitrary Jaccard 0.30 threshold. src/analysis/null_simulation.py (8 tests)
Power analysis — Rule-of-3 inverse, Cohen's h, design effect, reboot_roadmap(). src/analysis/power_analysis.py (10 tests)
Mixed-effects GLMM — BinomialBayesMixedGLM with random intercepts per entity and per query. src/analysis/mixed_effects.py
Cohort v2 — 80 Brazilian entities + 32 international anchors + 16 fictional decoys (128 total). src/config_v2.py (16 tests)
Query battery v2 — 192 balanced queries across verticals, framings, and directives.
Hypothesis engine — BH-FDR correction with a pre-registered decision rule. src/analysis/hypothesis_engine.py (14 tests)
Forward-only migrations — 0005 NER v2 columns, 0006 SHA-256 response hashes, 0007 fictitious-probe flag.
Reproducibility — Dockerfile, requirements-lock.txt, scripts/reproduce.sh.
Pipeline — standalone `collect validate-run --since-minutes N` command, fail-loud behavior per mandatory LLM, a routed_out vs api_failure distinction, and an updated daily-collect.yml.
Canonical docs — docs/METHODOLOGY_V2.md (source of truth) and CHANGELOG.md.
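The power-analysis pillar names three standard quantities: the Rule-of-3 inverse, Cohen's h, and the design effect. A minimal standard-library sketch of each follows. The function names are illustrative and are not taken from src/analysis/power_analysis.py, whose actual API (including reboot_roadmap()) is not reproduced here.

```python
import math


def rule_of_three_upper(n: int) -> float:
    """Approximate 95% upper bound on a rate when 0 events occur in n trials."""
    return 3.0 / n


def rule_of_three_inverse(p_max: float) -> int:
    """Smallest n for which zero observed events bounds the rate below p_max."""
    return math.ceil(3.0 / p_max)


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-size clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Gross N deflated by the design effect."""
    return n / design_effect(cluster_size, icc)
```

Under the Rule-of-3 approximation, ruling out a false-positive rate above 1% with zero observed hits needs about 300 probes.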

Study Design

Dimension	Value
Verticals	4 (Fintech, Retail, Healthcare, Technology)
Entities	69 (61 real + 8 fictional for calibration)
LLM Models	4 (GPT-4o-mini, Claude Haiku 4.5, Gemini 2.5 Flash, Perplexity Sonar)
Queries per vertical	12 specific + 6 cross-vertical = 18
Daily observations	~288 (18 queries x 4 models x 4 verticals)
Observations collected	653 citations, 172 contexts, 11 runs
Code	7,010 lines Python, 35 files, 91 commits
Schema	21 tables (citations, contexts, finops, interventions, snapshots, model_versions)
Collection	Automated daily (GitHub Actions, 06:00 UTC)
Persistence	SQLite WAL (canonical ledger) + Supabase (read projection)
Publication target	3 papers (ArXiv, SIGIR/WWW, Information Sciences Q1)
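The Persistence row refers to SQLite in write-ahead-log (WAL) mode. As a rough sketch of what enabling that looks like with the standard sqlite3 module: the helper name open_ledger and the file name ledger.db are hypothetical, and the project's own src/db/client.py may configure the connection differently.

```python
import os
import sqlite3
import tempfile


def open_ledger(path: str) -> sqlite3.Connection:
    """Open a SQLite file and switch it to WAL journaling, failing loudly otherwise."""
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode={mode}")
    return conn


# WAL needs a real file; in-memory databases do not support it.
with tempfile.TemporaryDirectory() as tmp_dir:
    ledger_path = os.path.join(tmp_dir, "ledger.db")  # hypothetical file name
    conn = open_ledger(ledger_path)
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()
```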

Verticals and Cohorts

Fintech (21 entities)

Real (14): Nubank, PagBank, Cielo, Stone, Banco Inter, Mercado Pago, Itau, Bradesco, C6 Bank, PicPay, Neon, Safra, BTG Pactual, XP Investimentos
Cross-market (5): Revolut, Monzo, N26, Chime, Wise
Fictional (2): Banco Floresta Digital, FinPay Solutions

Retail (16 entities)

Real (14): Magazine Luiza, Casas Bahia, Americanas, Amazon Brasil, Mercado Livre, Shopee Brasil, Renner, Riachuelo, C&A Brasil, Leroy Merlin, Centauro, Netshoes, Via Varejo, Grupo Pao de Acucar
Fictional (2): MegaStore Brasil, ShopNova Digital

Healthcare (16 entities)

Real (14): Dasa, Hapvida, Unimed, Fleury, Rede D'Or, Einstein, Sirio-Libanes, Raia Drogasil, Eurofarma, Ache, EMS, Hypera Pharma, NotreDame Intermedica, SulAmerica Saude
Fictional (2): HealthTech Brasil, Clinica Horizonte Digital

Technology (16 entities)

Real (14): Totvs, Stefanini, Tivit, CI&T, Locaweb, Linx, Movile, iFood, Vtex, RD Station, Conta Azul, Involves, Accenture Brasil, IBM Brasil
Fictional (2): TechNova Solutions, DataBridge Brasil
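Several cohort names are listed without diacritics (Itau, Ache, Sirio-Libanes), while a model response may spell them with accents. The sketch below shows one way accent-insensitive, word-boundary matching could work. It is an interpretation of the "NFC+NFKD dual-pass" idea from the NER v2 pillar, not the code in src/analysis/entity_extraction.py; the names fold and find_entities are hypothetical, and alias and stop-context tables are left out.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Compose with NFC, decompose with NFKD, drop combining marks, casefold."""
    composed = unicodedata.normalize("NFC", text)
    decomposed = unicodedata.normalize("NFKD", composed)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def find_entities(text: str, entities: list[str]) -> list[str]:
    """Return the entities whose folded name appears as a whole word in text."""
    folded_text = fold(text)
    hits = []
    for name in entities:
        # Lookarounds act as word boundaries and also work for names
        # containing punctuation such as "C&A Brasil".
        pattern = r"(?<!\w)" + re.escape(fold(name)) + r"(?!\w)"
        if re.search(pattern, folded_text):
            hits.append(name)
    return hits


sample = "O Itaú e a Aché cresceram; Stonehenge não é uma fintech."
matches = find_entities(sample, ["Itau", "Ache", "Stone", "C&A Brasil"])
```

In this sample "Stone" is not matched inside "Stonehenge", which is the kind of false positive that word-boundary matching is meant to remove.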

Collection Status (March 2026)

Criterion	Target	Current
Total observations	>= 25,920 (288/day x 90 days)	397 (1.5%)
N per LLM	>= 1,000	30-136
Collection days	>= 90 continuous	2
Pre-registered hypotheses	>= 3	0
A/B experiments	>= 2	0
Fictional entity validation	8 (false positive rate)	0 queries
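The observation target and the current completion figure follow from the design numbers. A short worked calculation, assuming every scheduled query yields one observation per day (no API failures), which is an optimistic simplification:

```python
import math

QUERIES_PER_VERTICAL = 18  # 12 specific + 6 cross-vertical
MODELS = 4
VERTICALS = 4
TARGET_DAYS = 90
COLLECTED = 397            # total observations in the March 2026 snapshot

daily = QUERIES_PER_VERTICAL * MODELS * VERTICALS       # observations per day
target = daily * TARGET_DAYS                            # 90-day target
progress_pct = round(100 * COLLECTED / target, 1)       # share of target reached
days_remaining = math.ceil((target - COLLECTED) / daily)

print(daily, target, progress_pct, days_remaining)
```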

Known Limitations

Effective N < Gross N: 54% of observations are cache hits (identical responses reused). N_eff ~181
Sample imbalance: Gemini Flash has N=3 in 3 of 4 verticals (API failures in early rounds)
Directive queries: Categories like "fintech_trust" produce a 100% citation rate by design, so they do not represent spontaneous citation
Non-stationarity: LLM providers update models without notice. The model_versions table exists but is not being populated
Non-independent observations: Similar queries to the same model in the same session share internal state
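One way to estimate the effective N behind the first limitation is to count distinct responses by content hash, in the spirit of the SHA-256 response hashes added by migration 0006. This is a sketch under the assumption that a cache hit is byte-identical to the response it reuses; whether the reported N_eff was computed this way is not stated here, and the sample strings are made up.

```python
import hashlib


def response_hash(text: str) -> str:
    """SHA-256 hex digest of a response body."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def effective_n(responses: list[str]) -> int:
    """Number of distinct responses, i.e. gross N with exact duplicates collapsed."""
    return len({response_hash(r) for r in responses})


# Hypothetical responses: the first one is served from cache twice.
sample = [
    "Nubank and Itau are frequently mentioned.",
    "Nubank and Itau are frequently mentioned.",
    "Stone is one option for merchants.",
    "Nubank and Itau are frequently mentioned.",
]
gross_n = len(sample)
n_eff = effective_n(sample)
duplicate_share = 1 - n_eff / gross_n
```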

Statistical Methodology

Source of truth: docs/METHODOLOGY_V2.md (v2.0.0-reboot, 2026-04-23). The legacy docs/METHODOLOGY.md is preserved as historical reference for v1 and Paper 4's failure analysis.

Test Framework

Test	Use	Implementation
Chi-squared	Association between query category and citation	`scipy.stats.chi2_contingency` + Cramer's V
Kruskal-Wallis	Comparison of rates across 4+ LLM models (non-parametric)	`scipy.stats.kruskal` + eta-squared
One-way ANOVA	Group comparison (when Levene p > 0.05)	`scipy.stats.f_oneway` + eta-squared
Mann-Whitney U	Citation position (ordinal, non-normal)	`scipy.stats.mannwhitneyu` + rank-biserial r
T-test (ind/paired)	Mean comparison pre/post intervention	`scipy.stats.ttest_ind/rel` + Cohen's d
Logistic regression	Citation predictors (schema, word count, etc.)	`statsmodels.Logit` + pseudo R-squared, AIC, BIC, odds ratios
Correlation	Spearman (default) / Pearson	`scipy.stats.spearmanr/pearsonr`

Multiple Testing Correction

Method	Application
Bonferroni	Family-wise comparisons (across verticals)
Benjamini-Hochberg FDR	Per-entity tests (controls false discovery rate)
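To make the difference between the two corrections concrete, here is a small standard-library sketch of both procedures. The function names and the example p-values are illustrative and are not taken from the repository.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Step-up BH: reject the k smallest p-values, where k is the largest rank
    with p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


p_values = [0.001, 0.012, 0.025, 0.039, 0.20]  # made-up example values
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
```

On these values Bonferroni rejects only the first hypothesis, while Benjamini-Hochberg rejects the first four, which illustrates why the less conservative procedure is reserved for the many per-entity tests.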

Effect Sizes

Metric	Associated Test	Thresholds or Formula
Cohen's d	t-test	0.2 small, 0.5 medium, 0.8 large
Cramer's V	chi-squared	sqrt(chi2 / (n * (min_dim-1)))
Eta-squared	ANOVA/KW	SS_between / SS_total
Rank-biserial r	Mann-Whitney	1 - (2U)/(n1*n2)
Pseudo R-squared	Logistic	McFadden
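The formulas in the table can be written out directly. The sketch below is standard-library only and uses illustrative function names; the "negligible" label below 0.2 is an added assumption. Note that the chi-squared statistic here is uncorrected, whereas `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so values for 2x2 tables can differ.

```python
import math
from statistics import mean, variance


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference using the pooled sample variance."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled)


def label_cohens_d(d: float) -> str:
    """Map |d| to the 0.2 / 0.5 / 0.8 thresholds from the table."""
    size = abs(d)
    if size < 0.2:
        return "negligible"  # assumption: the table does not name this band
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def cramers_v(table: list[list[int]]) -> float:
    """sqrt(chi2 / (n * (min_dim - 1))) with an uncorrected Pearson chi2."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    min_dim = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (min_dim - 1)))


def rank_biserial(u: float, n1: int, n2: int) -> float:
    """1 - 2U / (n1 * n2) for a Mann-Whitney U statistic."""
    return 1 - (2 * u) / (n1 * n2)
```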

Context Analysis (Module 7)

Each detected citation undergoes analysis of:

Field	Method
Sentiment	Regex against 16 positive + 12 negative signals (PT-BR + EN), 200-char window
Attribution	Hierarchy: linked (URL present) > named (entity in text) > paraphrased
Factual accuracy	Verification against canonical facts (founding year, CEO, HQ) for 5 key entities
Hedging	16 regex patterns ("according to", "reportedly", "possivelmente")
Position	Tertile: 1 (first third), 2 (middle), 3 (last third) of response
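A rough sketch of how three of these fields could be derived for a single citation. It is not the code in context_analyzer.py: the function names are hypothetical, the hedging pattern covers only the three examples quoted in the table, and the 200-character window is assumed to be centered on the entity mention (100 characters on each side).

```python
import re

URL_RE = re.compile(r"https?://\S+")
HEDGE_RE = re.compile(r"\b(according to|reportedly|possivelmente)\b", re.IGNORECASE)


def context_window(text: str, start: int, end: int, width: int = 200) -> str:
    """Slice roughly `width` characters of context around text[start:end]."""
    half = width // 2
    return text[max(0, start - half): end + half]


def position_tertile(offset: int, response_length: int) -> int:
    """1 for the first third of the response, 2 for the middle, 3 for the last."""
    if response_length <= 0 or not 0 <= offset < response_length:
        raise ValueError("offset must fall inside the response")
    return offset * 3 // response_length + 1


def attribution_level(entity: str, window: str) -> str:
    """Apply the hierarchy linked > named > paraphrased."""
    if URL_RE.search(window):
        return "linked"
    if entity.lower() in window.lower():
        return "named"
    return "paraphrased"


def has_hedging(window: str) -> bool:
    return HEDGE_RE.search(window) is not None


# Made-up response text with a placeholder URL.
text = "According to https://example.com/report, Nubank is reportedly expanding."
start = text.index("Nubank")
window = context_window(text, start, start + len("Nubank"))
```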

Papers

#	Title	Venue	Status	Main Methodology
1	How LLMs Cite Entities Across Industry Verticals	ArXiv	planned	Multi-vertical tracking, ANOVA/KW across models, time series
2	GEO vs SEO: Source Divergence	SIGIR/WWW	planned	Weekly Jaccard index (top-10 Google vs LLM sources), 12+ weeks
3	Industry-Specific Patterns in AI Citation	Information Sciences (Q1)	planned	Fisher exact test, odds ratios, 95% CI, 2 A/B experiments
4	Three Ways to Fail to Conclude: A Null-Triad Post-Mortem	SSRN / arXiv / SIGIR 2027	submitted (10.5281/zenodo.19712217)	Null-triad decomposition: H1 underpower, H2 design-null, H3 instrumentation asymmetry
5	(in preparation)	Elsevier (target)	in preparation — v2 infrastructure operational, 90-day collection window pending OSF preregistration v2	Balanced 128-entity cohort, 192-query battery, cluster-robust CR1, Monte Carlo null, GLMM, BH-FDR
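Paper 2 relies on a Jaccard index between source sets, and Paper 5 replaces a fixed Jaccard threshold with a Monte Carlo null. The sketch below illustrates both ideas. The null model used here (two source lists drawn uniformly at random from a shared pool of candidate domains) is an assumption for illustration; src/analysis/null_simulation.py may define the null differently, and all names and domains below are hypothetical.

```python
import random


def jaccard(a, b) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def null_jaccard_quantile(pool, k_a, k_b, n_sim=2000, q=0.95, seed=0) -> float:
    """Empirical q-quantile of Jaccard under random draws from `pool`."""
    rng = random.Random(seed)
    sims = sorted(
        jaccard(rng.sample(pool, k_a), rng.sample(pool, k_b))
        for _ in range(n_sim)
    )
    return sims[min(n_sim - 1, int(q * n_sim))]


pool = [f"domain{i}.example" for i in range(50)]  # hypothetical candidate sources
threshold = null_jaccard_quantile(pool, k_a=10, k_b=10, n_sim=500, seed=1)
observed = jaccard(pool[:10], pool[5:15])          # 5 shared of 15 distinct
exceeds_null = observed > threshold
```

An observed overlap is then judged against the simulated quantile rather than against a constant such as 0.30.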

Architecture

src/
  config.py               # Central configuration, cohorts per vertical, LLM configs
  cli.py                  # Main CLI (click)
  collectors/
    base.py               # Multi-provider LLM client + cache + FinOps tracking
    citation_tracker.py   # Module 1: Citation Tracker (4 LLMs x 4 verticals)
    competitor.py         # Module 2: Multi-Vertical Benchmark
    serp_overlap.py       # Module 3: SERP vs AI Overlap
    intervention.py       # Module 4: A/B Testing
    context_analyzer.py   # Module 7: Sentiment, attribution, accuracy
  db/
    schema.sql            # Complete schema (21 tables)
    client.py             # SQLite/Supabase persistence
  persistence/
    timeseries.py         # Module 5: Daily snapshots
  analysis/
    statistical.py        # Module 6: 7 tests + corrections + effect sizes
    visualization.py      # Charts with 95% CI (matplotlib/seaborn)
  finops/
    tracker.py            # Cost per token, 4 providers, budget control
    monitor.py            # Dashboard, alerts, security audit
    secrets.py            # Key rotation, leak scanning, health checks
  api/
    main.py               # FastAPI (endpoints per vertical)
.github/workflows/
  daily-collect.yml       # Daily collection 06:00 UTC
  weekly-benchmark.yml    # Weekly benchmark (Sunday 08:00 UTC)

FinOps

Provider	Model	Cost/MTok (in/out)	Monthly Budget
OpenAI	gpt-4o-mini-2024-07-18	$0.15 / $0.60	$10
Anthropic	claude-haiku-4-5	$0.80 / $4.00	$10
Google	gemini-2.5-flash	$0.15 / $0.60	$5
Perplexity	sonar	$1.00 / $1.00	$10
Global			$70 (hard stop 95%)
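The per-call cost and the global hard stop can be sketched from the table. Prices and the $70 budget are copied from the rows above; reading "hard stop 95%" as "halt once spend reaches 95% of the global budget" is an assumption, and the names are illustrative rather than those used in src/finops/tracker.py.

```python
# USD per million tokens as (input, output), copied from the FinOps table.
PRICES = {
    "gpt-4o-mini-2024-07-18": (0.15, 0.60),
    "claude-haiku-4-5": (0.80, 4.00),
    "gemini-2.5-flash": (0.15, 0.60),
    "sonar": (1.00, 1.00),
}
GLOBAL_BUDGET_USD = 70.0
HARD_STOP_FRACTION = 0.95  # assumed meaning of "hard stop 95%"


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call, given token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def must_stop(spent_usd: float) -> bool:
    """True once cumulative spend reaches the hard-stop share of the budget."""
    return spent_usd >= GLOBAL_BUDGET_USD * HARD_STOP_FRACTION


example_cost = call_cost("claude-haiku-4-5", input_tokens=500, output_tokens=800)
```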

Setup

pip install -e ".[dev]"
cp .env.example .env  # Configure API keys
python -m src.cli db migrate
python -m src.cli collect all

Commands

# Collection
python -m src.cli collect all                         # All verticals
python -m src.cli collect all --vertical fintech      # Fintech only
python -m src.cli collect citation                    # Citation Tracker only

# Analysis
python -m src.cli analyze --report                    # Full report
python -m src.cli analyze --report --vertical saude   # Healthcare only
python -m src.cli analyze --visualize                 # Charts per vertical

# Database
python -m src.cli db migrate                          # Apply schema
python -m src.cli db export --format csv              # Export data
python -m src.cli db health                           # Health per vertical

# Migrations (one-time, idempotent)
python -m src.db.migrate_normalize_models              # Normalize GPT model strings
python -m src.db.migrate_cited_entity                  # Backfill cited_entity
python -m src.db.migrate_0003_eficacia_consistencia    # query_type, composite indexes, backfills

# Consolidated export (replaces data/extract_*.py; Wave 4 refactor, 2026-04-19)
python scripts/export_data.py --format text
python scripts/export_data.py --format json --output data/dashboard.json
python scripts/export_data.py --format csv --vertical fintech
python scripts/export_data.py --format html

Refactor 2026-04-19 — Quality guards

query_type (directive vs exploratory) isolates framing bias in Paper 1 ANOVA
Fictional entities (FICTIONAL_ENTITIES in src/config.py) calibrate false-positive rate — 8 entities, activatable via env INCLUDE_FICTIONAL_ENTITIES=true
Mandatory LLMs (env MANDATORY_LLMS) enforce balanced cohort — pipeline fails loud if any mandatory provider drops
5 composite indexes on citations(vertical, cited), (vertical, llm), (timestamp, vertical), (llm, model_version), (query_type) prevent table scans at N > 10K
Backfills applied on 940 legacy rows (model_version NULL → model)
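The fail-loud guard for mandatory providers can be pictured as follows. This is a sketch, not the pipeline's code: it assumes MANDATORY_LLMS holds a comma-separated list of provider names, and the provider names, the exception class, and the function names are all hypothetical.

```python
import os


class MandatoryProviderError(RuntimeError):
    """Raised when a mandatory provider produced no data in a run."""


def mandatory_llms(env=None) -> set[str]:
    """Parse MANDATORY_LLMS (assumed comma-separated) into a set of names."""
    env = os.environ if env is None else env
    raw = env.get("MANDATORY_LLMS", "")
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def check_run(responded: set[str], env=None) -> None:
    """Fail loudly if any mandatory provider is missing from `responded`."""
    missing = sorted(mandatory_llms(env) - {name.lower() for name in responded})
    if missing:
        raise MandatoryProviderError(
            "mandatory providers missing from run: " + ", ".join(missing)
        )


fake_env = {"MANDATORY_LLMS": "openai, anthropic, google"}  # hypothetical names
check_run({"openai", "anthropic", "google", "perplexity"}, env=fake_env)  # passes
```

Raising instead of logging keeps an unbalanced run from being silently written to the ledger.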

See docs/ARCHITECTURE.md for the full flow and docs/audits/2026-03-26/ for historical audit context.

Reproducibility

The v2.0.0-reboot ships a fully pinned, container-reproducible environment:

# One-shot container build + test suite + sample analysis
./scripts/reproduce.sh

# Or, manual Docker workflow
docker build -t papers-v2 .
docker run --rm -v "$PWD":/workspace papers-v2 pytest -q

Artifact	Purpose
`Dockerfile`	Python 3.11 image with system deps and pinned requirements
`requirements-lock.txt`	Fully pinned dependency set (hashes) used by CI and Docker
`scripts/reproduce.sh`	End-to-end: build, migrate, run 78-test suite, emit analysis sample
`CHANGELOG.md`	Version history from v1 through v2.0.0-reboot

Documentation

Document	Description
docs/METHODOLOGY_V2.md	Current statistical methodology (v2.0.0-reboot) — source of truth
CHANGELOG.md	Top-level version history (v1 through v2.0.0-reboot)
docs/ARCHITECTURE.md	Full pipeline flow, schema layers (core vs future), operational commands
docs/METHODOLOGY.md	Historical methodology (v1, pre-reboot) — kept for Paper 4 context
docs/REQUIREMENTS.md	Formal specification (functional/non-functional)
docs/GOVERNANCE.md	Spending policies, ADRs, roadmap
docs/MANUAL.md	Operational manual
docs/CHANGELOG.md	Legacy per-docs change log
docs/audits/2026-03-26/	Archived statistical audit (N=397 snapshot)
output/critica_estatistica_panel.md	Critical review by panel of 7 specialists

License

MIT

Author: Alexandre Caramaschi — CEO of Brasil GEO, former CMO at Semantix (Nasdaq), co-founder of AI Brasil.

Ecosystem

Property	Stack	Status
alexandrecaramaschi.com	Next.js 16 + React 19 + Supabase	Production — 35 courses, 25 insights, 122K+ lines
brasilgeo.ai	Cloudflare Workers	Production — 14 articles
geo-orchestrator	Python + 5 LLMs	Active — multi-LLM pipeline
curso-factory	Python + Jinja2	Active — course generation pipeline
geo-checklist	Markdown	Open-source — GEO audit checklist
llms-txt-templates	Markdown + JSON	Open-source — llms.txt standard
geo-taxonomy	JSON + CSV + Markdown	Open-source — 60+ GEO terms
entity-consistency-playbook	Markdown	Open-source — entity consistency
papers	Python + Supabase	Research — LLM citation study
