Ingestion re-implement on updated Elastic.Ingest.Elasticsearch #2755

Draft
Mpdreamz wants to merge 9 commits into main from feature/ingest-rearch

Mpdreamz commented Feb 22, 2026

Summary

Rearchitect Elasticsearch ingest to use Elastic.Ingest.Elasticsearch with source-generated mappings, centralize endpoint configuration, and add Jina v5 dense embeddings for hybrid search.

Draft because we still need: elastic/elastic-ingest-dotnet#143

Ingest migration

  • Migrate from hand-rolled ElasticsearchIngestChannel to Elastic.Ingest.Elasticsearch 0.24.0 with compile-time source-generated mappings via Elastic.Mapping
  • Replace ElasticsearchIngestChannel and ElasticsearchIngestChannel.Mapping with declarative DocumentationMappingConfig using [Entity<T>] attributes and ConfigureMappings callbacks
  • Remove ElasticsearchOperations and legacy manual index/alias management

Elasticsearch configuration

  • Centralize Elasticsearch connection setup into ElasticsearchEndpointFactory in ServiceDefaults, replacing scattered ElasticsearchOptions and per-service configuration
  • Replace hardcoded IndexName with namespace-based index resolution (WithIndexName / WithNamespace)
  • Add skipOpenApi parameter to ElasticsearchEndpointFactory for services that don't need the OpenAPI schema
  • Remove --no-semantic CLI flag entirely — semantic indexing is now always enabled

Semantic search: Jina v5 dense embeddings

  • Add .jina-embeddings-v5-text-small dense embeddings alongside the existing .elser-2-elastic sparse embeddings on the semantic index
  • 6 new Jina fields: title.jina, abstract.jina, ai_rag_optimized_summary.jina, ai_questions.jina, ai_use_cases.jina, stripped_body.jina
  • Enables hybrid sparse+dense retrieval; stripped_body now has semantic coverage (ELSER doesn't index it)
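
The PR does not say how the sparse (ELSER) and dense (Jina) result lists are combined at query time. As one illustration of hybrid retrieval, the sketch below merges two ranked lists with reciprocal rank fusion (RRF). The choice of RRF, the function name, and the constant `k = 60` are assumptions made for this example, not something taken from this change.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. `k` is a smoothing constant (assumed value).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a sparse and a dense query.
sparse_hits = ["a", "b", "c"]
dense_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

Documents that rank well in both lists ("a" and "b" here) rise above documents that only one retriever found, which is the practical benefit of querying both embedding types.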

Other

Test plan

  • dotnet build from repo root
  • Integration tests pass (search bootstrap, search relevance, MCP remote)
  • Verify semantic index bootstrap creates component templates with both ELSER and Jina fields
  • Verify lexical index bootstrap includes all AddField overrides (rank features, multi-fields, custom analyzers)

…mappings

Replace manual channel orchestration with IncrementalSyncOrchestrator<T> and
source-generated ElasticsearchTypeContext from Elastic.Mapping 0.4.0. Add field
type attributes ([Keyword], [Text], [Object], etc.) directly on DocumentationDocument
to drive the mapping source generator, replacing verbose manual JSON mappings.

- Update Elastic.Ingest.Elasticsearch 0.17.1 → 0.19.0, add Elastic.Mapping 0.4.0
- Add mapping attributes to DocumentationDocument and IndexedProduct
- Create DocumentationMappingConfig.cs with two Entity variants (lexical/semantic)
- Rewrite ElasticsearchMarkdownExporter to use orchestrator for dual-index mode
- Delete ElasticsearchIngestChannel.cs and ElasticsearchIngestChannel.Mapping.cs
- Remove unused ReindexAsync from ElasticsearchOperations
- Update SearchBootstrapFixture to use IngestChannel with semantic type context

Replaces `ElasticsearchOptions` with `DocumentationEndpoints` as the single source of truth for
Elasticsearch configuration across all API apps, MCP server, and integration tests.

- Adds `IndexName` property to `ElasticsearchEndpoint` with a field-backed getter defaulting to
  `{IndexNamePrefix}-dev-latest`.
- Creates `ElasticsearchEndpointFactory` in `ServiceDefaults` to centralize user-secrets and
  environment variable reading, eliminating the duplicated `72f50f33` secrets ID pattern.
- Registers `DocumentationEndpoints` as a singleton in `AddDocumentationServiceDefaults`.
- Updates `ElasticsearchClientAccessor` to accept `DocumentationEndpoints` instead of
  `ElasticsearchOptions`, supporting both API key and basic authentication.
- Updates all gateway consumers (`NavigationSearchGateway`, `FullSearchGateway`,
  `DocumentGateway`, `ElasticsearchAskAiMessageFeedbackGateway`) to use endpoint properties.
- Simplifies all three integration test files (`SearchRelevanceTests`,
  `McpToolsIntegrationTestsBase`, `SearchBootstrapFixture`) to use `ElasticsearchEndpointFactory`
  and `ElasticsearchTransportFactory`, removing manual config construction.
- Deletes `ElasticsearchOptions.cs` and removes `Microsoft.Extensions.Configuration.UserSecrets`
  from the Search project.

Move mapping context (DocumentationMappingContext, LexicalConfig, SemanticConfig,
DocumentationAnalysisFactory) from Elastic.Markdown to Elastic.Documentation so
both indexing and search derive index names from the same source. Add ContentHash
helper to avoid Elastic.Ingest.Elasticsearch dependency in Elastic.Documentation.

Remove IndexName from ElasticsearchEndpoint, add Namespace to DocumentationEndpoints.
ElasticsearchEndpointFactory resolves namespace from DOCUMENTATION_ELASTIC_INDEX env
var (backward compat), DOTNET_ENVIRONMENT, ENVIRONMENT, or falls back to "dev".
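
The fallback order described above can be sketched as a first-non-empty lookup. This is an illustrative Python version, not the C# code in `ElasticsearchEndpointFactory`. The function name is hypothetical, and the sketch assumes each variable's value is used as-is, without parsing an old-style index name out of `DOCUMENTATION_ELASTIC_INDEX`.

```python
import os
from typing import Mapping, Optional

# Precedence order as listed in the commit message.
NAMESPACE_VARIABLES = (
    "DOCUMENTATION_ELASTIC_INDEX",  # kept for backward compatibility
    "DOTNET_ENVIRONMENT",
    "ENVIRONMENT",
)
DEFAULT_NAMESPACE = "dev"


def resolve_namespace(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty variable in precedence order, else "dev"."""
    env = os.environ if env is None else env
    for name in NAMESPACE_VARIABLES:
        value = env.get(name, "").strip()
        if value:
            return value
    return DEFAULT_NAMESPACE


# With nothing set, the namespace falls back to "dev".
print(resolve_namespace({}))
# A more specific variable wins over a more general one.
print(resolve_namespace({"ENVIRONMENT": "prod", "DOTNET_ENVIRONMENT": "staging"}))
```

Taking the environment as a parameter keeps the lookup testable without mutating process-wide state.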

ElasticsearchClientAccessor derives SearchIndex and RulesetName from namespace
instead of parsing the old IndexName string. Remove ExtractRulesetName and all
hardcoded "semantic-docs-dev-latest" assignments from tests and config files.

Enable IndexPatternUseBatchDate now that Elastic.Mapping supports it,
and pass batchTimestamp to IngestChannelOptions in the lexical-only path
so the channel uses the exporter's timestamp for index name computation.

…meter

Simplify DocumentationTooling endpoint resolution by delegating to
ElasticsearchEndpointFactory. Add missing skipOpenApi parameter to
IsolatedIndexService.Index call.

The lexical-only code path manually reimplemented drain, delete-stale,
refresh, and alias logic that the orchestrator handles automatically.
Remove the --no-semantic flag end-to-end: CLI parameters, configuration, exporter
branching, and CLI documentation.

Mpdreamz self-assigned this Feb 22, 2026
Mpdreamz requested review from a team and reakaleek February 22, 2026 17:41
Mpdreamz changed the title from "feature/ingest rearch" to "Ingestion re-implement on updated Elastic.Ingest.Elasticsearch" Feb 22, 2026
github-actions bot commented Feb 22, 2026

🔍 Preview links for changed docs

Add .jina-embeddings-v5-text-small inference on 6 fields (title, abstract,
ai_rag_optimized_summary, ai_questions, ai_use_cases, stripped_body) to
enable hybrid sparse+dense retrieval. Rename InferenceId to ElserInferenceId
for clarity.