Ingestion re-implement on updated Elastic.Ingest.Elasticsearch #2755
Draft
…mappings
Replace manual channel orchestration with IncrementalSyncOrchestrator<T> and source-generated ElasticsearchTypeContext from Elastic.Mapping 0.4.0. Add field type attributes ([Keyword], [Text], [Object], etc.) directly on DocumentationDocument to drive the mapping source generator, replacing verbose manual JSON mappings.
- Update Elastic.Ingest.Elasticsearch 0.17.1 → 0.19.0, add Elastic.Mapping 0.4.0
- Add mapping attributes to DocumentationDocument and IndexedProduct
- Create DocumentationMappingConfig.cs with two Entity variants (lexical/semantic)
- Rewrite ElasticsearchMarkdownExporter to use orchestrator for dual-index mode
- Delete ElasticsearchIngestChannel.cs and ElasticsearchIngestChannel.Mapping.cs
- Remove unused ReindexAsync from ElasticsearchOperations
- Update SearchBootstrapFixture to use IngestChannel with semantic type context
Replaces `ElasticsearchOptions` with `DocumentationEndpoints` as the single source of truth for
Elasticsearch configuration across all API apps, MCP server, and integration tests.
- Adds `IndexName` property to `ElasticsearchEndpoint` with a field-backed getter defaulting to
`{IndexNamePrefix}-dev-latest`.
- Creates `ElasticsearchEndpointFactory` in `ServiceDefaults` to centralize user-secrets and
environment variable reading, eliminating the duplicated `72f50f33` secrets ID pattern.
- Registers `DocumentationEndpoints` as a singleton in `AddDocumentationServiceDefaults`.
- Updates `ElasticsearchClientAccessor` to accept `DocumentationEndpoints` instead of
`ElasticsearchOptions`, supporting both API key and basic authentication.
- Updates all gateway consumers (`NavigationSearchGateway`, `FullSearchGateway`,
`DocumentGateway`, `ElasticsearchAskAiMessageFeedbackGateway`) to use endpoint properties.
- Simplifies all three integration test files (`SearchRelevanceTests`,
`McpToolsIntegrationTestsBase`, `SearchBootstrapFixture`) to use `ElasticsearchEndpointFactory`
and `ElasticsearchTransportFactory`, removing manual config construction.
- Deletes `ElasticsearchOptions.cs` and removes `Microsoft.Extensions.Configuration.UserSecrets`
from the Search project.
Move mapping context (DocumentationMappingContext, LexicalConfig, SemanticConfig, DocumentationAnalysisFactory) from Elastic.Markdown to Elastic.Documentation so both indexing and search derive index names from the same source. Add ContentHash helper to avoid Elastic.Ingest.Elasticsearch dependency in Elastic.Documentation. Remove IndexName from ElasticsearchEndpoint, add Namespace to DocumentationEndpoints. ElasticsearchEndpointFactory resolves namespace from DOCUMENTATION_ELASTIC_INDEX env var (backward compat), DOTNET_ENVIRONMENT, ENVIRONMENT, or falls back to "dev". ElasticsearchClientAccessor derives SearchIndex and RulesetName from namespace instead of parsing the old IndexName string. Remove ExtractRulesetName and all hardcoded "semantic-docs-dev-latest" assignments from tests and config files.
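The namespace fallback chain described above can be sketched roughly as follows. This is a minimal illustration, not the actual `ElasticsearchEndpointFactory` code: `NamespaceResolver` and its `Resolve` methods are hypothetical names, the precedence is assumed to follow the order in which the commit message lists the variables, and the value of `DOCUMENTATION_ELASTIC_INDEX` is used verbatim here, whereas the real factory may parse a namespace out of a legacy index name. Lowercasing is also an assumption, added because Elasticsearch index names must be lowercase.

```csharp
#nullable enable
using System;

// Hypothetical helper for illustration only; not the PR's ElasticsearchEndpointFactory.
public static class NamespaceResolver
{
    // Variables checked in the order the commit message lists them (assumed precedence).
    private static readonly string[] Candidates =
    {
        "DOCUMENTATION_ELASTIC_INDEX", // kept for backward compatibility
        "DOTNET_ENVIRONMENT",
        "ENVIRONMENT"
    };

    // getEnv is injected so the lookup can be exercised without touching the process environment.
    public static string Resolve(Func<string, string?> getEnv)
    {
        foreach (var name in Candidates)
        {
            var value = getEnv(name);
            if (!string.IsNullOrWhiteSpace(value))
            {
                // Assumption: normalize to lowercase so the result is safe inside an index name.
                return value.Trim().ToLowerInvariant();
            }
        }

        // Nothing set: fall back to "dev", as the commit message states.
        return "dev";
    }

    // Convenience overload that reads the real process environment.
    public static string Resolve() => Resolve(Environment.GetEnvironmentVariable);
}
```

Passing the lookup as a delegate keeps the sketch testable: a test can supply a dictionary-backed lookup instead of mutating real environment variables.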
Enable IndexPatternUseBatchDate now that Elastic.Mapping supports it, and pass batchTimestamp to IngestChannelOptions in the lexical-only path so the channel uses the exporter's timestamp for index name computation.
…meter Simplify DocumentationTooling endpoint resolution by delegating to ElasticsearchEndpointFactory. Add missing skipOpenApi parameter to IsolatedIndexService.Index call.
The lexical-only code path manually reimplemented drain, delete-stale, refresh, and alias logic that the orchestrator handles automatically. Remove the flag end-to-end: CLI parameters, configuration, exporter branching, and CLI documentation.
Add .jina-embeddings-v5-text-small inference on 6 fields (title, abstract, ai_rag_optimized_summary, ai_questions, ai_use_cases, stripped_body) to enable hybrid sparse+dense retrieval. Rename InferenceId to ElserInferenceId for clarity.
Summary
Rearchitect Elasticsearch ingest to use `Elastic.Ingest.Elasticsearch` with source-generated mappings, centralize endpoint configuration, and add Jina v5 dense embeddings for hybrid search.
Draft because we still need: elastic/elastic-ingest-dotnet#143
Ingest migration
- Migrate from `ElasticsearchIngestChannel` to `Elastic.Ingest.Elasticsearch` 0.24.0 with compile-time source-generated mappings via `Elastic.Mapping`
- Replace `ElasticsearchIngestChannel` and `ElasticsearchIngestChannel.Mapping` with declarative `DocumentationMappingConfig` using `[Entity<T>]` attributes and `ConfigureMappings` callbacks
- … `ElasticsearchOperations` and legacy manual index/alias management
Elasticsearch configuration
- … `ElasticsearchEndpointFactory` in `ServiceDefaults`, replacing scattered `ElasticsearchOptions` and per-service configuration
- Replace `IndexName` with namespace-based index resolution (`WithIndexName`/`WithNamespace`)
- Add `skipOpenApi` parameter to `ElasticsearchEndpointFactory` for services that don't need the OpenAPI schema
- Remove `--no-semantic` CLI flag entirely; semantic indexing is now always enabled
Semantic search: Jina v5 dense embeddings
- Add `.jina-embeddings-v5-text-small` dense embeddings alongside existing `.elser-2-elastic` sparse embeddings on semantic index
- … `title.jina`, `abstract.jina`, `ai_rag_optimized_summary.jina`, `ai_questions.jina`, `ai_use_cases.jina`, `stripped_body.jina`
- `stripped_body` now has semantic coverage (ELSER doesn't index it)
Other
- … `PublishBlockerExtensions` and tests for release notes publish blocking logic
Test plan
- … `dotnet build` from repo root
- … `AddField` overrides (rank features, multi-fields, custom analyzers)