I'm beginning to look at Microsoft.Extensions.DataIngestion pipelines. As a test, I considered using an IngestionPipeline to ingest content stored in a CMS SQL database and create a vector store for use with RAG. However, I'm unclear on how to implement it when the data to be ingested is stored in a database.
Currently, both overloads of the ProcessAsync method require file system objects.
    public async IAsyncEnumerable<IngestionResult> ProcessAsync(DirectoryInfo directory, string searchPattern = "*.*",
        SearchOption searchOption = SearchOption.TopDirectoryOnly, [EnumeratorCancellation] CancellationToken cancellationToken = default)

and

    public async IAsyncEnumerable<IngestionResult> ProcessAsync(IEnumerable<FileInfo> files, [EnumeratorCancellation] CancellationToken cancellationToken = default)

Perhaps I misunderstand its purpose or how it's meant to be used, but it would appear that it can only ingest data originating from files. Is that the case?
For reference, the two signatures quoted above are from extensions/src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs (lines 80 to 81 and lines 107 to 108 in 15ffd76).