LlamaIndex's `SemanticSplitterNodeParser` can sometimes produce chunks that are too large for the embedding model, and there is currently no maximum-length option on the semantic chunker to prevent this.
The eventual fix is to subclass `SemanticSplitterNodeParser` and add a two-level safety net that naively splits oversized chunks into sub-chunks, so that every chunk stays under the embedding model's input token limit.
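A minimal sketch of what that second level could look like, independent of LlamaIndex itself: after semantic splitting produces chunks (level one), a post-processing pass sub-splits anything over the token budget (level two). The function names `naive_subsplit` and `enforce_max_chunk_size` are hypothetical, and the whitespace-based token count is a crude stand-in; a real subclass would use the embedding model's tokenizer and apply this pass inside the parser's node-building step.

```python
# Hypothetical "level two" safety net: naively split any chunk that
# exceeds max_tokens into sub-chunks that fit under the limit.
# Token counting here is a whitespace approximation (an assumption);
# swap in the embedding model's tokenizer for real limits.

def naive_subsplit(text: str, max_tokens: int) -> list[str]:
    """Split text into consecutive sub-chunks of at most max_tokens words."""
    tokens = text.split()
    return [
        " ".join(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

def enforce_max_chunk_size(chunks: list[str], max_tokens: int) -> list[str]:
    """Pass small chunks through unchanged; sub-split oversized ones."""
    safe: list[str] = []
    for chunk in chunks:
        if len(chunk.split()) <= max_tokens:
            safe.append(chunk)
        else:
            safe.extend(naive_subsplit(chunk, max_tokens))
    return safe

chunks = ["a short chunk", "one two three four five six"]
print(enforce_max_chunk_size(chunks, max_tokens=4))
```

In an actual subclass, this pass would run over the output of the parent's semantic splitting, so the semantic boundaries are kept wherever they already fit and only the overflowing chunks are broken up naively.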
Reference:
run-llama/llama_index#12270