Parameters for the course content extractor.

| Name | Type | Description | Notes |
|---|---|---|---|
| extractor_type | str | Discriminator field. Must be 'course_content_extractor'. | [optional] [default to 'course_content_extractor'] |
| target_segment_duration_ms | int | Target duration for video segments in milliseconds. | [optional] [default to 120000] |
| min_segment_duration_ms | int | Minimum duration for video segments in milliseconds. | [optional] [default to 30000] |
| segmentation_method | str | Video segmentation method: 'scene', 'srt', or 'time'. | [optional] [default to 'scene'] |
| scene_detection_threshold | float | Scene detection sensitivity (0.0-1.0). | [optional] [default to 0.3] |
| use_whisper_asr | bool | Use Whisper ASR for transcription instead of SRT subtitles. | [optional] [default to True] |
| expand_to_granular_docs | bool | Expand each segment into multiple granular documents. | [optional] [default to True] |
| ocr_frames_per_segment | int | Number of frames to OCR per video segment. | [optional] [default to 3] |
| pdf_extraction_mode | str | How to extract PDF content: 'per_page' or 'per_element'. | [optional] [default to 'per_element'] |
| pdf_render_dpi | int | DPI for rendering PDF pages/elements as images. | [optional] [default to 150] |
| detect_code_in_pdf | bool | Whether to detect code blocks in PDF text. | [optional] [default to True] |
| segment_functions | bool | Whether to segment code files into individual functions. | [optional] [default to True] |
| supported_languages | List[str] | Programming languages to extract from code archives. | [optional] |
| run_text_embedding | bool | Generate E5 text embeddings (1024D) for transcripts and text. | [optional] [default to True] |
| run_code_embedding | bool | Generate Jina Code embeddings (768D) for code snippets. | [optional] [default to True] |
| run_visual_embedding | bool | Generate SigLIP visual embeddings (768D) for video frames. | [optional] [default to True] |
| run_structure_embedding | bool | Generate DINOv2 visual structure embeddings (768D) for layout comparison. | [optional] [default to True] |
| visual_embedding_use_case | str | Content type preset for visual embedding strategy. | [optional] [default to 'lecture'] |
| extract_screen_text | bool | Run OCR on video frames to extract on-screen text. | [optional] [default to True] |
| generate_thumbnails | bool | Generate thumbnail images for each learning unit. | [optional] [default to True] |
| use_cdn | bool | Use a CDN to deliver thumbnails. | [optional] [default to False] |
| run_vlm_frame_analysis | bool | Run VLM on video frame thumbnails to extract structured fields: frame_type, page_context, ui_labels, workflow_steps, config_options. Enables drift detection and UI comparison use cases. | [optional] [default to False] |
| vlm_provider | str | VLM provider: 'google' (Gemini API) or 'vllm' (local GPU with Qwen2.5-VL). | [optional] [default to 'google'] |
| vlm_model | str | Model name for the selected vlm_provider: 'gemini-2.5-flash' for google, 'Qwen/Qwen2.5-VL-7B-Instruct' for vllm. | [optional] [default to 'gemini-2.5-flash'] |
| enrich_with_llm | bool | Use Gemini to generate summaries and enhance descriptions. | [optional] [default to False] |
| llm_prompt | str | Prompt for LLM enrichment when enrich_with_llm=True. | [optional] [default to 'Summarize this educational content segment, highlighting key concepts.'] |
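Several parameters accept only a fixed set of values or a bounded range. The following is a minimal client-side check, assuming you want to catch mistakes before sending a request. `validate_course_params` is a hypothetical helper, not part of the mixpeek SDK, and it checks only the constraints stated in the table above. The model check is conservative: it knows only the two model names the table lists, so the service may accept models this sketch flags.

```python
from typing import Any, Dict, List

# Allowed values copied from the parameter table. This helper is hypothetical,
# not part of the mixpeek SDK.
ALLOWED_VALUES = {
    "segmentation_method": {"scene", "srt", "time"},
    "pdf_extraction_mode": {"per_page", "per_element"},
    "vlm_provider": {"google", "vllm"},
}

# Model names the table lists for each VLM provider.
DOCUMENTED_VLM_MODELS = {
    "google": {"gemini-2.5-flash"},
    "vllm": {"Qwen/Qwen2.5-VL-7B-Instruct"},
}


def validate_course_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no documented rule was broken."""
    problems: List[str] = []

    # The discriminator has exactly one valid value.
    extractor_type = params.get("extractor_type", "course_content_extractor")
    if extractor_type != "course_content_extractor":
        problems.append("extractor_type must be 'course_content_extractor'")

    # Enumerated string fields.
    for field, allowed in ALLOWED_VALUES.items():
        if field in params and params[field] not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")

    # Documented range for the scene detection sensitivity.
    threshold = params.get("scene_detection_threshold", 0.3)
    if not 0.0 <= threshold <= 1.0:
        problems.append("scene_detection_threshold must be between 0.0 and 1.0")

    # Compare provider and model, falling back to the documented defaults.
    provider = params.get("vlm_provider", "google")
    model = params.get("vlm_model", "gemini-2.5-flash")
    if provider in DOCUMENTED_VLM_MODELS and model not in DOCUMENTED_VLM_MODELS[provider]:
        problems.append(f"vlm_model {model!r} is not the documented model for {provider!r}")

    return problems
```

For example, `{"vlm_provider": "vllm"}` on its own is flagged, because `vlm_model` still defaults to 'gemini-2.5-flash'; switching providers means setting the model as well.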
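The four `run_*_embedding` flags each produce vectors with the dimensions given in the table. A quick way to compare configurations is to add up those dimensions. The sketch below is hypothetical arithmetic, not SDK code: it assumes float32 storage (4 bytes per value), which the table does not specify, and it reports the combined size for one segment that receives every enabled embedding once. In practice each embedding applies only to its own content type (transcripts and text, code snippets, video frames, layout).

```python
from typing import Dict, Tuple

# Model and vector dimension for each flag, taken from the parameter table.
EMBEDDINGS: Dict[str, Tuple[str, int]] = {
    "run_text_embedding": ("E5 text", 1024),
    "run_code_embedding": ("Jina Code", 768),
    "run_visual_embedding": ("SigLIP visual", 768),
    "run_structure_embedding": ("DINOv2 structure", 768),
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def embedding_footprint(flags: Dict[str, bool]) -> Tuple[int, int]:
    """Return (total dimensions, bytes) for the enabled embeddings.

    Flags that are not given fall back to the documented default of True.
    """
    total_dims = sum(
        dim for flag, (_, dim) in EMBEDDINGS.items() if flags.get(flag, True)
    )
    return total_dims, total_dims * BYTES_PER_FLOAT32
```

With every flag at its default, the total is 1024 + 3 × 768 = 3328 dimensions, or 13,312 bytes under the float32 assumption; disabling the code and structure embeddings cuts that to 1792 dimensions.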
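The table gives a target and a minimum segment length but does not say how the two interact. One plausible reading for `segmentation_method='time'` is sketched below, purely as an assumption: cut fixed windows of `target_segment_duration_ms`, then fold a trailing window shorter than `min_segment_duration_ms` into the previous one. `time_segments` is hypothetical and may not match the extractor's actual behavior.

```python
from typing import List, Tuple


def time_segments(
    duration_ms: int,
    target_ms: int = 120_000,
    min_ms: int = 30_000,
) -> List[Tuple[int, int]]:
    """Split [0, duration_ms) into (start_ms, end_ms) windows of target_ms.

    Hypothetical rule: a final window shorter than min_ms is merged into the
    previous window instead of standing alone.
    """
    if duration_ms <= 0:
        return []
    segments: List[Tuple[int, int]] = []
    start = 0
    while start < duration_ms:
        end = min(start + target_ms, duration_ms)
        segments.append((start, end))
        start = end
    # Merge a too-short tail into its predecessor, if there is one.
    if len(segments) > 1 and segments[-1][1] - segments[-1][0] < min_ms:
        tail_end = segments.pop()[1]
        prev_start, _ = segments.pop()
        segments.append((prev_start, tail_end))
    return segments
```

Under this rule and the default durations, a 260-second lecture yields two segments, 0 to 120 s and 120 to 260 s, because the 20-second remainder is below the 30-second minimum.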

```python
from mixpeek.models.course_content_extractor_params import CourseContentExtractorParams

# Example JSON; fields left out take the defaults listed in the table above.
json_str = '{"extractor_type": "course_content_extractor", "segmentation_method": "srt", "use_whisper_asr": false}'
# create an instance of CourseContentExtractorParams from a JSON string
course_content_extractor_params_instance = CourseContentExtractorParams.from_json(json_str)
# print the JSON string representation of the instance
print(course_content_extractor_params_instance.to_json())
# convert the object into a dict
course_content_extractor_params_dict = course_content_extractor_params_instance.to_dict()
# create an instance of CourseContentExtractorParams from a dict
course_content_extractor_params_from_dict = CourseContentExtractorParams.from_dict(course_content_extractor_params_dict)
```
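The example above parses a JSON string. One way to build that string with only the standard library is to start from the discriminator and add just the fields you want to override, leaving the rest at their defaults. `build_params_json` and its field list are a hypothetical sketch based on the parameter table, not an SDK function; it rejects unknown keys so that a misspelled field name fails early rather than surfacing later.

```python
import json
from typing import Any

# Field names from the CourseContentExtractorParams table.
KNOWN_FIELDS = frozenset({
    "extractor_type", "target_segment_duration_ms", "min_segment_duration_ms",
    "segmentation_method", "scene_detection_threshold", "use_whisper_asr",
    "expand_to_granular_docs", "ocr_frames_per_segment", "pdf_extraction_mode",
    "pdf_render_dpi", "detect_code_in_pdf", "segment_functions",
    "supported_languages", "run_text_embedding", "run_code_embedding",
    "run_visual_embedding", "run_structure_embedding",
    "visual_embedding_use_case", "extract_screen_text", "generate_thumbnails",
    "use_cdn", "run_vlm_frame_analysis", "vlm_provider", "vlm_model",
    "enrich_with_llm", "llm_prompt",
})


def build_params_json(**overrides: Any) -> str:
    """Serialize only the overridden fields plus the extractor_type discriminator."""
    unknown = set(overrides) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    # The discriminator is always set to its only valid value.
    payload = {**overrides, "extractor_type": "course_content_extractor"}
    return json.dumps(payload)
```

The result can be passed straight to `CourseContentExtractorParams.from_json`, for example `build_params_json(segmentation_method="srt", use_whisper_asr=False)`.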