fix: YouTubeConverter.accepts() silently drops youtu.be short URLs #1730

@kuishou68

Bug Description

The YouTubeConverter.accepts() method in _youtube_converter.py only matches URLs that begin with https://www.youtube.com/watch? and silently rejects all youtu.be short URLs (e.g. https://youtu.be/dQw4w9WgXcQ). As a result, short YouTube links never reach the converter: accepts() returns False and the URL falls through to a generic HTML converter, producing noisy output instead of a clean transcript.

Reproduction

from markitdown import MarkItDown
md = MarkItDown()
# This works:
result = md.convert("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
# This does NOT go through YouTubeConverter:
result = md.convert("https://youtu.be/dQw4w9WgXcQ")

Root Cause

In packages/markitdown/src/markitdown/converters/_youtube_converter.py, the accepts() method contains:

if not url.startswith("https://www.youtube.com/watch?"):
    # Not a YouTube URL
    return False

This check must also accept https://youtu.be/<id> and https://www.youtube.com/shorts/<id> URLs.

Expected Fix

The accepts() check should be extended to recognise short URLs, and the convert() method should extract the video ID from whichever URL form it receives, including the youtu.be/<id> path segment.
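
A minimal sketch of one way to do this, using only the standard library. The helper name extract_video_id is hypothetical and not part of markitdown; the list of accepted hosts and the decision to allow http as well as https are assumptions made for illustration, not behaviour confirmed by the current converter.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed set of hosts that serve watch and shorts pages (illustrative only).
_YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the video ID, or None if the URL is not
    one of the three forms discussed in this issue."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None

    host = (parsed.hostname or "").lower()
    # Non-empty path segments, e.g. "/shorts/abc" -> ["shorts", "abc"]
    segments = [s for s in parsed.path.split("/") if s]

    # Short form: https://youtu.be/<id>
    if host == "youtu.be":
        return segments[0] if segments else None

    if host in _YOUTUBE_HOSTS:
        # Long form: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        # Shorts form: https://www.youtube.com/shorts/<id>
        if len(segments) >= 2 and segments[0] == "shorts":
            return segments[1]

    return None
```

Under this sketch, accepts() could reduce its URL test to checking that extract_video_id(url) is not None, and convert() could call the same helper, so the two methods cannot disagree about which URLs count as YouTube links.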

Related

Issue #1704 mentions that short URLs (youtu.be/…) are silently skipped.
