Bug Description
The YouTubeConverter.accepts() method in _youtube_converter.py only matches URLs that begin with https://www.youtube.com/watch? and silently rejects all youtu.be short URLs (e.g. https://youtu.be/dQw4w9WgXcQ). As a result, short YouTube links are never handled by this converter: accepts() returns False, the URL falls through to a generic HTML converter, and the result is noisy output instead of a clean transcript.
Reproduction
from markitdown import MarkItDown
md = MarkItDown()
# This works:
result = md.convert("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
# This does NOT go through YouTubeConverter:
result = md.convert("https://youtu.be/dQw4w9WgXcQ")
Root Cause
In packages/markitdown/src/markitdown/converters/_youtube_converter.py, the accepts() method contains:
if not url.startswith("https://www.youtube.com/watch?"):
    # Not a YouTube URL
    return False
This check must also accept https://youtu.be/<id> and https://www.youtube.com/shorts/<id> URLs.
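One way the broadened check could look, as a rough sketch rather than the actual patch: the helper name looks_like_youtube_video is hypothetical and not part of markitdown, and it assumes the check is done on the parsed URL instead of with a string prefix.

```python
from urllib.parse import urlparse


def looks_like_youtube_video(url: str) -> bool:
    """Hypothetical helper (not markitdown API): True for the three URL
    forms named in this issue, False for everything else."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    path = parsed.path

    if host == "youtu.be":
        # Short link: require a non-empty path segment to act as the ID.
        return bool(path.strip("/"))

    if host == "www.youtube.com":
        if path == "/watch":
            # Mirrors the existing prefix check, which requires "watch?".
            return bool(parsed.query)
        # Shorts link: require something after "/shorts/".
        return path.startswith("/shorts/") and len(path) > len("/shorts/")

    return False
```

With this sketch, both URLs from the reproduction above would be accepted, while a bare https://youtu.be/ with no ID would still be rejected.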
Expected Fix
The accepts() check should be extended to recognise youtu.be short URLs (and the /shorts/<id> form noted under Root Cause), and convert() should extract the video ID from whichever form it receives: the v query parameter for watch URLs, or the <id> path segment for the other forms.
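A possible shape for the ID extraction, offered only as a sketch: extract_video_id is a hypothetical name, not an existing markitdown function, and how convert() would consume the result is left open here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper (not markitdown API): return the video ID for
    watch, youtu.be and shorts URLs, or None if none can be found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Non-empty path segments, e.g. "/shorts/abc" -> ["shorts", "abc"].
    segments = [s for s in parsed.path.split("/") if s]

    if host == "youtu.be":
        # https://youtu.be/<id>: the ID is the first path segment.
        return segments[0] if segments else None

    if host == "www.youtube.com":
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=<id>: the ID is the v parameter.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        if len(segments) >= 2 and segments[0] == "shorts":
            # https://www.youtube.com/shorts/<id>
            return segments[1]

    return None
```

Because the ID is read from the parsed path or query, extra parameters on a short link (for example a trailing ?t=42) do not end up in the returned ID.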
Related
Issue #1704 mentions that short URLs (youtu.be/…) are silently skipped.