Working With Captions and CaptionArray

The Caption class represents a single caption together with its metadata, such as language and format. The CaptionArray class is a collection of Caption objects, with methods for looking up and managing multiple captions.

With these tools, you can:

  • Extract captions in various languages and formats (e.g., SRT, TXT).

  • Download and save captions for offline use.

Example

caption_demo.py
from youtube_dl_scraper import YouTube
youtube = YouTube()
captions = youtube.scrape_captions("https://youtu.be/sF9xYtouZjY?si=z6ZWk4raQeHgQDz")
print(captions)
print(captions.subtitles)
The scrape_captions method returns a CaptionArray object.
output
<youtube_dl_scraper.core.caption_array.CaptionArray object at 0xe60527e0>
({'name': 'English (auto-generated)', 'code': 'a.en'},
{'name': 'English (United States)', 'code': 'en-US'},
{'name': 'Portuguese', 'code': 'pt'},
{'name': 'Spanish', 'code': 'es'})

Each entry in captions.subtitles gives a caption's display name and its language code, which the lookup methods described below accept.
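
As a small sketch of working with the listing printed above, the tuple of name/code dictionaries can be turned into a name-to-code lookup. The data is copied from the sample output so the snippet runs without network access. The helper name codes_by_name is hypothetical and not part of the library.

```python
# Sample copied from the printed value of captions.subtitles above.
subtitles = (
    {'name': 'English (auto-generated)', 'code': 'a.en'},
    {'name': 'English (United States)', 'code': 'en-US'},
    {'name': 'Portuguese', 'code': 'pt'},
    {'name': 'Spanish', 'code': 'es'},
)


def codes_by_name(entries):
    """Map each display name to its language code (hypothetical helper)."""
    return {entry['name']: entry['code'] for entry in entries}


lookup = codes_by_name(subtitles)
print(lookup['Spanish'])

# List every available caption, code first.
for name, code in lookup.items():
    print(f"{code:>6}  {name}")
```

With a live CaptionArray, the same helper could be called as codes_by_name(captions.subtitles), assuming the attribute keeps the shape shown in the output.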

Tip

You can also fetch the captions from the video object.

from youtube_dl_scraper import YouTube

youtube = YouTube()
video = youtube.scrape_video("https://youtu.be/sF9xYtouZjY?si=z6ZWk4raQeHgQDz")
captions = video.captions

Fetching a Specific Caption

The CaptionArray object has methods that make it easy to fetch a caption for a specific language name or language code [1].

Difference between normal and translated captions

There are two types of captions: normal and translated. You can tell them apart using the translated attribute of the Caption object. A normal caption is one written by a user or auto-generated by YouTube, while a translated caption is one translated by AI.
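
To illustrate the distinction, here is a minimal sketch that separates captions by their translated flag. CaptionStub is a stand-in defined only for this example, not the library's Caption class. It carries the translated attribute named above and a lang_code attribute, which is an assumption based on the object representations printed later in this guide.

```python
from dataclasses import dataclass


@dataclass
class CaptionStub:
    """Stand-in for Caption; lang_code as an attribute name is assumed."""
    lang_code: str
    translated: bool


def split_by_kind(captions):
    """Return (normal, translated) lists based on the translated flag."""
    normal = [c for c in captions if not c.translated]
    translated = [c for c in captions if c.translated]
    return normal, translated


sample = [
    CaptionStub('es', False),
    CaptionStub('fr', True),
    CaptionStub('en-US', False),
]
normal, translated = split_by_kind(sample)
print([c.lang_code for c in normal])
print([c.lang_code for c in translated])
```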

normal captions
>>> print(captions.subtitles)
({'name': 'English (auto-generated)', 'code': 'a.en'},
{'name': 'English (United States)', 'code': 'en-US'},
{'name': 'Portuguese', 'code': 'pt'},
{'name': 'Spanish', 'code': 'es'})


translated captions
>>> print(captions.translations)
({'name': 'Abkhaz', 'code': 'ab'},
{'name': 'Afar', 'code': 'aa'},
{'name': 'Afrikaans', 'code': 'af'},
{'name': 'Akan', 'code': 'ak'},
{'name': 'Albanian', 'code': 'sq'},
{'name': 'Amharic', 'code': 'am'},
{'name': 'Arabic', 'code': 'ar'},
...
{'name': 'Zulu', 'code': 'zu'})

Fetching Caption by Language Name

To get a caption by language name, use the get_captions_by_name method.

Note

The get_captions_by_name method only returns normal captions. Read more in the API reference for youtube_dl_scraper.core.caption_array.CaptionArray.

Example

print(captions.get_captions_by_name('Spanish'))
Output:
[<caption.Caption object lang_code: es translated: False>]

You can also fetch a translated caption by language name using the get_translated_captions_by_name() method.

Note

The get_translated_captions_by_name() method only returns translated captions. Read more in the API reference for youtube_dl_scraper.core.caption_array.CaptionArray.

Example

print(captions.get_translated_captions_by_name('french'))
Output:
[<caption.Caption object lang_code: fr translated: True>,
<caption.Caption object lang_code: crs translated: True>]
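
The output above suggests that the name lookup ignores case and matches part of the language name, since 'french' returns both fr and crs. That is an inference from this one example, not documented behaviour. A minimal sketch of that kind of matching over name/code dictionaries follows. The function find_by_name is hypothetical, and the 'French' and 'Seselwa Creole French' names are assumed for illustration.

```python
# Illustrative entries; the names for 'fr' and 'crs' are assumptions.
translations = (
    {'name': 'Afrikaans', 'code': 'af'},
    {'name': 'French', 'code': 'fr'},
    {'name': 'Seselwa Creole French', 'code': 'crs'},
    {'name': 'Zulu', 'code': 'zu'},
)


def find_by_name(entries, query):
    """Case-insensitive substring match on the display name."""
    needle = query.casefold()
    return [entry for entry in entries if needle in entry['name'].casefold()]


matches = find_by_name(translations, 'french')
print([entry['code'] for entry in matches])
```

Because a name query can match several languages, the name-based methods return a list, whereas a language code identifies a single caption.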

Fetching Caption by Language Code

To get a caption by language code, use the get_captions_by_lang_code method.

Example

print(captions.get_captions_by_lang_code('es'))
Output:
<caption.Caption object lang_code: es translated: False>

You can also fetch a translated caption by language code using the get_translated_captions_by_lang_code() method.

Example

print(captions.get_translated_captions_by_lang_code('fr'))
Output:
<caption.Caption object lang_code: fr translated: True>
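
The two code-based lookups can be combined so that a normal caption is preferred and a translated one is used only as a fallback. The sketch below assumes that each method returns a falsy value such as None when nothing matches. This guide does not confirm that, so check the API reference before relying on it. find_caption and FakeCaptionArray are hypothetical names. The fake class exists only so the example runs without network access.

```python
def find_caption(captions, lang_code):
    """Prefer a normal caption; otherwise try a translated one.

    Assumes a failed lookup returns a falsy value (unverified).
    """
    caption = captions.get_captions_by_lang_code(lang_code)
    if caption:
        return caption
    return captions.get_translated_captions_by_lang_code(lang_code)


class FakeCaptionArray:
    """Stand-in exposing the two lookup methods used above."""

    def __init__(self, normal, translated):
        self._normal = normal
        self._translated = translated

    def get_captions_by_lang_code(self, code):
        return self._normal.get(code)

    def get_translated_captions_by_lang_code(self, code):
        return self._translated.get(code)


fake = FakeCaptionArray(
    normal={'es': 'normal-es'},
    translated={'es': 'translated-es', 'fr': 'translated-fr'},
)
print(find_caption(fake, 'es'))
print(find_caption(fake, 'fr'))
```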

Downloading Captions

Now that you have the specific Caption you want, you can read it or save it in different formats.

The Caption object has different methods and attributes for this.

Example

caption = captions.get_captions_by_lang_code('es')  # fetch the Spanish caption
print(caption.raw)    # print the caption text
print(caption.txt())  # caption in text format (the output below shows a saved file path)
print(caption.srt())  # caption in SRT format (the output below shows a saved file path)
Output:
1
00:00:00,060 --> 00:00:04,219
Hace seis años, Tesla bajó una sorpresa de la parte trasera de un camión.

2
00:00:04,260 --> 00:00:13,380
Se trataba del Roadster 2.0, un auto superdeportivo eléctrico de dos puertas con especificaciones increíbles y el récord de aceleración mundial.

3
00:00:13,420 --> 00:00:17,590
Se suponía que ese sería el auto que acabaría con todos los otros autos a combustible.
...
137
00:12:19,240 --> 00:12:19,775
Adiós.
/storage/emulated/0/Youtube-dl-scraper/Youtube-dl-scraper/downloads/Driving The New Fastest Car Ever Made!.txt
/storage/emulated/0/Youtube-dl-scraper/Youtube-dl-scraper/downloads/Driving The New Fastest Car Ever Made!.srt
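
The caption text printed above follows the SRT layout: an index line, a time range line, then the caption text, with blank lines between cues. As a rough sketch of how such text could be processed once you have it, the snippet below parses cues and joins them into plain text. It is a generic reader written for this guide, not the library's own txt() implementation, and the names parse_srt and to_milliseconds are hypothetical. The sample is copied from the first two cues of the output.

```python
import re

SRT_SAMPLE = """1
00:00:00,060 --> 00:00:04,219
Hace seis años, Tesla bajó una sorpresa de la parte trasera de un camión.

2
00:00:04,260 --> 00:00:13,380
Se trataba del Roadster 2.0, un auto superdeportivo eléctrico de dos puertas con especificaciones increíbles y el récord de aceleración mundial.
"""

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_milliseconds(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to integer milliseconds."""
    hours, minutes, seconds, millis = map(int, TIMESTAMP.fullmatch(stamp).groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(text):
    """Split SRT text into cues with index, start, end, and text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip incomplete blocks
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append({
            "index": int(lines[0]),
            "start_ms": to_milliseconds(start),
            "end_ms": to_milliseconds(end),
            "text": " ".join(lines[2:]),
        })
    return cues


cues = parse_srt(SRT_SAMPLE)
plain_text = "\n".join(cue["text"] for cue in cues)
print(plain_text)
```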


  1. Language codes are standardized short identifiers, like en or fr, used to represent languages and their regional or script variants for localization and data interchange.