Pafy Documentation¶
This is the documentation for pafy, a Python library to download YouTube content and retrieve metadata.
A quick start intro with usage examples is available in the README.
Development / Source code / Bug reporting: github.com/np1/pafy
Homepage: np1.github.io/pafy
API Keys¶
Specifying an API key is optional, as pafy includes one. However, it is preferred that software calling pafy provides its own API key, and the default may be removed in the future.
Information from Google about obtaining an API key
-
pafy.
set_api_key
(key)¶ “Sets the API key for pafy to use.”
Parameters: key – API key to use
Pafy Objects and Stream Objects¶
Pafy objects relate to videos hosted on YouTube. They hold metadata such as title, viewcount, author and video ID
Stream objects relate to individual streams of a YouTube video. They hold stream-specific data such as resolution, bitrate and url. Each Pafy object contains multiple stream objects.
Pafy Objects¶
Create a Pafy object using the pafy.new()
function, giving a YouTube video URL as the argument.
-
pafy.
new
(video_url[, basic=True][, gdata=False][, signature=True][, size=False][, callback=None])¶ Creates a new Pafy object. All optional arguments (apart from callback) are used to specify which data items are fetched on initialisation.
Parameters: - url (str) – The YouTube url or 11 character video id of the video
- basic (bool) – fetch basic metadata and streams
- gdata (bool) – fetch gdata info (upload date, description, category, username, likes, dislikes)
- signature (bool) – Note: The signature argument now has no effect and will be removed in a future version
- size (bool) – fetch the size of each stream (slow)(decrypts urls if needed)
- callback (function) – a callback function to receive status strings
Return type: pafy.Pafy
If any of basic, gdata or size are False, those data items will be fetched only when first called for.
The defaults are recommended for most cases. If you wish to create many video objects at once, you may want to set all to False, eg:
vid = pafy.new("dQw4w9WgXcQ", basic=False)
This will be quick because no http requests will be made on initialisation.
Setting size to True will override the basic argument and force basic data to be fetched too (basic data is required to obtain Stream objects)
Example:
import pafy
myvid = pafy.new("http://www.youtube.com/watch?v=dQw4w9WgXcQ")
Pafy Attributes¶
Once you have created a Pafy object using pafy.new(), several data attributes are available:
-
Pafy.
author
¶ The author of the video (str)
-
Pafy.
bigthumb
¶ The url of the video’s display image (not always available) (str)
-
Pafy.
bigthumbhd
¶ The url of the video’s larger display image (not always available) (str)
-
Pafy.
category
¶ The category of the video (str)
-
Pafy.
description
¶ The video description text (str)
-
Pafy.
dislikes
¶ The number of dislikes received for the video (int)
-
Pafy.
duration
¶ The duration of the stream (string formatted as HH:MM:SS)
-
Pafy.
keywords
¶ A list of the video’s keywords (not always available) ([str])
-
Pafy.
length
¶ The duration of the streams in seconds (int)
-
Pafy.
likes
¶ The number of likes received for the video (int)
-
Pafy.
published
¶ The upload date of the video (e.g., 2012-10-02 17:17:24) (str)
-
Pafy.
mix
¶ The mix playlist provided by youtube for this video (dict)
-
Pafy.
rating
¶ The rating of the video (0-5), (float)
-
Pafy.
thumb
¶ The url of the video’s thumbnail image (str)
-
Pafy.
title
¶ The title of the video (str)
-
Pafy.
username
¶ The username of the uploader (str)
-
Pafy.
videoid
¶ The 11-character video id (str)
-
Pafy.
viewcount
¶ The viewcount of the video (int)
An example of accessing this video metadata is shown below:
import pafy
v = pafy.new("dQw4w9WgXcQ")
print(v.title)
print(v.duration)
print(v.rating)
print(v.author)
print(v.length)
print(v.keywords)
print(v.thumb)
print(v.videoid)
print(v.viewcount)
Which will result in this output:
Rick Astley - Never Gonna Give You Up
00:03:33
4.75177729422
RickAstleyVEVO
213
['Rick', 'Astley', 'Sony', 'BMG', 'Music', 'UK', 'Pop']
https://i1.ytimg.com/vi/dQw4w9WgXcQ/default.jpg
dQw4w9WgXcQ
69788014
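The duration string and the length integer in the output above describe the same value: 213 seconds is 00:03:33. A minimal sketch of that conversion follows. It makes no claim about how pafy computes the string internally, and seconds_to_hms is a hypothetical helper, not part of pafy:

```python
def seconds_to_hms(seconds):
    """Format a whole number of seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)


print(seconds_to_hms(213))  # 00:03:33
```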
Pafy Methods¶
The Pafy.getbest(), Pafy.getbestaudio() and Pafy.getbestvideo() methods are a quick way to access the highest quality streams for a particular video without needing to query the stream lists.
-
Pafy.
getbest
([preftype="any"][, ftypestrict=True])¶ Selects the stream with the highest resolution. This will return a “normal” stream (ie. one with video and audio)
Parameters: - preftype (str) – Preferred type, set to mp4, webm, flv, 3gp or any
- ftypestrict (boolean) – Set to False to return a type other than that specified in preftype if it has a higher resolution
Return type: pafy.Stream
-
Pafy.
getbestaudio
([preftype="any"][, ftypestrict=True])¶ Selects the audio stream with the highest bitrate.
Parameters: - preftype (str) – Preferred type, set to ogg or m4a or any
- ftypestrict (boolean) – Set to False to return a type other than that specified in preftype if that has the highest bitrate
Return type: pafy.Stream
-
Pafy.
getbestvideo
([preftype="any"][, ftypestrict=True])¶ Selects the video-only stream with the highest resolution. This will return a “video” stream (ie. one with no audio)
Parameters: - preftype (str) – Preferred type, set to m4v, webm or any
- ftypestrict (boolean) – Set to False to return a type other than that specified in preftype if it has a higher resolution
Return type: pafy.Stream
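The interaction between preftype and ftypestrict can be easier to follow in code. The sketch below is one possible reading of the parameter descriptions above, not pafy's actual implementation. FakeStream and pick_best are hypothetical names, and returning None when no stream of the preferred type exists is an assumption of this sketch:

```python
from collections import namedtuple

# Stand-in for a stream; only the two attributes this sketch needs.
FakeStream = namedtuple("FakeStream", ["extension", "dimensions"])


def pick_best(streams, preftype="any", ftypestrict=True):
    """Return the highest-resolution stream, honouring the type preference."""
    def pixels(stream):
        width, height = stream.dimensions
        return width * height

    if not streams:
        return None
    if preftype == "any":
        return max(streams, key=pixels)
    if ftypestrict:
        # Strict: only streams of the preferred type are acceptable.
        preferred = [s for s in streams if s.extension == preftype]
        return max(preferred, key=pixels) if preferred else None
    # Non-strict: resolution wins; the preferred type only breaks ties.
    return max(streams, key=lambda s: (pixels(s), s.extension == preftype))


streams = [
    FakeStream("webm", (640, 360)),
    FakeStream("mp4", (640, 360)),
    FakeStream("flv", (320, 240)),
]
print(pick_best(streams, "flv").dimensions)                     # (320, 240)
print(pick_best(streams, "flv", ftypestrict=False).dimensions)  # (640, 360)
print(pick_best(streams, "mp4", ftypestrict=False).extension)   # mp4
```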
Stream Lists¶
A Pafy object provides multiple stream lists. These are:
-
Pafy.
streams
¶ A list of regular streams (streams containing both audio and video)
-
Pafy.
audiostreams
¶ A list of audio-only streams; aac streams (.m4a) and ogg vorbis streams (.ogg) if available
-
Pafy.
videostreams
¶ A list of video-only streams (Note: these streams have no audio data)
-
Pafy.
oggstreams
¶ A list of ogg vorbis encoded audio streams (Note: may be empty for some videos)
-
Pafy.
m4astreams
¶ A list of aac encoded audio streams
-
Pafy.
allstreams
¶ A list of all available streams
An example of accessing stream lists:
>>> import pafy
>>> v = pafy.new("cyMHZVT91Dw")
>>> v.audiostreams
[audio:m4a@48k, audio:m4a@128k, audio:m4a@256k]
>>> v.streams
[normal:webm@640x360, normal:mp4@640x360, normal:flv@320x240, normal:3gp@320x240, normal:3gp@176x144]
>>> v.allstreams
[normal:webm@640x360, normal:mp4@640x360, normal:flv@320x240, normal:3gp@320x240, normal:3gp@176x144, video:m4v@854x480, video:m4v@640x360, video:m4v@426x240, video:m4v@256x144, audio:m4a@48k, audio:m4a@128k, audio:m4a@256k]
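In the output above, allstreams holds the regular, video-only and audio-only streams together. A minimal sketch of splitting such a list back into groups, assuming each stream exposes the mediatype attribute described under Stream Attributes. FakeStream and group_by_mediatype are hypothetical names used so that the example runs without network access:

```python
from collections import defaultdict, namedtuple

# Stand-in for a stream; the real objects come from Pafy.allstreams.
FakeStream = namedtuple("FakeStream", ["mediatype", "extension", "quality"])


def group_by_mediatype(streams):
    """Map each mediatype ('normal', 'audio', 'video') to its streams."""
    groups = defaultdict(list)
    for stream in streams:
        groups[stream.mediatype].append(stream)
    return dict(groups)


streams = [
    FakeStream("normal", "mp4", "640x360"),
    FakeStream("video", "m4v", "854x480"),
    FakeStream("audio", "m4a", "128k"),
    FakeStream("audio", "m4a", "256k"),
]
groups = group_by_mediatype(streams)
print(sorted(groups))        # ['audio', 'normal', 'video']
print(len(groups["audio"]))  # 2
```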
Stream Objects¶
-
class
pafy.
Stream
¶
After you have created a Pafy object using new(), you can then access the streams using one of the Stream Lists, or by calling Pafy.getbest() or Pafy.getbestaudio() on the object.
Stream Attributes¶
A Stream object can be used to access the following attributes
-
Stream.
url
¶ The direct access URL of the stream. This can be used to stream the media in mplayer or vlc, or for downloading with wget or curl. To download directly, use the Stream.download() method.
-
Stream.
url_https
¶ The direct access HTTPS URL of the stream.
-
Stream.
bitrate
¶ The bitrate of the stream as a string of the form “192k” if it is an audio stream, otherwise None.
-
Stream.
dimensions
¶ A 2-tuple (x, y) representing the resolution of a video stream.
-
Stream.
extension
¶ The format of the stream, will be one of: 'ogg', 'm4a', 'mp4', 'flv', 'webm', '3gp'
-
Stream.
mediatype
¶ A string attribute that is 'normal', 'audio' or 'video', depending on the content of the stream
-
Stream.
quality
¶ The resolution or the bitrate of the stream, depending on whether the stream is video or audio respectively
-
Stream.
resolution
¶ The resolution of a video as a string, eg: “820x640”. Note if the stream is 3D this will be appended; eg: “820x640-3D”.
For audio streams, this will be set to “0x0”
-
Stream.
rawbitrate
¶ The bitrate of an audio stream, int
For video streams, this will be set to None
-
Stream.
threed
¶ True if the stream is a 3D video (boolean)
-
Stream.
title
¶ The title of the video, this will be the same as
Pafy.title
-
Stream.
notes
¶ Any additional notes regarding the stream (eg, 6-channel surround) str
An example of accessing Stream attributes:
>>> import pafy
>>> v = pafy.new("cyMHZVT91Dw")
>>> v.audiostreams
[audio:m4a@48k, audio:m4a@128k, audio:m4a@256k]
>>> mystream = v.audiostreams[2]
>>> mystream.rawbitrate
255940
>>> mystream.bitrate
'256k'
>>> mystream.url
'http://r20---sn-aigllnes.c.youtube.com/videoplayback?ipbits=8&clen=1130...
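The resolution attribute is a plain string, so code that needs numbers has to split it. A minimal sketch based only on the format described above ("820x640", with "-3D" appended for 3D streams and "0x0" for audio). parse_resolution is a hypothetical helper, not part of pafy; where available, Stream.dimensions and Stream.threed give the same information directly:

```python
def parse_resolution(resolution):
    """Split a string such as "820x640-3D" into (width, height, is_3d)."""
    is_3d = resolution.endswith("-3D")
    if is_3d:
        resolution = resolution[:-len("-3D")]
    width, height = (int(part) for part in resolution.split("x"))
    return width, height, is_3d


print(parse_resolution("820x640-3D"))  # (820, 640, True)
print(parse_resolution("0x0"))         # (0, 0, False)
```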
Stream Methods¶
-
Stream.
get_filesize
()¶ Returns the filesize of a stream
-
Stream.
download
([filepath=""][, quiet=False][, callback=None][, meta=False][, remux_audio=False])¶ Downloads the stream object, returns the path of the downloaded file.
Parameters: - filepath (string) – The filepath to use to save the stream, defaults to (sanitised) title.extension if omitted
- quiet (boolean) – If True, suppress output of the download progress
- callback (function or None) – Call back function to use for receiving download progress
- meta (bool) – If True, video id and itag are appended to filename
- remux_audio (bool) – If True, remux audio file downloads (fixes some compatibility issues with file format, requires ffmpeg/avconv)
Return type: str
If a callback function is provided, it will be called repeatedly for each chunk downloaded. It must be a function that takes the following five arguments:
- total bytes in stream, int
- total bytes downloaded, int
- ratio downloaded (0-1), float
- download rate (kbps), float
- ETA in seconds, float
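A minimal sketch of a callback with this five-argument signature. format_progress and print_progress are hypothetical names and the message layout is an arbitrary choice. The final call passes sample values by hand; in real use the function would be passed as callback=print_progress to Stream.download():

```python
def format_progress(total, recvd, ratio, rate, eta):
    """Build a one-line progress message from the five callback arguments."""
    return "{:,} of {:,} bytes [{:.2%}] at {:.0f} kbps, ETA {:.0f} s".format(
        recvd, total, ratio, rate, eta)


def print_progress(total, recvd, ratio, rate, eta):
    # Same argument order as listed above.
    print(format_progress(total, recvd, ratio, rate, eta))


# Simulate a single callback invocation with sample values.
print_progress(34775366, 1015808, 0.0292, 640.0, 51.0)
```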
Stream.download()
example¶
Example of using stream.download():
import pafy
v = pafy.new("cyMHZVT91Dw")
s = v.getbest()
print("Size is %s" % s.get_filesize())
filename = s.download() # starts download
Will download to the current working directory and output the following progress statistics:
Size is 34775366
1,015,808 Bytes [2.92%] received. Rate: [ 640 kbps]. ETA: [51 secs]
Download using callback example:
import pafy
# callback function, this callback simply prints the bytes received,
# ratio downloaded and eta.
def mycb(total, recvd, ratio, rate, eta):
    print(recvd, ratio, eta)
p = pafy.new("cyMHZVT91Dw")
ba = p.getbestaudio()
filename = ba.download(quiet=True, callback=mycb)
The output of this will appear as follows, while the file is downloading:
(16384, 0.001449549245392125, 20.05230682669207)
(32768, 0.00289909849078425, 16.88200659636641)
(49152, 0.004348647736176375, 15.196503182407469)
(65536, 0.0057981969815685, 14.946467230009146)
(81920, 0.007247746226960625, 15.066431667096913)
(98304, 0.00869729547235275, 14.978577915171627)
(114688, 0.010146844717744874, 14.529802172976945)
(131072, 0.011596393963137, 14.31917945870373)
...
Playlist Retrieval¶
The pafy.get_playlist() function takes similar arguments to pafy.new() and will return a dict containing metadata and Pafy objects as listed in the YouTube playlist.
-
pafy.
get_playlist
(playlist_url[, basic=False][, gdata=False][, signature=False][, size=False][, callback=None])¶ Parameters: - playlist_url (str) – The YouTube playlist url
- basic (bool) – fetch basic metadata and streams
- gdata (bool) – fetch gdata info (upload date, description, category, username, likes, dislikes)
- signature (bool) – fetch data required to decrypt urls, if encrypted
- size (bool) – fetch the size of each stream (slow)(decrypts urls if needed)
- callback (function) – a callback function to receive status strings
Return type: dict
The returned dict contains the following keys:
playlist_id: the id of the playlist
likes: the number of likes for the playlist
dislikes: the number of dislikes for the playlist
title: the title of the playlist
author: the author of the playlist
description: the description of the playlist
items: a list of dicts with each dict representing a video and containing the following keys:
pafy: The Pafy object for this video, initialised with the arguments given to pafy.get_playlist()
playlist_meta: a dict of various video-specific metadata fetched from the playlist data, including:
added, likes, dislikes, thumbnail, is_cc, is_hd, user_id, cc_license, privacy, category_id
pafy.get_playlist()
example¶
>>> import pafy
>>> plurl = "https://www.youtube.com/playlist?list=PL634F2B56B8C346A2"
>>> playlist = pafy.get_playlist(plurl)
>>>
>>> playlist['title']
u'Rick Astley playlist'
>>>
>>> playlist['author']
u'Deborah Back'
>>>
>>> len(playlist['items'])
43
>>>
>>> playlist['items'][21]['pafy']
Title: Body and Soul - Rick astley
Author: jadiafa
ID: QtHnEJ8UArY
Duration: 00:04:11
Rating: 5.0
Views: 18855
Thumbnail: http://i1.ytimg.com/vi/QtHnEJ8UArY/default.jpg
Keywords: Rick, astely, body, and, soul, pop
>>>
>>> playlist['items'][21]['pafy'].audiostreams
[audio:m4a@128k]
>>>
>>> playlist['items'][21]['pafy'].getbest()
normal:webm@640x360
>>>
>>> playlist['items'][21]['pafy'].getbest().url
u'http://r4---sn-4g57knzr.googlevideo.com/videoplayback?ipbits=0&ratebypas...'
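Because the return value is an ordinary dict, the videos can be walked with plain Python. A minimal sketch, assuming the key layout listed above ('items', each holding a 'pafy' entry). playlist_titles is a hypothetical helper, and fake_playlist is a hand-built stand-in so that the example runs without network access:

```python
from types import SimpleNamespace


def playlist_titles(playlist):
    """Return the title of every video in a get_playlist()-style dict."""
    return [item["pafy"].title for item in playlist["items"]]


# Stand-in data; a real dict would come from pafy.get_playlist(url).
fake_playlist = {
    "title": "Example playlist",
    "items": [
        {"pafy": SimpleNamespace(title="First video"), "playlist_meta": {}},
        {"pafy": SimpleNamespace(title="Second video"), "playlist_meta": {}},
    ],
}
print(playlist_titles(fake_playlist))  # ['First video', 'Second video']
```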
The pafy.get_playlist2() function serves the same purpose as pafy.get_playlist(), but uses version 3 of YouTube’s API, which lets it retrieve playlists of more than 200 items. It also provides a different interface, returning a pafy.Playlist instead of a dictionary.
-
pafy.
get_playlist2
(playlist_url[, basic=False][, gdata=False][, signature=False][, size=False][, callback=None])¶ Parameters: - playlist_url (str) – The YouTube playlist url
- basic (bool) – fetch basic metadata and streams
- gdata (bool) – fetch gdata info (upload date, description, category, username, likes, dislikes)
- signature (bool) – fetch data required to decrypt urls, if encrypted
- size (bool) – fetch the size of each stream (slow)(decrypts urls if needed)
- callback (function) – a callback function to receive status strings
Return type: pafy.Playlist
Playlist Attributes¶
Once you have retrieved a playlist with pafy.get_playlist2()
you can iterate over it to get the Pafy objects for the items in it, or use len(playlist) to get its length. In addition, you can access the following attributes:
-
Playlist.
plid
¶ The ID of the playlist (str)
-
Playlist.
title
¶ The title of the playlist (str)
-
Playlist.
author
¶ The author of the playlist (str)
-
Playlist.
description
¶ The description of the playlist (str)
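Since iterating over a playlist yields Pafy objects, per-video attributes can be aggregated with ordinary Python. A minimal sketch that sums Pafy.length across a playlist. total_length is a hypothetical helper, and the list of stand-in objects replaces a real pafy.Playlist so that the example runs without network access:

```python
from types import SimpleNamespace


def total_length(playlist):
    """Sum the length, in seconds, of every video in an iterable of Pafy-like objects."""
    return sum(video.length for video in playlist)


# Stand-in data; a real playlist would come from pafy.get_playlist2(url).
fake_playlist = [SimpleNamespace(length=213), SimpleNamespace(length=251)]
print(total_length(fake_playlist))  # 464
```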