YouTube Data Tools

Frequently Asked Questions

What is this?

YouTube Data Tools (YTDT) is a collection of simple modules for extracting data from the YouTube platform via the YouTube API v3. It is not a mashup or fully developed analytics software, but a means for researchers to collect data in standard file formats to analyze further in other software packages.

Who develops YTDT?

YTDT is written and maintained by Bernhard Rieder, Associate Professor in Media Studies at the University of Amsterdam and researcher with the Digital Methods Initiative.

Development and maintenance of this tool are financed by the Dutch Platform Digitale Infrastructuur Social Science and Humanities as part of the CAT4SMR project.

Changes or new modules are announced on @RiederB and @cat4smr, but for questions and support please refer to the help section below.

How can I cite YTDT?

There is currently no publication on YTDT, but the various citation standards provide guidelines for citing software, e.g. APA: Rieder, Bernhard (2015). YouTube Data Tools (Version 1.42) [Software]. Available from https://ytdt.digitalmethods.net.

Alternatively, you can cite this blog post.

If you are interested in the kind of work that can be done with this tool, check out this research paper.

What kind of files does YTDT generate?

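It creates network files in GDF format (a simple text format that specifies a graph) as well as statistical files in a tab-separated (TSV) format. You can change TSV to CSV by searching and replacing all tabs with commas, but this only works cleanly if no field itself contains a comma; fields that do (video titles, for example) must be quoted in the CSV.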
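For files where a plain search and replace is not safe, the conversion can be sketched in Python with the standard csv module, which quotes any field containing a comma. This is a minimal sketch, not part of YTDT: the function name tsv_to_csv is hypothetical, and it assumes the TSV uses plain tab separators with no quoting of its own.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated text to comma-separated text."""
    # Assumption: fields are separated by tabs only and are not quoted,
    # so quote characters in the input are treated as ordinary text.
    reader = csv.reader(
        io.StringIO(tsv_text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    out = io.StringIO(newline="")
    # The default writer quotes only fields that need it (e.g. those
    # containing a comma or a double quote).
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    sample = "id\ttitle\nabc\tHello, world\n"
    print(tsv_to_csv(sample), end="")
```

To convert a downloaded file, read its contents as UTF-8 text, pass them to the function, and write the result to a new file with a .csv extension.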

These files can then be analyzed and visualized using graph visualization software such as the powerful and very easy to use Gephi platform or statistical tools such as R, Excel, SPSS, or others.

I don't know how to use YTDT, can you help me?

There is an introductory video, and the interface for each data module contains a description of what it does and links to the relevant sections of the API. Most importantly, to make sense of the data, a good understanding of YouTube's basic architecture is required. The documentation for YouTube's API has comprehensive descriptions of entities and metrics.

We provide user support through a subreddit and a Facebook Group.

What are channel or video ids and how can I find them?

Many of the modules require a video or channel id as input. These can normally be found in the respective YouTube URLs.

For example, in the URL https://www.youtube.com/watch?v=BNM4kEUEcp8 the code after the "=" sign (BNM4kEUEcp8) is the video id.
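If you have a long list of video URLs, pulling out the ids by hand gets tedious. The following is a minimal Python sketch, not part of YTDT, that reads the "v" query parameter from a watch URL of the form shown above. The function name extract_video_id is hypothetical, and other URL shapes are not handled.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the value of the 'v' query parameter, or None if it is absent."""
    # Assumption: the URL looks like https://www.youtube.com/watch?v=<id>,
    # possibly followed by further parameters such as &t=42s.
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=BNM4kEUEcp8"))
```

Reading the parameter by name, rather than splitting the string at the "=" sign, keeps the result correct when the URL carries additional parameters.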

Channel ids have a format similar to UCtxGqPJPPi8ptAzB029jpYA and can be found via the channel info module. Just paste the channel URL into the form; the channel id will be in the result.

Where is the video network module?

YouTube removed the "relatedVideos" API in August 2023 and, as a consequence, this module had to be retired.

The tool does not work (correctly)!

Although this is very simple software, it can fail for all kinds of reasons. Most problems are due to limitations or bugs in YouTube's web API and cannot easily be solved on our side. Sometimes the tool will fail because users have been using it too heavily.

High-quality bug reports are much appreciated. If you have no experience with reporting bugs effectively, please read this piece. TL;DR: developers need context to debug a tool. When filing a bug report, please add the URL of the call, the browser you are using, a screenshot of the interface output, the data files, and a description of what you have been doing and how the problem manifests itself. Without extensive information it can be very hard to replicate a problem and subsequently fix it.

Please submit bug reports via our subreddit, Facebook Group, or (ideally) GitHub. Please do not use Twitter, as we need more information than 280 characters can provide.

I want to make crawls with higher crawl depth!

Since the public version of the script runs on a server that does a bunch of different things, this is not possible due to resource constraints. But you can always get the source code (see below) and remove the line of code that checks for crawl depth. You may still run out of RAM, but networks with more than 100K nodes should be easily doable with 4 GB of RAM.

Can you add feature X to YTDT?

We cannot make any guarantees, but if you post a feature request in our subreddit or Facebook Group, we will definitely consider it.

Where is the source code?

The full source code is available on GitHub. You'll also find installation instructions there.