2006-2007
A unique historical collection of 1,092,310 YouTube videos with user-generated tags, captured during YouTube's formative early period (late 2006 to early 2007). This dataset provides a rare window into early social media folksonomy and community tagging practices.
Captured before algorithmic recommendations dominated, this dataset reveals natural tagging behaviors from YouTube's early community.
Average of 6.9 tags per video, with tags ranging from single words to complex multi-word phrases reflecting diverse categorization strategies.
87-day collection period, from November 2, 2006 to January 28, 2007, during YouTube's formative early period.
537,246 unique content creators from around the world, showcasing the early global nature of video sharing.
Most videos have 5-8 tags, showing consistent tagging practices
Top 20 tags reveal early YouTube content themes and categories
Upload activity during the collection period
Most frequently co-occurring tag pairs
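Summaries like the tags-per-video distribution and the most frequent co-occurring pairs can be recomputed from the raw records. The following is a minimal sketch, assuming each video's tags are available as a list of strings. The function names and the sample records are illustrative only and are not part of the dataset.

```python
from collections import Counter
from itertools import combinations


def tag_count_distribution(videos):
    """Map number-of-tags -> number of videos with that many tags."""
    return Counter(len(tags) for tags in videos)


def cooccurring_pairs(videos, top_n=10):
    """Count unordered tag pairs that appear together on the same video."""
    pairs = Counter()
    for tags in videos:
        # Normalize case and drop duplicates so each pair counts once per video.
        unique = sorted({t.lower() for t in tags})
        pairs.update(combinations(unique, 2))
    return pairs.most_common(top_n)


# Hypothetical sample records, for illustration only.
sample = [
    ["funny", "cat", "pets"],
    ["funny", "cat"],
    ["music", "live", "Funny"],
]
distribution = tag_count_distribution(sample)
top_pairs = cooccurring_pairs(sample, top_n=1)
```

Sorting each video's tags before pairing keeps ("cat", "funny") and ("funny", "cat") from being counted as different pairs. Whether to lowercase tags is a design choice; it is applied here only to keep the example simple.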
This dataset was collected as part of early research on user-generated metadata and tagging conventions in video-sharing platforms. The research examined how YouTube's early community developed collaborative tagging practices and established conventions for categorizing video content.
The findings revealed that approximately 66% of tags had zero algorithmic relevance to video metadata, demonstrating the social and interpretive nature of early YouTube tagging. Users employed tags for discovery, categorization, and social signaling in ways that went beyond simple content description.
Study folksonomy, collaborative tagging, and user-generated metadata systems
Analyze early social media community practices and conventions
Examine internet culture and YouTube's formative period (2006-2007)
Investigate natural language use in tags and multilingual tagging patterns
Develop and test tag-based search and discovery algorithms
Explore popular culture trends and media consumption in late 2006/early 2007
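As one illustration of the tag-based search use case, a small inverted index maps each tag to the set of videos that carry it. This is a sketch under stated assumptions: the video IDs, the sample tags, and the helper names `build_tag_index` and `search_all` are hypothetical and do not come from the dataset.

```python
from collections import defaultdict


def build_tag_index(videos):
    """Build a mapping of lowercase tag -> set of video IDs."""
    index = defaultdict(set)
    for video_id, tags in videos.items():
        for tag in tags:
            index[tag.lower()].add(video_id)
    return index


def search_all(index, query_tags):
    """Return IDs of videos that carry every tag in query_tags."""
    matches = [index.get(tag.lower(), set()) for tag in query_tags]
    if not matches:
        return set()
    return set.intersection(*matches)


# Hypothetical records, for illustration only.
videos = {
    "v1": ["Funny", "cat"],
    "v2": ["funny", "dog"],
    "v3": ["music"],
}
index = build_tag_index(videos)
both = search_all(index, ["funny", "CAT"])
either_video = search_all(index, ["funny"])
missing = search_all(index, ["nope"])
```

Intersecting the per-tag sets gives AND semantics; replacing `set.intersection` with `set.union` would give OR semantics instead.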
The dataset is available in multiple formats for different use cases:
Complete dataset in SQLite format (~1.1 GB). Ideal for SQL queries and analysis.
JSON Lines format for streaming and web applications. Perfect for JavaScript/Python.
1,000-record samples in JSON format. Quick preview for testing and exploration.
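A JSON Lines file holds one JSON object per line, so it can be processed record by record without loading everything into memory. The sketch below assumes each record stores its tags as a list under a field named `tags`. That field name and the sample records are assumptions, so check the actual files before relying on them.

```python
import io
import json


def iter_records(lines):
    """Yield one parsed record per non-empty line of JSON Lines text."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def mean_tags_per_video(records, tags_field="tags"):
    """Average tag count per record; `tags_field` is an assumed field name."""
    total_videos = 0
    total_tags = 0
    for record in records:
        total_videos += 1
        total_tags += len(record.get(tags_field, []))
    return total_tags / total_videos if total_videos else 0.0


# Stand-in for a downloaded file. With a real file, pass
# open(path, encoding="utf-8") to iter_records instead.
demo = io.StringIO(
    '{"id": "a", "tags": ["funny", "cat"]}\n'
    '\n'
    '{"id": "b", "tags": ["music", "live", "rock", "2006"]}\n'
)
average = mean_tags_per_video(iter_records(demo))
```

Because `iter_records` is a generator, the same loop works unchanged on the full file or on the 1,000-record samples if they are converted to one object per line.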
If you use this dataset in your research, please cite:
@misc{burns2006youtube,
author = {Burns, Samuel A. and Geisler, Gary},
title = {YouTube Tagging Dataset (2006-2007)},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.17508119},
url = {https://zenodo.org/records/17508119}
}