Historical Dataset

YouTube Tagging Dataset

2006-2007

A historical collection of 1,092,310 YouTube videos with user-generated tags, captured in late 2006 and early 2007, during YouTube's early years of operation. The dataset offers a window into early social media folksonomy and community tagging practices.

1,092,310
Videos
537,246
Content Creators

Key Findings

📊

Organic Folksonomy

Captured before algorithmic recommendations dominated, this dataset reveals natural tagging behaviors from YouTube's early community.

🏷️

Rich Metadata

Average of 6.9 tags per video, with tags ranging from single words to complex multi-word phrases reflecting diverse categorization strategies.

📅

Historical Snapshot

87-day collection period from November 2, 2006 to January 28, 2007, early in YouTube's history.

🌐

Global Community

537,246 unique content creators from around the world, showcasing the early global nature of video sharing.

Dataset Visualizations

Dataset Overview Infographic

Distribution of Tags per Video

Most videos have 5-8 tags, showing consistent tagging practices
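
A distribution like this one could be recomputed from the video-tag pairs. The following is a minimal sketch, assuming the pairs have already been loaded as (video_id, tag) tuples; the tuple layout, function names, and sample values are illustrative and are not taken from the dataset's actual schema.

```python
from collections import Counter
from typing import Iterable, Tuple


def tags_per_video_distribution(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Map 'number of tags' -> 'number of videos with that many tags'."""
    tags_by_video = Counter(video_id for video_id, _tag in pairs)
    return Counter(tags_by_video.values())


def mean_tags_per_video(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average tag count over videos that have at least one tag."""
    tags_by_video = Counter(video_id for video_id, _tag in pairs)
    if not tags_by_video:
        return 0.0
    return sum(tags_by_video.values()) / len(tags_by_video)


# Hypothetical sample pairs, not real records from the dataset.
sample_pairs = [
    ("v1", "funny"), ("v1", "cat"), ("v1", "pets"),
    ("v2", "music"), ("v2", "live"),
    ("v3", "funny"), ("v3", "dance"), ("v3", "party"),
]

distribution = tags_per_video_distribution(sample_pairs)
average = mean_tags_per_video(sample_pairs)
print(sorted(distribution.items()))
print(round(average, 2))
```

Both functions count rows per video, so duplicate (video_id, tag) rows would inflate the totals; deduplicate first if the source files allow repeats.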

Most Popular Tags

Top 20 tags reveal early YouTube content themes and categories
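
A ranking of this kind can be sketched with a simple frequency count. The example below assumes (video_id, tag) tuples as input and folds case and surrounding whitespace before counting; that normalization is an assumption of this sketch, not a documented property of the dataset, and the sample values are invented.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_tags(pairs: Iterable[Tuple[str, str]], n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent tags with their counts.

    Tags are lower-cased and stripped before counting (an assumption
    of this sketch; drop the normalization to count raw strings).
    """
    counts = Counter(tag.strip().lower() for _video_id, tag in pairs)
    return counts.most_common(n)


# Hypothetical sample pairs, not real records from the dataset.
sample_pairs = [
    ("v1", "Funny"), ("v2", "funny "), ("v3", "funny"),
    ("v1", "music"), ("v2", "Music"),
    ("v3", "cat"),
]

ranking = top_tags(sample_pairs, n=2)
print(ranking)
```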

Upload Activity Timeline

Upload activity during the collection period

Tag Co-occurrence

Most frequently co-occurring tag pairs
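
Co-occurrence counts can be derived by grouping tags per video and counting every unordered pair of tags within each group. This is a minimal sketch under the same assumption as before, (video_id, tag) tuples as input; the function name and sample values are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurring_pairs(
    pairs: Iterable[Tuple[str, str]], n: int = 10
) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n tag pairs that appear together on the most videos."""
    tags_by_video = defaultdict(set)
    for video_id, tag in pairs:
        tags_by_video[video_id].add(tag)

    counts: Counter = Counter()
    for tags in tags_by_video.values():
        # Sorting makes ("cat", "funny") and ("funny", "cat") the same key.
        counts.update(combinations(sorted(tags), 2))
    return counts.most_common(n)


# Hypothetical sample pairs, not real records from the dataset.
sample_pairs = [
    ("v1", "funny"), ("v1", "cat"),
    ("v2", "funny"), ("v2", "cat"), ("v2", "pets"),
    ("v3", "funny"), ("v3", "music"),
]

top_pair = cooccurring_pairs(sample_pairs, n=1)
print(top_pair)
```

The number of pairs grows quadratically with the tags on a video, so for the full collection it may be worth capping or filtering rare tags before counting.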

Research Context

This dataset was collected as part of early research on user-generated metadata and tagging conventions on video-sharing platforms. The research examined how YouTube's early community developed collaborative tagging practices and established conventions for categorizing video content.

The findings revealed that approximately 66% of tags had zero algorithmic relevance to video metadata, demonstrating the social and interpretive nature of early YouTube tagging. Users employed tags for discovery, categorization, and social signaling in ways that went beyond simple content description.

Publications

  • Geisler, G. and Burns, S. (2007). "Tagging Video: Conventions and Strategies of the YouTube Community." Proceedings of the Joint Conference on Digital Libraries (JCDL 2007), p. 480. DOI: 10.1145/1255175.1255279
  • Geisler, G. and Burns, S. (2008). "Tagging Video: Conventions and Strategies of the YouTube Community." Bulletin of IEEE Technical Committee on Digital Libraries (TCDL) 4(1).

Research Use Cases

Information Science

Study folksonomy, collaborative tagging, and user-generated metadata systems

Social Computing

Analyze early social media community practices and conventions

Digital History

Examine internet culture and YouTube's formative period (2006-2007)

Computational Linguistics

Investigate natural language use in tags and multilingual tagging patterns

Information Retrieval

Develop and test tag-based search and discovery algorithms

Cultural Studies

Explore popular culture trends and media consumption in late 2006/early 2007

Access the Dataset

The dataset is available in multiple formats for different use cases:

SQLite Database

Complete dataset in SQLite format (~1.1 GB). Ideal for SQL queries and analysis.
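
Since the table layout is not described here, a reasonable first step is to inspect the schema before writing queries. The sketch below uses only Python's standard sqlite3 module; the in-memory tables and column names in the demonstration are hypothetical stand-ins, not the dataset's real schema.

```python
import sqlite3
from typing import Dict, List


def describe_tables(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Return {table_name: [column names]} for every table in the database."""
    table_rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema: Dict[str, List[str]] = {}
    for (table,) in table_rows:
        quoted = table.replace('"', '""')  # escape quotes in identifiers
        # PRAGMA table_info yields one row per column; index 1 is the name.
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


# Demonstration on an in-memory database with a hypothetical layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (video_id TEXT PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT)")
schema = describe_tables(conn)
print(schema)
```

To run this against the downloaded file, pass its local path to sqlite3.connect in place of ":memory:".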

CSV Files

Separate CSV files for videos, tags, and relationships (~603 MB total).

JSONL Files

JSON Lines format for streaming and web applications. Perfect for JavaScript/Python.
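
JSON Lines files can be processed one record at a time without loading the whole file into memory. The following is a minimal sketch; the field names "video_id" and "tags" are assumptions made for illustration, so check a sample record for the actual keys.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(lines: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-record sample standing in for a real JSONL file.
sample_file = io.StringIO(
    '{"video_id": "v1", "tags": ["funny", "cat"]}\n'
    "\n"
    '{"video_id": "v2", "tags": ["music"]}\n'
)

records = list(iter_records(sample_file))
total_tags = sum(len(record.get("tags", [])) for record in records)
print(len(records), total_tags)
```

For a file on disk, pass the handle returned by open(path, encoding="utf-8") to iter_records and consume the generator lazily instead of building a list.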

Sample Data

1,000-record samples in JSON format. Quick preview for testing and exploration.

Citation

If you use this dataset in your research, please cite:

@misc{burns2006youtube,
  author = {Burns, Samuel A. and Geisler, Gary},
  title = {YouTube Tagging Dataset (2006-2007)},
  year = {2025},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.17508119},
  url = {https://zenodo.org/records/17508119}
}