39 changes: 3 additions & 36 deletions get-started/sample-datasets/index.mdx
@@ -7,6 +7,8 @@ title: 'Tutorials and example datasets'
doc_type: 'landing-page'
---

import { SampleDatasetExplorer } from '/snippets/components/SampleDatasetExplorer/SampleDatasetExplorer.jsx'

<Tip>
These tutorials work with any ClickHouse deployment, including [ClickHouse Cloud](/get-started/setup/cloud).
</Tip>
@@ -20,39 +22,4 @@ In addition, the sample datasets provide a great experience on working with Clic
learning important techniques and tricks, and seeing how to take advantage of the many powerful
functions in ClickHouse. The sample datasets include:

{/* The following table is automatically generated at build time
by https://github.com/ClickHouse/clickhouse-docs/blob/main/scripts/autogenerate-table-of-contents.sh */}

{/*AUTOGENERATED_START*/}
| Page | Description |
|-----|-----|
| [Amazon customer review](/get-started/sample-datasets/amazon-reviews) | Over 150M customer reviews of Amazon products |
| [AMPLab Big Data Benchmark](/get-started/sample-datasets/amplab-benchmark) | A benchmark dataset used for comparing the performance of data warehousing solutions. |
| [Analyzing Stack Overflow data with ClickHouse](/get-started/sample-datasets/stackoverflow) | Analyzing Stack Overflow data with ClickHouse |
| [Anonymized web analytics](/get-started/sample-datasets/anon-web-analytics-metrica) | Dataset consisting of two tables containing anonymized web analytics data with hits and visits |
| [Brown University Benchmark](/get-started/sample-datasets/brown-benchmark) | A new analytical benchmark for machine-generated log data |
| [COVID-19 open data](/get-started/sample-datasets/covid19) | COVID-19 Open-Data is a large, open-source database of COVID-19 epidemiological data and related factors like demographics, economics, and government responses |
| [dbpedia dataset](/get-started/sample-datasets/dbpedia) | Dataset containing 1 million articles from Wikipedia and their vector embeddings |
| [Environmental sensors data](/get-started/sample-datasets/environmental-sensors) | Over 20 billion records of data from Sensor.Community, a contributors-driven global sensor network that creates Open Environmental Data. |
| [Foursquare places](/get-started/sample-datasets/foursquare-os-places) | Dataset with over 100 million records containing information about places on a map, such as shops, restaurants, parks, playgrounds, and monuments. |
| [Geo data using the cell tower dataset](/get-started/sample-datasets/cell-towers) | Learn how to load OpenCelliD data into ClickHouse, connect Apache Superset to ClickHouse and build a dashboard based on data |
| [GitHub events dataset](/get-started/sample-datasets/github-events) | Dataset containing all events on GitHub from 2011 to Dec 6 2020, with a size of 3.1 billion records. |
| [Hacker News dataset](/get-started/sample-datasets/hacker-news) | Dataset containing 28 million rows of hacker news data. |
| [Hacker News vector search dataset](/get-started/sample-datasets/hacker-news-vector-search) | Dataset containing 28+ million Hacker News postings & their vector embeddings |
| [LAION 5B dataset](/get-started/sample-datasets/laion5b) | Dataset containing 100 million vectors from the LAION 5B dataset |
| [Laion-400M dataset](/get-started/sample-datasets/laion) | Dataset containing 400 million images with English image captions |
| [New York Public Library "What's on the Menu?" dataset](/get-started/sample-datasets/menus) | Dataset containing 1.3 million records of historical data on the menus of hotels, restaurants and cafes with the dishes along with their prices. |
| [New York taxi data](/get-started/sample-datasets/nyc-taxi) | Data for billions of taxi and for-hire vehicle (Uber, Lyft, etc.) trips originating in New York City since 2009 |
| [NOAA Global Historical Climatology Network](/get-started/sample-datasets/noaa) | 2.5 billion rows of climate data for the last 120 yrs |
| [NYPD complaint data](/get-started/sample-datasets/nypd-complaint-data) | Ingest and query Tab Separated Value data in 5 steps |
| [OnTime](/get-started/sample-datasets/ontime) | Dataset containing the on-time performance of airline flights |
| [Star Schema Benchmark (SSB, 2009)](/get-started/sample-datasets/star-schema) | The Star Schema Benchmark (SSB) data set and queries |
| [Taiwan historical weather datasets](/get-started/sample-datasets/tw-weather) | 131 million rows of weather observation data for the last 128 yrs |
| [Terabyte click logs from Criteo](/get-started/sample-datasets/criteo) | A terabyte of click logs from Criteo |
| [The UK property prices dataset](/get-started/sample-datasets/uk-price-paid) | Learn how to use projections to improve the performance of queries that you run frequently using the UK property dataset, which contains data about prices paid for real-estate property in England and Wales |
| [TPC-DS (2012)](/get-started/sample-datasets/tpcds) | The TPC-DS benchmark data set and queries. |
| [TPC-H (1999)](/get-started/sample-datasets/tpch) | The TPC-H benchmark data set and queries. |
| [WikiStat](/get-started/sample-datasets/wikistat) | Explore the WikiStat dataset containing 0.5 trillion records. |
| [Writing queries in ClickHouse using GitHub data](/get-started/sample-datasets/github) | Dataset containing all of the commits and changes for the ClickHouse repository |
| [YouTube dataset of dislikes](/get-started/sample-datasets/youtube-dislikes) | A collection of dislikes of YouTube videos. |
{/*AUTOGENERATED_END*/}
<SampleDatasetExplorer />
Binary file added images/sample-datasets-grid/benchmarks-dark.jpg
Binary file added images/sample-datasets-grid/benchmarks-light.jpg
Binary file added images/sample-datasets-grid/cell-towers-dark.jpg
Binary file added images/sample-datasets-grid/covid19-dark.jpg
Binary file added images/sample-datasets-grid/covid19-light.jpg
Binary file added images/sample-datasets-grid/criteo-dark.jpg
Binary file added images/sample-datasets-grid/criteo-light.jpg
Binary file added images/sample-datasets-grid/dbpedia-dark.jpg
Binary file added images/sample-datasets-grid/dbpedia-light.jpg
Binary file added images/sample-datasets-grid/github-dark.jpg
Binary file added images/sample-datasets-grid/github-light.jpg
Binary file added images/sample-datasets-grid/hacker-news-dark.jpg
Binary file added images/sample-datasets-grid/laion-400m-dark.jpg
Binary file added images/sample-datasets-grid/laion-400m-light.jpg
Binary file added images/sample-datasets-grid/laion5b-dark.jpg
Binary file added images/sample-datasets-grid/laion5b-light.jpg
Binary file added images/sample-datasets-grid/menus-dark.jpg
Binary file added images/sample-datasets-grid/menus-light.jpg
Binary file added images/sample-datasets-grid/noaa-dark.jpg
Binary file added images/sample-datasets-grid/noaa-light.jpg
Binary file added images/sample-datasets-grid/nyc-taxi-dark.jpg
Binary file added images/sample-datasets-grid/nyc-taxi-light.jpg
Binary file added images/sample-datasets-grid/ontime-dark.jpg
Binary file added images/sample-datasets-grid/ontime-light.jpg
Binary file added images/sample-datasets-grid/star-schema-dark.jpg
Binary file added images/sample-datasets-grid/tpcds-dark.jpg
Binary file added images/sample-datasets-grid/tpcds-light.jpg
Binary file added images/sample-datasets-grid/tpch-dark.jpg
Binary file added images/sample-datasets-grid/tpch-light.jpg
Binary file added images/sample-datasets-grid/tw-weather-dark.jpg
Binary file added images/sample-datasets-grid/tw-weather-light.jpg
Binary file added images/sample-datasets-grid/wikistat-dark.jpg
Binary file added images/sample-datasets-grid/wikistat-light.jpg