ClickHouse

ClickHouse is an open-source column-oriented DBMS for OLAP (online analytical processing).

ClickHouse was developed by the Russian IT company Yandex to meet the challenges of Yandex.Metrica, the second-largest web analytics service in the world.[1][2][3][4] ClickHouse allows analysis of data that is updated in real time. The system is linearly scalable, which makes it possible to store and process trillions of rows and petabytes of data.[5]

The project was open-sourced under the Apache License 2.0 in June 2016.[6]

ClickHouse is used both in Yandex projects and outside the company. For example, telemetry data for the open-source project Yandex.Tank (a load testing tool) is stored in ClickHouse.[6] Yandex.Market uses ClickHouse to monitor site accessibility and KPIs.[7] ClickHouse was also deployed at CERN’s LHCb experiment[8] to store and process metadata on 10 billion events with over 1000 attributes per event, and Tinkoff Bank uses ClickHouse as a data store for a project.[9]

History

Yandex.Metrica previously used a classical approach in which raw data was stored in aggregated form.[10] This approach can help reduce the amount of stored data. However, it has several limitations and disadvantages.

A different approach is to store unaggregated data. Processing raw data requires a high-performance system, since all calculations are made in real time. Solving this problem requires a column-oriented DBMS that can handle analytical data on the scale of the entire Internet at a reasonable cost. Since no suitable solution was available, Yandex began developing its own DBMS. The first ClickHouse prototype appeared in 2009. Yandex.Metrica version 2.0 was released by the end of 2014; it has an interface for creating custom reports and uses ClickHouse for storing and processing data.

Features

The main features of the ClickHouse DBMS are:[11]

Limitations

ClickHouse has some features that can be considered disadvantages:

Use cases

ClickHouse was designed for OLAP queries.[11]

One of the common use cases for ClickHouse is server log analysis. After setting up regular data uploads to ClickHouse (inserting data in fairly large batches of more than 1000 rows is recommended), it is possible to analyze incidents with instant queries or monitor a service's metrics, such as error rates and response times.
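The batching recommendation can be illustrated with a minimal Python sketch. It groups log rows into large chunks and prepares one insert request per chunk for ClickHouse's HTTP interface, using the TabSeparated input format. The table name `access_log`, the column layout, the batch size of 5000, and the server address `http://localhost:8123/` are assumptions chosen for illustration. The sketch only builds the request URL and body; sending them (for example, as an HTTP POST) is left out.

```python
from itertools import islice
from urllib.parse import urlencode

BATCH_ROWS = 5000  # assumed value; the text only recommends "more than 1000 rows"


def batches(rows, size=BATCH_ROWS):
    """Yield lists of up to `size` rows from any iterable of rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def escape_tsv(value):
    """Backslash-escape characters that would break a tab-separated line."""
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n"))


def to_tsv(chunk):
    """Serialize rows as tab-separated lines, one row per line."""
    return "".join(
        "\t".join(escape_tsv(v) for v in row) + "\n" for row in chunk
    )


def insert_request(table, chunk, base_url="http://localhost:8123/"):
    """Return (url, body) for one batched INSERT; nothing is sent here."""
    query = f"INSERT INTO {table} FORMAT TabSeparated"
    url = base_url + "?" + urlencode({"query": query})
    return url, to_tsv(chunk).encode("utf-8")


# Hypothetical log rows: (timestamp, path, status code, response time in ms).
log_rows = (
    (f"2016-11-10 12:00:{i % 60:02d}", "/index", 500 if i % 50 == 0 else 200, i % 300)
    for i in range(12000)
)
pending = [insert_request("access_log", chunk) for chunk in batches(log_rows)]
```

With 12000 rows and a batch size of 5000, this prepares three requests instead of 12000 single-row inserts. The final batch may be smaller than the others; a production loader would typically also flush on a timer so that rows are not held back indefinitely.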

ClickHouse can also be used as an internal data warehouse for in-house analysts. It can store data from different systems (such as Hadoop or certain logs), and analysts can use that data to build internal dashboards or perform real-time analysis for business purposes.

Benchmark results

According to benchmark tests conducted by its developers,[12] ClickHouse is more than 100 times faster on OLAP queries than Hive (a DBMS based on the Hadoop technology stack) or MySQL (a common RDBMS).

References

  1. "Usage Statistics and Market Share of Traffic Analysis Tools for Websites, November 2016". w3techs.com. Retrieved 2016-11-10.
  2. Datanyze. "Analytics Market Share Report | Competitor Analysis | Google Analytics, Google Universal Analytics, Yandex Metrica". Datanyze. Retrieved 2016-11-10.
  3. Wappalyzer (2011-12-30). "Analytics". wappalyzer.com. Retrieved 2016-11-10.
  4. "Analytics - SEOMON.com". seomon.com. Retrieved 2016-11-10.
  5. "ClickHouse: High-Performance Distributed DBMS for Analytics | Percona Live Amsterdam - Open Source Database Conference 2016". www.percona.com. Retrieved 2016-11-10.
  6. "Яндекс открывает ClickHouse" [Yandex open-sources ClickHouse]. Retrieved 2016-11-10.
  7. "Здоровье Маркета: как мы превращаем логи в графики, Дмитрий Андреев (Яндекс) — События Яндекса" [Market health: how we turn logs into graphs, Dmitry Andreev (Yandex) - Yandex Events]. events.yandex.ru. Retrieved 2016-11-10.
  8. "Yandex — Yandex Launches Search Tool for LHC Events at CERN". Yandex. Retrieved 2016-11-10.
  9. "Сравнение аналитических in-memory баз данных" [Comparison of analytical in-memory databases]. Retrieved 2016-11-10.
  10. "Эволюция структур данных в Яндекс.Метрике" [Evolution of data structures in Yandex.Metrica]. Retrieved 2016-11-10.
  11. "ClickHouse Guide". clickhouse.yandex. Retrieved 2016-11-10.
  12. "Performance comparison of analytical DBMS". clickhouse.yandex. Retrieved 2016-11-10.
  13. "smi2/phpClickHouse". GitHub. Retrieved 2016-11-10.
  14. "apla/node-clickhouse". GitHub. Retrieved 2016-11-10.
  15. "elcamlost/perl-DBD-ClickHouse". GitHub. Retrieved 2016-11-10.
  16. "archan937/clickhouse". GitHub. Retrieved 2016-11-10.
  17. "hannesmuehleisen/clickhouse-r". GitHub. Retrieved 2016-11-10.
  18. "yandex/clickhouse-jdbc". GitHub. Retrieved 2016-11-10.

This article is issued from Wikipedia - version of the 11/20/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.