From its official documentation:

ClickHouse is a fast open-source OLAP database management system. It is column-oriented and allows generating analytical reports using SQL queries in real time.

ClickHouse manages extremely large volumes of data in a stable and sustainable manner. It currently powers Yandex.Metrica, the world’s second-largest web analytics platform, with over 13 trillion database records and over 20 billion events a day.

What is ClickHouse for, and what is it not for?

  • It can ingest huge amounts of data in real time.

ClickHouse local installation:

Instead of installing it through apt, let's use the Docker image to get started faster. Following the instructions on the image's Docker Hub page, I ran the command below to get ClickHouse up and running.

docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
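Since this command tends to get typed again and again, here is a minimal sketch of scripting it from Python with the standard subprocess module. The container name and image are taken from the command above. The publish_http switch, which adds Docker's -p 8123:8123 flag so that ClickHouse's HTTP port is reachable from the host, is my own assumption and is not part of the original command.

```python
import subprocess

# Values taken from the docker run command above.
CONTAINER_NAME = "some-clickhouse-server"
IMAGE = "yandex/clickhouse-server"


def docker_run_args(name=CONTAINER_NAME, image=IMAGE, publish_http=False):
    """Build the argument list for the docker run command shown above."""
    args = [
        "docker", "run", "-d",
        "--name", name,
        "--ulimit", "nofile=262144:262144",
    ]
    if publish_http:
        # Assumption (not in the original command): publish ClickHouse's
        # HTTP port 8123 to the host so it can be queried from outside.
        args += ["-p", "8123:8123"]
    args.append(image)
    return args


def start_clickhouse(**kwargs):
    """Run docker; check=True raises CalledProcessError if it fails."""
    result = subprocess.run(docker_run_args(**kwargs), check=True,
                            capture_output=True, text=True)
    # With -d, docker run prints the new container's ID on stdout.
    return result.stdout.strip()
```

Keeping the argument building separate from the subprocess call makes the exact command easy to inspect (or log) before anything is executed.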

The command above downloads the ClickHouse image from Docker Hub (if it is not already available locally) and starts a container from it in the background. The download can take some time. The docker ps command can then be used to check that the container is running.

Next, open a shell inside the container with the command docker exec -it <your_container_id> /bin/sh. This gives you a terminal inside the container, where the ClickHouse server is already running. From there, run clickhouse-client, which opens the database shell where we can access all the databases.
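For scripted checks, an interactive shell is not strictly needed: clickhouse-client can run a single query non-interactively through its --query option, and docker exec accepts a container name as well as an ID. The sketch below assumes the container name some-clickhouse-server from the docker run command above.

```python
import subprocess


def client_query_args(sql, container="some-clickhouse-server"):
    """Build a docker exec call that runs one query with clickhouse-client."""
    # No -it here: the query runs in batch mode, so no terminal is needed.
    return ["docker", "exec", container, "clickhouse-client", "--query", sql]


def run_client_query(sql, container="some-clickhouse-server"):
    """Run the query and return the output lines (one result row per line)."""
    result = subprocess.run(client_query_args(sql, container), check=True,
                            capture_output=True, text=True)
    # In batch mode clickhouse-client prints TabSeparated output by default.
    return result.stdout.splitlines()
```

For example, run_client_query("SHOW DATABASES") should return the database names as a list of strings, one per element.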

Let's run a few sample commands to check that everything works. The SHOW DATABASES command lists the available databases; the default installation has three databases in it.

The USE <database_name> command selects a specific database, and the SHOW TABLES command then lists the tables in that database.
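The same commands can also be sent from the host through ClickHouse's HTTP interface, where the query goes in the query URL parameter and the database URL parameter plays the role of USE. This is a minimal sketch using only Python's standard library. It assumes port 8123 was published to the host (for example by adding -p 8123:8123 to the docker run command, which the original command does not do) and that the default user has no password.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def query_url(sql, database=None, host="localhost", port=8123):
    """Build a GET URL for ClickHouse's HTTP interface."""
    params = {"query": sql}
    if database is not None:
        # Equivalent to running USE <database> before the query.
        params["database"] = database
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"http://{host}:{port}/?{urlencode(params, quote_via=quote)}"


def run_query(sql, database=None, **kwargs):
    """Send the query and return the response body split into lines."""
    with urlopen(query_url(sql, database, **kwargs)) as resp:
        return resp.read().decode("utf-8").splitlines()
```

With the port published, run_query("SHOW DATABASES") and run_query("SHOW TABLES", database="system") should mirror what clickhouse-client prints.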

Now I can use Kafka topics with Apache Spark to ingest data into ClickHouse.
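The Kafka and Spark side is not covered here, but whichever pipeline produces the data, each batch ultimately reaches ClickHouse as an INSERT. As a minimal sketch of that final step only (not the Kafka/Spark pipeline itself), a batch of rows can be POSTed over the HTTP interface in the JSONEachRow format, one JSON object per line. The table name events and its columns are hypothetical and must already exist; port 8123 is assumed to be published to the host.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def insert_request(table, rows, host="localhost", port=8123):
    """Build a POST request inserting rows (dicts) into an existing table."""
    # The INSERT statement goes in the URL; the data goes in the body.
    sql = f"INSERT INTO {table} FORMAT JSONEachRow"
    url = f"http://{host}:{port}/?{urlencode({'query': sql}, quote_via=quote)}"
    # JSONEachRow: one JSON object per line, keys matching column names.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return Request(url, data=body, method="POST")


def insert_rows(table, rows, **kwargs):
    """Send the batch; urlopen raises HTTPError if ClickHouse rejects it."""
    with urlopen(insert_request(table, rows, **kwargs)) as resp:
        return resp.status
```

Sending rows in batches rather than one at a time suits ClickHouse, which is built for large bulk inserts.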

That's all about ClickHouse for now. I will keep updating this page whenever I learn something new about ClickHouse. Ta-ta!
