Part 2 of the series can be found here.
Part 3 of the series can be found here.

From its official site,

Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications.

It groups containers that make up an application into logical units for easy management and discovery. K8s was originally developed by Google and is now maintained by the Cloud Native Computing Foundation.

  • K8s is an open-source container orchestration framework/tool.
  • That is, K8s is used to manage applications made up of hundreds or thousands of containers across different deployment environments, such as physical machines and cloud machines.

Need for a container orchestration tool:

  • Usage of microservices increased…

From its official website,

Apache Kafka is an open-source distributed event streaming platform.

Kafka is a distributed platform consisting of clients and servers. It runs as a cluster of one or more servers that can span multiple data centers or cloud regions.

Some servers form the storage layer (the brokers), while others run Kafka Connect to continuously import and export data, integrating Kafka with external systems.

Clients allow you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner.

Kafka is highly scalable and fault-tolerant, meaning that if any of its…

Previous parts of the series can be found here: part-1, part-2.

In this post, let's get our hands dirty with Kubernetes. There are a few ways to set up Kubernetes.

  1. Cloud-based K8s cluster setup on GCP, Azure, or AWS
  2. Installing K8s tools like Minikube and kubectl
  3. Using online playgrounds for hands-on practice

A cloud-based setup is suitable for a multi-node cluster, which is not necessary for initial hands-on practice. Installing Minikube and kubectl is a good option because it gives us a single-node cluster where both master & worker nodes will be…

Part 1 of the series can be found here.
Part 3 of the series can be found here.

In the first part, we saw some important components of the K8s cluster. In this part, let's talk about the components of the architecture.

Node processes:
Let’s consider a basic setup of one node with two application Pods, as shown below.

  • The worker servers, or nodes, are among the main components of the K8s architecture.
  • Each node can have one or more application pods in which containers are placed. Nodes are the servers that actually do the work.
  • Three important processes must…

From its official documentation,

ClickHouse is a fast, open-source, column-oriented OLAP database management system that allows generating analytical reports using SQL queries in real time.

ClickHouse manages extremely large volumes of data in a stable and sustainable manner. It currently powers Yandex.Metrica, the world’s second-largest web analytics platform, with over 13 trillion database records and over 20 billion events a day.

What is ClickHouse for, and what is it not for?

  • It can ingest a huge amount of data in real-time
  • Fast SQL query processing
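  • Not a transactional database. It is a column-oriented database.
  • Not designed for frequent row-level deletes. ClickHouse supports deletion only as a heavyweight mutation (ALTER TABLE ... DELETE), not as a fast transactional operation.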
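
To make "column-oriented" a little more concrete, here is a minimal sketch in plain Python. It is only a toy model and says nothing about how ClickHouse is actually implemented: the table, the column names, and the helper functions are all hypothetical, and Python lists stand in for on-disk storage. It shows why an analytical aggregate touches less data when each column is stored separately.

```python
# Toy illustration only: plain Python lists stand in for on-disk storage.
# The table contents and column names are hypothetical.
rows = [
    {"user_id": 1, "country": "IN", "duration_ms": 120},
    {"user_id": 2, "country": "DE", "duration_ms": 340},
    {"user_id": 3, "country": "IN", "duration_ms": 95},
]


def total_duration_row_store(records):
    """Row-oriented layout: every full record is visited to read one field."""
    return sum(record["duration_ms"] for record in records)


def to_column_store(records):
    """Column-oriented layout: each column becomes its own list of values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_duration_column_store(columns):
    """The same aggregate reads a single column and ignores the others."""
    return sum(columns["duration_ms"])


columns = to_column_store(rows)
print(total_duration_row_store(rows))        # 555
print(total_duration_column_store(columns))  # 555
```

Both functions return the same answer. The difference is in how much data each one has to walk through, which is what matters for analytical queries over very wide tables.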

ClickHouse local installation:

Instead of going to…

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

What is event streaming?
Event streaming means capturing data in real time from multiple sources in the form of streams of events. These streams are stored so that they can later be retrieved, manipulated, and processed. It is also possible to react to the events in real time.

How does Kafka do event streaming?
1. Kafka can write and read streams of events.
2. Kafka can store streams of events for as long as needed.
3. Kafka can process streams of events as they occur or retrospectively.
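
The three capabilities above can be sketched with a small in-memory model. This is not Kafka's client API: `EventLog`, its methods, and the sample events are hypothetical names used only for illustration, and a Python list stands in for a durable, distributed log.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class EventLog:
    """Hypothetical in-memory stand-in for an append-only event log."""

    events: List[Any] = field(default_factory=list)

    def write(self, event):
        """Append an event and return its offset (its position in the log)."""
        self.events.append(event)
        return len(self.events) - 1

    def read(self, offset=0):
        """Return stored events from the given offset on; reading never deletes."""
        return self.events[offset:]


log = EventLog()

# 1. Write a stream of events.
log.write({"type": "page_view", "user": "a"})
log.write({"type": "purchase", "user": "a", "amount": 30})
log.write({"type": "purchase", "user": "b", "amount": 12})

# 2 and 3. Events stay stored, so they can be processed retrospectively.
total = sum(e["amount"] for e in log.read(0) if e["type"] == "purchase")
print(total)  # 42

# 3. Process events as they occur: remember the next offset and read from it.
next_offset = len(log.events)
log.write({"type": "purchase", "user": "c", "amount": 8})
new_events = log.read(next_offset)
print(len(new_events))  # 1
```

The design point to notice is that reading does not remove anything from the log, so the same events can be replayed from offset 0 by a second reader while another reader only follows new arrivals.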

Before continuing, note that I have written an Introduction to Apache Spark. You can head to the link if you are new to Apache Spark.

Welcome to some practical explanations of Apache Spark with Scala. Spark also supports Python through PySpark. For this post, I am continuing with Scala on my Windows installation of Apache Spark.

  1. Spark Shell
    Go to the Spark installation directory and cd to the bin folder. Type spark-shell and press Enter. You will see the Spark session starting and printing some logs, which are very important.

Continuing from Apache Spark | A Processing Friend, here I am again with the second part. Before continuing, I suggest going through Part-1 to understand the basics of Apache Spark.

Apache Spark RDD (Image Credits: Dataflair)

RDDs (Resilient Distributed Datasets) are the fundamental building blocks of Apache Spark. Spark stores data in RDD format.

Why RDD:

The growth of today's technology world is something we can't estimate. We are now in the era of Artificial Intelligence; tomorrow it will change to something more advanced. For these advanced technologies, the amount of data that needs to be stored and processed is huge. Companies like Google process petabytes of data every day. The…

Apache Spark is an open-source, distributed data processing system for big data applications that uses in-memory caching for fast responses against almost any data size. From its official site,

Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.

Four advantages of Apache Spark, according to its developers:

1. Speed

According to its developers, Spark runs workloads up to 100x faster than Hadoop MapReduce. It achieves high performance for both batch and streaming data.

2. Ease of Use

Supports over 80 high-level operators to build parallel apps including industry rulers…

Nowadays, for almost any business, a mobile application is the best way to reach customers. Businesses can be anything from photo editors to banking applications. As an automation Python developer, I was wondering whether there is a way to create mobile applications in Python. Then I came to know about Kivy. Let’s dive in to get some knowledge about it. From its website,

Kivy is an open-source Python library for rapid development of applications that make use of innovative user interfaces, such as multi-touch apps. It is cross-platform and free.


I have used Miniconda3 to install Kivy…

Gobalakrishnan Viswanathan

Yet another Pythonic-Automation guy.
