Replicating Data from Postgres to MS SQL Server using Kafka

You might have heard the expression that “Data is the new oil”; that’s because data is key to any business decision system in an organization. However, the required data is rarely stored in one place like a single, vanilla RDBMS. Normally, it remains scattered across various RDBMSs and other data silos. So, it’s quite common to come across situations where data is sitting across different kinds of sources, and the DBA needs to find a solution to pull all the relevant data into a centralized location for further analysis.

The Challenge  

Consider a particular scenario: we have some sales data in an AWS RDS for PostgreSQL instance, and that data has to be replicated into tables in an Azure VM running MS SQL Server. The challenge is that SQL Server’s heterogeneous replication is marked as deprecated, and it only ever worked between SQL Server and Oracle; Postgres is not on that list. There is no native heterogeneous replication on the Postgres side either. We would need third-party tools, or a lot of manual effort: building a change data capture (CDC) mechanism through triggers and then reading the changes from control tables. It sounds fun to implement, but time is of the essence, and getting such a solution off the ground would take quite some time. Therefore, another solution had to be used to complete this data movement.
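To give an idea of the manual effort involved, here is a minimal sketch of what such a trigger-based CDC setup could look like on the Postgres side (PostgreSQL 11+). The database name, table name and control-table layout are hypothetical, and a real implementation would also need jobs to ship and purge the captured rows:

psql -d salesdb <<'SQL'
-- Control table that accumulates every change made to the source table
CREATE TABLE sales_cdc (
    id          bigserial   PRIMARY KEY,
    op          char(1)     NOT NULL,              -- I / U / D
    captured_at timestamptz NOT NULL DEFAULT now(),
    row_data    jsonb       NOT NULL
);

-- Trigger function that writes the changed row into the control table
CREATE OR REPLACE FUNCTION capture_sales_changes() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO sales_cdc (op, row_data) VALUES ('D', to_jsonb(OLD));
        RETURN OLD;
    ELSE
        INSERT INTO sales_cdc (op, row_data) VALUES (left(TG_OP, 1), to_jsonb(NEW));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sales_cdc
AFTER INSERT OR UPDATE OR DELETE ON sales
FOR EACH ROW EXECUTE FUNCTION capture_sales_changes();
SQL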

A Possible Solution  

The Big Data ecosystem has many tools that can achieve this objective, but they don’t offer the advantages that Kafka does.

Initially, Apache Kafka was designed as a message-queue-like application (think IBM MQ Series or even SQL Server Service Broker). Perhaps the best definition of it today is: “Apache Kafka is a community-distributed event streaming platform capable of handling trillions of events a day.” But a streaming system must be able to get data from different data sources and deliver the messages to different data endpoints.

There are many tools to achieve this: Azure Event Hubs, AWS Kinesis, or Kafka managed in Confluent Cloud. However, to show how things work under the hood, and since they share a similar background (the Apache Kafka open-source code), we have decided to set up the environment from scratch, without any managed services for the stream processing. So, the picture would look something like this:

The basic Apache Kafka components are listed here:

Message: This is the smallest data unit in Kafka. From the perspective of a DBA, a message is simply a row in a database table. In Kafka, a message is an array of bytes.

Schema: It’s a structure that accompanies messages so that they can be understood by the consuming applications.

Topics: Messages are categorized into topics. In the DBA world, a topic is like a table. Messages are read in order, from beginning to end, in an append-only format.

Producers and consumers: This idea is very similar to publishers and subscribers in SQL Server Replication. However, Kafka can handle multiple producers and consumers simultaneously (a quick command-line illustration follows this list).

Brokers and clusters: A single Kafka server is known as a “broker”. It receives messages from the producers, assigns an offset to them, and persists them on disk. It also serves consumers as they request messages. In a production environment, brokers are generally part of a cluster.

Connectors: Apache Kafka comes with many connectors for integrating producers and consumers. Different connectors are available either free of charge or for purchase.
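As a quick, hands-on illustration of topics, producers, and consumers, the following commands use the console tools that ship with Apache Kafka. The broker address, topic name, and sample message are placeholders, and the commands assume they are run from the Kafka installation directory:

# Create a topic (roughly the equivalent of creating a table)
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic sales_events --partitions 3 --replication-factor 1

# A producer appends a message (a "row") to the topic
# (older Kafka versions use --broker-list instead of --bootstrap-server here)
echo '{"order_id": 1001, "amount": 49.90}' | \
  bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic sales_events

# A consumer reads the messages back in order, from the beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic sales_events --from-beginning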

So, Kafka is the choice for the task at hand, and we’re going to create the environment for the data movement from scratch. 

The Architecture

To achieve the streaming functionality and ensure data flow, we will use a set of tables from a PostgreSQL database as the source and a set of tables in SQL Server as the destination, while Apache Kafka will act as the “replication mechanism”. To connect to Postgres and detect its data changes (CDC), a connector plugin is also required. We will use Debezium, since it is open source as well and was built from the ground up to work with Kafka as smoothly as possible.
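To give a feel for what that connector looks like once everything is installed, here is a sketch of how a Debezium Postgres source connector could be registered against a Kafka Connect worker on its default port (8083). The connector name, host, credentials, database, and table list are placeholders; note that newer Debezium releases use topic.prefix instead of database.server.name:

# Register a hypothetical Debezium Postgres source connector via the Kafka Connect REST API
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "sales-postgres-source",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "plugin.name": "pgoutput",
      "database.hostname": "jupiter",
      "database.port": "5432",
      "database.user": "replicator",
      "database.password": "secret",
      "database.dbname": "salesdb",
      "database.server.name": "sales",
      "table.include.list": "public.orders"
    }
  }'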

Remember that, to achieve the required goal (replicating data from Postgres to SQL Server using Kafka), we could use a Kafka-as-a-service product from any of the available cloud providers. But to fully understand how the architecture works, and simply as a showcase, we have made up our minds not to use any SaaS product, nor Docker. Now we’re ready to proceed.

Get to the task:  

Let’s start by installing our first VM. We are using a Windows PC for all the testing, so we are creating the VM in Hyper-V; you can use whichever hypervisor you want. We have used Red Hat Enterprise Linux 8.5 (Developer edition) as the OS, for which we need to subscribe to the Red Hat Portal.

1. We will create a VM called “Jupiter” which will act as our Postgres and Kafka server. 

2. As far as Hyper-V is concerned, it will be a Gen 2 VM. Ensure that you allocate enough memory to do proper testing. In our case, we are setting it to 6 GB (dynamic), which means Hyper-V won’t dedicate that memory entirely to the VM.

3. For networking, we are adding a virtual Ethernet adapter attached to the physical NIC of the host, which means your Wi-Fi router will assign an IP address directly to the VM.

4. As we have a small data sample, 60 GB of disk space is quite enough.

5. Make sure to attach the Red Hat ISO; this way, you can install the OS shortly.

6. Also ensure that you set a reasonable number of virtual CPUs, and disable Secure Boot for the time being.

7. Enable all guest services. 

8. Now proceed to installing the OS by starting the VM. Select its console window and hit any key to start the install process.

9. Choose the preferred installation language.  

10. On this screen, we need to adjust the following: set partitioning to Auto, disable Kdump, enable the network, set the root password, and create a user. After making all the desired settings, click ‘Begin Installation’.

11. It will take some time to complete the OS installation. Then, we will reboot. 

12. During the configuration phase, you will have to accept the License terms. 

13. After that, click “Finish Configuration”.  

14. You need to register your new VM with your Red Hat account; otherwise, you can’t download the packages necessary for the environment to work properly. Click ‘Register’, and provide the username (NOT the email) and password of your Red Hat account.

15. At this stage, we can safely skip the next screens; we don’t need to sign in to any other online account.

16. From here on, the GUI is no longer needed; we can use any terminal app on Windows to connect to our VM. First, get the IP address of the VM you just installed by opening a terminal window and typing “ifconfig” (see the commands after this list).

17. MobaXTerm is quite useful and has a very intuitive UI. After connecting, you’ll see a Windows Explorer-like side panel and the main bash terminal where you can type the required commands.
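As a reference for steps 16 and 17, these are the kinds of commands involved; the user name and IP address below are placeholders for your own values:

# Inside the VM console: find the IPv4 address assigned by your router
ifconfig
# or, if ifconfig is not installed (minimal RHEL 8 installs ship without it):
ip -4 addr show

# From the Windows host (MobaXTerm or any SSH client), connect to the VM
ssh youruser@192.168.1.50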

