From On-Chain to Event Streaming: Getting Started with Aiven’s Apache Kafka
It’s been 11 years since the initial release of Apache Kafka. Although Kafka clusters have been deployed across the globe, large enterprises still struggle with configuring, administering and maintaining a distributed event streaming platform. These tasks require IT infrastructure resources and specialized technical expertise, all of which increase operating costs.
That’s why, as a Solutions Architect, I often hear IT executives and engineers referring to Kafka as a “sure would be nice to have”, or “hopefully one day”.
These organizations would be happy to know that Aiven’s managed Apache Kafka makes getting a production-ready, enterprise-grade Kafka cluster up and running a breeze. Talk is cheap though, so let’s see just how easy it is to get started today.
Kafka Setup
First, we will need to create an account with Aiven.
- Head to Aiven.io and sign in (or sign up).
- Select “+ Create a new service”
- Choose Apache Kafka, Select your Cloud Provider, Service Cloud Region and Service plan.
- Choose a name for your service and click ‘+Create Service’ to deploy.
Your choice of cloud provider will likely depend on your organizational requirements, but you’ll be pleased to know Aiven supports deployments on AWS, GCP, Azure, DigitalOcean and UpCloud. Select the closest region for the lowest latency, and a service plan that meets your requirements.
While the service is starting up, click into the deployment to get the credentials required for integration. First copy & save the Service URI. Then, download the following files:
- Access Key (service.key)
- Access Certificate (service.cert)
- CA Certificate (ca.pem)
Now that we have the requisite credentials, let’s change a few cluster configuration settings within Aiven’s UI.
To view messages within the Aiven console, enable the ‘Apache Kafka REST API’ setting.
Next, scroll down to the ‘Advanced configuration’ module. Select ‘+ Add configuration option’, choose ‘kafka.auto_create_topics_enable’ and toggle it on. This lets us create topics on the fly, just by publishing events to the cluster.
Believe it or not, that’s all the configuration tasks required. Now we’re ready to get started with some code!
Now that we have the Kafka infrastructure in place, it’s time to decide what data we’ll be streaming. The nice part about working with Kafka (and many open-source platforms) is that its wide range of client libraries makes it essentially programming-language agnostic. We’ll be using Python today, but feel free to implement with your preferred tooling.
Feel free to examine the codebase we’ll be reviewing here
For this demo, we’ll be generating a JSON object that mimics the Etherscan API response for an account’s on-chain transactions. Here’s our format:
For analysis purposes, we’ll also be adding a unique identifier (UUID) for each event, as well as a datetime of generation (ISO 8601).
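An illustrative sample of what one event looks like (the field names mirror a subset of the Etherscan account-transaction response; every value below is fabricated for the example):

```python
import json

# Sample event -- fabricated values, real field names
sample_event = {
    "event_uuid": "1b4e28ba-2fa1-11d2-883f-0016d3cca427",  # UUID per event
    "created_at": "2022-06-01T12:00:00+00:00",             # ISO 8601 datetime
    "blockNumber": "14500000",
    "hash": "0x" + "ab" * 32,        # 32-byte transaction hash
    "from": "0x" + "12" * 20,        # 20-byte sender address
    "to": "0x" + "34" * 20,          # 20-byte recipient address
    "value": "1000000000000000000",  # amount in wei (1 ETH)
    "gas": "21000",
    "gasPrice": "50000000000",       # 50 gwei
}

print(json.dumps(sample_event, indent=2))
```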
To get started, navigate to our local repo and install the kafka-python client. Simply run the following command in your terminal or console.
pip install kafka-python
Then, add the following dependencies:
Create a folder for the certificates we downloaded from Aiven console and store the path in a variable. Here’s how I stored mine:
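For example (the folder name `certs` is just my choice; any path works as long as it points at the three downloaded files):

```python
# Folder holding the files downloaded from the Aiven console
CERTS_FOLDER = "certs"

KEY_PATH = f"{CERTS_FOLDER}/service.key"    # Access Key
CERT_PATH = f"{CERTS_FOLDER}/service.cert"  # Access Certificate
CA_PATH = f"{CERTS_FOLDER}/ca.pem"          # CA Certificate
```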
Next, we’ll use the kafka-python library to declare a KafkaProducer. Here’s where we use the Service URI you copied earlier and refer to the path variable we just declared. We’ll be using SSL to authenticate today but be sure to examine your options here and choose what works best for your organization.
Alright now we’re ready to generate some data! Here’s the code we’ll be using to create the variables for our pseudo-Ethereum transactions.
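A sketch of that generation step. The random ranges and the use of `secrets.token_hex` to fake hashes and addresses are my own choices; only the field shapes matter:

```python
import random
import secrets
import uuid
from datetime import datetime, timezone

# Analysis fields
event_uuid = str(uuid.uuid4())                       # unique event id
created_at = datetime.now(timezone.utc).isoformat()  # ISO 8601 timestamp

# Pseudo on-chain fields: random hex of the right length, structurally
# like Etherscan's response but not real chain data
block_number = str(random.randint(14_000_000, 15_000_000))
tx_hash = "0x" + secrets.token_hex(32)       # 66 chars, like a tx hash
from_address = "0x" + secrets.token_hex(20)  # 42 chars, like an address
to_address = "0x" + secrets.token_hex(20)
value_wei = str(random.randint(1, 10) * 10**18)       # 1-10 ETH in wei
gas = "21000"                                         # standard ETH transfer
gas_price_wei = str(random.randint(20, 200) * 10**9)  # 20-200 gwei
```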
(Note: you’ll see a wider variety of response data when calling the actual Etherscan API, but this code does a good job representing the structure required.)
Now we have some data generated, let’s use the second part of the function to store it in a Python dict, then convert that dict to a JSON object and return that object:
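Putting it together, the whole function might look like this (same assumed field names and random ranges as above; `json.dumps` turns the dict into a JSON string ready to send):

```python
import json
import random
import secrets
import uuid
from datetime import datetime, timezone


def generate_transaction() -> str:
    """Return one pseudo-Ethereum transaction as a JSON string."""
    transaction = {
        "event_uuid": str(uuid.uuid4()),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "blockNumber": str(random.randint(14_000_000, 15_000_000)),
        "hash": "0x" + secrets.token_hex(32),
        "from": "0x" + secrets.token_hex(20),
        "to": "0x" + secrets.token_hex(20),
        "value": str(random.randint(1, 10) * 10**18),      # wei
        "gas": "21000",
        "gasPrice": str(random.randint(20, 200) * 10**9),  # wei
    }
    return json.dumps(transaction)
```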
Now that the hard work is over, we just need to call that function and send the objects to our Aiven Kafka cluster! I’ll call the function three times, generating three distinct events, each representing its own on-chain transaction:
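One way to sketch that final step. This assumes the producer and event-generator built earlier are passed in; the helper name `publish_events` is mine, and the topic name matches the one we browse in the console afterwards:

```python
TOPIC = "ethereum-transactions"


def publish_events(producer, make_event, count: int = 3) -> None:
    """Send `count` generated events to the topic and wait for delivery."""
    for _ in range(count):
        producer.send(TOPIC, value=make_event())
    producer.flush()  # block until the broker has acknowledged everything


# publish_events(producer, generate_transaction)
```

With topic auto-creation enabled earlier, the first `send` to `ethereum-transactions` creates the topic for us.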
You just published your first events to your Aiven Apache Kafka cluster! Any consumer client can subscribe to this event stream to process this data or store it in a database.
To view the messages within the cluster, navigate back to Aiven’s console, select our Kafka service and route through Topics > ‘ethereum-transactions’ > Messages > [Format: JSON] > Fetch Messages
Congratulations! You’re officially up and running with Aiven’s fully managed Apache Kafka.
Now that we have a program producing messages, we need a consumer to store our data. Continue on to my second post, where we’ll use Aiven’s managed InfluxDB service to store our data and Aiven’s managed Grafana service to monitor our entire pipeline.