From On-Chain to Event Streaming: Getting Started with Aiven’s Apache Kafka

Adam Noonan
5 min read · Jun 7, 2022

It’s been 11 years since the initial release of Apache Kafka. Although Kafka clusters have been deployed across the globe, large enterprises still struggle with configuring, administering and maintaining a distributed event streaming platform. These tasks require IT infrastructure resources and specialized technical expertise, all of which increase operating costs.

That’s why, as a Solutions Architect, I often hear IT executives and engineers referring to Kafka as a “sure would be nice to have”, or “hopefully one day”.

These organizations would be happy to know that Aiven’s managed Apache Kafka makes getting a production-ready, enterprise-grade Kafka cluster up and running a breeze. Talk is cheap, though, so let’s see just how easy it is to get started today.

Kafka Setup

First, we will need to create an account with Aiven.

  1. Head to Aiven.io and sign in (or sign up).
  2. Select “+ Create a new service”
  3. Choose Apache Kafka, then select your Cloud Provider, Service Cloud Region and Service Plan.
  4. Choose a name for your service and click ‘+Create Service’ to deploy.

Your choice of cloud provider will likely depend on your organizational requirements, but you’ll be pleased to know Aiven supports deployments on AWS, GCP, Azure, DigitalOcean and UpCloud. Select the region closest to you for the lowest latency, and a Service Plan that will meet your requirements.

While the service is starting up, click into the deployment to get the credentials required for integration. First, copy and save the Service URI. Then, download the following files:

  1. Access Key (service.key)
  2. Access Certificate (service.cert)
  3. CA Certificate (ca.pem)

Now that we have the requisite credentials, let’s change a few cluster configuration settings within Aiven’s UI.

To view messages within the Aiven console, enable the ‘Apache Kafka REST API’ setting.

Enable Apache Kafka REST API to view messages within the Aiven user interface

Next, scroll down to the ‘Advanced configuration’ module. Select ‘+ Add configuration option’, choose ‘kafka.auto_create_topics_enable’ and toggle the setting on. This lets us create topics on the fly, just by publishing events to the cluster.

‘kafka.auto_create_topics_enable’ setting within Advanced configuration

Believe it or not, that’s all the configuration required. Now we’re ready to get started with some code!

Now that we have the Kafka infrastructure in place, it’s time to decide what data we’ll be streaming. The nice part about working with Kafka (and many open-source platforms) is that its wide range of client libraries makes it essentially programming-language agnostic. We’ll be using Python today, but feel free to implement with your preferred tooling.

Feel free to examine the codebase we’ll be reviewing here.

For this demo, we’ll be generating a JSON object that mimics the Etherscan API response for an account’s on-chain transactions. Here’s our format:

Example response from Etherscan Accounts API
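In case the image doesn’t render, here’s a rough, illustrative sketch of that shape. The field names follow a subset of the Etherscan account transaction list response; the values are made up:

{
  "blockNumber": "14906380",
  "timeStamp": "1654560000",
  "hash": "0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
  "from": "0x1111111111111111111111111111111111111111",
  "to": "0x2222222222222222222222222222222222222222",
  "value": "500000000000000000",
  "gas": "21000",
  "gasPrice": "20000000000",
  "gasUsed": "21000",
  "confirmations": "120"
}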

For analysis purposes, we’ll also be adding a unique identifier (UUID) to each event, as well as an ISO 8601 datetime marking when it was generated.

To get started, navigate to your local repo and install the kafka-python client. Simply run the following command in your terminal or console.

pip install kafka-python

Then, add the following dependencies:

Python dependencies
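If the image above doesn’t render, the sketches later in this post assume roughly these imports: standard-library modules for serialization, identifiers, timestamps and fake values, plus KafkaProducer from kafka-python:

import json
import random
import uuid
from datetime import datetime, timezone

from kafka import KafkaProducer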

Create a folder for the certificates we downloaded from the Aiven console and store the path in a variable. Here’s how I stored mine:

Folder path containing Aiven Kafka certificates
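As a minimal sketch, assuming the three downloaded files live in a local folder named certs (the folder name is arbitrary; point it at wherever you saved yours):

# Hypothetical folder holding the files downloaded from the Aiven console
CERTS_FOLDER = "certs"
CA_PATH = f"{CERTS_FOLDER}/ca.pem"
CERT_PATH = f"{CERTS_FOLDER}/service.cert"
KEY_PATH = f"{CERTS_FOLDER}/service.key"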

Next, we’ll use the kafka-python library to declare a KafkaProducer. Here’s where we use the Service URI we copied earlier and refer to the path variable we just declared. We’ll be using SSL to authenticate today, but be sure to examine your options here and choose what works best for your organization.

KafkaProducer configuration with SSL
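Here’s a minimal sketch of that producer, building on the path variables above; SERVICE_URI is a placeholder for the host and port copied from the console:

# Placeholder: replace with your own Service URI (host:port) from the Aiven console
SERVICE_URI = "kafka-xxxxxxxx.aivencloud.com:12345"

producer = KafkaProducer(
    bootstrap_servers=SERVICE_URI,
    security_protocol="SSL",
    ssl_cafile=CA_PATH,
    ssl_certfile=CERT_PATH,
    ssl_keyfile=KEY_PATH,
    value_serializer=lambda v: v.encode("utf-8"),  # our function will return a JSON string
)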

Alright now we’re ready to generate some data! Here’s the code we’ll be using to create the variables for our pseudo-Ethereum transactions.

(Note: you’ll see a wider variety of response data when calling the actual Etherscan API, but this code does a good job representing the structure required.)

First part of the generate_msgs function: generate some data and store in a variable

Now that we have some data generated, let’s use the second part of the function to store it in a Python dict, convert that dict to a JSON object and return it:

Store data in a dict, convert to a JSON object and return that object
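Putting both halves together, here’s a sketch of what generate_msgs might look like. The function name comes from the captions above; the specific fields, placeholder names (uuid, createdAt) and random value ranges are illustrative assumptions rather than the exact code in the repo:

def generate_msgs():
    # Part 1: pseudo-random values shaped like an Ethereum transaction
    tx_hash = "0x" + uuid.uuid4().hex + uuid.uuid4().hex        # 64 hex chars, like a tx hash
    from_addr = "0x" + uuid.uuid4().hex + uuid.uuid4().hex[:8]  # 40 hex chars, like an address
    to_addr = "0x" + uuid.uuid4().hex + uuid.uuid4().hex[:8]
    value_wei = str(random.randint(1, 10) * 10**17)             # 0.1 to 1.0 ETH, in wei
    gas_price = str(random.randint(10, 200) * 10**9)            # 10 to 200 gwei, in wei
    block_number = str(random.randint(14_000_000, 15_000_000))

    # Part 2: store everything in a dict, add the analysis fields, return a JSON string
    event = {
        "uuid": str(uuid.uuid4()),                               # unique event identifier
        "createdAt": datetime.now(timezone.utc).isoformat(),     # ISO 8601 generation time
        "blockNumber": block_number,
        "hash": tx_hash,
        "from": from_addr,
        "to": to_addr,
        "value": value_wei,
        "gas": "21000",
        "gasPrice": gas_price,
    }
    return json.dumps(event)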

Now that the hard work is over, we just need to call that function and send the objects to our Aiven Kafka cluster! I’ll be calling the function 3 times, generating 3 distinct events that represent 3 distinct on-chain transactions:

Call the function and publish the JSON objects to Kafka
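As a sketch, reusing the producer from earlier and the ‘ethereum-transactions’ topic name we’ll browse in the console shortly (with automatic topic creation switched on, the topic is created on the first send):

# Publish three pseudo-transactions; each call produces one distinct event
for _ in range(3):
    producer.send("ethereum-transactions", value=generate_msgs())

producer.flush()  # block until everything buffered has actually been delivered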

You just published your first events to your Aiven Apache Kafka cluster! Any consumer client can subscribe to this event stream to process this data or store it in a database.

To view the messages within the cluster, navigate back to Aiven’s console, select your Kafka service and route through Topics > ‘ethereum-transactions’ > Messages > [Format: JSON] > Fetch Messages.

Congratulations! You’re officially up and running with Aiven’s fully managed Apache Kafka.

Now that we have a program producing messages, we need a consumer to store our data. Continue on to my second post, where we’ll use Aiven’s managed InfluxDB service to store our data and Aiven’s managed Grafana service to monitor the entire pipeline.
