Twitter Processing with AWS

Twitter Processing Workflow

Have you ever wondered how you might go about ingesting and analyzing tweets from Twitter? I was recently asked how I would approach this problem, and because I’m a huge nerd, I took it a step further and built a proof of concept.

The Goal

The goal of this project was to ingest tweets about the 2020 stimulus, attempt to determine the general sentiment of the tweets, and then provide a visualization of the results.

How I Did It

In this example, I use Amazon Kinesis Data Firehose, AWS Lambda, Amazon Comprehend, Amazon Elasticsearch Service, and Kibana to bring it all together.

Ingestion

Ingestion is handled by a small Python script that acts as the client to the Twitter Streaming API. Its job is to connect to the API, receive incoming tweets, and immediately send each one to a Kinesis Firehose delivery stream. In my example, I deploy the Python script as a Lambda function and trigger it with the AWS CLI. The Lambda version of the script is modified to ingest tweets for only 10 seconds before quitting. In a real-world scenario, this client would run continuously on an EC2 instance or as a Docker container in a Kubernetes cluster.
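The forwarding loop described above can be sketched as follows. This is a minimal illustration, not the script from the repository: it assumes the Twitter connection already yields decoded tweets as dictionaries, and it takes the Firehose call as an injected `send` function so the time-bounded loop (10 seconds, as in the Lambda version) can be shown and tested without AWS. The names `encode_tweet` and `forward_tweets` are hypothetical.

```python
import json
import time
from typing import Callable, Iterable


def encode_tweet(tweet: dict) -> bytes:
    """Serialize one tweet as a newline-terminated JSON record for Firehose."""
    return (json.dumps(tweet) + "\n").encode("utf-8")


def forward_tweets(
    tweets: Iterable[dict],
    send: Callable[[bytes], None],
    duration_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Send each incoming tweet to `send` until `duration_s` has elapsed.

    Returns the number of tweets forwarded. The deadline is only checked
    when a tweet arrives, so a quiet stream can run slightly past it.
    """
    deadline = clock() + duration_s
    count = 0
    for tweet in tweets:
        if clock() >= deadline:
            break
        send(encode_tweet(tweet))  # forward immediately, one record per tweet
        count += 1
    return count
```

With boto3, `send` could be `lambda data: firehose.put_record(DeliveryStreamName="tweet-stream", Record={"Data": data})`, where `firehose = boto3.client("firehose")` and `tweet-stream` is a hypothetical stream name. Newline-terminating each record keeps individual tweets separable if Firehose ever concatenates them, for example in an S3 backup.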

Analysis

Sentiment analysis is handled by Amazon Comprehend, called from a Lambda function that the Firehose invokes to transform incoming records. The Lambda returns the detected sentiment, the confidence score, and the original message to the Firehose, which then sends the record on to the next stage.
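A Firehose transformation Lambda receives a batch of base64-encoded records and must return each one with its `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and re-encoded `data`. The sketch below shows that contract around Comprehend's `detect_sentiment`. It assumes each record is a tweet JSON object with a `text` field and that tweets are in English; the output field names (`sentiment`, `confidence`, `message`) and `make_handler` are hypothetical. The Comprehend call is injected so the handler can be exercised locally.

```python
import base64
import json
from typing import Callable


def make_handler(detect_sentiment: Callable[..., dict]):
    """Build a Firehose transformation handler around a sentiment function."""

    def handler(event, context=None):
        output = []
        for record in event["records"]:
            try:
                tweet = json.loads(base64.b64decode(record["data"]))
                text = tweet.get("text", "")
                # Comprehend returns e.g. Sentiment="POSITIVE" and
                # SentimentScore={"Positive": ..., "Negative": ..., ...}
                resp = detect_sentiment(Text=text, LanguageCode="en")
                sentiment = resp["Sentiment"]
                doc = {
                    "sentiment": sentiment,
                    "confidence": resp["SentimentScore"][sentiment.capitalize()],
                    "message": text,
                }
                data = base64.b64encode((json.dumps(doc) + "\n").encode("utf-8"))
                output.append({"recordId": record["recordId"], "result": "Ok",
                               "data": data.decode("ascii")})
            except Exception:
                # Bad JSON or a Comprehend error (e.g. empty text): hand the
                # original record back to Firehose marked as failed.
                output.append({"recordId": record["recordId"],
                               "result": "ProcessingFailed", "data": record["data"]})
        return {"records": output}

    return handler
```

In the deployed function, the handler would be built once at module load, for example `handler = make_handler(boto3.client("comprehend").detect_sentiment)`. Records marked `ProcessingFailed` are not indexed; Firehose writes them to its S3 backup location instead.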

Storage

Storage is handled by a managed Elasticsearch cluster, which is the destination of the Kinesis Firehose. We configure the Firehose to write records to a given index in our chosen Elasticsearch cluster, and that index is what we use in the visualization stage.
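One way to express "this index in this cluster, with this transformation Lambda" is the argument set for boto3's `firehose.create_delivery_stream`, sketched below. This is an assumption-laden illustration rather than the project's actual setup: it assumes an Elasticsearch 7.x domain (so no `TypeName` is given), a single non-rotating index, and an S3 bucket that keeps only failed documents. Every name and ARN passed in is a hypothetical placeholder, and the buffering values are illustrative choices.

```python
def es_delivery_stream_config(
    stream_name: str,
    role_arn: str,
    domain_arn: str,
    index_name: str,
    backup_bucket_arn: str,
    transform_lambda_arn: str,
) -> dict:
    """Build keyword arguments for boto3's firehose.create_delivery_stream."""
    return {
        "DeliveryStreamName": stream_name,
        # The ingestion client calls PutRecord on the stream directly.
        "DeliveryStreamType": "DirectPut",
        "ElasticsearchDestinationConfiguration": {
            "RoleARN": role_arn,
            "DomainARN": domain_arn,
            "IndexName": index_name,
            # One fixed index keeps the Kibana index pattern simple.
            "IndexRotationPeriod": "NoRotation",
            "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 5},
            # S3 receives only documents that could not be indexed.
            "S3BackupMode": "FailedDocumentsOnly",
            "S3Configuration": {"RoleARN": role_arn, "BucketARN": backup_bucket_arn},
            # Run the sentiment Lambda on every batch before indexing.
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [{
                    "Type": "Lambda",
                    "Parameters": [{"ParameterName": "LambdaArn",
                                    "ParameterValue": transform_lambda_arn}],
                }],
            },
        },
    }
```

The result would be passed as `boto3.client("firehose").create_delivery_stream(**es_delivery_stream_config(...))`. The IAM role must allow Firehose to write to the domain and bucket and to invoke the Lambda.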

Visualization

Visualization is handled by Kibana, which comes built into the managed Elasticsearch cluster. In Kibana I created a dashboard with a pie chart, where each slice represents a sentiment, and underneath it a table showing the sentiment, confidence score, and original message for each tweet.
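Behind a pie chart split by sentiment is roughly a terms aggregation over the index; each bucket becomes a slice. The sketch below builds such a query body and converts a response into each sentiment's share of tweets. It assumes the documents carry a `sentiment` field mapped as `keyword`; under Elasticsearch's default dynamic mapping the aggregatable field would instead be `sentiment.keyword`. The function names and the `by_sentiment` aggregation name are hypothetical.

```python
def sentiment_breakdown_query(field: str = "sentiment") -> dict:
    """Search body that counts documents per sentiment value."""
    # size=0: only the aggregation buckets are needed, not the hits.
    return {"size": 0, "aggs": {"by_sentiment": {"terms": {"field": field}}}}


def slice_shares(response: dict) -> dict:
    """Turn terms-aggregation buckets into each sentiment's fraction of the total."""
    buckets = response["aggregations"]["by_sentiment"]["buckets"]
    total = sum(b["doc_count"] for b in buckets)
    if total == 0:
        return {}
    return {b["key"]: b["doc_count"] / total for b in buckets}
```

The query body would be sent to the index's `_search` endpoint. Because Comprehend only ever returns four sentiment values, the terms aggregation's default bucket limit of 10 covers all of them.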

Dashboard

Find Out More

To learn more about this solution and see the code, visit the repository on GitHub.

Matt Andes
Founder of Runic Labs

I specialize in DevOps, cloud computing, virtualization, containerization, and automation.