Analyzing Amazon ELB Logs the Most Elegant Way

In Blog, Cloud Technology by Jiju Thomas Mathew


We at Innoval manage a crowd management system for a state department during the festival season at a pilgrim center. The system is open to the public from October to January every year, and 2016-17 is the fifth consecutive season we have handled. We designed a framework of our own to deploy the system: a simple routing system that is pluggable, so we deployed plugins on top of the base system to extend its features. In due course, the system went through several functional and architectural changes to adopt and embrace the cloud as more new features were introduced.

This year, following alerts from the department, we were quite concerned about the threat of a massive security compromise, so we blocked several countries using the nginx geoip module along with blacklisting. On top of that, we kept watch for unusual activity.

As in previous seasons, we had the ELB logs written to S3. Normally, at the end of each season, we sync the S3 logs to an EC2 instance, fire up a MySQL instance, and import all the logs using a “LOAD DATA” query. We then drop the unwanted columns, dump a CSV export, bzip it, and offload it overnight to one of our in-house development systems for the SQL analysis.
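To give an idea of what the "drop unwanted columns and export CSV" step involves, here is a minimal Python sketch. It assumes the Classic Load Balancer access log layout (space-separated fields, with the request and user agent in double quotes). The sample line, the function names, and the choice of columns to keep are all illustrative and are not what we actually ran.

```python
import csv
import io

# Field order of a Classic ELB access log entry (assumed layout).
ELB_FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]

# Columns kept for analysis; this selection is hypothetical.
KEEP = ["timestamp", "client_ip", "elb_status_code", "request"]


def parse_elb_line(line):
    """Split one log line into a dict, honouring the double-quoted fields."""
    values = next(csv.reader([line], delimiter=" ", quotechar='"'))
    row = dict(zip(ELB_FIELDS, values))
    # "client" is "ip:port"; keep only the IP part.
    row["client_ip"] = row["client"].rsplit(":", 1)[0]
    return row


def to_csv(lines):
    """Return CSV text containing only the KEEP columns."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=KEEP, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for line in lines:
        if line.strip():
            writer.writerow(parse_elb_line(line))
    return out.getvalue()


# A made-up log entry using documentation-range IP addresses.
SAMPLE = (
    '2016-12-01T10:15:30.123456Z my-elb 203.0.113.7:54321 10.0.0.5:80 '
    '0.000073 0.001048 0.000057 404 404 0 315 '
    '"GET http://example.com:80/wp-login.php HTTP/1.1" "curl/7.38.0" - -'
)

if __name__ == "__main__":
    print(to_csv([SAMPLE]))
```

The resulting CSV is the kind of trimmed file that could then be compressed and shipped elsewhere for querying.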

But this year I saw the announcement of Amazon Athena, and a colleague and I thought we should try it and see what we could do. The AWS blog itself had an example. Everything was simple: create the SQL view with the pattern matching the log files. Our first query counted requests grouped by response code, and wow, 30.52 GB was scanned in 1m 48s. There was no overhead of importing, downloading, querying and so on. We just run the query and leave it there; later we can go to the Athena history and view the results. We were instantly excited and tried a couple more queries. Thanks to Amazon Athena, we could identify a single IP that had pounded our ELB instance with all sorts of junk requests, creating about 57K entries in a three-minute period, all triggering “HTTP 404”.
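The two kinds of query mentioned above can be sketched roughly as follows. The SQL is plain GROUP BY aggregation. To keep the example runnable without an AWS account, it is executed here against an in-memory SQLite table standing in for Athena. The table name `elb_logs`, the column names `request_timestamp`, `request_ip` and `elb_response_code`, and the sample rows are all assumptions made for illustration, not our actual schema or data.

```python
import sqlite3

# Query 1: requests grouped by response code.
COUNT_BY_STATUS = """
SELECT elb_response_code, COUNT(*) AS requests
FROM elb_logs
GROUP BY elb_response_code
ORDER BY requests DESC, elb_response_code
"""

# Query 2: client IPs producing the most 404s inside a time window.
# The ? markers are sqlite3 placeholders; literal timestamps could be
# written in their place.
TOP_404_CLIENTS = """
SELECT request_ip, COUNT(*) AS hits
FROM elb_logs
WHERE elb_response_code = '404'
  AND request_timestamp BETWEEN ? AND ?
GROUP BY request_ip
ORDER BY hits DESC, request_ip
LIMIT 10
"""

# Made-up rows: (request_timestamp, request_ip, elb_response_code).
SAMPLE_ROWS = [
    ("2016-12-01T10:15:30Z", "203.0.113.7", "404"),
    ("2016-12-01T10:15:31Z", "203.0.113.7", "404"),
    ("2016-12-01T10:16:02Z", "203.0.113.7", "404"),
    ("2016-12-01T10:16:05Z", "198.51.100.9", "200"),
    ("2016-12-01T10:17:40Z", "198.51.100.9", "404"),
    ("2016-12-01T10:30:00Z", "198.51.100.23", "200"),
]


def build_demo_db():
    """Create an in-memory table shaped like the assumed log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE elb_logs ("
        "request_timestamp TEXT, request_ip TEXT, elb_response_code TEXT)"
    )
    conn.executemany("INSERT INTO elb_logs VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def run(conn, sql, params=()):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(run(db, COUNT_BY_STATUS))
    window = ("2016-12-01T10:15:00Z", "2016-12-01T10:18:00Z")
    print(run(db, TOP_404_CLIENTS, window))
```

The time-window filter works on ISO 8601 strings because they sort lexicographically in chronological order; a noisy client shows up at the top of the second result.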

There was one issue with Athena: “Export Results as CSV” was showing some kind of S3 error, which we are pretty sure the AWS team will correct soon.