AWS Lambda can be used in many places. As a fully managed service, it is the best fit for lightweight, short, and simple tasks such as event triggers, S3 data filtering and migration, and, more recently, a rather interesting self-assigned task-slash-hobby: status reports on Slack. The infrastructure is also lightweight.

We Code In Style

There are multiple ways to develop on Glue; we will introduce Jupyter Notebook, as it is widely used by data scientists these days. Please note that AWS provides Jupyter Notebook in cooperation with multiple data processing services to save its users from software installation and possible limitations on local computation resources. Although AWS recommended Zeppelin for … For more information, you can find a tutorial here. Keep in mind that when you use a local Jupyter Notebook connected to a Dev endpoint to perform data computation, you save the cost of the EC2 instance that would otherwise serve Jupyter Notebook on AWS. …

It happened and it was not a smooth drive.

AWS Glue

This year we were asked to help build a data pipeline on AWS Glue. As heavy EMR users, we found Glue an unfamiliar concept. After months of struggling, digging, and thinking, here are some of our observations.

AWS Glue was first announced in late 2016 and became generally available in 2017. AWS released Glue to fill gaps left by EMR. Glue is a serverless ETL service built on top of AWS EMR.

AWS breaks the ETL service down into distinct modules that work both independently and together. Although AWS Glue is labeled an ETL service, it…

AWS Fargate has been getting more and more attention these days, as serverless has become the trend. Fargate as an API service can certainly be extended with an Application Load Balancer. Fargate can also be scaled horizontally, i.e. by adding more tasks, using preset CloudWatch alarms on the CPUUtilization and MemoryUtilization metrics. Powerful as it is, Fargate can also scale out and scale in based on the number of items in an SQS queue. The steps below assume that there is already a working Fargate service.

Configuration takes just two steps:

  1. Create an alarm on CloudWatch
  2. Update Fargate service

Create an alarm on CloudWatch
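
As a rough sketch of this first step, the alarm parameters could be assembled as below. The queue name, threshold, period, and alarm name are hypothetical values chosen for illustration, and the scaling-policy ARN passed as an alarm action is left to the caller. The namespace `AWS/SQS`, the metric `ApproximateNumberOfMessagesVisible`, and the `QueueName` dimension are what SQS publishes to CloudWatch.

```python
from typing import Any, Dict, List, Optional


def build_queue_depth_alarm(
    queue_name: str,
    threshold: int,
    alarm_actions: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    The alarm fires when the number of visible messages in the queue
    reaches the threshold. All concrete values here are illustrative
    assumptions, not settings taken from the article.
    """
    return {
        "AlarmName": f"{queue_name}-queue-depth-high",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",
        "Period": 60,  # seconds; assumed value
        "EvaluationPeriods": 1,  # assumed value
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # ARNs of the scaling policies to trigger; empty if none given
        "AlarmActions": list(alarm_actions or []),
    }


params = build_queue_depth_alarm("my-queue", 100)
```

If boto3 is installed and credentials are configured, the resulting dictionary could be passed on with `boto3.client("cloudwatch").put_metric_alarm(**params)`.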


AWS Elastic MapReduce (EMR) is a service for big data analysis. AWS groups high-performance EC2 instances into a cluster with various versions of Hadoop and Spark pre-installed. EMR charges the hourly EC2 rate plus an hourly management fee.

AWS provides premium cloud computing services for big data, with quite a selection of instance types for EMR. For general purposes, i.e. if you are not sure which instance type best fits your case, an m-class instance is the safest choice. We usually use c-class instances for…

AWS Fargate can be a useful service for data processing. It is serverless like AWS Lambda but offers more extensive functionality. In this section, I will introduce a way to set up AWS Fargate as a batch processor connecting AWS S3 and the AWS Elastic MapReduce (EMR) service.

Here, Fargate fulfills the following tasks:

  1. Read a message from a pre-defined SQS queue
  2. Move files from one S3 bucket folder (raw) to another S3 bucket folder (fargate_proc)
  3. Kick off an EMR job to count the number of records in the data file
  4. Output the result of the count from the EMR job to an S3 bucket…
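
The first two tasks in the list above can be sketched without touching AWS at all. The sketch assumes, and this is only an assumption, that each SQS message body is a standard S3 event notification. The helper names `parse_s3_event` and `destination_key` are made up for illustration.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def parse_s3_event(body: str) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification.

    Assumes the SQS message body is the JSON that S3 sends for object
    events. Object keys arrive URL-encoded, so they are decoded here.
    """
    event = json.loads(body)
    return [
        (rec["s3"]["bucket"]["name"], unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
    ]


def destination_key(key: str, src: str = "raw/", dst: str = "fargate_proc/") -> str:
    """Map a key under the raw folder to the same path under fargate_proc."""
    if not key.startswith(src):
        raise ValueError(f"key {key!r} is not under {src!r}")
    return dst + key[len(src):]


# A hand-written sample message (hypothetical bucket and file name)
sample = json.dumps(
    {"Records": [{"s3": {"bucket": {"name": "my-bucket"},
                         "object": {"key": "raw/data+file.csv"}}}]}
)
moves = [(b, k, destination_key(k)) for b, k in parse_s3_event(sample)]
```

Each `(bucket, source key, destination key)` triple could then drive an S3 copy followed by a delete, which is how a "move" is usually done on S3.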

While Docker is widely used in the software world these days, AWS has caught up with the trend by providing a container registry service that allows us to create, delete, and version designated images. AWS Elastic Container Registry (ECR) provides a repository of images for use with AWS Elastic Container Service (ECS). To work with ECR, there are a few command-line instructions to become familiar with.

Elastic Container Repository Creation

We first need to create a repository on ECR, much like creating a repository on GitHub. An ECR repository can be created using either the AWS CLI or the console.

Create repository using AWS Console

AWS uses the account ID in various places. This short read helps you find your AWS account ID and shows where it is used. In my case, I was pushing a Docker image to AWS Elastic Container Registry (ECR) when I realized I needed my AWS account ID as part of the CLI command.
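
To illustrate where the account ID shows up, here is a minimal sketch of how an ECR image URI is put together. The function name `ecr_image_uri` and the sample account ID, region, and repository name are hypothetical; the URI layout itself follows the standard ECR registry format.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Compose the image URI used when tagging and pushing to ECR.

    AWS account IDs are 12-digit numbers, so anything else is rejected.
    """
    if not (account_id.isascii() and account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account ID must be exactly 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder values, not a real account
uri = ecr_image_uri("123456789012", "us-east-1", "my-app")
```

The resulting string is what would go into `docker tag` and `docker push` when publishing an image.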

Account ID on AWS Console

Log in to the AWS Console and click on My Account.

So the story started like this. The CMS team has been developing a first-class, highly customized enterprise content management system for some time now. We have reached a point where we can boldly call our product stabilized, which allows us to explore further with the experience gained from the current product. During a casual chat with Drake one morning, it came up that engineers have always had a blind spot when it comes to viewing a product from the users’ perspective, even though eighty percent or more of CMS users are not engineers. Drake suggested holding an event much like…

Ava Chen

The ultimate sophistication is to be true to yourself.
