Using Kinesis and Lambda: A Review of the Cost of Big Data With Amazon Web Services
At SOLTECH, we’re always thinking of ways to be more efficient with our clients’ time and money. A recent topic of discussion has been Kinesis and Lambda: is it better to use both than to run several servers 24/7? We understand wanting to know what is best for your project, so in this article, let’s review the costs and benefits of big data with Amazon Web Services (AWS), specifically using Kinesis and Lambda.
Quickly becoming one of the most common approaches to processing big data, Amazon Web Services’ Kinesis and Lambda products offer a quick and customizable solution to many companies’ needs. The ability to scale both vertically and horizontally in this environment, either automatically or with a couple of clicks, is something that Big Data developers love.
Using Kinesis and Lambda
Kinesis lets your data be consumed by many different applications, all of which can work independently of one another in completely different ways if you so choose.
On the processing side of things, Lambda allows you to write serverless bits of logic, in a few different languages, that can consume either these Kinesis streams or other event sources such as Amazon’s SNS for messaging. Chances are you have already heard of these services being used by someone or some company you know, but you may be concerned about the price tag.
The Benefits of Amazon Web Services
The great thing about Amazon Web Services (AWS) is that you pay for what you use. You are never forced into contracts or agreements where you may not be utilizing every penny you’re spending. This makes Kinesis and Lambda among the more cost-effective tools.
Estimating Kinesis Costs
Kinesis is no different: you pay for the number of shards you use and for how long you use them. Shards represent throughput units; you calculate the number of shards you need from how much data you expect your Kinesis stream to handle and how many consumers of the stream you need.
If you need to increase or decrease the number of shards, you can easily do so in the AWS console. One shard supports up to 1MB/sec of input data, 2MB/sec of output data, and 1,000 records per second of writes.
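As a rough sketch of how those per-shard limits translate into a shard count, the Python function below takes the largest of the three constraints. The name shards_needed and its parameters are illustrative and not part of any AWS API. It assumes 1MB means 1,000,000 bytes and that every consumer reads the full stream.

```python
import math

# Per-shard limits quoted in the article.
SHARD_INPUT_BYTES_PER_SEC = 1_000_000   # 1MB/sec in (assumes 1MB = 1,000,000 bytes)
SHARD_OUTPUT_BYTES_PER_SEC = 2_000_000  # 2MB/sec out, shared by all consumers
SHARD_RECORDS_PER_SEC = 1_000           # records written per second


def shards_needed(records_per_sec, record_bytes, consumers=1):
    """Return the smallest shard count that satisfies all three limits."""
    input_rate = records_per_sec * record_bytes
    # Assumption: every consumer reads the entire stream.
    output_rate = input_rate * consumers
    return max(
        1,
        math.ceil(input_rate / SHARD_INPUT_BYTES_PER_SEC),
        math.ceil(output_rate / SHARD_OUTPUT_BYTES_PER_SEC),
        math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC),
    )


# Hypothetical workload: 500 records/sec of 4,000 bytes each, read by 2 consumers.
print(shards_needed(500, 4_000, consumers=2))
```

Whichever limit is hit first decides the shard count, so it is worth checking all three before settling on a number.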
The costs are $0.015 per shard hour and $0.014 per 1 million PUT payload units (each record is billed in 25KB units, with its size always rounded up to the next 25KB).
At the lowest possible configuration, 1 shard with fewer than 1 million PUT units per month, you would be charged $11.16 for the shard (assuming a month with 31 days) plus a negligible amount for the PUT units. In an actual scenario, where you may collect data from thousands of devices every few seconds, this cost will quickly increase.
A Real Example of Kinesis Costs
Let’s break this down to better understand the costs. Say you have 200 devices that in aggregate send an average of 10,000 records per second. Each record is 250 bytes, and the stream has one consumer (remember the 2MB/sec of output per shard). Because each shard accepts at most 1,000 records per second, you would need 10 shards. This leads to a cost of:
Shard Charge per month: $0.015 (shard hour) x 24 (hours) x 31 (days) x 10 (shards) = $111.60
PUT Payload Units per month: 1 25KB PUT Payload Unit (per record) x 10000 records (per second) x 60 (seconds) x 60 (minutes) x 24 (hours) x 31 (days) = 26,784 million
PUT Payload Charge per month: 26,784 (million units) x $0.014 = $374.98
This equals a total of $486.58 per month ($111.60 + $374.98) for Kinesis alone. Notice, however, that each record is only 250 bytes but is billed as a full 25KB PUT payload unit. That’s a lot of wasted space!
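The arithmetic above can be sketched in a few lines of Python. The function name kinesis_monthly_cost is illustrative, the prices are the ones quoted in this article and may not match current AWS pricing, and 25KB is assumed to mean 25,000 bytes.

```python
import math

SHARD_HOUR_PRICE = 0.015            # USD per shard hour (from the article)
PUT_UNIT_PRICE_PER_MILLION = 0.014  # USD per 1 million PUT payload units
PUT_UNIT_BYTES = 25_000             # assumes 25KB = 25,000 bytes


def kinesis_monthly_cost(shards, records_per_sec, record_bytes, days=31):
    """Return (shard_charge, put_charge, total) in USD for one month."""
    hours = 24 * days
    shard_charge = SHARD_HOUR_PRICE * hours * shards
    # Every record is billed as a whole number of 25KB units, rounded up.
    units_per_record = max(1, math.ceil(record_bytes / PUT_UNIT_BYTES))
    put_units = units_per_record * records_per_sec * 3600 * hours
    put_charge = put_units / 1_000_000 * PUT_UNIT_PRICE_PER_MILLION
    return shard_charge, put_charge, shard_charge + put_charge


shard, put, total = kinesis_monthly_cost(shards=10, records_per_sec=10_000, record_bytes=250)
print(f"shards ${shard:.2f}, PUT units ${put:.2f}, total ${total:.2f}")
```

With the inputs from this example, the three printed figures should match the $111.60, $374.98, and $486.58 worked out above.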
The trick here is to aggregate your data as much as possible.
Assume you can aggregate each device’s 50 records per second into one request. This would equate to 12.5KB per record (still a single 25KB payload unit) and reduce the number of records sent per second to 200.
With so few records per second, input throughput becomes the limiting factor: 200 records x 12.5KB is 2.5MB/sec, which at 1MB/sec per shard drops the number of shards needed from 10 to 3.
Kinesis Pricing Break Down
The new total would be as follows:
Shard Charge per month: $0.015 (shard hour) x 24 (hours) x 31 (days) x 3 (shards) = $33.48
PUT Payload Units per month: 1 25KB PUT Payload Unit (per record) x 200 records (per second) x 60 (seconds) x 60 (minutes) x 24 (hours) x 31 (days) = 535.68 million
PUT Payload Charge per month: 535.68 (million units) x $0.014 = $7.50
Your new monthly total would be just $40.98, roughly a twelfth of the original $486.58 and more than an order of magnitude lower. Now, understandably, aggregating the data in such a manner may not always be possible. However, it is well worth your time to package up your data as much as possible.
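One simple way to package records is to pack several small ones into a single payload before sending. The sketch below is only an illustration: batch_records is a hypothetical helper, the newline separator is an assumed framing that your consumer would have to split on, and 25KB is again taken as 25,000 bytes.

```python
def batch_records(records, max_bytes=25_000):
    """Greedily pack byte-string records into newline-delimited payloads.

    Each payload stays within max_bytes, except that a single record
    larger than max_bytes is emitted on its own.
    """
    batches, current, size = [], [], 0
    for record in records:
        # A separator byte is only needed when the batch already has content.
        extra = len(record) + (1 if current else 0)
        if current and size + extra > max_bytes:
            batches.append(b"\n".join(current))
            current, size = [], 0
            extra = len(record)
        current.append(record)
        size += extra
    if current:
        batches.append(b"\n".join(current))
    return batches


# Fifty 250-byte readings from one device, as in the example above.
readings = [b"x" * 250 for _ in range(50)]
payloads = batch_records(readings)
print(len(payloads), [len(p) for p in payloads])
```

This only works if the records themselves never contain the separator. A length-prefixed format would avoid that restriction at the cost of a slightly more involved consumer.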
Estimating Lambda Costs
The costs associated with Lambda depend on the amount of memory you dedicate to it, the time it takes to run (billed in increments of 100ms), and the number of invocations per month.
Generally, you want your Lambdas to run no more than a couple of seconds, but there is a tradeoff: allocating more memory (which also increases CPU power proportionally) increases the cost per 100ms of run time.
This time-based cost is measured in GB-seconds. The good news is that every AWS account gets a number of free seconds of Lambda execution time per month (how many depends on the memory you configure), plus 1 million free requests per month.
A Real Example of Lambda Costs
For example, let’s say your Lambda takes about 2 seconds on average. With our 3 shards from above, this equates to 3 Lambdas running concurrently (one per shard). The request rate is therefore 1.5 requests per second (3 shards / 2 seconds), which comes to 4,017,600 requests in a 31-day month. The price for a 256MB Lambda is $0.000000417 per 100ms, with 1,600,000 free seconds per month. Requests are charged at $0.20 per 1 million.
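The request count follows from the assumption that each shard keeps exactly one Lambda busy back to back. A minimal sketch of that calculation, with monthly_requests as an illustrative name:

```python
def monthly_requests(shards, avg_duration_sec, days=31):
    """Estimate invocations per month, assuming one busy Lambda per shard."""
    requests_per_sec = shards / avg_duration_sec
    return requests_per_sec * days * 24 * 3600


print(f"{monthly_requests(3, 2):,.0f} requests per month")
```

If your Lambdas sit idle between batches, the real number will be lower, so treat this as an upper bound.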
Lambda Pricing Break Down
The pricing break down for the charges would be as follows:
Total Number of Seconds: 4,017,600 (requests) x 2 (seconds) = 8,035,200
GB-seconds: 8,035,200 (seconds) x 256/1024 = 2,008,800
Charged Seconds: 8,035,200 – 1,600,000 (free tier seconds at 256MB, equal to 400,000 GB-seconds) = 6,435,200
Compute Charge per month: $0.000000417 x 10 (100ms units per second) x 6,435,200 (charged seconds) = $26.83
Charged Requests: 4,017,600 (requests) – 1,000,000 (free tier) = 3,017,600
Requests Charge per month: 3.0176 x $0.20 = $0.60
Total Monthly Charge: $27.43
Now, what if you needed to bump up the Lambda’s memory to reduce processing time?
Let’s say you change the memory to 512MB and that reduces the average running time to 1.5 seconds. The request rate rises to 2 requests per second (3 shards / 1.5 seconds), or 5,356,800 requests per month, and the price doubles to $0.000000834 per 100ms with 800,000 free seconds. Here are the changes in cost:
Total Number of Seconds: 5,356,800 (requests) x 1.5 (seconds) = 8,035,200
GB-seconds: 8,035,200 (seconds) x 512/1024 = 4,017,600
Charged Seconds: 8,035,200 – 800,000 (free tier seconds at 512MB, equal to 400,000 GB-seconds) = 7,235,200
Compute Charge per month: $0.000000834 x 10 (100ms units per second) x 7,235,200 (charged seconds) = $60.34
Charged Requests: 5,356,800 (requests) – 1,000,000 (free tier) = 4,356,800
Requests Charge per month: 4.3568 x $0.20 = $0.87
Total Monthly Charge: $61.21
Thus, for half a second of savings in processing time, you spend more than twice the amount. Such a decrease in time may actually be worth the extra cost, as many projects that solve Big Data problems need to report in as close to real time as possible.
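To compare memory settings quickly, the Lambda charges can be sketched as a small Python function. This is an estimate under stated assumptions, not AWS’s billing logic: lambda_monthly_cost is an illustrative name, the prices are the ones quoted in this article, and the free tier is taken as 400,000 GB-seconds (1,600,000 seconds at 256MB), converted to seconds at the configured memory size before the remaining seconds are priced. It also ignores rounding each invocation up to the next 100ms.

```python
REQUEST_PRICE_PER_MILLION = 0.20  # USD per 1 million requests (from the article)
FREE_REQUESTS = 1_000_000         # free requests per month
FREE_GB_SECONDS = 400_000         # assumption: 1,600,000 free seconds x 256/1024 GB


def lambda_monthly_cost(requests, avg_duration_sec, memory_mb, price_per_100ms):
    """Return (compute_charge, request_charge, total) in USD for one month."""
    total_seconds = requests * avg_duration_sec
    # Convert the GB-second allowance into seconds at this memory size.
    free_seconds = FREE_GB_SECONDS / (memory_mb / 1024)
    charged_seconds = max(0, total_seconds - free_seconds)
    compute = price_per_100ms * 10 * charged_seconds  # ten 100ms units per second
    charged_requests = max(0, requests - FREE_REQUESTS)
    request_charge = charged_requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    return compute, request_charge, compute + request_charge


# The two configurations discussed above: (requests, seconds, MB, USD per 100ms).
for config in [(4_017_600, 2, 256, 0.000000417), (5_356_800, 1.5, 512, 0.000000834)]:
    compute, request_charge, total = lambda_monthly_cost(*config)
    print(f"{config[2]}MB: compute ${compute:.2f}, requests ${request_charge:.2f}, total ${total:.2f}")
```

Running a few memory and duration combinations through a function like this before deploying makes the cost of a faster Lambda easy to see.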
Whatever solution you’re designing, writing, or even maintaining, it’s well worth your time to estimate your costs. Remember, minor changes can lead to a significant difference in the price tag.