I recently had to fix a Node.js Lambda function that AWS stopped abruptly before it completed processing, because it had reached the maximum memory allocated to it. On a bare-metal server you can add one more RAM module and hope for the best. In a serverless environment, however, there are hard limits: in AWS, the most you can give a Lambda function is 3,008 MB. You might assume that’s plenty, but in this case that assumption would be incorrect.
The function was not complicated, but it had to parse a CSV (comma-separated values) file and, for each record, perform a few tasks: validate it, read something from DynamoDB, then do one or two writes per record, depending on the data.
The complicated thing was that it had to wait until all rows were processed. Once the overall process was completed, it returned the result: how many of the rows were processed successfully, how many gave an error and what those errors were (validation).
The even more complicated thing was that someone wanted to process a file with 70k records, and the 3,008 MB of memory was not enough for the process above.
In order to solve this issue, we considered a few solutions:

1. Move the processing outside of Lambda. Of course, this was the first thing we thought of. In AWS it could be done with ECS (Elastic Container Service). It could work, but it would add yet one more service to maintain and know about, and it could be very expensive, depending on how much memory was actually needed.
2. Split the CSV into smaller chunks. Definitely possible, but error prone. How small should the chunks be, and how do we ensure the splitting is done in a solid, failure-proof way? The CSVs were uploaded by a third party, most synchronized nightly and probably automated. Ugly.
3. Optimize the code’s memory usage. Probably time consuming, but easily the solution that scales best if it proves to be effective.
Implementing solution #3
The code was outdated, built on Node v6, with deeply nested callbacks, albeit somewhat managed with the famous “async” library.
Step 0: Refactor
Up until recently, AWS Lambda supported only Node.js versions 6 and 8.10, so we went with 8.10, which supports Promises and native async/await, letting us get rid of those deeply nested callbacks.
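To illustrate the shape of that refactor, here is a minimal sketch with in-memory stubs standing in for the real validation and DynamoDB calls (all names here are illustrative, not the actual function’s code):

```javascript
// In-memory stand-in for DynamoDB, just for the sketch.
const db = new Map([['a', { count: 1 }]]);

function validate(record) {
  if (!record.key) throw new Error('missing key'); // a validation error
}

async function getFromDb(key) {
  return db.get(key); // stand-in for a DynamoDB read
}

async function writeToDb(key, item) {
  db.set(key, item); // stand-in for a DynamoDB write
}

// One flat async function replaces three levels of nested callbacks:
// validate -> read -> write, each step simply awaited in sequence.
async function processRecord(record) {
  validate(record);
  const item = await getFromDb(record.key);
  await writeToDb(record.key, { count: (item ? item.count : 0) + 1 });
}
```

With callbacks, the same flow would be three nested function bodies plus per-level error handling; with async/await a single try/catch around `processRecord` covers all three steps.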
The initial implementation had a pretty major issue: each record was processed individually, although it contained some data that was duplicated from other records. So, there were duplicate reads from DynamoDB. A lot of them.
A better solution was to group the records by the common criteria, then process the groups in parallel, as well as processing all the records within each group in parallel. Promise and async/await for the win! The resulting code was far smaller, easier to understand, did ~90% fewer reads from the DB and … still reached the memory limit.
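A sketch of that grouping approach might look like the following (field names like `groupId` and the `readOnce`/`processRecord` callbacks are assumptions for illustration, not the original code):

```javascript
// Group records by a shared key so the duplicated DynamoDB read
// happens once per group instead of once per record.
function groupBy(records, keyFn) {
  const groups = new Map();
  for (const record of records) {
    const key = keyFn(record);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(record);
  }
  return groups;
}

async function processAll(records, readOnce, processRecord) {
  const groups = groupBy(records, r => r.groupId);
  // All groups in parallel, and all records within each group in parallel.
  const perGroup = await Promise.all(
    [...groups.entries()].map(async ([groupId, members]) => {
      const shared = await readOnce(groupId); // one read per group
      return Promise.all(members.map(r => processRecord(r, shared)));
    })
  );
  return [].concat(...perGroup); // flatten (Node 8 has no Array.prototype.flat)
}
```

Note that this version still fans out every record at once, which is exactly why it kept hitting the memory limit.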
Here I have the result from a demo repo I set up to test this (processing 100 groups with 1000 records each):
$ node index.js
Memory used before processing all records: 9.17 MB
Memory used after processing all records: 92.79 MB
Process time: 3352.570ms
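The before/after numbers above can be produced with Node’s built-in `process.memoryUsage()`; a sketch along these lines (not necessarily the demo repo’s exact code):

```javascript
// process.memoryUsage().heapUsed reports the bytes currently on the V8 heap.
function usedMb() {
  return (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(2);
}

console.log(`Memory used before processing all records: ${usedMb()} MB`);
console.time('Process time');
// ... process all the records here ...
console.timeEnd('Process time');
console.log(`Memory used after processing all records: ${usedMb()} MB`);
```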
After investigating what could possibly be eating up all the juicy RAM, it turns out that Promise is not particularly memory friendly. Bluebird was suggested, so let’s try it.
$ npm i bluebird
const Promise = require('bluebird');
Easy fix. Memory dropped. By ~30%. But the function still timed out for the big files. Not good.
Here’s the test output:
$ node index.js
Memory used before processing all records: 9.3 MB
Memory used after processing all records: 67.32 MB
Process time: 3169.421ms
It turns out that waiting for all the promises to complete means keeping all of those promises, and their pending work, in memory at once. Go figure…
So, we needed to reduce the number of requests made in parallel. Bluebird to the rescue again, with Promise.map: its concurrency option sets how many items may be processed concurrently at any given time.
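With Bluebird this is a one-liner, e.g. `Promise.map(records, processRecord, { concurrency: 100 })` (the value 100 is illustrative; it needs tuning per workload). To make the mechanism concrete without the dependency, here is a minimal hand-rolled equivalent of that concurrency option, as a sketch:

```javascript
// Minimal stand-in for Bluebird's Promise.map with { concurrency }:
// at most `limit` mapper calls are in flight at any moment.
async function mapWithConcurrency(items, mapper, limit) {
  const results = new Array(items.length);
  let next = 0;
  // Start `limit` workers; each repeatedly claims the next unprocessed index.
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++; // safe: JS is single-threaded, no await before this
        results[i] = await mapper(items[i], i);
      }
    }
  );
  await Promise.all(workers);
  return results;
}
```

Because only `limit` promises exist at a time, finished batches become unreachable and can be garbage collected instead of piling up on the heap.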
And the final test output:
$ node index.js
Memory used before processing all records: 9.29 MB
Memory used after processing all records: 17.34 MB
Process time: 30132.855ms
What’s even better is that with this approach the memory peak is stable. It does not increase with the number of items to process, because after each batch of records is processed, the garbage collector kicks in.
Granted, this did increase the total time it takes to process the entire set, but for this particular scenario we’re only interested in reasonable and predictable memory usage.
The real-world code uses ~400 MB of memory and processes 10k records in about 30 seconds. We deemed that acceptable as opposed to running out of memory after 5 minutes.