Nov 28, 2025
Process large files at scale with AWS Lambda

Background
Using serverless technologies like AWS Lambda to process large files can be challenging. Developers commonly encounter Runtime.OutOfMemory errors or face unexpectedly high costs when working with large-scale file processing.
Over the years, Virtuability has helped customers develop serverless solutions that can process millions of files of almost any size.
This blog post demonstrates how to process large documents at scale using Lambda, leveraging streaming and multithreading to reduce memory consumption and execution time.
Understanding multithreading
Multithreading is a programming technique that allows multiple threads of execution to run concurrently within a single process. In the context of file processing, this means multiple files can be handled simultaneously rather than waiting for each file to complete before starting the next one.
When processing files in AWS Lambda, multithreading offers several advantages:
- I/O Bound Operations: File operations (reading from S3, writing to S3, network calls) are I/O bound, meaning the CPU spends most of its time waiting for data transfer. During this wait time, other threads can perform their work.
- Better Resource Utilisation: Modern Lambda execution environments have multiple vCPUs available (depending on memory allocation). Multithreading leverages these resources effectively.
- Reduced Execution Time: Processing multiple files in parallel significantly reduces the total execution time, which directly translates to cost savings in Lambda.
Multithreading works best when operations are I/O bound rather than CPU bound. File processing from S3 is an ideal use case because most of the time is spent transferring data over the network.
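The effect is easy to demonstrate with Python's `concurrent.futures` module. In the sketch below, a hypothetical `process_file` helper stands in for an S3 transfer, with `time.sleep` simulating the network wait; with real boto3 calls the pattern is identical:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_file(key: str) -> str:
    """Simulate an I/O-bound file transfer (e.g. an S3 download/upload)."""
    time.sleep(0.2)  # stand-in for network wait; the GIL is released here
    return f"processed {key}"

keys = [f"file-{i}.dat" for i in range(10)]

# Sequential baseline: each file waits for the previous one
start = time.perf_counter()
sequential = [process_file(k) for k in keys]
sequential_time = time.perf_counter() - start

# Parallel: a pool of worker threads overlaps the waiting periods
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    parallel = list(pool.map(process_file, keys))
parallel_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

With ten simulated 0.2-second transfers, the sequential loop takes about 2 seconds while the thread pool finishes in roughly the time of a single transfer.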
Python is single-threaded
Python's reference implementation, CPython, has a characteristic called the Global Interpreter Lock (GIL), which means that only one thread can execute Python bytecode at a time, even on multi-core processors. This might seem to defeat the purpose of multithreading, but there's an important distinction to understand.
The GIL and I/O Operations
While the GIL prevents true parallel execution of Python code, it is released during I/O operations. This is crucial for our use case because:
- When a thread is waiting for network I/O (like downloading from S3), the GIL is released
- Other threads can execute during this waiting period
- For I/O-bound tasks like file processing, Python’s threading still provides significant performance improvements
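The GIL release during blocking calls can be seen with a few lines of stdlib code. Here `time.sleep` models a blocking I/O call (a network read behaves the same way):

```python
import threading
import time

def fake_download():
    # CPython releases the GIL while a thread is blocked in sleep/I/O,
    # so all four "downloads" overlap instead of running back to back.
    time.sleep(0.3)

start = time.perf_counter()
threads = [threading.Thread(target=fake_download) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.3s waits complete in roughly 0.3s total, not 1.2s
print(f"{elapsed:.2f}s")
```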
Threading vs Multiprocessing
For our file processing scenario, we use Python’s threading module rather than multiprocessing because:
- I/O Bound Work: Since we’re primarily moving data to/from S3, the GIL isn’t a bottleneck
- Lower Memory Overhead: Threads share the same memory space, while multiprocessing creates separate processes with duplicated memory
- Simpler Communication: Threads can share data structures easily without complex inter-process communication
If the workload were CPU-intensive (like image processing or data transformation), multiprocessing would be the better choice. However, for streaming files from S3 to another destination, threading provides the optimal balance of performance and resource efficiency in Lambda’s execution environment.
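A minimal sketch of the streaming pattern, copying data in fixed-size chunks so memory use stays flat regardless of file size. The streams are duck-typed so the snippet runs without AWS credentials; in real code the source would be a boto3 `get_object` response body and the destination an upload stream, and `stream_copy`/`copy_many` are illustrative names, not a library API:

```python
import io
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB chunks keep memory usage flat

def stream_copy(src, dst, chunk_size=CHUNK_SIZE):
    """Copy a readable stream to a writable one chunk by chunk,
    never holding the whole file in memory."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
    return copied

def copy_many(pairs, max_workers=10):
    """Run several stream copies on parallel threads (I/O-bound work)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: stream_copy(*p), pairs))

# Demo with in-memory streams; with boto3, src would be
# s3.get_object(...)["Body"] and dst a multipart upload writer.
pairs = [(io.BytesIO(b"x" * 1000), io.BytesIO()) for _ in range(3)]
sizes = copy_many(pairs)
print(sizes)
```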
Performance comparison
In this repository, you’ll find code examples for three Lambda implementations, each managing 10 identical files of 100MB each:
- Sequential Lambda: Processes each file sequentially
- Parallel Lambda: Uses multithreading to process files in parallel
- Parallel Multipart Lambda: Uses multithreading and multipart uploads
The deployment instructions are in the README file. Once deployed, performance can be measured using CloudWatch and AWS X-Ray.
The sequential Lambda takes approximately 21 seconds to process all 10 files, averaging 2 seconds per file. This execution time becomes problematic when processing millions of files at scale.
The parallel and multipart Lambdas take about 13 and 12 seconds respectively, roughly 40% less than the sequential implementation.
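The multipart variant can be sketched as below. A fake client mirrors the call shape of the real S3 multipart API (`create_multipart_upload`, `upload_part`, `complete_multipart_upload`) so the snippet runs locally; in production you would pass `boto3.client("s3")` instead, and `multipart_upload` is an illustrative name:

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 5 * 1024 * 1024  # S3 requires parts of at least 5 MiB (except the last)

def multipart_upload(s3, bucket, key, body: bytes, part_size=PART_SIZE):
    """Upload `body` in parts, pushing the parts concurrently from threads."""
    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = upload["UploadId"]

    def send_part(args):
        number, data = args
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=number, Body=data)
        return {"PartNumber": number, "ETag": resp["ETag"]}

    chunks = [(i + 1, body[i * part_size:(i + 1) * part_size])
              for i in range((len(body) + part_size - 1) // part_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        parts = list(pool.map(send_part, chunks))  # order is preserved

    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})
    return len(parts)

class FakeS3:
    """Stand-in with the same call shape as boto3's S3 client."""
    def __init__(self):
        self.parts = {}
    def create_multipart_upload(self, Bucket, Key):
        return {"UploadId": "demo-upload"}
    def upload_part(self, Bucket, Key, UploadId, PartNumber, Body):
        self.parts[PartNumber] = Body
        return {"ETag": f'"etag-{PartNumber}"'}
    def complete_multipart_upload(self, Bucket, Key, UploadId, MultipartUpload):
        return {"Key": Key}

s3 = FakeS3()
# A 12 MiB object splits into 5 MiB + 5 MiB + 2 MiB parts
n = multipart_upload(s3, "my-bucket", "big.bin", b"a" * (12 * 1024 * 1024))
print(n)
```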

Conclusion
Processing large files in AWS Lambda doesn’t have to result in memory errors or excessive costs. By leveraging multithreading and streaming techniques, we can achieve:
- Over 40% reduction in execution time: As demonstrated, parallel processing cut the execution time from 21 seconds to around 12 seconds
- Better memory efficiency: Streaming files instead of loading them entirely into memory prevents Runtime.OutOfMemory errors
- Cost optimisation: Faster execution times directly translate to lower Lambda costs, especially when processing millions of files
The key takeaways for developers working with large files in Lambda:
- Use streaming to avoid loading entire files into memory
- Leverage multithreading for I/O-bound operations like S3 transfers
- Consider multipart uploads for very large files to further improve performance
- Always monitor with CloudWatch and X-Ray to identify bottlenecks
While Python’s GIL might seem like a limitation, it doesn’t prevent us from achieving significant performance gains in I/O-bound serverless workloads. The techniques shown here can be applied to various scenarios where Lambda needs to process large volumes of data efficiently.
At Virtuability, we’ve helped numerous customers implement these patterns to build scalable, cost-effective serverless solutions. If you’re facing challenges with file processing at scale, these techniques provide a solid foundation for optimization.