Python Tutorial

How to Read Large Text Files in Python

Memory-efficient methods to read large files in Python: file iterator, generators, chunked reading, streaming, and measuring memory usage.

Drake Nguyen

Founder · System Architect

3 min read

Efficient ways to read large files in Python

When you need to read large files, loading the entire file into memory (for example with read() or readlines()) can cause high memory usage or even crashes. Below are memory-efficient patterns for reading large text files in Python, streaming file contents, and measuring resource usage while processing big files.

Read line by line using the file iterator

The simplest and most common approach is to treat the file object as an iterator. Python yields one line at a time, which is ideal for streaming logs or large CSVs without buffering the whole file in memory.

file_name = 'large_file.txt'

with open(file_name, 'r', encoding='utf-8') as f:
    line_count = 0
    for line in f:            # the file object yields one line at a time
        # process each line here
        line_count += 1

print(f"Processed {line_count} lines")

Use a generator for on-demand line processing

If you want a reusable iterator that does extra work before yielding lines (filtering, decoding, or batching), wrap the logic in a generator. The generator produces cleaned lines on demand, so nothing is accumulated in memory.

def line_generator(path):
    with open(path, 'r', encoding='utf-8') as fh:
        for line in fh:
            yield line.strip()   # yield one cleaned line at a time

for ln in line_generator('large_file.txt'):
    # handle ln without storing all lines
    pass
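The batching mentioned above can be built the same way: a generator that groups lines into fixed-size lists with itertools.islice, so downstream code processes small batches instead of single lines. A minimal sketch (batch_lines, sample.txt, and the batch size are illustrative, not from the original):

```python
from itertools import islice

def batch_lines(path, batch_size=100):
    """Yield lists of up to batch_size stripped lines at a time."""
    with open(path, 'r', encoding='utf-8') as fh:
        while True:
            # islice pulls at most batch_size lines from the file iterator
            batch = [line.strip() for line in islice(fh, batch_size)]
            if not batch:
                break
            yield batch

# demo: write a small sample file, then read it back in batches
with open('sample.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(f'line {i}' for i in range(10)))

batches = list(batch_lines('sample.txt', batch_size=4))
print([len(b) for b in batches])  # → [4, 4, 2]
```

Because islice consumes from the same open file iterator, each batch continues where the previous one stopped, and at most one batch is ever held in memory.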

Read in chunks when a file has very long lines

When a file contains very large single lines or binary data, read fixed-size chunks to avoid high peak memory usage. Adjust chunk_size to balance I/O overhead and memory usage.

def read_in_chunks(file_object, chunk_size=1024):
    while True:
        chunk = file_object.read(chunk_size)   # read at most chunk_size bytes
        if not chunk:
            break
        yield chunk

with open('big_blob.dat', 'rb') as bf:   # use 'rb' for binary files
    for part in read_in_chunks(bf, chunk_size=4096):
        # process or stream the chunk
        pass
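An equivalent, more compact idiom uses the two-argument form of iter() with a sentinel: functools.partial binds the read size, and iteration stops when read() returns an empty bytes object. A sketch of the same chunked loop (demo.bin is a small stand-in file created for illustration):

```python
from functools import partial

# create a small binary demo file (stand-in for big_blob.dat)
with open('demo.bin', 'wb') as f:
    f.write(b'x' * 10000)

total = 0
with open('demo.bin', 'rb') as bf:
    # iter(callable, sentinel) calls bf.read(4096) until it returns b''
    for chunk in iter(partial(bf.read, 4096), b''):
        total += len(chunk)

print(total)  # → 10000
```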

Copy or stream files without loading into memory

To copy or stream a large text file (or to transform while copying), iterate and write line by line or chunk by chunk. This avoids building large intermediate structures.

with open('source.txt', 'r', encoding='utf-8') as src, \
     open('dest.txt', 'w', encoding='utf-8') as dst:
    for line in src:
        dst.write(line)   # each line is written immediately; memory stays flat
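When no per-line transformation is needed, the standard library's shutil.copyfileobj streams fixed-size chunks between two open file objects, so only one buffer is held in memory at a time. A small self-contained sketch (src_demo.txt and dst_demo.txt are illustrative file names):

```python
import os
import shutil

# write a small stand-in for a large source file
with open('src_demo.txt', 'w', encoding='utf-8') as f:
    f.write('hello world\n' * 1000)

with open('src_demo.txt', 'rb') as src, open('dst_demo.txt', 'wb') as dst:
    # copies in fixed-size chunks (64 KiB here), never the whole file
    shutil.copyfileobj(src, dst, length=64 * 1024)

print(os.path.getsize('dst_demo.txt'))  # → 12000
```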

Measure file size and monitor memory usage

Before processing, you can check file size and measure peak memory to evaluate options for buffering. Use os.stat() to get the file size and resource.getrusage() to inspect ru_maxrss on POSIX systems.

import os
import resource

path = 'large_file.txt'
size_bytes = os.stat(path).st_size
print(f"File size: {size_bytes / (1024 * 1024):.2f} MB")

# run your reading routine here (example: iterate the file)

print('Peak memory:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)  # KB on Linux, bytes on macOS
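The resource module is not available on Windows, and the units of ru_maxrss differ by platform (kilobytes on Linux, bytes on macOS). A cross-platform alternative is the standard library's tracemalloc module, which tracks memory allocated by Python objects directly; a minimal sketch with a stand-in workload:

```python
import tracemalloc

tracemalloc.start()

# run the reading routine you want to profile; here, a stand-in workload
data = [str(i) for i in range(100_000)]

# get_traced_memory() returns (current, peak) in bytes on every platform
current, peak = tracemalloc.get_traced_memory()
print(f"Peak traced memory: {peak / (1024 * 1024):.2f} MB")
tracemalloc.stop()
```

Note that tracemalloc measures only allocations made by the Python allocator, not total process memory, which makes it well suited for comparing reading strategies against each other.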

Best practices and tips

  • Avoid readlines() for very large files; it builds a list of every line and can exhaust memory.
  • Prefer the file object iterator or a generator for most text-processing tasks.
  • Choose an appropriate chunk_size when streaming binary files or files with huge single lines; 1024 or 4096 bytes are common starting points.
  • Use the with open(...) context manager so files are closed automatically, even on errors.
  • For CSVs or other structured files, use libraries that support streaming parsing to reduce peak memory (e.g., csv.reader driven by the file iterator).
  • Profile peak memory (for example with the resource module) if you need to compare methods like readlines() vs. readline() vs. the iterator.
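The CSV point above can be made concrete: csv.reader wraps the file iterator, so rows stream one at a time, and a generator expression filters them lazily without building any intermediate list. A minimal sketch (events.csv and its column layout are illustrative):

```python
import csv

# write a small demo CSV (stand-in for a huge file)
with open('events.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['event', 'count'])
    writer.writerows([['click', '3'], ['view', '10'], ['click', '7']])

with open('events.csv', 'r', encoding='utf-8', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)          # consume the header row
    # rows are parsed and filtered lazily, one at a time
    clicks = sum(int(count) for event, count in reader if event == 'click')

print(clicks)  # → 10
```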

Tip: For very large log or data files, streaming and processing each unit (line or chunk) immediately keeps your Python process memory-efficient and scalable.

When to choose each approach

  • Use the file iterator for normal text files with well-defined newlines.
  • Use chunked reading for files with extremely long single lines or for binary streams.
  • Use generators when you want composable, lazy pipelines that process records without materializing them.

Following these patterns will help you read large files in Python reliably and with minimal memory footprint—whether you are streaming logs, processing huge CSVs line by line, or copying large binary blobs.
