Python Forum
If I open a file in write or append mode, is the file loaded into RAM?
#1
Recently there was a thread about opening enormous files, > 300GB and processing them line by line. People often want to do this.

My question is: when we open a file in write mode or append mode, is the whole file loaded into memory, or do we just get a pointer to the last line of the file?

If I do this, the whole file will be loaded:

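import sys

test_file = '/home/peterr/temp/bits&bytes.txt'
with open(test_file, 'r') as infile:
    # size of the file object itself, not of the file on disk
    print(f'infile size = {sys.getsizeof(infile)}')
    # read() pulls the entire file into one string
    data = infile.read()
    print(f'data size = {sys.getsizeof(data)}')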
If you read the file through a generator, you get one line at a time, and the generator object itself uses hardly any memory: 232 bytes.

I think, if you open a file in append mode, what you actually get is a pointer to the last line of the file. I think you don't load the whole file into memory.

Not many people have northwards of 350GB RAM in their laptop. You may not even have 350GB of free drive space. It could be that the file has only 1 big line. I know the file can be loaded in chunks and we can process the chunks.
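
For the "one big line" case, chunked reading could look roughly like the sketch below. The function name read_in_chunks, the 64 KiB chunk size, and the small temporary demo file are my own assumptions for illustration; the point is only that read(size) never returns more than size bytes, so the line length no longer matters.

```python
import tempfile
from pathlib import Path


def read_in_chunks(path, chunk_size=64 * 1024):
    """Yield successive blocks of at most chunk_size bytes from path."""
    with open(path, "rb") as infile:
        while True:
            chunk = infile.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            yield chunk


# Demo on a small temporary file that is one long line without a newline.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "one_big_line.txt"
    demo_file.write_bytes(b"x" * 200_000)

    sizes = [len(chunk) for chunk in read_in_chunks(demo_file)]

print(f"{len(sizes)} chunks, largest = {max(sizes)} bytes")
```

Only one chunk is alive at a time, so memory use is bounded by chunk_size rather than by the file or line size.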

I tried the following and neither infile nor outfile is very big, just 216 bytes.

import sys

test_file = '/home/peterr/temp/bits&bytes.txt' 
# open a file append and add lines, 100 at a time
for j in range(5):
    # outfile size is always 216
    with open(test_file, 'a') as outfile:
        print(f'outfile size = {sys.getsizeof(outfile)}')
        for i in range(1,101):
            outfile.write(f'Line {i}: Hello Binary World!\n')
            print(f'outfile size = {sys.getsizeof(outfile)}')
    # infile size is always 216
    with open(test_file, 'r') as infile:
        print(f'infile size = {sys.getsizeof(infile)}')
        # data size obviously gets bigger each time around
        data = infile.read()    
        print(f'data size = {sys.getsizeof(data)}')
So I think, if I did this, I would never overload RAM. Assuming I have enough drive space to accommodate another 350GB file, will this work without overloading RAM? Later on delete the original file if it is no longer needed.

def gen_loader(test_file):
    with open(test_file, 'r') as infile:
        for line in infile:
            yield line

test_file = '/home/peterr/temp/bits&bytes.txt'  # big daddy of a file
result_file = '/home/peterr/temp/result_file.txt' # to begin with result_file does not exist
with open(result_file, 'a') as res:          
    for line in gen_loader(test_file):
        # process line somehow
        newline = 'XYZ' + line
        res.write(newline)
    print(f'res size = {sys.getsizeof(res)}') # 216
Have I understood the technicalities of file input and output correctly?
#2
(Jan-11-2026, 12:40 AM)Pedroski55 Wrote: My question is: when we open a file in write mode or append mode, is the whole file loaded into memory, or do we just get a pointer to the last line of the file?

Simply open()ing a file doesn't load any part of the data of the file into memory, whether it's opened in read, write, or append mode. You have to somehow cause a read() before the data is read.

If you don't read the whole thing into memory, then it's not stored in memory. You could iterate over the file line-by-line with a generator or several kinds of loops and the total size of the file doesn't matter to RAM.

Likewise, if you're just writing line-by-line in a loop (and the lines are reasonably sized), the whole file is not stored in RAM. Constantly closing and re-opening the file in append mode doesn't help.
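
If you want to see the difference yourself without a 300GB file, a rough sketch with the standard tracemalloc module is below. The helper names (peak_bytes, copy_line_by_line, copy_all_at_once) and the test file of a few MB are assumptions made up for this example; tracemalloc only counts memory allocated by Python, which is enough to show the trend.

```python
import tempfile
import tracemalloc
from pathlib import Path


def peak_bytes(func, *args):
    """Run func(*args) and return the peak memory Python allocated meanwhile."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def copy_line_by_line(src, dst):
    # Only one line (plus the I/O buffers) is held at any moment.
    with open(src, "r") as infile, open(dst, "w") as outfile:
        for line in infile:
            outfile.write("XYZ" + line)


def copy_all_at_once(src, dst):
    # read() turns the whole file into a single string first.
    with open(src, "r") as infile, open(dst, "w") as outfile:
        data = infile.read()
        outfile.write(data)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    with open(src, "w") as f:
        for i in range(200_000):
            f.write(f"Line {i}: Hello Binary World!\n")

    file_size = src.stat().st_size
    peak_stream = peak_bytes(copy_line_by_line, src, Path(tmp) / "a.txt")
    peak_whole = peak_bytes(copy_all_at_once, src, Path(tmp) / "b.txt")

print(f"file size           : {file_size:>12,} bytes")
print(f"peak, line by line  : {peak_stream:>12,} bytes")
print(f"peak, read() at once: {peak_whole:>12,} bytes")
```

The line-by-line peak should stay roughly constant if you make the input bigger, while the read() peak grows with the file.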

(Jan-11-2026, 12:40 AM)Pedroski55 Wrote: I tried the following and neither infile nor outfile is very big, just 216 bytes.

infile and outfile are simply filehandle objects. They will not increase in size. You can't estimate the amount of RAM that is being used by looking at those variables.

If instead you did:
infile = open(file)
data = infile.read()
Then infile will always be the same size, but data could be huge if the file is large.
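
A small sketch of that point, using a temporary file of about 1 MB that I made up for the demo: sys.getsizeof() reports the size of the Python object you pass it, so the file object stays the same size while the string returned by read() is as big as the content.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text("Hello Binary World!\n" * 50_000)  # 1,000,000 characters

    with open(path, "r") as infile:
        handle_before = sys.getsizeof(infile)  # the file object only
        data = infile.read()                   # now the content is in RAM
        handle_after = sys.getsizeof(infile)   # still just the file object

print(f"file object before read: {handle_before} bytes")
print(f"file object after read : {handle_after} bytes")
print(f"data string            : {sys.getsizeof(data):,} bytes")
```

The exact number for the file object depends on the Python version and platform, so treat 216 as one possible value, not a constant.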
#3
def gen_loader(test_file):
    with open(test_file, 'r') as infile:
        for line in infile:
            yield line
The generator function only needs RAM for one line per iteration.
My first thought with something like this is always how it could be exploited.

To provoke such a problem, you can create a 100 GiB file that is empty. I haven't tested this on Windows. On Linux the disk space is not actually used (the file is sparse). That is, no bytes are written, but when you read the file you get nothing but null bytes. Because such a file contains no newline, iterating over it line by line tries to read the whole file as a single line.

I once wrote a test program and used ulimit to restrict Python to 4 GiB of RAM.
This is the result:
Error:
[deadeye@nexus ~]$ ulimit -Sv 4194304 ; python mm.py
empty.bin created
empty.bin deleted
Traceback (most recent call last):
  File "/home/deadeye/mm.py", line 34, in <module>
    for line in line_printer():
                ~~~~~~~~~~~~^^
  File "/home/deadeye/mm.py", line 29, in line_printer
    for line in fd:  # <- MemoryError is raised here
                ^^
MemoryError
Code:
import os

from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def create_file() -> Iterator[Path]:
    file = Path("empty.bin")
    file.unlink(missing_ok=True)
    file.touch()

    # 100 GiB of null bytes (a sparse file)
    # only tested on Linux
    os.truncate(file, 100 * 1024**3)
    print(f"{file} created")

    try:
        yield file
    finally:
        file.unlink()
        print(f"{file} deleted")


def line_printer():
    with create_file() as file:
        with file.open("rb") as fd:
            for line in fd:  # <- MemoryError is raised here
                yield line


if __name__ == "__main__":
    for line in line_printer():
        pass
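
One possible way to defend against this, as a minimal sketch: pass a size limit to readline() so that no single piece can exceed it. The name bounded_lines, the 1 MiB limit, and the 10 MiB sparse test file are assumptions chosen so the demo runs quickly; the os.truncate trick is the same as in the program above.

```python
import os
import tempfile
from pathlib import Path


def bounded_lines(path, limit=1024 * 1024):
    """Yield lines from path, but never more than limit bytes at a time.

    A line longer than limit arrives in several pieces; only the last
    piece of such a line ends with a newline.
    """
    with open(path, "rb") as fd:
        while True:
            piece = fd.readline(limit)
            if not piece:  # empty bytes object: end of file
                break
            yield piece


with tempfile.TemporaryDirectory() as tmp:
    sparse = Path(tmp) / "empty.bin"
    sparse.touch()
    os.truncate(sparse, 10 * 1024 * 1024)  # 10 MiB of null bytes, no newline

    pieces = 0
    total = 0
    largest = 0
    for piece in bounded_lines(sparse):
        pieces += 1
        total += len(piece)
        largest = max(largest, len(piece))

print(f"{pieces} pieces, {total} bytes in total, largest piece {largest} bytes")
```

The caller then has to cope with lines that arrive in several pieces, which is the price for the bounded memory use.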
#4
Hi,

open(filename) does exactly what the function name implies: it opens the file and returns a file object. See the Python documentation on the open function. No file content is read at this point.

The file object offers various methods for working with the file, such as reading from it. file_object.read() and file_object.readlines() read the complete file into memory; file_object.readline() reads a single line, that is, it reads until a newline character is reached; for line in file_object: iterates over the file line by line; file_object.read(size) reads a portion of the file defined by size. According to Python's documentation, trying to read files twice as big as the available memory can cause problems (for whatever reason - feel free to investigate yourself :-) ). For details and more methods of file objects, see Python's documentation.
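
A short sketch of these methods side by side, on a three-line temporary file invented for the demo:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"
    path.write_text("one\ntwo\nthree\n")

    with open(path) as f:
        whole = f.read()            # entire file as one string

    with open(path) as f:
        all_lines = f.readlines()   # entire file as a list of lines

    with open(path) as f:
        first = f.readline()        # just the first line, up to "\n"
        part = f.read(2)            # the next 2 characters

    with open(path) as f:
        iterated = [line for line in f]  # one line per iteration

print(repr(whole))
print(all_lines)
print(repr(first), repr(part))
print(iterated)
```

Only read() without an argument and readlines() scale with the file size; the others are bounded by the line length or by the size you pass.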

Regards, noisefloor
#5
Thanks for the tips!

Quote:The generator function only needs RAM for one line per iteration.
My first thought with something like this is always how it could be exploited.

Yes, exactly. What is needed is a kind of reverse generator that writes the processed lines one by one to the output file. I think that is what append does.

The generator is not a memory problem, as long as the lines are not enormously long. We could use readline() in the generator.

Advice from linuxquestions.org:
Quote:We used to open text file in append mode in C language like this:

Quote:file_ptr = fopen("example.txt", "a");

Quote:Opening a file in append mode will not load the whole file into memory.

What I am wondering is, how much memory will appending a line to a file use? As I understand it, appending does not load the entire file into memory. I think it just adds a line at the end of the file. If that is correct, then reading a very big file line by line and saving that line to another file will never overtax the computer's RAM. That could fill up the computer's hard drive, if the files are >300GB as recently mentioned, but that will not use much RAM. So don't try that unless you have a lot of spare hard drive! (I think we should keep the old file, at least until we are happy with the outcome.)

But I'm thinking, if we read a big file with a generator by lines, process each line, then append that line to an output file, then close the output file, at least RAM will never be overtaxed.

However, I don't know anything about the technical side of computer memory and write() operations! And I don't have 300GB spare to try this!
#6
(Jan-12-2026, 12:43 AM)Pedroski55 Wrote: As I understand it, appending does not load the entire file into memory.

You're correct. Neither does writing a file (whether in append mode or not).

What eats up the memory is storing the contents of the entire file (either accidentally or on purpose).
#7
@bowlofred

If I open the file mode='w', a new file is created. Continuous writing will eventually overload RAM, because the file is still open. That is not good here. So open mode='a'.

What I don't know is: if I open a file mode='a', I believe I get a pointer to the end of the file. (This is CPython)

Now I imagine what happens when I call:

file.write(text)
At the end of the file is an EOF character, whatever that looks like. The operating system somehow writes my text to the end of the file and then a new EOF marker. The actual contents of the file are never loaded into RAM, as I understand it.

Is that what happens?

If that is the case, then dealing with any size file by opening mode='a' will never overload RAM. Maybe it will overload your hard drive!
#8
Hi,

Your guess about what a and w do is wrong. Again, it's all written in the Python documentation: w opens a file for writing. If the file had any content before, it is deleted and the new content is written. a appends new content to the end of the file. If the file does not exist (and only then), a new file is created. For a, Python sets a pointer to the end of the file so it knows where to start writing. Python's file objects have a tell method that returns the current position in the file, and seek jumps to a position in the file. These methods are hardly used in "normal" programming unless you need to do a deep dive into the weeds of low-level operations on file content.
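
To make the difference between w and a concrete, here is a small sketch on a temporary file (the file name and the byte strings are made up). It uses binary mode so that tell() is a plain byte offset, and the position reported right after opening in append mode is what I would expect from CPython.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "modes.bin"

    with open(path, "wb") as f:      # "w" creates the file (or empties it)
        f.write(b"12345")

    with open(path, "ab") as f:      # "a" keeps the existing content
        start = f.tell()             # position right after opening
        f.write(b"678")
        end = f.tell()

    size_after_append = os.path.getsize(path)

    with open(path, "rb") as f:
        f.seek(3)                    # jump to byte offset 3
        tail = f.read()              # read from there to the end

    with open(path, "wb"):           # reopening with "w" truncates to 0 bytes
        pass
    size_after_w = os.path.getsize(path)

print(start, end, size_after_append, tail, size_after_w)
```

Neither mode reads the existing content: a only needs the end position, and w simply discards what was there.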

When the content is actually written to the file is probably more of an implementation detail and, in addition to that, depends on the operating system and possibly the file system. Linux, for example, uses a kernel-side write buffer (which can of course be disabled), so even when a program thinks the content is written, it may still be in Linux's write buffer, scheduled to be flushed to disk at a later point. As far as I know, Windows does not use a write buffer for removable drives by default (which is one of the reasons why it is fairly safe to pull a USB pen drive from a Windows machine at any time when it is not being actively accessed, while doing so may result in data loss on Linux if the write buffer wasn't flushed by the kernel, by unmounting the device, or by flushing manually from the command line).
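
From the Python side there are two layers you can control, sketched below on a temporary file (file name invented for the demo): flush() empties Python's own buffer into the operating system, and os.fsync() asks the operating system to push its buffered data to the storage device. The sketch assumes the default buffering for a regular file.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "buffered.txt"

    with open(path, "w") as f:
        f.write("hello")                          # still in Python's buffer
        size_before_flush = os.path.getsize(path)

        f.flush()                                 # hand the data to the OS
        size_after_flush = os.path.getsize(path)

        os.fsync(f.fileno())                      # ask the OS to write it out

print(f"size before flush(): {size_before_flush} bytes")
print(f"size after flush() : {size_after_flush} bytes")
```

Closing the file (or leaving the with block) flushes Python's buffer automatically, so explicit calls are mostly needed when another process must see the data while the file is still open.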

Regards, noisefloor
#9
Try using the buffering argument in open.
with open(test_file, 'r', buffering=X) as infile:
Set buffering to 1 for line buffering (text mode only). Set buffering to 0 for unbuffered mode (binary mode only). buffering defaults to -1, which picks a buffering policy based on the file type and environment. What is your environment? Maybe your default is fully buffered.
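
A quick sketch of what the different values do, on a temporary file made up for the demo. Note that line buffering matters mainly for writing, where it flushes after each newline.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "buffering.txt"
    path.write_text("first\nsecond\n")

    # buffering=0 is only accepted in binary mode and returns a raw FileIO.
    with open(path, "rb", buffering=0) as raw:
        raw_type = type(raw).__name__

    # In text mode the same value is rejected with a ValueError.
    try:
        open(path, "r", buffering=0)
        text_unbuffered_ok = True
    except ValueError:
        text_unbuffered_ok = False

    # buffering=1 selects line buffering for a text file.
    with open(path, "w", buffering=1) as out:
        line_buffered = out.line_buffering

    # An integer > 1 sets the size of the underlying buffer in bytes.
    with open(path, "rb", buffering=4096) as buffered:
        buffered_type = type(buffered).__name__

print(raw_type, text_unbuffered_ok, line_buffered, buffered_type)
```

None of these settings change how much of the file is held in RAM overall; they only change how much is collected before each read from or write to the operating system.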
#10
(Jan-13-2026, 01:08 AM)Pedroski55 Wrote: If I open the file mode='w', a new file is created. Continuous writing will eventually overload RAM, because the file is still open. That is not good here. So open mode='a'.

This is incorrect. The contents of the file are not stored in RAM. Each individual write is, but the buffer is reused. So writing GB of data to a file in a loop does not consume GB of RAM.

If you do the equivalent of
with open("mybigfile.txt", mode="w") as f:
    for i in range(1_000_000_000):
        f.write("some data to write")
It takes an identical amount of RAM as if you'd opened the file in append mode (or if you repeatedly re-opened the file in append mode).
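
If you'd like to check this on a small scale, here is a rough sketch with tracemalloc. The helper name peak_while_writing and the record count are my own choices for the demo, and tracemalloc only sees memory allocated by Python itself.

```python
import tempfile
import tracemalloc
from pathlib import Path


def peak_while_writing(path, mode, count=300_000):
    """Write count short records to path and return the peak traced memory."""
    tracemalloc.start()
    try:
        with open(path, mode) as f:
            for _ in range(count):
                f.write("some data to write\n")
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


with tempfile.TemporaryDirectory() as tmp:
    peak_w = peak_while_writing(Path(tmp) / "w.txt", "w")
    peak_a = peak_while_writing(Path(tmp) / "a.txt", "a")
    written = (Path(tmp) / "w.txt").stat().st_size

print(f"bytes written  : {written:>10,}")
print(f"peak, mode='w' : {peak_w:>10,}")
print(f"peak, mode='a' : {peak_a:>10,}")
```

Both peaks should be in the range of the buffer sizes and far below the number of bytes written.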


(Jan-13-2026, 01:08 AM)Pedroski55 Wrote: The actual contents of the file are never loaded into RAM, as I understand it.

That is always true. RAM stores the contents of individual reads and writes, and it holds file/stream buffers to make writing more efficient. Usually these buffers are not huge compared to the size of a machine. The contents of a file are only held in RAM if you take some special step to do so like

f = open("mybigfile")
all_the_data = f.read()