[Python-Dev] A draft PEP for a new memory model
Paul Barrett
Barrett@stsci.edu
Fri, 14 Sep 2001 10:26:26 -0400
The following is the beginnings of a PEP for a new memory model for
Python. It currently contains only the motivation section and a
description of a preliminary design. I'm submitting the PEP in its
current form to get a feel for whether or not I should pursue this
proposal and to find out if I am overlooking any details that would
make it incompatible with Python's core implementation, for example
if implementing it would have too great an effect on Python's
performance.
I do plan to implement something along these lines, but may have to
change my approach if I hear comments about this PEP to the contrary.
Cheers,
Paul
PEP: XXX
Title: A New Memory Management Model for Python
Version: $Revision: 1.3 $
Last-Modified: $Date: 2001/08/20 23:59:26 $
Author: barrett@stsci.edu (Paul Barrett)
Status: Draft
Type: Standards Track
Created: 05-Sep-2001
Python-Version: 2.3
Post-History:
Replaces: PEP 42
Abstract
This PEP proposes a new memory management model to provide better
support for the various types of memory found in modern operating
systems. The proposed model separates the memory object from its
access method. In simplest terms, memory objects only allocate
memory, while access objects only provide access to that memory.
This separation allows various types of memory to share a common
interface or access object and vice versa.
Motivation
There are three sequence objects which share similar interfaces,
but have different intended uses. The first is the indispensable
'string' object. A 'string' is an immutable sequence of
characters and supports slicing, indexing, concatenation,
replication, and related string-type operations. The second is
the 'array' object. Like a 'list', it is a mutable sequence and
supports slicing, indexing, concatenation, and replication, but
its values are constrained to one of several basic types, namely
characters, integers, and floating point numbers. This constraint
enables efficient storage of the values. The third object is the
'buffer', which behaves similarly to a 'string' object at the Python
programming level: it supports slicing, indexing, concatenation,
and related string-like operations. However, its data can come
from either a block of memory or an object that exports the buffer
interface, such as 'mmap', the memory-mapped file object which is
its prime justification.
Each object has been used at one time or another as a way of
allocating read-write memory from the heap. The 'string' object
is often used at the C programming level because it is a standard
Python object, but its use goes counter to its intended behavior
of being immutable. The preferred way of allocating such memory is
the 'array' object, but its insistence on returning a
representation of itself for both the 'repr' and 'str' methods
makes it cumbersome to use. In addition, the use of a 'string' as
an initializer during 'array' creation is inefficient, because the
memory is temporarily allocated twice, once for the 'string' and
once for the 'array'. This is particularly onerous when
allocating tens of megabytes of memory.
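The double allocation can be sketched as follows. This is only an
illustration written against the present-day Python 3 'array' module,
not the 2001 implementation; the function names are hypothetical. The
first function builds a temporary initializer of the full size before
the 'array' copies it, while the second grows a one-element 'array'
by replication and so never holds a second full-size block.

```python
from array import array


def zeros_from_string(nbytes):
    """Build the array from a full-size temporary initializer."""
    initializer = bytes(nbytes)        # first full-size allocation
    return array("B", initializer)     # second allocation: the copy


def zeros_by_repetition(nbytes):
    """Build the array by replicating a one-element array."""
    return array("B", [0]) * nbytes    # no full-size temporary


same_contents = zeros_from_string(1024) == zeros_by_repetition(1024)
```

Both functions return equal arrays; they differ only in how much
memory is live while the array is being built.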
The 'buffer' object also has its problems, some of which have been
discussed on python-dev. Some of the more important ones are: (1)
the 'buffer' object always returns a read-only 'buffer', even for
read-write objects. This is apparently a bug in the 'buffer'
object, which is fixable. (2) The buffer API provides no
guarantee about the lifetime of the base pointer - even if the
'buffer' object holds a reference to the base object, since there
is no locking mechanism associated with the base pointer. For
example, if the initial 'buffer' is deleted, the memory pointer of
the derived 'buffer' will refer to freed memory. This situation
happens most often at the C programming level as in the following
situation:
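PyObject *base = PyBuffer_New(100);    /* owns 100 bytes of memory */
/* derive a buffer spanning all of 'base' (offset 0, to the end) */
PyObject *buffer = PyBuffer_FromObject(base, 0, Py_END_OF_BUFFER);
Py_DECREF(base);    /* 'buffer' now points into freed memory */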
This problem is also fixable. And (3) the 'buffer' object cannot
easily be used to allocate read-write memory at the Python
programming level. The obvious approach is to use a 'string' as
the base object of the 'buffer'. Yet, a 'string' is immutable
which means the 'buffer' object derived from it is also immutable,
even if problem (1) is fixed. The only alternative at the Python
programming level is to use the cumbersome 'array' object or to
create your own version of the 'buffer' object to allocate a block
of memory.
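For comparison, here is a minimal sketch of what a locking mechanism
for problems (1) and (2) can look like at the Python programming
level. It uses the 'bytearray' and 'memoryview' objects of
present-day Python 3 as stand-ins, which is an assumption made for
illustration; neither is part of this proposal. The derived view
keeps its base object alive, is read-write when the base is, and
blocks resizing of the base while the view is outstanding.

```python
base = bytearray(100)          # read-write memory owned by 'base'
derived = memoryview(base)     # access object exporting base's memory

keeps_base_alive = derived.obj is base    # the view references its base
is_read_write = not derived.readonly      # read-write base, read-write view

try:
    base.extend(b"\x00")       # resizing could move the exported memory
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

derived.release()              # drop the export, unlocking the base
base.extend(b"\x00")           # resizing is allowed again
```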
We feel that the solution to these and other problems is best
illustrated by problem (3), which can essentially be described as
the simple operation of allocating a block of read-write memory
from the heap. Python currently provides no standard way of doing
this. It is instead done by subterfuge at the C programming level
using the 'string', 'array', or 'buffer' APIs. A solution to this
specific problem is to include a 'malloc' object as part of
standard Python. This object will be used to allocate a block of
memory from the heap and the 'buffer' object will be used to access
this memory just as it is used to access data from a memory-mapped
file. Yet, this hints at a more general solution, the creation of
two classes of objects, one for memory-allocation, and one for
memory-access.
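The division of labour described above might look roughly like this
at the Python programming level. The 'allocate' function is a
hypothetical stand-in for the proposed 'malloc' object, and the
Python 3 'bytearray' and 'memoryview' types are used only because
they are available today; the actual objects and their names are not
specified by this PEP.

```python
def allocate(nbytes):
    """Hypothetical stand-in for the proposed 'malloc' object."""
    return bytearray(nbytes)   # zero-filled, read-write heap memory


block = allocate(16)           # the allocation object only holds memory
access = memoryview(block)     # the access object only provides access
access[0:4] = b"\x01\x02\x03\x04"
first_four = bytes(block[:4])  # the write is visible through 'block'
access.release()
```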
The Model
We propose a new memory-management model for Python which
separates the allocation object from its access method. This
mix-and-match memory model will enable various access objects,
such as 'array', 'string', and 'file', to easily access data
from different types of memory, namely heap memory, shared memory, and
memory-mapped files; or in other words, different types of memory
can share a common interface (see figure below). It will also
provide better support for the various types of memory found in
modern operating systems.
|---------------------------------------------------|
|                  interface layer                  |
| ------------------------------------------------- |
|    array    |    string     |   file   |   ...    |
|===================================================|
|                    data layer                     |
| ------------------------------------------------- |
| heap memory | shared memory | memory-mapped file  |
|---------------------------------------------------|
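The mix-and-match idea in the figure can be sketched with today's
standard library. In this illustration, which assumes Python 3, a
'bytearray' plays the heap memory object, an anonymous 'mmap' plays
the memory-mapped file, and a typed 'memoryview' plays the access
object. The helper names 'store_ints' and 'load_ints' are
hypothetical. The same access code runs unchanged over both kinds of
memory.

```python
import mmap
import struct

ITEM = struct.calcsize("i")    # size of a native C int in bytes


def store_ints(memory, values):
    """Access layer: write ints into any writable memory object."""
    with memoryview(memory) as raw, raw.cast("i") as ints:
        for index, value in enumerate(values):
            ints[index] = value


def load_ints(memory, count):
    """Access layer: read 'count' ints back from a memory object."""
    return list(struct.unpack_from("%di" % count, memory))


heap = bytearray(4 * ITEM)            # data layer: heap memory
mapped = mmap.mmap(-1, 4 * ITEM)      # data layer: anonymous memory map

results = {}
for name, memory in (("heap", heap), ("mmap", mapped)):
    store_ints(memory, [1, 2, 3, 4])
    results[name] = load_ints(memory, 4)
mapped.close()
```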
Memory Objects
Modern operating systems, such as Unix and Windows, provide access
to several different types of memory, namely heap memory, shared
memory, and memory-mapped files. These memory types share two common
attributes, a pointer to the memory and the size of the memory.
This information is usually sufficient for objects whose data uses
heap memory, since the object is expected to have sole control
over that memory throughout the lifetime of the object. For
objects whose data may also reside in shared memory or memory-mapped
files, an additional attribute is necessary for access permission.
How to handle memory persistence across processes does not appear
to be well defined in modern OSs and is apparently left to the
programmer to implement. In any case, a fourth attribute to handle
memory persistence seems imperative.
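As a rough sketch, the four attributes could be gathered into a
single record. The 'MemoryInfo' class, its field names, and the
'describe_array' helper are all hypothetical and written for
present-day Python 3; the PEP does not yet fix a concrete layout.
The address and size come from the existing 'array.buffer_info()'
method, and the permission from a temporary 'memoryview'.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryInfo:
    """Hypothetical record of the attributes a memory object carries."""
    address: int       # pointer to the memory
    nbytes: int        # size of the memory in bytes
    writable: bool     # access permission
    persistent: bool   # whether the memory outlives the process


def describe_array(arr, persistent=False):
    """Collect the attributes for memory owned by an 'array' object."""
    address, length = arr.buffer_info()    # (address, item count)
    with memoryview(arr) as view:
        writable = not view.readonly
    return MemoryInfo(address, length * arr.itemsize, writable, persistent)


samples = array("d", [0.0] * 8)
info = describe_array(samples)
```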
Access Objects
Consider 'array', 'buffer', and 'string' objects. Each provides,
more or less, the same string-like interface to its underlying
data. They each support slicing, indexing, concatenation, and
replication of the data. They differ primarily in the types of
initializing data and the permissions associated with the
underlying data. Currently, the 'array' initializer accepts only
'list' and 'string' objects. If this were extended to include
objects that support the 'buffer interface', then the distinction
between the 'array' and 'buffer' objects would disappear, since
they both support the sequence interface and the same set of base
objects. The 'buffer' object is therefore redundant and no longer
necessary.
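A minimal sketch of such an extended initializer, assuming the
present-day Python 3 'array' module, whose 'frombytes' method accepts
bytes-like objects. The helper name 'array_from_buffer' is
hypothetical. Note that this version copies the data, whereas the
model proposed here would let the 'array' share the memory of its
base object.

```python
from array import array


def array_from_buffer(typecode, source):
    """Create an array from any object exporting a byte buffer."""
    result = array(typecode)
    result.frombytes(source)   # copies the bytes out of 'source'
    return result


sources = [b"\x01\x02", bytearray(b"\x01\x02"), memoryview(b"\x01\x02")]
arrays = [array_from_buffer("B", source) for source in sources]
```

All three base objects yield the same 'array', regardless of which
type supplied the data.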
The 'string' and 'array' objects would still be distinct, since
the 'array' object encompasses more data-types than does the
'string' object. The 'array' object is also mutable, requiring its
underlying data to be read-write, while the 'string' object is
immutable, requiring read-only data. This new memory-management
model therefore suggests that the 'string' object support the
'buffer interface' with the proviso that the data have read-only
permission.
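The read-only proviso can be illustrated with present-day Python 3,
where the immutable 'bytes' type stands in for the 'string' object
and 'memoryview' for the access object. This is an assumption for
illustration only, and 'try_write' is a hypothetical helper. Access
through the view succeeds for read-write memory and is refused for
read-only memory.

```python
def try_write(source):
    """Return True if the first byte of 'source' could be overwritten."""
    with memoryview(source) as view:
        try:
            view[0] = ord("X")
        except TypeError:      # raised when the memory is read-only
            return False
        return True


text = b"spam"                 # immutable: exports read-only memory
scratch = bytearray(b"spam")   # mutable: exports read-write memory

text_written = try_write(text)
scratch_written = try_write(scratch)
```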
Implementation
References
Copyright
This document has been placed in the public domain.
--
Paul Barrett, PhD Space Telescope Science Institute
Phone: 410-338-4475 ESS/Science Software Group
FAX: 410-338-4767 Baltimore, MD 21218