[Python-Dev] A draft PEP for a new memory model

Paul Barrett Barrett@stsci.edu
Fri, 14 Sep 2001 10:26:26 -0400
The following is the beginnings of a PEP for a new memory model for
Python.  It currently contains only the motivation section and a
description of a preliminary design.  I'm submitting the PEP in its
current form to get a feel for whether or not I should pursue this
proposal and to find out if I am overlooking any details that would
make it incompatible with Python's core implementation, i.e. whether
implementing it would have too great an effect on Python's
performance.

I do plan to implement something along these lines, but may have to
change my approach if I hear comments about this PEP to the contrary.

Cheers,
Paul



           PEP:  XXX
         Title:  A New Memory Management Model for Python
       Version:  $Revision: 1.3 $
 Last-Modified:  $Date: 2001/08/20 23:59:26 $
        Author:  barrett@stsci.edu (Paul Barrett)
        Status:  Draft
          Type:  Standards Track
       Created:  05-Sep-2001
Python-Version:  2.3
  Post-History:
      Replaces:  PEP 42


Abstract

    This PEP proposes a new memory management model to provide better
    support for the various types of memory found in modern operating
    systems.  The proposed model separates the memory object from its
    access method.  In simplest terms, memory objects only allocate
    memory, while access objects only provide access to that memory.
    This separation allows various types of memory to share a common
    interface or access object and vice versa.


Motivation

    There are three sequence objects which share similar interfaces,
    but have different intended uses.  The first is the indispensable
    'string' object.  A 'string' is an immutable sequence of
    characters and supports slicing, indexing, concatenation,
    replication, and related string-type operations.  The second is
    the 'array' object.  Like a 'list', it is a mutable sequence and
    supports slicing, indexing, concatenation, and replication, but
    its values are constrained to one of several basic types, namely
    characters, integers, and floating point numbers.  This constraint
    enables efficient storage of the values.  The third object is the
    'buffer', which behaves similarly to a 'string' object at the Python
    programming level: it supports slicing, indexing, concatenation,
    and related string-like operations.  However, its data can come
    from either a block of memory or an object that exports the buffer
    interface, such as 'mmap', the memory-mapped file object which is
    its prime justification.
    
    Each object has been used at one time or other as a way of
    allocating read-write memory from the heap.  The 'string' object
    is often used at the C programming level because it is a standard
    Python object, but its use goes counter to its intended behavior
    of being immutable.  The preferred way of allocating such memory
    is the 'array' object, but its insistence on returning a
    representation of itself for both the 'repr' and 'str' methods
    makes it cumbersome to use.  In addition, the use of a 'string' as
    an initializer during 'array' creation is inefficient, because the
    memory is temporarily allocated twice, once for the 'string' and
    once for the 'array'.  This is particularly onerous when
    allocating tens of megabytes of memory.
    
    The 'buffer' object also has its problems, some of which have been
    discussed on python-dev.  Some of the more important ones are: (1)
    the 'buffer' object always returns a read-only 'buffer', even for
    read-write objects.  This is apparently a bug in the 'buffer'
    object, which is fixable.  (2) The buffer API provides no
    guarantee about the lifetime of the base pointer - even if the
    'buffer' object holds a reference to the base object, since there
    is no locking mechanism associated with the base pointer.  For
    example, if the initial 'buffer' is deleted, the memory pointer of
    the derived 'buffer' will refer to freed memory.  This happens
    most often at the C programming level, as in the following
    example:
    
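        PyObject *base = PyBuffer_New(100);
        /* Derive a second buffer that shares base's memory. */
        PyObject *buffer = PyBuffer_FromObject(base, 0, Py_END_OF_BUFFER);
        /* Dropping base can free memory that 'buffer' still points to. */
        Py_DECREF(base);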
    
    This problem is also fixable.  And (3) the 'buffer' object cannot
    easily be used to allocate read-write memory at the Python
    programming level.  The obvious approach is to use a 'string' as
    the base object of the 'buffer'.  Yet a 'string' is immutable,
    which means the 'buffer' object derived from it is also immutable,
    even if problem (1) is fixed.  The only alternative at the Python
    programming level is to use the cumbersome 'array' object or to
    create your own version of the 'buffer' object to allocate a block
    of memory.
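
    The missing locking mechanism in problem (2) can be pictured with
    a small sketch.  The 'Block' and 'View' classes and their methods
    are hypothetical names invented for this illustration, and the
    'bytearray' merely stands in for memory obtained from the heap;
    none of this is an existing or proposed API.

```python
class Block:
    """Hypothetical memory block that counts its outstanding views."""

    def __init__(self, size):
        self.data = bytearray(size)   # stand-in for heap memory
        self.exports = 0              # number of live views (the "lock")

    def free(self):
        # Refuse to release the memory while any view still refers to it.
        if self.exports:
            raise RuntimeError("block still has %d live view(s)" % self.exports)
        self.data = None


class View:
    """Hypothetical derived buffer that pins its base block."""

    def __init__(self, block):
        self.block = block
        block.exports += 1

    def release(self):
        self.block.exports -= 1
        self.block = None


base = Block(100)
derived = View(base)
try:
    base.free()            # refused: 'derived' still refers to the memory
    freed_early = True
except RuntimeError:
    freed_early = False
derived.release()
base.free()                # allowed once no view remains
```

    With a count of this kind the base memory cannot disappear while
    a derived object still points into it, which is the guarantee the
    current buffer API is said to lack.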
    
    We feel that the solution to these and other problems is best
    illustrated by problem (3), which can essentially be described as
    the simple operation of allocating a block of read-write memory
    from the heap.  Python currently provides no standard way of doing
    this.  It is instead done by subterfuge at the C programming level
    using the 'string', 'array', or 'buffer' APIs.  A solution to this
    specific problem is to include a 'malloc' object as part of
    standard Python.  This object will be used to allocate a block of
    memory from the heap and the 'buffer' object will be used to access
    this memory just as it is used to access data from a memory-mapped
    file.  Yet, this hints at a more general solution, the creation of
    two classes of objects, one for memory-allocation, and one for
    memory-access.


The Model


    We propose a new memory-management model for Python which
    separates the allocation object from its access method.  This
    mix-and-match memory model will enable various access objects,
    such as 'array', 'string', and 'file', to easily access data in
    different types of memory, namely heap memory, shared memory, and
    memory-mapped files; in other words, different types of memory
    can share a common interface (see figure below).  It will also
    provide better support for the various types of memory found in
    modern operating systems.
    
    
         |---------------------------------------------------|
         |                  interface layer                  |
         |  -----------------------------------------------  |
         |     array    |     string    |    file    |  ...  |
         |===================================================|
         |                     data layer                    |
         |  -----------------------------------------------  |
         |  heap memory | shared memory | memory mapped file |
         |---------------------------------------------------|
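
    The two layers in the figure can be sketched roughly as follows.
    'ByteAccess' is a hypothetical name used only for this
    illustration, a 'bytearray' stands in for heap memory, and an
    anonymous 'mmap' stands in for a memory-mapped file; the sketch
    assumes only that each memory object supports indexing and
    slicing.

```python
import mmap


class ByteAccess:
    """Hypothetical access object: indexing and slicing over any memory object."""

    def __init__(self, memory):
        self.memory = memory          # the access object allocates nothing itself

    def __len__(self):
        return len(self.memory)

    def __getitem__(self, index):
        return self.memory[index]

    def __setitem__(self, index, value):
        self.memory[index] = value


heap = bytearray(16)                  # stand-in for heap memory
mapped = mmap.mmap(-1, 16)            # anonymous memory map of 16 bytes

# The same access class is used unchanged over both kinds of memory.
for memory in (heap, mapped):
    access = ByteAccess(memory)
    access[0:5] = b"hello"

heap_prefix = bytes(heap[0:5])
mapped_prefix = mapped[0:5]
mapped.close()
```

    The access class never needs to know which kind of memory it was
    given, which is the mix-and-match property the model aims for.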
    
    
    Memory Objects
    
    Modern operating systems, such as Unix and Windows, provide access
    to several different types of memory, namely heap, shared, and
    memory-mapped files.  These memory types share two common
    attributes, a pointer to the memory and the size of the memory.
    This information is usually sufficient for objects whose data uses
    heap memory, since the object is expected to have sole control
    over that memory throughout the lifetime of the object.  For
    objects whose data may also use shared memory or memory-mapped
    files, a third attribute is necessary for access permission.  The
    issue of how to handle memory persistence across processes does
    not appear to be well defined in modern OSs, and seems to be left
    to the programmer to implement.  In any case, a fourth attribute
    to handle memory persistence seems imperative.
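
    As a rough illustration, the four attributes could be recorded as
    shown below.  The class name, the field names, and the reading of
    'persistent' as "outlives the creating process" are assumptions
    made for this sketch, and the addresses are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryAttributes:
    """Hypothetical record of the four attributes of a memory object."""
    address: int        # pointer to the start of the memory
    size: int           # size of the memory in bytes
    writable: bool      # access permission
    persistent: bool    # assumed meaning: memory outlives the creating process


def describe(mem):
    """Return a one-line summary of a memory object's attributes."""
    access = "read-write" if mem.writable else "read-only"
    lifetime = "persistent" if mem.persistent else "process-local"
    return "%d bytes at 0x%x, %s, %s" % (mem.size, mem.address, access, lifetime)


# Placeholder values, chosen only to show the two extremes.
heap_block = MemoryAttributes(address=0x1000, size=4096,
                              writable=True, persistent=False)
mapped_file = MemoryAttributes(address=0x2000, size=4096,
                               writable=False, persistent=True)
```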

    Access Objects

    Consider 'array', 'buffer', and 'string' objects.  Each provides,
    more or less, the same string-like interface to its underlying
    data.  They each support slicing, indexing, concatenation, and
    replication of the data.  They differ primarily in the types of
    initializing data and the permissions associated with the
    underlying data.  Currently, the 'array' initializer accepts only
    'list' and 'string' objects.  If this were extended to include
    objects that support the 'buffer interface', then the distinction
    between the 'array' and 'buffer' objects would disappear, since
    they would both support the sequence interface and the same set of
    base objects.  The 'buffer' object would therefore be redundant
    and no longer necessary.

    The 'string' and 'array' objects would still be distinct, since
    the 'array' object encompasses more data-types than does the
    'string' object.  The 'array' object is also mutable, requiring its
    underlying data to be read-write, while the 'string' object is
    immutable, requiring read-only data.  This new memory-management
    model therefore suggests that the 'string' object support the
    'buffer interface' with the proviso that the data have read-only
    permission.
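
    The permission distinction can be illustrated with present-day
    Python, whose 'bytes', 'bytearray', and 'memoryview' types did
    not exist when this draft was written and are not part of this
    proposal.  Here 'bytes' plays the role of the immutable 'string'
    and 'bytearray' the role of a read-write block of memory.

```python
from array import array

text = b"abc"                       # immutable, like the 'string' above
block = bytearray(b"abc")           # mutable block of memory

readonly_view = memoryview(text)    # exported with read-only permission
writable_view = memoryview(block)   # exported with read-write permission

writable_view[0] = ord("x")         # permitted: underlying data is read-write
try:
    readonly_view[0] = ord("x")     # refused: underlying data is read-only
    string_modified = True
except TypeError:
    string_modified = False

# An 'array' initialized from an object that exports its memory.
values = array("B", block)
```

    The access objects are the same in both cases; only the
    permission attached to the underlying memory decides whether a
    write is allowed.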
    
    
Implementation



References



Copyright
    
    This document has been placed in the public domain.


-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218