Christian Heimes | 9 Jan 00:51 2008

PEP: Lazy module imports and post import hook

I've attached the first public draft of my first PEP. A working patch
against the py3k branch is available at http://bugs.python.org/issue1576.

Christian
PEP: 369
Title: Lazy importing and post import hooks
Version: $Revision$
Last-Modified: $Date$
Author: Christian Heimes <christian(at)cheimes(dot)de>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Jan-2008
Python-Version: 2.6, 3.0
Post-History:

Abstract
========

This PEP proposes enhancements for the import machinery to add lazy
importing and post import hooks to Python.

It is intended primarily to support the wider use of abstract base
classes that is expected in Python 3.0.

Rationale
=========

In current Python, an import statement always loads the module from disk,
even if the importing module never actually uses it. Avoiding this
requires extra code to import modules conditionally; otherwise the
unnecessary imports can slow down a small script.

Embedding import statements inside functions is no solution, as doing
so invokes the import machinery every time the function is called.
Hiding the import inside a function also makes the module's dependencies
less clear.

Python also has no API to hook into the import machinery and execute code
*after* a module is successfully loaded. The import hooks of PEP 302 [2]_
are about finding and loading modules; they were not designed to act as
post import hooks.

 - An import always loads the module from disk, which may noticeably
   increase the execution time of a small script.

 - Conditional imports make the code harder to read and may lead to slow
   and ugly function-level imports.

 - Python can't notify code when a module is loaded.

Use cases
=========

A use case for a post import hook is described in Nick Coghlan's initial
posting about callbacks on module import [1]_. It came up during the
development of Python 3.0 and its ABCs: we wanted to register classes
like decimal.Decimal with an ABC, but the module should not be imported
on every interpreter startup. Nick came up with this example::

    @imp.when_imported('decimal')
    def register(decimal):
        Inexact.register(decimal.Decimal)

The function ``register`` is registered as a callback for the module named
'decimal'. When decimal is imported, the function is called with the module
object as its argument.

While this particular example isn't necessary in practice (decimal.Decimal
will inherit from the appropriate abstract Number base class in 2.6 and
3.0), it still illustrates the principle.

Existing implementations
========================

There are two major implementations of lazy imports in the Python world.

PJE's peak.util.imports [3]_ supports lazy modules and post load hooks. My
implementation shares a lot with his and is partly based on his ideas.

Zope 3's zope.deferredimport [4]_ doesn't have post import hooks but it has
additional methods for deprecation warnings.

Post import hook implementation
===============================

Post import hooks are called after a module has been loaded. The hooks
are callables that take one argument, the module instance. They are
registered by the dotted name of the module, e.g. 'os' or 'os.path'.

The callables are stored in the dict ``sys.post_import_hooks``, which
maps module names (as strings) to a list of callables or None.

States
------

No hook was registered
''''''''''''''''''''''

``sys.post_import_hooks`` contains no entry for the module.

A hook is registered and the module is not loaded yet
'''''''''''''''''''''''''''''''''''''''''''''''''''''

The import hook registry contains an entry
``sys.post_import_hooks["name"] = [hook1]``.

A module is successfully loaded
'''''''''''''''''''''''''''''''

The import machinery checks if ``sys.post_import_hooks`` contains post import
hooks for the newly loaded module. If hooks are found, they are called in
the order they were registered, with the module instance as the first
argument. Processing of the hooks stops when a hook raises an exception.
At the end, the entry for the module name is removed from
``sys.post_import_hooks``, even when an error has occurred.
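
The rules for a successful load can be modelled in a few lines of Python.
This is a minimal sketch, not the proposed C code: ``post_import_hooks`` is a
module-level dict standing in for the proposed ``sys.post_import_hooks``, and
``notify_module_loaded`` is a local, hypothetical function named after the
proposed ``imp`` API.

```python
import types
from typing import Callable, Dict, List, Optional

Hook = Callable[[types.ModuleType], object]

# Hypothetical stand-in for the proposed sys.post_import_hooks registry:
# module name -> hooks still waiting for that module (or None).
post_import_hooks: Dict[str, Optional[List[Hook]]] = {}


def notify_module_loaded(module: types.ModuleType) -> types.ModuleType:
    """Run the pending hooks for ``module`` following the rules above."""
    name = module.__name__
    if name not in post_import_hooks:
        return module
    try:
        # Registration order; the first exception stops processing and
        # propagates to the caller.
        for hook in post_import_hooks[name] or ():
            hook(module)
    finally:
        # The entry is removed even when a hook raised.
        del post_import_hooks[name]
    return module
```

Because the entry is deleted in a ``finally`` clause, hooks that were skipped
after a failing hook are discarded as well, as the rule above requires.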

A module can't be loaded
''''''''''''''''''''''''

The import hooks are neither called nor removed from the registry. It may be
possible to load the module later.

A hook is registered but the module is already loaded
'''''''''''''''''''''''''''''''''''''''''''''''''''''

The hook is fired immediately. 
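
Registration, including the already-loaded case, could look like the
following minimal sketch. It assumes that presence in ``sys.modules`` means
"loaded" (lazy modules are ignored here) and uses a local dict as a
hypothetical stand-in for ``sys.post_import_hooks``;
``register_post_import_hook`` below is illustrative, not an existing function.

```python
import sys
import types
from typing import Callable, Dict, List

Hook = Callable[[types.ModuleType], object]

# Hypothetical stand-in for the proposed sys.post_import_hooks registry.
post_import_hooks: Dict[str, List[Hook]] = {}


def register_post_import_hook(hook: Hook, name: str) -> None:
    """Register ``hook`` for module ``name`` following the states above."""
    module = sys.modules.get(name)
    if module is not None:
        # The module is already loaded: fire the hook immediately.
        hook(module)
        return
    # Not loaded yet, or an earlier load failed: queue the hook after any
    # hooks registered before it.
    post_import_hooks.setdefault(name, []).append(hook)
```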

C API
-----

New PyImport_* API functions
''''''''''''''''''''''''''''

PyObject* PyImport_GetPostImportHooks(void)
    Returns the dict sys.post_import_hooks or NULL

PyObject* PyImport_NotifyModuleLoaded(PyObject *module)
   Notify the post import system that a module was requested. Returns the
   module or NULL if an error has occurred.

PyObject* PyImport_RegisterPostImportHook(PyObject *callable, PyObject *mod_name)
   Register a new hook ``callable`` for the module ``mod_name``

The PyImport_NotifyModuleLoaded() function is called by PyImport_ImportModuleLevel()::

   PyImport_ImportModuleLevel(...)
   {
        ...
        result = import_module_level(name, globals, locals, fromlist, level);
        /* hooks only run for a module that was actually loaded */
        if (result != NULL)
            result = PyImport_NotifyModuleLoaded(result);
        ...
   }

Python API
----------

The import hook registry and two new API functions are exposed through the
``sys`` and ``imp`` modules.

sys.post_import_hooks
    The dict contains the post import hooks:  {"name" : [hook1, hook2, ...], ...}

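imp.register_post_import_hook(hook, name)
    Register ``hook`` as a post import hook for the module ``name``

imp.notify_module_loaded(module) -> module
    Notify the post import system that ``module`` was loaded and return it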

The ``when_imported`` function decorator is also in the ``imp`` module. It
is equivalent to::

    def when_imported(name):
        def register(hook):
            register_post_import_hook(hook, name)
            # return the hook so the decorated name stays bound to it
            return hook
        return register

Lazy import implementation
==========================

Lazy import (also known as deferred import) makes a module object available
without locating and loading the actual file for the module. The real module
is loaded upon the first attribute access using the standard import mechanism.

Only two attributes can be read without loading the real module:
``__name__`` and ``__lazy_import__``. The former is used to load the actual
module, while the latter signals the laziness of the module.
``__lazy_import__`` is not required by the C implementation, but it was added
for user implementations of lazy modules as requested by PJE
<<Reference here would be good>>.

Reading any other attribute, or writing any attribute, causes the real module
to be loaded. If the load fails, a ``LazyImportError`` (a subclass of
``ImportError``) is raised, and any later access to the module object raises
the same error.
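
The attribute rules above can be imitated in pure Python 3. This is a minimal
sketch, not the proposed C implementation: ``LazyModule`` and the local
``LazyImportError`` are hypothetical names, and because a module's
``__dict__`` cannot be reassigned from Python code, the sketch forwards reads
and writes to the real module instead of sharing its dict. Besides
``__name__`` and ``__lazy_import__``, the defaults that ``types.ModuleType``
puts in every module dict (such as ``__doc__``) can also be read without a
load.

```python
import importlib
import sys
import types


class LazyImportError(ImportError):
    """Hypothetical stand-in for the LazyImportError proposed above."""


class LazyModule(types.ModuleType):
    """Module placeholder that imports the real module on first use."""

    def __init__(self, name):
        super().__init__(name)
        # Write straight into the module dict so no load is triggered.
        self.__dict__.update(__lazy_import__=True, _lazy_real=None, _lazy_error=None)

    def _lazy_load(self):
        state = self.__dict__
        if state["_lazy_error"] is not None:
            # A failed load is sticky: later accesses raise the same error.
            raise state["_lazy_error"]
        if state["_lazy_real"] is None:
            name = state["__name__"]
            if sys.modules.get(name) is self:
                del sys.modules[name]  # let the standard import machinery run
            try:
                real = importlib.import_module(name)
            except ImportError as exc:
                state["_lazy_error"] = LazyImportError(
                    f"lazy import of {name!r} failed: {exc}")
                raise state["_lazy_error"] from exc
            state["_lazy_real"] = real
            state["__lazy_import__"] = False
            sys.modules[name] = self  # the lazy object stays registered
        return state["_lazy_real"]

    def __getattr__(self, attr):
        # Only called for names missing from the lazy module's own dict.
        return getattr(self._lazy_load(), attr)

    def __setattr__(self, attr, value):
        # Every write loads the real module and lands in its namespace.
        setattr(self._lazy_load(), attr, value)

    def __delattr__(self, attr):
        delattr(self._lazy_load(), attr)
```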

The real module doesn't replace the lazy module. References to the lazy module
remain valid and don't cause another load attempt. The implementation assigns
``real->md_dict`` to ``lazy->md_dict`` (the ``__dict__`` attributes), so every
read from and write to the former lazy module ends up in the real module's
namespace dict. The code also tries to unload the real module, but that may not
be possible when, e.g., mod_a loads mod_b, mod_b loads mod_c and mod_c imports
mod_a again. This doesn't cause a problem with the namespace dict, but the
identity check ``mod_c.mod_a is mod_a`` may be false.

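A pure Python version of the loader code may look like this (pseudo code)::

    import sys

    lazy = sys.modules[name]    # the lazy placeholder module
    del sys.modules[name]       # make __import__ load the real module
    __import__(name)            # returns the top-level package for dotted names
    real = sys.modules[name]
    lazy.__dict__ = real.__dict__  # done on md_dict in C; read-only in Python
    sys.modules[name] = lazy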

The real module, or a module imported by the real module, may keep a reference
to the real module instance. Because the formerly lazy module instance and the
real module share the same ``__dict__``, every modification of one module is
instantly visible on the other.

C API
-----

The module object struct gains two new fields: ``md_name`` holds the name of
a lazy module (its ``__name__`` attribute) and ``md_lazy`` signals the import
status.

PyModuleType changes
''''''''''''''''''''

::

    typedef enum {
            Py_MOD_INVALID = -1,
            Py_MOD_LOADED,
            Py_MOD_LAZY,
            Py_MOD_KEEP_DICT
    } PyModule_State;

    typedef struct {
            PyObject_HEAD
            PyObject *md_dict;
            PyObject *md_name;
            PyModule_State md_lazy;
    } PyModuleObject;

PyObject * PyModule_NewLazy(const char *name)
    Creates a new lazy module instance

int PyModule_IsLazy(PyObject *module)
    Checks if the module is lazy. The function first checks module->md_lazy.
    If ``md_lazy`` is Py_MOD_LOADED it also checks the attribute 
    __lazy_import__.

Py_MOD_INVALID
   real module can't be loaded, further attribute access raises an error
Py_MOD_LOADED
   module is loaded
Py_MOD_LAZY
   module is lazy, write and read access except __name__ and
   __lazy_import__ will load the real module.
Py_MOD_KEEP_DICT
   Intermediate state of a real module, md_dict isn't cleared

The last state is required to prevent ``module_dealloc`` from replacing the
values of the module dict with None.

Python API
----------

``__lazy_import__`` module attribute
    The module attribute ``__lazy_import__`` can be used by third-party
    implementations of lazy modules to signal the laziness of a module.

imp.is_lazy(mod) -> bool
    Checks if the module is lazy, falls back to ``__lazy_import__``

imp.import_lazy(name) -> module instance (lazy)
    Imports a module lazily, e.g. ``import_lazy("spam.ham")`` puts
    *spam.ham* in ``sys.modules`` and returns the *spam.ham* module
    without actually loading it.

imp.new_lazy_module(name) -> module instance
    Creates a new lazy module instance without putting it into ``sys.modules``

imp.when_imported(name) -> decorator function
    Decorator form of ``register_post_import_hook``, used as
    ``@when_imported(name)`` on ``def hook(module): ...``
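
For comparison, Python 3.5 and later ship ``importlib.util.LazyLoader``, which
can approximate ``imp.import_lazy``. The function below follows the lazy
import recipe from the ``importlib`` documentation; the name ``import_lazy``
is reused here only for illustration. Unlike the proposal, a missing module
is reported immediately by ``find_spec`` rather than on first attribute
access, and no ``__lazy_import__`` marker is set.

```python
import importlib.util
import sys


def import_lazy(name):
    """Return a module whose body only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # LazyLoader.exec_module() defers executing the real module code.
    loader.exec_module(module)
    return module
```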

Open issues
===========

Nick: There also needs to be a discussion of the import lock and potential hidden
deadlock issues. Specifically, the first access to the lazy module that causes
the real module to be loaded will attempt to acquire the import lock. Carelessly
mixing lazy importing with threaded code is a recipe for trouble.

Backwards Compatibility
=======================

The new features and API don't conflict with the existing import system and
don't cause backward compatibility issues for most software. However, systems
like PEAK and Zope, which implement their own lazy import magic, need to
follow some rules.

The post import hook and lazy modules were carefully designed to cooperate
with existing systems. The PEP author suggests that such systems replace
their own on-load hooks with the new hook API. Their alternative lazy or
deferred imports will still work, but those implementations must call
``imp.notify_module_loaded`` once the real module has been loaded.
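
For such third-party lazy modules, the rule amounts to one extra call at the
moment the deferred import finally runs. A minimal sketch: the dict and
``notify_module_loaded`` below are hypothetical local stand-ins for the
proposed ``sys.post_import_hooks`` and ``imp.notify_module_loaded``, and
``load_deferred`` is an illustrative helper, not code from PEAK or Zope.

```python
import importlib
import sys
import types
from typing import Callable, Dict, List

Hook = Callable[[types.ModuleType], object]

# Hypothetical stand-ins for the proposed registry and notification API.
post_import_hooks: Dict[str, List[Hook]] = {}


def notify_module_loaded(module: types.ModuleType) -> types.ModuleType:
    # Remove the entry first, then run the hooks in registration order.
    for hook in post_import_hooks.pop(module.__name__, []):
        hook(module)
    return module


def load_deferred(name: str) -> types.ModuleType:
    """What a third-party deferred import does once the module is needed."""
    module = importlib.import_module(name)
    # Without this call, hooks registered for ``name`` would never run.
    return notify_module_loaded(module)
```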

Reference Implementation
========================

A reference implementation already exists and works. It still requires
some cleanups, documentation updates and additional unit tests.

Acknowledgments
===============

Nick Coghlan, for proof reading and the initial discussion
Phillip J. Eby, for his implementation in PEAK and help with my own implementation

Copyright
=========

This document has been placed in the public domain.

References
==========

.. [1] Interest in PEP for callbacks on module import
   http://permalink.gmane.org/gmane.comp.python.python-3000.devel/11126

.. [2] PEP 302: New Import Hooks
   http://www.python.org/dev/peps/pep-0302/

.. [3] peak.util.imports
   http://svn.eby-sarna.com/Importing/peak/util/imports.py?view=markup

.. [4] zope.deferredimport
   http://svn.zope.org/zope.deferredimport/trunk/src/zope/deferredimport/


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
_______________________________________________
Python-Dev mailing list
Python-Dev <at> python.org
http://mail.python.org/mailman/listinfo/python-dev