Myghty Documentation

Version: 1.2 Last Updated: 07/07/10 12:55:17


Data Caching

Caching a Component's Output

Programmatic Interface

More on Caching

Cache Options

Cache Types

Myghty provides the ability to cache any kind of data, including component output, component return values, and user-defined data structures, for fast re-retrieval. All components, whether file-based or subcomponent, are provided with their own cache namespace, which in turn stores any number of key/value pairs. Included with Myghty are implementations using files, dbm files, direct in-memory dictionaries, and memcached. User-defined implementations are also supported and can be specified at runtime. The particular implementation used can optionally be specified on a per-key basis, so that any single namespace can store individual values across any number of implementations at the same time.

Caching is generally used when producing output requires process-intensive or slow operations, and those operations have few or no dependencies on external arguments, i.e. their result is not likely to change in response to request variables. Examples of things that are good for caching include weather information, stock data, sports scores, news headlines, or any other kind of data that is expensive to retrieve and does not need to be updated with real-time frequency.

The front-end to the mechanism is provided by the myghty.cache package, and can be configured and used through global configuration parameters, per-component flags, and a programmatic interface.

Caching a Component's Output

The simplest form of caching is the caching of a component's return value and output content via flags. An example using a subcomponent (subcomponents are explained in How to Define a Subcomponent):

<%def heavytext>
    <%flags>
        use_cache = True
        cache_type = 'dbm'
        cache_expiretime = 30
    </%flags>
    <%init>
        data = big_long_query()
    </%init>
    Your big query returned: <% data.get_info() %>
</%def>

In this example, the component's output text and its return value (if any) will be stored the first time it is called. Any calls to this component within the next 30 seconds will automatically return the cached value, and the %init section and body will not be executed. Once 30 seconds have elapsed, the first call to occur will execute the component in full again and recreate its output text and return value. Calls that arrive while this second execution is in progress continue to receive the prior value until the new value is complete. Once the new value is complete, it is stored in the cache and the expiration counter begins again for the next 30 seconds.
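The expiration behavior described above can be sketched in plain Python. This is only an illustration under stated assumptions: ExpiringValue, render_heavytext and the injectable clock are hypothetical names invented for the example and are not part of Myghty, and the sketch ignores concurrency.

```python
import time


class ExpiringValue:
    """Hypothetical holder: one cached value, regenerated after `expiretime` seconds."""

    def __init__(self, createfunc, expiretime, clock=time.time):
        self.createfunc = createfunc
        self.expiretime = expiretime
        self.clock = clock          # injectable so the example needs no real waiting
        self.value = None
        self.stored_at = None       # None means nothing has been cached yet

    def get(self):
        now = self.clock()
        expired = self.stored_at is None or now - self.stored_at >= self.expiretime
        if expired:
            # Run the expensive function and restart the expiration counter.
            self.value = self.createfunc()
            self.stored_at = self.clock()
        return self.value


calls = []


def render_heavytext():
    # Stands in for the component body; counts how often it really runs.
    calls.append(1)
    return "render #%d" % len(calls)


fake_now = [0.0]
cached = ExpiringValue(render_heavytext, expiretime=30, clock=lambda: fake_now[0])

first = cached.get()     # nothing cached yet: the body executes
fake_now[0] = 29.0
second = cached.get()    # still inside the 30-second window: cached value
fake_now[0] = 30.0
third = cached.get()     # window elapsed: the body executes again
```

The fake clock is a design choice for the example only; it makes the 30-second window observable without sleeping.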

Note that the parameter cache_type is set to 'dbm', which indicates that dbm-based caching is used. This is the default setting when a data_dir parameter is configured with the Myghty application.

For components that contain a %filter section, the result of filtering is stored in the cache as well. This allows the cache to be useful in limiting the execution of a process-intensive or time-consuming filtering function.

When a component is recompiled, its cache contents are automatically expired, so that the cache can be refreshed with the value returned by the newly modified component. This means it is safe to cache a component whose output never changes with no expire time at all; such a component executes only once per compilation and never again for the life of the cache.


Programmatic Interface

The traditional caching interface looks like this:

<%init>
    def create_data():
        return get_some_data()

    cache = m.get_cache()
    data = cache.get_value('mydata', type='memory', 
        createfunc=create_data, expiretime=60)
</%init>

Here, "mydata" is the key under which the data is stored, the cache type is in-memory only, the create_data() function creates the initial value of 'mydata' and regenerates it when it has expired, and the expire time is 60 seconds.

The creation function argument is optional, and the cache can be populated externally as well:

<%init>

    cache = m.get_cache()
    if not cache.has_key('mydata'):
        cache.set_value('mydata', get_some_data(), expiretime=60)

    data = cache.get_value('mydata')

</%init>

This is a more familiar way to check a dictionary for a value and set it. However, the previous "creation function" methodology has a significant advantage, in that it allows the cache mechanism to execute the function in the context of a global "busy" lock, which prevents other processes and threads from executing the same function at the same time, and instead forces them to retrieve the previous expired value until the new value is completed. If no previous value exists, they all wait for the single process/thread to create the new value. For a creation function that is slow or uses a lot of resources, this limits those resources to only one concurrent usage at a time, and once a cache value exists, only one request will experience any slowness upon recreation.
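The "busy" lock idea can be sketched with the standard threading module. This is a simplified, single-process illustration, not Myghty's implementation: BusyLockValue, expire() and slow_create are hypothetical names, and the real mechanism also coordinates across processes, which this sketch does not attempt.

```python
import threading


class BusyLockValue:
    """Hypothetical sketch: one regenerating caller, stale value for everyone else."""

    def __init__(self, createfunc):
        self.createfunc = createfunc
        self.busy = threading.Lock()
        self.has_value = False
        self.value = None
        self.expired = True

    def expire(self):
        self.expired = True

    def get(self):
        if self.has_value and not self.expired:
            return self.value
        if self.has_value:
            # A previous value exists: if someone else is already
            # regenerating, hand back the stale value instead of waiting.
            if not self.busy.acquire(blocking=False):
                return self.value
        else:
            # No previous value at all: every caller waits for the creator.
            self.busy.acquire()
        try:
            if not self.has_value or self.expired:
                self.value = self.createfunc()
                self.has_value = True
                self.expired = False
            return self.value
        finally:
            self.busy.release()


started = threading.Event()
release = threading.Event()
versions = []


def slow_create():
    versions.append(len(versions) + 1)
    if len(versions) > 1:
        started.set()              # signal that regeneration is in progress
        release.wait(timeout=5)    # simulate a slow regeneration
    return "v%d" % len(versions)


cache = BusyLockValue(slow_create)
initial = cache.get()              # creates the first value
cache.expire()

results = {}
worker = threading.Thread(target=lambda: results.update(regen=cache.get()))
worker.start()
started.wait(timeout=5)
during = cache.get()               # worker holds the busy lock: stale value
release.set()
worker.join()
after = cache.get()                # regeneration finished: new value
```

The events exist only to make the interleaving deterministic for the demonstration; a real creation function would simply be slow.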

To programmatically cache the output text of a component, use the m.cache_self() method on the request, which is a reentrant component-calling method:

<%init>
    if m.cache_self(key="mykey"):
        return
</%init>

# rest of component

For an uncached component, the cache_self method executes the current component a second time. During that second execution, when the cache_self line is encountered again, it returns false and allows the component to complete its execution. The return value and output are cached after being sent through any output filters present. Control then returns to the initial cache_self call, which returns true and delivers the component's output and optionally its return value. Filtering is disabled in the outer execution, since it has already occurred within the caching step. For an already cached component, cache_self simply returns true and delivers the component output.
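The reentrant flow can be imitated with a small stand-in for the request object. Everything here is an assumption made for illustration: FakeRequest, run() and write() are hypothetical and do not mirror Myghty's request class, and filters and return values are left out.

```python
class FakeRequest:
    """Hypothetical stand-in for the request object `m`; not Myghty's class."""

    def __init__(self):
        self.store = {}             # key -> cached output text
        self.output = []
        self.in_cache_pass = False
        self.current = None

    def write(self, text):
        self.output.append(text)

    def run(self, component):
        self.current = component
        self.output = []
        component(self)
        return "".join(self.output)

    def cache_self(self, key):
        if self.in_cache_pass:
            # Nested (second) execution: let the component run in full.
            return False
        if key not in self.store:
            outer, self.output = self.output, []
            self.in_cache_pass = True
            try:
                self.current(self)  # re-enter the current component
                self.store[key] = "".join(self.output)
            finally:
                self.in_cache_pass = False
                self.output = outer
        # Deliver the cached output to the outer execution.
        self.write(self.store[key])
        return True


executions = []


def component(m):
    if m.cache_self(key="mykey"):
        return
    # rest of component: only reached during the nested execution
    executions.append("body")
    m.write("expensive output")


m = FakeRequest()
first = m.run(component)    # uncached: body runs once, inside cache_self
second = m.run(component)   # cached: body is skipped entirely
```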

To get the component's return value via cache_self:

<%init>
    ret = Value()
    if m.cache_self(key="mykey", retval = ret):
        return ret()

    # rest of component
    return 3 + 5
</%init>

A Value object is used here to pass a return value back through a method parameter. The value it holds is simply the cached return value of the component.
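The idea of passing a result back through a parameter can be shown with a small callable holder. This is a sketch of the pattern only: ValueHolder, assign() and fetch_cached are hypothetical names, and Myghty's own Value class may be implemented differently.

```python
class ValueHolder:
    """Hypothetical holder: the callee fills it, the caller reads it by calling it."""

    def __init__(self, value=None):
        self._value = value

    def assign(self, value):
        self._value = value

    def __call__(self):
        return self._value


def fetch_cached(retval):
    # The function's own return value stays a plain hit/miss flag,
    # so the cached result travels back through the holder instead.
    retval.assign(3 + 5)
    return True


ret = ValueHolder()
hit = fetch_cached(retval=ret)
result = ret()
```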

Generally, the %flags method of caching a component's output and return value is a lot easier than the programmatic interface. The main advantage of the programmatic interface is that the key can be decided at runtime, for example based on component arguments, and sent as the "key" argument. The same applies if any of the other properties of the cache are to be determined at run time rather than compile time.
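One possible way to derive a key from component arguments at runtime is sketched below. The helper make_cache_key and the argument names are hypothetical; the resulting string is what would be passed as the "key" argument.

```python
def make_cache_key(prefix, **args):
    """Build a deterministic key from keyword arguments (hypothetical helper)."""
    # Sorting the names makes the key independent of argument order.
    parts = ["%s=%r" % (name, args[name]) for name in sorted(args)]
    return prefix + ":" + ",".join(parts)


key_a = make_cache_key("scores", league="nba", day="2010-07-07")
key_b = make_cache_key("scores", day="2010-07-07", league="nba")
key_c = make_cache_key("scores", league="nhl", day="2010-07-07")
```

Here key_a and key_b are identical, while key_c differs, so each distinct set of arguments gets its own cache entry.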


More on Caching

The cached information may be shared within the scope of one process or across multiple processes. Synchronization mechanisms are used to ensure that the regeneration is only called by one thread of one process at a time, returning the expired value to other processes and threads while the regeneration is occurring. This maximizes performance even for a very slow data-regeneration mechanism. In the case of a non-memory-based cache, an external process can also access the same cache.

Note that Myghty only includes thread-scoped synchronization for the Windows platform (contributors feel free to contribute a Win32 file locking scheme). The "file" and "dbm" cache methodologies therefore may not be entirely synchronized across multiple processes on Windows. This only matters if multiple servers are running against the same cache, since Windows does not have any forking capability and therefore an Apache server or similar uses only threads.

Caching has an assortment of container methodologies, such as MemoryContainer and DBMContainer, and provides a base Container class that can be subclassed to add new methodologies. A single component's cache can have containers of any number of different types and configurations.
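The general shape of "one base class, several storage methodologies, mixed within one namespace" might look like the sketch below. The classes SketchContainer, InMemoryContainer and JsonFileContainer and their method names are invented for illustration; they are not the interface of myghty.container.Container.

```python
import json
import os
import tempfile


class SketchContainer:
    """Illustrative base class; NOT the interface of myghty.container.Container."""

    def set_value(self, value):
        raise NotImplementedError

    def get_value(self):
        raise NotImplementedError


class InMemoryContainer(SketchContainer):
    """Keeps the value in a plain attribute."""

    def __init__(self):
        self._value = None

    def set_value(self, value):
        self._value = value

    def get_value(self):
        return self._value


class JsonFileContainer(SketchContainer):
    """Keeps the value in a JSON file so it survives the process."""

    def __init__(self, path):
        self._path = path

    def set_value(self, value):
        with open(self._path, "w") as handle:
            json.dump(value, handle)

    def get_value(self):
        with open(self._path) as handle:
            return json.load(handle)


# One namespace, two keys, two different container types.
namespace = {
    "headlines": InMemoryContainer(),
    "scores": JsonFileContainer(os.path.join(tempfile.mkdtemp(), "scores.json")),
}
namespace["headlines"].set_value(["a", "b"])
namespace["scores"].set_value({"home": 3, "away": 5})
```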

Caching of the URI resolution step can also be done to improve performance. See use_static_source for more information on using the URICache.


Cache Options

Caching options are all specified as Myghty configuration parameters in the form cache_XXXX, to identify them as options being sent to the Cache object. When calling the m.get_cache() method, parameters may be specified with or without the cache_ prefix; the prefix is stripped off. While some cache options apply to the Cache object itself, others apply to specific forms of the Container class, the two current classes being MemoryContainer and DBMContainer.
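The prefix handling can be pictured as a simple normalization step. The function below is a hypothetical sketch of that idea, not the code Myghty uses internally.

```python
def normalize_cache_options(**params):
    """Strip a leading 'cache_' from option names (hypothetical helper)."""
    prefix = "cache_"
    options = {}
    for name, value in params.items():
        if name.startswith(prefix):
            name = name[len(prefix):]
        options[name] = value
    return options


with_prefix = normalize_cache_options(cache_type="dbm", cache_data_dir="/tmp/cache")
without_prefix = normalize_cache_options(type="dbm", data_dir="/tmp/cache")
```

Both calls produce the same dictionary, which is why either spelling is accepted.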

The full list of current options is as follows:

cache_container_class (class object)	default: None used by: Cache
This is a class object which is expected to be a subclass of myghty.container.Container, which will be used to provide containment services. This option need only be used in the case of a user-defined container class that is not built into the static list of options in the Cache class. To use one of the built in container classes, use cache_type instead.
cache_data_dir (string)	default: same as Interpreter data_dir used by: DBMContainer
This is the data directory to be used by the DBMContainer (file-based cache) to store its DBM files as well as its lockfiles. It is set by default to be the same as the data_dir parameter for the Myghty application. As it creates its own subdirectories for its files (as does Interpreter), the files are kept separate from Myghty compiled pages.
cache_dbm_dir (string)	default: cache_data_dir + '/container_dbm' used by: DBMContainer
This value indicates the directory in which to create the dbm files used by DBMContainer; if not present, defaults to a subdirectory beneath cache_data_dir.
cache_dbmmodule (module)	default: anydbm used by: DBMContainer
DBMContainer uses dbm files via the Python built-in anydbm module, which searches for a platform specific dbm module to use. Any Python dbm module can be specified here in place of it. To specify this option under mod_python as an Apache configuration directive, use this format: PythonOption MyghtyCacheDbmmodule "__import__('gdbm')"
cache_debug_file (file object)	default: None used by: Cache
If pointing to a Python file object, container operations performed by the caching system will be logged, allowing the viewing of how often data is being refreshed as well as how many concurrent threads and processes are hitting various containers. When running with ApacheHandler or CGIHandler, this parameter can be set to the standard Apache log via the parameter log_cache_operations.
cache_lock_dir (string)	default: cache_data_dir + '/container_dbm_lock' used by: DBMContainer
This value indicates the directory in which to create the lock files used by DBMContainer; if not present, defaults to a subdirectory beneath cache_data_dir.
cache_url (string)	default: None used by: MemcachedNamespaceManager
The memcached URL to connect to for memcached usage, e.g. "localhost:11211".
cache_type (string)	default: dbm or memory used by: Cache
This is the type of container being used. Current options are file, dbm, memory, and ext:memcached. This option defaults to dbm if a data_dir option is present in the current application, else uses memory.
log_cache_operations (boolean)	default: False used by: ApacheHandler or CGIHandler
Sets the Cache cache_debug_file argument to point to the Apache error log or standard error output. See cache_debug_file.
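The defaulting rules stated in the list above can be summarized in a short function. The function name and the use of a plain dictionary for configuration are assumptions made for the example; only the default values themselves come from the option descriptions.

```python
def resolve_cache_defaults(config):
    """Fill in cache defaults as described in the option list (hypothetical helper)."""
    resolved = dict(config)
    data_dir = config.get("data_dir")

    # cache_data_dir falls back to the application's data_dir.
    cache_data_dir = resolved.setdefault("cache_data_dir", data_dir)

    # cache_type is 'dbm' when a data_dir is configured, else 'memory'.
    resolved.setdefault("cache_type", "dbm" if data_dir else "memory")

    if cache_data_dir is not None:
        resolved.setdefault("cache_dbm_dir", cache_data_dir + "/container_dbm")
        resolved.setdefault("cache_lock_dir", cache_data_dir + "/container_dbm_lock")
    return resolved


with_dir = resolve_cache_defaults({"data_dir": "/var/myghty"})
without_dir = resolve_cache_defaults({})
explicit = resolve_cache_defaults({"data_dir": "/var/myghty", "cache_type": "memory"})
```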


Cache Types

'dbm' - uses the anydbm module combined with cPickle to store data.
'file' - uses the cPickle module combined with regular file access to store data. This method may be faster than 'dbm' if the entire contents of the file are retrieved often, whereas dbm is faster for pulling a single key out of a larger set of data.
'memory' - uses a regular Python dictionary. Speed is the fastest, but the cache is not usable across processes nor is it persistent across server restarts. It also has the highest utilization of RAM.
'ext:memcached' - uses memcached for caching and requires the Python memcached module to be installed.
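The combination of a dbm file with pickling, as used by the 'dbm' type, can be tried out with the standard library. This sketch targets Python 3, where the anydbm and cPickle modules named above are called dbm and pickle; the helper functions and the file name are hypothetical and no locking is performed.

```python
import dbm
import os
import pickle
import tempfile


def dbm_set(path, key, value):
    """Pickle `value` and store it under `key` in the dbm file at `path`."""
    db = dbm.open(path, "c")    # "c": create the file if it does not exist
    try:
        db[key.encode("utf-8")] = pickle.dumps(value)
    finally:
        db.close()


def dbm_get(path, key):
    """Load and unpickle the value stored under `key`."""
    db = dbm.open(path, "c")
    try:
        return pickle.loads(db[key.encode("utf-8")])
    finally:
        db.close()


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "heavytext_cache")

dbm_set(path, "mydata", {"scores": [3, 5]})
dbm_set(path, "other", "unrelated value")
loaded = dbm_get(path, "mydata")   # pulls a single key out of the file
```

Fetching one key without reading the others is the access pattern for which the text above describes dbm as the faster choice.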
