WordPress to Python
===================

Turns a WordPress export XML file into a Python dictionary.

.. note::
    taking coffee for donations :)


Usage
-----

    >>> from web2py_utils import wordpress2py

Retrieve a Python dict that represents the WordPress database::

    >>> data = wordpress2py.word2py(open('/path/to/wordpress.2009-11-30.xml', 'r'))

Insert the data into a web2py DAL database using a schema::

    >>> ids_inserted = wordpress2py.schema_migrate(db, schema, '/path/to/wordpress.2009-11-30.xml')

Alternatively, use the data dictionary to write a custom migration function
(see Example Custom Migration below).

The dictionary layout is documented in the ``word2py`` function and in the
Word 2 Py section below.
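
As a starting point for a custom migration, the sketch below walks a
dictionary in the documented layout. To stay self-contained it uses a small
hand-built ``sample`` dict (hypothetical data) instead of calling
``word2py``; with a real export you would pass the return value of
``word2py`` to the hypothetical ``summarize`` helper instead.

.. code-block:: python

    from collections import Counter


    def summarize(data):
        """Count posts by type and comments per post title.

        ``data`` is assumed to follow the word2py layout: a dict with a
        ``posts`` list whose items carry ``type``, ``title`` and ``comments``.
        """
        post_types = Counter(post['type'] for post in data['posts'])
        comment_counts = {
            post['title']: len(post.get('comments', []))
            for post in data['posts']
        }
        return post_types, comment_counts


    # Hypothetical sample shaped like the documented dictionary.
    sample = {
        'title': 'My Blog',
        'categories': [{'name': 'News', 'slug': 'news', 'parent': ''}],
        'tags': [],
        'posts': [
            {'title': 'Hello', 'type': 'post', 'categories': ['news'],
             'tags': [], 'comments': [{'author': 'Ann', 'content': 'Hi!'}]},
            {'title': 'About', 'type': 'page', 'categories': [],
             'tags': [], 'comments': []},
        ],
    }

    types, comments = summarize(sample)
    print(types['post'], types['page'])  # 1 1
    print(comments)                      # {'Hello': 1, 'About': 0}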

Schema Key Patterns
-------------------

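A schema is a dict keyed by data table. Each value either maps data columns
directly to ``'<DAL TABLE>/<DAL FIELD>'`` strings, or maps a snippet of
Python code to a nested mapping of the same kind, which is used only when
the code sets ``doit`` to ``True`` (see EXEC ENVIRONMENT). The two
``<DATA TABLE>`` entries below show these two alternatives:

.. code-block:: python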

    {
        '<DATA TABLE>': {
            '<DATA COLUMN>': '<DAL TABLE>/<DAL FIELD>',
        },
        '<DATA TABLE>': {
            '<PYTHON EXEC doit>': {
                '<DATA COLUMN>': '<DAL TABLE>/<DAL FIELD>',
            }
        }
    }

Schema Options
--------------

Valid ``<DATA TABLE>`` keys::

    categories
    tags
    posts
    comments
    post_categories
    post_tags

Valid ``<DATA COLUMN>`` keys for each data table, including the relation
columns used by the example schemas::

    categories:
        name
        slug
        parent
    tags:
        name
        slug
    posts:
        id
        title
        slug
        status
        type
        link
        pub_date
        description
        content
        post_date
        post_date_gmt
        categories (list of category slugs)
        tags (list of tag slugs)
    comments:
        id
        id_post
        author
        author_email
        author_url
        author_ip
        date
        date_gmt
        content
        approved
    post_categories:
        id_category
        id_post
    post_tags:
        id_tag
        id_post
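
Because a typo in a schema key is easy to miss, a small check against the
option lists above can help before running a migration. This is a
hypothetical helper, not part of ``wordpress2py``: ``VALID_COLUMNS`` copies
the lists above (including the relation columns ``id_post``, ``id_category``
and ``id_tag`` used by the example schemas), and any key whose value is a
dict is treated as a code key whose nested mapping is checked recursively.

.. code-block:: python

    # Hypothetical copy of the documented options; not exported by the module.
    VALID_COLUMNS = {
        'categories': {'name', 'slug', 'parent'},
        'tags': {'name', 'slug'},
        'posts': {'id', 'title', 'slug', 'status', 'type', 'link', 'pub_date',
                  'description', 'content', 'post_date', 'post_date_gmt',
                  'categories', 'tags'},
        'comments': {'id', 'id_post', 'author', 'author_email', 'author_url',
                     'author_ip', 'date', 'date_gmt', 'content', 'approved'},
        'post_categories': {'id_category', 'id_post'},
        'post_tags': {'id_tag', 'id_post'},
    }


    def check_schema(schema):
        """Return a list of problems found in a schema dict."""
        problems = []

        def check_mapping(table, mapping):
            for key, value in mapping.items():
                if isinstance(value, dict):
                    # Code key: its value is a nested mapping of the same kind.
                    check_mapping(table, value)
                elif key not in VALID_COLUMNS[table]:
                    problems.append('unknown column %r in %r' % (key, table))
                elif value.count('/') != 1:
                    problems.append('bad target %r for %s.%s' % (value, table, key))

        for table, mapping in schema.items():
            if table not in VALID_COLUMNS:
                problems.append('unknown data table %r' % table)
            else:
                check_mapping(table, mapping)
        return problems


    schema = {
        'posts': {'title': 'post/title', 'body': 'post/body'},
        'pages': {'title': 'page/title'},
    }
    print(check_schema(schema))
    # ["unknown column 'body' in 'posts'", "unknown data table 'pages'"]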

EXEC ENVIRONMENT
----------------

.. warning::
    Each code key must be valid Python code, and it must set a variable
    named ``doit`` to ``True`` or ``False``. If ``doit`` is ``True``, the
    records in the corresponding nested dict are inserted into the
    database. Nested dicts can contain further code keys, so this can be
    recursive.

    The code has access to ``data``, a dict holding the WordPress record
    being processed. Its keys match the ``<DATA COLUMN>`` options listed
    above, for example ``data['title']`` for posts or ``data['parent']``
    for categories.
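
The rules above can be illustrated with a small hypothetical ``resolve``
function. It is a sketch of the described behaviour, not the module's own
code: it runs each code key with ``exec`` in a namespace containing
``data``, reads back ``doit``, and descends into the nested mapping only
when ``doit`` is true; plain column keys are split on ``/`` into DAL table
and field. ``posts_mapping`` is the ``posts`` entry of
``default_mengu_blog_schema`` shown in Example Schemas.

.. code-block:: python

    def resolve(mapping, data):
        """Collect {'<DAL TABLE>': {'<DAL FIELD>': value}} for one record."""
        inserts = {}
        for key, value in mapping.items():
            if isinstance(value, dict):
                # Code key: run it with ``data`` available, then read ``doit``.
                namespace = {'data': data}
                exec(key, namespace)
                if namespace.get('doit'):
                    # Nested mappings may contain further code keys.
                    for table, fields in resolve(value, data).items():
                        inserts.setdefault(table, {}).update(fields)
            elif key in data:
                table, field = value.split('/', 1)
                inserts.setdefault(table, {})[field] = data[key]
        return inserts


    posts_mapping = {
        'doit = True if data["type"] == "post" else False': {
            'title': 'post/title',
            'content': 'post/body',
            'post_date': 'post/dateline',
        },
        'doit = True if data["type"] == "page" else False': {
            'title': 'page/title',
            'content': 'page/content',
        },
    }

    page = {'type': 'page', 'title': 'About', 'content': 'Hi.',
            'post_date': '2009-11-30 12:00:00'}
    print(resolve(posts_mapping, page))
    # {'page': {'title': 'About', 'content': 'Hi.'}}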

Example Schemas
---------------

The module comes with two example schemas.

wordpress2py.default_mengu_blog_schema::

    {
        'categories': {
            'name': 'category/title',
        },
        'posts': {
            'doit = True if data["type"] == "post" else False': {
                'title': 'post/title',
                'content': 'post/body',
                'post_date': 'post/dateline',
            },
            'doit = True if data["type"] == "page" else False': {
                'title': 'page/title',
                'content': 'page/content',
            },
        },
        'comments': {
            'id_post': 'comment/post_id',
            'author': 'comment/name',
            'author_email': 'comment/email',
            'content': 'comment/comment',
            'date': 'comment/dateline',
        },
        'post_categories': {
            'id_category': 'relations/category',
            'id_post': 'relations/post',
        },
    }

.. note::

    Take a good look at the ``doit`` keys: they show how to use the EXEC
    ENVIRONMENT effectively. Because each code key is evaluated for every
    record, you control which DAL table a record goes into. This is useful
    when different post types are stored in separate tables; here, posts
    go to ``post`` and pages go to ``page``.
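
A ``doit`` key can also filter records instead of only routing them. The
schema below is a sketch, not one of the module's shipped schemas: it
assumes WordPress marks published items with the status ``publish`` and
imports only published posts into the Mengu ``post`` table. ``doit`` is
assigned the boolean result directly, which is equivalent to the
``True if ... else False`` form used above.

.. code-block:: python

    # Hypothetical schema: only published posts are imported into the Mengu
    # ``post`` table; pages, drafts and other records are skipped.
    published_posts_schema = {
        'posts': {
            'doit = data["type"] == "post" and data["status"] == "publish"': {
                'title': 'post/title',
                'content': 'post/body',
                'post_date': 'post/dateline',
            },
        },
    }

    # With a web2py ``db`` that defines the ``post`` table, you would run:
    # wordpress2py.schema_migrate(db, published_posts_schema,
    #                             '/path/to/wordpress.2009-11-30.xml')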

wordpress2py.default_schema::

    {
        'categories': {
            'name': 'category/title',
            'parent': 'category/parent',
        },
        'tags': {
            'name': 'tag/title',
        },
        'posts': {
            'title': 'post/title',
            'slug': 'post/slug',
            'status': 'post/status',
            'type': 'post/type',
            'post_date': 'post/pub_date',
            'content': 'post/content',
        },
        'comments': {
            'id_post': 'comment/id_post',
            'author': 'comment/author',
            'author_email': 'comment/email',
            'author_url': 'comment/site',
            'date': 'comment/posted_on',
            'approved': 'comment/approved',
            'content': 'comment/content',
        },
        'post_categories': {
            'id_category': 'category_relations/category',
            'id_post': 'category_relations/post',
        },
        'post_tags': {
            'id_tag': 'tag_relations/tag',
            'id_post': 'tag_relations/post',
        },
    }

Example Custom Migration
------------------------

This is an example custom migration function that exports to the Mengu
blog schema.

It is included as a complete reference in case you have more complex
needs; for most migrations, a schema passed to ``schema_migrate`` is
flexible enough::

    from web2py_utils.wordpress2py import word2py


    def custom_migrate_to_mengu_database(db):
        with open('wordpress_export.xml', 'r') as f:
            data = word2py(f)

        # Map category slugs to new DAL ids; post['categories'] holds slugs.
        category_ids = {}
        for c in data['categories']:
            category_ids[c['slug']] = db.category.insert(title=c['name'])

        for post in data['posts']:
            if post['type'] == 'post':
                post_id = db.post.insert(
                    title=post['title'],
                    body=post['content'],
                    dateline=post['pub_date'],
                )

                # Link the post to each of its categories.
                for slug in post['categories']:
                    db.relations.insert(
                        post=post_id,
                        category=category_ids[slug],
                    )

                # Copy the post's comments.
                for c in post['comments']:
                    db.comment.insert(
                        post_id=post_id,
                        name=c['author'],
                        email=c['author_email'],
                        comment=c['content'],
                        dateline=c['date'],
                    )
            elif post['type'] == 'page':
                db.page.insert(
                    title=post['title'],
                    content=post['content'],
                )
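
To try a custom migration without writing to a real database, you can pass
in a stand-in object that records inserts. ``RecordingDB`` and
``RecordingTable`` below are hypothetical helpers, not part of web2py or
``wordpress2py``: they only mimic the ``db.<table>.insert(**fields)`` call
used above, returning per-table ids that start at 1, and they do not
validate table or field names.

.. code-block:: python

    import itertools
    from collections import defaultdict


    class RecordingTable:
        """Mimics ``db.<table>.insert(**fields)`` by logging the call."""

        def __init__(self, name, db):
            self.name = name
            self.db = db

        def insert(self, **fields):
            # Hand out ids per table, starting at 1, like a fresh database.
            new_id = next(self.db.counters[self.name])
            self.db.log.append((self.name, new_id, fields))
            return new_id


    class RecordingDB:
        """Hypothetical stand-in for a web2py DAL ``db`` during dry runs."""

        def __init__(self):
            self.log = []
            self.counters = defaultdict(lambda: itertools.count(1))

        def __getattr__(self, name):
            # Only called for unknown attributes, i.e. table names.
            return RecordingTable(name, self)


    db = RecordingDB()
    category_id = db.category.insert(title='News')
    post_id = db.post.insert(title='Hello', body='First post')
    db.relations.insert(post=post_id, category=category_id)
    print(db.log)
    # [('category', 1, {'title': 'News'}),
    #  ('post', 1, {'title': 'Hello', 'body': 'First post'}),
    #  ('relations', 1, {'post': 1, 'category': 1})]

Passing ``RecordingDB()`` to ``custom_migrate_to_mengu_database`` would
record every insert in ``db.log``, although the function still reads
``wordpress_export.xml`` from disk.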

Word 2 Py
---------

Requires ElementTree.

Returns a Python dictionary representing the WordPress blog. Some metadata
may be missing.

Content is ordered as it appears in the XML file.

Dict structure::

    # -> means a list, or array.

    db {
        title
        link
        description
        pub_date
        language
        categories ->
            name
            slug
            parent
            description (if available)
        tags ->
            name
            slug
        posts ->
            id
            title
            slug
            status
            type
            link
            pub_date
            description
            content
            post_date
            post_date_gmt
            categories -> flat array (category slugs)
            tags -> flat array (tag slugs)
            comments ->
                id
                author
                author_email
                author_url
                author_ip
                date
                date_gmt
                content
                approved
    }
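
The parsing step itself can be sketched with the standard library's
``xml.etree.ElementTree``. This hypothetical ``parse_posts`` helper is not
``word2py``'s actual code: it covers only a subset of the post fields listed
above, assumes the usual WXR element names (``wp:post_name`` for the slug,
``content:encoded`` for the body, ``category`` elements with ``domain`` and
``nicename`` attributes), and skips comments and channel metadata.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # The content namespace is standard RSS; wp: fields are matched by local
    # name so the sketch does not depend on a particular WXR version number.
    CONTENT_NS = '{http://purl.org/rss/1.0/modules/content/}'
    WP_FIELDS = {
        'post_id': 'id',
        'post_name': 'slug',
        'status': 'status',
        'post_type': 'type',
        'post_date': 'post_date',
        'post_date_gmt': 'post_date_gmt',
    }


    def local_name(tag):
        """Strip a '{namespace}' prefix from an element tag."""
        return tag.rsplit('}', 1)[-1]


    def parse_posts(xml_text):
        """Return a list of post dicts from a WordPress export string."""
        channel = ET.fromstring(xml_text).find('channel')
        posts = []
        for item in channel.findall('item'):
            post = {'title': item.findtext('title', ''),
                    'link': item.findtext('link', ''),
                    'content': item.findtext(CONTENT_NS + 'encoded', ''),
                    'categories': [], 'tags': []}
            for child in item:
                name = local_name(child.tag)
                if child.tag.startswith('{') and name in WP_FIELDS:
                    post[WP_FIELDS[name]] = child.text or ''
                elif child.tag == 'category' and child.get('nicename'):
                    # Older exports use domain="tag", newer ones "post_tag".
                    is_tag = child.get('domain') in ('tag', 'post_tag')
                    post['tags' if is_tag else 'categories'].append(
                        child.get('nicename'))
            posts.append(post)
        return posts


    SAMPLE = """<rss xmlns:content="http://purl.org/rss/1.0/modules/content/"
         xmlns:wp="http://wordpress.org/export/1.0/">
      <channel>
        <title>My Blog</title>
        <item>
          <title>Hello</title>
          <link>http://example.com/hello/</link>
          <content:encoded><![CDATA[<p>First post</p>]]></content:encoded>
          <wp:post_id>1</wp:post_id>
          <wp:post_date>2009-11-30 12:00:00</wp:post_date>
          <wp:post_name>hello</wp:post_name>
          <wp:status>publish</wp:status>
          <wp:post_type>post</wp:post_type>
          <category domain="category" nicename="news">News</category>
          <category domain="post_tag" nicename="intro">Intro</category>
        </item>
      </channel>
    </rss>"""

    posts = parse_posts(SAMPLE)
    print(posts[0]['slug'], posts[0]['categories'], posts[0]['tags'])
    # hello ['news'] ['intro']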