Wordpress to Python

Turns a WordPress export XML file into a Python dictionary.

Note

taking coffee for donations :)

Usage

>>> from web2py_utils import wordpress2py

Retrieve a Python dict that represents the WordPress database:

>>> data = wordpress2py.word2py(open('/path/to/wordpress.2009-11-30.xml', 'r'))

Insert data into the web2py DAL using a schema:

>>> ids_inserted = wordpress2py.schema_migrate(db, schema, '/path/to/wordpress.2009-11-30.xml')

Use the data dictionary to create a custom migration function.

Dictionary layout is documented in the word2py function.

Schema Key Patterns

{
    '<DATA TABLE>': {
        '<DATA COLUMN>': '<DAL TABLE>/<DAL FIELD>',
    },
    '<DATA TABLE>': {
        '<PYTHON EXEC doit>': {
            '<DATA COLUMN>': '<DAL TABLE>/<DAL FIELD>',
        }
    }
}
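As a rough illustration of the flat pattern above, the sketch below splits each '<DAL TABLE>/<DAL FIELD>' target and groups one record's values by destination table. The helper name map_record and the sample record are hypothetical. This is not the module's own implementation, only one way such a mapping might be read.

```python
def map_record(record, column_map):
    """Group a record's values by DAL table, following a flat column map.

    Hypothetical helper for illustration; not part of wordpress2py.
    """
    rows = {}
    for column, target in column_map.items():
        if column not in record:
            continue  # skip columns the record does not provide
        # A target has the form '<DAL TABLE>/<DAL FIELD>'.
        table, field = target.split('/', 1)
        rows.setdefault(table, {})[field] = record[column]
    return rows


# Made-up category record and a flat mapping in the pattern shown above.
category = {'name': 'News', 'slug': 'news', 'parent': ''}
column_map = {'name': 'category/title', 'parent': 'category/parent'}

rows = map_record(category, column_map)
print(rows)
```

Each inner dict in the result holds the field values that would go to one DAL table, which is the shape a call such as db.category.insert(**fields) expects.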

Schema Options

'<DATA TABLE>': {
    'categories',
    'tags',
    'posts',
    'comments',
    'post_categories',
    'post_tags',
}

'<DATA COLUMN>': {
    'categories' ->
        name
        slug
        parent
    'tags' ->
        name
        slug
    'posts' ->
        id
        title
        slug
        status
        type
        link
        pub_date
        description
        content
        post_date
        post_date_gmt
        categories -> list of strings (categories slug)
        tags -> list of strings (tags slug)
    'comments' ->
        id
        author
        author_email
        author_url
        author_ip
        date
        date_gmt
        content
        approved
}

EXEC ENVIRONMENT

Warning

IMPORTANT

This code must be valid Python code. It must set a variable named doit to either True or False. If True, the records in the corresponding dict are inserted into the database. This can be recursive.

The exec environment has access to these variables:

data <dict> (this is the data from WordPress)
    data['title']   # sample for posts
    data['parent']  # sample for categories

This data matches the options for <DATA COLUMN>.
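A minimal sketch of how such a key could be evaluated, assuming the code is run with exec in a namespace that contains only data. The function name evaluate_doit is hypothetical, and the module may set up its environment differently.

```python
def evaluate_doit(code, data):
    """Run a schema key as Python code and report the value it gives doit.

    Hypothetical helper for illustration; not part of wordpress2py.
    """
    namespace = {'data': data}   # the only variable the code is given
    exec(code, namespace)        # the code is expected to assign doit
    return bool(namespace.get('doit', False))


key = 'doit = True if data["type"] == "post" else False'

is_post = evaluate_doit(key, {'type': 'post'})
is_page = evaluate_doit(key, {'type': 'page'})
print(is_post, is_page)
```

Because exec runs whatever code the key contains, a schema should only come from a source you trust.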

Example Schemas

The module comes with two example schemas:

wordpress2py.default_mengu_blog_schema:

{
    'categories': {
        'name': 'category/title',
    },
    'posts': {
        'doit = True if data["type"] == "post" else False': {
            'title': 'post/title',
            'content': 'post/body',
            'post_date': 'post/dateline',
        },
        'doit = True if data["type"] == "page" else False': {
            'title': 'page/title',
            'content': 'page/content',
        },
    },
    'comments': {
        'id_post': 'comment/post_id',
        'author': 'comment/name',
        'author_email': 'comment/email',
        'content': 'comment/comment',
        'date': 'comment/dateline',
    },
    'post_categories': {
        'id_category': 'relations/category',
        'id_post': 'relations/post',
    },
}

Note

Take a good look at the 'doit' keys; this is how to use the exec environment effectively. It gives you some control over how your data goes into the database, for example when you have separate tables for different post types.
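To make the branching concrete, here is a sketch that picks the column map applying to a single record. It assumes that any key whose value is a dict is 'doit' code, and that nested dicts are resolved by recursion. The name select_columns and the sample record are hypothetical, and the module's own logic may differ.

```python
def select_columns(table_schema, data):
    """Return the flat column map that applies to one record.

    Hypothetical helper for illustration; not part of wordpress2py.
    """
    selected = {}
    for key, value in table_schema.items():
        if isinstance(value, dict):
            # The key is code that must set doit for this record.
            namespace = {'data': data}
            exec(key, namespace)
            if namespace.get('doit'):
                # Nested dicts may contain further doit keys.
                selected.update(select_columns(value, data))
        else:
            selected[key] = value
    return selected


# The 'posts' entry from the mengu blog schema.
posts_schema = {
    'doit = True if data["type"] == "post" else False': {
        'title': 'post/title',
        'content': 'post/body',
        'post_date': 'post/dateline',
    },
    'doit = True if data["type"] == "page" else False': {
        'title': 'page/title',
        'content': 'page/content',
    },
}

# Made-up record of type 'page'.
page = {'type': 'page', 'title': 'About', 'content': 'Hello'}
page_columns = select_columns(posts_schema, page)
print(page_columns)
```

Only the branch whose code sets doit to True contributes columns, so a page record is routed to the page table and a post record to the post table.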

wordpress2py.default_schema:

{
    'categories': {
        'name': 'category/title',
        'parent': 'category/parent',
    },
    'tags': {
        'name': 'tag/title',
    },
    'posts': {
        'title': 'post/title',
        'slug': 'post/slug',
        'status': 'post/status',
        'type': 'post/type',
        'post_date': 'post/pub_date',
        'content': 'post/content',
    },
    'comments': {
        'id_post': 'comment/id_post',
        'author': 'comment/author',
        'author_email': 'comment/email',
        'author_url': 'comment/site',
        'date': 'comment/posted_on',
        'approved': 'comment/approved',
        'content': 'comment/content',
    },
    'post_categories': {
        'id_category': 'category_relations/category',
        'id_post': 'category_relations/post',
    },
    'post_tags': {
        'id_tag': 'tag_relations/tag',
        'id_post': 'tag_relations/post',
    },
}

Example Custom Migration

This is an example custom migration script to export to mengu blog.

It is here only as a full reference in case you have more complex needs. In most cases the schema is enough and is very versatile:

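from web2py_utils import wordpress2py

def custom_migrate_to_mengu_database(db):
    # Parse the export into the dictionary described under "Word 2 Py".
    with open('wordpress_export.xml', 'r') as export_file:
        data = wordpress2py.word2py(export_file)

    # Category slug -> id of the inserted row. Posts reference their
    # categories by slug, so the lookup is keyed by slug rather than name.
    category_ids = {}

    for c in data['categories']:
        category_ids[c['slug']] = db.category.insert(title=c['name'])

    for post in data['posts']:
        if post['type'] == 'post':
            post_id = db.post.insert(
                title = post['title'],
                body = post['content'],
                dateline = post['pub_date'],
            )

            # Link the post to each of its categories.
            for c in post['categories']:
                db.relations.insert(
                    post = post_id,
                    category = category_ids[c]
                )

            # Comments are nested under the post they belong to.
            for c in post['comments']:
                db.comment.insert(
                    post_id = post_id,
                    name = c['author'],
                    email = c['author_email'],
                    comment = c['content'],
                    dateline = c['date']
                )
        elif post['type'] == 'page':
            # Pages go into their own table and carry no relations.
            db.page.insert(
                title = post['title'],
                content = post['content']
            )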

Word 2 Py

Requires elementtree

Returns a Python dictionary representing the WordPress blog. Certain metadata may be missing.

Content is ordered as it appears in the XML file.

Dict structure:

# -> means a list, or array.

db {
    title
    link
    description
    pub_date
    language
    categories ->
        name
        slug
        parent
        description (if available)
    tags ->
        name
        slug
    posts ->
        id
        title
        slug
        status
        type
        link
        pub_date
        description
        content
        post_date
        post_date_gmt
        categories -> flat array
        tags -> flat array
        comments ->
            id
            author
            author_email
            author_url
            author_ip
            date
            date_gmt
            content
            approved
}
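The sketch below walks a dictionary with the layout above. The sample is hand-built and its values are made up (including the choice of strings for ids); only the keys follow the documented structure. The helper names are hypothetical.

```python
# Hand-built sample following the documented layout; values are made up.
blog = {
    'title': 'Example Blog',
    'categories': [{'name': 'News', 'slug': 'news', 'parent': ''}],
    'tags': [{'name': 'Python', 'slug': 'python'}],
    'posts': [
        {
            'id': '1', 'title': 'Hello', 'type': 'post',
            'categories': ['news'], 'tags': ['python'],
            'comments': [
                {'id': '10', 'author': 'Ann', 'content': 'Hi'},
                {'id': '11', 'author': 'Bob', 'content': 'Nice post'},
            ],
        },
        {
            'id': '2', 'title': 'About', 'type': 'page',
            'categories': [], 'tags': [], 'comments': [],
        },
    ],
}


def comment_counts(data):
    """Map each post title to the number of comments nested under it."""
    return dict((post['title'], len(post['comments'])) for post in data['posts'])


def posts_in_category(data, slug):
    """List the titles of posts whose flat category list contains slug."""
    return [post['title'] for post in data['posts'] if slug in post['categories']]


counts = comment_counts(blog)
news_posts = posts_in_category(blog, 'news')
print(counts)
print(news_posts)
```

Comments live inside their post, while categories and tags appear on a post only as flat lists of slugs, so looking up a category's name means going back to the top-level categories list.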
