Wordpress to Python
===================

Turns a WordPress export XML file into a Python dictionary.

.. note::

    taking coffee for donations :)

Usage
-----

>>> from web2py_utils import wordpress2py

Retrieve a Python dict that represents the WordPress database::

    >>> data = wordpress2py.word2py(open('/path/to/wordpress.2009-11-30.xml', 'r'))

Insert data into the web2py DAL using a schema

>>> ids_inserted = wordpress2py.schema_migrate(db, schema, '/path/to/wordpress.2009-11-30.xml')

Use the data dictionary to create a custom migration function.
The dictionary layout is documented in the word2py function.

Schema Key Patterns
-------------------

.. code-block:: python

    {
        '': {
            '': '/',
        },
        '': {
            '': {
                '': '/',
            }
        }
    }

Schema Options
--------------

.. code-block:: python

    '': {
        'categories',
        'tags',
        'posts',
        'comments',
        'post_categories',
        'post_tags',
    }

    '': {
        'categories' ->
            name
            slug
            parent
        'tags' ->
            name
            slug
        'posts' ->
            id
            title
            slug
            status
            type
            link
            pub_date
            description
            content
            post_date
            post_date_gmt
            categories -> list of strings (categories slug)
            tags -> list of strings (tags slug)
        'comments' ->
            id
            author
            author_email
            author_url
            author_ip
            date
            date_gmt
            content
            approved
    }

EXEC ENVIRONMENT
----------------

.. warning::

    IMPORTANT

    THIS CODE MUST BE VALID PYTHON CODE

    IT MUST SET A VARIABLE NAMED doit TO EITHER TRUE OR FALSE

    IF TRUE, THE RECORDS IN THE CORRESPONDING DICT
    ARE INSERTED INTO THE DATABASE

    THIS CAN BE RECURSIVE

EXEC ENVIRONMENT HAS ACCESS TO THESE VARIABLES

data (this is the data from wordpress)::

    data['title']   # sample for posts
    data['parent']  # sample for categories

This data matches options for

Example Schemas
---------------

The module comes with two example schemas.

wordpress2py.default_mengu_blog_schema::

    {
        'categories': {
            'name': 'category/title',
        },
        'posts': {
            'doit = True if data["type"] == "post" else False': {
                'title': 'post/title',
                'content': 'post/body',
                'post_date': 'post/dateline',
            },
            'doit = True if data["type"] == "page" else False': {
                'title': 'page/title',
                'content': 'page/content',
            },
        },
        'comments': {
            'id_post': 'comment/post_id',
            'author': 'comment/name',
            'author_email': 'comment/email',
            'content': 'comment/comment',
            'date': 'comment/dateline',
        },
        'post_categories': {
            'id_category': 'relations/category',
            'id_post': 'relations/post',
        },
    }

.. note::

    Take a good look at the 'doit' keys. This is how to use
    EXEC ENVIRONMENT effectively. It gives you some control over
    how your data goes into the database, for example when you
    have multiple tables for different post types.
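The 'doit' keys can be pictured with a minimal sketch. This is not the module's actual implementation: the helper name ``should_insert`` is hypothetical, and the sketch assumes only what the EXEC ENVIRONMENT section states, namely that the key is executed as Python code with ``data`` available and that it must set ``doit``.

```python
def should_insert(condition_code, record):
    """Execute a schema condition string and return the value it gives doit.

    Hypothetical helper for illustration; not part of wordpress2py.
    """
    env = {'data': record}          # the only variable exposed to the code
    exec(condition_code, {}, env)   # the code is expected to assign doit
    return bool(env.get('doit', False))


# Made-up records shaped like entries of the 'posts' list.
post = {'type': 'post', 'title': 'Hello'}
page = {'type': 'page', 'title': 'About'}

condition = 'doit = True if data["type"] == "post" else False'
print(should_insert(condition, post))
print(should_insert(condition, page))
```

Because the key is executed as code, a schema should only ever come from a trusted source.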
wordpress2py.default_schema::

    {
        'categories': {
            'name': 'category/title',
            'parent': 'category/parent',
        },
        'tags': {
            'name': 'tag/title',
        },
        'posts': {
            'title': 'post/title',
            'slug': 'post/slug',
            'status': 'post/status',
            'type': 'post/type',
            'post_date': 'post/pub_date',
            'content': 'post/content',
        },
        'comments': {
            'id_post': 'comment/id_post',
            'author': 'comment/author',
            'author_email': 'comment/email',
            'author_url': 'comment/site',
            'date': 'comment/posted_on',
            'approved': 'comment/approved',
            'content': 'comment/content',
        },
        'post_categories': {
            'id_category': 'category_relations/category',
            'id_post': 'category_relations/post',
        },
        'post_tags': {
            'id_tag': 'tag_relations/tag',
            'id_post': 'tag_relations/post',
        },
    }

Example Custom Migration
------------------------

This is an example custom migration script to export to mengu blog.
It is here as a full reference in case you have more complex needs.
However, the schema approach works well and is very versatile::

    def custom_migrate_to_mengu_database(db):
        data = word2py(open('wordpress_export.xml', 'r'))

        category_ids = {}
        post_ids = {}
        comment_ids = {}

        # Insert categories first so that posts can be related to them.
        for c in data['categories']:
            category_ids[c['name']] = db.category.insert(title=c['name'])

        for post in data['posts']:
            if post['type'] == 'post':
                post_id = db.post.insert(
                    title = post['title'],
                    body = post['content'],
                    dateline = post['pub_date'],
                )
                # Link the post to each of its categories.
                for c in post['categories']:
                    db.relations.insert(
                        post = post_id,
                        category = category_ids[c]
                    )
                # Comments are nested under the post they belong to.
                for c in post['comments']:
                    comment_id = db.comment.insert(
                        post_id = post_id,
                        name = c['author'],
                        email = c['author_email'],
                        comment = c['content'],
                        dateline = c['date']
                    )
            elif post['type'] == 'page':
                # Pages go into their own table.
                post_id = db.page.insert(
                    title = post['title'],
                    content = post['content']
                )

Word 2 Py
---------

Requires elementtree.

Returns a Python dictionary representing the WordPress blog.
Certain metadata may be missing. Content is ordered according to
the arrangement of the data in the XML file.

Dict structure::

    # -> means a list, or array.
    db {
        title
        link
        description
        pub_date
        language
        categories ->
            name
            slug
            parent
            description (if available)
        tags ->
            name
            slug
        posts ->
            id
            title
            slug
            status
            type
            link
            pub_date
            description
            content
            post_date
            post_date_gmt
            categories -> flat array
            tags -> flat array
            comments ->
                id
                author
                author_email
                author_url
                author_ip
                date
                date_gmt
                content
                approved
    }
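As a rough illustration of walking this structure, the sketch below uses a hand-written sample dict rather than real ``word2py`` output. The helper name ``summarize`` and all sample values are hypothetical, and the sketch assumes that comments are nested under each post, as in the custom migration example.

```python
# Hand-written sample following the documented layout. A real dict would come
# from wordpress2py.word2py(...); every value below is made up.
sample = {
    'title': 'Example blog',
    'categories': [{'name': 'News', 'slug': 'news', 'parent': ''}],
    'tags': [{'name': 'Python', 'slug': 'python'}],
    'posts': [
        {
            'id': '1', 'title': 'Hello', 'type': 'post',
            'categories': ['news'], 'tags': ['python'],
            'comments': [{'id': '10', 'author': 'Ann', 'content': 'Hi'}],
        },
        {
            'id': '2', 'title': 'About', 'type': 'page',
            'categories': [], 'tags': [], 'comments': [],
        },
    ],
}


def summarize(blog):
    """Count posts per type and the total number of comments (hypothetical helper)."""
    counts = {}
    total_comments = 0
    for post in blog['posts']:
        counts[post['type']] = counts.get(post['type'], 0) + 1
        # Assumes comments are nested under the post; tolerate a missing key.
        total_comments += len(post.get('comments', []))
    return counts, total_comments


counts, total_comments = summarize(sample)
print(counts, total_comments)
```

A quick summary like this can be a useful sanity check before writing a schema or a custom migration, since it shows which post types the export actually contains.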