WordPress to Markdown Exporter

Update: I don't have much time to maintain this project, but I would really appreciate community help. If you are looking for an open source project to contribute to, this is a great opportunity. Pull requests are very much appreciated, both by me and by users migrating from WordPress.

A Python script that converts a WordPress XML dump into a set of plain text/Markdown files. It is intended for migration from WordPress to the public-static website generator, but could also be helpful as a general-purpose WordPress content processor.

Installation

The script can be installed with the following command:

pip install git+https://github.com/dreikanter/wp2md

This installs wp2md along with its dependencies.

Usage

Export the WordPress data to an XML file (Tools → Export → All content), then run the following command:

wp2md -d /export/path/ wordpress-dump.xml

Where /export/path/ is the directory where post and page files will be generated, and wordpress-dump.xml is the XML file exported by WordPress.
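For scripted migrations, the same command can be driven from Python. The sketch below is only an illustration: `build_command` and `run_export` are hypothetical helper names, not part of wp2md, and it assumes the `wp2md` executable is on the PATH after installation.

```python
import subprocess


def build_command(source, dest):
    """Return the wp2md argument list for one XML dump (hypothetical helper)."""
    # Only the documented -d option and the positional source are used.
    return ["wp2md", "-d", dest, source]


def run_export(source, dest):
    """Run wp2md and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(source, dest), check=True)


# Example call, assuming wp2md is installed and the dump exists:
# run_export("wordpress-dump.xml", "/export/path/")
```

Passing the arguments as a list avoids shell quoting problems when the export path contains spaces.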

Use the --help parameter to see the complete list of command line options:

usage: wp2md [options] source

Export WordPress XML dump to markdown files

positional arguments:
  source      source XML dump exported from WordPress

optional arguments:
  -h, --help  show this help message and exit
  -v           verbose logging
  -l FILE      log to file
  -d PATH      destination path for generated files
  -u FMT       <pubDate> date/time parsing format
  -o FMT       <wp:post_date> and <wp:post_date_gmt> parsing format
  -f FMT       date/time fields format for exported data
  -p FMT       date prefix format for generated files
  -m           preprocess content with Markdown (helpful for MD input)
  -n LEN       post name (slug) length limit for file naming
  -r           generate reference links instead of inline
  -ps PATH     post files path (see docs for variable names)
  -pg PATH     page files path
  -dr PATH     draft files path
  -cp PATH     custom post type files path
  -op PATH     path for all posts. This will override path for every post type except draft path
  -url         keep absolute URLs in hrefs and image srcs
  -b URL       base URL to subtract from hrefs (default is the root)
  -i POST_TYPE include only these post types
  -e POST_TYPE exclude these post types
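The -u, -o, -f and -p options all take date/time format strings. As a rough illustration, and assuming these follow Python's strptime/strftime conventions (the help text above does not say so explicitly), the sketch below parses a <wp:post_date>-style value and renders it in another format. The `reformat_date` helper and both format strings are assumptions chosen for the example, not wp2md defaults.

```python
from datetime import datetime


def reformat_date(value, parse_fmt, output_fmt):
    """Parse `value` with one format and render it with another."""
    return datetime.strptime(value, parse_fmt).strftime(output_fmt)


# Assumed formats, for illustration only.
PARSE_FMT = "%Y-%m-%d %H:%M:%S"  # the kind of value -o FMT might describe
OUTPUT_FMT = "%Y/%m/%d"          # the kind of value -f FMT might describe

print(reformat_date("2011-11-23 22:10:35", PARSE_FMT, OUTPUT_FMT))
```

If a dump uses a different timestamp layout, only the parsing format should need to change.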

The output

The script generates a separate file for each post, page and draft, and groups them in a configurable directory structure. By default, posts are grouped into year-named directories and pages are stored directly in the output folder.


You can specify a different directory structure and file naming pattern using the -ps, -pg and -dr parameters for posts, pages and drafts respectively. For example, -ps {year}/{month}/{day}/{title}.md will produce date-based subfolders for blog posts.

Options available for directory structure

  {year}       Post published year
  {month}      Post published month
  {day}        Post published day
  {title}      Title of the post
  {post_type}  Name of the current post type, for example 'post' or 'page' (added for custom post type support)
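To make the placeholders concrete, here is a minimal sketch of how such a pattern could be expanded with Python's `str.format`. It is not wp2md's actual implementation: `expand_path` is a hypothetical helper, and the zero-padded month and day and the slug-like title are assumptions.

```python
from datetime import datetime


def expand_path(pattern, published, title, post_type="post"):
    """Fill the documented placeholders in a path pattern."""
    return pattern.format(
        year=published.strftime("%Y"),
        month=published.strftime("%m"),  # assumed zero-padded
        day=published.strftime("%d"),    # assumed zero-padded
        title=title,
        post_type=post_type,
    )


published = datetime(2011, 11, 23, 22, 10, 35)
print(expand_path("{year}/{month}/{day}/{title}.md", published, "yandex-subbotnik"))
```

Placeholders that do not appear in a pattern are simply ignored, so the same helper works for patterns such as {post_type}/{title}.md.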

Each exported file has a straightforward structure intended for further processing with the public-static website generator. It has an INI-like header followed by the Markdown-formatted contents of the post (or page):

title: Я.Субботник в Санкт-Петербурге, 3 декабря
link: http://paradigm.ru/yandex-subbotni
creator: admin
description: 
post_id: 635
post_date: 2011-11-23 22:10:35
post_date_gmt: 2011-11-23 19:10:35
comment_status: open
post_name: yandex-subbotnik
status: publish
post_type: post

# Я.Субботник в Санкт-Петербурге, 3 декабря

Я.Субботник в Санкт-Петербурге пройдет 3 декабря в [офисе Яндекса](http://company.yandex.ru/contacts/spb/).
...

If the post has comments, they are included below the post contents.
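For further processing, the header can be separated from the body with a few lines of Python. This is a sketch under two assumptions that match the sample above but are not guaranteed in general: the header ends at the first blank line, and field names contain no colon. `parse_exported` is a hypothetical helper, and the sample text is made up for the example.

```python
def parse_exported(text):
    """Split an exported file into a header dict and the Markdown body."""
    # Assumption: the first blank line separates the header from the body.
    header, _, body = text.partition("\n\n")
    fields = {}
    for line in header.splitlines():
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "key: value" shape
            fields[key.strip()] = value.strip()
    return fields, body


sample = (
    "title: Example post\n"
    "link: http://example.com/example-post\n"
    "description: \n"
    "post_id: 635\n"
    "\n"
    "# Example post\n"
    "\n"
    "Post text.\n"
)
fields, body = parse_exported(sample)
print(fields["link"])
print(body.splitlines()[0])
```

Empty fields such as description come back as empty strings, and any comments appended to the post remain part of the returned body.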

See also

Copyright and licensing

Copyright © 2013 by Alex Musayev.
License: GNU (see LICENSE).

Project home: https://github.com/dreikanter/wp2md.