Using Markdown as a Python Library

First and foremost, Python-Markdown is intended to be a Python library module used by various projects to convert Markdown syntax into HTML.

The Basics

To use markdown as a module:

import markdown
html = markdown.markdown(your_text_string)
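
As a minimal sketch, the call above can be tried on a short string. The sample text here is made up for illustration; any Unicode string of Markdown source would do.

```python
import markdown

# Hypothetical sample input: a heading followed by a paragraph with emphasis.
your_text_string = "# Hello\n\nSome *emphasized* text."

# Convert the Markdown source to an HTML string.
html = markdown.markdown(your_text_string)
print(html)
```

The result is an HTML fragment (a heading and a paragraph), not a complete HTML document.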

The Details

Python-Markdown provides two public functions (markdown.markdown and markdown.markdownFromFile) both of which wrap the public class markdown.Markdown. If you’re processing one document at a time, these functions will serve your needs. However, if you need to process multiple documents, it may be advantageous to create a single instance of the markdown.Markdown class and pass multiple documents through it. If you do use a single instance though, make sure to call the reset method appropriately (see below).

markdown.markdown (text [, **kwargs])

The following options are available on the markdown.markdown function:

  • text (required): The source Unicode string.

    Important

    Python-Markdown expects Unicode as input (although some simple ASCII strings may work) and returns output as Unicode. Do not pass encoded strings to it! If your input is encoded (e.g. as UTF-8), it is your responsibility to decode it. For example:

    import codecs
    import markdown

    input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8")
    text = input_file.read()
    html = markdown.markdown(text)
    

    If you want to write the output to disk, you must encode it yourself:

    output_file = codecs.open("some_file.html", "w", 
                              encoding="utf-8", 
                              errors="xmlcharrefreplace"
    )
    output_file.write(html)
    
  • extensions: A list of extensions.

    Python-Markdown provides an API for third parties to write extensions to the parser adding their own additions or changes to the syntax. A few commonly used extensions are shipped with the markdown library. See the extension documentation for a list of available extensions.

    The list of extensions may contain instances of extensions and/or strings of extension names.

    extensions=[MyExtension(), 'path.to.my.ext']
    

    Note

    The preferred method is to pass in an instance of an extension. Strings should only be used when it is impossible to import the Extension Class directly (from the command line or in a template).

    When passing in extension instances, each class instance must be a subclass of markdown.extensions.Extension and any configuration options should be defined when initiating the class instance rather than using the extension_configs keyword. For example:

    from markdown.extensions import Extension
    class MyExtension(Extension):
        # define your extension here...
        pass

    markdown.markdown(text, extensions=[MyExtension(option='value')])
    

    If an extension name is provided as a string, the extension must be importable as a Python module on your PYTHONPATH. Python’s dot notation is supported. Therefore, to import the ‘extra’ extension, one could do extensions=['markdown.extensions.extra'].

    Additionally, a Class may be specified in the name. The class must be at the end of the name and be separated by a colon from the module.

    Therefore, if you were to import the class like this:

    from path.to.module import SomeExtensionClass
    

    Then the named extension would comprise this string:

    "path.to.module:SomeExtensionClass"
    

    Note

    You should only need to specify the class name if more than one extension is defined within the same module. The extensions that come with Python-Markdown do not need to have the class name specified. However, doing so will not affect the behavior of the parser.

    When loading an extension by name (as a string), you may pass in configuration settings to the extension using the extension_configs keyword.

    See Also

    See the documentation of the Extension API for assistance in creating extensions.

  • extension_configs: A dictionary of configuration settings for extensions.

    Any configuration settings will only be passed to extensions loaded by name (as a string). When loading extensions as class instances, pass the configuration settings directly to the class when initializing it.

    Note

    The preferred method is to pass in an instance of an extension, which does not require use of the extension_configs keyword at all. See the extensions keyword for details.

    The dictionary of configuration settings must be in the following format:

    extension_configs = {
        'extension_name_1': {
            'option_1': 'value_1',
            'option_2': 'value_2'
        },
        'extension_name_2': {
            'option_1': 'value_1'
        }
    }
    

    See the documentation specific to the extension you are using for help in specifying configuration settings for that extension.

  • output_format: Format of output.

    Supported formats are:

    • "xhtml1": Outputs XHTML 1.x. Default.
    • "xhtml5": Outputs XHTML style tags of HTML 5.
    • "xhtml": Outputs latest supported version of XHTML (currently XHTML 1.1).
    • "html4": Outputs HTML 4.
    • "html5": Outputs HTML style tags of HTML 5.
    • "html": Outputs latest supported version of HTML (currently HTML 4).

    The values can be in either lowercase or uppercase.

    Warning

    It is suggested that the more specific formats (“xhtml1”, “html5”, & “html4”) be used as the more general formats (“xhtml” or “html”) may change in the future if it makes sense at that time.

  • safe_mode: Disallow raw html.

    Warning

    “safe_mode” is pending deprecation and should not be used.

    HTML sanitizers (like Bleach) provide a better solution for dealing with markdown text submitted by untrusted users.

    import markdown
    import bleach
    html = bleach.clean(markdown.markdown(untrusted_text))
    

    See the release notes for more info.

    The following values are accepted:

    • False (Default): Raw HTML is passed through unaltered.

    • replace: Replace all HTML blocks with the text assigned to html_replacement_text. To maintain backward compatibility, setting safe_mode=True will have the same effect as safe_mode='replace'.

      To replace raw HTML with something other than the default, do:

      md = markdown.Markdown(safe_mode='replace',
                             html_replacement_text='--RAW HTML NOT ALLOWED--')
      
    • remove: All raw HTML will be completely stripped from the text with no warning to the author.

    • escape: All raw HTML will be escaped and included in the document.

      For example, the following source:

      Foo <b>bar</b>.
      

      Will result in the following HTML:

      <p>Foo &lt;b&gt;bar&lt;/b&gt;.</p>
      

    Note

    “safe_mode” also alters the default value for the enable_attributes option.

  • html_replacement_text: Text used when safe_mode is set to replace. Defaults to [HTML_REMOVED].

    Warning

    “html_replacement_text” is pending deprecation and should not be used. See the release notes for more info.

  • tab_length: Length of tabs in the source. Default: 4

  • enable_attributes: Enable the conversion of attributes. Defaults to True, unless safe_mode is enabled, in which case the default is False.

    Note

    safe_mode only overrides the default. If enable_attributes is explicitly set, the explicit value is used regardless of safe_mode. However, this could potentially allow an untrusted user to inject JavaScript into your documents.

  • smart_emphasis: Treat _connected_words_ intelligently. Default: True

  • lazy_ol: Ignore number of first item of ordered lists. Default: True

    Given the following list:

    4. Apples
    5. Oranges
    6. Pears
    

    By default, markdown will ignore the fact that the first line started with item number “4” and the HTML list will start with a number “1”. If lazy_ol is set to False, then markdown will output the following HTML:

    <ol start="4">
      <li>Apples</li>
      <li>Oranges</li>
      <li>Pears</li>
    </ol>
    
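Several of the options above can be combined in a single call. The following is only a sketch: it assumes that the bundled toc extension is available under the name markdown.extensions.toc and that it accepts a baselevel setting in the installed version, and the sample text is made up for illustration.

```python
import markdown

# Hypothetical sample input: a top-level heading and a horizontal rule.
text = "# Title\n\n---"

html = markdown.markdown(
    text,
    # Load a bundled extension by its dotted module name (a string).
    extensions=['markdown.extensions.toc'],
    # Settings are keyed by the same string that was used in `extensions`.
    extension_configs={
        # Assumed option: shift headings so that "#" becomes a second-level heading.
        'markdown.extensions.toc': {'baselevel': 2},
    },
    # Ask for HTML-style tags (<hr>) rather than XHTML-style tags (<hr />).
    output_format='html5',
)
print(html)
```

Because the extension is loaded by name, its settings travel through extension_configs. Had an extension instance been passed instead, the same settings would go to the class when it is initialized.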

markdown.markdownFromFile (**kwargs)

With a few exceptions, markdown.markdownFromFile accepts the same options as markdown.markdown. It does not accept a text (or Unicode) string. Instead, it accepts the following options:

  • input (required): The source text file.

    input may be set to one of three options:

    • a string which contains a path to a readable file on the file system,
    • a readable file-like object,
    • or None (default) which will read from stdin.
  • output: The target which output is written to.

    output may be set to one of three options:

    • a string which contains a path to a writable file on the file system,
    • a writable file-like object,
    • or None (default) which will write to stdout.
  • encoding: The encoding of the source text file. Defaults to “utf-8”. The same encoding will always be used for input and output. The ‘xmlcharrefreplace’ error handler is used when encoding the output.

    Note

    This is the only place that decoding and encoding of Unicode takes place in Python-Markdown. If this rather naive solution does not meet your specific needs, it is suggested that you write your own code to handle your encoding/decoding needs.
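
The input, output, and encoding options described above might be used together as in the following sketch. The temporary directory and the file contents are assumptions made for illustration; only the file names echo the earlier examples.

```python
import io
import os
import tempfile

import markdown

# Hypothetical file locations inside a throwaway directory.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "some_file.txt")
target_path = os.path.join(workdir, "some_file.html")

# Write a UTF-8 encoded source file to convert.
with io.open(source_path, "w", encoding="utf-8") as source_file:
    source_file.write(u"Caf\u00e9 *menu*")

# The input is decoded and the output is encoded with the same encoding.
markdown.markdownFromFile(input=source_path, output=target_path, encoding="utf-8")

# Read the result back to inspect it.
with io.open(target_path, "r", encoding="utf-8") as target_file:
    html = target_file.read()
print(html)
```

Passing paths as strings leaves the opening and closing of both files to the library.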

markdown.Markdown ([**kwargs])

The same options are available when initializing the markdown.Markdown class as on the markdown.markdown function, except that the class does not accept a source text string on initialization. Rather, the source text string must be passed to one of two instance methods:

  • Markdown.convert(source)

    The source text must meet the same requirements as the text argument of the markdown.markdown function.

    You should also use this method if you want to process multiple strings without creating a new instance of the class for each string.

    md = markdown.Markdown()
    html1 = md.convert(text1)
    html2 = md.convert(text2)
    

    Depending on which options and/or extensions are being used, the parser may need its state reset between each call to convert, otherwise performance can degrade drastically:

    html1 = md.convert(text1)
    md.reset()
    html2 = md.convert(text2)
    

    To make this easier, you can chain the call to reset with the call to convert:

    html3 = md.reset().convert(text3)
    
  • Markdown.convertFile(**kwargs)

    The arguments of this method are identical to the arguments of the same name on the markdown.markdownFromFile function (input, output, and encoding). As with the convert method, this method should be used to process multiple files without creating a new instance of the class for each document. State may need to be reset between each call to convertFile as is the case with convert.
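
A rough sketch of processing several documents through one instance with convertFile follows. The helper name convert_streams is hypothetical, and the sketch assumes that file-like objects are byte streams, since this method does the decoding and encoding itself.

```python
import io

import markdown


def convert_streams(sources, encoding="utf-8"):
    """Convert each readable byte stream using one shared Markdown instance."""
    md = markdown.Markdown()
    results = []
    for source in sources:
        target = io.BytesIO()
        # Clear any state left over from the previous document.
        md.reset()
        md.convertFile(input=source, output=target, encoding=encoding)
        # The output stream holds encoded bytes; decode them for further use.
        results.append(target.getvalue().decode(encoding))
    return results


# Hypothetical in-memory documents standing in for real files.
pages = convert_streams([io.BytesIO(b"# One"), io.BytesIO(b"*two*")])
print(pages)
```

Calling reset before every conversion is the cautious choice here; whether it is strictly needed depends on the options and extensions in use.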