Modular Crypt Format

An explanation about a standard that isn’t

Overview

A number of the hashes in Passlib are described as adhering to the “Modular Crypt Format”. This page is an attempt to document what that means.

In short, the modular crypt format (MCF) is a standard for encoding password hash strings, which requires that hashes have the format $identifier$content, where identifier is a short alphanumeric string uniquely identifying a particular scheme, and content is the contents of the scheme, using only the characters in the regexp range [a-zA-Z0-9./].
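
As a rough illustration of this shape, the following Python sketch splits a hash into its identifier and content. The helper name `parse_mcf` and the exact regular expression are assumptions for illustration, not Passlib's API. It admits `$` inside the content, since `$` is commonly used as an internal field separator (see the requirements below). Being this strict rejects some real formats, such as argon2 hashes, which contain `=` and `,`.

```python
import re

# Hypothetical pattern based on the loose convention described above:
# "$" + identifier + "$" + content. The identifier uses lowercase ascii
# letters, digits, and hyphens; the content uses [a-zA-Z0-9./] plus "$"
# as an internal field separator.
MCF_PATTERN = re.compile(r"\$([a-z0-9-]+)\$([a-zA-Z0-9./$]*)")


def parse_mcf(hash_string):
    """Return (identifier, content) for a strictly conforming MCF hash, else None."""
    match = MCF_PATTERN.fullmatch(hash_string)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_mcf("$1$saltsalt$abcdef"))  # ('1', 'saltsalt$abcdef')
print(parse_mcf("$argon2i$v=19$m=512,t=2,p=2$c2FsdA$aGFzaA"))  # None
```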

However, there’s no official specification document describing this format. Nor is there a central registry of identifiers, or any actual rules. The modular crypt format is more of an ad-hoc idea than a true standard.

The rest of this page is an attempt to describe what is known, at least as far as the hashes supported by Passlib.

History

Historically, most Unix systems supported only des_crypt. Around the same time, many incompatible variations were also developed, but their hashes were not easily distinguishable from each other (see Archaic Unix Hashes), making it impossible to use multiple hashes on one system, or to progressively migrate to a newer scheme.

This was solved with the advent of the MCF, which was introduced around the time that md5_crypt was developed. This format allows hashes from multiple schemes to exist within the same database by requiring that all hash strings begin with a unique prefix of the form $identifier$.

Requirements

Unfortunately, there is no specification document for this format. Instead, it exists in de facto form only; the following is an attempt to roughly identify the conventions followed by the modular crypt format hashes found in Passlib:

Hash strings should use only 7-bit ascii characters.

No known OS or application generates hashes which violate this rule. However, some systems (e.g. Linux) will happily accept hashes which contain 8-bit characters in their salt. This is probably a case of “permissive in what you accept, strict in what you generate”.

Hash strings should start with the prefix $identifier$, where identifier is a short string uniquely identifying hashes generated by that algorithm, using only lowercase ascii letters, numbers, and hyphens (cf. the list of known identifiers below).

When MCF was first introduced, most schemes chose a single digit as their identifier (e.g. $1$ for md5_crypt). Because of this, some older systems only look at the first character when attempting to distinguish hashes. However, as Unix variants branched off, new schemes were developed which used longer identifying strings (e.g. $sha1$ for sha1_crypt).

At this point, any new hash schemes should probably use a 6-8 character descriptive identifier, to avoid potential namespace clashes.

Hashes should only contain the ascii letters a-z and A-Z, the ascii digits 0-9, and the characters . and /, though they may additionally use the $ character as an internal field separator.

This is the least adhered-to of any modular crypt format convention. Other characters (such as +=,-) are used by various formats.

The only hard and fast stricture is that :;!* and all non-printable or 8-bit characters be avoided, since these would interfere with parsing of the Unix shadow password file, where these hashes are typically stored.
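
A minimal sketch of this constraint, assuming "printable" means the 7-bit range from space through `~`. The name `is_shadow_safe` is hypothetical. In `/etc/shadow`, `:` separates fields, and a leading `!` or `*` in the password field is conventionally used to mark locked accounts or accounts that cannot log in with a password, which is why those characters are risky inside a hash.

```python
# Characters the text above says to avoid in hash strings.
FORBIDDEN = frozenset(":;!*")


def is_shadow_safe(hash_string):
    """True if every character is printable 7-bit ascii and none are in FORBIDDEN."""
    return all(" " <= ch <= "~" and ch not in FORBIDDEN for ch in hash_string)


print(is_shadow_safe("$1$saltsalt$abc./XYZ"))  # True
print(is_shadow_safe("$1$salt:x$abc"))         # False
```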

Pretty much all older modular-crypt-format hashes use ascii letters, numbers, ., and / to provide base64 encoding of their raw data, though the exact character value assignments vary between hashes (see passlib.utils.h64). Many newer hashes use + instead of ., to adhere more closely to the base64 standard.

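
To illustrate how the character value assignments can differ, the sketch below encodes the same integer with two alphabets: the classic crypt alphabet used by `passlib.utils.h64`, and the ordering bcrypt uses. The `encode_int` helper is hypothetical, and its least-significant-first packing is only one of the orders real schemes use; it is not a byte-for-byte reimplementation of any particular scheme's encoder.

```python
# Two 64-character alphabets used by crypt-style hashes. Both contain the same
# characters, but assign them to 6-bit values in a different order.
HASH64_CHARS = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"   # h64
BCRYPT64_CHARS = "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"  # bcrypt


def encode_int(value, width, alphabet):
    """Encode a non-negative int as `width` chars, least-significant 6 bits first.

    Little-endian packing is an illustrative choice; real schemes differ in
    bit order and in how they group and reorder the raw digest bytes.
    """
    chars = []
    for _ in range(width):
        chars.append(alphabet[value & 0x3F])  # low 6 bits select a character
        value >>= 6
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(chars)


print(encode_int(12, 1, HASH64_CHARS))    # A
print(encode_int(12, 1, BCRYPT64_CHARS))  # K
```
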
Hash schemes should put their “digest” portion at the end of the hash, preferably separated by a $.

This allows password hashes to be easily truncated to a “configuration string” containing just the identifying prefix, rounds, salt, etc.

This configuration string then encodes all the information needed to generate a new hash in order to verify a password, without having to perform excessive parsing.

Most modular crypt format hashes follow this convention, though some (like bcrypt) omit the $ separator between the configuration and the digest.

Furthermore, there is no set standard about whether configuration strings should include a trailing $, though the general rule is that hashing should behave the same in either case (sun_md5_crypt behaves particularly poorly regarding this last point).
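
The configuration-string convention can be sketched as a split on the final `$`, with bcrypt special-cased because it has no separator before its digest. `split_config` is a hypothetical helper, and the bcrypt branch assumes the usual 60-character layout (a prefix such as `$2b$12$`, a 22-character salt, then a 31-character digest). To verify a password, a crypt-style implementation would then be called with the password and this configuration string, and its output compared with the stored hash.

```python
BCRYPT_PREFIXES = ("$2a$", "$2b$", "$2x$", "$2y$")


def split_config(hash_string):
    """Split an MCF hash into (config string, digest). Hypothetical helper.

    Most schemes separate the two with a final "$". bcrypt does not, so this
    sketch special-cases the common 60-character bcrypt layout:
    4-char prefix + 2-digit cost + "$" + 22-char salt, then a 31-char digest.
    """
    if hash_string.startswith(BCRYPT_PREFIXES) and len(hash_string) == 60:
        return hash_string[:29], hash_string[29:]
    config, sep, digest = hash_string.rpartition("$")
    if not sep or not config:
        raise ValueError("no configuration/digest separator found")
    # Keep the trailing "$" on the config string; ideally hashing behaves the
    # same with or without it (sun_md5_crypt being a notable exception).
    return config + "$", digest


print(split_config("$1$saltsalt$abcdef"))  # ('$1$saltsalt$', 'abcdef')
```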

Note

All of the above is guesswork based on examination of existing hashes and OS implementations, and was written merely to clarify what the “modular crypt format” is. It is drawn from no authoritative sources.

Identifiers & Platform Support

OS Defined Hashes

The following table lists all the major MCF hashes supported by Passlib, and indicates which operating systems offer native support:

Scheme	Prefix	Linux	FreeBSD	NetBSD	OpenBSD	Solaris
`des_crypt`	(none)	y	y	y	y	y
`bsdi_crypt`	`_`	-	y	y	y	-
`md5_crypt`	`$1$`	y	y	y	y	y
`bcrypt`	`$2$`, `$2a$`, `$2x$`, `$2y$`, `$2b$`	-	y	y	y	y
`bsd_nthash`	`$3$`	-	y	-	-	-
`sha256_crypt`	`$5$`	y	8.3+	-	-	y
`sha512_crypt`	`$6$`	y	8.3+	-	-	y
`sun_md5_crypt`	`$md5$`, `$md5,`	-	-	-	-	y
`sha1_crypt`	`$sha1$`	-	-	y	-	-
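
Since each scheme in this table has a distinct prefix, a stored hash can be routed to the right algorithm by prefix alone, which is what MCF was meant to enable. The sketch below is a hypothetical lookup (not Passlib's own identification logic) covering only the OS-defined schemes above. The two archaic schemes have no `$identifier$`: `bsdi_crypt` is recognized by its leading `_`, and `des_crypt` only by being 13 characters from the crypt alphabet, a heuristic that can misfire on unrelated 13-character strings.

```python
HASH64_CHARS = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# Prefixes from the table above. Each ends in "$" (or "," for sun_md5_crypt),
# so no prefix is a leading substring of another and lookup order doesn't matter.
OS_PREFIXES = {
    "$1$": "md5_crypt",
    "$2$": "bcrypt", "$2a$": "bcrypt", "$2x$": "bcrypt", "$2y$": "bcrypt", "$2b$": "bcrypt",
    "$3$": "bsd_nthash",
    "$5$": "sha256_crypt",
    "$6$": "sha512_crypt",
    "$md5$": "sun_md5_crypt", "$md5,": "sun_md5_crypt",
    "$sha1$": "sha1_crypt",
}


def guess_os_scheme(hash_string):
    """Guess which OS-defined scheme produced a hash (hypothetical helper)."""
    for prefix, scheme in OS_PREFIXES.items():
        if hash_string.startswith(prefix):
            return scheme
    if hash_string.startswith("_"):
        return "bsdi_crypt"
    # des_crypt has no prefix: it is recognized by shape (13 hash64 characters).
    if len(hash_string) == 13 and all(ch in HASH64_CHARS for ch in hash_string):
        return "des_crypt"
    return None


print(guess_os_scheme("$6$rounds=5000$salt$abc"))  # sha512_crypt
print(guess_os_scheme("abJnggxhB/yWI"))            # des_crypt
```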

Additional Platforms

The modular crypt format is also supported to some degree by the following operating systems and platforms:

MacOS X	Darwin’s native `crypt()` provides limited functionality, supporting only `des_crypt` and `bsdi_crypt`. OS X uses a separate system for its own password hashes.
Google App Engine	As of 2011-08-19, Google App Engine’s `crypt()` implementation appears to match that of a typical Linux system (as listed in the previous table).

Application-Defined Hashes

The following table lists the other MCF hashes supported by Passlib. These hashes can be found in various libraries and applications (and are not natively supported by any known OS):

Scheme	Prefix	Primary Use (if known)
`apr_md5_crypt`	`$apr1$`	Apache htpasswd files
`argon2`	`$argon2i$`, `$argon2d$`
`bcrypt_sha256`	`$bcrypt-sha256$`	Passlib-specific
`phpass`	`$P$`, `$H$`	PHPass-based applications
`pbkdf2_sha1`	`$pbkdf2$`	Passlib-specific
`pbkdf2_sha256`	`$pbkdf2-sha256$`	Passlib-specific
`pbkdf2_sha512`	`$pbkdf2-sha512$`	Passlib-specific
`scram`	`$scram$`	Passlib-specific
`cta_pbkdf2_sha1`	`$p5k2$` [1]
`dlitz_pbkdf2_sha1`	`$p5k2$` [1]
`scrypt`	`$scrypt$`	Passlib-specific

Footnotes

[1]	`cta_pbkdf2_sha1` and `dlitz_pbkdf2_sha1` both use the same identifier. While there are other internal differences, the two can be quickly distinguished by the fact that cta hashes always end in `=`, while dlitz hashes contain no `=` at all.
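
The heuristic in this footnote translates directly into code. `guess_p5k2_variant` is a hypothetical helper, and the example input is a synthetic string rather than a real hash. A hash that contains `=` without ending in one matches neither description, so the sketch returns `None` for it.

```python
def guess_p5k2_variant(hash_string):
    """Apply the footnote's "=" heuristic to a $p5k2$ hash (hypothetical helper)."""
    if not hash_string.startswith("$p5k2$"):
        return None
    if hash_string.endswith("="):
        return "cta_pbkdf2_sha1"
    if "=" not in hash_string:
        return "dlitz_pbkdf2_sha1"
    return None  # contains "=" but doesn't end with one: matches neither description


print(guess_p5k2_variant("$p5k2$1000$salt$digest="))  # cta_pbkdf2_sha1
```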