bridgedb.parse

Package containing modules for parsing data.

exception InvalidBase64

Bases: exceptions.ValueError

Raised if parsing or decoding cannot continue due to invalid base64.

padBase64(b64string)

Re-add any stripped equals sign character padding to a b64 string.

Parameters: b64string (string) – A base64-encoded string which might have had its trailing equals sign (=) padding removed.
Raises ValueError:
 if there was any error while manipulating the string.
Returns: A properly-padded (according to the base64 spec: RFC 4648) string.
parseUnpaddedBase64(field)

Parse an unpadded, base64-encoded field.

The field will be re-padded, if need be, and then base64 decoded.

Parameters: field (str) – A base64-encoded string, with any trailing = characters removed.
Raises InvalidBase64:
 if there is an error in either re-padding or decoding field.
Return type: str
Returns: The base64-decoded field.
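A minimal sketch of the padding arithmetic, written for Python 3 with the standard base64 module. It is an illustration, not BridgeDB's own implementation: InvalidBase64 is a local stand-in, and under Python 3 the decoded result is bytes rather than the str documented above. A padded base64 string's length is a multiple of 4, so 2 or 3 leftover characters need 2 or 1 '=' characters; a single leftover character never occurs in valid base64.

```python
import base64


class InvalidBase64(ValueError):
    """Local stand-in for bridgedb.parse.InvalidBase64."""


def padBase64(b64string):
    """Re-add '=' padding so the length is a multiple of 4 (RFC 4648)."""
    b64string = b64string.strip()
    remainder = len(b64string) % 4
    if remainder == 1:
        # No valid base64 encoding leaves exactly one character over.
        raise ValueError("Invalid base64 length: %r" % b64string)
    if remainder:
        b64string += "=" * (4 - remainder)
    return b64string


def parseUnpaddedBase64(field):
    """Re-pad, then strictly decode, an unpadded base64 field."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(padBase64(field), validate=True)
    except ValueError as error:  # binascii.Error is a ValueError subclass
        raise InvalidBase64(str(error)) from error
```

For example, the unpadded string "aGVsbG8" is re-padded to "aGVsbG8=" and decodes to b"hello".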

Utilities for parsing IP and email addresses.

bridgedb.parse.addr

parse.addr
 | |_ extractEmailAddress() - Validate an RFC 2822 email address.
 | |_ isIPAddress() - Check if an arbitrary string is an IP address.
 | |_ isIPv4() - Check if an arbitrary string is an IPv4 address.
 | |_ isIPv6() - Check if an arbitrary string is an IPv6 address.
 | \_ isValidIP() - Check that an IP address is valid.
 |
 |_ PortList - A container class for validated port ranges.

How address validity is determined

The following terms define addresses which are not valid. All other addresses are taken to be valid.

Private IP Address Ranges

Private Address

These address ranges are reserved by IANA for private intranets, and not routable to the Internet:

10.0.0.0    - 10.255.255.255  (10.0.0.0/8)
172.16.0.0  - 172.31.255.255  (172.16.0.0/12)
192.168.0.0 - 192.168.255.255 (192.168.0.0/16)

For additional information, see RFC 1918.
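As a rough illustration of the three ranges above, the following sketch uses Python 3's standard ipaddress module rather than the ipaddr module referred to elsewhere on this page; isRFC1918 and PRIVATE_NETWORKS are hypothetical names. Note that ipaddress's own is_private property covers more than these three blocks.

```python
import ipaddress

# The three RFC 1918 blocks listed above.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def isRFC1918(address):
    """Return True if *address* falls inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    # Membership tests against a network of the other IP version are False.
    return any(ip in network for network in PRIVATE_NETWORKS)
```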

Reserved and Special Use Addresses

Unspecified Address
Default Route

Current network (only valid as source address). See RFC 1122. An Unspecified Address in the context of firewalls means “all addresses of the local machine”. In a routing context, it is usually termed the Default Route, and it means the default route (to “the rest of” the internet). See RFC 1700. For example:

0.0.0.0/8
::/128
Loopback Address

Reserved for loopback and IPC on the localhost. See RFC 1122. Example:

127.0.0.0
Localhost Address

Loopback IP addresses (refers to self). See RFC 5735. Examples include:

127.0.0.1 - 127.255.255.254   (127.0.0.0/8)
::1
Link-Local Address

These are the link-local blocks, used for communication between hosts on a single link. See RFC 3927. Examples:

169.254.0.0/16
fe80::/64
Multicast Address

Reserved for multicast addresses. See RFC 3171. For example:

224.0.0.0 - 239.255.255.255 (224.0.0.0/4)
Private Address

Reserved for private networks. See RFC 1918. Some examples include:

10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
Reserved Address

Reserved (former Class E network). See RFC 1700, RFC 3232, and RFC 5735. The one exception to this rule is the Limited Broadcast Address, 255.255.255.255, for which packets at the IP layer are not forwarded to the public internet. For example:

240.0.0.0 - 255.255.255.255 (240.0.0.0/4)
Limited Broadcast Address

Limited broadcast address (limited to all other nodes on the LAN). See RFC 919. For IPv4, the all-ones address is reserved for broadcast addressing on the local network:

255.255.255.255

Warning

The ipaddr module (as of version 2.1.10) does not understand the following reserved addresses:

Reserved Address (Protocol Assignments)

Reserved for IETF protocol assignments. See RFC 5735. Example:

192.0.0.0/24
Reserved Address (6to4 Relay Anycast)

IPv6 to IPv4 relay. See RFC 3068. Example:

192.88.99.0/24
Reserved Address (Network Benchmark)

Network benchmark tests. See RFC 2544. Example:

198.18.0.0/15
Reserved Address (TEST-NET-1)

Reserved for use in documentation and example code. It is often used in conjunction with domain names example.com or example.net in vendor and protocol documentation. See RFC 1166. For example:

192.0.2.0/24
Reserved Address (TEST-NET-2)

TEST-NET-2. See RFC 5737. Example:

198.51.100.0/24
Reserved Address (TEST-NET-3)

TEST-NET-3. See RFC 5737. Example:

203.0.113.0/24
Shared Address Space

See RFC 6598. Example:

100.64.0.0/10
Site-Local Address
Unique Local Address

Similar uses to Limited Broadcast Address. For IPv6, everything becomes convoluted and complicated, and then redefined. See RFC 4193, RFC 3879, and RFC 3513. The ipaddr.IPAddress.is_site_local() method only checks to see if the address is a Unique Local Address vis-à-vis RFC 3513 §2.5.6, e.g.:

ff00::0/8
fec0::/10
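The IPv4 blocks listed under the warning above can be checked explicitly when a library does not recognise them. A sketch, assuming Python 3's ipaddress module; EXTRA_RESERVED and reservedBlockName are hypothetical names, and the IPv6 site-local and unique local ranges are left out.

```python
import ipaddress

# IPv4 blocks from the warning above, keyed by the names used on this page.
EXTRA_RESERVED = {
    "Protocol Assignments": ipaddress.ip_network("192.0.0.0/24"),
    "6to4 Relay Anycast": ipaddress.ip_network("192.88.99.0/24"),
    "Network Benchmark": ipaddress.ip_network("198.18.0.0/15"),
    "TEST-NET-1": ipaddress.ip_network("192.0.2.0/24"),
    "TEST-NET-2": ipaddress.ip_network("198.51.100.0/24"),
    "TEST-NET-3": ipaddress.ip_network("203.0.113.0/24"),
    "Shared Address Space": ipaddress.ip_network("100.64.0.0/10"),
}


def reservedBlockName(address):
    """Return the name of the reserved block containing *address*, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in EXTRA_RESERVED.items():
        if ip in network:  # always False when the IP versions differ
            return name
    return None
```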
ASPECIAL = u'-_+/=_~'
These are the special characters which RFC 2822 allows within email addresses:
ASPECIAL = '!#$%&*+-/=?^_`{|}~' + "\'"
…But these are the only ones we’re confident that we can handle correctly:
ASPECIAL = '-_+/=_~'
exception BadEmail(msg, email)

Bases: exceptions.Exception

Exception raised when we get a bad email address.

exception InvalidPort

Bases: exceptions.ValueError

Raised when a given port number is invalid.

exception UnsupportedDomain

Bases: exceptions.ValueError

Raised when we get an email address from an unsupported domain.

canonicalizeEmailDomain(domain, domainmap)

Decide if an email was sent from a permitted domain.

Parameters:
  • domain (str) – The domain portion of an email address to validate. It is checked against the domains allowed to email requests for bridges to the EmailDistributor.
  • domainmap (dict) –

    A map of permitted alternate domains (in lowercase) to their canonical domain names (in lowercase). This can be configured with the EMAIL_DOMAIN_MAP option in bridgedb.conf, for example:

    EMAIL_DOMAIN_MAP = {'mail.google.com': 'gmail.com',
                        'googlemail.com': 'gmail.com'}
    
Raises UnsupportedDomain:
 if the domain portion of the email address is not within the map of alternate to canonical allowed domain names.

Return type:

str

Returns:

The canonical domain name for the email address.
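A sketch of the documented behaviour in Python 3, for illustration only. UnsupportedDomain is a local stand-in, and the sketch assumes (this is not shown in the EMAIL_DOMAIN_MAP example above) that each canonical domain also appears in the map as a key pointing to itself, so that mail from gmail.com itself is accepted.

```python
class UnsupportedDomain(ValueError):
    """Local stand-in for bridgedb.parse.addr.UnsupportedDomain."""


def canonicalizeEmailDomain(domain, domainmap):
    """Map a permitted (possibly alternate) domain to its canonical name."""
    # The map's keys are documented as lowercase, so compare in lowercase.
    canonical = domainmap.get(domain.lower()) if domainmap else None
    if not canonical:
        raise UnsupportedDomain("Domain not permitted: %s" % domain)
    return canonical
```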

extractEmailAddress(emailaddr)

Given an email address, obtained, for example, via a From: or Sender: email header, try to extract and parse (according to RFC 2822) the local and domain portions.

We only allow the following form:

LOCAL_PART := DOTATOM
DOMAIN := DOTATOM
ADDRSPEC := LOCAL_PART "@" DOMAIN

In particular, we disallow obs-local-part, obs-domain, comment, and obs-FWS. Other forms exist, but none of the incoming services we recognize support them.

Parameters: emailaddr – An email address to validate.
Raises BadEmail:
 if the emailaddr couldn’t be validated or parsed.
Returns: A tuple of the validated email address, containing the mail local part and the domain:
(LOCAL_PART, DOMAIN)
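The grammar above can be approximated with a Python 3 regular expression. This sketch is not BridgeDB's parser: it assumes atext is limited to letters, digits, and the ASPECIAL characters this module trusts (-_+/=~), it treats anything inside angle brackets (as in Name <addr>) as the address, and BadEmail is a local stand-in.

```python
import re


class BadEmail(Exception):
    """Local stand-in for bridgedb.parse.addr.BadEmail."""

    def __init__(self, msg, email):
        super().__init__(msg)
        self.email = email


# atext restricted to letters, digits, and the trusted ASPECIAL characters.
ATEXT = r"[A-Za-z0-9\-_+/=~]+"
DOTATOM = r"%s(?:\.%s)*" % (ATEXT, ATEXT)  # no leading, trailing, or double dots
ADDRSPEC = re.compile(r"(%s)@(%s)" % (DOTATOM, DOTATOM))


def extractEmailAddress(emailaddr):
    """Return (LOCAL_PART, DOMAIN) from a bare address or 'Name <addr>'."""
    text = emailaddr.strip()
    bracketed = re.search(r"<([^<>]*)>", text)
    if bracketed:
        text = bracketed.group(1).strip()
    match = ADDRSPEC.fullmatch(text)
    if match is None:
        raise BadEmail("Couldn't parse email address %r" % emailaddr, emailaddr)
    return match.group(1), match.group(2)
```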
isIPAddress(ip, compressed=True)

Check if an arbitrary string is an IP address, and that it’s valid.

Parameters:
  • ip (basestring or int) – The IP address to check.
  • compressed (boolean) – If True, return a string representing the compressed form of the address. Otherwise, return an ipaddr.IPAddress instance.
Return type:

An ipaddr.IPAddress, a string, or False

Returns:

The IP, as a string or a class, if it passed the checks. Otherwise, returns False.
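A sketch of isIPAddress() in Python 3, substituting the standard ipaddress module for ipaddr. The validity test here is a simplified, hypothetical stand-in for isValidIP(), so its results can differ from BridgeDB's for some ranges.

```python
import ipaddress


def isIPAddress(ip, compressed=True):
    """Return the compressed string (or address object), or False."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not an IP address at all
    # Simplified stand-in for isValidIP(): reject non-public ranges.
    if (address.is_private or address.is_loopback or address.is_link_local
            or address.is_multicast or address.is_unspecified
            or address.is_reserved):
        return False
    return address.compressed if compressed else address
```

With compressed=True, an IPv6 address such as 2a00:1450:4001:0000:0000:0000:0000:0001 comes back as "2a00:1450:4001::1".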

isIPv(version, ip)

Check if ip is a certain version (IPv4 or IPv6).

Parameters:
  • version (integer) – The IPv[4|6] version to check; must be either 4 or 6. Any other value will be silently changed to 4.
  • ip – The IP address to check. May be any type which ipaddr.IPAddress will accept.
Return type:

boolean

Returns:

True if the address is an IP address of the given version.

isIPv4(ip)

Check if an address is IPv4.

Attention

This does not check validity. See isValidIP().

Parameters: ip (basestring or int) – The IP address to check.
Return type: boolean
Returns: True if the address is an IPv4 address.
isIPv6(ip)

Check if an address is IPv6.

Attention

This does not check validity. See isValidIP().

Parameters: ip (basestring or int) – The IP address to check.
Return type: boolean
Returns: True if the address is an IPv6 address.
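The three version checks reduce to comparing the parsed address's version number. A sketch using Python 3's ipaddress module (not ipaddr), including the documented silent fallback to 4; as with the functions above, no validity check is applied, so private addresses still count.

```python
import ipaddress


def isIPv(version, ip):
    """Return True if *ip* parses as an IP address of the given version."""
    if version not in (4, 6):
        version = 4  # mirror the documented silent fallback to IPv4
    try:
        return ipaddress.ip_address(ip).version == version
    except ValueError:
        return False


def isIPv4(ip):
    return isIPv(4, ip)


def isIPv6(ip):
    return isIPv(6, ip)
```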
isValidIP(ip)

Check that an IP (v4 or v6) is valid.

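The IP address, ip, must not be any of the following:

  • a Link-Local Address,
  • a Loopback Address or Localhost Address,
  • a Multicast Address,
  • an Unspecified Address or Default Route,
  • a Private Address, or
  • a Reserved Address.

If it is an IPv6 address, it also must not be:

  • a Site-Local Address or Unique Local Address.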

>>> from bridgedb.parse.addr import isValidIP
>>> isValidIP('1.2.3.4')
True
>>> isValidIP('1.2.3.255')
True
>>> isValidIP('1.2.3.256')
False
>>> isValidIP('1')
False
>>> isValidIP('1.2.3')
False
>>> isValidIP('xyzzy')
False
Parameters: ip (An ipaddr.IPAddress, ipaddr.IPv4Address, ipaddr.IPv6Address, or str) – An IP address. If it is a string, it will be converted to an ipaddr.IPAddress.
Return type: boolean
Returns: True if ip passes the checks; False otherwise.
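A sketch of these checks using Python 3's ipaddress module in place of ipaddr. The property names (is_link_local, is_loopback, and so on) are ipaddress's; its is_private and is_reserved cover more ranges than ipaddr 2.1.10 did (the TEST-NET blocks, for instance), so results may differ from BridgeDB's for some addresses.

```python
import ipaddress


def isValidIP(ip):
    """Return True only if *ip* parses and is in none of the excluded ranges."""
    if not isinstance(ip, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        try:
            ip = ipaddress.ip_address(ip)
        except ValueError:
            return False  # e.g. '1.2.3.256', '1.2.3', or 'xyzzy'
    if (ip.is_link_local or ip.is_loopback or ip.is_multicast
            or ip.is_unspecified or ip.is_private or ip.is_reserved):
        return False
    # Only IPv6 addresses have a site-local range.
    if ip.version == 6 and ip.is_site_local:
        return False
    return True
```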
normalizeEmail(emailaddr, domainmap, domainrules, ignorePlus=True)

Normalise an email address according to the processing rules for its canonical originating domain.

The email address, emailaddr, is parsed and validated, and then checked by the canonicalizeEmailDomain() function to confirm that it originated from one of the domains allowed to email requests for bridges to the EmailDistributor.

Parameters:
  • emailaddr (str) – An email address to normalise.
  • domainmap (dict) –

    A map of permitted alternate domains (in lowercase) to their canonical domain names (in lowercase). This can be configured with the EMAIL_DOMAIN_MAP option in bridgedb.conf, for example:

    EMAIL_DOMAIN_MAP = {'mail.google.com': 'gmail.com',
                        'googlemail.com': 'gmail.com'}
    
  • domainrules (dict) –

    A mapping of canonical permitted domain names to a list of rules which should be applied to processing them, for example:

    EMAIL_DOMAIN_RULES = {'gmail.com': ["ignore_dots", "dkim"]}
    

    Currently, "ignore_dots" means that all "." characters will be removed from the local part of the validated email address.

  • ignorePlus (bool) – If True, assume that blackhole+kerr@torproject.org is an alias for blackhole@torproject.org, and remove everything in the local part from the first '+' character onward.
Raises:
  • UnsupportedDomain – if the email address originated from a domain that we do not explicitly support.
  • BadEmail – if the email address could not be parsed or validated.
Return type:

str

Returns:

The validated, normalised email address, if it was from a permitted domain. Otherwise, returns an empty string.
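A sketch of the normalisation steps described above, in Python 3. It is illustrative only: it splits on the last '@' instead of calling extractEmailAddress(), looks the domain up directly instead of calling canonicalizeEmailDomain(), defines its own stand-in exceptions, and assumes that the result is lowercased and uses the canonical domain.

```python
class UnsupportedDomain(ValueError):
    """Local stand-in for bridgedb.parse.addr.UnsupportedDomain."""


class BadEmail(Exception):
    """Local stand-in for bridgedb.parse.addr.BadEmail."""


def normalizeEmail(emailaddr, domainmap, domainrules, ignorePlus=True):
    """Lowercase, canonicalise the domain, then apply that domain's rules."""
    local, sep, domain = emailaddr.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise BadEmail("Couldn't parse email address %r" % emailaddr)
    canonical = domainmap.get(domain)
    if not canonical:
        raise UnsupportedDomain("Domain not permitted: %s" % domain)
    if ignorePlus:
        local = local.split("+", 1)[0]  # drop everything from the first '+'
    if "ignore_dots" in domainrules.get(canonical, []):
        local = local.replace(".", "")
    return "%s@%s" % (local, canonical)
```

With EMAIL_DOMAIN_MAP extended by 'gmail.com': 'gmail.com' and the EMAIL_DOMAIN_RULES shown above, Foo.Bar+bridges@googlemail.com would normalise to foobar@gmail.com in this sketch.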

class PortList(*args, **kwargs)

Bases: object

A container class for validated port ranges.

From torspec.git/dir-spec.txt §2.3:

portspec ::= "*" | port | port "-" port
port ::= an integer between 1 and 65535, inclusive.

[Some implementations incorrectly generate ports with value 0.
Implementations SHOULD accept this, and SHOULD NOT generate it.
Connections to port 0 are never permitted.]

Variables: ports (set) – All ports which have been added to this PortList.

Create a PortList.

Parameters: args – Should match the portspec defined above.
Raises: InvalidPort, if one of args doesn’t match port as defined above.
PORTSPEC_LEN = 16

The maximum number of allowed ports per IP address.

add(*args)

Add a port (or ports) to this PortList.

Parameters: args – Should match the portspec defined above.
Raises: InvalidPort, if one of args doesn’t match port as defined above.
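A sketch of validating a single portspec, as defined above, in Python 3. parsePortSpec is a hypothetical helper and InvalidPort a local stand-in, not PortList's API. The sketch returns an inclusive (low, high) range and, like PortList, rejects port 0 instead of accepting it as the bracketed note permits; it does not enforce PORTSPEC_LEN.

```python
import re


class InvalidPort(ValueError):
    """Local stand-in for bridgedb.parse.addr.InvalidPort."""


def _checkPort(text):
    # port ::= an integer between 1 and 65535, inclusive
    if not re.fullmatch(r"[0-9]{1,5}", text):
        raise InvalidPort("%r is not a port number" % text)
    port = int(text)
    if not 1 <= port <= 65535:
        raise InvalidPort("%d is not between 1 and 65535" % port)
    return port


def parsePortSpec(spec):
    """Parse '*', 'port', or 'port-port' into an inclusive (low, high) tuple."""
    spec = spec.strip()
    if spec == "*":
        return (1, 65535)
    low, dash, high = spec.partition("-")
    if not dash:
        port = _checkPort(low)
        return (port, port)
    low, high = _checkPort(low), _checkPort(high)
    if low > high:
        raise InvalidPort("Empty port range %r" % spec)
    return (low, high)
```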

Parsers for Tor Bridge descriptors, including bridge-networkstatus documents, bridge-server-descriptors, and bridge-extrainfo descriptors.

bridgedb.parse.descriptors

DescriptorWarning - Raised when we parse a very odd descriptor.
deduplicate - Deduplicate a container of descriptors, keeping only the newest
              descriptor for each router.
parseNetworkStatusFile - Parse a bridge-networkstatus document generated and
                         given to us by the BridgeAuthority.
parseServerDescriptorsFile - Parse a file containing
                             bridge-server-descriptors.
parseExtraInfoFiles - Parse (multiple) file(s) containing bridge-extrainfo
                      descriptors.
exception DescriptorWarning

Bases: exceptions.Warning

Raised when we parse a very odd descriptor.

parseNetworkStatusFile(filename, validate=True, skipAnnotations=True, descriptorClass=<class 'stem.descriptor.router_status_entry.RouterStatusEntryV3'>)

Parse a file which contains an @type bridge-networkstatus document.

See ticket #12254 for why networkstatus-bridges documents don’t look anything like the networkstatus v2 documents that they are purported to look like. They are missing all headers, and the entire footer (including authority signatures).

Parameters:
Raises:
  • InvalidRouterNickname – if one of the routers in the networkstatus file had a nickname which does not conform to Tor’s nickname specification.
  • ValueError – if the contents of a descriptor are malformed and validate is True.
  • IOError – if the file at filename can’t be read.
Return type:

list

Returns:

A list of stem.descriptor.router_status_entry.RouterStatusEntry.

parseServerDescriptorsFile(filename, validate=True)

Open and parse filename, which should contain @type bridge-server-descriptor.

Note

We have to lie to Stem, pretending that these are @type server-descriptor, not @type bridge-server-descriptor. See ticket #11257.

Parameters:
  • filename (str) – The file to parse descriptors from.
  • validate (bool) – Whether or not to validate descriptor contents. (default: True)
Return type:

list

Returns:

A list of stem.descriptor.server_descriptor.RelayDescriptor objects.

deduplicate(descriptors, statistics=False)

Deduplicate some descriptors, returning only the newest for each router.

Note

If two descriptors for the same router are discovered, AND both descriptors have the same published timestamp, then the router’s fingerprint WILL BE LOGGED ON PURPOSE, because we assume that router to be broken or malicious.

Parameters:
Return type:

dict

Returns:

A dictionary mapping router fingerprints to their newest available descriptor.
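A sketch of the deduplication rule in Python 3. Descriptor is a hypothetical stand-in for a Stem descriptor that carries only the two attributes used here (fingerprint and published), and the statistics parameter is omitted.

```python
import logging
from collections import namedtuple

# Hypothetical stand-in for a Stem descriptor.
Descriptor = namedtuple("Descriptor", "fingerprint published")


def deduplicate(descriptors):
    """Keep only the newest descriptor for each router fingerprint."""
    newest = {}
    for desc in descriptors:
        current = newest.get(desc.fingerprint)
        if current is None or desc.published > current.published:
            newest[desc.fingerprint] = desc
        elif desc.published == current.published and desc is not current:
            # Same router, same timestamp: log the fingerprint on purpose.
            logging.warning("Duplicate descriptor with identical timestamp "
                            "from router %s", desc.fingerprint)
    return newest
```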

parseExtraInfoFiles(*filenames, **kwargs)

Open filenames and parse any @type bridge-extrainfo-descriptor contained within.

Warning

This function will not check that the router-signature at the end of the extrainfo descriptor is valid. See bridgedb.bridges.Bridge._verifyExtraInfoSignature for a method for checking the signature. The signature cannot be checked here, because to do so, we would need the latest, valid, corresponding signing-key for the Bridge.

Note

This function will call deduplicate() to deduplicate the extrainfo descriptors parsed from all filenames.

Kwargs validate:
 If there is a 'validate' keyword argument, its value will be passed along as the 'validate' argument to stem.descriptor.extrainfo_descriptor.BridgeExtraInfoDescriptor. The 'validate' keyword argument defaults to True, meaning that the hash digest stored in the router-digest line will be checked against the actual contents of the descriptor and the extrainfo document’s signature will be verified.
Return type: dict
Returns: A dictionary mapping bridge fingerprints to their corresponding, deduplicated stem.descriptor.extrainfo_descriptor.RelayExtraInfoDescriptor.

bridgedb.parse.fingerprint

Utility functions for converting between various relay fingerprint formats, and checking their validity.

toHex - Convert a fingerprint from its binary representation to hexadecimal.
fromHex - Convert a fingerprint from hexadecimal to binary.
isValidFingerprint - Validate a fingerprint.
HEX_FINGERPRINT_LEN = 40

The required length for hexadecimal representations of the hash digest of a Tor relay’s public identity key (a.k.a. its fingerprint).

toHex()

(callable) Convert a value from binary to hexadecimal representation.

fromHex()

(callable) Convert a value from hexadecimal to binary representation.

isValidFingerprint(fingerprint)

Determine if a Tor relay fingerprint is valid.

Parameters: fingerprint (str) – The hex-encoded hash digest of the relay’s public identity key, a.k.a. its fingerprint.
Return type: bool
Returns: True if the fingerprint was valid, False otherwise.
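A sketch of these helpers in Python 3, assuming that toHex and fromHex are thin aliases for the standard binascii converters (an assumption, not confirmed above). Under Python 3 both converters return bytes.

```python
import binascii

HEX_FINGERPRINT_LEN = 40

toHex = binascii.b2a_hex    # binary -> hexadecimal (returns bytes)
fromHex = binascii.a2b_hex  # hexadecimal -> binary (returns bytes)


def isValidFingerprint(fingerprint):
    """Return True for a 40-character hexadecimal fingerprint."""
    try:
        if len(fingerprint) != HEX_FINGERPRINT_LEN:
            return False
        fromHex(fingerprint)  # raises binascii.Error on non-hex digits
    except (TypeError, ValueError):
        return False
    return True
```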

Parsers for HTTP and Email headers.

bridgedb.parse.headers

parseAcceptLanguage - Parse the contents of a client 'Accept-Language' header
parseAcceptLanguage(header)

Parse the contents of a client ‘Accept-Language’ header.

Parse the header in the following manner:

  1. If header is None or an empty string, return an empty list.
  2. Split the header string on any commas.
  3. Chop off the RFC 2616 quality/level suffix. We ignore these, and just use the order of the list as the preference order, without any parsing of quality/level assignments.
  4. Add a fallback language of the same type if it is missing. For example, if we only got [‘es-ES’, ‘de-DE’], add ‘es’ after ‘es-ES’ and add ‘de’ after ‘de-DE’.
  5. Change all hyphens to underscores.
Parameters: header (string) – The contents of an ‘Accept-Language’ header, i.e. as if taken from twisted.web.server.Request.getHeader.
Return type: list
Returns: A list of language codes (with and without locales), in order of preference.
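A sketch of the five steps above in Python 3. Where step 4 leaves details open, it assumes the fallback goes right after the first locale for that language and is skipped if the bare language already appears in the header.

```python
def parseAcceptLanguage(header):
    """Return language codes from an Accept-Language header, in order."""
    # Step 1: nothing to parse.
    if not header:
        return []
    langs = []
    # Steps 2 and 3: split on commas and drop any ';q=...' suffix.
    for item in header.split(","):
        lang = item.split(";", 1)[0].strip()
        if lang:
            langs.append(lang)
    # Step 4: add a bare-language fallback right after its first locale.
    withFallbacks = []
    for lang in langs:
        withFallbacks.append(lang)
        base = lang.split("-", 1)[0]
        if base != lang and base not in langs and base not in withFallbacks:
            withFallbacks.append(base)
    # Step 5: gettext-style underscores instead of hyphens.
    return [lang.replace("-", "_") for lang in withFallbacks]
```

For the header "es-ES, de-DE" this yields ['es_ES', 'es', 'de_DE', 'de'].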

Parsers for bridge nicknames.

bridgedb.parse.nicknames

nicknames
 |_ isValidRouterNickname - Determine if a nickname is according to spec
exception InvalidRouterNickname

Bases: exceptions.ValueError

Router nickname doesn’t follow tor-spec.

isValidRouterNickname(nickname)

Determine if a router’s given nickname meets the specification.

Parameters: nickname (string) – An OR’s nickname.
Return type: bool
Returns: True if the nickname is valid, False otherwise.
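Tor's directory specification defines a nickname as 1 to 19 alphanumeric ASCII characters ([A-Za-z0-9]). A sketch of that rule in Python 3; it is an illustration, not BridgeDB's implementation, and MAX_NICKNAME_LEN is a local name.

```python
import re

MAX_NICKNAME_LEN = 19
_NICKNAME = re.compile(r"[A-Za-z0-9]{1,%d}" % MAX_NICKNAME_LEN)


def isValidRouterNickname(nickname):
    """Return True if *nickname* is 1-19 ASCII letters and digits."""
    return bool(nickname) and _NICKNAME.fullmatch(nickname) is not None
```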

Parsers for BridgeDB commandline options.

bridgedb.parse.options

bridgedb.parse.options
 |__ setConfig()
 |__ getConfig() - Set/Get the config file path.
 |__ setRundir()
 |__ getRundir() - Set/Get the runtime directory.
 |__ parseOptions() - Create the main options parser for BridgeDB.
 |
 \_ BaseOptions - Base options, included in all other options menus.
     ||
     |\__ findRundirAndConfigFile() - Find the absolute path of the config
     |                                file and runtime directory, or find
     |                                suitable defaults.
     |
     |__ SIGHUPOptions - Menu to explain SIGHUP signal handling and usage.
     |__ SIGUSR1Options - Menu to explain SIGUSR1 handling and usage.
     |
     |__ MockOptions - Suboptions for creating fake bridge descriptors for
     |                 testing purposes.
     \__ MainOptions - Main commandline options parser for BridgeDB.
setConfig(path)

Set the absolute path to the config file.

See BaseOptions.postOptions().

Parameters: path (string) – The path to set.
getConfig()

Get the absolute path to the config file.

Return type: string
Returns: The path to the config file.
setRundir(path)

Set the absolute path to the runtime directory.

See BaseOptions.postOptions().

Parameters: path (string) – The path to set.
getRundir()

Get the absolute path to the runtime directory.

Return type: string
Returns: The path to the runtime directory.
parseOptions()

Create the main options parser and its subcommand parsers.

Any UsageErrors which are raised due to invalid options are caught: their error message is printed, and then the program exits.

Return type: MainOptions
Returns: The main options parsing class, with any commandline arguments already parsed.
class BaseOptions

Bases: twisted.python.usage.Options

Base options included in all main and sub options menus.

Create an options parser. All flags, parameters, and attributes of this base options parser are inherited by all child classes.

longdesc = u'BridgeDB is a proxy distribution system for\n private relays acting as bridges into the Tor network. See `bridgedb\n <command> --help` for addition help.'
optParameters = [[u'config', u'c', None, u'Configuration file [default: <rundir>/bridgedb.conf]'], [u'rundir', u'r', None, u"Change to this directory before running. [default: `os.getcwd()']\n\n All other paths, if not absolute, should be relative to this path.\n This includes the config file and any further files specified within\n the config file.\n "]]
opt_quiet()

Decrease verbosity

opt_verbose()

Increase verbosity

opt_q()

Decrease verbosity

opt_v()

Increase verbosity

opt_version()

Display BridgeDB’s version and exit.

static findRundirAndConfigFile(rundir=None, config=None)

Find the absolute path of the config file and runtime directory, or find suitable defaults.

Attempts to set the absolute path of the runtime directory. If the config path is relative, its absolute path is set relative to the runtime directory path (unless it starts with ‘.’ or ‘..’, in which case it is interpreted relative to the current working directory). If the path to the config file is absolute, it is left alone.

Parameters:
  • rundir (string or None) – The user-supplied path to the runtime directory, from the commandline options (e.g. options = BaseOptions(); options.parseOptions(); options['rundir']).
  • config (string or None) – The user-supplied path to the config file, from the commandline options (e.g. options = BaseOptions(); options.parseOptions(); options['config']).
Raises:

twisted.python.usage.UsageError if either the runtime directory or the config file cannot be found.
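A sketch of these path rules in Python 3 with os.path. UsageError is a local stand-in for twisted.python.usage.UsageError, and the default config location, <rundir>/bridgedb.conf, and the default runtime directory, os.getcwd(), are taken from the optParameters help text; this is not BridgeDB's implementation.

```python
import os


class UsageError(Exception):
    """Local stand-in for twisted.python.usage.UsageError."""


def findRundirAndConfigFile(rundir=None, config=None):
    """Return absolute (rundir, config) paths, or raise UsageError."""
    rundir = os.path.abspath(rundir or os.getcwd())
    if not os.path.isdir(rundir):
        raise UsageError("Could not find runtime directory %r" % rundir)

    if config is None:
        config = os.path.join(rundir, "bridgedb.conf")
    elif os.path.isabs(config):
        pass  # absolute paths are left alone
    elif config.startswith((os.curdir, os.pardir)):
        config = os.path.abspath(config)  # relative to the current directory
    else:
        config = os.path.join(rundir, config)  # relative to the rundir

    if not os.path.isfile(config):
        raise UsageError("Could not find config file %r" % config)
    return rundir, config
```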

postOptions()

Automatically called by parseOptions().

Determines appropriate values for the ‘config’ and ‘rundir’ settings.

class MockOptions

Bases: bridgedb.parse.options.BaseOptions

Suboptions for creating necessary conditions for testing purposes.

Create an options parser. All flags, parameters, and attributes of this base options parser are inherited by all child classes.

optParameters = [[u'descriptors', u'n', 1000, u'Generate <n> mock bridge descriptor sets\n (types: netstatus, extrainfo, server)']]
class SIGHUPOptions

Bases: bridgedb.parse.options.BaseOptions

Options menu to explain usage and handling of SIGHUP signals.

Create an options parser. All flags, parameters, and attributes of this base options parser are inherited by all child classes.

longdesc = u'If you send a SIGHUP to a running BridgeDB process, the\n servers will parse and reload all bridge descriptor files into the\n databases.\n\n Note that this command WILL NOT handle sending the signal for you; see\n signal(7) and kill(1) for additional help.'
class SIGUSR1Options

Bases: bridgedb.parse.options.BaseOptions

Options menu to explain usage and handling of SIGUSR1 signals.

Create an options parser. All flags, parameters, and attributes of this base options parser are inherited by all child classes.

longdesc = u'If you send a SIGUSR1 to a running BridgeDB process, the\n servers will dump all bridge assignments by distributor from the\n databases to files.\n\n Note that this command WILL NOT handle sending the signal for you; see\n signal(7) and kill(1) for additional help.'
class MainOptions

Bases: bridgedb.parse.options.BaseOptions

Main commandline options parser for BridgeDB.

Create an options parser. All flags, parameters, and attributes of this base options parser are inherited by all child classes.

optFlags = [[u'dump-bridges', u'd', u'Dump bridges by hashring assignment into files'], [u'reload', u'R', u'Reload bridge descriptors into running servers']]
subCommands = [[u'mock', None, <class 'bridgedb.parse.options.MockOptions'>, u'Generate a testing environment'], [u'SIGHUP', None, <class 'bridgedb.parse.options.SIGHUPOptions'>, u'Reload bridge descriptors into running servers'], [u'SIGUSR1', None, <class 'bridgedb.parse.options.SIGUSR1Options'>, u'Dump bridges by hashring assignment into files']]

Parsers for Tor version number strings.

bridgedb.parse.versions

Version - Holds, parses, and does comparison operations for package
          version numbers.
exception InvalidVersionStringFormat

Bases: exceptions.ValueError

Raised when a version string is not in a parseable format.

class Version(version, package=None)

Bases: twisted.python.versions.Version

Holds, parses, and does comparison operations for version numbers.

Attr str package:
 The package name, if available.
Attr int major: The major version number.
Attr int minor: The minor version number.
Attr int micro: The micro version number.
Attr str prerelease:
 The prerelease specifier isn’t always present, though when it is, it’s usually separated from the main major.minor.micro part of the version string with a -, +, or # character. Sometimes the prerelease is another number, although often it can be a word specifying the release state, e.g. +alpha, -rc2, etc.

Create a version object.

Comparisons may be computed between instances of Version.

>>> from bridgedb.parse.versions import Version
>>> v1 = Version("0.2.3.25", package="tor")
>>> v1.base()
'0.2.3.25'
>>> v1.package
'tor'
>>> v2 = Version("0.2.5.1-alpha", package="tor")
>>> v2
Version(package=tor, major=0, minor=2, micro=5, prerelease=1-alpha)
>>> v1 == v2
False
>>> v2 > v1
True
Parameters:
  • version (str) – A Tor version string specifier, e.g. one taken from either the client-versions or server-versions lines within a Tor cached-consensus file.
  • package (str) – The package or program which we are creating a version number for.
base()

Get the base version number (with prerelease).

Return type: string
Returns: A version number, without the package/program name, and with the prerelease (if available). For example: ‘0.2.5.1-alpha’.
getPrefixedPrerelease(separator='.')

Get the prerelease string, prefixed by the separator.

Parameters: separator (string) – The separator to use between the rest of the version string and the prerelease string.
Return type: string
Returns: The separator plus the prerelease, e.g. ‘.1-alpha’.
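A sketch of how a Tor version string splits into the fields described above, in Python 3 with the re module. ParsedVersion, parseVersionString, prefixedPrerelease, and baseVersion are hypothetical names, and InvalidVersionStringFormat is a local stand-in. Unlike the real Version class, this does not subclass twisted.python.versions.Version, and baseVersion always rejoins the prerelease with '.', which matches Tor's own version strings.

```python
import re
from collections import namedtuple


class InvalidVersionStringFormat(ValueError):
    """Local stand-in for bridgedb.parse.versions.InvalidVersionStringFormat."""


ParsedVersion = namedtuple("ParsedVersion", "major minor micro prerelease")

# major.minor.micro, optionally followed by a separator and a prerelease.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:[.\-+#](.+))?")


def parseVersionString(version):
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise InvalidVersionStringFormat("Unparseable version: %r" % version)
    major, minor, micro, prerelease = match.groups()
    return ParsedVersion(int(major), int(minor), int(micro), prerelease or "")


def prefixedPrerelease(parsed, separator="."):
    """Like getPrefixedPrerelease(): '' when there is no prerelease."""
    return separator + parsed.prerelease if parsed.prerelease else ""


def baseVersion(parsed):
    """Like base(): the version number without a package name."""
    return "%d.%d.%d%s" % (parsed.major, parsed.minor, parsed.micro,
                           prefixedPrerelease(parsed))
```

Comparing the first three fields as tuples gives the same ordering as v2 > v1 in the doctest above, but this sketch does not order prereleases.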