Welcome to rcmp’s documentation!

RCMP

Rcmp is a more flexible replacement for filecmp from the standard Python library.

The basic idea here is that, depending on content, files don’t always have to be entirely bitwise identical in order to be equivalent or “close enough” for many purposes, such as comparing the results of two builds. For example, some (broken) file formats embed a time stamp recording when the file was produced, even though the file system already tracks this information. Build the same file twice and the two copies will initially appear to differ because of the embedded time stamp. Only when the irrelevant time stamp differences are ignored do the two files turn out to be otherwise identical.

Rcmp includes a flexible extension structure to allow for precisely these sorts of living and evolving comparisons.

Extended Path Names

Rcmp is capable of recursively descending into a number of different file types, including:

- file system directories
- archival and aggregating types, including:
  - ar
  - cpio
  - tar
- compressed files, including:
  - zip
  - gzip

In order to describe file locations which may extend beyond traditional file system paths, rcmp introduces an extended path naming scheme. Traditional paths are described using the usual slash-separated list of names, such as /etc/hosts. Components which are contained in other files, like a member of a tar archive, are described using a sequence of brace-enclosed file format separators. So, for instance, a file named foo inside a tar archive named tarchive.tar, which has itself been gzip-compressed into tarchive.tar.gz, would be described as tarchive.tar.gz{gzip}tarchive.tar{tar}foo. The two forms can be combined, as in /home/rich/tarchive.tar.gz{gzip}tarchive.tar{tar}foo.

Items which are not in the file system proper are referred to internally as being “boxed”.
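The naming scheme above is regular enough to take apart mechanically. The following sketch splits an extended path name into its hops and reports whether a name is boxed. The helpers split_extended_path and is_boxed are hypothetical illustrations, not part of rcmp’s API, and the sketch assumes that format names never contain braces.

```python
import re

# Matches one brace-enclosed format separator and captures the format name.
# Assumption: format names never contain braces themselves.
_SEPARATOR = re.compile(r"\{([^{}]+)\}")


def split_extended_path(name):
    """Split an extended path name into (component, format) pairs.

    Each component except the last is paired with the format of the
    container it names; the final component is paired with None.
    """
    # With one capture group, re.split alternates: text, fmt, text, ..., text
    parts = _SEPARATOR.split(name)
    components = parts[0::2]
    formats = parts[1::2] + [None]
    return list(zip(components, formats))


def is_boxed(name):
    """True if the name reaches inside at least one other file."""
    return _SEPARATOR.search(name) is not None


for hop in split_extended_path("/home/rich/tarchive.tar.gz{gzip}tarchive.tar{tar}foo"):
    print(hop)
# ('/home/rich/tarchive.tar.gz', 'gzip')
# ('tarchive.tar', 'tar')
# ('foo', None)
```

A plain path such as /etc/hosts comes back as a single hop with no format, and is_boxed reports False for it.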

Script Usage

Rcmp is both a library and a command line script for driving the library.

Class Architecture

class rcmp.Item(name)

Things which can be compared are represented internally by instances of class Item. These can be items in the file system, like a file or directory, or in an archive, like an archive member.

Item instances also cache the results of calls such as stat and hold file content.

Parameters:	name (string) – file system name

boxed

Returns True if and only if we are “boxed”. That is, if we are not located directly in the file system but instead are encapsulated within some other file.

Return type:	boolean

close()

Close any outstanding file descriptor, if relevant.

content

The contents of the entire file, in memory.

Return type:	bytearray or possibly an mmap’d section of file.

device

Return the device number from stat.

Return type:	string

exists

Check for existence. Boxed items always exist. Unboxed items exist if they exist in the file system.

Return type:	boolean

fd

If we have a file descriptor, return it. If not, then open one, cache it, and return it.

Return type:	file

inode

Return the inode number from stat.

Return type:	string

isdir

Return True if and only if we represent a file system directory.

Return type:	boolean

islnk

Return True if and only if we represent a symbolic link.

Return type:	boolean

isreg

Return True if and only if we represent a regular file.

Return type:	boolean

link

Return a string representing the path to which the symbolic link points. This presumes that we are a symbolic link.

Return type:	string

name

The name of this Item in the extended path name space.

Return type:	string

size

Return our size, looking it up in stat (and caching the result) if we don’t already know it.

Return type:	int

stat

If we have a statbuf, return it.

If not, then look one up, cache it, and return it.

Return type:	statbuf
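The look-up-then-cache behaviour described for stat, and the size and is* properties derived from it, can be sketched with functools.cached_property. SketchItem is a hypothetical stand-in that handles only plain file system paths (no boxed items, fd or content); it is not rcmp’s implementation. It uses os.lstat on the assumption that islnk should describe a link itself rather than its target.

```python
import os
import stat as stat_module
from functools import cached_property


class SketchItem:
    """Hypothetical stand-in for rcmp.Item, limited to unboxed paths."""

    def __init__(self, name):
        self.name = name

    @cached_property
    def stat(self):
        # Looked up on first access, then cached on the instance.
        # lstat reports a symbolic link as a link instead of following it.
        return os.lstat(self.name)

    @property
    def size(self):
        return self.stat.st_size

    @property
    def isdir(self):
        return stat_module.S_ISDIR(self.stat.st_mode)

    @property
    def islnk(self):
        return stat_module.S_ISLNK(self.stat.st_mode)

    @property
    def isreg(self):
        return stat_module.S_ISREG(self.stat.st_mode)

    @property
    def exists(self):
        # Unboxed items exist if they exist in the file system.
        return os.path.lexists(self.name)
```

Because every derived property goes through the one cached stat result, the file system is queried at most once per item.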

class rcmp.Items

All instances of class Item are kept in a global set stored in the singleton class Items.

This exists primarily to prevent us from creating a duplicate Item for the same path name.

Note

The class itself is used directly as a global aggregator, that is, as a singleton. It is never instantiated.

classmethod delete(name)

Delete an Item from the set.

Parameters:	name (string) – name of the `Item` to be deleted.

classmethod find_or_create(name)

Look up the Item with the given name, creating it if necessary.

Parameters:	name (string) – the name of the `Item` to look up
Return type:	`Item`
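The class-as-singleton registry can be sketched with class-level state and classmethods. This is an illustration of the pattern, not rcmp’s source: the Item here is a minimal stand-in that only stores a name, and the sketch keys a dict by name rather than keeping a bare set, which is an assumption made so that look-ups by name are direct.

```python
class Item:
    """Minimal stand-in for rcmp.Item; only the name matters here."""

    def __init__(self, name):
        self.name = name


class Items:
    """Registry used through the class itself and never instantiated."""

    _items = {}  # name -> Item, shared by every caller

    @classmethod
    def find_or_create(cls, name):
        # Reuse the existing Item so each path name maps to exactly one object.
        item = cls._items.get(name)
        if item is None:
            item = cls._items[name] = Item(name)
        return item

    @classmethod
    def delete(cls, name):
        # Forget the Item; a later find_or_create builds a fresh one.
        del cls._items[name]
```

Two calls to Items.find_or_create("/etc/hosts") return the same object until Items.delete("/etc/hosts") is called.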

class rcmp.Same

Returned to indicate an authoritative claim that the items are sufficiently alike. No further comparators need be tried.

Note

The class itself is used as a constant. It is never instantiated.

class rcmp.Different

Returned to indicate an authoritative claim of difference. No further comparators need be tried.

Note

The class itself is used as a constant. It is never instantiated.

class rcmp.Comparator

Represents a single comparison heuristic. This is an abstract class. It is intended solely to act as a base class for subclasses. It is never instantiated.

Subclasses based on Comparator implement individual heuristics for comparing items when applied to a Comparison. There are many Comparator subclasses included.

There are no instance variables or properties.

applies(comparison)

Return True if and only if we apply to the given comparison.

Return type:	boolean

cmp(comparison)

Apply ourselves to the given Comparison.

If we can make an authoritative determination about whether the Items are alike, then return either Same or Different. If we can make no such determination, then return a non-True value.

Return type:	`Same`, `Different`, or a non-True value
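A new heuristic is written by subclassing Comparator and filling in applies() and cmp(). The sketch below is self-contained, so it redefines Same, Different and Comparator as minimal stand-ins; BothEmptyComparator is a hypothetical heuristic in the spirit of EmptyFileComparator, and FakeItem/FakeComparison expose only the attributes it reads (the documented pair, plus isreg and size).

```python
from collections import namedtuple


class Same:
    """Constant: authoritative 'alike'. Used as a class, never instantiated."""


class Different:
    """Constant: authoritative 'not alike'. Used as a class, never instantiated."""


class Comparator:
    """Abstract base: subclasses implement applies() and cmp()."""

    def applies(self, comparison):
        raise NotImplementedError

    def cmp(self, comparison):
        raise NotImplementedError


class BothEmptyComparator(Comparator):
    """Hypothetical heuristic: two empty regular files are Same."""

    def applies(self, comparison):
        left, right = comparison.pair
        return left.isreg and right.isreg

    def cmp(self, comparison):
        left, right = comparison.pair
        if left.size == 0 and right.size == 0:
            return Same
        return False  # no authoritative answer; let the next comparator try


# Minimal stand-ins exposing only the attributes this sketch reads.
FakeItem = namedtuple("FakeItem", "isreg size")
FakeComparison = namedtuple("FakeComparison", "pair")
```

Note that the comparator never returns Different for non-empty files: sizes alone say nothing about whether two files are close enough, so it abstains instead.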

class rcmp.Aggregator(comparators=[])

This is an abstract base class intended for things which are composed of other things, for instance, a directory or a file archive.

cmp(comparison)

Compare our lists and return the result.

class rcmp.Comparison(lname=u'', rname=u'', litem=False, ritem=False, comparators=False, ignores=[], exit_asap=False)

Represents a pair of objects to be compared.

An instance of Comparison comprises a pair of Item, a list of Comparator, and a method for applying the list of Comparator to the pair of Item and returning an answer.

If exit_asap is true, the first difference will end the comparison. If it is not true, the comparison will continue despite knowing that our aggregate result is that we are Different. This is useful for getting a complete list of all differences.

exit_asap=False is like “make -k” in the sense that it reports on all differences rather than stopping after the first.

Parameters:
- lname (string) – path name of the first thing (the leftmost one)
- rname (string) – path name of the second thing (the rightmost one)
- comparators (list of `Comparator`) – list of comparators to be applied
- ignores (list of strings) – wildcard patterns of path names to be ignored
- exit_asap (boolean) – exit as soon as possible

cmp()

Compare our pair of Item.

Run through our list of Comparator calling each one in turn with our pair of Item. Each comparator is expected to return either:

- any non-True value (None, False, etc.): an indeterminate result, that is, this particular comparator could make no authoritative determination and the next comparator in the list should be tried
- Same: an authoritative declaration that the items are sufficiently alike and thus no further comparators need be tried
- Different: an authoritative declaration that the items are insufficiently alike and thus no further comparators need be tried

If no Comparator returns a True value, then IndeterminateResult will be raised.
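The dispatch loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not rcmp’s code: Same, Different and IndeterminateResult are local stand-ins, run_comparators is a hypothetical name, and skipping comparators whose applies() is false is assumed from the Comparator interface rather than stated in this section. exit_asap handling is left out.

```python
class Same:
    pass


class Different:
    pass


class IndeterminateResult(Exception):
    pass


def run_comparators(comparison, comparators):
    """Try each comparator in order until one gives an authoritative answer."""
    for comparator in comparators:
        # Assumption: comparators that do not apply are skipped.
        if not comparator.applies(comparison):
            continue
        result = comparator.cmp(comparison)
        if result:  # Same or Different; class objects are always truthy
            return result
    # Every comparator abstained (returned None, False, ...).
    raise IndeterminateResult(comparison)
```

Using the classes themselves as return values works because any class object is truthy, while the abstaining values None and False are not.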

pair

A two-item list of the Items to be compared.

class rcmp.ComparisonList(stuff, comparators=False, ignores=[], exit_asap=False)

Represents a pair of lists of path names to be compared, one list for each side (column a and column b).

An instance of ComparisonList is very similar to a Comparison except that, instead of a pair of Items, it comprises a pair of lists of path names.

Parameters:	stuff (a (2-element) list of lists of string) – path names to be compared

In all other ways, this class resembles Comparison.

Comparators

Listed in default order of application:

class rcmp.NoSuchFileComparator

Objects are different if either one is missing.

class rcmp.InodeComparator

Objects with the same inode and device are identical.
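The inode test is cheap because it needs only metadata, never file contents. same_inode is a hypothetical helper; it uses os.lstat so that a symbolic link is not silently followed, which is an assumption about how rcmp treats links.

```python
import os


def same_inode(left_path, right_path):
    """True if both names refer to the same file: same device and inode."""
    # lstat: compare the names themselves, without following symlinks.
    left, right = os.lstat(left_path), os.lstat(right_path)
    return (left.st_dev, left.st_ino) == (right.st_dev, right.st_ino)
```

The standard library’s os.path.samestat performs the same (st_dev, st_ino) check on two stat results. Hard links to one file pass this test even though their names differ.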

class rcmp.EmptyFileComparator

Two files which are each empty are equal. In particular, we don’t need to open or read them to make this determination.

class rcmp.DirComparator(comparators=[])

Objects which are directories are special. They match if their contents match.

class rcmp.ArMemberMetadataComparator

Verify the metadata of each member of an ar archive.

class rcmp.BitwiseComparator

Objects which are bitwise identical are close enough.
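A bitwise check can be sketched as a chunked byte-for-byte comparison, so large files need not fit in memory at once. bitwise_equal is a hypothetical helper returning a plain boolean; in rcmp’s scheme a mismatch here presumably yields an indeterminate result rather than Different, since later comparators such as DateBlotBitwiseComparator may still decide Same.

```python
import os


def bitwise_equal(left_path, right_path, chunk_size=64 * 1024):
    """True if the two files have exactly the same bytes."""
    # Cheap early exit: files of different lengths cannot be identical.
    if os.path.getsize(left_path) != os.path.getsize(right_path):
        return False
    with open(left_path, "rb") as left, open(right_path, "rb") as right:
        while True:
            left_chunk = left.read(chunk_size)
            right_chunk = right.read(chunk_size)
            if left_chunk != right_chunk:
                return False
            if not left_chunk:  # both files exhausted at the same point
                return True
```

The standard library’s filecmp.cmp(left_path, right_path, shallow=False) performs a similar content comparison.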

class rcmp.SymlinkComparator

Symlinks are equal if they point to the same place.

class rcmp.BuriedPathComparator

Files which differ only in that they have their paths buried in them aren’t really different.

(Currently unused.)

class rcmp.ElfComparator

ELF files are different if any of the important sections are different.

class rcmp.ArComparator(comparators=[])

Ar archive files are different if any of the important members are different.

class rcmp.AMComparator

Automake-generated Makefiles have some nondeterminisms. They’re the same if they’re the same aside from that. (Some allowance may also need to be made for different tool sets later.)

class rcmp.ConfigLogComparator

When autoconf tests fail, a line is written to config.log which exposes the name of the underlying temporary file. Since the name of this temporary file changes from build to build, it introduces a nondeterminism.

Note

I’d ignore config.log files (and started to do exactly that), but it occurs to me that differences in autoconf configuration are quite likely to cause build differences. So I’ve been more surgical.

class rcmp.KernelConfComparator

When “make config” is run in the kernel, it generates an auto.conf file which includes a time stamp. I think these files are important enough to merit more surgical checking. This comparator blots out the 4th line.
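Blotting out one line amounts to a line-by-line comparison that skips a single index. same_except_line is a hypothetical helper; it assumes both files are already decoded to text and, following the description above, treats the 4th line (index 3) as the time stamp.

```python
def same_except_line(left_text, right_text, ignored_index=3):
    """True if the texts match line by line, ignoring one line."""
    left_lines = left_text.splitlines()
    right_lines = right_text.splitlines()
    # A differing line count is a real difference, not a time stamp.
    if len(left_lines) != len(right_lines):
        return False
    return all(
        left == right
        for index, (left, right) in enumerate(zip(left_lines, right_lines))
        if index != ignored_index
    )
```

Keying on a fixed line number keeps the check surgical: a change anywhere else in the file still counts as a difference.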

class rcmp.ZipComparator(comparators=[])

Zip archive files are different if any of the members are different.

class rcmp.TarComparator(comparators=[])

Tar archive files are different if any of the important members are different.

Note

Must be applied before GzipComparator in order to exploit the ability of Python’s tarfile module to open compressed archives.

class rcmp.GzipComparator(comparators=[])

Gzip archives have only one member, but the archive itself sadly includes a timestamp. You can see the timestamp using “gzip -l -v”.
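The gzip header’s timestamp makes two compressions of the same data differ bitwise. Comparing the decompressed payloads sidesteps it. same_gzip_payload is a hypothetical helper that simplifies to a straight byte comparison; rcmp presumably hands the member on to its other comparators instead. The mtime keyword of gzip.compress requires Python 3.8 or later.

```python
import gzip


def same_gzip_payload(left_bytes, right_bytes):
    """Compare two gzip streams by their decompressed data, so header
    fields such as the embedded modification time are ignored."""
    return gzip.decompress(left_bytes) == gzip.decompress(right_bytes)


# The same payload, compressed with two different embedded timestamps.
first = gzip.compress(b"hello, build\n", mtime=0)
second = gzip.compress(b"hello, build\n", mtime=1_000_000)
print(first == second)                   # False: the headers differ
print(same_gzip_payload(first, second))  # True: the contents match
```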

class rcmp.CpioMemberMetadataComparator

Verify the metadata of each member of a cpio archive.

class rcmp.CpioComparator(comparators=[])

Cpio archive files are different if any of the important members are different.

class rcmp.DateBlotBitwiseComparator

Objects which are bitwise identical after date blotting are close enough. However, this comparator should only be tried late.

class rcmp.FailComparator

Used as a catchall: just return Different.

Utilities

rcmp.date_blot(input_string)

Convert dates embedded in a string into innocuous constants of uniform length.

Parameters:	input_string – input string
Return type:	string
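This documentation does not list which date formats date_blot recognizes. The following hypothetical stand-in, date_blot_sketch, handles two common formats, ctime()-style dates and ISO 8601 dates, and replaces every match with the same fixed constant, so two strings that differ only in their dates blot to the same result.

```python
import re

# Hypothetical patterns; the real rcmp.date_blot may recognize other formats.
_DATE_PATTERNS = [
    # ctime()/asctime() style, e.g. "Tue Mar  5 14:02:11 2013"
    re.compile(r"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
               r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
               r"[ \d]\d \d{2}:\d{2}:\d{2} \d{4}\b"),
    # ISO 8601 date with optional time, e.g. "2013-03-05" or "2013-03-05T14:02:11"
    re.compile(r"\b\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}:\d{2})?\b"),
]

_BLOT = "<DATE>"  # every match becomes this same fixed-length constant


def date_blot_sketch(text):
    """Replace recognized dates in text with a uniform constant."""
    for pattern in _DATE_PATTERNS:
        text = pattern.sub(_BLOT, text)
    return text
```

For example, date_blot_sketch("built Tue Mar  5 14:02:11 2013 ok") returns "built <DATE> ok".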

rcmp.ignoring(ignores, fname)

Given a list of ignore patterns and a specific file name to check, return the first ignore pattern from the list that matches the file name.

Parameters:
- ignores (list of strings) – ignore patterns
- fname (string) – file name to check
Return type:	string or False (Can be used as a predicate.)
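Assuming the ignore patterns are shell-style wildcards (the Comparison parameters describe them as wildcard patterns), ignoring can be sketched with the standard fnmatch module. ignoring_sketch is a hypothetical name, and rcmp’s actual matching rules may differ; note that fnmatch’s * also matches /.

```python
from fnmatch import fnmatch


def ignoring_sketch(ignores, fname):
    """Return the first pattern in ignores that matches fname, else False."""
    for pattern in ignores:
        if fnmatch(fname, pattern):
            return pattern  # a non-empty pattern is truthy, so this doubles as a predicate
    return False
```

Returning the matching pattern rather than True lets a caller log which pattern caused a file to be skipped.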

Exceptions

exception rcmp.RcmpException

Base class for all rcmp exceptions.

exception rcmp.IndeterminateResult

Raised when we can’t make any authoritative determination. At the top level, this is an error condition, as it indicates that we’ve failed to accomplish our job. Note that this is significantly different from the non-True value returned by Comparator subclasses to indicate that they have no authoritative result.

Logging strategy

Rcmp uses the Python standard logging facility. The only non-obvious bits are that definitive differences are logged at level WARNING, definitive Same results at WARNING - 1, and indeterminate results at WARNING - 2. Lowering the logging threshold one level at a time therefore yields progressively more output, starting with the information that is usually most important.
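The level scheme can be reproduced with the standard logging module. This is a sketch: the logger name rcmp_sketch, the registered level names SAME and INDETERMINATE, and the set_verbosity helper are hypothetical, not part of rcmp.

```python
import logging

# Levels as described above; the names registered here are illustrative only.
DIFFERENT_LEVEL = logging.WARNING          # 30: definitive differences
SAME_LEVEL = logging.WARNING - 1           # 29: definitive Same results
INDETERMINATE_LEVEL = logging.WARNING - 2  # 28: indeterminate results

logging.addLevelName(SAME_LEVEL, "SAME")
logging.addLevelName(INDETERMINATE_LEVEL, "INDETERMINATE")

log = logging.getLogger("rcmp_sketch")


def set_verbosity(verbosity):
    """0 shows only differences, 1 adds Same results, 2 adds indeterminate ones."""
    log.setLevel(logging.WARNING - verbosity)


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s %(message)s")
    set_verbosity(1)
    log.log(DIFFERENT_LEVEL, "Different: %s", "a.o")
    log.log(SAME_LEVEL, "Same: %s", "b.o")                    # shown at verbosity 1
    log.log(INDETERMINATE_LEVEL, "Indeterminate: %s", "c.o")  # suppressed at verbosity 1
```

Because the three levels are adjacent integers, each step down in threshold admits exactly one more category of message.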

Note

I keep thinking that it would be better to create an IgnoringComparator that simply returned Same. It would make much of the code simpler. However, it would mean that in some cases we’d build entire trees and compare them all just to produce constants. The current approach clips the tree instead.
