Document model
A content block that is octet data
Return a file object with the data.
| Parameters: | safe – If True, return a new file object that is a copy of the data; you are responsible for closing it. Otherwise, return the original file object, seeked to the correct offset. Be sure not to read beyond its length, and seek back to the original position if necessary. |
|---|---|
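The difference between the two modes can be illustrated with a minimal sketch. It does not use the library itself: `get_file`, `source`, `offset` and `length` are hypothetical names chosen for illustration, and the library's own method may differ in signature and behaviour.

```python
import io


def get_file(source, offset, length, safe=True):
    """Hypothetical helper mirroring the documented `safe` behaviour."""
    if safe:
        # Copy the block into a new in-memory file; the caller closes it.
        original_position = source.tell()
        source.seek(offset)
        data = source.read(length)
        source.seek(original_position)
        return io.BytesIO(data)
    # Unsafe mode: hand back the shared file object, seeked to the block.
    source.seek(offset)
    return source


archive = io.BytesIO(b"HEADER--payload--TRAILER")

# Safe mode: an independent copy that can be read freely.
with get_file(archive, 8, 7, safe=True) as copy:
    safe_data = copy.read()

# Unsafe mode: read no more than `length` bytes, then restore the position.
position_before = archive.tell()
shared = get_file(archive, 8, 7, safe=False)
unsafe_data = shared.read(7)
shared.seek(position_before)
```

The safe copy costs memory proportional to the block size, which is why the unsafe mode may be preferable for large blocks as long as the caller restores the position.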
Return an iterable of bytes of the source data
Return a BinaryBlock using given file object
Set the reference to the file or filename with the data.
This is a convenience alternative to setting the attributes individually.
Reference to a file containing the content block data.
When reading, the file is seeked to file_offset.
The length of the data
The filename of the referenced data. It must be a valid file.
The file object to be read from. It is important that this file object is not shared or race conditions will occur. File objects are not closed automatically.
Return a file object with the data.
| Parameters: | safe – If True, return a new file object that is a copy of the data; you are responsible for closing it. Otherwise, return the original file object, seeked to the correct offset. Be sure not to read beyond its length, and seek back to the original position if necessary. |
|---|---|
A content block (fields/data) within a Record.
If this block was loaded from a file, this attribute will be a BinaryBlock of the original file. Otherwise, this attribute is None.
Return a BlockWithPayload
Metaclass that indicates this object can be serialized to bytes
Return an iterable of bytes
Load and return BinaryBlock or BlockWithPayload
Name and value pseudo-map list
Behaves like a dict or mutable mapping. Mutable mapping operations remove any duplicates in the field list.
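As a rough illustration of how a list of name-value pairs can behave like a mapping, here is a minimal sketch. `FieldList` and its `add` method are hypothetical names, not the library's class; the sketch only assumes the behaviour described above, namely that mapping-style assignment leaves a single entry for a name.

```python
from collections.abc import MutableMapping


class FieldList(MutableMapping):
    """Hypothetical name-value list with a dict-like interface."""

    def __init__(self):
        self._items = []  # ordered (name, value) pairs; duplicates allowed

    def add(self, name, value):
        # List-style append: keeps any existing entries with the same name.
        self._items.append((name, value))

    def count(self, name):
        return sum(1 for key, _ in self._items if key == name)

    def __getitem__(self, name):
        for key, value in self._items:
            if key == name:
                return value
        raise KeyError(name)

    def __setitem__(self, name, value):
        # Mapping-style assignment: drop every duplicate, keep one entry.
        kept = [item for item in self._items if item[0] != name]
        kept.append((name, value))
        self._items = kept

    def __delitem__(self, name):
        kept = [item for item in self._items if item[0] != name]
        if len(kept) == len(self._items):
            raise KeyError(name)
        self._items = kept

    def __iter__(self):
        seen = set()
        for key, _ in self._items:
            if key not in seen:
                seen.add(key)
                yield key

    def __len__(self):
        return len({key for key, _ in self._items})


fields = FieldList()
fields.add("Via", "proxy-a")
fields.add("Via", "proxy-b")
duplicates_before = fields.count("Via")
fields["Via"] = "proxy-c"  # mapping operation removes the duplicates
duplicates_after = fields.count("Via")
```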
Fields extended with an HTTP status attribute.
The str of the HTTP status message and code.
Append a name-value field to the list
Count the number of times this name occurs in the list
Return a list of values
Return the index of the first occurrence of the given name
Scan for a multiline value, whose continuation lines are prefixed with a space or tab
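To show what a multiline (folded) value looks like, the following sketch merges continuation lines into the field line they belong to. `join_folded_lines` is a hypothetical helper written for this illustration; it is not the library's implementation, and joining with a single space is an assumption.

```python
def join_folded_lines(lines):
    """Merge lines starting with a space or tab into the preceding line."""
    unfolded = []
    for line in lines:
        if line[:1] in (" ", "\t") and unfolded:
            # Continuation line: append its content to the previous field.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)
    return unfolded


raw_lines = [
    "Content-Type: text/plain;",
    "\tcharset=utf-8",
    "Content-Length: 12",
]
unfolded = join_folded_lines(raw_lines)
```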
Return the underlying list
A header of a WARC Record.
A str containing the version
Data within a content block that has fields
Return a file object with the data.
| Parameters: | safe – If True, return a new file object that is a copy of the data; you are responsible for closing it. Otherwise, return the original file object, seeked to the correct offset. Be sure not to read beyond its length, and seek back to the original position if necessary. |
|---|---|
Return an iterable of bytes of the source data
Set the reference to the file or filename with the data.
This is a convenience alternative to setting the attributes individually.
A WARC Record within a WARC file.
If this record was loaded from a file, this attribute contains an int describing the location of the record in the file.
Metaclass that indicates this object can be serialized to str
A Web ARChive file model.
Typically, large streaming operations should use the open() and read_record() functions.
Open and load the contents of the given filename.
The loaded records are stored in records.
Archive process tools
Base class for iterating through records
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
Version info
Verification helpers
Return True if the content block hash digest is valid
Return True if the payload hash digest is valid
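A digest check of this kind can be sketched as follows. The sketch assumes the digest is labelled as `algorithm:value` with a Base32-encoded value, a layout commonly seen in WARC digest fields; `digest_matches` is a hypothetical helper, and the library may accept other encodings.

```python
import base64
import hashlib


def digest_matches(data, labelled_digest):
    """Return True if `data` hashes to the labelled Base32 digest."""
    algorithm, _, expected = labelled_digest.partition(":")
    hasher = hashlib.new(algorithm)
    hasher.update(data)
    actual = base64.b32encode(hasher.digest()).decode("ascii")
    return actual == expected.upper()


block = b"hello"
# Build a label for the example instead of hard-coding a digest value.
label = "sha1:" + base64.b32encode(hashlib.sha1(block).digest()).decode("ascii")

is_valid = digest_matches(block, label)
is_tampered_valid = digest_matches(b"hellO", label)
```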
Utility functions
Buffers the file to disk in large parts at a time
Flush and close the IO object.
This method has no effect if the file is already closed.
Disconnect this buffer from its underlying raw stream and return it.
After the raw stream has been detached, the buffer is in an unusable state.
Flush write buffers, if applicable.
This is not implemented for read-only and non-blocking streams.
Read and return up to n bytes, with at most one read() call to the underlying raw stream. A short result does not imply that EOF is imminent.
Returns an empty bytes object on EOF.
Read and return a line from the stream.
If limit is specified, at most limit bytes will be read.
The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.
Return a list of lines from the stream.
hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.
Truncate file to size bytes.
File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.
Write the given buffer to the IO stream.
Returns the number of bytes written, which is never less than len(b).
Raises BlockingIOError if the buffer is full and the underlying raw stream cannot accept more data at the moment.
A cache containing references to file objects.
File objects are closed when expired. Class is thread safe and will only return file objects belonging to its own thread.
True if the file is closed.
Disconnect this buffer from its underlying raw stream and return it.
After the raw stream has been detached, the buffer is in an unusable state.
Returns underlying file descriptor if one exists.
An IOError is raised if the IO object does not use a file descriptor.
Get a read-write view over the contents of the BytesIO object.
Retrieve the entire contents of the BytesIO object.
Always returns False since BytesIO objects are not connected to a tty-like device.
If the size argument is negative, read until EOF is reached. Return an empty string at EOF.
If the size argument is negative or omitted, read until EOF is reached. Return an empty string at EOF.
Returns the number of bytes read (0 for EOF), or None if the object is set not to block and has no data to read.
Retain newline. A non-negative size argument limits the maximum number of bytes to return (an incomplete line may be returned then). Return an empty string at EOF.
Call readline() repeatedly and return a list of the lines so read. The optional size argument, if given, is an approximate bound on the total number of bytes in the lines returned.
Returns the new absolute position.
Size defaults to the current file position, as returned by tell(). The current file position is unchanged. Returns the new size.
Return the number of bytes written.
Note that newlines are not added. The sequence can be any iterable object producing strings. This is equivalent to calling write() for each string.
Adds _index_xxxxxx to the path.
It uses the basename (the filename) of the path to generate the hex hash digest suffix.
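The idea can be sketched as below. The choice of SHA-1, the six-character suffix length, and adding the suffix as a new path component are assumptions made for illustration; `append_index_suffix` is a hypothetical stand-in, and `posixpath` is used so the result does not depend on the operating system.

```python
import hashlib
import posixpath


def append_index_suffix(path):
    """Return `path` with an `_index_xxxxxx` component derived from its basename."""
    name = posixpath.basename(path)
    # Assumption: the first six hex characters of a SHA-1 digest of the name.
    suffix = hashlib.sha1(name.encode("utf-8")).hexdigest()[:6]
    return posixpath.join(path, "_index_" + suffix)


indexed_path = append_index_suffix("example.com/images")
```

Because the suffix depends only on the basename, the same filename always maps to the same index name.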
Like shutil.copyfileobj() but with limit on how much to copy
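A length-limited copy can be written along these lines. `copy_limited` and its parameters are hypothetical names; the library's function may differ in signature and return value.

```python
import io


def copy_limited(source, dest, max_length, chunk_size=4096):
    """Copy at most `max_length` bytes from `source` to `dest`."""
    remaining = max_length
    copied = 0
    while remaining > 0:
        # Never request more than what is still allowed.
        chunk = source.read(min(chunk_size, remaining))
        if not chunk:
            break  # source exhausted before the limit was reached
        dest.write(chunk)
        copied += len(chunk)
        remaining -= len(chunk)
    return copied


source = io.BytesIO(b"0123456789")
dest = io.BytesIO()
copied = copy_limited(source, dest, 4, chunk_size=3)
```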
Find the offset of pattern relative to the current position
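One way to search a stream without loading it whole is to read in chunks and keep a small overlap, so that a pattern split across two chunks is still found. The sketch below is an assumption about the approach, not the library's code; `find_offset` is a hypothetical name, and restoring the file position afterwards is a design choice made here.

```python
import io


def find_offset(file_obj, pattern, chunk_size=4096):
    """Return the offset of `pattern` from the current position, or None."""
    start = file_obj.tell()
    overlap = len(pattern) - 1
    tail = b""
    consumed = 0  # bytes read before the current chunk
    try:
        while True:
            chunk = file_obj.read(chunk_size)
            if not chunk:
                return None
            window = tail + chunk
            index = window.find(pattern)
            if index != -1:
                # The window begins len(tail) bytes before the current chunk.
                return consumed - len(tail) + index
            consumed += len(chunk)
            tail = window[-overlap:] if overlap > 0 else b""
    finally:
        file_obj.seek(start)  # leave the position as it was found


stream = io.BytesIO(b"abc\r\n\r\ndef")
stream.seek(1)
offset = find_offset(stream, b"\r\n\r\n", chunk_size=3)
missing = find_offset(stream, b"xyz", chunk_size=3)
```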
Renames files if they conflict with a directory in given path.
If a file has the same name as the directory, the file is renamed using append_index_filename().