XRootDPyFS

XRootDPyFS is a PyFilesystem interface for XRootD.

The XRootD protocol aims to provide high-performance, scalable, fault-tolerant access to data repositories of many kinds. XRootDPyFS adds a high-level interface on top of the existing Python interface (pyxrootd) and makes it easy to, for example, copy a directory in parallel or recursively remove a directory.

Installation

If you just want to try out the library, the easiest way is to use Docker. See Getting started below for details.

XRootDPyFS depends on PyFilesystem and XRootD Python bindings.

XRootDPyFS is not Python 3 compatible because the underlying Python bindings are not Python 3 compatible.

XRootD Python bindings

The XRootD Python bindings can be somewhat tricky to install if this is your first experience with XRootD. First install XRootD as usual, then install the Python bindings. The Python bindings are installed using python setup.py install and require access to the XRootD headers and library. If these can’t be found, you need to set the XRD_LIBDIR and XRD_INCDIR environment variables. See the OS X example below.

CentOS 7/YUM based

Install XRootD + Python bindings using the official YUM repositories, e.g.:

$ rpm -Uvh \
  http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ yum install -y xrootd xrootd-server xrootd-client xrootd-client-devel \
  xrootd-python

See http://xrootd.org/dload.html to get the YUM repository addresses for other RedHat based distributions/versions.

Ubuntu

There is no official support for XRootD on Ubuntu, so you will have to install XRootD from the source distribution.

OS X

First, install XRootD using Homebrew:

$ brew install xrootd

Next, install the XRootD Python bindings:

$ xrootd -v
v4.1.1
$ VER=4.1.1
$ git clone git://github.com/xrootd/xrootd-python.git
$ cd xrootd-python
$ XRD_LIBDIR=/usr/local/lib/ \
  XRD_INCDIR=/usr/local/Cellar/xrootd/$VER/include/xrootd \
  python setup.py install

Note that you may want to activate a virtualenv before running the final python setup.py install step. If you do not have cmake installed, you can get it via brew install cmake.

XRootDPyFS

Once the XRootD Python bindings have been installed, xrootdpyfs itself is on PyPI so all you need is:

$ pip install xrootdpyfs

Getting started

The easiest way to run the examples is to use the provided Docker container. This way you do not need a local XRootD server or any of the libraries installed:

$ docker build -t xrootd .
$ docker run -h xrootdpyfs -it xrootd bash

Next, start an XRootD server in the container and fire up an ipython shell:

[xrootdpyfs@xrootdpyfs code]$ xrootd -b -l /dev/null
[xrootdpyfs@xrootdpyfs code]$ ipython

Quick examples

Here is a quick example of a file listing with the xrootd PyFilesystem integration:

>>> from xrootdpyfs import XRootDPyFS
>>> fs = XRootDPyFS("root://localhost//tmp/")
>>> fs.listdir("xrootdpyfs")
['test.txt']

Or, alternatively using the PyFilesystem opener (note the first import xrootdpyfs is required to ensure the XRootDPyFS opener is registered):

>>> import xrootdpyfs
>>> from fs.opener import opener
>>> fs, path = opener.parse("root://localhost//tmp/")
>>> fs.listdir("xrootdpyfs")
[u'test.txt']

Reading files:

>>> f = fs.open("xrootdpyfs/test.txt")
>>> f.read()
'Welcome to xrootdpyfs!'
>>> f.close()

Reading files using the getcontents() method:

>>> fs.getcontents("xrootdpyfs/test.txt")
'Welcome to xrootdpyfs!'

Writing files:

>>> f = fs.open("xrootdpyfs/hello.txt", "w+")
>>> f.write("World")
>>> f.close()

Writing files using the setcontents() method (returns the number of bytes written):

>>> fs.setcontents("xrootdpyfs/test.txt", "World")
5

API

Filesystem interface

PyFilesystem implementation of XRootD protocol.

XRootDPyFS is a subclass of the PyFilesystem FS class and thus implements the entire PyFilesystem Filesystem interface.

Note

All methods prefixed with xrd in XRootDPyFS are specific to XRootDPyFS and not supported by other PyFilesystem implementations.

class xrootdpyfs.fs.XRootDPyFS(url, query=None)[source]

XRootD PyFilesystem interface.

The argument query is particularly useful for specifying e.g. Kerberos or GSI authentication without adding it to the URL. The following:

fs = XRootDPyFS(
    "root://localhost?&xrd.wantprot=krb5&xrd.k5ccname=/tmp/krb_filexxx"
)

is equivalent to:

fs = XRootDPyFS(
    "root://localhost",
    {"xrd.wantprot": "krb5", "xrd.k5ccname": "/tmp/krb_filexxx"}
)

This way you can easily separate the URL from the authentication query parameters. Note that xrd.k5ccname specifies a Kerberos ticket and not a keytab.

Parameters:
  • url – A root URL.
  • query (dict) – Dictionary of key/values to append to the URL query string. The contents of the dictionary are merged with any query string provided in the url.
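
As an illustration of the merge described above, the following sketch combines a query dictionary with a query string already present in a URL, using only the Python standard library. The helper name merge_query is hypothetical and this is not the XRootDPyFS implementation. Two assumptions are made: dictionary values win over duplicate keys in the URL, and values are written without escaping.

```python
try:  # Python 3
    from urllib.parse import parse_qsl, urlsplit, urlunsplit
except ImportError:  # Python 2
    from urlparse import parse_qsl, urlsplit, urlunsplit


def merge_query(url, query=None):
    """Return ``url`` with the key/values of ``query`` merged into its query string.

    Hypothetical helper for illustration only. Keys in ``query`` override
    keys already present in the URL (an assumption).
    """
    parts = urlsplit(url)
    merged = dict(parse_qsl(parts.query))
    merged.update(query or {})
    # Sort the keys so the resulting URL is deterministic; values are
    # joined as-is, without percent-encoding (an assumption).
    new_query = "&".join(
        "{0}={1}".format(key, value) for key, value in sorted(merged.items())
    )
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, new_query, parts.fragment)
    )
```

Under these assumptions, the two constructor calls shown earlier resolve to the same URL, which is the sense in which they are equivalent.
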
copy(src, dst, overwrite=False)[source]

Copy a file from source to destination.

Parameters:
  • src (string) – Source path.
  • dst (string) – Destination path.
  • overwrite (bool) – If True, an existing file at the destination may be overwritten; if False, DestinationExistsError will be raised.
copydir(src, dst, overwrite=False, parallel=True)[source]

Copy a directory from source to destination.

By default the copy is done by recreating the source directory structure at the destination and then copying the files in parallel from source to destination.

Parameters:
  • src (string) – Source directory path.
  • dst (string) – Destination directory path.
  • overwrite (bool) – If True then any existing files in the destination directory will be overwritten.
  • parallel (bool) – If True (default), the copy will be done in parallel.
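
A minimal sketch of how copydir might be used defensively. The helper name backup_directory is hypothetical, and fs is assumed to be an XRootDPyFS instance (or any object exposing the documented exists and copydir methods).

```python
def backup_directory(fs, src, dst, parallel=True):
    """Copy ``src`` to ``dst`` unless ``dst`` already exists.

    Returns True if the copy was performed and False if it was skipped.
    """
    if fs.exists(dst):
        # Leave an existing destination untouched instead of overwriting it.
        return False
    fs.copydir(src, dst, overwrite=False, parallel=parallel)
    return True
```

Checking exists first keeps the helper from silently replacing data; pass parallel=False if a sequential copy is preferred.
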
exists(path)[source]

Check if a path references a valid resource.

Parameters:path (string) – A path in the filesystem.
Return type:bool
getinfo(path)[source]

Return information for a path as a dictionary.

The following values can be found in the info dictionary:

  • size - Number of bytes used to store the file or directory.
  • created_time - A datetime object containing the time the resource was created.
  • modified_time - A datetime object containing the time the resource was modified.
  • accessed_time - A datetime object containing the time the resource was accessed.
  • offline - True if file/directory is offline.
  • writable - True if file/directory is writable.
  • readable - True if file/directory is readable.
  • executable - True if file/directory is executable.

Parameters:path (string) – Path to retrieve information about.
Return type:dict
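
The info dictionary can be turned into a compact, ls-like summary. The sketch below uses only the keys listed above; the helper name describe and the output format are hypothetical, and missing keys are treated as unset.

```python
def describe(info):
    """Build a one-line summary from a ``getinfo()`` style dictionary."""
    # One letter per permission flag, "-" when the flag is false or absent.
    flags = "".join(
        letter if info.get(key) else "-"
        for key, letter in (
            ("readable", "r"),
            ("writable", "w"),
            ("executable", "x"),
        )
    )
    modified = info.get("modified_time")
    when = modified.isoformat() if modified is not None else "unknown"
    return "{0} {1} bytes, modified {2}".format(flags, info.get("size", 0), when)
```

With a live filesystem this might be called as describe(fs.getinfo("xrootdpyfs/test.txt")).
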
getpathurl(path, allow_none=False, with_querystring=False)[source]

Get URL that corresponds to the given path.

ilistdir(path='./', wildcard=None, full=False, absolute=False, dirs_only=False, files_only=False)[source]

Generator yielding the files and directories under a given path.

This method behaves identically to fs.base:FS.listdir() but returns a generator instead of a list.

isdir(path, _statobj=None)[source]

Check if a path references a directory.

Parameters:path (string) – a path in the filesystem
Return type:bool
isfile(path, _statobj=None)[source]

Check if a path references a file.

Parameters:path (string) – a path in the filesystem
Return type:bool
listdir(path='./', wildcard=None, full=False, absolute=False, dirs_only=False, files_only=False)[source]

List the files and directories under a given path.

The directory contents are returned as a list of unicode paths.

Parameters:
  • path (string) – Path to list.
  • wildcard (string containing a unix filename pattern, or a callable that accepts a path and returns a boolean) – Return only paths that match the wildcard.
  • full (bool) – Return full paths (relative to the base path).
  • absolute (bool) – Return absolute paths (paths beginning with /)
  • dirs_only (bool) – If True, return only directories.
  • files_only (bool) – If True, return only files.
Return type:

Iterable of paths.

Raises:
  • fs.errors.ResourceInvalidError – If the path exists, but is not a directory.
  • fs.errors.ResourceNotFoundError – If the path is not found.
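
As a sketch of how the listdir parameters combine with getinfo, the hypothetical helper below sums the sizes of the files directly under a path. It assumes that the paths returned with full=True can be passed straight back to getinfo, and fs is assumed to be an XRootDPyFS instance or a compatible object.

```python
def total_size(fs, path="./", wildcard=None):
    """Sum the sizes (in bytes) of the files directly under ``path``.

    ``wildcard`` is passed through to ``listdir`` unchanged.
    """
    names = fs.listdir(path, wildcard=wildcard, full=True, files_only=True)
    # One getinfo() call, and hence one server request, per file.
    return sum(fs.getinfo(name)["size"] for name in names)
```

For example, total_size(fs, "xrootdpyfs", wildcard="*.txt") would count only the text files in that directory.
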
makedir(path, recursive=False, allow_recreate=False)[source]

Make a directory on the filesystem.

Parameters:
  • path (string) – Path of directory.
  • recursive (bool) – If True, any intermediate directories will also be created.
  • allow_recreate (bool) – If True, re-creating a directory won’t be an error.
Raises:
  • fs.errors.DestinationExistsError – If the path is already existing, and allow_recreate is False.
  • fs.errors.ResourceInvalidError – If a containing directory is missing and recursive is False or if a path is an existing file.
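
A minimal sketch of an "ensure this directory exists" helper built on the parameters above. The name ensure_directory is hypothetical, and fs is assumed to be an XRootDPyFS instance or a compatible object.

```python
def ensure_directory(fs, path):
    """Create ``path`` and any missing parents unless it already exists.

    Returns True if makedir was called and False if the directory was
    already present.
    """
    if fs.isdir(path):
        return False
    # recursive=True creates intermediate directories; allow_recreate=True
    # avoids an error if the directory appears between the check and the call.
    fs.makedir(path, recursive=True, allow_recreate=True)
    return True
```
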
move(src, dst, overwrite=False, **kwargs)[source]

Move a file from one location to another.

Parameters:
  • src (string) – Source path.
  • dst (string) – Destination path.
  • overwrite (bool) – When True the destination will be overwritten (if it exists), otherwise a DestinationExistsError will be thrown.
Raises:
  • fs.errors.DestinationExistsError – If destination exists and overwrite is False.
  • fs.errors.ResourceInvalidError – If source is not a file.
  • fs.errors.ResourceNotFoundError – If source was not found.
movedir(src, dst, overwrite=False, **kwargs)[source]

Move a directory from one location to another.

Parameters:
  • src (string) – Source directory path.
  • dst (string) – Destination directory path.
  • overwrite (bool) – When True the destination will be overwritten (if it exists), otherwise a DestinationExistsError will be thrown.
Raises:
  • fs.errors.DestinationExistsError – If destination exists and overwrite is False.
  • fs.errors.ResourceInvalidError – If source is not a directory.
  • fs.errors.ResourceNotFoundError – If source was not found.
open(path, mode='r', buffering=-1, encoding=None, errors=None, newline=None, line_buffering=False, **kwargs)[source]

Open the given path and return a file-like object.

Parameters:
  • path (string) – Path to file that should be opened.
  • mode (string) – Mode of file to open, identical to the mode string used in ‘file’ and ‘open’ builtins.
  • buffering – An optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer.
  • encoding – Determines encoding used when writing unicode data.
  • errors – An optional string that specifies how encoding and decoding errors are to be handled (e.g. strict, ignore or replace).
  • newline – Newline character to use (either \n, \r, \r\n, '' or None).
  • line_buffering – Unsupported. Anything but False will raise an error.
Return type:

A file-like object.

Raises:
  • fs.errors.ResourceInvalidError – If an intermediate directory is a file.
  • fs.errors.ResourceNotFoundError – If the path is not found.
remove(path)[source]

Remove a file from the filesystem.

Parameters:

path (string) – Path of the resource to remove.

Raises:
  • fs.errors.ResourceInvalidError – If the path is a directory.
  • fs.errors.DirectoryNotEmptyError – If the directory is not empty.
removedir(path, recursive=False, force=False)[source]

Remove a directory from the filesystem.

Parameters:
  • path (string) – Path of the directory to remove.
  • recursive (bool) – Unsupported by XRootDPyFS implementation.
  • force (bool) – If True, any directory contents will be removed (recursively). Note that this can be very expensive as the xrootd protocol does not support recursive deletes - i.e. the library will do a full recursive listing of the directory and send a network request per file/directory.
Raises:
  • fs.errors.DirectoryNotEmptyError – If the directory is not empty and force is False.
  • fs.errors.ResourceInvalidError – If the path is not a directory.
  • fs.errors.ResourceNotFoundError – If the path does not exist.
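
To get a feel for the cost of force=True before using it, the sketch below walks a directory tree with the documented listdir and isdir methods and counts the entries that would each need their own delete request. The helper name count_entries is hypothetical, it assumes that paths returned with full=True can be passed back to isdir and listdir, and the count only approximates the network traffic because the listing itself also issues requests.

```python
def count_entries(fs, path):
    """Count ``path`` itself plus every file and directory below it."""
    count = 1  # the directory at ``path``
    for entry in fs.listdir(path, full=True):
        if fs.isdir(entry):
            # Recurse into subdirectories; they are counted by the callee.
            count += count_entries(fs, entry)
        else:
            count += 1
    return count
```

A large result suggests that removedir(path, force=True) will take correspondingly long.
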
rename(src, dst)[source]

Rename a file or directory.

Parameters:
  • src (string) – path to rename.
  • dst (string) – new name.
Raises:
  • DestinationExistsError – if destination already exists.
  • ResourceNotFoundError – if source does not exist.
xrd_checksum(path, _statobj=None)[source]

Get checksum of file from server.

Specific to XRootDPyFS. Note not all XRootD servers support the checksum operation (in particular the default local xrootd server).

Parameters:

path (string) – File to calculate checksum for.

Raises:
  • fs.errors.UnsupportedError – If server does not support checksum calculation.
  • fs.errors.FSError – If you try to get the checksum of e.g. a directory.
xrd_client

Pyxrootd filesystem client.

Specific to XRootDPyFS.

xrd_get_rooturl()[source]

Get the URL with query string for this FS.

Specific to XRootDPyFS.

xrd_ping()[source]

Ping xrootd server.

Specific to XRootDPyFS.

File interface

File-like interface for interacting with files over the XRootD protocol.

class xrootdpyfs.xrdfile.XRootDPyFile(path, mode='r', buffering=-1, encoding=None, errors=None, newline=None, line_buffering=False, buffer_size=None, **kwargs)[source]

File-like interface for working with files over XRootD protocol.

This class understands and will accept the following mode strings, with any additional characters being ignored:

  • r - Open the file for reading only.
  • r+ - Open the file for reading and writing.
  • r- - Open the file for streamed reading; do not allow seek/tell.
  • w - Open the file for writing only; create the file if it doesn’t exist; truncate it to zero length.
  • w+ - Open the file for reading and writing; create the file if it doesn’t exist; truncate it to zero length.
  • w- - Open the file for streamed writing; do not allow seek/tell.
  • a - Open the file for writing only; create the file if it doesn’t exist; place pointer at end of file.
  • a+ - Open the file for reading and writing; create the file if it doesn’t exist; place pointer at end of file.

Note

Streamed reading/writing modes have no performance advantages over non-streamed reading/writing for XRootD.

Parameters:
  • path (string) – Path to file that should be opened.
  • mode (string) – Mode of file to open, identical to the mode string used in ‘file’ and ‘open’ builtins.
  • buffering – An optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer.
  • encoding – Determines encoding used when writing unicode data.
  • errors – An optional string that specifies how encoding and decoding errors are to be handled (e.g. strict, ignore or replace).
  • newline – Newline character to use (either \n, \r, \r\n, '' or None).
  • line_buffering – Unsupported. Anything but False will raise an error.
  • buffer_size – Buffer size used when reading files (defaults to 64K). This can likely be optimized to chunks up to 2MB depending on your desired memory usage.
close()[source]

Close the file, including flushing the write buffers.

The file may not be accessed further once it is closed.

closed

Check if file is closed.

fileno()[source]

Get the underlying file descriptor.

Unsupported by XRootDPyFS (added for io module compatibility).

flush()[source]

Flush write buffers.

isatty()[source]

Check if file is a TTY (false always).

Added for io module compatibility.

name

Get filename.

next()[source]

Return next item for file iteration.

read(sizehint=-1)[source]

Read sizehint bytes from the file object.

If no sizehint is provided, the entire file is read! Multiple calls to this method after EOF has been reached will return an empty string.

Parameters:sizehint – Number of bytes to read from file object.
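
Because read() returns an empty string once EOF has been reached, a large file can be consumed in fixed-size pieces instead of all at once. The generator below is a minimal sketch; the name read_in_chunks is hypothetical and the default chunk size simply mirrors the 64K buffer size mentioned above.

```python
def read_in_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks from ``fileobj`` until EOF.

    Relies on ``read()`` returning an empty string at EOF.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

It works with any file-like object, so it can be tried out on io.BytesIO before pointing it at a file opened with fs.open().
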
readable()[source]

Check if file is readable.

readline()[source]

Read one entire line from the file.

A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line).

readlines()[source]

Read until EOF using readline().

Warning

This method reads the entire file into memory! You are probably better off using either xreadlines or just normal iteration over the file object.
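
A minimal sketch of the iteration alternative: the hypothetical helper below scans a file line by line and never holds more than one line in memory. It assumes the file object supports normal iteration, as the warning above suggests.

```python
def count_matching_lines(fileobj, needle):
    """Count the lines of ``fileobj`` that contain ``needle``."""
    # Iterating over the file yields one line at a time.
    return sum(1 for line in fileobj if needle in line)
```
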

seek(offset, whence=0)[source]

Set the file’s internal position pointer, approximately.

The possible values of whence and their meaning are defined in the Linux man pages for lseek(): http://man7.org/linux/man-pages/man2/lseek.2.html

SEEK_SET
The internal position pointer is set to offset bytes.
SEEK_CUR
The internal position pointer is set to its current position plus offset bytes.
SEEK_END
The internal position pointer is set to the size of the file plus offset bytes.
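
The three whence values can be expressed as simple arithmetic. The function below is an illustrative sketch of the rules above, not the code used by XRootDPyFile; the name resolve_position is hypothetical.

```python
import os


def resolve_position(current, size, offset, whence=os.SEEK_SET):
    """Return the absolute position that a seek() call would move to."""
    if whence == os.SEEK_SET:
        return offset            # relative to the start of the file
    if whence == os.SEEK_CUR:
        return current + offset  # relative to the current position
    if whence == os.SEEK_END:
        return size + offset     # relative to the end of the file
    raise ValueError("invalid whence: {0!r}".format(whence))
```

For a 22-byte file with the pointer at byte 5, seeking with offset -4 and SEEK_END lands on byte 18.
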
seekable()[source]

Check if file is seekable.

size

Get file size.

tell()[source]

Get the location of the file’s internal position pointer.

truncate(size=None)[source]

Truncate the file’s size to size.

Note that size will never be None; if it was not specified by the user the current file position is used.

writable()[source]

Check if file is writable.

write(data, flushing=False)[source]

Write the given string to the file.

If the keyword argument ‘flushing’ is true, it indicates that the internal write buffers are being flushed, and all the given data is expected to be written to the file.

writelines(sequence)[source]

Write a sequence of lines to file.

xreadlines(sizehint=-1)[source]

Get an iterator over a number of lines.

Opener

PyFilesystem opener for XRootD.

class xrootdpyfs.opener.XRootDPyOpener[source]

XRootD PyFilesystem Opener.

desc = 'Opens a filesystem via the XRootD protocol.'
classmethod get_fs(registry, fs_name, fs_name_params, fs_path, writeable, create_dir)[source]

Get an XRootDPyFS object.

Parameters:
  • fs_name – The name of the opener, as extracted from the protocol part of the url.
  • fs_name_params – Reserved for future use.
  • fs_path – Path part of the url.
  • writeable – If True, then get_fs must return an FS that can be written to.
  • create_dir – If True then get_fs should attempt to silently create the directory referenced in the path.
names = ['root', 'roots']

Environment

Set global timeout behavior in environment.

Note

XRootD timeout behavior depends on a number of different parameters:

  • Timeout resolution: The time interval between timeout detection.
  • Timeout: The time to wait for a response to a request (should be larger than timeout resolution).
  • Connection window: The time interval during which a single new connection will be attempted. Subsequent attempts will not happen until the next window.
  • Connection retry: Number of connection windows to try before declaring permanent failure.
xrootdpyfs.env.set_connectionretry(value)[source]

Number of connection attempts that should be made.

I.e., the number of available connection windows before declaring a permanent failure.

Sets the environment variable XRD_CONNECTIONRETRY.

xrootdpyfs.env.set_connectionwindow(value)[source]

Set time window for the connection establishment.

A connection failure is declared if the connection is not established within the time window. If a connection failure happened earlier then another connection attempt will only be made at the beginning of the next window.

Sets the environment variable XRD_CONNECTIONWINDOW.

xrootdpyfs.env.set_timeout(value)[source]

Default value for the time after which an error is declared.

This value can be overridden on a case-by-case basis in xrootdpyfs.fs.XRootDPyFS.

Sets the environment variable XRD_REQUESTTIMEOUT.

xrootdpyfs.env.set_timeoutresolution(value)[source]

Set resolution for the timeout events.

I.e., timeout events are only processed once every given number of seconds.

Sets the environment variable XRD_TIMEOUTRESOLUTION.
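
Since each setter above is documented as setting one environment variable, the combined effect can be sketched with the standard library alone. The helper below is hypothetical and is not part of xrootdpyfs.env; in real code the documented setters are the better choice. Time values are assumed to be whole seconds.

```python
import os


def configure_timeouts(timeout=None, resolution=None, window=None, retries=None):
    """Export the XRootD timeout settings named above as environment variables.

    Arguments left as None are not touched.
    """
    settings = {
        "XRD_REQUESTTIMEOUT": timeout,
        "XRD_TIMEOUTRESOLUTION": resolution,
        "XRD_CONNECTIONWINDOW": window,
        "XRD_CONNECTIONRETRY": retries,
    }
    for name, value in settings.items():
        if value is not None:
            # Environment values must be strings.
            os.environ[name] = str(value)
```

Following the note above, the timeout should be larger than the timeout resolution, for example configure_timeouts(timeout=30, resolution=5).
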

Changes

Version 0.1.2 (released 2016-08-19)

  • Fixes issue with generated root url when query string was present.

Version 0.1.1 (released 2015-10-09)

  • Changed package name from xrootdfs to xrootdpyfs due to naming conflict with XRootD FUSE.

Version 0.1.0 (released 2015-09-29)

  • Initial public release.

Contributing

Bug reports, feature requests, and other contributions are welcome. If you find a demonstrable problem that is caused by the code of this library, please:

  1. Search for already reported problems.
  2. Check if the issue has been fixed or is still reproducible on the latest master branch.
  3. Create an issue with a test case.

If you create a feature branch, you can run the tests to ensure everything is operating correctly. The easiest way is to run the tests using Docker:

$ docker build -t xrootd .
$ docker run -h xrootdpyfs -it xrootd

You can also run the tests locally:

$ ./run-tests.sh

You will however need to start a local XRootD server, e.g.:

$ xrootd -b -l /dev/null /tmp <tmpfolder>

where <tmpfolder> depends on your system (e.g. on OS X it is /var/folders, while on Linux it can be left empty).

Note

XRootD has issues with Docker’s default hostname, so it is important to supply a hostname to docker run via the -h option.

License

xrootdpyfs is free software; you can redistribute it and/or modify it under the terms of the Revised BSD License quoted below.

Copyright (C) 2015 CERN.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.

Authors

Contact us at info@inveniosoftware.org