XRootDPyFS
XRootDPyFS is a PyFilesystem interface for XRootD.
The XRootD protocol aims to provide high-performance, scalable, fault-tolerant access to data repositories of many kinds. XRootDPyFS adds a high-level interface on top of the existing Python interface (pyxrootd) and makes it easy to, for example, copy a directory in parallel or recursively remove a directory.
Installation
If you just want to try out the library, the easiest way is to use Docker. See Getting started below for details.
XRootDPyFS depends on PyFilesystem and XRootD Python bindings.
XRootDPyFS is not Python 3 compatible due to the underlying Python bindings not being Python 3 compatible.
XRootD Python bindings
The XRootD Python bindings can be somewhat tricky to install if this is your first experience with XRootD. First you must install XRootD as usual, then the Python bindings. The Python bindings are installed using python setup.py install, which requires access to the xrootd headers and library. If these can’t be found, you need to set the XRD_LIBDIR and XRD_INCDIR environment variables. See the OS X example below.
CentOS 7/YUM based
Install XRootD + Python bindings using the official YUM repositories, e.g.:
$ rpm -Uvh \
http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ yum install -y xrootd xrootd-server xrootd-client xrootd-client-devel \
xrootd-python
See http://xrootd.org/dload.html to get the YUM repository addresses for other RedHat based distributions/versions.
Ubuntu
There is no official support for XRootD on Ubuntu, so you will have to install XRootD from the source distribution.
OS X
First, install XRootD using Homebrew:
$ brew install xrootd
Next, install the XRootD Python bindings:
$ xrootd -v
v4.1.1
$ VER=4.1.1
$ git clone git://github.com/xrootd/xrootd-python.git
$ cd xrootd-python
$ XRD_LIBDIR=/usr/local/lib/ \
XRD_INCDIR=/usr/local/Cellar/xrootd/$VER/include/xrootd \
python setup.py install
Note, you might want to activate a virtualenv prior to running the last python setup.py install. Also, in case you do not have cmake installed, you can get it easily via brew install cmake.
XRootDPyFS
Once the XRootD Python bindings have been installed, xrootdpyfs itself is on PyPI so all you need is:
$ pip install xrootdpyfs
Getting started
The easiest way to run the examples is to use the provided Docker container. This way you do not need to have a local XRootD server plus all the libraries installed:
$ docker build -t xrootd .
$ docker run -h xrootdpyfs -it xrootd bash
Next, start an XRootD server in the container and fire up an ipython shell:
[xrootdpyfs@xrootdpyfs code]$ xrootd -b -l /dev/null
[xrootdpyfs@xrootdpyfs code]$ ipython
Quick examples
Here is a quick example of a file listing with the xrootd PyFilesystem integration:
>>> from xrootdpyfs import XRootDPyFS
>>> fs = XRootDPyFS("root://localhost//tmp/")
>>> fs.listdir("xrootdpyfs")
['test.txt']
Or, alternatively, using the PyFilesystem opener (note that the first import xrootdpyfs is required to ensure the XRootDPyFS opener is registered):
>>> import xrootdpyfs
>>> from fs.opener import opener
>>> fs, path = opener.parse("root://localhost//tmp/")
>>> fs.listdir("xrootdpyfs")
[u'test.txt']
Reading files:
>>> f = fs.open("xrootdpyfs/test.txt")
>>> f.read()
'Welcome to xrootdpyfs!'
>>> f.close()
Reading files using the getcontents() method:
>>> fs.getcontents("xrootdpyfs/test.txt")
'Welcome to xrootdpyfs!'
Writing files:
>>> f = fs.open("xrootdpyfs/hello.txt", "w+")
>>> f.write("World")
>>> f.close()
Writing files using the setcontents() method (returns the number of bytes written):
>>> fs.setcontents("xrootdpyfs/test.txt", "World")
5
API
Filesystem interface
PyFilesystem implementation of the XRootD protocol.
XRootDPyFS is a subclass of the PyFilesystem FS class and thus implements the entire PyFilesystem Filesystem interface.
Note
All methods prefixed with xrd in XRootDPyFS are specific to XRootDPyFS and not supported by other PyFilesystem implementations.
class xrootdpyfs.fs.XRootDPyFS(url, query=None)
XRootD PyFilesystem interface.
The argument query is particularly useful for specifying e.g. Kerberos or GSI authentication without adding it in the URL. The following:
fs = XRootDPyFS("root://localhost?&xrd.wantprot=krb5&xrd.k5ccname=/tmp/krb_filexxx")
is equivalent to:
fs = XRootDPyFS("root://localhost", {"xrd.wantprot": "krb5", "xrd.k5ccname": "/tmp/krb_filexxx"})
This way you can easily separate the URL from the authentication query parameters. Note that xrd.k5ccname specifies a Kerberos ticket and not a keytab.
Parameters:
- url – A root URL.
- query (dict) – Dictionary of key/values to append to the URL query string. The contents of the dictionary get merged with any query string provided in the url.
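The merging of the query dictionary into the URL can be pictured with a minimal, standalone Python 3 sketch that uses only the standard library. It does not call XRootDPyFS and is not the library's implementation; the helper name merge_query and the rule that dictionary entries win over entries already in the URL are assumptions made for illustration.

```python
from urllib.parse import parse_qsl


def merge_query(url, query=None):
    """Return url with the entries of query merged into its query string."""
    # Split off any query string already present in the URL.
    base, _, querystring = url.partition("?")
    # parse_qsl skips empty fields, so a leading "&" is harmless.
    params = dict(parse_qsl(querystring))
    # Assumption: entries from the dictionary win over those in the URL.
    params.update(query or {})
    if not params:
        return base
    # Values are joined verbatim (no percent-encoding) to keep paths readable.
    pairs = ["{0}={1}".format(key, value) for key, value in params.items()]
    return base + "?" + "&".join(pairs)


url_form = merge_query(
    "root://localhost?&xrd.wantprot=krb5&xrd.k5ccname=/tmp/krb_filexxx"
)
dict_form = merge_query(
    "root://localhost",
    {"xrd.wantprot": "krb5", "xrd.k5ccname": "/tmp/krb_filexxx"},
)
print(url_form == dict_form)
```

Both call styles end up with the same URL in this sketch, which is the equivalence the class documentation describes.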
copy(src, dst, overwrite=False)
Copy a file from source to destination.
Parameters:
- src (string) – Source path.
- dst (string) – Destination path.
- overwrite (bool) – If True, then an existing file at the destination may be overwritten; if False, then DestinationExistsError will be raised.
copydir(src, dst, overwrite=False, parallel=True)
Copy a directory from source to destination.
By default the copy is done by recreating the source directory structure at the destination and then copying the files in parallel from source to destination.
Parameters:
- src (string) – Source directory path.
- dst (string) – Destination directory path.
- overwrite (bool) – If True, then any existing files in the destination directory will be overwritten.
- parallel (bool) – If True (default), the copy will be done in parallel.
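The two-phase strategy described for copydir (recreate the directory structure first, then copy the files in parallel) can be sketched on a local filesystem with the Python 3 standard library. This is only an illustration of the idea under stated assumptions: the function name copy_tree_parallel, the thread pool, and the max_workers default are hypothetical and say nothing about how XRootDPyFS schedules its transfers.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def copy_tree_parallel(src, dst, max_workers=4):
    """Recreate the directory tree of src under dst, then copy files in parallel."""
    jobs = []
    # Pass 1: create every directory and collect the file copy jobs.
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            jobs.append((os.path.join(root, name), os.path.join(target, name)))
    # Pass 2: copy the files concurrently; list() re-raises any worker error.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda job: shutil.copyfile(*job), jobs))
    return len(jobs)


# Build a small throwaway tree and copy it.
workdir = tempfile.mkdtemp()
src_dir = os.path.join(workdir, "src")
os.makedirs(os.path.join(src_dir, "sub"))
for rel_path in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(src_dir, rel_path), "w") as handle:
        handle.write("data")
copied = copy_tree_parallel(src_dir, os.path.join(workdir, "dst"))
print(copied)
```

Creating all directories before any file copy starts means no worker ever has to wait for, or race on, a missing parent directory.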
exists(path)
Check if a path references a valid resource.
Parameters: path (string) – A path in the filesystem.
Return type: bool
getinfo(path)
Return information for a path as a dictionary.
The following values can be found in the info dictionary:
- size – Number of bytes used to store the file or directory.
- created_time – A datetime object containing the time the resource was created.
- modified_time – A datetime object containing the time the resource was modified.
- accessed_time – A datetime object containing the time the resource was accessed.
- offline – True if file/directory is offline.
- writable – True if file/directory is writable.
- readable – True if file/directory is readable.
- executable – True if file/directory is executable.
Parameters: path (string) – Path to retrieve information about.
Return type: dict
getpathurl(path, allow_none=False, with_querystring=False)
Get URL that corresponds to the given path.
ilistdir(path='./', wildcard=None, full=False, absolute=False, dirs_only=False, files_only=False)
Generator yielding the files and directories under a given path.
This method behaves identically to fs.base.FS.listdir() but returns a generator instead of a list.
isdir(path, _statobj=None)
Check if a path references a directory.
Parameters: path (string) – A path in the filesystem.
Return type: bool
isfile(path, _statobj=None)
Check if a path references a file.
Parameters: path (string) – A path in the filesystem.
Return type: bool
listdir(path='./', wildcard=None, full=False, absolute=False, dirs_only=False, files_only=False)
List the files and directories under a given path.
The directory contents are returned as a list of unicode paths.
Parameters:
- path (string) – Path to list.
- wildcard (string containing a Unix filename pattern, or a callable that accepts a path and returns a boolean) – Return only paths that match the wildcard.
- full (bool) – Return full paths (relative to the base path).
- absolute (bool) – Return absolute paths (paths beginning with /).
- dirs_only (bool) – If True, return only directories.
- files_only (bool) – If True, return only files.
Return type: Iterable of paths.
Raises:
- fs.errors.ResourceInvalidError – If the path exists, but is not a directory.
- fs.errors.ResourceNotFoundError – If the path is not found.
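The wildcard parameter accepts either a Unix filename pattern or a callable. A minimal sketch of that dual behavior, written against a plain list of names with the standard fnmatch module, might look as follows. The helper filter_names and the sample names are hypothetical, and the use of case-sensitive matching is an assumption rather than documented library behavior.

```python
import fnmatch


def filter_names(names, wildcard=None):
    """Keep the names accepted by wildcard (a pattern string or a callable)."""
    if wildcard is None:
        return list(names)
    if callable(wildcard):
        match = wildcard
    else:
        # fnmatchcase applies Unix shell-style matching, case-sensitively.
        def match(name):
            return fnmatch.fnmatchcase(name, wildcard)
    return [name for name in names if match(name)]


entries = ["test.txt", "hello.txt", "data.root"]
by_pattern = filter_names(entries, "*.txt")
by_callable = filter_names(entries, lambda name: name.startswith("d"))
print(by_pattern, by_callable)
```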
makedir(path, recursive=False, allow_recreate=False)
Make a directory on the filesystem.
Parameters:
- path (string) – Path of directory.
- recursive (bool) – If True, any intermediate directories will also be created.
- allow_recreate – If True, re-creating a directory won’t be an error.
Raises:
- fs.errors.DestinationExistsError – If the path already exists and allow_recreate is False.
- fs.errors.ResourceInvalidError – If a containing directory is missing and recursive is False, or if a path is an existing file.
move(src, dst, overwrite=False, **kwargs)
Move a file from one location to another.
Parameters:
- src (string) – Source path.
- dst (string) – Destination path.
- overwrite (bool) – When True the destination will be overwritten (if it exists), otherwise a DestinationExistsError will be thrown.
Raises:
- fs.errors.DestinationExistsError – If destination exists and overwrite is False.
- fs.errors.ResourceInvalidError – If source is not a file.
- fs.errors.ResourceNotFoundError – If source was not found.
movedir(src, dst, overwrite=False, **kwargs)
Move a directory from one location to another.
Parameters:
- src (string) – Source directory path.
- dst (string) – Destination directory path.
- overwrite (bool) – When True the destination will be overwritten (if it exists), otherwise a DestinationExistsError will be thrown.
Raises:
- fs.errors.DestinationExistsError – If destination exists and overwrite is False.
- fs.errors.ResourceInvalidError – If source is not a directory.
- fs.errors.ResourceNotFoundError – If source was not found.
open(path, mode='r', buffering=-1, encoding=None, errors=None, newline=None, line_buffering=False, **kwargs)
Open the given path and return a file-like object.
Parameters:
- path (string) – Path to file that should be opened.
- mode (string) – Mode of file to open, identical to the mode string used in ‘file’ and ‘open’ builtins.
- buffering – An optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer.
- encoding – Determines encoding used when writing unicode data.
- errors – An optional string that specifies how encoding and decoding errors are to be handled (e.g. strict, ignore or replace).
- newline – Newline character to use (either \n, \r, \r\n, '' or None).
- line_buffering – Unsupported. Anything but False will raise an error.
Return type: A file-like object.
Raises:
- fs.errors.ResourceInvalidError – If an intermediate directory is a file.
- fs.errors.ResourceNotFoundError – If the path is not found.
remove(path)
Remove a file from the filesystem.
Parameters: path (string) – Path of the resource to remove.
Raises:
- fs.errors.ResourceInvalidError – If the path is a directory.
- fs.errors.DirectoryNotEmptyError – If the directory is not empty.
removedir(path, recursive=False, force=False)
Remove a directory from the filesystem.
Parameters:
- path (string) – Path of the directory to remove.
- recursive (bool) – Unsupported by XRootDPyFS implementation.
- force (bool) – If True, any directory contents will be removed (recursively). Note that this can be very expensive because the xrootd protocol does not support recursive deletes: the library will do a full recursive listing of the directory and send a network request per file/directory.
Raises:
- fs.errors.DirectoryNotEmptyError – If the directory is not empty and force is False.
- fs.errors.ResourceInvalidError – If the path is not a directory.
- fs.errors.ResourceNotFoundError – If the path does not exist.
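The cost of a forced removal comes from issuing one operation per entry. The following local-filesystem sketch, using only the Python 3 standard library, makes that count visible. It is an analogy under stated assumptions: remove_tree is a hypothetical helper, and each os.remove or os.rmdir call merely stands in for what would be a network request against an XRootD server.

```python
import os
import tempfile


def remove_tree(path):
    """Remove path and everything below it, one call per entry; return the call count."""
    calls = 0
    # Walk bottom-up so every directory is already empty when it is removed.
    for root, _dirs, files in os.walk(path, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
            calls += 1
        os.rmdir(root)
        calls += 1
    return calls


# Two directories and three files lead to five removal calls.
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel_path in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
    with open(os.path.join(top, rel_path), "w") as handle:
        handle.write("x")
calls = remove_tree(top)
print(calls)
```

The number of calls grows linearly with the number of files and directories, which is why a forced removal of a large tree can be slow over the network.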
rename(src, dst)
Rename a file or directory.
Parameters:
- src (string) – Path to rename.
- dst (string) – New name.
Raises:
- DestinationExistsError – If destination already exists.
- ResourceNotFoundError – If source does not exist.
xrd_checksum(path, _statobj=None)
Get checksum of file from server.
Specific to XRootDPyFS. Note that not all XRootD servers support the checksum operation (in particular the default local xrootd server).
Parameters: path (string) – File to calculate checksum for.
Raises:
- fs.errors.UnsupportedError – If server does not support checksum calculation.
- fs.errors.FSError – If you try to get the checksum of e.g. a directory.
xrd_client
Pyxrootd filesystem client.
Specific to XRootDPyFS.
File interface
File-like interface for interacting with files over the XRootD protocol.
class xrootdpyfs.xrdfile.XRootDPyFile(path, mode='r', buffering=-1, encoding=None, errors=None, newline=None, line_buffering=False, buffer_size=None, **kwargs)
File-like interface for working with files over the XRootD protocol.
This class understands and will accept the following mode strings, with any additional characters being ignored:
- r – Open the file for reading only.
- r+ – Open the file for reading and writing.
- r- – Open the file for streamed reading; do not allow seek/tell.
- w – Open the file for writing only; create the file if it doesn’t exist; truncate it to zero length.
- w+ – Open the file for reading and writing; create the file if it doesn’t exist; truncate it to zero length.
- w- – Open the file for streamed writing; do not allow seek/tell.
- a – Open the file for writing only; create the file if it doesn’t exist; place pointer at end of file.
- a+ – Open the file for reading and writing; create the file if it doesn’t exist; place pointer at end of file.
Note
Streamed reading/writing modes have no performance advantages over non-streamed reading/writing for XRootD.
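The mode strings listed above can be summarized as a small decoding function. This is a hypothetical sketch, not the parser used by XRootDPyFile: the name describe_mode, the returned keys, and the fallback to "r" when no base character is present are all assumptions made for illustration.

```python
def describe_mode(mode):
    """Decode a mode string into the capabilities listed above."""
    # The first of r/w/a found selects the base behavior; other characters
    # (for example "b") are ignored. Defaulting to "r" is an assumption.
    base = next((char for char in mode if char in "rwa"), "r")
    plus = "+" in mode
    return {
        "read": base == "r" or plus,
        "write": base in "wa" or plus,
        "create": base in "wa",
        "truncate": base == "w",
        "append": base == "a",
        "streamed": "-" in mode,  # streamed modes disallow seek/tell
    }


print(describe_mode("w+"))
print(describe_mode("rb"))
```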
Parameters:
- path (string) – Path to file that should be opened.
- mode (string) – Mode of file to open, identical to the mode string used in ‘file’ and ‘open’ builtins.
- buffering – An optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer.
- encoding – Determines encoding used when writing unicode data.
- errors – An optional string that specifies how encoding and decoding errors are to be handled (e.g. strict, ignore or replace).
- newline – Newline character to use (either \n, \r, \r\n, '' or None).
- line_buffering – Unsupported. Anything but False will raise an error.
- buffer_size – Buffer size used when reading files (defaults to 64K). This can likely be optimized to chunks up to 2MB depending on your desired memory usage.
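To give a feel for what a 64K buffer size means in practice, here is a minimal sketch of chunked reading from any file-like object, using io.BytesIO as a stand-in for a remote file. The generator read_in_chunks is hypothetical and does not reflect how XRootDPyFile buffers internally; it only shows how the chunk size bounds memory use per read.

```python
import io


def read_in_chunks(fileobj, buffer_size=64 * 1024):
    """Yield successive chunks of at most buffer_size bytes until EOF."""
    while True:
        chunk = fileobj.read(buffer_size)
        if not chunk:  # an empty result signals EOF
            break
        yield chunk


payload = io.BytesIO(b"x" * (150 * 1024))
chunk_sizes = [len(chunk) for chunk in read_in_chunks(payload)]
print(chunk_sizes)
```

A larger buffer_size means fewer reads at the price of more memory held per chunk, which is the trade-off the parameter description refers to.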
close()
Close the file, including flushing the write buffers.
The file may not be accessed further once it is closed.
closed
Check if file is closed.
fileno()
Get the underlying file descriptor.
Unsupported by XRootDPyFS (added for io module compatibility).
name
Get filename.
read(sizehint=-1)
Read sizehint bytes from the file object.
If no sizehint is provided, the entire file is read! Multiple calls to this method after EOF has been reached will return an empty string.
Parameters: sizehint – Number of bytes to read from file object.
readline()
Read one entire line from the file.
A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line).
readlines()
Read until EOF using readline().
Warning
This method reads the entire file into memory! You are probably better off using either xreadlines or just normal iteration over the file object.
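The difference between iterating over a file object and calling readlines() can be seen with any Python file-like object. The sketch below uses io.StringIO from the Python 3 standard library as a stand-in; it assumes, without calling XRootDPyFS, that the same file protocol applies.

```python
import io

stream = io.StringIO("first\nsecond\nlast without newline")

# Iterating yields one line at a time, so only the current line is held.
longest = max(len(line) for line in stream)

# After iteration the stream is at EOF, so rewind before calling readlines().
stream.seek(0)
all_lines = stream.readlines()  # builds the complete list in memory
print(longest, all_lines)
```

The last element of all_lines has no trailing newline because the sample text ends with an incomplete line, matching the readline() description.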
seek(offset, whence=0)
Set the file’s internal position pointer, approximately.
The possible values of whence and their meaning are defined in the Linux man pages for lseek(): http://man7.org/linux/man-pages/man2/lseek.2.html
- SEEK_SET – The internal position pointer is set to offset bytes.
- SEEK_CUR – The internal position pointer is set to its current position plus offset bytes.
- SEEK_END – The internal position pointer is set to the size of the file plus offset bytes.
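The three whence values follow the usual lseek() semantics, which can be tried out on an in-memory buffer. This sketch uses io.BytesIO and the os.SEEK_* constants from the standard library as a stand-in for an XRootDPyFile object; it illustrates the semantics only and is not a test of the library.

```python
import io
import os

buf = io.BytesIO(b"0123456789")

buf.seek(3, os.SEEK_SET)   # absolute: position 3
at_set = buf.tell()
buf.seek(2, os.SEEK_CUR)   # relative to the current position: 3 + 2
at_cur = buf.tell()
buf.seek(-1, os.SEEK_END)  # relative to the end of the data: 10 - 1
at_end = buf.tell()
print(at_set, at_cur, at_end)
```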
size
Get file size.
truncate(size=None)
Truncate the file’s size to size.
Note that size will never be None; if it was not specified by the user the current file position is used.
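The rule that an unspecified size falls back to the current file position matches the behavior of Python's own io classes, so it can be illustrated with io.BytesIO. This is a stand-in sketch and does not exercise XRootDPyFile itself.

```python
import io

buf = io.BytesIO(b"Welcome to xrootdpyfs!")
buf.seek(7)
new_size = buf.truncate()  # no size given: cut at the current position
print(new_size, buf.getvalue())
```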
Opener
PyFilesystem opener for XRootD.
class xrootdpyfs.opener.XRootDPyOpener
XRootD PyFilesystem Opener.
desc = 'Opens a filesystem via the XRootD protocol.'
classmethod get_fs(registry, fs_name, fs_name_params, fs_path, writeable, create_dir)
Get an XRootDPyFS object.
Parameters:
- fs_name – The name of the opener, as extracted from the protocol part of the url.
- fs_name_params – Reserved for future use.
- fs_path – Path part of the url.
- writeable – If True, then get_fs must return an FS that can be written to.
- create_dir – If True, then get_fs should attempt to silently create the directory referenced in the path.
names = ['root', 'roots']
Environment
Set global timeout behavior in environment.
Note
XRootD timeout behavior depends on a number of different parameters:
- Timeout resolution: The time interval between timeout detection.
- Timeout: The time to wait for a response to a request (should be larger than timeout resolution).
- Connection window: The time interval during which a single new connection will be attempted. Subsequent attempts will not happen until the next window.
- Connection retry: Number of connection windows to try before declaring permanent failure.
xrootdpyfs.env.set_connectionretry(value)
Number of connection attempts that should be made.
I.e. the number of available connection windows before declaring a permanent failure.
Sets the environment variable XRD_CONNECTIONRETRY.
xrootdpyfs.env.set_connectionwindow(value)
Set time window for the connection establishment.
A connection failure is declared if the connection is not established within the time window. If a connection failure happened earlier, then another connection attempt will only be made at the beginning of the next window.
Sets the environment variable XRD_CONNECTIONWINDOW.
xrootdpyfs.env.set_timeout(value)
Default value for the time after which an error is declared.
This value can be overridden on a case-by-case basis in xrootdpyfs.fs.XRootDPyFS.
Sets the environment variable XRD_REQUESTTIMEOUT.
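Since the three setters are documented as setting environment variables, their net effect can be approximated with os.environ alone. The sketch below does not import xrootdpyfs, and the real functions may do more than this; the numeric values are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical values, chosen only for illustration.
settings = {
    "XRD_CONNECTIONRETRY": 3,
    "XRD_CONNECTIONWINDOW": 30,
    "XRD_REQUESTTIMEOUT": 60,
}
# Environment variables are always strings, so convert before storing.
for name, value in settings.items():
    os.environ[name] = str(value)
print(os.environ["XRD_REQUESTTIMEOUT"])
```

Because these are process-wide environment variables, a change of this kind applies globally rather than to a single filesystem object.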
Changes
Version 0.1.2 (released 2016-08-19)
- Fixes issue with generated root url when query string was present.
Version 0.1.1 (released 2015-10-09)
- Changed package name from xrootdfs to xrootdpyfs due to naming conflict with XRootD FUSE.
Version 0.1.0 (released 2015-09-29)
- Initial public release.
Contributing
Bug reports, feature requests, and other contributions are welcome. If you find a demonstrable problem that is caused by the code of this library, please:
- Search for already reported problems.
- Check if the issue has been fixed or is still reproducible on the latest master branch.
- Create an issue with a test case.
If you create a feature branch, you can run the tests to ensure everything is operating correctly. The easiest way is to run the tests using Docker:
$ docker build -t xrootd .
$ docker run -h xrootdpyfs -it xrootd
You can also run the tests locally:
$ ./run-tests.sh
You will however need to start a local XRootD server, e.g.:
$ xrootd -b -l /dev/null /tmp <tmpfolder>
where <tmpfolder> is dependent on your system (e.g. on OS X it is /var/folders, while on Linux it can be left empty).
Note
XRootD has issues with Docker’s default hostname, so it is important to supply a hostname to docker run via the -h option.
License
xrootdpyfs is free software; you can redistribute it and/or modify it under the terms of the Revised BSD License quoted below.
Copyright (C) 2015 CERN.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.
Authors
Contact us at info@inveniosoftware.org
- Odd Magnus Trondrud <odd.magnus.trondrud@cern.ch>
- Lars Holm Nielsen <lars.holm.nielsen@cern.ch>