mdb-add: Adding objects

Sequences and collections can be created in the database with the mdb-add command-line tool, using a CSV- or JSON-formatted file that describes these objects.

Its syntax is as follows:

Usage: mdb-add [options]

Part of the MetagenomeDB toolkit. Import objects (Sequence and/or Collection)
into the database. Those objects are provided as JSON- or CSV-formatted
descriptions.

Options:
  -h, --help            show this help message and exit
  -i FILENAME, --input=FILENAME
                        Name of the file containing a description of the
                        objects to import, or '-' to read from the standard
                        input (mandatory).
  -f STRING, --format=STRING
                        Format of the input file, either 'json' or 'csv'
                        (optional). Default: csv
  --ignore-duplicates   If set, ignore duplicate objects errors.
  --ignore-missing      If set, ignore relationships that refer to missing
                        objects.
  -v, --verbose
  --dry-run

  Connection:
    --host=HOSTNAME     Host name or IP address of the MongoDB server
                        (optional). Default: localhost
    --port=INTEGER      Port of the MongoDB server (optional). Default: 27017
    --db=STRING         Name of the database in the MongoDB server (optional).
                        Default: 'MetagenomeDB'
    --user=STRING       User for the MongoDB server connection (optional).
                        Default: ''
    --password=STRING   Password for the MongoDB server connection (optional).
                        Default: ''

The description file

mdb-add relies on an external file you must provide, which contains a description of the sequences and/or collections to create. This file can be in either JSON or CSV format.

CSV format

A CSV file can be generated by any spreadsheet software that supports this format, such as Microsoft Excel, OpenOffice or Numbers.

Each line of this file describes either one object or a relationship between two objects (see Relationships). Each cell of a line describes one property of that object. For example,

A                  B                C                    D
_type=collection   name=Sample #1   year=2011^integer    month=1^integer

will create a Collection with name “Sample #1”, year 2011 and month 1. The format follows these rules:

  • each cell must contain a property name followed by an equal sign (‘=’), then the value of this property. Spaces before or after the equal sign, and after the value, are ignored; e.g., property=value is equivalent to property = value.
  • the value is interpreted as a string by default. To provide another type of value (such as an integer for the year or month in our example), append a modifier after a caret sign (‘^’). Accepted modifiers are integer, float and boolean; e.g., a=true^boolean stores the boolean true in property ‘a’.
  • a property can receive a list of values: separate them with commas and add an opening square bracket (‘[’) after the caret sign; e.g., a=1,2^[integer stores the list of integers 1, 2 in property ‘a’. Values that contain commas must be quoted; e.g., a="foo,bar","hello" stores the two strings “foo,bar” and “hello” in property ‘a’. Note that a list of strings doesn’t require a ^[string modifier.
  • more than one value type can be given after the caret sign. This is useful when you generate your CSV file programmatically and do not know in advance what type a value will have; e.g., a=x^integer,string stores x as an integer if possible, and as a string otherwise.
  • empty lines or lines starting with a ‘#’ sign will be ignored.
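To make these rules concrete, here is a minimal Python sketch of how a single cell could be parsed. It is not MetagenomeDB's own parser: the functions parse_cell, convert and split_items are hypothetical, and the sketch assumes that the last caret in a cell separates the value from its modifiers and that booleans are written as true or false.

```python
import csv


def _to_boolean(text):
    # Assumption: only "true"/"false" (case-insensitive) are valid booleans.
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError("not a boolean: %r" % text)


# Converters for the documented modifiers; "string" is the implicit default.
CONVERTERS = {"integer": int, "float": float, "boolean": _to_boolean, "string": str}


def convert(text, types):
    """Return text converted to the first type in `types` that accepts it."""
    for type_name in types:
        try:
            return CONVERTERS[type_name](text)
        except ValueError:
            continue  # fall back to the next type, e.g. integer -> string
    raise ValueError("cannot convert %r to any of %s" % (text, types))


def split_items(raw):
    """Split a comma-separated value, keeping double-quoted items whole."""
    return next(csv.reader([raw], skipinitialspace=True)) or [""]


def parse_cell(cell):
    """Parse one 'property=value[^modifiers]' cell into (property, value)."""
    name, _, rest = cell.partition("=")
    name = name.strip()
    if "^" not in rest:
        # No modifier: a string, or a list of strings if it contains commas.
        items = split_items(rest.strip())
        return name, items[0] if len(items) == 1 else items
    # Assumption: the last caret separates the value from its modifiers.
    raw, _, modifiers = rest.rpartition("^")
    modifiers = modifiers.strip()
    is_list = modifiers.startswith("[")
    types = [t.strip().lower() for t in modifiers.lstrip("[").split(",")]
    if is_list:
        return name, [convert(item, types) for item in split_items(raw.strip())]
    return name, convert(raw.strip(), types)
```

Quoted items are handled by Python's csv module, which is why a="foo,bar","hello" yields two strings rather than three.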

Here are some examples:

Definition           Result
5^integer            integer 5
3,4^[integer         list of integers 3, 4
3.4^float            float 3.4
1^integer,string     integer 1
a^integer,string     string “a”
true^boolean         boolean true
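Since the format lends itself to being generated programmatically, the following sketch writes the example collection shown earlier using Python's csv module. It assumes a comma-delimited file; the cell and write_description helpers are hypothetical and not part of MetagenomeDB.

```python
import csv
import io


def cell(name, value, modifier=None):
    """Format one 'property=value^modifier' cell (hypothetical helper)."""
    text = "%s=%s" % (name, value)
    return text + "^" + modifier if modifier else text


def write_description(rows, stream):
    """Write one object per line; csv.writer quotes cells containing commas."""
    writer = csv.writer(stream, lineterminator="\n")
    for row in rows:
        writer.writerow(row)


rows = [
    [cell("_type", "collection"), cell("name", "Sample #1"),
     cell("year", 2011, "integer"), cell("month", 1, "integer")],
]
buffer = io.StringIO()
write_description(rows, buffer)
print(buffer.getvalue(), end="")
```

csv.writer quotes any cell that itself contains a comma, so a list cell such as a=3,4^[integer is written as "a=3,4^[integer" and stays in a single column when the file is read back.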

There are two additional notes:

  • a special property, _type, must be provided for each object (sequence, collection or relationship between them), with the value ‘sequence’, ‘collection’ or ‘relationship’ (case insensitive). mdb-add needs it to know which type of object to create.
  • the property name is mandatory when creating sequences and collections. When creating a sequence, a sequence property is also required. See the Sequence and Collection documentation.
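Before importing a generated file, you may want to check these requirements yourself. The sketch below is a hypothetical pre-flight check, not part of mdb-add; it assumes each object has already been parsed into a Python dict of properties, and it also checks the _source and _target properties that relationships need (see Relationships).

```python
# Required properties per object type (hypothetical table built from the
# rules above; relationships need _source and _target).
REQUIRED = {
    "sequence": ("name", "sequence"),
    "collection": ("name",),
    "relationship": ("_source", "_target"),
}


def check_object(properties):
    """Return a list of problems found in one object description (a dict)."""
    object_type = properties.get("_type")
    if object_type is None:
        return ["missing _type"]
    object_type = object_type.lower()  # _type is case insensitive
    if object_type not in REQUIRED:
        return ["unknown _type: %r" % properties["_type"]]
    return ["missing %s for %s" % (key, object_type)
            for key in REQUIRED[object_type] if key not in properties]
```

An empty list means the object passes this check; mdb-add itself may still reject it for other reasons, such as a duplicate.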

Creating a relationship between two objects requires two additional properties, _source and _target, which describe the source and the target of the relationship, respectively (see Relationships). For example, consider the following three objects (two collections and one sequence):

_type=collection   name=Collection A
_type=collection   name=Collection B
_type=sequence

JSON format

To do.

Using the description file

Once your description file is created, adding those objects to the database requires the following command:

$ mdb-add -i [description filename]

By default, mdb-add expects a CSV-formatted file. If you provide a JSON file instead, you must say so with the -f/--format option:

$ mdb-add -i [JSON file] -f json

By default, mdb-add displays an error and stops processing your description file in either of the following cases:

  • an object that is defined in the description file already exists in the database
  • a relationship that is defined in the description file refers to a source or a target that doesn’t exist in the database

You can override this behavior by using the --ignore-duplicates and --ignore-missing options, respectively.
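If you drive mdb-add from a script, the command line can be assembled from the options listed in its help text. The following is a sketch only: build_mdb_add_command and run_mdb_add are hypothetical helpers, and the sketch assumes mdb-add is on your PATH and exits with a nonzero status when it stops on an error.

```python
import subprocess


def build_mdb_add_command(filename, fmt="csv", ignore_duplicates=False,
                          ignore_missing=False, db=None):
    """Build an mdb-add argument list from options shown in its help text."""
    command = ["mdb-add", "-i", filename, "-f", fmt]
    if ignore_duplicates:
        command.append("--ignore-duplicates")
    if ignore_missing:
        command.append("--ignore-missing")
    if db is not None:
        command.append("--db=%s" % db)
    return command


def run_mdb_add(filename, **options):
    # check=True raises CalledProcessError if the process exits nonzero
    # (assumption: mdb-add signals errors through its exit status).
    return subprocess.run(build_mdb_add_command(filename, **options), check=True)
```

For instance, run_mdb_add("objects.csv", ignore_duplicates=True) imports objects.csv while tolerating objects that already exist; passing "-" as the file name makes mdb-add read the description from the standard input.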
