Sequences and collections can be created in the database using the mdb-add command-line tool, together with a file in CSV or JSON format that describes these objects.
The syntax is the following:
Usage: mdb-add [options]
Part of the MetagenomeDB toolkit. Import objects (Sequence and/or Collection)
into the database. Those objects are provided as JSON- or CSV-formatted
descriptions.
Options:
-h, --help show this help message and exit
-i FILENAME, --input=FILENAME
Name of the file containing a description of the
objects to import, or '-' to read from the standard
input (mandatory).
-f STRING, --format=STRING
Format of the input file, either 'json' or 'csv'
(optional). Default: csv
--ignore-duplicates If set, ignore duplicate objects errors.
--ignore-missing If set, ignore relationships that refer to missing
objects.
-v, --verbose
--dry-run
Connection:
--host=HOSTNAME Host name or IP address of the MongoDB server
(optional). Default: localhost
--port=INTEGER Port of the MongoDB server (optional). Default: 27017
--db=STRING Name of the database in the MongoDB server (optional).
Default: 'MetagenomeDB'
--user=STRING User for the MongoDB server connection (optional).
Default: ''
--password=STRING Password for the MongoDB server connection (optional).
Default: ''
mdb-add relies on an external file you must provide, which contains a description of the sequences and/or collections to create. This file can be in either JSON or CSV format.
A CSV file can be generated by any spreadsheet software that supports this format; examples are Microsoft Excel, OpenOffice or Numbers.
Each line of this file will describe one object, or describe the relationship between two objects (see Relationships). Each cell of a given line will describe one property of this object. For example,
| A | B | C | D |
|---|---|---|---|
| _type=collection | name=Sample #1 | year=2011^integer | month=1^integer |
will create a Collection with name “Sample #1”, year 2011 and month 1. Let’s describe this format:
Here are some examples:
| Definition | Result |
|---|---|
| 3^integer | integer 3 |
| 3,4^[integer | list of integers 3, 4 |
| 3.4^float | float 3.4 |
| 1^integer,string | integer 1 |
| a^integer,string | string “a” |
| true^boolean | boolean true |
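To make the examples above concrete, here is a minimal sketch of one way the `value^type` notation could be interpreted for scalar values. It is not mdb-add's actual parser: the function names `parse_typed_value` and `parse_cell` are hypothetical, and the sketch assumes that candidate types are tried from left to right, that untyped values remain strings, and that booleans are written `true` or `false`. List types are left out.

```python
def _to_boolean(text):
    # Assumption: booleans are spelled 'true' or 'false', case-insensitively.
    lowered = text.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError("not a boolean: %r" % text)


# Hypothetical mapping from type names used in the examples to converters.
CONVERTERS = {
    "integer": int,
    "float": float,
    "boolean": _to_boolean,
    "string": str,
}


def parse_typed_value(definition):
    """Interpret 'value^type1,type2', trying each type in order (assumed)."""
    if "^" not in definition:
        return definition  # assumption: untyped values are kept as strings
    raw, _, type_names = definition.rpartition("^")
    for type_name in type_names.split(","):
        converter = CONVERTERS.get(type_name.strip())
        if converter is None:
            raise ValueError("unsupported type: %r" % type_name)
        try:
            return converter(raw)
        except ValueError:
            continue  # this type does not fit; try the next candidate
    raise ValueError("cannot convert %r to any of: %s" % (raw, type_names))


def parse_cell(cell):
    """Split a 'key=value^type' cell into a (key, typed value) pair."""
    key, _, value = cell.partition("=")
    return key, parse_typed_value(value)
```

With these assumptions, `parse_cell("year=2011^integer")` yields the pair `("year", 2011)`, and `parse_typed_value("a^integer,string")` falls back to the string `"a"` because the integer conversion fails first.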
There are two additional notes:
Creating a relationship between two objects requires some other properties, namely _source and _target, to describe the source and target of the relationship, respectively (see Relationships). Hence, let us imagine the three following objects (two collections and one sequence):
_type=collection name=Collection A
_type=collection name=Collection B
_type=sequence
To do.
Once your description file is created, adding those objects to the database requires the following command:
$ mdb-add -i [description filename]
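The description file passed to -i does not have to come from a spreadsheet. As a rough sketch, a CSV description can also be written with Python's standard `csv` module. The helper name `write_description` and the file name `description.csv` are illustrative, and the sketch assumes mdb-add accepts ordinary comma-separated files with standard quoting.

```python
import csv


def write_description(path, rows):
    """Write one object (or relationship) per line, one property per cell."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


# One collection, using the same cells as the example row in this section.
rows = [
    ["_type=collection", "name=Sample #1", "year=2011^integer", "month=1^integer"],
]
write_description("description.csv", rows)
```

The resulting file can then be given to mdb-add with `-i description.csv`.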
By default mdb-add will expect a CSV-formatted file. If you provide a JSON file, you must tell it so by using the -f|--format option:
$ mdb-add -i [JSON file] -f json
By default, mdb-add will display an error and stop processing your description file in either of two cases: when it encounters a duplicate object, or when a relationship refers to a missing object.
You can override this behavior by using the --ignore-duplicates and --ignore-missing options, respectively.
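When imports are scripted, the options discussed above can be assembled programmatically. The following is a minimal sketch, assuming mdb-add is available on the PATH; the helper names `build_mdb_add_command` and `run_mdb_add` are hypothetical, and only flags listed in the usage message are used. Whether mdb-add reports errors through its exit status is not stated here, so the `check=True` behavior is an assumption.

```python
import subprocess


def build_mdb_add_command(filename, file_format="csv",
                          ignore_duplicates=False, ignore_missing=False):
    """Build the argument list for an mdb-add call (hypothetical helper)."""
    command = ["mdb-add", "-i", filename, "-f", file_format]
    if ignore_duplicates:
        command.append("--ignore-duplicates")
    if ignore_missing:
        command.append("--ignore-missing")
    return command


def run_mdb_add(filename, **options):
    """Run mdb-add; raises CalledProcessError on a non-zero exit status."""
    # Assumption: mdb-add is installed and exits non-zero when it fails.
    return subprocess.run(build_mdb_add_command(filename, **options), check=True)
```

For example, `build_mdb_add_command("objects.json", "json", ignore_missing=True)` produces the equivalent of `mdb-add -i objects.json -f json --ignore-missing`.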