Finding Archives
In this section we’ll look at finding archives from the command line.
You can find archives from the command line interface or from Python. This page mirrors the Python documentation.
Using listdir
In our database we have many archives. We know that impactlab is a top-level directory-like namespace in our database. Let’s have a look.
$ datafs listdir impactlab
labor
climate
conflict
mortality
OK. We see that labor, climate, conflict, and mortality are all directory-like namespace groupings below impactlab. Let’s have a look at conflict.
$ datafs listdir impactlab/conflict
global
Let’s see what is in impactlab/conflict/global.
$ datafs listdir impactlab/conflict/global
conflict_global_daily.csv
$ datafs listdir impactlab/conflict/global/conflict_global_daily.csv
0.0.1
We can see that there is currently only one version, 0.0.1, of conflict_global_daily.csv.
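To make the idea of a directory-like namespace concrete, here is a minimal Python sketch of how slash-separated archive names can be listed one level at a time. It is not DataFS's implementation: the `listdir` function below is a stand-in written for illustration, the labor archive name is hypothetical, and version listing (the last step above) is not modeled.

```python
def listdir(archive_names, location):
    """Return the distinct names found one level below `location`."""
    prefix = location.strip("/") + "/"
    children = set()
    for name in archive_names:
        if name.startswith(prefix):
            # Keep only the first path component after the prefix.
            children.add(name[len(prefix):].split("/", 1)[0])
    return sorted(children)


# The first name comes from the listing above; the second is hypothetical.
archives = [
    "impactlab/conflict/global/conflict_global_daily.csv",
    "impactlab/labor/global/labor_global_daily.csv",
]

print(listdir(archives, "impactlab"))
print(listdir(archives, "impactlab/conflict"))
print(listdir(archives, "impactlab/conflict/global"))
```

The point of the sketch is that no real directories are needed: the grouping falls out of the shared name prefixes alone.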
Using filter
DataFS lets you filter archives by name to narrow the search space. At the command line, you can filter with the prefix option, or with a pattern interpreted by the path, str, or regex engine.
Let’s start with the prefix option, which matches the beginning of a set of archive names. Here we use the prefix project1_variable1_.
$ datafs filter --prefix project1_variable1_ # doctest: +SKIP
project1_variable1_scenario5.nc
project1_variable1_scenario1.nc
project1_variable1_scenario4.nc
project1_variable1_scenario2.nc
project1_variable1_scenario3.nc
We can also filter on path. Here we want all NetCDF files whose names match a specific wildcard pattern, so we set the engine to path and pass our search pattern.
$ datafs filter --pattern '*_variable4_scenario4.nc' --engine path # doctest: +SKIP
project1_variable4_scenario4.nc
project2_variable4_scenario4.nc
project3_variable4_scenario4.nc
project5_variable4_scenario4.nc
project4_variable4_scenario4.nc
We can also find archives whose names contain a specific string by setting engine to str. In this example we want all archives whose names contain the string variable2.
$ datafs filter --pattern variable2 --engine str # doctest: +ELLIPSIS +SKIP
project1_variable2_scenario1.nc
project1_variable2_scenario2.nc
project1_variable2_scenario3.nc
...
project5_variable2_scenario3.nc
project5_variable2_scenario4.nc
project5_variable2_scenario5.nc
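The difference between the prefix option and the pattern engines can be sketched in plain Python. This is an illustration of the matching rules only, not DataFS code: `filter_names` is a hypothetical helper, the 125 archive names are an assumed full grid of projects, variables, and scenarios 1 through 5, and treating the regex engine as an unanchored `re.search` is an assumption.

```python
import fnmatch
import re


def filter_names(names, pattern=None, engine="path", prefix=None):
    """Return the names that match the prefix and the pattern, if given."""
    matches = []
    for name in names:
        if prefix is not None and not name.startswith(prefix):
            continue
        if pattern is None:
            ok = True
        elif engine == "path":
            ok = fnmatch.fnmatchcase(name, pattern)    # shell-style wildcards
        elif engine == "str":
            ok = pattern in name                       # plain substring test
        elif engine == "regex":
            ok = re.search(pattern, name) is not None  # assumed: unanchored
        else:
            raise ValueError("unknown engine: {}".format(engine))
        if ok:
            matches.append(name)
    return matches


# Assumed archive names: every combination of project, variable, scenario.
names = [
    "project{}_variable{}_scenario{}.nc".format(p, v, s)
    for p in range(1, 6) for v in range(1, 6) for s in range(1, 6)
]

print(filter_names(names, prefix="project1_variable1_"))
print(filter_names(names, pattern="*_variable4_scenario4.nc", engine="path"))
print(len(filter_names(names, pattern="variable2", engine="str")))
```

Note that the wildcard in the path pattern is handled by the matcher, not by the shell, which is why quoting the pattern on the command line is the safer habit.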
Using search
DataFS search capabilities are enabled by tagging archives. The arguments of the search command are tags associated with a given archive. If archives are not tagged, they cannot be found by search. Please see the documentation on tagging archives for details.
Our archives have been tagged with team1, team2, or team3. Let’s search for some archives with the tag team3.
$ datafs search team3 # doctest: +ELLIPSIS +SKIP
project2_variable2_scenario2.nc
project5_variable4_scenario1.nc
project1_variable5_scenario4.nc
project3_variable2_scenario1.nc
project2_variable1_scenario1.nc
...
project5_variable1_scenario2.nc
project2_variable5_scenario5.nc
project5_variable2_scenario5.nc
project3_variable2_scenario5.nc
Let’s use get_tags to have a look at one of our archives’ tags.
$ datafs get_tags project2_variable2_scenario2.nc
team3
We can see that indeed it has been tagged with team3.
For completeness, let’s have a look at the archives tagged with team1.
$ datafs search team1 # doctest: +ELLIPSIS +SKIP
project1_variable1_scenario4.nc
project1_variable2_scenario2.nc
project1_variable2_scenario5.nc
project1_variable3_scenario3.nc
project1_variable4_scenario1.nc
project1_variable4_scenario4.nc
...
project5_variable3_scenario2.nc
project5_variable3_scenario5.nc
project5_variable4_scenario3.nc
project5_variable5_scenario1.nc
project5_variable5_scenario4.nc
And now let’s have a look at one of them to see what tags are associated with it.
$ datafs get_tags project2_variable5_scenario1.nc
team1
We can see clearly that our archive has been tagged with team1.
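As a rough model of tag-based search, the following Python sketch looks up archives in a name-to-tags mapping. It is not how DataFS stores or queries tags: the `search` helper is hypothetical, requiring every query term to be present is an assumption, and the untagged entry is made up to show why untagged archives never appear in results.

```python
def search(tags_by_archive, *query):
    """Return archive names whose tags include every term in `query`."""
    wanted = set(query)
    return sorted(
        name for name, tags in tags_by_archive.items()
        if wanted <= set(tags)
    )


# The first two entries mirror the get_tags output above;
# the third is a hypothetical untagged archive.
tags_by_archive = {
    "project2_variable2_scenario2.nc": ["team3"],
    "project2_variable5_scenario1.nc": ["team1"],
    "untagged_example.nc": [],
}

print(search(tags_by_archive, "team3"))
print(search(tags_by_archive, "team1"))
```

Because an archive is returned only when its tag set contains the query terms, the untagged archive is unreachable by any non-empty query in this model.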
We want your feedback. If you find bugs or have suggestions to improve this documentation, please consider contributing.