Data module
This page describes the data class in some detail. The class reads the data in a specified format and reads the input parameters as specified in the Input file template.
Module author: Matias Carrasco Kind
- data.bootstrap_index(N, SS)
Returns bootstrap indices for a sample of size N, drawn from an array of indices
Parameters: - N (int) – size of bootstrap sample
- SS (int) – extract indexes from 0 to SS
Returns: array of bootstrap indices
Return type: int array
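The idea behind bootstrap indices can be illustrated with a minimal sketch. This is not the module's implementation: the name bootstrap_index_sketch and the seed argument are hypothetical, the range is assumed to be half-open (0 to SS-1), and a plain list is returned instead of an int array so the example needs only the standard library.

```python
import random


def bootstrap_index_sketch(n, ss, seed=None):
    """Hypothetical helper: draw n indices, with replacement, from 0..ss-1."""
    rng = random.Random(seed)  # seed is an assumption, added for reproducibility
    return [rng.randrange(ss) for _ in range(n)]


# Draw 5 indices from a catalog assumed to have 100 rows.
idx = bootstrap_index_sketch(5, 100, seed=0)
```

Because the draw is with replacement, the same index may appear more than once in the result.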
- class data.catalog(Pars, cat_type='train', L1=0, L2=-1, rank=0)
Creates a catalog instance for training or testing
Parameters: - Pars (class) – Class of parameters read from input files
- cat_type (str) – ‘train’ or ‘test’ file (names are taken from Pars class)
- L1 (int) – keep only entries between L1 and L2
- L2 (int) – keep only entries between L1 and L2
- get_XY(curr_at='all', bootstrap='no')
Creates X and Y from the catalog, using random realizations or bootstrapping. After this call, both X and Y are loaded and ready to be used
Parameters: - curr_at (dict) – dictionary of attributes to be used (like a subsample of them), ‘all’ by default
- bootstrap (str) – Use a bootstrap sample? (‘yes’/‘no’)
Returns: Saves X, Y oob (and no-oob) data if required and original catalog
- has_X()
Is X already loaded in memory?
Returns: Boolean
- has_Y()
Is Y already loaded in memory?
Returns: Boolean
- load_random()
Loads the random catalog with the realizations
- make_random(outfileran='', ntimes=-1)
Makes the random realizations
Parameters: - outfileran (str) – output file (not needed)
- ntimes (int) – taken from class Pars unless otherwise indicated
- oob_data(frac=0.0)
Creates oob data and separates it from the no-oob data for further tests
Parameters: frac (float) – Fraction of the data to be separated, taken from class Pars (default is 1/3)
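As a rough illustration of separating a fraction of the data, the sketch below shuffles row indices and sets aside the first frac of them. Whether the module selects the oob rows at random is not stated here, so the random selection, the name split_oob_sketch and the seed argument are all assumptions.

```python
import random


def split_oob_sketch(rows, frac=1.0 / 3.0, seed=None):
    """Hypothetical helper: split rows into (oob, no_oob) lists."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)  # assumed: oob rows are picked at random
    n_oob = int(round(frac * len(rows)))
    oob = [rows[i] for i in order[:n_oob]]
    no_oob = [rows[i] for i in order[n_oob:]]
    return oob, no_oob


# Nine toy rows, one third of them set aside.
oob, no_oob = split_oob_sketch(list(range(9)), frac=1.0 / 3.0, seed=1)
```

Every row ends up in exactly one of the two lists, so the oob rows can be used for tests without overlapping the rest.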
- sample_dim(nsample)
Samples from the list of attributes
Parameters: nsample (int) – size of subsample
Returns: dictionary with subsample attributes and their locations
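A minimal sketch of subsampling attributes, assuming the attribute dictionary has the ‘ind’/‘eind’ layout described for data.make_AT. The function name sample_dim_sketch, its standalone (non-method) form and the seed argument are hypothetical.

```python
import random


def sample_dim_sketch(attributes, nsample, seed=None):
    """Hypothetical helper: pick nsample attributes without replacement."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(attributes), nsample)
    # Keep each chosen attribute together with its column locations.
    return {name: attributes[name] for name in chosen}


# Toy attribute dictionary with made-up column positions.
AT = {
    'mag_u': {'ind': 0, 'eind': 3},
    'mag_g': {'ind': 1, 'eind': 4},
    'mag_r': {'ind': 2, 'eind': 5},
}
sub = sample_dim_sketch(AT, 2, seed=0)
```

The returned dictionary has the same shape as the input, so it could be passed wherever a dictionary of attributes is expected, for example as curr_at.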
- data.create_random_realizations(AT, F, N, keyatt)
Creates random realizations using the errors in the magnitudes, drawn from a normal distribution, and saves a temporary file in the train data directory
Parameters: - AT (dict) – dictionary with column names and column index
- F (float) – Training data
- N (int) – Number of realizations
- keyatt (str) – Attribute name to be predicted or classified
Returns: an array with random realizations
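The perturbation step can be sketched as follows, assuming each attribute value is redrawn from a normal distribution centred on the measured value with the error column as its standard deviation. This is an illustration only: the name random_realizations_sketch, the seed argument, the use of -1 for a missing error column, and the row-list output are assumptions, and no temporary file is written.

```python
import random


def random_realizations_sketch(at, rows, n, keyatt, seed=None):
    """Hypothetical helper: return n perturbed copies of every row."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        for row in rows:
            new_row = list(row)
            for name, loc in at.items():
                # Assumed: the key attribute and columns without errors stay fixed.
                if name == keyatt or loc.get('eind', -1) < 0:
                    continue
                mu = row[loc['ind']]
                sigma = abs(row[loc['eind']])
                new_row[loc['ind']] = rng.gauss(mu, sigma)
            out.append(new_row)
    return out


# Toy data: columns are mag_u, emag_u, zs.
at = {'mag_u': {'ind': 0, 'eind': 1}, 'zs': {'ind': 2, 'eind': -1}}
rows = [[20.0, 0.1, 0.5], [21.0, 0.0, 0.7]]
real = random_realizations_sketch(at, rows, 3, 'zs', seed=0)
```

A row whose error is zero is reproduced unchanged, while rows with larger errors scatter more widely around the measured value.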
- data.make_AT(cols, attributes, keyatt)
Creates the dictionary used in all routines
Note
Make sure all columns have different names, and that each error column has the same name as its attribute column with an ‘e’ in front of it, e.g. ‘mag_u’ and ‘emag_u’
Parameters: - cols (str) – str array with column names from file
- attributes (str) – attributes to be used from those columns
- keyatt (str) – Attribute to be predicted or classified
Returns: dictionary in which each key corresponds to an attribute and each value is itself a dictionary, where ‘ind’ is the column index and ‘eind’ is the error column index for the same attribute, e.g. A={‘u’:{‘ind’:1, ‘eind’:6}}
Return type: dict
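The naming convention in the note above can be made concrete with a small sketch that builds such a dictionary from a list of column names. It is not the module's code: the name make_at_sketch is hypothetical, and storing -1 when no error column exists is an assumption made for this example.

```python
def make_at_sketch(cols, attributes, keyatt):
    """Hypothetical helper: map attribute names to column positions."""
    at = {}
    for name in list(attributes) + [keyatt]:
        err_name = 'e' + name  # error column: same name with an 'e' in front
        at[name] = {
            'ind': cols.index(name),
            # Assumed convention: -1 marks an attribute with no error column.
            'eind': cols.index(err_name) if err_name in cols else -1,
        }
    return at


cols = ['mag_u', 'mag_g', 'zs', 'emag_u', 'emag_g']
at = make_at_sketch(cols, ['mag_u', 'mag_g'], 'zs')
```

With these columns, ‘mag_u’ maps to ‘ind’ 0 and ‘eind’ 3, which is why duplicate column names would make the lookup ambiguous.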
- data.read_catalog(filename, myrank=0, check='no', get_ng='no', L_1=0, L_2=-1, A_T='')
Reads the catalog, either for training or testing. Currently accepts ASCII tables and NumPy tables
Parameters: - filename (str) – Filename of the catalog
- myrank (int) – current processor id, for parallel reading (not implemented)
- check (str) – To check the code, only uses 200 lines of catalog
- get_ng (str) – Just get the total number of galaxies in the catalog
- L_1 (int) – if passed get catalog between L_1 and L_2
- L_2 (int) – if passed get catalog between L_1 and L_2
Returns: The whole catalog
Return type: float array