Data module

This page describes the data class in some detail. The class reads the data in a specified format and reads the input parameters as specified in the Input file template.

Module author: Matias Carrasco Kind

data.bootstrap_index(N, SS)

Returns N bootstrap indices, drawn with replacement from the indices 0 to SS

Parameters:
    N (int) – size of the bootstrap sample
    SS (int) – indices are drawn from 0 to SS
Returns:	array of bootstrap indices
Return type:	int array
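A minimal sketch of bootstrap indexing, using only the standard library. This is not the module's implementation. The name `bootstrap_index_sketch` and its `seed` argument are hypothetical, and treating SS as an exclusive upper bound is an assumption.

```python
import random


def bootstrap_index_sketch(N, SS, seed=None):
    """Draw N indices with replacement from range(SS) (hypothetical helper).

    Assumption: SS is treated as an exclusive upper bound.
    """
    rng = random.Random(seed)
    # Sampling with replacement: an index may appear more than once.
    return [rng.randrange(SS) for _ in range(N)]


idx = bootstrap_index_sketch(5, 10, seed=1)
```

Because the draw is with replacement, some indices repeat and others never appear. For N = SS, the expected fraction never drawn approaches 1/e (about 37%) as SS grows. These left-out entries form the out-of-bag pool.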

class data.catalog(Pars, cat_type='train', L1=0, L2=-1, rank=0)

Creates a catalog instance for training or testing

Parameters:
    Pars (class) – class of parameters read from the input files
    cat_type (str) – ‘train’ or ‘test’ file (file names are taken from the Pars class)
    L1 (int) – keep only entries between L1 and L2
    L2 (int) – keep only entries between L1 and L2

get_XY(curr_at='all', bootstrap='no')

Creates the X and Y data arrays from the catalog, using a random realization or bootstrapping. Afterwards both X and Y are loaded and ready to be used.

Parameters:
    curr_at (dict) – dictionary of attributes to be used (e.g. a subsample of them); ‘all’ by default
    bootstrap (str) – whether to draw a bootstrap sample (‘yes’/‘no’)
Returns:	Saves X and Y, the oob (out-of-bag) and no-oob data if required, and the original catalog
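The following sketch shows one way X and Y could be assembled from a catalog and an attribute dictionary in the make_AT format. `get_xy_sketch` is a hypothetical stand-in, not the real method. It uses plain lists instead of numpy arrays and a boolean `bootstrap` flag instead of ‘yes’/‘no’. Ordering the X columns by catalog column index is an assumption.

```python
import random


def get_xy_sketch(catalog, AT, keyatt, curr_at="all", bootstrap=False, seed=None):
    """Hypothetical stand-in for catalog.get_XY (illustration only).

    catalog : list of rows, each a list of floats
    AT      : attribute dictionary in the make_AT format, e.g. {'mag_u': {'ind': 0}}
    keyatt  : name of the attribute to predict; its column becomes Y
    curr_at : 'all' or a subset of AT selecting the X attributes
    """
    attrs = AT if curr_at == "all" else curr_at
    # X columns: selected attributes except the target, in catalog column order.
    names = sorted((k for k in attrs if k != keyatt), key=lambda k: AT[k]["ind"])
    rows = catalog
    if bootstrap:
        # Resample whole rows with replacement so each X row stays paired with its Y.
        rng = random.Random(seed)
        rows = [rng.choice(catalog) for _ in range(len(catalog))]
    X = [[row[AT[name]["ind"]] for name in names] for row in rows]
    Y = [row[AT[keyatt]["ind"]] for row in rows]
    return X, Y
```

Resampling whole rows rather than individual values keeps each object's attributes paired with its target value.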

has_X()

Is X already loaded in memory?

Returns:	Boolean

has_Y()

Is Y already loaded in memory?

Returns:	Boolean

load_random()

Loads the random catalog with the realizations

make_random(outfileran='', ntimes=-1)

Makes the random realizations

Parameters:
    outfileran (str) – output file (optional)
    ntimes (int) – number of realizations; taken from the Pars class unless otherwise indicated

oob_data(frac=0.0)

Creates the oob data and separates it from the no-oob data for further tests

Parameters:
    frac (float) – fraction of the data to be separated; taken from the Pars class (default is 1/3)
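A minimal sketch of holding out a fraction of the rows as oob data. `oob_split` is a hypothetical helper that returns index lists. The documented method takes frac from the Pars class, and how it stores the two subsets is not shown here. Shuffling and rounding the held-out count are assumptions.

```python
import random


def oob_split(n_rows, frac=1.0 / 3.0, seed=None):
    """Return (no_oob, oob) index lists, holding out `frac` of the rows.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_oob = int(round(frac * n_rows))
    # The first n_oob shuffled indices become the held-out (oob) set.
    oob = sorted(idx[:n_oob])
    no_oob = sorted(idx[n_oob:])
    return no_oob, oob
```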

sample_dim(nsample)

Samples from the list of attributes

Parameters:	nsample (int) – size of subsample
Returns:	dictionary with subsample attributes and their locations
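A sketch of drawing a random subset of attributes, in the spirit of the random feature subsets used by random forests. `sample_dim_sketch` is hypothetical. Excluding the target attribute `keyatt` from the draw, and sampling without replacement, are assumptions rather than documented behaviour.

```python
import random


def sample_dim_sketch(AT, nsample, keyatt=None, seed=None):
    """Randomly pick `nsample` attributes (without replacement) from AT.

    Returns a dictionary with the chosen attributes and their column
    locations, mirroring the documented return value.
    """
    rng = random.Random(seed)
    candidates = sorted(k for k in AT if k != keyatt)
    chosen = rng.sample(candidates, nsample)
    return {name: AT[name] for name in chosen}
```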

data.create_random_realizations(AT, F, N, keyatt)

Creates random realizations from the magnitude errors by drawing from a normal distribution, and saves a temporary file in the training data directory.

Parameters:
    AT (dict) – dictionary with column names and column indices
    F (float) – training data
    N (int) – number of realizations
    keyatt (str) – name of the attribute to be predicted or classified
Returns:	array with the random realizations
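A minimal sketch of the idea: each attribute value is redrawn from a normal distribution centred on the measured value, with the standard deviation read from the attribute's error column. `perturb_catalog` is hypothetical. Unlike the documented function, it returns the realizations in memory instead of saving a temporary file. Leaving the `keyatt` column unperturbed is an assumption.

```python
import random


def perturb_catalog(F, AT, N, keyatt, seed=None):
    """Return N perturbed copies of catalog F (a list of rows).

    AT maps attribute names to {'ind': value column, 'eind': error column}.
    """
    rng = random.Random(seed)
    realizations = []
    for _ in range(N):
        new_rows = []
        for row in F:
            new = list(row)
            for name, cols in AT.items():
                if name == keyatt:
                    continue  # assumption: the target value is not perturbed
                # Gaussian draw: mean = measured value, sigma = its error.
                new[cols["ind"]] = rng.gauss(row[cols["ind"]], row[cols["eind"]])
            new_rows.append(new)
        realizations.append(new_rows)
    return realizations
```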

data.make_AT(cols, attributes, keyatt)

Creates the attribute dictionary used by all routines

Note

Make sure all columns have distinct names, and that each error column has the same name as its attribute column with an ‘e’ prefix, e.g. ‘mag_u’ and ‘emag_u’.

Parameters:
    cols (str) – string array with the column names from the file
    attributes (str) – attributes to be used from those columns
    keyatt (str) – attribute to be predicted or classified
Returns:	dictionary in which each key corresponds to an attribute and maps to another dictionary, where ‘ind’ is the column index and ‘eind’ is the index of the error column for the same attribute, e.g. A = {'u': {'ind': 1, 'eind': 6}}
Return type:	dict
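A sketch of building the attribute dictionary using the naming rule from the Note, where the error column of ‘mag_u’ is ‘emag_u’. `make_at_sketch` is hypothetical. Storing `eind = -1` when an attribute has no error column (as is typical for the target `keyatt`) is an assumption, not documented behaviour.

```python
def make_at_sketch(cols, attributes, keyatt):
    """Build {attribute: {'ind': column index, 'eind': error column index}}."""
    if len(set(cols)) != len(cols):
        raise ValueError("column names must be unique")
    position = {name: i for i, name in enumerate(cols)}
    AT = {}
    for name in list(attributes) + [keyatt]:
        # Error column follows the 'e' + name convention; -1 if absent (assumption).
        AT[name] = {"ind": position[name], "eind": position.get("e" + name, -1)}
    return AT


cols = ["id", "mag_u", "mag_g", "emag_u", "emag_g", "zspec"]
AT = make_at_sketch(cols, ["mag_u", "mag_g"], "zspec")
```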

data.read_catalog(filename, myrank=0, check='no', get_ng='no', L_1=0, L_2=-1, A_T='')

Reads the catalog, either for training or testing. Currently accepts ASCII tables and numpy tables.

Parameters:
    filename (str) – filename of the catalog
    myrank (int) – current processor id, for parallel reading (not implemented)
    check (str) – for checking the code; only uses 200 lines of the catalog
    get_ng (str) – just get the total number of galaxies in the catalog
    L_1 (int) – if passed, get the catalog between L_1 and L_2
    L_2 (int) – if passed, get the catalog between L_1 and L_2
Returns:	The whole catalog
Return type:	float array
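A sketch of the ASCII-table case, reading from an iterable of lines so it can run without a file. `read_catalog_sketch` is hypothetical and uses booleans instead of ‘yes’/‘no’. The following details are assumptions:

* Skipping ‘#’ comment lines.
* Treating L_2 as an exclusive slice end.
* Applying the 200-line check before the L_1/L_2 cut.

```python
def read_catalog_sketch(lines, check=False, get_ng=False, L_1=0, L_2=-1):
    """Parse a whitespace-separated ASCII table given as an iterable of lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumption)
        rows.append([float(v) for v in line.split()])
    if get_ng:
        return len(rows)  # only the number of galaxies
    if check:
        rows = rows[:200]  # quick check mode: first 200 rows only
    end = len(rows) if L_2 == -1 else L_2  # L_2 = -1 means "to the end"
    return rows[L_1:end]
```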