Bases: object
Wrapper around a pandas.DataFrame that adds extra functionality.
The underlying pandas.DataFrame is always available via the data attribute.
Any attribute not defined on this class is looked up on the underlying pandas.DataFrame.
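The attribute fallback described above is the standard Python delegation pattern built on __getattr__. The sketch below is a minimal illustration; TableWrapper is a hypothetical name, and this is not claimed to be ResultsTable's actual code.

```python
import pandas as pd


class TableWrapper:
    """Hypothetical minimal wrapper; not metaseq's actual ResultsTable code."""

    def __init__(self, data):
        self.data = data  # the wrapped pandas.DataFrame

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup fails, so names
        # defined on the wrapper itself (such as .data) always win.
        if name == "data":
            # Guard against infinite recursion if .data was never set
            # (for example, on an object created without calling __init__).
            raise AttributeError(name)
        return getattr(self.data, name)


table = TableWrapper(pd.DataFrame({"score": [1.5, 2.0]}, index=["g1", "g2"]))
shape = table.shape            # forwarded to the DataFrame
columns = list(table.columns)  # also forwarded
```

Note that implicit special-method lookups such as table["score"] or len(table) bypass __getattr__, so a wrapper has to define __getitem__ or __len__ explicitly if it wants to support them.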
Parameters:
    data : string or pandas.DataFrame
    db : string or gffutils.FeatureDB
    import_kwargs : dict
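Because data may be either a filename or an existing DataFrame, the constructor has to normalize its input. The sketch below shows one way to do that; load_table is a hypothetical helper, and the defaults (tab-delimited, feature IDs in the first column, import_kwargs overriding them) are assumptions rather than documented behavior.

```python
import io

import pandas as pd


def load_table(data, import_kwargs=None):
    """Return a DataFrame from either an existing DataFrame or a file.

    Hypothetical helper: assumes files are tab-delimited with feature IDs in
    the first column; import_kwargs override those defaults.
    """
    if isinstance(data, pd.DataFrame):
        return data
    kwargs = {"sep": "\t", "index_col": 0}
    kwargs.update(import_kwargs or {})
    return pd.read_csv(data, **kwargs)


# File-like objects stand in for filenames in this demo.
text = "id\tscore\ng1\t1.5\ng2\t2.0\n"
from_file = load_table(io.StringIO(text))
from_csv = load_table(io.StringIO(text.replace("\t", ",")), {"sep": ","})
```

Passing import_kwargs straight through to the reader keeps the wrapper agnostic about file layout.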
Methods
Method | Description
---|---
TSS([upstream, downstream]) | Creates a BED/GFF file of the 5’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object.
TTS([upstream, downstream]) | Creates a BED/GFF file of the 3’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object.
__init__(data[, db, import_kwargs]) |
align_with(other) | Align the dataframe’s index with another.
attach_db(db) | Attach a gffutils.FeatureDB for access to features.
copy() |
features([ignore_unknown]) | Generator of features.
five_prime([upstream, downstream]) | Creates a BED/GFF file of the 5’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object.
genes_in_common(other) | Convenience method for getting the genes found in both dataframes.
genes_with_peak(peaks[, transform_func, ...]) | Returns a boolean index of genes that have a peak nearby.
radviz(column_names[, transforms]) | Radviz plot.
reindex_to(x[, attribute]) | Returns a copy that only has rows corresponding to feature names in x.
scatter(x, y[, xfunc, yfunc, xscale, ...]) | Do-it-all method for making annotated scatterplots.
strip_unknown_features() | Remove features not found in the gffutils.FeatureDB.
three_prime([upstream, downstream]) | Creates a BED/GFF file of the 3’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object.
update(dataframe) | Updates the current data with a new dataframe.
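align_with and genes_in_common are only summarized here, and their implementation is not shown. One plausible reading in plain pandas terms is an index intersection for genes_in_common and a reindex onto the other table's index for align_with. The sketch below illustrates those pandas operations only; it is not claimed to be what the methods do internally.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3]}, index=["g1", "g2", "g3"])
b = pd.DataFrame({"y": [10, 30, 40]}, index=["g1", "g3", "g4"])

# Genes present in both tables (one reading of genes_in_common).
common = a.index.intersection(b.index)

# a's rows rearranged to follow b's index (one reading of align_with);
# genes missing from a become all-NaN rows.
aligned = a.reindex(b.index)
```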
TSS([upstream, downstream])
Creates a BED/GFF file of the 5’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object. Needs an attached database.
Parameters:
    upstream, downstream : int
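The key detail in extracting a 5’ end is strandedness: on the plus strand the 5’ end is the feature's start, and on the minus strand it is the feature's end, with "upstream" pointing toward higher coordinates. The sketch below shows that arithmetic for 0-based, half-open (BED-style) coordinates. five_prime_window is a hypothetical helper, and including the 5’ base itself in the window is an assumption; the pybedtools helpers may differ by a base.

```python
def five_prime_window(start, end, strand, upstream=0, downstream=0):
    """Return (start, end) of a window around a feature's 5' end.

    Hypothetical helper. Coordinates are 0-based, half-open. The 5' base is
    `start` on the plus strand and `end - 1` on the minus strand; upstream and
    downstream are measured relative to the feature's strand.
    """
    if strand == "-":
        five_p = end - 1
        new_start, new_end = five_p - downstream, five_p + upstream + 1
    else:
        five_p = start
        new_start, new_end = five_p - upstream, five_p + downstream + 1
    # Clamp so windows near the chromosome start never go negative.
    return max(new_start, 0), new_end
```

For example, a plus-strand feature spanning 100 to 200 with upstream=10 and downstream=5 gives the window (90, 106).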
TTS([upstream, downstream])
Creates a BED/GFF file of the 3’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object. Needs an attached database.
Parameters:
    upstream, downstream : int
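For the 3’ end, the strand logic is reversed. On the plus strand the 3’ end is the feature's last base, and on the minus strand it is the first base in genome coordinates. The sketch below is a hypothetical helper using 0-based, half-open coordinates. Including the 3’ base itself in the window is an assumption.

```python
def three_prime_window(start, end, strand, upstream=0, downstream=0):
    """Return (start, end) of a window around a feature's 3' end.

    Hypothetical helper. Coordinates are 0-based, half-open. The 3' base is
    `end - 1` on the plus strand and `start` on the minus strand; upstream and
    downstream are measured relative to the feature's strand.
    """
    if strand == "-":
        three_p = start
        new_start, new_end = three_p - downstream, three_p + upstream + 1
    else:
        three_p = end - 1
        new_start, new_end = three_p - upstream, three_p + downstream + 1
    # Clamp so windows near the chromosome start never go negative.
    return max(new_start, 0), new_end
```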
attach_db(db)
Attach a gffutils.FeatureDB for access to features.
Useful if you want to attach a database after this instance has been created.
Parameters:
    db : gffutils.FeatureDB
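Attaching a database after construction amounts to setting an attribute that database-dependent methods check before use. AnnotatedTable and require_db below are hypothetical names. Accepting a filename in attach_db is an assumption carried over from the constructor's "string or gffutils.FeatureDB" parameter, and gffutils is imported lazily so the demo runs without it.

```python
class AnnotatedTable:
    """Hypothetical stand-in for a table that can carry an annotation database."""

    def __init__(self, data, db=None):
        self.data = data
        self.db = None
        if db is not None:
            self.attach_db(db)

    def attach_db(self, db):
        # Assumed convenience: treat a string as a gffutils database filename.
        if isinstance(db, str):
            import gffutils  # imported only when a filename is given

            db = gffutils.FeatureDB(db)
        self.db = db

    def require_db(self):
        # Methods such as TSS/TTS need a database; fail early with a clear message.
        if self.db is None:
            raise ValueError("no database attached; call attach_db() first")
        return self.db


table = AnnotatedTable(data={"g1": 1.0})
table.attach_db({"g1": ("chr1", 0, 100)})  # any mapping works for this demo
```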
features([ignore_unknown])
Generator of features.
If a gffutils.FeatureDB is attached, yields a pybedtools.Interval for every feature in the dataframe’s index.
Parameters:
    ignore_unknown : bool
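A generator like this looks up each index label in the database and decides what to do with labels that are missing. The sketch below makes two simplifying assumptions. First, a plain dict stands in for gffutils.FeatureDB, which also supports db[feature_id]. Second, ignore_unknown=True is assumed to mean "skip missing IDs". iter_features is a hypothetical name, and it yields the stored tuples rather than pybedtools.Interval objects. With a real gffutils database, the exception to catch would likely be gffutils.exceptions.FeatureNotFoundError rather than KeyError (check the gffutils docs).

```python
def iter_features(index, db, ignore_unknown=False):
    """Yield db[name] for each name in index (hypothetical sketch).

    If ignore_unknown is True, names missing from db are skipped;
    otherwise the lookup error propagates.
    """
    for name in index:
        try:
            record = db[name]
        except KeyError:
            if ignore_unknown:
                continue
            raise
        yield record


demo_db = {"g1": ("chr1", 100, 200, "+"), "g2": ("chr2", 50, 80, "-")}
found = list(iter_features(["g1", "ambiguous", "g2"], demo_db, ignore_unknown=True))
```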
five_prime([upstream, downstream])
Creates a BED/GFF file of the 5’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object. Needs an attached database.
Parameters:
    upstream, downstream : int
genes_with_peak(peaks[, transform_func, ...])
Returns a boolean index of genes that have a peak nearby.
Parameters:
    peaks : string or pybedtools.BedTool
    transform_func : callable
    intersect_kwargs : dict
    id_attribute : str
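The intersect_kwargs parameter suggests the real method delegates overlap detection to bedtools intersect. The pure-Python sketch below swaps that for a naive overlap loop so the idea is visible without bedtools. Each gene region is optionally transformed (for example, widened so that "nearby" peaks count), and the result is a boolean pandas Series indexed by gene ID. genes_with_peak and widen_100 here are hypothetical, the data layout is assumed, and id_attribute is not modeled.

```python
import pandas as pd


def genes_with_peak(genes, peaks, transform_func=None):
    """Boolean Series, indexed by gene ID: does the gene's region overlap a peak?

    genes: dict of gene ID -> (chrom, start, end), 0-based half-open.
    peaks: iterable of (chrom, start, end).
    transform_func: optional callable applied to each gene region first.
    """
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))

    hits = {}
    for gene_id, region in genes.items():
        if transform_func is not None:
            region = transform_func(region)
        chrom, start, end = region
        # Half-open intervals overlap when each starts before the other ends.
        hits[gene_id] = any(
            p_start < end and start < p_end
            for p_start, p_end in peaks_by_chrom.get(chrom, [])
        )
    return pd.Series(hits, dtype=bool)


def widen_100(region):
    """Hypothetical transform: extend a region by 100 bp on each side."""
    chrom, start, end = region
    return chrom, max(0, start - 100), end + 100


genes = {"g1": ("chr1", 100, 200), "g2": ("chr1", 1000, 1100), "g3": ("chr2", 50, 60)}
peaks = [("chr1", 250, 300), ("chr2", 55, 56)]
strict = genes_with_peak(genes, peaks)
nearby = genes_with_peak(genes, peaks, transform_func=widen_100)
```

Grouping peaks by chromosome avoids comparing genes against peaks on other chromosomes. An interval tree or bedtools would scale better for genome-wide data.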
radviz(column_names[, transforms])
Radviz plot.
A radviz plot, useful for exploratory visualization, can show multivariate data in 2D. Conceptually, the variables (here, specified in column_names) are distributed evenly around the unit circle. Each point (here, each row in the dataframe) is then attached to each variable by a spring whose stiffness is proportional to the value of the corresponding variable. The final position of a point is the equilibrium position with all springs pulling on it.
In practice, each variable is normalized to 0-1 (by subtracting the minimum and dividing by the range).
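The spring picture reduces to a weighted average. Suppose anchor s_k sits at angle 2πk/n on the unit circle and the point p feels force v_k (s_k − p) from each spring. Setting the total force to zero gives p = Σ v_k s_k / Σ v_k. This pure-Python sketch min-max normalizes each column to [0, 1] (subtract the column minimum, divide by the range) and then applies that formula. Returning NaN for all-zero rows and mapping constant columns to 0 are assumptions of this sketch, and radviz_coords is a hypothetical name.

```python
import math


def normalize(values):
    """Min-max scale a column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def radviz_coords(rows):
    """Project rows (equal-length lists of numbers) to 2-D radviz coordinates."""
    n_vars = len(rows[0])
    # Normalize each variable (column) independently.
    columns = [normalize([row[j] for row in rows]) for j in range(n_vars)]
    # Anchor j sits on the unit circle at angle 2*pi*j/n_vars.
    anchors = [
        (math.cos(2 * math.pi * j / n_vars), math.sin(2 * math.pi * j / n_vars))
        for j in range(n_vars)
    ]
    coords = []
    for i in range(len(rows)):
        weights = [columns[j][i] for j in range(n_vars)]
        total = sum(weights)
        if total == 0:
            # No spring pulls on this point, so the equilibrium is undefined.
            coords.append((float("nan"), float("nan")))
            continue
        x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        coords.append((x, y))
    return coords


points = radviz_coords([[10, 0], [0, 5], [4, 4]])
```

With two variables, the anchors sit at (1, 0) and (−1, 0). The third row normalizes to weights 0.4 and 0.8, so it lands at x = (0.4 − 0.8) / 1.2 = −1/3. Reordering the columns moves the anchors, which is why the order of column_names affects the plot.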
This is a very exploratory plot. The order of column_names affects the results, so it’s best to try a few different orderings. For other caveats, see [1].
Additional kwargs are passed to self.scatter, so subsetting, callbacks, and other configuration can be performed using options for that method (e.g., genes_to_highlight is particularly useful).
Parameters:
    column_names : list
    transforms : dict
    ax : matplotlib.Axes
    kwargs : dict
Notes
This method adds two new columns to self.data, “radviz_x” and “radviz_y”, and then calls self.scatter using these columns.
The data transformation was adapted from the pandas.tools.plotting.radviz function.
References
[1] http://www.agocg.ac.uk/reports/visual/casestud/brunsdon/radviz.htm
[2] http://pandas.pydata.org/pandas-docs/stable/visualization.html#radviz
reindex_to(x[, attribute])
Returns a copy that only has rows corresponding to feature names in x.
Parameters:
    x : str or pybedtools.BedTool
    attribute : str
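The core of such a method is collecting one name per feature and then reindexing the DataFrame to those names. This sketch makes three assumptions: plain dicts stand in for features, attribute names the key that holds each feature's name, and pandas.DataFrame.reindex does the subsetting. reindex_to here is a hypothetical free function, not the method itself. With reindex, names absent from the table come back as all-NaN rows, and rows not named by any feature are dropped.

```python
import pandas as pd


def reindex_to(df, features, attribute):
    """Return a new DataFrame with one row per feature name, in feature order.

    features: iterable of mappings (plain dicts here) holding feature attributes.
    Names not present in df produce all-NaN rows, as DataFrame.reindex does.
    """
    names = [feature[attribute] for feature in features]
    return df.reindex(names)


scores = pd.DataFrame({"score": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
subset = reindex_to(scores, [{"ID": "c"}, {"ID": "a"}, {"ID": "z"}], attribute="ID")
```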
scatter(x, y[, xfunc, yfunc, xscale, ...])
Do-it-all method for making annotated scatterplots.
Parameters:
    x, y : array-like
    xfunc, yfunc : callable
    xlab, ylab : string
    ax : None or Axes object
    general_kwargs : dict
    genes_to_highlight : list of (index, dict) tuples
    callback : callable
    one_to_one : None or dict
    label_kwargs : dict
    offset_kwargs : dict
    xlab_prefix, ylab_prefix : str
    hist_size : float
    hist_pad : float
    nan_offset, pos_offset, neg_offset : float
    linelength : float
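The genes_to_highlight parameter is a list of (index, dict) tuples. A natural reading is that each index selects a subset of rows and each dict supplies plotting options for that subset. The sketch below assumes exactly that and draws one extra matplotlib layer per tuple on top of a base layer styled by general_kwargs. Three further assumptions apply: highlight_scatter is hypothetical, x and y are column names rather than arrays, and each dict is passed straight to Axes.scatter. The many other options of the real method are not modeled.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def highlight_scatter(df, x, y, genes_to_highlight=(), ax=None, **general_kwargs):
    """Plot df[x] vs df[y], then overlay highlighted subsets with their own styles.

    Each entry of genes_to_highlight is an (index, style_dict) pair: index selects
    rows via df.loc, and style_dict is passed to Axes.scatter.
    """
    if ax is None:
        _, ax = plt.subplots()
    # Base layer: every row, styled by general_kwargs.
    ax.scatter(df[x], df[y], **general_kwargs)
    # One extra layer per highlight group, drawn on top of the base layer.
    for index, style in genes_to_highlight:
        subset = df.loc[index]
        ax.scatter(subset[x], subset[y], **style)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return ax


demo = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 1.0, 4.0]}, index=["g1", "g2", "g3"])
ax = highlight_scatter(demo, "a", "b", [(["g3"], {"color": "red", "label": "g3"})], color="0.7")
```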
strip_unknown_features()
Remove features not found in the gffutils.FeatureDB. These typically include ‘ambiguous’, ‘no_feature’, etc., but stripping is also useful if the database was created from a different annotation than the one used to create the table.
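Stripping unknown features is a boolean filter on the index. The sketch below assumes the set of known feature IDs has already been collected. With gffutils, such a set could plausibly be built as {f.id for f in db.all_features()}, although for large databases a per-ID lookup may be preferable. strip_unknown is a hypothetical helper, and a Python set stands in for the database.

```python
import pandas as pd


def strip_unknown(df, known_ids):
    """Return only the rows whose index label is a known feature ID."""
    return df[df.index.isin(list(known_ids))]


counts = pd.DataFrame(
    {"count": [10, 3, 7, 1]},
    index=["g1", "ambiguous", "g2", "no_feature"],
)
stripped = strip_unknown(counts, {"g1", "g2"})
```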