yellowbrick.features package
Submodules
yellowbrick.features.base module
Base classes for feature visualizers and feature selection tools.
-
class yellowbrick.features.base.DataVisualizer(ax=None, features=None, classes=None, color=None, colormap=None, **kwargs)
Bases: yellowbrick.features.base.FeatureVisualizer
Data Visualizers are a subclass of Feature Visualizers which plot the instances in feature space (also called data space, hence the name of the visualizer). Feature space is a multi-dimensional space defined by the columns of the feature matrix X, which is passed to fit() and transform(). Instances can also be labeled by the target vector y, which is only passed to fit(). For that reason most Data Visualizers perform their drawing in fit().
This class provides helper functionality related to target identification: whether the target is sequential or categorical, and mapping a color sequence or color set to the targets as appropriate. It also uses the fit method to call the drawing utilities.
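The target identification described above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the function names target_kind and assign_colors, the max_classes threshold, and the three-color palette are all hypothetical choices made for this example.

```python
from numbers import Real


def target_kind(y, max_classes=10):
    """Guess whether a target is categorical or sequential (hypothetical heuristic)."""
    distinct = set(y)
    # Non-numeric labels (strings, booleans) are always treated as categories.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in distinct):
        return "categorical"
    # A handful of whole-number values looks like class labels.
    if len(distinct) <= max_classes and all(float(v).is_integer() for v in distinct):
        return "categorical"
    return "sequential"


def assign_colors(y, palette=("#1f77b4", "#ff7f0e", "#2ca02c")):
    """Map each target value to a color (categorical) or a [0, 1] position (sequential)."""
    if target_kind(y) == "categorical":
        classes = list(dict.fromkeys(y))  # distinct classes in first-seen order
        lookup = {c: palette[i % len(palette)] for i, c in enumerate(classes)}
        return [lookup[v] for v in y]
    # Sequential target: scale into [0, 1] so a colormap could be sampled.
    lo, hi = min(y), max(y)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant target
    return [(v - lo) / span for v in y]
```

A categorical target gets one palette color per class, while a continuous target gets a position along a colormap, which mirrors the split between the color and colormap arguments.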
-
fit(X, y=None, **kwargs)
The fit method is the primary drawing entry point for data visualizers such as parallel coordinates, since it receives both the X and y data required for the visualization, whereas the transform method receives only X.
Parameters: X : ndarray or DataFrame of shape n x m
A matrix of n instances with m features
y : ndarray or Series of length n
An array or series of target or class values
kwargs : dict
Pass generic arguments to the drawing method
Returns: self : instance
Returns the instance of the transformer/visualizer
-
-
class yellowbrick.features.base.FeatureVisualizer(ax=None, **kwargs)
Bases: yellowbrick.base.Visualizer, sklearn.base.TransformerMixin
Base class for feature visualization to investigate features individually or together.
FeatureVisualizer is itself a transformer so that it can be used in a Scikit-Learn Pipeline to perform automatic visual analysis during build.
Accepts as input a DataFrame or Numpy array.
-
fit(X, y=None, **fit_params)
This method performs preliminary computations in order to set up the figure or perform other analyses. It can also call drawing methods in order to set up various non-instance related figure elements.
This method must return self.
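Why fit must return self is easiest to see in a toy transformer. The following is a minimal sketch in plain Python, not the library's code: the ColumnCounter class and its n_features_ attribute are hypothetical, and fit_transform is written out by hand here, whereas in Scikit-Learn it is supplied by TransformerMixin.

```python
class ColumnCounter:
    """Hypothetical pass-through transformer in the style of a feature visualizer."""

    def fit(self, X, y=None, **fit_params):
        # "Preliminary computation": remember how many columns were seen.
        self.n_features_ = len(X[0]) if X else 0
        # Returning self is what makes fit(...).transform(...) chaining possible.
        return self

    def transform(self, X):
        # A visualizer typically leaves the data untouched.
        return X

    def fit_transform(self, X, y=None, **fit_params):
        # Relies on fit returning the instance itself.
        return self.fit(X, y, **fit_params).transform(X)


X = [[1, 2, 3], [4, 5, 6]]
counter = ColumnCounter()
result = counter.fit_transform(X)
```

If fit returned None instead, the chained call in fit_transform would fail, and a pipeline step built on it would break in the same way.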
-
yellowbrick.features.pcoords module
Implementations of parallel coordinates for multi-dimensional feature analysis. There are a variety of parallel coordinates from Andrews Curves to coordinates that optimize column order.
-
class yellowbrick.features.pcoords.ParallelCoordinates(ax=None, features=None, classes=None, color=None, colormap=None, vlines=True, vlines_kwds=None, **kwargs)
Bases: yellowbrick.features.base.DataVisualizer
Parallel coordinates displays each feature as a vertical axis spaced evenly along the horizontal, and each instance as a line drawn between each individual axis.
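The geometry described above can be sketched without any plotting library. This is an illustration only, not the visualizer's drawing code: the function name parallel_coordinate_lines is hypothetical, and placing the axes at the integer positions 0, 1, 2, ... is an assumption about what "spaced evenly" means.

```python
def parallel_coordinate_lines(X):
    """Return the axis positions and one polyline per instance."""
    n_features = len(X[0])
    # One vertical axis per feature, evenly spaced along the horizontal.
    axis_positions = list(range(n_features))
    # Each instance becomes a polyline with one vertex on every axis.
    lines = [list(zip(axis_positions, row)) for row in X]
    return axis_positions, lines


X = [[5.1, 3.5, 1.4], [6.2, 2.9, 4.3]]
axes, lines = parallel_coordinate_lines(X)
```

Each entry of lines could be passed to a line-plotting call, and axis_positions marks where the vertical axis lines controlled by vlines would be drawn.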
-
yellowbrick.features.pcoords.parallel_coordinates(X, y=None, ax=None, features=None, classes=None, color=None, colormap=None, vlines=True, vlines_kwds=None, **kwargs)
Displays each feature as a vertical axis and each instance as a line.
This helper function is a quick wrapper to utilize the ParallelCoordinates Visualizer (Transformer) for one-off analysis.
Parameters: X : ndarray or DataFrame of shape n x m
A matrix of n instances with m features
y : ndarray or Series of length n
An array or series of target or class values
ax : matplotlib axes
The axes to plot the figure on.
features : list of strings
The names of the features or columns
classes : list of strings
The names of the classes in the target
color : list or tuple of colors
Specify the colors for each individual class
colormap : string or matplotlib cmap
Sequential colormap for continuous target
vlines : bool
Display the vertical axis lines
vlines_kwds : dict
Keyword arguments to draw the vlines
Returns: ax : matplotlib axes
Returns the axes that the parallel coordinates were drawn on.
yellowbrick.features.radviz module
Implements radviz for feature analysis.
-
yellowbrick.features.radviz.RadViz
alias of RadialVisualizer
-
class yellowbrick.features.radviz.RadialVisualizer(ax=None, features=None, classes=None, color=None, colormap=None, **kwargs)
Bases: yellowbrick.features.base.DataVisualizer
RadViz is a multivariate data visualization algorithm that plots each axis uniformly around the circumference of a circle then plots points on the interior of the circle such that the point normalizes its values on the axes from the center to each arc.
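The placement of points can be sketched with the commonly described RadViz formulation, in which each instance is a weighted average of the feature anchors. This is an assumption about the general technique rather than a copy of this class's draw method; the function name radviz_points and the min-max normalization of each column are choices made for the example.

```python
import math


def radviz_points(X):
    """Return the feature anchors on the unit circle and one 2D point per instance."""
    n_features = len(X[0])
    # Place one anchor per feature, evenly spaced around the unit circle.
    anchors = [
        (math.cos(2 * math.pi * j / n_features), math.sin(2 * math.pi * j / n_features))
        for j in range(n_features)
    ]
    # Min-max normalize every column to [0, 1] (assumed normalization).
    columns = list(zip(*X))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]

    points = []
    for row in X:
        weights = [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        total = sum(weights) or 1.0  # an all-zero row stays at the center
        px = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        py = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        points.append((px, py))
    return anchors, points


X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
anchors, points = radviz_points(X)
```

An instance dominated by one feature lands near that feature's anchor, while an instance with balanced values is pulled toward the center of the circle.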
-
draw(X, y, **kwargs)
Called from the fit method, this method creates the radviz canvas and draws each instance as a class or target colored point, whose location is determined by the feature data set.
-
-
yellowbrick.features.radviz.radviz(X, y=None, ax=None, features=None, classes=None, color=None, colormap=None, **kwargs)
Displays each feature as an axis around a circle surrounding a scatter plot whose points are each individual instance.
This helper function is a quick wrapper to utilize the RadialVisualizer (Transformer) for one-off analysis.
Parameters: X : ndarray or DataFrame of shape n x m
A matrix of n instances with m features
y : ndarray or Series of length n
An array or series of target or class values
ax : matplotlib axes
The axes to plot the figure on.
features : list of strings
The names of the features or columns
classes : list of strings
The names of the classes in the target
color : list or tuple of colors
Specify the colors for each individual class
colormap : string or matplotlib cmap
Sequential colormap for continuous target
Returns: ax : matplotlib axes
Returns the axes that the radviz plot was drawn on.
yellowbrick.features.rankd module
Implements 1D (histograms) and 2D (joint plot) feature rankings.
-
class yellowbrick.features.rankd.Rank2D(ax=None, algorithm='pearson', features=None, colormap='RdBu_r', **kwargs)
Bases: yellowbrick.features.base.FeatureVisualizer
Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm (e.g. Pearson correlation) then returns them ranked as a lower left triangle diagram.
-
finalize(**kwargs)
Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.
Parameters: kwargs: dict
generic keyword arguments
-
fit(X, y=None, **kwargs)
The fit method gathers information about the state of the visualizer.
Parameters: X : ndarray or DataFrame of shape n x m
A matrix of n instances with m features
y : ndarray or Series of length n
An array or series of target or class values
kwargs : dict
Pass generic arguments to the drawing method
Returns: self : instance
Returns the instance of the transformer/visualizer
-
rank(X, algorithm=None)
Returns the ranking of each pair of columns as an m by m matrix.
Parameters: X : ndarray or DataFrame of shape n x m
A matrix of n instances with m features
algorithm : str or None
The ranking mechanism to use, or None for the default
Returns: R : ndarray
The mxm ranking matrix of the variables
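To make the shape of the result concrete, here is a dependency-free sketch of a pairwise Pearson correlation matrix, the default ranking named in the class signature. It is an illustration under stated assumptions, not the method's actual body: the function name pearson_matrix is hypothetical, plain lists stand in for an ndarray, and constant columns (which would cause a division by zero) are not handled.

```python
import math


def pearson_matrix(X):
    """Return the m x m matrix of Pearson correlations between the columns of X."""
    n = len(X)
    columns = [list(c) for c in zip(*X)]
    # Center every column on its mean.
    centered = [[v - sum(c) / n for v in c] for c in columns]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    # Column norms; a constant column would give 0 here (not handled in this sketch).
    norms = [math.sqrt(dot(c, c)) for c in centered]
    m = len(columns)
    return [
        [dot(centered[i], centered[j]) / (norms[i] * norms[j]) for j in range(m)]
        for i in range(m)
    ]


X = [[1, 2, 3], [2, 4, 2], [3, 6, 1]]
R = pearson_matrix(X)
```

The matrix is symmetric with ones on the diagonal, which is why a lower-left triangle is enough to display every pair of features once.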
-
ranking_methods = {'pearson': <function Rank2D.<lambda>>, 'covariance': <function Rank2D.<lambda>>}
-
-
yellowbrick.features.rankd.rank2d(X, y=None, ax=None, algorithm='pearson', features=None, colormap='RdBu_r', **kwargs)
Displays pairwise comparisons of features with the algorithm and ranks them in a lower-left triangle heatmap plot.
This helper function is a quick wrapper to utilize the Rank2D Visualizer (Transformer) for one-off analysis.
Parameters: X : ndarray or DataFrame of shape n x m
A matrix of n instances with m features
y : ndarray or Series of length n
An array or series of target or class values
ax : matplotlib axes
the axis to plot the figure on.
algorithm : one of {pearson, covariance}
the ranking algorithm to use, default is Pearson correlation.
features : list
a list of feature names to use. If a DataFrame is passed to fit and features is None, feature names are selected as the columns of the DataFrame.
colormap : string or cmap
optional string or matplotlib cmap used to color the heatmap.
Returns: ax : matplotlib axes
Returns the axes that the Rank2D heatmap was drawn on.