yellowbrick.features package

Submodules

yellowbrick.features.base module

Base classes for feature visualizers and feature selection tools.

class yellowbrick.features.base.DataVisualizer(ax=None, features=None, classes=None, color=None, colormap=None, **kwargs)[source]

Bases: yellowbrick.features.base.FeatureVisualizer

Data Visualizers are a subclass of Feature Visualizers that plot the instances in feature space (also called data space, hence the name of the visualizer). Feature space is a multi-dimensional space defined by the columns of the instance matrix X, which is passed to fit() and transform(). Instances can also be labeled by the target vector y, which is only passed to fit(). Because only fit() receives both X and y, most Data Visualizers perform their drawing in fit().

This class provides helper functionality related to target identification: whether or not the target is sequential or categorical, and mapping a color sequence or color set to the targets as appropriate. It also uses the fit method to call the drawing utilities.
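
The target handling described above can be sketched as follows. This is a minimal illustration only: the function name target_colors, the max_classes threshold, and the rule for deciding between categorical and sequential targets are all hypothetical and are not taken from yellowbrick.

```python
import numpy as np


def target_colors(y, palette, max_classes=10):
    """Hypothetical helper: decide whether y is categorical or sequential.

    Categorical targets get one palette color per class; sequential targets
    are scaled to [0, 1] so they can be fed to a colormap.
    The max_classes threshold is an assumption made for this sketch.
    """
    y = np.asarray(y)
    labels = np.unique(y)

    # Strings, objects, booleans, or few distinct values: treat as classes.
    if y.dtype.kind in "OUSb" or labels.size <= max_classes:
        lookup = {
            label: palette[i % len(palette)]
            for i, label in enumerate(labels.tolist())
        }
        return "categorical", [lookup[value] for value in y.tolist()]

    # Otherwise treat the target as continuous and rescale it to [0, 1].
    y = y.astype(float)
    return "sequential", (y - y.min()) / (y.max() - y.min())


kind, colors = target_colors(["a", "b", "a"], ["red", "blue"])
```

A real visualizer would pass the scaled values to a matplotlib colormap; that step is left out here to keep the sketch dependency-free.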

fit(X, y=None, **kwargs)[source]

The fit method is the primary drawing hook for data visualizers, since it receives both the X and y data required for the visualization and the transform method does not.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

kwargs : dict

Pass generic arguments to the drawing method

Returns:

self : instance

Returns the instance of the transformer/visualizer

class yellowbrick.features.base.FeatureVisualizer(ax=None, **kwargs)[source]

Bases: yellowbrick.base.Visualizer, sklearn.base.TransformerMixin

Base class for feature visualization to investigate features individually or together.

FeatureVisualizer is itself a transformer so that it can be used in a Scikit-Learn Pipeline to perform automatic visual analysis during build.

Accepts as input a DataFrame or Numpy array.

fit(X, y=None, **fit_params)[source]

This method performs preliminary computations in order to set up the figure or perform other analyses. It can also call drawing methods in order to set up various non-instance related figure elements.

This method must return self.

fit_transform_poof(X, y=None, **kwargs)[source]

Fit to data, transform it, then visualize it.

Fits the visualizer to X and y with optional parameters by passing in all of kwargs, then calls poof with the same kwargs. This method must return the result of the transform method.

transform(X)[source]

Primarily a pass-through to ensure that the feature visualizer will work in a pipeline setting. This method can also call drawing methods in order to ensure that the visualization is constructed.

This method must return a numpy array with the same shape as X.
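
The contract described by fit, transform, and fit_transform_poof can be illustrated with a small stand-in class. PassThroughVisualizer and its drawn flag are hypothetical names used only for this sketch; the class does not inherit from any yellowbrick base class and does no real drawing.

```python
import numpy as np


class PassThroughVisualizer:
    """Hypothetical stand-in that follows the documented method contract."""

    def __init__(self):
        self.n_features_ = None
        self.drawn = False

    def fit(self, X, y=None, **fit_params):
        # Preliminary computation only; fit must return self.
        self.n_features_ = np.asarray(X).shape[1]
        return self

    def transform(self, X):
        # Pass-through: the result has the same shape as X.
        return np.asarray(X)

    def poof(self, **kwargs):
        # Placeholder for rendering the figure.
        self.drawn = True

    def fit_transform_poof(self, X, y=None, **kwargs):
        # Fit, transform, then visualize; return the transform result.
        Xp = self.fit(X, y, **kwargs).transform(X)
        self.poof(**kwargs)
        return Xp


X = np.arange(12.0).reshape(4, 3)
viz = PassThroughVisualizer()
Xp = viz.fit_transform_poof(X)
```

Because transform returns the data unchanged, an object with this shape can sit in the middle of a pipeline without affecting the steps that follow it.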

yellowbrick.features.pcoords module

Implementations of parallel coordinates for multi-dimensional feature analysis. There are a variety of parallel coordinates from Andrews Curves to coordinates that optimize column order.

class yellowbrick.features.pcoords.ParallelCoordinates(ax=None, features=None, classes=None, color=None, colormap=None, vlines=True, vlines_kwds=None, **kwargs)[source]

Bases: yellowbrick.features.base.DataVisualizer

Parallel coordinates displays each feature as a vertical axis spaced evenly along the horizontal, and each instance as a line drawn between each individual axis.
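
The geometry described above can be sketched with NumPy alone. The helper name parallel_coordinate_lines is hypothetical, and this is not the library's drawing code: it only computes the polyline vertices that a plotting call would connect.

```python
import numpy as np


def parallel_coordinate_lines(X):
    """Return an (n, m, 2) array of polyline vertices.

    For instance i, lines[i] holds m (axis position, feature value) pairs:
    one vertex on each evenly spaced vertical axis.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    xs = np.arange(m)  # feature j sits at horizontal position j
    return np.stack([np.broadcast_to(xs, (n, m)), X], axis=-1)


X = np.array([[5.1, 3.5, 1.4],
              [6.2, 2.9, 4.3]])
lines = parallel_coordinate_lines(X)
```

Each lines[i] could then be drawn as a single line, colored by the class or target value of instance i.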

draw(X, y, **kwargs)[source]

Called from the fit method, this method creates the parallel coordinates canvas and draws each instance and vertical lines on it.

finalize(**kwargs)[source]

Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

Parameters:

kwargs : dict

Generic keyword arguments.

yellowbrick.features.pcoords.parallel_coordinates(X, y=None, ax=None, features=None, classes=None, color=None, colormap=None, vlines=True, vlines_kwds=None, **kwargs)[source]

Displays each feature as a vertical axis and each instance as a line.

This helper function is a quick wrapper to utilize the ParallelCoordinates Visualizer (Transformer) for one-off analysis.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

ax : matplotlib axes

The axes to plot the figure on.

features : list of strings

The names of the features or columns

classes : list of strings

The names of the classes in the target

color : list or tuple of colors

Specify the colors for each individual class

colormap : string or matplotlib cmap

Sequential colormap for continuous target

vlines : bool

Display the vertical axis lines

vlines_kwds : dict

Keyword arguments to draw the vlines

Returns:

ax : matplotlib axes

Returns the axes that the parallel coordinates were drawn on.

yellowbrick.features.radviz module

Implements radviz for feature analysis.

yellowbrick.features.radviz.RadViz

alias of RadialVisualizer

class yellowbrick.features.radviz.RadialVisualizer(ax=None, features=None, classes=None, color=None, colormap=None, **kwargs)[source]

Bases: yellowbrick.features.base.DataVisualizer

RadViz is a multivariate data visualization algorithm that places each feature axis uniformly around the circumference of a circle, then plots each instance in the interior of the circle such that its normalized value on each axis pulls the point from the center toward that axis.
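
A minimal sketch of the usual RadViz projection follows. It assumes X has already been scaled to [0, 1] by column, and it places each point at the weighted average of the axis anchors. The function name radviz_points and the choice to put all-zero rows at the center are assumptions of this sketch, not details confirmed by the library.

```python
import numpy as np


def radviz_points(X):
    """Project rows of X (values in [0, 1]) into the unit circle."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape

    # One anchor per feature, spaced uniformly around the unit circle.
    angles = 2.0 * np.pi * np.arange(m) / m
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])

    # Each point is the average of the anchors, weighted by its features.
    totals = X.sum(axis=1, keepdims=True)
    totals = np.where(totals == 0, 1.0, totals)  # all-zero rows go to center
    return (X @ anchors) / totals


X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
points = radviz_points(X)
```

An instance dominated by one feature lands near that feature's anchor, while an instance with equal values on every feature lands at the center.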

draw(X, y, **kwargs)[source]

Called from the fit method, this method creates the radviz canvas and draws each instance as a class or target colored point, whose location is determined by the feature data set.

finalize(**kwargs)[source]

Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

Parameters:

kwargs : dict

Generic keyword arguments.

static normalize(X)[source]

MinMax normalization to fit a matrix in the space [0,1] by column.
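
Column-wise min-max scaling can be sketched as below. The function name minmax_by_column is hypothetical, and mapping constant columns to 0 is an assumption made here to avoid division by zero; the library's own handling of that case is not stated in this reference.

```python
import numpy as np


def minmax_by_column(X):
    """Scale every column of X into [0, 1] independently."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # assumption: constant columns become all zeros
    return (X - lo) / span


X = np.array([[1.0, 10.0, 7.0],
              [2.0, 30.0, 7.0],
              [3.0, 20.0, 7.0]])
Xn = minmax_by_column(X)
```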

yellowbrick.features.radviz.radviz(X, y=None, ax=None, features=None, classes=None, color=None, colormap=None, **kwargs)[source]

Displays each feature as an axis around a circle surrounding a scatter plot whose points are each individual instance.

This helper function is a quick wrapper to utilize the RadialVisualizer (Transformer) for one-off analysis.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

ax : matplotlib axes

The axes to plot the figure on.

features : list of strings

The names of the features or columns

classes : list of strings

The names of the classes in the target

color : list or tuple of colors

Specify the colors for each individual class

colormap : string or matplotlib cmap

Sequential colormap for continuous target

Returns:

ax : matplotlib axes

Returns the axes that the radviz plot was drawn on.

yellowbrick.features.rankd module

Implements 1D (histograms) and 2D (joint plot) feature rankings.

class yellowbrick.features.rankd.Rank2D(ax=None, algorithm='pearson', features=None, colormap='RdBu_r', **kwargs)[source]

Bases: yellowbrick.features.base.FeatureVisualizer

Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm (e.g. Pearson correlation) then returns them ranked as a lower left triangle diagram.

draw(X, **kwargs)[source]

Draws the heatmap of the ranking matrix of variables.
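
Restricting a ranking matrix to its lower-left triangle can be sketched with a NumPy masked array. The helper name lower_triangle is hypothetical, and hiding the diagonal along with the upper triangle is an assumption of this sketch.

```python
import numpy as np


def lower_triangle(R):
    """Mask the diagonal and everything above it in a square matrix."""
    R = np.asarray(R, dtype=float)
    mask = np.zeros_like(R, dtype=bool)
    mask[np.triu_indices_from(mask)] = True  # diagonal and upper triangle
    return np.ma.masked_where(mask, R)


R = np.arange(9.0).reshape(3, 3)
T = lower_triangle(R)
```

Plotting functions that accept masked arrays leave masked cells blank, which produces the triangular heatmap.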

finalize(**kwargs)[source]

Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

Parameters:

kwargs: dict

generic keyword arguments

fit(X, y=None, **kwargs)[source]

The fit method gathers information about the state of the visualizer.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

kwargs : dict

Pass generic arguments to the drawing method

Returns:

self : instance

Returns the instance of the transformer/visualizer

rank(X, algorithm=None)[source]

Returns the ranking of each pair of columns as an m by m matrix.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

algorithm : str or None

The ranking mechanism to use, or None for the default

Returns:

R : ndarray

The mxm ranking matrix of the variables
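
The two documented ranking methods, pearson and covariance, can be approximated with NumPy as in the sketch below. The dictionary mirrors the ranking_methods attribute in name only: the lambdas here are assumptions, not the library's own definitions, and rank_columns is a hypothetical function name.

```python
import numpy as np

# Assumed stand-ins for the two documented ranking methods.
ranking_methods = {
    "pearson": lambda X: np.corrcoef(X, rowvar=False),
    "covariance": lambda X: np.cov(X, rowvar=False),
}


def rank_columns(X, algorithm="pearson"):
    """Return an m x m matrix scoring every pair of columns in X."""
    X = np.asarray(X, dtype=float)
    if algorithm not in ranking_methods:
        raise ValueError("unknown ranking algorithm: {!r}".format(algorithm))
    return ranking_methods[algorithm](X)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
R = rank_columns(X, "pearson")
```

Passing rowvar=False tells NumPy that columns, not rows, are the variables, which matches the n x m layout used throughout this reference.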

ranking_methods = {'pearson': <function Rank2D.<lambda>>, 'covariance': <function Rank2D.<lambda>>}

transform(X, **kwargs)[source]

The transform method is the primary drawing hook for ranking classes.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

kwargs : dict

Pass generic arguments to the drawing method

Returns:

Xp : ndarray

The transformed matrix, X’

yellowbrick.features.rankd.rank2d(X, y=None, ax=None, algorithm='pearson', features=None, colormap='RdBu_r', **kwargs)[source]

Displays pairwise comparisons of features with the algorithm and ranks them in a lower-left triangle heatmap plot.

This helper function is a quick wrapper to utilize the Rank2D Visualizer (Transformer) for one-off analysis.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

ax : matplotlib axes

The axes to plot the figure on.

algorithm : one of {pearson, covariance}

The ranking algorithm to use; the default is Pearson correlation.

features : list

A list of feature names to use. If a DataFrame is passed to fit and features is None, feature names are selected as the columns of the DataFrame.

colormap : string or cmap

Optional string or matplotlib cmap used to color the heatmap.

Returns:

ax : matplotlib axes

Returns the axes that the rank2d heatmap was drawn on.