speedml package
Submodules
speedml.base module
speedml.feature module
Speedml Feature component with methods that work on dataset features or the feature engineering workflow. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
-
class speedml.feature.Feature
Bases: speedml.base.Base
-
concat(new, a, sep, b)
Create new text feature by concatenating a and b text feature values, using sep separator.
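The concatenation described above can be sketched in plain pandas. This is only an illustration of the described behavior, not speedml's implementation; the DataFrame and the column names Title, Deck and Title_Deck are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({"Title": ["Mr", "Mrs"], "Deck": ["C", "E"]})

# Roughly what concat(new="Title_Deck", a="Title", sep="_", b="Deck") describes:
# join the text values of two features with a separator.
df["Title_Deck"] = df["Title"].astype(str) + "_" + df["Deck"].astype(str)

print(df["Title_Deck"].tolist())
```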
-
density(a)
Create a new feature, named with the a feature name plus the suffix ‘_density’, based on the density (value_counts) of each unique value in a. Specify a as a string for one feature or as a list of strings for multiple features.
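A minimal pandas sketch of a value-count density feature follows. It assumes "density" means the number of rows sharing each value, which is how value_counts behaves; the Ticket column is hypothetical and this is not taken from speedml's source.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Ticket": ["A1", "A1", "B2", "A1"]})

# Count how often each unique value occurs, then map the count back onto each row.
counts = df["Ticket"].value_counts()
df["Ticket_density"] = df["Ticket"].map(counts)

print(df)
```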
-
divide(new, a, b)
Create new numeric feature by dividing a / b feature values. Replace division-by-zero with zero values.
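One way to get the division-by-zero handling described above is sketched below in pandas and NumPy. The column names are hypothetical, and the choice to treat 0/0 (NaN) the same as x/0 (infinity) is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second and third rows divide by zero.
df = pd.DataFrame({"Fare": [10.0, 8.0, 0.0], "FamilySize": [2, 0, 0]})

# pandas returns inf for x/0 and NaN for 0/0 instead of raising.
ratio = df["Fare"] / df["FamilySize"]

# Replace both kinds of division-by-zero result with zero.
df["FarePerPerson"] = ratio.replace([np.inf, -np.inf], np.nan).fillna(0)

print(df["FarePerPerson"].tolist())
```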
-
drop(features)
Drop one or more features, named by a list of strings, from the train and test datasets.
-
extract(a, regex, new=None)
Match the regex regular expression against the a text feature values. If new = None, update the a feature with the matching text. Otherwise create the new feature based on the matching text.
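The regex extraction can be illustrated with pandas' own Series.str.extract. The Name column, the Title column and the pattern are hypothetical choices for this sketch; whether speedml uses the same call internally is not stated in this page.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Name": ["Braund, Mr. Owen", "Cumings, Mrs. John"]})

# Capture the word that precedes a period, e.g. "Mr" or "Mrs".
# Writing to a different column mirrors passing new="Title";
# assigning back to df["Name"] would mirror new=None.
df["Title"] = df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

print(df["Title"].tolist())
```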
-
impute()
Replace empty values in the entire dataframe with the median value for numerical features and the most common value for text features.
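The median and most-common-value rule can be sketched on a single hypothetical DataFrame as follows. How speedml computes these statistics across the train and test datasets is not specified here, so this sketch makes no claim about that.

```python
import pandas as pd

# Hypothetical toy data with one missing numeric and one missing text value.
df = pd.DataFrame({"Age": [22.0, None, 30.0], "Embarked": ["S", None, "S"]})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numeric feature: fill gaps with the median of the observed values.
        df[col] = df[col].fillna(df[col].median())
    else:
        # Text feature: fill gaps with the most common observed value.
        df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```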
-
labels(features)
Generate numerical labels replacing text values from a list of categorical features.
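Label encoding of a categorical feature might look like the sketch below. It uses pandas.factorize as a stand-in; the encoder speedml actually uses is not stated in this page, and the Sex column is hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Sex": ["male", "female", "female"]})

# sort=True assigns codes in sorted order of the unique text values.
codes, uniques = pd.factorize(df["Sex"], sort=True)
df["Sex"] = codes

print(list(uniques), df["Sex"].tolist())
```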
-
list_len(new, a)
Create new numeric feature based on the length or item count of the a feature, which contains list objects as values.
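For a feature whose values are Python lists, the item count can be sketched with Series.apply(len). The Tags and Tag_count names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: each value is a list object.
df = pd.DataFrame({"Tags": [["a", "b"], [], ["c"]]})

# The new numeric feature holds the number of items in each list.
df["Tag_count"] = df["Tags"].apply(len)

print(df["Tag_count"].tolist())
```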
-
mapping(a, data)
Convert values for categorical feature a using the data dictionary. Use when the number of categories is limited; otherwise use labels.
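A dictionary-based conversion can be sketched with Series.map. The Embarked column and the dictionary are hypothetical; note that with plain pandas, any value missing from the dictionary becomes NaN, so the dictionary should cover every category.

```python
import pandas as pd

# Hypothetical toy data and mapping dictionary.
df = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})
data = {"S": 0, "C": 1, "Q": 2}

# Replace each category with the number assigned to it in the dictionary.
df["Embarked"] = df["Embarked"].map(data)

print(df["Embarked"].tolist())
```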
-
outliers(a, lower=None, upper=None)
Fix outliers for the lower or upper percentile, or both, of values within the a feature.
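One plausible reading of "fix" is to cap values at the chosen percentile. The sketch below makes that assumption explicitly and uses numpy.percentile with Series.clip; it is not taken from speedml's source, and the Fare column and the 75th percentile are hypothetical choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one extreme value.
df = pd.DataFrame({"Fare": [1.0, 2.0, 3.0, 4.0, 100.0]})

upper = 75  # percentile, assumed to be on a 0-100 scale
cap = np.percentile(df["Fare"], upper)

# Assumption: "fixing" an upper outlier means capping it at the percentile value.
df["Fare"] = df["Fare"].clip(upper=cap)

print(cap, df["Fare"].tolist())
```

A lower bound would follow the same pattern with clip(lower=...).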
-
replace(a, match, new)
In feature a, find values matching the match string or list of strings and replace them with the new string.
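Replacing one or several text values with a single new string can be sketched with Series.replace. This sketch assumes whole-value matching, which is how Series.replace behaves by default; the Title column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Title": ["Mlle", "Ms", "Mrs", "Miss"]})

# Every value found in the match list is replaced with the same new string.
df["Title"] = df["Title"].replace(["Mlle", "Ms"], "Miss")

print(df["Title"].tolist())
```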
speedml.model module
Speedml Model component with methods that work on sklearn models workflow. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
-
class speedml.model.Model
Bases: speedml.base.Base
speedml.plot module
Speedml Plot component with methods that work on plots or the Exploratory Data Analysis (EDA) workflow. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
-
class speedml.plot.Plot
Bases: speedml.base.Base
-
continuous(y)
Plot continuous (numeric) features using a scatter plot. Use this to identify outliers within continuous features.
-
correlate()
Plot a correlation matrix heatmap for the numerical features of the training dataset. Use this plot to judge whether certain features are duplicates, are of low importance, or are possibly of high importance for the model.
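The matrix behind such a heatmap can be computed with pandas alone. The sketch below stops before plotting and uses hypothetical columns; a correlation close to 1 between two features, as with Fare and FareDoubled here, is the kind of duplicate the heatmap is meant to expose.

```python
import pandas as pd

# Hypothetical toy data: FareDoubled is an exact multiple of Fare.
df = pd.DataFrame({
    "Fare": [10.0, 20.0, 30.0, 40.0],
    "FareDoubled": [20.0, 40.0, 60.0, 80.0],
    "Age": [40.0, 22.0, 35.0, 28.0],
    "Name": ["a", "b", "c", "d"],
})

# Keep only numeric features, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()

print(corr.round(2))
```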
-
distribute()
Plot distribution histograms for all numeric features. This helps to understand how far each distribution is skewed from normal and to quickly identify outliers relative to the rest of the dataset.
speedml.util module
Speedml utility methods. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
speedml.xgb module
Speedml Xgb component with methods that work on XGBoost model workflow. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
-
class speedml.xgb.Xgb
Bases: speedml.base.Base
-
classifier()
Creates the XGBoost Classifier with the Base.xgb_params dictionary of model hyper-parameters.
-
cv(grid_params)
Calculate the Cross-Validation (CV) score for the XGBoost model based on the grid_params parameters. Sets the xgb.cv_results variable to the resulting dataframe.
Module contents
Speedml is a Python package to speed start machine learning projects. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
-
class speedml.Speedml(train, test, target, uid=None)
Bases: speedml.base.Base
-
configure(option=None, value=None)
Configure Speedml defaults with the option configuration parameter and its value setting. When called without parameters, the method simply returns the current config dictionary; otherwise it returns the updated configuration.
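The get-or-update behavior described above can be sketched in a few lines of plain Python. Everything here is hypothetical: the module-level _config dictionary, the example_threshold option and the decision to return a copy are assumptions of this sketch, not speedml's implementation.

```python
# Hypothetical stand-in for the package defaults.
_config = {"example_threshold": 0.5}


def configure(option=None, value=None):
    """Return the current config, updating one option first if both arguments are given."""
    if option is not None and value is not None:
        _config[option] = value
    # Return a copy so callers cannot change the defaults by accident.
    return dict(_config)


current = configure()                          # no arguments: read the config
updated = configure("example_threshold", 0.9)  # with arguments: update, then read

print(current, updated)
```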
-
eda()
Performs speed exploratory data analysis (EDA) on the current state of the datasets. Returns metrics and recommendations as a dataframe. Progressively hides metrics as they achieve workflow completion goals or meet the configured defaults and thresholds.
-
save_results(columns, file_path)
Saves the columns dictionary input, converted to a DataFrame, as a CSV file at file_path.
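Turning a dictionary of columns into a CSV file can be sketched with pandas as below. The column names, the temporary file path and the use of index=False are assumptions of this sketch; the page does not say whether speedml writes the index.

```python
import os
import tempfile

import pandas as pd

# Hypothetical columns dictionary: each key becomes a CSV column.
columns = {"PassengerId": [892, 893], "Survived": [0, 1]}

# Write to a throwaway location so the sketch has no side effects elsewhere.
file_path = os.path.join(tempfile.mkdtemp(), "submission.csv")

# Assumption: the row index is not written to the file.
pd.DataFrame(columns).to_csv(file_path, index=False)

# Read the file back to confirm what was saved.
saved = pd.read_csv(file_path)
print(saved)
```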