speedml package
Submodules
speedml.base module
speedml.feature module
Speedml Feature component with methods that work on dataset features or the feature engineering workflow. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
class speedml.feature.Feature
Bases: speedml.base.Base
concat(new, a, sep, b)
Create `new` text feature by concatenating `a` and `b` text feature values, using `sep` separator.
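The concatenation rule can be sketched in plain Python. This is only an illustration of the described behavior on lists, not the speedml implementation; the helper name `concat_values` and the sample values are hypothetical.

```python
def concat_values(a_values, b_values, sep):
    """Join paired text values with a separator (hypothetical helper)."""
    return [f"{a}{sep}{b}" for a, b in zip(a_values, b_values)]


# Sample data invented for illustration only.
titles = ["Mr", "Mrs"]
decks = ["C", "E"]
combined = concat_values(titles, decks, "_")
```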
density(a)
Create a new feature named `a` feature name + suffix ‘_density’, based on the density or value_counts of each unique value in the `a` feature, specified as a string, or in multiple features, specified as a list of strings.
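A minimal sketch of a value-count density, assuming the new feature holds, for each row, the number of times that row's value occurs in the feature. Whether speedml stores raw counts or normalized frequencies is not stated here, so raw counts are an assumption; the helper name `density` and the sample values are hypothetical.

```python
from collections import Counter


def density(values):
    """Return, for each value, how often it occurs in the whole list."""
    counts = Counter(values)  # assumption: raw counts, not normalized
    return [counts[v] for v in values]


# Sample data invented for illustration only.
tickets = ["A1", "B2", "A1", "C3", "A1"]
ticket_density = density(tickets)
```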
divide(new, a, b)
Create `new` numeric feature by dividing `a`/`b` feature values. Replace division-by-zero with zero values.
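The division-by-zero rule can be illustrated with a short plain-Python sketch. It is not the speedml implementation; the helper name `safe_divide` and the sample values are hypothetical.

```python
def safe_divide(a_values, b_values):
    """Divide paired values, yielding 0.0 wherever the divisor is zero."""
    return [a / b if b != 0 else 0.0 for a, b in zip(a_values, b_values)]


# Sample data invented for illustration only.
fare = [10.0, 8.0, 5.0]
family_size = [2, 0, 4]
fare_per_person = safe_divide(fare, family_size)
```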
drop(features)
Drop one or more `features`, given as a list of name strings, from the train and test datasets.
extract(a, regex, new=None)
Match `regex` regular expression with `a` text feature values to update `a` feature with matching text if `new` = None. Otherwise create `new` feature based on matching text.
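A sketch of regex-based extraction using the standard `re` module. It assumes the matching text is the first capture group of the pattern and that non-matching values become `None`; neither detail is confirmed by the description above. The helper name `extract_match` and the sample names are hypothetical.

```python
import re


def extract_match(values, regex):
    """Return the first capture group per value, or None when nothing matches."""
    pattern = re.compile(regex)
    result = []
    for value in values:
        found = pattern.search(value)
        result.append(found.group(1) if found else None)
    return result


# Sample data invented for illustration only.
names = ["Braund, Mr. Owen", "Heikkinen, Miss. Laina", "Unknown"]
titles = extract_match(names, r" ([A-Za-z]+)\.")
```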
impute()
Replace empty values in the entire dataframe with the median value for numerical features and the most common value for text features.
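The median and most-common-value rule can be sketched per column in plain Python. This is an illustration under assumptions, not the speedml implementation: empty values are represented as `None`, each column contains at least one non-empty value, and the helper name `impute_column` is hypothetical.

```python
from statistics import median, mode


def impute_column(values):
    """Fill None with the median (numeric column) or the mode (text column)."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = median(present)
    else:
        fill = mode(present)
    return [fill if v is None else v for v in values]


# Sample data invented for illustration only.
ages = impute_column([22, None, 30, 40])
ports = impute_column(["S", None, "C", "S"])
```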
labels(features)
Generate numerical labels replacing text values from a list of categorical `features`.
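A minimal sketch of label encoding for one feature. It assumes integer codes are assigned in sorted order of the unique text values; the description above does not say how speedml orders them. The helper name `encode_labels` and the sample values are hypothetical.

```python
def encode_labels(values):
    """Replace each text value with an integer code (sorted-order assumption)."""
    codes = {value: index for index, value in enumerate(sorted(set(values)))}
    return [codes[value] for value in values]


# Sample data invented for illustration only.
sex = ["male", "female", "female", "male"]
sex_codes = encode_labels(sex)
```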
list_len(new, a)
Create `new` numeric feature based on the length or item count of the `a` feature, which contains list objects as values.
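As a small illustration, the item count for a list-valued feature could be computed as follows. The helper name `list_lengths` and the sample values are hypothetical.

```python
def list_lengths(values):
    """Return the number of items in each list value."""
    return [len(items) for items in values]


# Sample data invented for illustration only.
photos = [["a.jpg", "b.jpg"], [], ["c.jpg"]]
photo_count = list_lengths(photos)
```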
mapping(a, data)
Convert values for categorical feature `a` using `data` dictionary. Use when the number of categories is limited; otherwise use labels.
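A dictionary-based conversion can be sketched as below. How speedml treats values missing from the dictionary is not stated; this sketch leaves them unchanged, which is an assumption. The helper name `map_values` and the sample values are hypothetical.

```python
def map_values(values, data):
    """Convert values through a dictionary, keeping unmapped values as-is."""
    return [data.get(value, value) for value in values]


# Sample data invented for illustration only.
embarked = ["S", "C", "Q", "S"]
embarked_codes = map_values(embarked, {"S": 0, "C": 1, "Q": 2})
```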
outliers(a, lower=None, upper=None)
Fix outliers for `lower` or `upper` or both percentiles of values within `a` feature.
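One common way to fix outliers by percentile is to clip values to the percentile bounds. The sketch below assumes that interpretation, along with linear interpolation between ranks when computing a percentile; the description above confirms neither. The helper names `percentile` and `clip_outliers` and the sample values are hypothetical.

```python
def percentile(ordered, pct):
    """Percentile of a sorted list using linear interpolation between ranks."""
    position = (len(ordered) - 1) * pct / 100
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def clip_outliers(values, lower=None, upper=None):
    """Clip values to the lower and/or upper percentile bounds (assumption)."""
    ordered = sorted(values)
    low_bound = percentile(ordered, lower) if lower is not None else None
    high_bound = percentile(ordered, upper) if upper is not None else None
    clipped = []
    for value in values:
        if low_bound is not None and value < low_bound:
            value = low_bound
        if high_bound is not None and value > high_bound:
            value = high_bound
        clipped.append(value)
    return clipped


# Sample data invented for illustration only.
fares = [1, 2, 3, 4, 100]
capped = clip_outliers(fares, upper=75)
```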
replace(a, match, new)
In feature `a` values, find each `match` string or list of strings and replace it with a `new` string.
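A plain-Python sketch of the replacement rule, assuming whole values are replaced rather than substrings within a value (the description above does not settle this). The helper name `replace_values` and the sample values are hypothetical.

```python
def replace_values(values, match, new):
    """Replace whole values equal to match (a string or list of strings)."""
    targets = [match] if isinstance(match, str) else list(match)
    return [new if value in targets else value for value in values]


# Sample data invented for illustration only.
titles = ["Mlle", "Ms", "Mr", "Mme"]
cleaned = replace_values(titles, ["Mlle", "Ms"], "Miss")
```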
speedml.model module
Speedml Model component with methods that work on sklearn models workflow. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
class speedml.model.Model
Bases: speedml.base.Base
speedml.plot module
Speedml Plot component with methods that work on plots or the Exploratory Data Analysis (EDA) workflow. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
class speedml.plot.Plot
Bases: speedml.base.Base
continuous(y)
Plot continuous (numeric) features using a scatter plot. Use this to identify outliers within continuous features.
correlate()
Plot a correlation matrix heatmap for numerical features of the training dataset. Use this plot to understand whether certain features are duplicates, are of low importance, or are possibly of high importance for our model.
distribute()
Plot distribution histograms for all numeric features. This helps show how far each distribution is skewed from normal and quickly identify relative outliers in the dataset.
speedml.util module
Speedml utility methods. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
speedml.xgb module
Speedml Xgb component with methods that work on XGBoost model workflow. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
class speedml.xgb.Xgb
Bases: speedml.base.Base
classifier()
Creates the XGBoost Classifier with the Base.xgb_params dictionary of model hyper-parameters.
cv(grid_params)
Calculate the Cross-Validation (CV) score for the XGBoost model based on `grid_params` parameters. Sets the xgb.cv_results variable to the resulting dataframe.
Module contents
Speedml is a Python package to speed start machine learning projects. Contact author https://twitter.com/manavsehgal. Code, docs and demos https://speedml.com.
class speedml.Speedml(train, test, target, uid=None)
Bases: speedml.base.Base
configure(option=None, value=None)
Configure Speedml defaults with `option` configuration parameter and `value` setting. When the method is called without parameters it simply returns the current config dictionary; otherwise it returns the updated configuration.
eda()
Performs speed exploratory data analysis (EDA) on the current state of the datasets. Returns metrics and recommendations as a dataframe. Progressively hides metrics as they achieve workflow completion goals or meet the configured defaults and thresholds.
save_results(columns, file_path)
Saves the `columns` dictionary input to a DataFrame as a `file_path` CSV file.
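The shape of the `columns` input can be illustrated with the standard `csv` module. This sketch assumes the dictionary maps column names to equal-length lists of values and writes to an in-memory buffer instead of a file path; it does not reproduce speedml's DataFrame-based output. The helper name `write_columns` and the sample values are hypothetical.

```python
import csv
import io


def write_columns(columns, handle):
    """Write a dict of column name -> list of values as CSV rows."""
    names = list(columns)
    writer = csv.writer(handle)
    writer.writerow(names)  # header row
    writer.writerows(zip(*(columns[name] for name in names)))  # data rows


# Sample data invented for illustration only.
buffer = io.StringIO()
write_columns({"PassengerId": [892, 893], "Survived": [0, 1]}, buffer)
rows = buffer.getvalue().splitlines()
```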