Material Analytics

Contains the functionality needed to determine the yield stress of a material automatically (see yield_stress()), even from noisy data, given a stress-strain curve with one [Strain|Stress] pair per row. The module also builds a phenomenological model of the material’s behavior under stress and can return Young’s Modulus for the material’s elastic deformation.

This works for alloys that exhibit the yield point phenomenon as well.


Written for convenience: it makes all values positive so that logarithms can be taken.

material_analytics.combine_data(data1, data2)

Given two arrays, returns a combined list where each element is the pair \((x_i, y_i)\).
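
As a rough illustration of the pairing described above, the following sketch zips two equal-length sequences into rows. The name combine_points and the length check are assumptions made for this example; they are not taken from the module itself.

```python
def combine_points(xs, ys):
    """Pair up two equal-length sequences into [x_i, y_i] rows."""
    if len(xs) != len(ys):
        raise ValueError("both inputs must have the same length")
    return [[x, y] for x, y in zip(xs, ys)]


# Example: strain values in one list, stress values in the other.
strain = [0.00, 0.01, 0.02]
stress = [0.0, 2.1, 4.0]
rows = combine_points(strain, stress)
```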

material_analytics.delete_noise(model, cutoff=0.025)

Takes an array (assumed to be roughly sorted) and returns the portion that follows a given value (cutoff). This is useful for removing early values, which may contain high levels of noise.
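
One possible reading of this behavior is sketched below. It assumes the cutoff is compared against the first column (strain) and that everything from the first row reaching the cutoff onward is kept; the name drop_before_cutoff is hypothetical, and the real function may apply the cutoff differently.

```python
def drop_before_cutoff(rows, cutoff=0.025):
    """Keep rows from the first one whose x-value reaches the cutoff."""
    for i, (x, _y) in enumerate(rows):
        if x >= cutoff:
            # Because the data is only roughly sorted, everything after
            # this index is kept, even if a later x dips below the cutoff.
            return rows[i:]
    return []


noisy = [[0.0, 5.0], [0.01, -3.0], [0.03, 6.2], [0.05, 10.1]]
trimmed = drop_before_cutoff(noisy)
```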

material_analytics.expToTrain(exp, start=None)

Converts a collection of individual domain values to lists, because each domain value must be iterable to serve as training data.

material_analytics.format_data(data, start=None)

Puts data into the format that Scikit-Learn expects for regression.


Takes the approximate derivative of a two-column dataset by taking slopes between all of the points.

The data should be formatted [x,y] for each row.
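
The slope-between-points idea can be sketched as a forward difference. In this sketch each slope is attributed to the left-hand point of its pair, and pairs with identical x-values are skipped; both choices, and the name finite_difference_slopes, are assumptions for illustration.

```python
def finite_difference_slopes(rows):
    """Approximate dy/dx from consecutive [x, y] rows."""
    slopes = []
    for (x0, y0), (x1, y1) in zip(rows, rows[1:]):
        if x1 == x0:
            continue  # skip repeated x-values to avoid division by zero
        slopes.append([x0, (y1 - y0) / (x1 - x0)])
    return slopes


curve = [[0, 0], [1, 2], [3, 8]]
derivative = finite_difference_slopes(curve)
```

Note that the result has one row fewer than the input, since n points define n - 1 intervals.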

material_analytics.kcluster(data, numclusters, start=None)

Clusters the data using regular kmeans clustering.

material_analytics.kmeanssplit(data, numclusters=2)

Clusters the data into groups (k-means) and returns the split data.
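
For readers unfamiliar with k-means, the sketch below shows the general idea (Lloyd's algorithm) in plain Python. It is not the module's implementation, which relies on Scikit-Learn; the deterministic initialisation from evenly spaced rows, the fixed iteration count, and the name kmeans_split are all assumptions chosen to keep the example reproducible.

```python
def kmeans_split(rows, num_clusters=2, iterations=20):
    """Cluster rows with a basic k-means loop and return one list per cluster."""
    # Initialise centroids from evenly spaced rows (deterministic, for illustration).
    step = max(1, (len(rows) - 1) // max(1, num_clusters - 1))
    centroids = [list(rows[min(i * step, len(rows) - 1)]) for i in range(num_clusters)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: attach each row to its nearest centroid.
        labels = [
            min(range(num_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(row, centroids[k])))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(num_clusters):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return [[row for row, lab in zip(rows, labels) if lab == k]
            for k in range(num_clusters)]


points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9], [5.1, 5.0]]
low, high = kmeans_split(points, num_clusters=2)
```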

material_analytics.kminicluster(data, numclusters, start=None)

Clusters the data using mini batch kmeans.

material_analytics.linfit(data, start=None)

Fits a linear regression to the data and returns it.


Given a dataset with two columns, this function returns the logarithmic function that best fits that data.
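
A minimal sketch of such a fit is shown below, assuming the logarithmic function has the form y = a + b ln(x) and that every x is strictly positive. The functional form and the name fit_log are assumptions; the module may parameterise its model differently.

```python
import math


def fit_log(rows):
    """Least-squares fit of y = a + b*ln(x); returns (a, b, f)."""
    log_xs = [math.log(x) for x, _ in rows]
    ys = [y for _, y in rows]
    n = len(rows)
    mean_lx = sum(log_xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares on the transformed variable ln(x).
    b = (sum((lx - mean_lx) * (y - mean_y) for lx, y in zip(log_xs, ys))
         / sum((lx - mean_lx) ** 2 for lx in log_xs))
    a = mean_y - b * mean_lx
    return a, b, lambda x: a + b * math.log(x)


# Synthetic data generated from y = 3 + 2*ln(x), so the fit should recover a=3, b=2.
samples = [[x, 3.0 + 2.0 * math.log(x)] for x in (1.0, 2.0, 4.0, 8.0)]
a, b, model = fit_log(samples)
```

Because the fit is linear in ln(x), it reduces to ordinary linear regression on the transformed x-values, which is why zero and negative values must be removed beforehand.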

material_analytics.log_prep(model, cutoff=0.025)

Makes data ready for logarithmic approximation. Deletes data components that we know are inaccurate and makes all remaining values positive, because logarithms can only be taken of positive values.


Returns the value that is halfway through the list (index-wise), left midpoint if there are an odd number of points.

material_analytics.predictlinear(data, step=0.5)

Creates a linear model based on data and predicts its values over the domain, returning the predictions.


Converts every non-numerical list value to zero, which simplifies later analysis.
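
What counts as "non-numerical" is not spelled out here. The sketch below assumes it covers NaN, infinities, and any value that is not an int or float; that interpretation and the name zero_non_numeric are assumptions.

```python
import math


def zero_non_numeric(values):
    """Replace NaN, infinite, and non-number entries with 0."""
    cleaned = []
    for v in values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
        if is_number and math.isfinite(v):
            cleaned.append(v)
        else:
            cleaned.append(0)
    return cleaned


raw = [1.5, float("nan"), None, "x", 2, float("inf")]
cleaned = zero_non_numeric(raw)
```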

material_analytics.samplepoints(function, interval, numpoints)

Given a function, an interval (two-element list), and a number of points, evaluates the function at that many evenly spaced points across the interval and collects the resulting sample points.
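
The sampling can be sketched as follows. This version assumes both endpoints of the interval are included and that each sample is returned as an [x, f(x)] row; the name sample_points and those two conventions are assumptions.

```python
def sample_points(function, interval, numpoints):
    """Evaluate function at numpoints evenly spaced x-values across interval."""
    start, end = interval
    if numpoints < 2:
        return [[start, function(start)]]
    step = (end - start) / (numpoints - 1)
    return [[start + i * step, function(start + i * step)]
            for i in range(numpoints)]


samples = sample_points(lambda x: x * x, [0.0, 2.0], 5)
```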

material_analytics.splitdata(data, predictions)

Takes predictions from kmeans clustering and splits the table into two groups.

material_analytics.stress_model(data, yielding=None, strain=None)

Returns a two-element array with the strain value as the first item, and the expected stress as the second if a strain value is provided. Otherwise returns a function that will predict stress given a strain value.

Given a dataset and a strain value, predicts what the stress will be at that point. The first parameter, data, should be an array of [strain, stress] entries. The strain parameter should be the value for which you wish to estimate stress.

This effectively constructs a physical model for the stress-strain behavior of any material on the fly. If no strain value is provided, the function simply returns the physical model function, which computes the expected stress for a given strain. This is the preferred usage for large datasets where stress must be predicted repeatedly, because otherwise the entire model is recalculated on every call, which is very inefficient.
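
The build-once, predict-many pattern described above can be sketched as a function that returns a closure when no strain is given. The model used here is plain linear interpolation, which is only a stand-in for the module's phenomenological model; the name make_stress_model is likewise hypothetical.

```python
from bisect import bisect_left


def make_stress_model(data, strain=None):
    """Return [strain, stress] if strain is given, otherwise a predictor function."""
    rows = sorted(data)
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]

    def predict(eps):
        # Clamp outside the measured range, interpolate linearly inside it.
        if eps <= xs[0]:
            return ys[0]
        if eps >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, eps)
        t = (eps - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    if strain is None:
        return predict  # reusable model, built once
    return [strain, predict(strain)]


data = [[0.0, 0.0], [0.01, 100.0], [0.02, 150.0]]
model = make_stress_model(data)                # build the model a single time
stresses = [model(e) for e in (0.005, 0.015)]  # then reuse it for many strains
single = make_stress_model(data, strain=0.015) # one-off prediction
```

Calling the one-off form inside a loop would rebuild the model on every iteration, which is the inefficiency the text warns about.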

material_analytics.yield_stress(model, numpoints=1000, cutoff=0.0, startx=None, endx=None, decreasingend=False)

Finds the yield stress of a dataset automatically using kmeans clustering and covariance analysis.

To use this function, provide a stress-strain curve as a two-column numpy array. The first column holds the strain values and the second column holds the stress values, matching the [Strain|Stress] row format used throughout this module.

This works by fitting a logarithmic model as closely as possible to the experimental data (to reduce noise) and then analyzing where the slope begins to decrease relative to the average slope. In other words, it looks for where \(\partial \sigma/ \partial \epsilon < (f(b)-f(a))/(b-a)\), where \(f\) is the fitted model and \(a\) and \(b\) are the beginning and end of the interval, respectively. For the purposes of this method, it is important that the data extend up to the point of failure of the given material.
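
The slope criterion on its own can be sketched as below. This covers only the comparison of local slope against the mean slope over [a, b]; it does not reproduce the clustering step, and the fitted curve used in the example is a hypothetical one chosen for illustration. The name first_point_below_mean_slope is an assumption.

```python
import math


def first_point_below_mean_slope(f, a, b, numpoints=1000):
    """Return [x, f(x)] at the first sample whose local slope is below the mean slope."""
    mean_slope = (f(b) - f(a)) / (b - a)
    step = (b - a) / (numpoints - 1)
    for i in range(numpoints - 1):
        x0 = a + i * step
        x1 = x0 + step
        local_slope = (f(x1) - f(x0)) / step
        if local_slope < mean_slope:
            return [x0, f(x0)]
    return None  # the local slope never drops below the mean slope


# Hypothetical fitted curve: stress = 100 + 20*ln(strain)
fitted = lambda eps: 100.0 + 20.0 * math.log(eps)
point = first_point_below_mean_slope(fitted, 0.01, 0.10)
```

For a curve of the form c + k ln(x), the derivative k/x equals the mean slope exactly at x = (b - a) / ln(b/a), so the numerical result should land within one sampling step of that value.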

material_analytics.yield_stress_classic_fitted(data_original, cutoff=0.0, offset=0.002)

Fit a log curve

material_analytics.yield_stress_classic_unfitted(data, cutoff=0.0, offset=0.002)

Determine average slope


Given a stress-strain dataset, returns Young’s Modulus.
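
Young’s Modulus is the slope of the stress-strain curve in the elastic region, where Hooke's law \(\sigma = E\epsilon\) holds. A minimal sketch follows, assuming the elastic region is every row with strain at or below a caller-supplied elastic_limit and fitting a line through the origin. The parameter elastic_limit and the name youngs_modulus are hypothetical; the module determines the elastic region on its own.

```python
def youngs_modulus(rows, elastic_limit):
    """Least-squares slope through the origin for rows with strain <= elastic_limit."""
    elastic = [(eps, sigma) for eps, sigma in rows if eps <= elastic_limit]
    denom = sum(eps * eps for eps, _ in elastic)
    if denom == 0:
        raise ValueError("no usable points in the elastic region")
    # Minimising sum((sigma - E*eps)^2) gives E = sum(eps*sigma) / sum(eps^2).
    return sum(eps * sigma for eps, sigma in elastic) / denom


# Three elastic points on a line of slope 200000, then one plastic point.
curve = [[0.001, 200.0], [0.002, 400.0], [0.003, 600.0], [0.02, 700.0]]
modulus = youngs_modulus(curve, elastic_limit=0.005)
```

The result carries the units of the stress column, since strain is dimensionless.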