datascience Tables
-
class prob140.Table(labels=None, _deprecated=None, *, formatter=<datascience.formats.Formatter object>)[source] A sequence of string-labeled columns.
-
class Rows(table)[source] An iterable view over the rows in a table.
-
Table.append(row_or_table)[source] Append a row or all rows of a table. An appended table must have all columns of self.
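The original entry has no example; as an illustrative sketch (not from the library's docstring), appending a single row in place might look like the following, with the printed layout approximate:
>>> t = Table().with_columns(
...     'letter', make_array('a', 'b'),
...     'count', make_array(1, 2))
>>> t.append(['c', 3])
letter | count
a      | 1
b      | 2
c      | 3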
-
Table.append_column(label, values)[source] Appends a column to the table or replaces a column.
__setitem__ is aliased to this method: table.append_column('new_col', make_array(1, 2, 3)) is equivalent to table['new_col'] = make_array(1, 2, 3).
- Args:
label (str): The label of the new column.
values (single value or list/array): If a single value, every value in the new column is values. If a list or array, the new column contains the values in values, which must be the same length as the table.
- Returns:
- Original table with new or replaced column
- Raises:
ValueError: If label is not a string, or if values is a list/array that does not have the same length as the number of rows in the table.
>>> table = Table().with_columns( ... 'letter', make_array('a', 'b', 'c', 'z'), ... 'count', make_array(9, 3, 3, 1), ... 'points', make_array(1, 2, 2, 10)) >>> table letter | count | points a | 9 | 1 b | 3 | 2 c | 3 | 2 z | 1 | 10 >>> table.append_column('new_col1', make_array(10, 20, 30, 40)) >>> table letter | count | points | new_col1 a | 9 | 1 | 10 b | 3 | 2 | 20 c | 3 | 2 | 30 z | 1 | 10 | 40 >>> table.append_column('new_col2', 'hello') >>> table letter | count | points | new_col1 | new_col2 a | 9 | 1 | 10 | hello b | 3 | 2 | 20 | hello c | 3 | 2 | 30 | hello z | 1 | 10 | 40 | hello >>> table.append_column(123, make_array(1, 2, 3, 4)) Traceback (most recent call last): ... ValueError: The column label must be a string, but a int was given >>> table.append_column('bad_col', [1, 2]) Traceback (most recent call last): ... ValueError: Column length mismatch. New column does not have the same number of rows as table.
-
Table.apply(fn, *column_or_columns)[source] Apply
fn to each element or elements of column_or_columns. If no column_or_columns are provided, fn is applied to each row.
- Args:
fn (function) – The function to apply.
column_or_columns: Columns containing the arguments to fn as either column labels (str) or column indices (int). The number of columns must match the number of arguments that fn expects.
- Raises:
ValueError – if column_label is not an existing column in the table.
TypeError – if an insufficient number of column_label arguments is passed to fn.
- Returns:
- An array consisting of the results of applying fn to the elements specified by column_label in each row.
>>> t = Table().with_columns( ... 'letter', make_array('a', 'b', 'c', 'z'), ... 'count', make_array(9, 3, 3, 1), ... 'points', make_array(1, 2, 2, 10)) >>> t letter | count | points a | 9 | 1 b | 3 | 2 c | 3 | 2 z | 1 | 10 >>> t.apply(lambda x: x - 1, 'points') array([0, 1, 1, 9]) >>> t.apply(lambda x, y: x * y, 'count', 'points') array([ 9, 6, 6, 10]) >>> t.apply(lambda x: x - 1, 'count', 'points') Traceback (most recent call last): ... TypeError: <lambda>() takes 1 positional argument but 2 were given >>> t.apply(lambda x: x - 1, 'counts') Traceback (most recent call last): ... ValueError: The column "counts" is not in the table. The table contains these columns: letter, count, points
Whole rows are passed to the function if no columns are specified.
>>> t.apply(lambda row: row[1] * 2) array([18, 6, 6, 2])
-
Table.as_html(max_rows=0)[source] Format table as HTML.
-
Table.as_text(max_rows=0, sep=' | ')[source] Format table as text.
-
Table.bar(column_for_categories=None, select=None, overlay=True, width=6, height=4, **vargs)[source] Plot bar charts for the table.
Each plot is labeled using the values in column_for_categories and one plot is produced for every other column (or for the columns designated by select).
Every selected column except the column for column_for_categories must be numerical.
- Args:
- column_for_categories (str): A column containing x-axis categories
- Kwargs:
- overlay (bool): create a chart with one color per data column;
- if False, each will be displayed separately.
- vargs: Additional arguments that get passed into plt.bar.
- See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar for additional arguments that can be passed into vargs.
-
Table.barh(column_for_categories=None, select=None, overlay=True, width=6, **vargs)[source] Plot horizontal bar charts for the table.
- Args:
column_for_categories (str): A column containing y-axis categories used to create buckets for the bar chart.
- Kwargs:
- overlay (bool): create a chart with one color per data column;
- if False, each will be displayed separately.
- vargs: Additional arguments that get passed into plt.barh.
- See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.barh for additional arguments that can be passed into vargs.
- Raises:
- ValueError – Every selected column except the column for column_for_categories must be numerical.
- Returns:
- Horizontal bar graph with buckets specified by column_for_categories. Each plot is labeled using the values in column_for_categories and one plot is produced for every other column (or for the columns designated by select).
>>> furniture_table = Table().with_columns( ... 'Furniture', make_array('chairs', 'tables', 'desks'), ... 'Count', make_array(6, 1, 2), ... 'Price', make_array(10, 20, 30) ... ) >>> furniture_table Furniture | Count | Price chairs | 6 | 10 tables | 1 | 20 desks | 2 | 30 >>> furniture_table.barh('Furniture') <bar graph with furniture as categories and bars for count and price> >>> furniture_table.barh('Furniture', 'Price') <bar graph with furniture as categories and bars for price> >>> furniture_table.barh('Furniture', make_array(1, 2)) <bar graph with furniture as categories and bars for count and price>
-
Table.bin(*columns, **vargs)[source] Group values by bin and compute counts per bin by column.
By default, bins are chosen to contain all values in all columns. The following named arguments from numpy.histogram can be applied to specialize bin widths:
If the original table has n columns, the resulting binned table has n+1 columns, where column 0 contains the lower bound of each bin.
- Args:
columns (str or int): Labels or indices of columns to be binned. If empty, all columns are binned.
bins (int or sequence of scalars): If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
range ((float, float)): The lower and upper range of the bins. If not provided, range contains all values in the table. Values outside the range are ignored.
density (bool): If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function.
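As a hedged usage sketch (not from the original docstring), binning a single column with explicit edges might look like the following; the exact column labels and layout of the result may differ:
>>> t = Table().with_column('value', make_array(1, 2, 2, 3, 7))
>>> t.bin('value', bins=make_array(0, 4, 8))
<table with a column of bin lower bounds (0, 4, 8) and a 'value count' column (4, 1, 0)>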
-
Table.boxplot(**vargs)[source] Plots a boxplot for the table.
Every column must be numerical.
- Kwargs:
- vargs: Additional arguments that get passed into plt.boxplot.
- See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot for additional arguments that can be passed into vargs. These include vert and showmeans.
- Returns:
- None
- Raises:
- ValueError: The Table contains columns with non-numerical values.
>>> table = Table().with_columns( ... 'test1', make_array(92.5, 88, 72, 71, 99, 100, 95, 83, 94, 93), ... 'test2', make_array(89, 84, 74, 66, 92, 99, 88, 81, 95, 94)) >>> table test1 | test2 92.5 | 89 88 | 84 72 | 74 71 | 66 99 | 92 100 | 99 95 | 88 83 | 81 94 | 95 93 | 94 >>> table.boxplot() <boxplot of test1 and boxplot of test2 side-by-side on the same figure>
-
Table.cdf(x) Finds the CDF of the distribution at x
Parameters: x : float
Value in distribution
Returns: float
The value of P(X <= x)
Examples
>>> dist = Table().with_columns('Value',make_array(2, 3, 4),'Probability',make_array(0.25, 0.5, 0.25)) >>> dist.cdf(0) 0 >>> dist.cdf(2) 0.25 >>> dist.cdf(3.5) 0.75 >>> dist.cdf(1000) 1
-
Table.column(index_or_label)[source] Return the values of a column as an array.
table.column(label) is equivalent to table[label].
>>> tiles = Table().with_columns( ... 'letter', make_array('c', 'd'), ... 'count', make_array(2, 4), ... )
>>> tiles.column('letter') array(['c', 'd'], dtype='<U1') >>> tiles.column(1) array([2, 4])
- Args:
- label (int or str): The index or label of a column
- Returns:
- An instance of numpy.array.
- Raises:
ValueError: When the index_or_label is not in the table.
-
Table.column_index(label)[source] Return the index of a column by looking up its label.
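For illustration (not part of the original entry):
>>> tiles = Table().with_columns(
...     'letter', make_array('c', 'd'),
...     'count', make_array(2, 4))
>>> tiles.column_index('count')
1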
-
Table.column_labels Return a tuple of column labels. [Deprecated]
-
Table.copy(*, shallow=False)[source] Return a copy of a table.
-
Table.drop(*column_or_columns)[source] Return a Table with only columns other than selected label or labels.
- Args:
column_or_columns (string or list of strings): The header names or indices of the columns to be dropped. column_or_columns must be an existing header name, or a valid column index.
- Returns:
- An instance of
Table with the given columns removed.
>>> t = Table().with_columns( ... 'burgers', make_array('cheeseburger', 'hamburger', 'veggie burger'), ... 'prices', make_array(6, 5, 5), ... 'calories', make_array(743, 651, 582)) >>> t burgers | prices | calories cheeseburger | 6 | 743 hamburger | 5 | 651 veggie burger | 5 | 582 >>> t.drop('prices') burgers | calories cheeseburger | 743 hamburger | 651 veggie burger | 582 >>> t.drop(['burgers', 'calories']) prices 6 5 5 >>> t.drop('burgers', 'calories') prices 6 5 5 >>> t.drop([0, 2]) prices 6 5 5 >>> t.drop(0, 2) prices 6 5 5 >>> t.drop(1) burgers | calories cheeseburger | 743 hamburger | 651 veggie burger | 582
-
classmethod Table.empty(labels=None)[source] Creates an empty table. Column labels are optional. [Deprecated]
- Args:
labels (None or list): If None, a table with 0 columns is created. If a list, each element is a column label in a table with 0 rows.
- Returns:
- A new instance of
Table.
-
Table.event(x) Shows the probability that the distribution takes on the value x or each value in a list x.
Parameters: x : float or Iterable
An event represented either as a specific value in the domain or a subset of the domain
Returns: Table
Shows the probabilities of each value in the event
Examples
>>> dist = Table().values([1,2,3,4]).probability([1/4,1/4,1/4,1/4]) >>> dist.event(2) Domain | Probability 2 | 0.25
>>> dist.event([2,3]) Domain | Probability 2 | 0.25 3 | 0.25
-
Table.exclude()[source] Return a new Table without a sequence of rows excluded by number.
- Args:
row_indices_or_slice (integer or list of integers or slice): The row index, list of row indices, or slice of row indices to be excluded.
- Returns:
- A new instance of
Table.
>>> t = Table().with_columns( ... 'letter grade', make_array('A+', 'A', 'A-', 'B+', 'B', 'B-'), ... 'gpa', make_array(4, 4, 3.7, 3.3, 3, 2.7)) >>> t letter grade | gpa A+ | 4 A | 4 A- | 3.7 B+ | 3.3 B | 3 B- | 2.7 >>> t.exclude(4) letter grade | gpa A+ | 4 A | 4 A- | 3.7 B+ | 3.3 B- | 2.7 >>> t.exclude(-1) letter grade | gpa A+ | 4 A | 4 A- | 3.7 B+ | 3.3 B | 3 >>> t.exclude(make_array(1, 3, 4)) letter grade | gpa A+ | 4 A- | 3.7 B- | 2.7 >>> t.exclude(range(3)) letter grade | gpa B+ | 3.3 B | 3 B- | 2.7
Note that
exclude also supports NumPy-like indexing and slicing:
>>> t.exclude[:3] letter grade | gpa B+ | 3.3 B | 3 B- | 2.7
>>> t.exclude[1, 3, 4] letter grade | gpa A+ | 4 A- | 3.7 B- | 2.7
-
Table.expected_value() Finds the expected value of the distribution
Returns: float
Expected value
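A small illustrative example (not from the original entry), reusing the distribution constructed in the cdf example above; the result is 2*0.25 + 3*0.5 + 4*0.25:
>>> dist = Table().with_columns('Value', make_array(2, 3, 4), 'Probability', make_array(0.25, 0.5, 0.25))
>>> dist.expected_value()
3.0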
-
classmethod Table.from_array(arr)[source] Convert a structured NumPy array into a Table.
-
classmethod Table.from_columns_dict(columns)[source] Create a table from a mapping of column labels to column values. [Deprecated]
-
classmethod Table.from_df(df)[source] Convert a Pandas DataFrame into a Table.
-
classmethod Table.from_records(records)[source] Create a table from a sequence of records (dicts with fixed keys).
-
classmethod Table.from_rows(rows, labels)[source] Create a table from a sequence of rows (fixed-length sequences). [Deprecated]
-
Table.group(column_or_label, collect=None)[source] Group rows by unique values in a column; count or aggregate others.
- Args:
column_or_label: values to group (column label or index, or array)
collect: a function applied to values in other columns for each group
- Returns:
- A Table with each row corresponding to a unique value in
column_or_label, where the first column contains the unique values from column_or_label, and the second contains counts for each of the unique values. If collect is provided, a Table is returned with all original columns, each containing values calculated by first grouping rows according to column_or_label, then applying collect to each set of grouped values in the other columns.
- Note:
- The grouped column will appear first in the result table. If
collect does not accept arguments with one of the column types, that column will be empty in the resulting table.
>>> marbles = Table().with_columns( ... "Color", make_array("Red", "Green", "Blue", "Red", "Green", "Green"), ... "Shape", make_array("Round", "Rectangular", "Rectangular", "Round", "Rectangular", "Round"), ... "Amount", make_array(4, 6, 12, 7, 9, 2), ... "Price", make_array(1.30, 1.30, 2.00, 1.75, 1.40, 1.00)) >>> marbles Color | Shape | Amount | Price Red | Round | 4 | 1.3 Green | Rectangular | 6 | 1.3 Blue | Rectangular | 12 | 2 Red | Round | 7 | 1.75 Green | Rectangular | 9 | 1.4 Green | Round | 2 | 1 >>> marbles.group("Color") # just gives counts Color | count Blue | 1 Green | 3 Red | 2 >>> marbles.group("Color", max) # takes the max of each grouping, in each column Color | Shape max | Amount max | Price max Blue | Rectangular | 12 | 2 Green | Round | 9 | 1.4 Red | Round | 7 | 1.75 >>> marbles.group("Shape", sum) # sum doesn't make sense for strings Shape | Color sum | Amount sum | Price sum Rectangular | | 27 | 4.7 Round | | 13 | 4.05
-
Table.groups(labels, collect=None)[source] Group rows by multiple columns, count or aggregate others.
- Args:
labels: list of column names (or indices) to group on
collect: a function applied to values in other columns for each group
- Returns: A Table with each row corresponding to a unique combination of values in
- the columns specified in
labels, where the first columns are those specified in labels, followed by a column of counts for each of the unique values. If collect is provided, a Table is returned with all original columns, each containing values calculated by first grouping rows according to the values in the labels columns, then applying collect to each set of grouped values in the other columns.
- Note:
- The grouped columns will appear first in the result table. If
collect does not accept arguments with one of the column types, that column will be empty in the resulting table.
>>> marbles = Table().with_columns( ... "Color", make_array("Red", "Green", "Blue", "Red", "Green", "Green"), ... "Shape", make_array("Round", "Rectangular", "Rectangular", "Round", "Rectangular", "Round"), ... "Amount", make_array(4, 6, 12, 7, 9, 2), ... "Price", make_array(1.30, 1.30, 2.00, 1.75, 1.40, 1.00)) >>> marbles Color | Shape | Amount | Price Red | Round | 4 | 1.3 Green | Rectangular | 6 | 1.3 Blue | Rectangular | 12 | 2 Red | Round | 7 | 1.75 Green | Rectangular | 9 | 1.4 Green | Round | 2 | 1 >>> marbles.groups(["Color", "Shape"]) Color | Shape | count Blue | Rectangular | 1 Green | Rectangular | 2 Green | Round | 1 Red | Round | 2 >>> marbles.groups(["Color", "Shape"], sum) Color | Shape | Amount sum | Price sum Blue | Rectangular | 12 | 2 Green | Rectangular | 15 | 2.7 Green | Round | 2 | 1 Red | Round | 11 | 3.05
-
Table.hist(*columns, overlay=True, bins=None, bin_column=None, unit=None, counts=None, width=6, height=4, **vargs)[source] Plots one histogram for each column in columns. If no column is specified, plots all columns.
- Kwargs:
- overlay (bool): If True, plots 1 chart with all the histograms
- overlaid on top of each other (instead of the default behavior of one histogram for each column in the table). Also adds a legend that matches each bar color to its column.
- bins (list or int): Lower bound for each bin in the
- histogram or number of bins. If None, bins will be chosen automatically.
- bin_column (column name or index): A column of bin lower bounds.
- All other columns are treated as counts of these bins. If None, each value in each row is assigned a count of 1.
counts (column name or index): Deprecated name for bin_column.
- vargs: Additional arguments that get passed into plt.hist.
- See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist for additional arguments that can be passed into vargs. These include: range, normed, cumulative, and orientation, to name a few.
>>> t = Table().with_columns( ... 'count', make_array(9, 3, 3, 1), ... 'points', make_array(1, 2, 2, 10)) >>> t count | points 9 | 1 3 | 2 3 | 2 1 | 10 >>> t.hist() <histogram of values in count> <histogram of values in points>
>>> t = Table().with_columns( ... 'value', make_array(101, 102, 103), ... 'proportion', make_array(0.25, 0.5, 0.25)) >>> t.hist(bin_column='value') <histogram of values weighted by corresponding proportions>
-
Table.index_by(column_or_label)[source] Return a dict keyed by values in a column that contains lists of rows corresponding to each value.
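For illustration (not from the original entry); the exact Row representation in the returned dict may differ:
>>> t = Table().with_columns(
...     'letter', make_array('a', 'b', 'a'),
...     'count', make_array(9, 3, 3))
>>> t.index_by('letter')
{'a': [Row(letter='a', count=9), Row(letter='a', count=3)], 'b': [Row(letter='b', count=3)]}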
-
Table.join(column_label, other, other_label=None)[source] Creates a new table with the columns of self and other, containing rows for all values of a column that appear in both tables.
- Args:
column_label (str): label of the column in self that is used to join rows of other.
other: Table object to join with self on matching values of column_label.
- Kwargs:
other_label (str): default None, assumes column_label. Otherwise, the label of the column in other used to join rows.
- Returns:
- New table self joined with
other by matching values in column_label and other_label. If the resulting join is empty, returns None. If a join value appears more than once in self, each row with that value will appear in the resulting join, but in other, only the first row with that value will be used.
>>> table = Table().with_columns('a', make_array(9, 3, 3, 1), ... 'b', make_array(1, 2, 2, 10), ... 'c', make_array(3, 4, 5, 6)) >>> table a | b | c 9 | 1 | 3 3 | 2 | 4 3 | 2 | 5 1 | 10 | 6 >>> table2 = Table().with_columns( 'a', make_array(9, 1, 1, 1), ... 'd', make_array(1, 2, 2, 10), ... 'e', make_array(3, 4, 5, 6)) >>> table2 a | d | e 9 | 1 | 3 1 | 2 | 4 1 | 2 | 5 1 | 10 | 6 >>> table.join('a', table2) a | b | c | d | e 1 | 10 | 6 | 2 | 4 9 | 1 | 3 | 1 | 3 >>> table.join('a', table2, 'a') # Equivalent to previous join a | b | c | d | e 1 | 10 | 6 | 2 | 4 9 | 1 | 3 | 1 | 3 >>> table.join('a', table2, 'd') # Repeat column labels relabeled a | b | c | a_2 | e 1 | 10 | 6 | 9 | 3 >>> table2 #table2 has three rows with a = 1 a | d | e 9 | 1 | 3 1 | 2 | 4 1 | 2 | 5 1 | 10 | 6 >>> table #table has only one row with a = 1 a | b | c 9 | 1 | 3 3 | 2 | 4 3 | 2 | 5 1 | 10 | 6 >>> table2.join('a', table) # When we join, we get all three rows in table2 where a = 1 a | d | e | b | c 1 | 2 | 4 | 10 | 6 1 | 2 | 5 | 10 | 6 1 | 10 | 6 | 10 | 6 9 | 1 | 3 | 1 | 3 >>> table.join('a', table2) # Opposite join only keeps first row in table2 with a = 1 a | b | c | d | e 1 | 10 | 6 | 2 | 4 9 | 1 | 3 | 1 | 3
-
Table.labels Return a tuple of column labels.
-
Table.move_to_end(column_label)[source] Move a column to the last in order.
-
Table.move_to_start(column_label)[source] Move a column to the first in order.
-
Table.normalized() Returns the distribution with the probabilities normalized so that they sum to 1
Returns: Table
A distribution with the probabilities normalized
Examples
>>> Table().values([1,2,3]).probability([1,1,1]) Value | Probability 1 | 1 2 | 1 3 | 1 >>> Table().values([1,2,3]).probability([1,1,1]).normalized() Value | Probability 1 | 0.333333 2 | 0.333333 3 | 0.333333
-
Table.num_columns Number of columns.
-
Table.num_rows Number of rows.
-
Table.percentile(p)[source] Return a new table with one row containing the pth percentile for each column.
Assumes that each column only contains one type of value.
Returns a new table with one row and the same column labels. The row contains the pth percentile of the original column, where the pth percentile of a column is the smallest value that is at least as large as p% of the numbers in the column.
>>> table = Table().with_columns( ... 'count', make_array(9, 3, 3, 1), ... 'points', make_array(1, 2, 2, 10)) >>> table count | points 9 | 1 3 | 2 3 | 2 1 | 10 >>> table.percentile(80) count | points 9 | 10
-
Table.pivot(columns, rows, values=None, collect=None, zero=None)[source] Generate a table with a column for each unique value in
columns, with rows for each unique value in rows. Each row counts/aggregates the values that match both row and column based on collect.
- Args:
columns – a single column label or index (str or int), used to create new columns based on its unique values.
rows – row labels or indices (str or int or list), used to create new rows based on their unique values.
values – column label in table for use in aggregation. Default None.
collect – aggregation function, used to group values over row-column combinations. Default None.
zero – zero value for non-existent row-column combinations.
- Raises:
- TypeError – if collect is passed in and values is not, or vice versa.
- Returns:
- New pivot table, with row-column combinations, as specified, with
aggregated
values by collect across the intersection of columns and rows. Simple counts are provided if values and collect are None, as default.
>>> titanic = Table().with_columns('age', make_array(21, 44, 56, 89, 95 ... , 40, 80, 45), 'survival', make_array(0,0,0,1, 1, 1, 0, 1), ... 'gender', make_array('M', 'M', 'M', 'M', 'F', 'F', 'F', 'F'), ... 'prediction', make_array(0, 0, 1, 1, 0, 1, 0, 1)) >>> titanic age | survival | gender | prediction 21 | 0 | M | 0 44 | 0 | M | 0 56 | 0 | M | 1 89 | 1 | M | 1 95 | 1 | F | 0 40 | 1 | F | 1 80 | 0 | F | 0 45 | 1 | F | 1 >>> titanic.pivot('survival', 'gender') gender | 0 | 1 F | 1 | 3 M | 3 | 1 >>> titanic.pivot('prediction', 'gender') gender | 0 | 1 F | 2 | 2 M | 2 | 2 >>> titanic.pivot('survival', 'gender', values='age', collect = np.mean) gender | 0 | 1 F | 80 | 60 M | 40.3333 | 89 >>> titanic.pivot('survival', make_array('prediction', 'gender')) prediction | gender | 0 | 1 0 | F | 1 | 1 0 | M | 2 | 0 1 | F | 0 | 2 1 | M | 1 | 1 >>> titanic.pivot('survival', 'gender', values = 'age') Traceback (most recent call last): ... TypeError: values requires collect to be specified >>> titanic.pivot('survival', 'gender', collect = np.mean) Traceback (most recent call last): ... TypeError: collect requires values to be specified
-
Table.pivot_bin(pivot_columns, value_column, bins=None, **vargs)[source] Form a table with columns formed by the unique tuples in pivot_columns containing counts per bin of the values associated with each tuple in the value_column.
By default, bins are chosen to contain all values in the value_column. The following named arguments from numpy.histogram can be applied to specialize bin widths:
- Args:
bins (int or sequence of scalars): If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
range ((float, float)): The lower and upper range of the bins. If not provided, range contains all values in the table. Values outside the range are ignored.
normed (bool): If False, the result will contain the number of samples in each bin. If True, the result is normalized such that the integral over the range is 1.
-
Table.pivot_hist(pivot_column_label, value_column_label, overlay=True, width=6, height=4, **vargs)[source] Draw histograms of each category in a column.
-
Table.plot(column_for_xticks=None, select=None, overlay=True, width=6, height=4, **vargs)[source] Plot line charts for the table.
- Args:
- column_for_xticks (str/array): A column containing x-axis labels
- Kwargs:
- overlay (bool): create a chart with one color per data column;
- if False, each plot will be displayed separately.
- vargs: Additional arguments that get passed into plt.plot.
- See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot for additional arguments that can be passed into vargs.
- Raises:
- ValueError – Every selected column must be numerical.
- Returns:
- Returns a line plot (connected scatter). Each plot is labeled using the values in column_for_xticks and one plot is produced for all other columns in self (or for the columns designated by select).
>>> table = Table().with_columns( ... 'days', make_array(0, 1, 2, 3, 4, 5), ... 'price', make_array(90.5, 90.00, 83.00, 95.50, 82.00, 82.00), ... 'projection', make_array(90.75, 82.00, 82.50, 82.50, 83.00, 82.50)) >>> table days | price | projection 0 | 90.5 | 90.75 1 | 90 | 82 2 | 83 | 82.5 3 | 95.5 | 82.5 4 | 82 | 83 5 | 82 | 82.5 >>> table.plot('days') <line graph with days as x-axis and lines for price and projection> >>> table.plot('days', overlay=False) <line graph with days as x-axis and line for price> <line graph with days as x-axis and line for projection> >>> table.plot('days', 'price') <line graph with days as x-axis and line for price>
-
Table.prob_event(x) Finds the probability of an event x
Parameters: x : float or Iterable
An event represented either as a specific value in the domain or a subset of the domain
Returns: float
Probability of the event
Examples
>>> dist = Table().values([1,2,3,4]).probability([1/4,1/4,1/4,1/4]) >>> dist.prob_event(2) 0.25
>>> dist.prob_event([2,3]) 0.5
>>> dist.prob_event(np.arange(1,5)) 1.0
-
Table.probability(values) Assigns probabilities to domain values.
Parameters: values : List or Array
Values that must correspond to the domain in the same order
Returns: Table
A probability distribution with those probabilities
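An illustrative example (not from the original entry), following the values()/probability() pattern used elsewhere on this page:
>>> Table().values([1, 2, 3]).probability([0.2, 0.3, 0.5])
Value | Probability
1     | 0.2
2     | 0.3
3     | 0.5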
-
Table.probability_function(pfunc) Assigns probabilities to a distribution via a probability function. The probability function is applied to each value of the domain. The domain values must already be in the first column.
Parameters: pfunc : univariate function
Probability function of the distribution
Returns: Table
Table with those probabilities in its second column
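An illustrative sketch (not from the original entry); the displayed rounding may differ:
>>> Table().values([1, 2, 3]).probability_function(lambda x: x / 6)
Value | Probability
1     | 0.166667
2     | 0.333333
3     | 0.5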
-
classmethod Table.read_table(filepath_or_buffer, *args, **vargs)[source] Read a table from a file or web address.
- filepath_or_buffer – string or file handle / StringIO; The string
- could be a URL. Valid URL schemes include http, ftp, s3, and file.
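A hedged usage sketch; the file name and URL below are hypothetical:
>>> prices = Table.read_table('prices.csv')                     # hypothetical local file
>>> prices = Table.read_table('http://example.com/prices.csv')  # hypothetical URL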
-
Table.relabel(column_label, new_label)[source] Changes the label(s) of column(s) specified by
column_label to labels in new_label.
- Args:
column_label – (single str or array of str) The label(s) of columns to be changed to new_label.
new_label – (single str or array of str): The label name(s) of columns to replace column_label.
- Raises:
ValueError – if column_label is not in the table, or if column_label and new_label are not of equal length.
TypeError – if column_label and/or new_label is not a str.
- Returns:
- Original table with
new_label in place of column_label.
>>> table = Table().with_columns( ... 'points', make_array(1, 2, 3), ... 'id', make_array(12345, 123, 5123)) >>> table.relabel('id', 'yolo') points | yolo 1 | 12345 2 | 123 3 | 5123 >>> table.relabel(make_array('points', 'yolo'), ... make_array('red', 'blue')) red | blue 1 | 12345 2 | 123 3 | 5123 >>> table.relabel(make_array('red', 'green', 'blue'), ... make_array('cyan', 'magenta', 'yellow', 'key')) Traceback (most recent call last): ... ValueError: Invalid arguments. column_label and new_label must be of equal length.
-
Table.relabeled(label, new_label)[source] Return a new table with
label specifying column label(s) replaced by the corresponding new_label.
- Args:
label – (str or array of str) The label(s) of columns to be changed.
new_label – (str or array of str): The new label(s) of columns to be changed. Same number of elements as label.
- Raises:
ValueError – if label does not exist in the table, or if label and new_label are not of equal length. Also raised if label and/or new_label are not str.
- Returns:
- New table with
new_label in place of label.
>>> tiles = Table().with_columns('letter', make_array('c', 'd'), ... 'count', make_array(2, 4)) >>> tiles letter | count c | 2 d | 4 >>> tiles.relabeled('count', 'number') letter | number c | 2 d | 4 >>> tiles # original table unmodified letter | count c | 2 d | 4 >>> tiles.relabeled(make_array('letter', 'count'), ... make_array('column1', 'column2')) column1 | column2 c | 2 d | 4 >>> tiles.relabeled(make_array('letter', 'number'), ... make_array('column1', 'column2')) Traceback (most recent call last): ... ValueError: Invalid labels. Column labels must already exist in table in order to be replaced.
-
Table.remove(row_or_row_indices)[source] Removes a row or multiple rows of a table in place.
-
Table.row(index)[source] Return a row.
-
Table.rows Return a view of all rows.
-
Table.sample(n=1) Randomly samples from the distribution
Parameters: n : int
Number of times to sample from the distribution (default: 1)
Returns: float or array
Samples from the distribution
>>> dist = Table().with_columns('Value',make_array(2, 3, 4),'Probability',make_array(0.25, 0.5, 0.25))
>>> dist.sample()
3
>>> dist.sample()
2
>>> dist.sample(10)
array([3, 2, 2, 4, 3, 4, 3, 4, 3, 3])
-
Table.sample_from_distribution(distribution, k, proportions=False)[source] Return a new table with the same number of rows and a new column. The values in the distribution column define a multinomial. They are replaced by sample counts/proportions in the output.
>>> sizes = Table(['size', 'count']).with_rows([ ... ['small', 50], ... ['medium', 100], ... ['big', 50], ... ]) >>> sizes.sample_from_distribution('count', 1000) size | count | count sample small | 50 | 239 medium | 100 | 496 big | 50 | 265 >>> sizes.sample_from_distribution('count', 1000, True) size | count | count sample small | 50 | 0.24 medium | 100 | 0.51 big | 50 | 0.25
-
Table.scatter(column_for_x, select=None, overlay=True, fit_line=False, colors=None, labels=None, sizes=None, width=5, height=5, s=20, **vargs)[source] Creates scatterplots, optionally adding a line of best fit.
- Args:
column_for_x (str): The column to use for the x-axis values and label of the scatter plots.
- Kwargs:
overlay (bool): If True, creates a chart with one color per data column; if False, each plot will be displayed separately.
fit_line (bool): draw a line of best fit for each set of points.
vargs: Additional arguments that get passed into plt.scatter. See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter for additional arguments that can be passed into vargs. These include: marker and norm, to name a couple.
colors: A column of categories to be used for coloring dots.
labels: A column of text labels to annotate dots.
sizes: A column of values to set the relative areas of dots.
s: Size of dots. If sizes is also provided, then dots will be in the range 0 to 2 * s.
- Raises:
- ValueError – Every column, column_for_x or select, must be numerical
- Returns:
- Scatter plot of values of column_for_x plotted against values for all other columns in self. Each plot uses the values in column_for_x for horizontal positions. One plot is produced for all other columns in self as y (or for the columns designated by select).
>>> table = Table().with_columns( ... 'x', make_array(9, 3, 3, 1), ... 'y', make_array(1, 2, 2, 10), ... 'z', make_array(3, 4, 5, 6)) >>> table x | y | z 9 | 1 | 3 3 | 2 | 4 3 | 2 | 5 1 | 10 | 6 >>> table.scatter('x') <scatterplot of values in y and z on x>
>>> table.scatter('x', overlay=False) <scatterplot of values in y on x> <scatterplot of values in z on x>
>>> table.scatter('x', fit_line=True) <scatterplot of values in y and z on x with lines of best fit>
-
Table.sd() Finds the standard deviation of the distribution
Returns: float
Standard Deviation
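An illustrative example (not from the original entry), reusing the distribution from the cdf example; the variance is 0.5, so the value shown is sqrt(0.5) and the printed precision may differ:
>>> dist = Table().with_columns('Value', make_array(2, 3, 4), 'Probability', make_array(0.25, 0.5, 0.25))
>>> dist.sd()
0.7071067811865476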
-
Table.select(*column_or_columns)[source] Return a table with only the columns in
column_or_columns.
- Args:
column_or_columns: Columns to select from the Table as either column labels (str) or column indices (int).
- Returns:
- A new instance of Table containing only the selected columns. The columns of the new Table are in the order given in column_or_columns.
- Raises:
KeyError if any of column_or_columns are not in the table.
>>> flowers = Table().with_columns( ... 'Number of petals', make_array(8, 34, 5), ... 'Name', make_array('lotus', 'sunflower', 'rose'), ... 'Weight', make_array(10, 5, 6) ... )
>>> flowers Number of petals | Name | Weight 8 | lotus | 10 34 | sunflower | 5 5 | rose | 6
>>> flowers.select('Number of petals', 'Weight') Number of petals | Weight 8 | 10 34 | 5 5 | 6
>>> flowers # original table unchanged Number of petals | Name | Weight 8 | lotus | 10 34 | sunflower | 5 5 | rose | 6
>>> flowers.select(0, 2) Number of petals | Weight 8 | 10 34 | 5 5 | 6
-
Table.set_format(column_or_columns, formatter)[source] Set the format of a column.
-
Table.show(max_rows=0)[source] Display the table.
-
Table.sort(column_or_label, descending=False, distinct=False)[source] Return a Table of rows sorted according to the values in a column.
- Args:
column_or_label: the column whose values are used for sorting.
descending: if True, sorting will be in descending, rather than ascending, order.
distinct: if True, repeated values in column_or_label will be omitted.
- Returns:
- An instance of
Table containing rows sorted based on the values in column_or_label.
>>> marbles = Table().with_columns( ... "Color", make_array("Red", "Green", "Blue", "Red", "Green", "Green"), ... "Shape", make_array("Round", "Rectangular", "Rectangular", "Round", "Rectangular", "Round"), ... "Amount", make_array(4, 6, 12, 7, 9, 2), ... "Price", make_array(1.30, 1.30, 2.00, 1.75, 1.40, 1.00)) >>> marbles Color | Shape | Amount | Price Red | Round | 4 | 1.3 Green | Rectangular | 6 | 1.3 Blue | Rectangular | 12 | 2 Red | Round | 7 | 1.75 Green | Rectangular | 9 | 1.4 Green | Round | 2 | 1 >>> marbles.sort("Amount") Color | Shape | Amount | Price Green | Round | 2 | 1 Red | Round | 4 | 1.3 Green | Rectangular | 6 | 1.3 Red | Round | 7 | 1.75 Green | Rectangular | 9 | 1.4 Blue | Rectangular | 12 | 2 >>> marbles.sort("Amount", descending = True) Color | Shape | Amount | Price Blue | Rectangular | 12 | 2 Green | Rectangular | 9 | 1.4 Red | Round | 7 | 1.75 Green | Rectangular | 6 | 1.3 Red | Round | 4 | 1.3 Green | Round | 2 | 1 >>> marbles.sort(3) # the Price column Color | Shape | Amount | Price Green | Round | 2 | 1 Red | Round | 4 | 1.3 Green | Rectangular | 6 | 1.3 Green | Rectangular | 9 | 1.4 Red | Round | 7 | 1.75 Blue | Rectangular | 12 | 2 >>> marbles.sort(3, distinct = True) Color | Shape | Amount | Price Green | Round | 2 | 1 Red | Round | 4 | 1.3 Green | Rectangular | 9 | 1.4 Red | Round | 7 | 1.75 Blue | Rectangular | 12 | 2
-
Table.split(k)[source] Return a tuple of two tables where the first table contains
k rows randomly sampled and the second contains the remaining rows.
- Args:
k (int): The number of rows randomly sampled into the first table. k must be between 1 and num_rows - 1.
- Raises:
ValueError: k is not between 1 and num_rows - 1.
- Returns:
- A tuple containing two instances of
Table.
>>> jobs = Table().with_columns( ... 'job', make_array('a', 'b', 'c', 'd'), ... 'wage', make_array(10, 20, 15, 8)) >>> jobs job | wage a | 10 b | 20 c | 15 d | 8 >>> sample, rest = jobs.split(3) >>> sample job | wage c | 15 a | 10 b | 20 >>> rest job | wage d | 8
-
Table.stack(key, labels=None)[source] Takes k original columns and returns two columns, with col. 1 of all column names and col. 2 of all associated data.
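An illustrative sketch (not from the original entry), assuming the stacked columns are labeled 'column' and 'value'; the exact labels and layout may differ:
>>> t = Table().with_columns(
...     'letter', make_array('a', 'b'),
...     'count', make_array(9, 3),
...     'points', make_array(1, 2))
>>> t.stack('letter')
letter | column | value
a      | count  | 9
a      | points | 1
b      | count  | 3
b      | points | 2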
-
Table.stats(ops=(<built-in function min>, <built-in function max>, <function median>, <built-in function sum>))[source] Compute statistics for each column and place them in a table.
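An illustrative sketch (not from the original entry), assuming the first column of the result is labeled 'statistic'; the exact layout may differ:
>>> t = Table().with_columns(
...     'count', make_array(9, 3, 3, 1),
...     'points', make_array(1, 2, 2, 10))
>>> t.stats()
statistic | count | points
min       | 1     | 1
max       | 9     | 10
median    | 3     | 2
sum       | 16    | 15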
-
Table.take()[source] Return a new Table with selected rows taken by index.
- Args:
row_indices_or_slice (integer or array of integers): The row index, list of row indices, or slice of row indices to be selected.
- Returns:
- A new instance of Table with selected rows in the order corresponding to row_indices_or_slice.
- Raises:
IndexError, if any of row_indices_or_slice is out of bounds with respect to the column length.
>>> grades = Table().with_columns('letter grade', ... make_array('A+', 'A', 'A-', 'B+', 'B', 'B-'), ... 'gpa', make_array(4, 4, 3.7, 3.3, 3, 2.7)) >>> grades letter grade | gpa A+ | 4 A | 4 A- | 3.7 B+ | 3.3 B | 3 B- | 2.7 >>> grades.take(0) letter grade | gpa A+ | 4 >>> grades.take(-1) letter grade | gpa B- | 2.7 >>> grades.take(make_array(2, 1, 0)) letter grade | gpa A- | 3.7 A | 4 A+ | 4 >>> grades.take[:3] letter grade | gpa A+ | 4 A | 4 A- | 3.7 >>> grades.take(np.arange(0,3)) letter grade | gpa A+ | 4 A | 4 A- | 3.7 >>> grades.take(10) Traceback (most recent call last): ... IndexError: index 10 is out of bounds for axis 0 with size 6
-
Table.toJoint(table, X_column_label=None, Y_column_label=None, probability_column_label=None, reverse=True) Converts a table of probabilities associated with two variables into a JointDistribution object
Parameters: table : Table
You can either pass in a Table directly or call the toJoint() method of that Table. See the examples.
X_column_label (optional) : String
Label for the first variable. Defaults to the same label as that of the first variable of the Table.
Y_column_label (optional) : String
Label for the second variable. Defaults to the same label as that of the second variable of the Table.
probability_column_label (optional) : String
Label for the probability column
reverse (optional) : Boolean
If True, the vertical values will be reversed
Returns: JointDistribution
A JointDistribution object
Examples
>>> dist1 = Table().values([0,1],[2,3]) >>> dist1['Probability'] = make_array(0.1, 0.2, 0.3, 0.4) >>> dist1.toJoint() X=0 X=1 Y=3 0.2 0.4 Y=2 0.1 0.3 >>> dist2 = Table().values("Coin1",['H','T'], "Coin2", ['H','T']) >>> dist2['Probability'] = np.array([0.4*0.6, 0.6*0.6, 0.4*0.4, 0.6*0.4]) >>> dist2.toJoint() Coin1=H Coin1=T Coin2=T 0.36 0.24 Coin2=H 0.24 0.16
-
Table.to_array()[source] Convert the table to a structured NumPy array.
-
Table.to_csv(filename)[source] Creates a CSV file with the provided filename.
The CSV is created in such a way that if we run
table.to_csv('my_table.csv') we can recreate the same table with Table.read_table('my_table.csv').
- Args:
filename (str): The filename of the output CSV file.
- Returns:
- None, outputs a file with name
filename.
>>> jobs = Table().with_columns( ... 'job', make_array('a', 'b', 'c', 'd'), ... 'wage', make_array(10, 20, 15, 8)) >>> jobs job | wage a | 10 b | 20 c | 15 d | 8 >>> jobs.to_csv('my_table.csv') <outputs a file called my_table.csv in the current directory>
-
Table.to_df()[source] Convert the table to a Pandas DataFrame.
-
Table.transition_function(pfunc) Assigns transition probabilities to a Distribution via a probability function. The probability function is applied to each value of the domain. The domain values must already be in the first column.
Parameters: pfunc : variate function
Conditional probability function of the distribution ( P(Y | X))
Returns: Table
Table with those probabilities in its final column
-
Table.transition_probability(values) For a multivariate probability distribution, assigns transition probabilities: ie P(Y | X).
Parameters: values : List or Array
Values that must correspond to the domain in the same order
Returns: Table
A probability distribution with those transition probabilities
-
Table.variance() Finds the variance of the distribution
Returns: float
Variance
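An illustrative example (not from the original entry), reusing the distribution from the cdf example; E[X] = 3, so the variance is 0.25*(2-3)^2 + 0.5*(3-3)^2 + 0.25*(4-3)^2:
>>> dist = Table().with_columns('Value', make_array(2, 3, 4), 'Probability', make_array(0.25, 0.5, 0.25))
>>> dist.variance()
0.5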
-
Table.where(column_or_label, value_or_predicate=None, other=None)[source] Return a new
Table containing rows where value_or_predicate returns True for values in column_or_label.
- Args:
column_or_label: A column of the Table either as a label (str) or an index (int). Can also be an array of booleans; only the rows where the array value is True are kept.
value_or_predicate: If a function, it is applied to every value in column_or_label. Only the rows where value_or_predicate returns True are kept. If a single value, only the rows where the values in column_or_label are equal to value_or_predicate are kept.
other: Optional additional column label for value_or_predicate to make pairwise comparisons. See the examples below for usage. When other is supplied, value_or_predicate must be a callable function.
- Returns:
If value_or_predicate is a function, returns a new Table containing only the rows where value_or_predicate(val) is True for the vals in column_or_label.
If value_or_predicate is a value, returns a new Table containing only the rows where the values in column_or_label are equal to value_or_predicate.
If column_or_label is an array of booleans, returns a new Table containing only the rows where column_or_label is True.
>>> marbles = Table().with_columns( ... "Color", make_array("Red", "Green", "Blue", ... "Red", "Green", "Green"), ... "Shape", make_array("Round", "Rectangular", "Rectangular", ... "Round", "Rectangular", "Round"), ... "Amount", make_array(4, 6, 12, 7, 9, 2), ... "Price", make_array(1.30, 1.20, 2.00, 1.75, 0, 3.00))
>>> marbles Color | Shape | Amount | Price Red | Round | 4 | 1.3 Green | Rectangular | 6 | 1.2 Blue | Rectangular | 12 | 2 Red | Round | 7 | 1.75 Green | Rectangular | 9 | 0 Green | Round | 2 | 3
Use a value to select matching rows
>>> marbles.where("Price", 1.3) Color | Shape | Amount | Price Red | Round | 4 | 1.3
In general, a higher order predicate function such as the functions in
datascience.predicates.are can be used.
>>> from datascience.predicates import are >>> # equivalent to previous example >>> marbles.where("Price", are.equal_to(1.3)) Color | Shape | Amount | Price Red | Round | 4 | 1.3
>>> marbles.where("Price", are.above(1.5)) Color | Shape | Amount | Price Blue | Rectangular | 12 | 2 Red | Round | 7 | 1.75 Green | Round | 2 | 3
Use the optional argument
other to apply predicates that compare two columns.
>>> marbles.where("Price", are.above, "Amount") Color | Shape | Amount | Price Green | Round | 2 | 3
>>> marbles.where("Price", are.equal_to, "Amount") # empty table Color | Shape | Amount | Price
-
Table.with_column(label, values, *rest)[source] Return a new table with an additional or replaced column.
- Args:
label (str): The column label. If an existing label is used, the existing column will be replaced in the new table.
values (single value or sequence): If a single value, every value in the new column is values. If a sequence of values, the new column takes on the values in values.
rest: An alternating list of labels and values describing additional columns. See with_columns for a full description.
- Raises:
ValueError: If label is not a valid column name, i.e. if label is not of type (str), or if values is a list/array that does not have the same length as the number of rows in the table.
- Returns:
- copy of original table with new or replaced column
>>> alphabet = Table().with_column('letter', make_array('c','d')) >>> alphabet = alphabet.with_column('count', make_array(2, 4)) >>> alphabet letter | count c | 2 d | 4 >>> alphabet.with_column('permutes', make_array('a', 'g')) letter | count | permutes c | 2 | a d | 4 | g >>> alphabet letter | count c | 2 d | 4 >>> alphabet.with_column('count', 1) letter | count c | 1 d | 1 >>> alphabet.with_column(1, make_array(1, 2)) Traceback (most recent call last): ... ValueError: The column label must be a string, but a int was given >>> alphabet.with_column('bad_col', make_array(1)) Traceback (most recent call last): ... ValueError: Column length mismatch. New column does not have the same number of rows as table.
-
Table.with_columns(*labels_and_values)[source] Return a table with additional or replaced columns.
- Args:
labels_and_values: An alternating list of labels and values or a list of label-value pairs. If one of the labels is in the existing table, then every value in the corresponding column is set to that value. If a label has only a single value (int), every row of the corresponding column takes on that value.
- Raises:
ValueError: If
- any label in labels_and_values is not a valid column name, i.e. if the label is not of type (str).
- any value in labels_and_values is a list/array and does not have the same length as the number of rows in the table.
AssertionError:
- ‘incorrect columns format’, if passed more than one sequence (iterable) for labels_and_values.
- ‘even length sequence required’, if missing a pair in the label-value pairs.
- Returns:
- Copy of original table with new or replaced columns. Columns added
in order of labels. Equivalent to
with_column(label, value) when passed only one label-value pair.
>>> players = Table().with_columns('player_id', ... make_array(110234, 110235), 'wOBA', make_array(.354, .236)) >>> players player_id | wOBA 110234 | 0.354 110235 | 0.236 >>> players = players.with_columns('salaries', 'N/A', 'season', 2016) >>> players player_id | wOBA | salaries | season 110234 | 0.354 | N/A | 2016 110235 | 0.236 | N/A | 2016 >>> salaries = Table().with_column('salary', ... make_array('$500,000', '$15,500,000')) >>> players.with_columns('salaries', salaries.column('salary'), ... 'years', make_array(6, 1)) player_id | wOBA | salaries | season | years 110234 | 0.354 | $500,000 | 2016 | 6 110235 | 0.236 | $15,500,000 | 2016 | 1 >>> players.with_columns(2, make_array('$600,000', '$20,000,000')) Traceback (most recent call last): ... ValueError: The column label must be a string, but a int was given >>> players.with_columns('salaries', make_array('$600,000')) Traceback (most recent call last): ... ValueError: Column length mismatch. New column does not have the same number of rows as table.
-
Table.with_row(row)[source] Return a table with an additional row.
- Args:
row (sequence): A value for each column.
- Raises:
ValueError: If the row length differs from the column count.
>>> tiles = Table(make_array('letter', 'count', 'points')) >>> tiles.with_row(['c', 2, 3]).with_row(['d', 4, 2]) letter | count | points c | 2 | 3 d | 4 | 2
-
Table.with_rows(rows)[source] Return a table with additional rows.
- Args:
rows (sequence of sequences): Each row has a value per column. If rows is a 2-d array, its shape must be (_, n) for n columns.
- Raises:
ValueError: If a row length differs from the column count.
>>> tiles = Table(make_array('letter', 'count', 'points')) >>> tiles.with_rows(make_array(make_array('c', 2, 3), ... make_array('d', 4, 2))) letter | count | points c | 2 | 3 d | 4 | 2
-
prob140 JointDistribution
-
class prob140.JointDistribution(data=None, index=None, columns=None, dtype=None, copy=False)[source]
-
both_marginals()[source] Finds the marginal distribution of both variables
Returns: JointDistribution Table
Examples
>>> dist1 = Table().values([0,1],[2,3]).probability([0.1, 0.2, 0.3, 0.4]).toJoint() >>> dist1.both_marginals() X=0 X=1 Sum: Marginal of Y Y=3 0.2 0.4 0.6 Y=2 0.1 0.3 0.4 Sum: Marginal of X 0.3 0.7 1.0
-
conditional_dist(label, given='', show_ev=False)[source] Finds the conditional distribution of the variable label given the other variable
Parameters: label : String
The label of the variable whose conditional distribution is found
Returns: JointDistribution Table
Examples
>>> coins = Table().values("Coin1",['H','T'],"Coin2", ['H','T']).probability(np.array([0.24, 0.36, 0.16,0.24])).toJoint() >>> coins.conditional_dist("Coin1","Coin2") Coin1=H Coin1=T Sum Dist. of Coin1 | Coin2=H 0.6 0.4 1.0 Dist. of Coin1 | Coin2=T 0.6 0.4 1.0 Marginal of Coin1 0.6 0.4 1.0 >>> coins.conditional_dist("Coin2","Coin1") Dist. of Coin2 | Coin1=H Dist. of Coin2 | Coin1=T Marginal of Coin2 Coin2=H 0.4 0.4 0.4 Coin2=T 0.6 0.6 0.6 Sum 1.0 1.0 1.0
-
marginal(label)[source] Returns the marginal distribution of label
Parameters: label : String
The label of the variable of which we want to find the marginal distribution
Returns: JointDistribution Table
Examples
>>> dist2 = Table().values("Coin1",['H','T'],"Coin2", ['H','T']).probability(np.array([0.24, 0.36, 0.16, 0.24])).toJoint() >>> dist2.marginal("Coin1") Coin1=H Coin1=T Coin2=T 0.36 0.24 Coin2=H 0.24 0.16 Sum: Marginal of Coin1 0.60 0.40 >>> dist2.marginal("Coin2") Coin1=H Coin1=T Sum: Marginal of Coin2 Coin2=T 0.36 0.24 0.6 Coin2=H 0.24 0.16 0.4
-
marginal_dist(label)[source] Finds the marginal distribution of label and returns it as a single-variable distribution
Parameters: label
The label of the variable of which we want to find the marginal distribution
Returns: Table
Single variable distribution of label
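A hedged usage sketch building on dist2 from the marginal example above; the output is described rather than shown because the exact column labels may differ:
>>> dist2.marginal_dist("Coin1")
<single-variable distribution table with values H, T and probabilities 0.6, 0.4>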