Welcome to bops’s documentation!

bops stands for boolean array operations. It uses NumPy to perform boolean operations on numpy.ndarrays, and it is meant to simplify boolean operations on both Python lists and NumPy arrays. bops can also combine several boolean arrays into their logical AND or logical OR, which makes it faster to filter data on several attributes at once.

bops also provides MapReduce functionality for grouping and aggregating data, so you can slice and dice your data however you see fit.

Examples / Support

Full-length examples can be found here.

Mailing List

A mailing list has been created to support the use of this module. You can join and follow the discussion on Google Groups.

Any errors, issues and enhancements can be discussed here.

Bops Methodology

Bops’ goal is to simplify selecting data (filtering via NumPy boolean arrays) and grouping data, using NumPy arrays to aggregate the unique values being grouped on. Bops also attempts to simplify MapReduce operations by using ordinary Python functions, without the added complexity of a network protocol.

Bops focuses on four principal methods in data analysis:

Selection / Slicing

Data selection and slicing is a big part of analyzing only the data you want and removing data you don’t. This is really easy in bops by using the select function.

Grouping

Grouping data on shared attributes is often crucial; without it, complex analysis becomes difficult. Luckily, bops has this in mind.

Ordering

Ordering data can be a deal breaker in many algorithms, and sorting data can be a headache. Bops makes it easy.

MapReduce

Bops was written to modularize the code that makes traditional, non-distributed MapReduce possible, without the headache of network protocols. There are other Python modules that aim for a distributed MapReduce capability, but bops does not. MapReduce can simplify analysis code while producing very different results simply by changing either the mapper or the reducer function.

Bops tries to make MapReduce as simple as possible.

Four Data Analysis Principles

Before any data manipulation can take place, the data needs to be initialized in a way that bops understands, via the bop class. The bop class wraps the data in a NumPy record array, which gives the columns an easy-to-use dot-syntax (data.age, data.country, and so on). Bops has functions to select, group, order, map and reduce data, giving you the flexibility to do what you wish.
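The dot-syntax comes from NumPy record arrays. The sketch below shows the idea using numpy.rec.fromrecords directly; that bops builds its array with this exact call is an assumption, and the rows are made-up sample data.

```python
import numpy as np

# A few made-up census rows: country, province/state, first, last, age, gender.
rows = [
    ("US", "CA", "Ann", "Lee", 34, "F"),
    ("US", "NY", "Bob", "Ray", 22, "M"),
    ("CA", "ON", "Cal", "Roy", 51, "M"),
]
names = "Country, Province_State, First_Name, Last_Name, Age, Gender"

# Trim and lowercase the column names, as bops describes doing.
cols = [n.strip().lower() for n in names.split(",")]

# numpy.rec.fromrecords builds a record array whose fields are
# reachable with dot-syntax (records.age, records.country, ...).
records = np.rec.fromrecords(rows, names=cols)

ages = records.age            # numpy array holding the Age column
countries = records.country   # numpy array holding the Country column
```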

Note

A bop takes a list of iterables as the first argument.

For example, a list of tuples, lists, dicts, sets, NumPy arrays or even strings. Other iterables may work too, but those objects are almost certain to work.

For these examples and explanations, a fictional sample data file containing census data is used. Every data point has a country, a state or province, a first name, a last name, an age and a gender.

A bop is initialized like this:

from bops import bop
import csv
import numpy as np

# Read the CSV file into a list of lists, which is the data format that bops expects.
# csv.reader also strips the trailing newline from the last column.
# (This assumes the file has exactly 6 columns and no header row.)
lines = []
with open('some_data.csv', 'r') as fh:
        for country, state, first, last, age, gender in csv.reader(fh):
                # Convert Age to an int so numeric filters such as data.age < 25 work.
                lines.append([country, state, first, last, int(age), gender])

# This initializes the data with 6 columns.
# Column names are turned into attributes, so they MUST be valid Python
# identifiers (i.e. no special characters other than '_', no internal spaces, etc.).
data = bop(lines, 'Country, Province_State, First_Name, Last_Name, Age, Gender')

# This returns all the values in the first column (Country).
col1 = data.country

# Column names are stored in lowercase, but bops also lowercases attribute
# names at lookup time, so camel case works too.
col1 = data.Country

Warning

Column names are given as a comma-delimited string and MUST meet Python’s variable-name requirements; otherwise, you may not be able to get to the data you need. Column names cannot have internal spaces: ‘First Name’ will NOT work. However, ‘ First_Name ’ (with surrounding spaces) will work, because bops trims column names before using them. Column names cannot contain any special characters other than an underscore, ‘_’.
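The naming rules above amount to: trim, lowercase, then require a valid Python identifier. Here is a minimal sketch of such a check; parse_column_names is a hypothetical helper, not part of bops, and it also rejects Python keywords such as ‘class’, which cannot be used with dot-syntax.

```python
import keyword

def parse_column_names(names):
    """Split a comma-delimited names string into trimmed, lowercased names.

    Hypothetical helper mirroring the rules in the Warning, not bops' source.
    """
    cols = []
    for raw in names.split(","):
        name = raw.strip().lower()
        # str.isidentifier() rejects internal spaces and special characters;
        # keywords such as 'class' are also unusable as attribute names.
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError("invalid column name: %r" % raw)
        cols.append(name)
    return cols
```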

Selection

Data selection and slicing is a big part of analyzing only the data you want. This is really easy in bops by using the select function.

Here’s how you would do it with bops:

# Assuming your data is initialized as above...
# The command below returns ONLY the rows where 'country' is 'US'.
united_states = data.select(data.country == 'US')

# 'Filters' can be saved in variables like this.
# Build them from the same bop you select from, so the lengths match.
male_filter = united_states.gender == 'M'
female_filter = united_states.gender == 'F'

# The above filters select only the rows where the gender column is 'M' or 'F'.
US_males = united_states.select(male_filter)
US_females = united_states.select(female_filter)

# Filters can also be combined by multiplying them (a logical AND).
# Selecting females younger than 25 years old can be done like so.
US_females_young = united_states.select(female_filter * (united_states.age < 25))

# The parentheses around the age filter are required, because '*' binds
# more tightly than '<'.
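Filters are plain NumPy boolean arrays, so the combining rules are NumPy’s. A short sketch with made-up data: on boolean arrays, ‘*’ acts as an element-wise AND and ‘+’ as an OR, matching ‘&’ and ‘|’.

```python
import numpy as np

age = np.array([19, 42, 23, 67, 31])
gender = np.array(["F", "M", "F", "F", "M"])

female = gender == "F"
young = age < 25

# On boolean arrays, '*' is an element-wise logical AND and '+' a logical OR.
both = female * young       # female AND younger than 25
either = female + young     # female OR younger than 25

# '&' and '|' give the same results and state the intent more directly.
both_alt = female & young
either_alt = female | young

# Indexing with the combined mask keeps only the matching rows.
young_women_ages = age[both]
```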

Grouping

Grouping data on shared attributes is often crucial; without it, complex analysis becomes difficult. Luckily, bops has this in mind.


Note

By default, groupby returns a list of tuples. When grouping by one column, the expand option makes very little difference: it simply removes the need to call items() on a dict. With multiple grouped columns, however, the expanded form can be more convenient.

When expand is False, a dict is returned instead. Each dict key is a tuple of the grouped column values, which is what allows grouping on multiple columns. Each dictionary value is a bop instance holding all the rows that match the group.

For instance, if you group on ‘country’ and ‘province_state’ (data.groupby('country', 'province_state', expand=False)), a dictionary key will look like ('US', 'CA'), and its value will be a bop instance for all rows with country='US' and province_state='CA'.
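To make the two return shapes concrete, here is a plain-Python sketch of the grouping logic. group_rows is hypothetical and uses a list of dicts as rows; in bops each group would be a bop instance rather than a list.

```python
from collections import defaultdict

def group_rows(rows, *columns, expand=True):
    """Hypothetical stand-in for bop.groupby, using a list of dicts as rows."""
    groups = defaultdict(list)
    for row in rows:
        # The key is a tuple of the grouped column values, e.g. ('US', 'CA').
        key = tuple(row[column] for column in columns)
        groups[key].append(row)
    if not expand:
        return dict(groups)
    # Expanded form: the grouped values, followed by the group's rows last.
    return [key + (members,) for key, members in groups.items()]

rows = [
    {"country": "US", "province_state": "CA", "age": 34},
    {"country": "US", "province_state": "NY", "age": 22},
    {"country": "US", "province_state": "CA", "age": 29},
    {"country": "CA", "province_state": "ON", "age": 51},
]

# expand=False: look groups up by their key tuple.
by_key = group_rows(rows, "country", "province_state", expand=False)
us_ca = by_key[("US", "CA")]

# Default (expanded): unpack the key columns and the data in one loop.
sizes = [(country, state, len(members))
         for country, state, members in group_rows(rows, "country", "province_state")]
```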

Grouping by One column:

# Grouping by country
# By default, groupby returns a list of tuples
countries = data.groupby('country')

# Looping through the data can be done like so.
# This will print the country name and the number of people listed for the country.
for country, country_data in countries:
        print country, len(country_data)

        # The variable 'country_data' is another 'bop' instance.
        males = country_data.select(country_data.gender == 'M')
        females = country_data.select(country_data.gender == 'F')

        # This prints the average age of males and females for the country by calling numpy's mean function on the age data for each gender.
        print np.mean(males.age), np.mean(females.age)

Grouping by Multiple columns:

# Grouping by country and province/state
# By default, groupby returns a list of tuples
countries_states = data.groupby('country', 'Province_State')

# Looping through the data can be done like so.
# This will print the country name, the state name and the number of people listed for that pair.
for country, state, country_state_data in countries_states:
        print country, state, len(country_state_data)

        # The variable 'country_state_data' is another 'bop' instance.
        males = country_state_data.select(country_state_data.gender == 'M')
        females = country_state_data.select(country_state_data.gender == 'F')

        # This prints the average age of males and females for the country and state by calling numpy's mean function on the age data for each gender.
        print np.mean(males.age), np.mean(females.age)

Note

By default, the ‘groupby’ function returns a list of tuples: each tuple holds the grouped column values, with the associated data as the last item. When the expand option is False, a dictionary is returned instead.

Ordering

Ordering data can be a deal breaker in many algorithms, and sorting data can be a headache. Bops makes it easy: ordering is done in place by NumPy, like so.

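# Ordering made simple: sort the data in place on the 'age' column.
data.orderby('age')

# After sorting, the first and last ages are the youngest and oldest.
youngest = data.age[0]
oldest = data.age[-1]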
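Bops states that ordering is done in place by NumPy. A minimal sketch of in-place, multi-column ordering on a NumPy record array with ndarray.sort(order=...); whether bops uses this exact call internally is an assumption, and the rows are made up.

```python
import numpy as np

people = np.rec.fromrecords(
    [("Ann", "F", 34), ("Bob", "M", 22), ("Cal", "M", 34), ("Dee", "F", 19)],
    names="name,gender,age",
)

# Sort the record array in place by named fields; 'name' breaks ties in 'age'.
people.sort(order=["age", "name"])

youngest = people.age[0]
oldest = people.age[-1]
```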

MapReduce

MapReduce is an old data analysis paradigm. However, Google wrote a paper on how it was using the method to reduce petabytes of data over several thousand commodity servers, and thus a trend was started. Although Google uses this method in a distributed fashion across many servers, most data sets do not require the complexity of such a solution, including engineering and scientific data in the hundreds-of-megabytes range. This is why bops was written: to provide a MapReduce tool for reducing data that does not require a distributed solution.

Bops was written to modularize the code that makes traditional, non-distributed MapReduce possible, without the headache of network protocols.

There are other Python modules that aim for a distributed MapReduce capability, but bops does not; it attempts to fill the gap where distributed computing is not required. With that disclaimer, here’s how I decided to do it.

Bops uses functions called ‘mappers’ and ‘reducers’. These are ordinary Python functions which follow a certain convention.

Mappers

Mapper functions are simply functions that are called for every row in a dataset. A mapper takes ONE argument, the row, and returns TWO values: a key and a value. The values from all rows for which the mapper returns the same key are collected together into one group.
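The mapper contract can be sketched in a few lines of plain Python. map_rows is a hypothetical stand-in for bop.map, and the namedtuple rows stand in for bops’ row objects.

```python
from collections import defaultdict, namedtuple

Row = namedtuple("Row", "name gender age")

def map_rows(rows, mapper):
    """Hypothetical sketch of bop.map: collect values under each mapped key."""
    groups = defaultdict(list)
    for row in rows:
        key, value = mapper(row)   # a mapper returns exactly two values
        groups[key].append(value)
    return dict(groups)

def gender_age_mapper(row):
    return (row.gender, row.age), row

rows = [Row("Ann", "F", 34), Row("Bob", "M", 22), Row("Eve", "F", 34)]
gender_age_groups = map_rows(rows, gender_age_mapper)
```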

This is how you would ‘map’ by gender and age.

# The simple mapper function...
def gender_age_mapper(row):
        return (row.gender, row.age), row

# 'Map' the data by gender and age
gender_age_groups = data.map(gender_age_mapper)

# This will print the gender, age and number of people in the group
for (gender, age), rows in gender_age_groups.items():
        print gender, age, len(rows)

This can be simplified even further with a reducer function.

Reducers

Like mappers, reducers are simple Python functions. However, unlike a mapper, which sees one row at a time, a reducer is given the entire group of values collected under each mapped key.

For example, you can simplify the above mapreduce code like this:

# The simple mapper function...
def gender_age_mapper(row):
        return (row.gender, row.age), row

# 'Map' the data by gender and age, and reduce it with the built-in 'len' function.
gender_age_groups = data.mapreduce(gender_age_mapper, len)

# This will print the gender, age and number of people in the group
for gender, age, count in gender_age_groups:
        print gender, age, count

Notice that, by default, the mapreduce function does NOT return a dict like map does; it returns a list of tuples. Also, inside the for-loop, ‘len(rows)’ is replaced with ‘count’, because the reducer, ‘len’, has already been called on each mapped group.

For this to work, each mapped group is passed to the reducer in its entirety.

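Here’s another example, which finds the total years in college for each gender and age group (assuming the data also has a ‘college’ column holding each person’s years of higher education).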

# The simple mapper function...
def gender_age_mapper(row):
        return (row.gender, row.age), row

# Sums the years of higher education for each gender/age group
def total_college_years(rows):
        return sum([row.college for row in rows])

# 'Map' the data by gender and age, and reduce it with the 'total_college_years' function.
gender_age_groups = data.mapreduce(gender_age_mapper, total_college_years)

# This will print the gender, age and total college years of the group
for gender, age, college_years in gender_age_groups:
        print gender, age, college_years

I have used the built-in ‘len’ and ‘sum’ functions several times to create histograms and statistics based on how the mapper function maps the data.

Here’s a simpler and faster way to do the same as the code above.

# The simple mapper function...
def gender_age_mapper(row):
        return (row.gender, row.age), row.college

# 'Map' the data by gender and age, and reduce it with the built-in 'sum' function.
gender_age_groups = data.mapreduce(gender_age_mapper, sum)

# This will print the gender, age and total college years of the group
for gender, age, college_years in gender_age_groups:
        print gender, age, college_years

This is simpler because there’s no need to write an additional reducer function: the mapper now returns row.college instead of the entire row, so Python’s built-in ‘sum’ can be used directly as the reducer to total the college years.

This solution is likely faster as well, because the reducer no longer has to build an intermediate list from each row’s college attribute.
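The two approaches can be checked against each other with a small, self-contained sketch. The mapreduce helper below is a hypothetical plain-Python stand-in (it returns a dict rather than bops’ expanded list), and Row mimics a bops row with made-up values.

```python
from collections import defaultdict, namedtuple

Row = namedtuple("Row", "gender age college")
rows = [Row("F", 34, 4), Row("F", 34, 6), Row("M", 22, 2)]

def mapreduce(rows, mapper, reducer):
    """Hypothetical minimal map + reduce returning {key: reducer(values)}."""
    groups = defaultdict(list)
    for row in rows:
        key, value = mapper(row)
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}

# Version 1: map whole rows, then pull out 'college' inside the reducer.
def row_mapper(row):
    return (row.gender, row.age), row

def total_college_years(rows):
    return sum(row.college for row in rows)

# Version 2: map only the value that is needed and reduce with built-in sum.
def college_mapper(row):
    return (row.gender, row.age), row.college

v1 = mapreduce(rows, row_mapper, total_college_years)
v2 = mapreduce(rows, college_mapper, sum)
```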

Numpy with Sugar

Bops also has some syntactic sugar. After the data has been initialized, you can skip calling NumPy functions directly by appending the function name to the column name. For instance:

>>> from bops import bop
>>> import numpy as np
>>>
>>> # Read file, query database, etc.
>>> # ...
>>>
>>> data = bop(results, 'name,gender,age,years_in_college')
>>>
>>> # Normal function calls
>>> oldest = np.max(data.age)
>>>
>>> # with sugar
>>> sugar = data.age_max
>>>
>>> oldest == sugar
True
>>>

Bops recognizes that ‘age_max’ is not a column, so it calls np.max on data.age instead. This works for every NumPy function that accepts a NumPy array and has no underscore in its name. The function name MUST come at the end, separated from the column name by an underscore (‘_’).

For example, this will still work:

>>> max_education = data.years_in_college_max

Note

This only works if no extra **kwargs are needed.

You can find a list of NumPy functions by category here and alphabetically here. Some of the most useful are min, max, std, mean, size and histogram.
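One way such a shortcut can be built is with Python’s __getattr__ hook, which runs only when normal attribute lookup fails. The SugarColumns class below is a hypothetical sketch of the idea, not bops’ implementation; it splits the name at the last underscore and looks the function up on the numpy module.

```python
import numpy as np

class SugarColumns:
    """Hypothetical sketch of a 'column_function' attribute shortcut."""

    def __init__(self, **columns):
        # Store columns under lowercase names, as bops does with column names.
        self._columns = {name.lower(): np.asarray(values)
                         for name, values in columns.items()}

    def __getattr__(self, attr):
        # Only called when normal attribute lookup has failed.
        columns = self.__dict__.get("_columns", {})
        attr = attr.lower()
        if attr in columns:
            return columns[attr]
        # Split at the LAST underscore: 'years_in_college_max' gives
        # column 'years_in_college' and function 'max'.
        column, sep, func_name = attr.rpartition("_")
        if sep and column in columns and hasattr(np, func_name):
            return getattr(np, func_name)(columns[column])
        raise AttributeError(attr)

data = SugarColumns(age=[34, 22, 51], years_in_college=[4, 2, 6])
oldest = data.age_max                       # same as np.max(data.age)
max_education = data.years_in_college_max   # same as np.max(data.years_in_college)
```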

Aliasing

In addition to the NumPy sugar, you can do the same with the alias function. Some NumPy functions contain underscores in their names (e.g. float_, int_, etc.), which the sugar cannot reach; aliasing is how bops mitigates this issue.

The alias function can be used to rename Numpy functions, as well as traditional Python functions, and then inform bops of the aliased name. For example:

# Takes a numpy array of 'F's and 'M's and turns it into an array of 'Female'/'Male' strings
def full_gender(array):
        gender = []
        for g in array:
                if g == 'F':
                        gender.append('Female')
                else:
                        gender.append('Male')
        return np.asarray(gender)

# This aliases the function name so it can be used with the underscore shortcut functionality
# NOTE: The keyword CANNOT have underscores in the name, however, the function name can.
data.alias(fullgender=full_gender)

# An example of this functionality (same as full_gender(data.gender))
full_genders = data.gender_fullgender

Bops does have some built-in aliases:

avg: np.mean

len: np.size

float: np.float_

int: np.int_

bool: np.bool_

str: np.str_

unicode: np.unicode_

complex: np.complex_
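Aliases can be thought of as a lookup table consulted before NumPy. Below is a hypothetical sketch (AliasRegistry is not a bops class) that seeds two of the built-in aliases listed above, rejects alias names containing underscores, and skips non-dict positional arguments as the alias() reference describes; raising ValueError for a bad name is an assumption.

```python
import numpy as np

class AliasRegistry:
    """Hypothetical sketch of an alias table for the attribute shortcut."""

    def __init__(self):
        # Two of the built-in aliases listed above.
        self.aliases = {"avg": np.mean, "len": np.size}

    def alias(self, *args, **kwargs):
        names = {}
        for mapping in args:
            if isinstance(mapping, dict):  # anything that is not a dict is skipped
                names.update(mapping)
        names.update(kwargs)
        for name, func in names.items():
            if "_" in name:
                raise ValueError("alias names cannot contain underscores: %r" % name)
            self.aliases[name] = func
        return self.aliases

    def resolve(self, name):
        # Aliases take priority; otherwise fall back to the numpy function.
        if name in self.aliases:
            return self.aliases[name]
        return getattr(np, name)

def full_gender(array):
    # Vectorized version of the full_gender example above.
    return np.where(np.asarray(array) == "F", "Female", "Male")

registry = AliasRegistry()
registry.alias(fullgender=full_gender)
genders = registry.resolve("fullgender")(["F", "M", "F"])
```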

bop: With great power comes ... faster data analysis

This class provides tremendous power for data analysis. With NumPy as its back end, it allows for lightning-fast data filtering and grouping. It also has built-in MapReduce functionality!

class bops.bop(data=[], names='')

This class is meant for very quick data filtering and analysis.

Parameters:
data (list of lists) – The data is a 2D list: either a list of lists or a list of tuples.
names (comma-delimited string) – Either a comma-delimited string of names or a list of strings.
Raises: `TypeError` – if names is not a string or a list of strings

Usage:

from bops import bop

# Name the columns
cols = 'radar,range,az,el'

# Wrap the database results ('results', assumed to be a list of row tuples) in a bop
data = bop(results, cols)

select(index)

New in version 0.1.

This method allows you to select data slices from the original data. It returns a new bop instance containing only the selected data.

Parameters:	filters (`numpy.ndarray of booleans`) – The filter is a simple numpy array of booleans. The boolean array is then applied to the entire dataset, returning only the `True` indexes.
Returns:	A new `bop` instance of only the filtered data.

Usage:

from bops import bop

# Name the columns
cols = 'radar,range,az,el'

# Wrap the database results ('results', assumed to be a list of row tuples) in a bop
data = bop(results, cols)

# Select data where range > 1500
# The 'filter', data.range > 1500,
# returns a boolean array which is
# then applied to the entire dataset,
# keeping only the rows where the filter is True.
far = data.select(data.range > 1500)

# Filters can be multiplied to have the logical AND of both filters.
far_high = data.select((data.range > 1500) * (data.el > 60))

# Or you can save the filters as variables:
range_filter = data.range > 1500
el_filter = data.el > 60
far_high = data.select(range_filter * el_filter)

groupby(*args, **kwargs)

New in version 0.1.

Changed in version 0.2, 0.5.

This method groups data based on the column names provided. Unlike SQL, this method returns the data behind each group. By default, it returns all the grouping values and the data behind them as a list of tuples.

Parameters:
attrs (list) – The columns to group by (a list of column names).
expand (bool) – Expands the output into a list of tuples instead of returning a dictionary.
Returns: Either a list (if expanded, which is the default) or a dictionary.

Usage:

from bops import bop

# Name the columns
cols = 'state,town,zip,population'

# Wrap the database results ('results', assumed to be a list of row tuples) in a bop
data = bop(results, cols)

# Group by state
states = data.groupby('state')

# Loop through states
# Using the default ``expand`` option, ``groupby`` returns a list of tuples, with the last index as the data in the group.
for state, state_data in states:
        print state, len(state_data)

# If we grouped by multiple columns, it would be used like so:
state_zip = data.groupby('state', 'zip')

# Iterating through the results like so:
# ('zip_code' is used as the variable name to avoid shadowing Python's built-in zip.)
for state, zip_code, state_zip_data in state_zip:
        print state, zip_code, len(state_zip_data)

# Now if the 'expand' option is set to False, this is how it would be used.
state_zip_dict = data.groupby('state', 'zip', expand=False)

for key, value in state_zip_dict.items():
        # The key contains the grouped column values
        state, zip_code = key

        # The value contains the data found for that group.
        state_zip_data = value

        print state, zip_code, len(state_zip_data)

# Setting expand=False simply gives dictionary access instead of a list.

orderby(*args)

New in version 0.1.

Changed in version 0.2.

This method orders the data in place on the columns given. Multiple column ordering is possible.

Parameters:	attrs (list) – The columns to order by (a list of column names)

Usage:

from bops import bop

# Name the columns
cols = 'range,az,el'

# Wrap the database results ('results', assumed to be a list of row tuples) in a bop
data = bop(results, cols)

# Order on range
data.orderby('range')

map(mapper)

New in version 0.3.

This method groups the data according to the mapper function passed in. The mapper function is called on every row of the data and should return a key/value pair.

Parameters:	mapper (callable) – A callable object (normally a function). Will be called for each data point.
Returns:	A dictionary of mapped keys and grouped lists.
Raises: MapperException

Usage:

from bops import bop

# Name the columns
cols = 'name,gender,age'

# Wrap the database results ('results', assumed to be a list of row tuples) in a bop
data = bop(results, cols)

# This mapper function classifies the row by its gender and age group (decade).
# All mappers MUST return a 2-element tuple. This represents a key/value pair for a dictionary.
def gender_age_group_mapper(row):
        return (row.gender, row.age // 10 * 10), row

# The mapper is the argument to the bop.map function.
gender_ages = data.map(gender_age_group_mapper)

# The key returned is the tuple of gender and age_group
# The value contains all rows that share the same gender and age group.
for key, value in gender_ages.items():
        gender, age_group = key
        similar_people = value

Warning

When using the row object passed to the mapper function, ALL attributes are lowercase.

reduce(mapper_results, reducer)

New in version 0.3.

This function ‘reduces’ the data returned from each map group. Reducers are meant to return a single value per group. However, because Python is dynamically typed, a reducer can also return a list, dictionary or tuple, since these are objects themselves.

Parameters:
mapper_results (dict) – A dictionary object: the results returned from a bop.map call.
reducer (callable) – A callable object (normally a function). This function is meant to act on all the results for a mapped group.
Returns: A dictionary mapping each mapped key to the reducer’s result for that group.
Raises: TypeError – if reducer is None

Usage:

from bops import bop

# Name the columns
cols = 'name,gender,age'

# Wrap the database results ('results', assumed to be a list of row tuples) in a bop
data = bop(results, cols)

# This mapper function classifies the row by its gender and age group (decade).
# All mappers MUST return a 2-element tuple. This represents a key / value pair for a dictionary.
def gender_age_group_mapper(row):
        return (row.gender, row.age // 10 * 10), row

# The mapper is the argument to the bop.map function.
gender_ages = data.map(gender_age_group_mapper)

# The key returned is the tuple of gender and age_group
# The value contains all rows that share the same gender and age group.
for key, value in gender_ages.items():
        gender, age_group = key
        similar_people = value

# This reduce function returns a dictionary. The keys are the same as the map results. 
# However, the values are returned from the reducer function.
# This simply uses the built-in 'len' function to count the people in each group.
counts = data.reduce(gender_ages, len)

# This is used like so:
for key, value in counts.items():
        gender, age_group = key
        similar_people_count = value

complexreduce(mapper_results, reducer)

New in version 0.3.

This function ‘reduces’ the data returned from each map group. Reducers are meant to return a single value per group. However, because Python is dynamically typed, a reducer can also return a list, dictionary or tuple, since these are objects themselves.

This function is very similar to reduce, the only difference is that both the key and value returned from the mapper are passed to the reducer, instead of just the value.

The only reason for using this instead of the regular reduce function, is if there is data in the mapped key that a normal reducer would not have access to.

Parameters:
mapper_results (dict) – A dictionary object: the results returned from a bop.map call.
reducer (callable) – A callable object (normally a function). This function is meant to act on all the results for a mapped group.
Returns: A dictionary mapping each mapped key to the reducer’s result for that group.
Raises: TypeError – if reducer is None
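The difference from reduce is only in what the reducer receives. Here is a plain-Python sketch, assuming the reducer is called as reducer(key, values); complex_reduce and describe_group are hypothetical, and the actual calling convention in bops may differ.

```python
def complex_reduce(mapped, reducer):
    """Hypothetical sketch of complexreduce: the reducer also gets the key.

    Assumes a reducer(key, values) calling convention.
    """
    if reducer is None:
        raise TypeError("reducer must not be None")
    return {key: reducer(key, values) for key, values in mapped.items()}

def describe_group(key, names):
    # Gender and age group live only in the key, so a normal reducer,
    # which sees just the names, could not produce this label.
    gender, age_group = key
    return "%s %ds: %s" % (gender, age_group, ", ".join(sorted(names)))

# Mapped results: (gender, age group) -> names of the people in that group.
mapped = {("F", 30): ["Eve", "Ann"], ("M", 20): ["Bob"]}
labels = complex_reduce(mapped, describe_group)
```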

mapreduce(mapper, reducer, expand=True, sort=False, complex=False)

New in version 0.3.

Changed in version 0.5.

This function calls the map and reduce functions, and by default the results are expanded into a list. Each row in the results is a tuple whose last value is the reducer’s result for that mapped group.

However, if the expand option is False, the results are returned as a dictionary. The key will be a tuple, and the value will be the reducer’s result for all rows matching that mapper output.

The key(s) returned from the mapper function occupy indexes [:-1] of each row of the results.

The sort flag can also be used to sort results. The results will be sorted on keys returned from the mapper function.

Parameters:
mapper (callable) – A callable object (normally a function). Will be called for each data point.
reducer (callable) – A callable object (normally a function). This function is meant to act on all the results for a mapped group.
expand (bool) – If False, the results are a dictionary; otherwise the results are expanded into a list of tuples.
sort (bool) – Sorts the results based on the key returned from the mapper function.
complex (bool) – Uses a complexreduce function instead of a normal reducer. This means that both the key and value from the mapper function are passed to the reducer function.
Returns: A list of tuples. If expand is False, a dictionary mapping each mapper key to its reduced value.
Raises: TypeError – if reducer is None

Usage:

from bops import bop

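# sqlalchemy magic... ('session' is assumed to be an existing SQLAlchemy session)
results = session.execute('SELECT name,gender,age,college FROM students;').fetchall()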

# Name the columns
cols = 'name,gender,age,college'

# Wrap the database results in a bop
data = bop(results, cols)

# This mapper function classifies the row by its gender and age group (decade).
# All mappers MUST return a 2-element tuple. This represents a key / value pair for a dictionary.
# However, the key can also be a tuple
def gender_age_group_mapper(row):
        return (row.gender, row.age // 10 * 10), row

# The mapper, gender_age_group_mapper, is the first argument to bop.mapreduce.
# The reducer, len, is the same as what you would pass to bop.reduce.
# With expand=False, mapreduce returns a dictionary whose keys are the keys returned
# by the mapper and whose values are returned by the reducer.
# This simply uses the built-in 'len' function to count the people in each group.
counts = data.mapreduce(gender_age_group_mapper, len, expand=False)

# This is used like so:
for key, value in counts.items():
        gender, age_group = key
        similar_people_count = value

# If True, the 'expand' option can make results easier to use:
# expand=True is the new default in v0.5+
counts = data.mapreduce(gender_age_group_mapper, len, expand=True)

for gender, age_group, similar_people_count in counts:
        print gender, age_group, similar_people_count
        
        # 'gender' and 'age_group' are the 'mapper' key tuple, where 'similar_people_count' is the result from the reduce function, 'len'.

# This counts the number of people in the gender/age group that have more than 4 yrs in college.
def grads(group):
        return sum([1 for p in group if p.college > 4])

# This mapreduce function returns a list of tuples instead of a dictionary, simply because the expand option is True (which is the new default in v0.5+).
gender_age_grad = data.mapreduce(gender_age_group_mapper, reducer=grads, expand=True)

# With the expand option as True, the mapper key and reducer value are readily available for iteration.
# ('grad_count' avoids shadowing the 'grads' reducer function.)
for gender, age_group, grad_count in gender_age_grad:
        print gender, age_group, grad_count
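The expand and sort options can be modelled in plain Python. This mapreduce function is a hypothetical sketch of the behaviour described above (tuple keys fill indexes [:-1], the reducer result comes last), not the bops source; the people tuples are made up.

```python
from collections import defaultdict

def mapreduce(rows, mapper, reducer, expand=True, sort=False):
    """Hypothetical sketch of mapreduce's expand/sort behaviour."""
    if reducer is None:
        raise TypeError("reducer must not be None")
    groups = defaultdict(list)
    for row in rows:
        key, value = mapper(row)
        groups[key].append(value)
    reduced = {key: reducer(values) for key, values in groups.items()}
    if not expand:
        return reduced
    keys = sorted(reduced) if sort else list(reduced)
    # A tuple key is spread out so it fills indexes [:-1];
    # the reducer result is always the last item.
    return [(key if isinstance(key, tuple) else (key,)) + (reduced[key],)
            for key in keys]

people = [("Bob", "M", 22), ("Ann", "F", 34), ("Dan", "M", 27), ("Eve", "F", 38)]

def gender_age_group_mapper(row):
    name, gender, age = row
    return (gender, age // 10 * 10), name

unsorted_counts = mapreduce(people, gender_age_group_mapper, len)
sorted_counts = mapreduce(people, gender_age_group_mapper, len, sort=True)
count_dict = mapreduce(people, gender_age_group_mapper, len, expand=False)
```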

mapreducebatch(maps=[], reducer=None, names='', expand=True)

New in version 0.3.

Changed in version 0.5.

This function performs the same work as the ‘mapreduce’ function but uses numpy for a speed boost.

Here are the differences:

The ‘maps’ argument should be a list of numpy arrays or lists. This list corresponds to the key of a normal mapper function. In other words, this list represents the columns you are grouping by.
This function uses NumPy arrays and the numpy.core.records module to speed up execution.
The names argument can be used to specify the column names for the results returned.

Parameters:
maps (list) – A list of NumPy arrays.
reducer (callable) – A callable object (normally a function). This function is meant to act on all the results for a mapped group.
names (str) – A comma-delimited string of column names.
expand (bool) – If False, the results are a dictionary; otherwise the results are expanded into a list of tuples.
Returns: A list of tuples. If expand is False, a dictionary of mapped keys and grouped lists. However, if the names argument is given and is not an empty string, the results will be another bop instance with the columns set to the names argument.

Usage:

from bops import bop
import numpy as np

# sqlalchemy magic... ('session' is assumed to be an existing SQLAlchemy session)
results = session.execute('SELECT name,gender,age,college FROM students;').fetchall()

# Name the columns
cols = 'name,gender,age,college'

# Initialize bop instance
data = bop(results, cols)

# Define graduated
# This is a reducer to be used on the data for each group after it has passed
#   through a map operation.
# NOTE: All reducers are given the entire mapped data group.
# This reducer returns the number of people in the group who have more than 4 years in college.
def graduated(group):
        return len(np.nonzero(group.college > 4)[0])

# This is one attribute of the data describing the age groups that people belong to.
# This basically determines the decade of your age.
# Simply put, if you are 35, it returns 30; for 58, it returns 50.
# This allows the data to be aggregated by people of similar age.
agegroup = data.age // 10 * 10

# This is the map reduce operation call
# It finds all the unique combinations of gender and age group and passes 
#   each unique group to the reducer separately.
# If a reducer is left out, then the data returned is the raw data that belong to that group.
# The 'expand' option makes the output easier to deal with, however, if you 
#   only want a key / value pair to be returned, leave out this option.
# The 'names' option sets the column names of the returned results.
gender_age_grad = data.mapreducebatch([data.gender, agegroup], reducer=graduated, expand=True, names='gender,agegroup,graduates')

# This orders the data by gender and age group for ordered output.  
gender_age_grad.orderby('gender','agegroup')

# Output the results in a pretty fashion
print
print '%-9s %-11s %-17s' % ('Gender', 'Age Group', '>4yrs in college')
for gender, age, grad in gender_age_grad:
        print '%-9s %-11s %-17s' % (gender, age, grad)
print
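The core of a batch map/reduce can be sketched by building one boolean filter per key column and ANDing them, the same technique bops uses for selection. map_reduce_batch is hypothetical: it takes the values to reduce as a separate array and returns sorted (key..., result) tuples, whereas the real method passes each mapped data group to the reducer and can return a bop. The data is made up.

```python
import numpy as np

def map_reduce_batch(maps, values, reducer):
    """Hypothetical sketch: group by parallel key arrays, reduce each group.

    'maps' is a list of equal-length arrays (the columns grouped by) and
    'values' holds the data handed to the reducer for each group.
    """
    maps = [np.asarray(m) for m in maps]
    values = np.asarray(values)
    # Every row's key is the tuple of its values across the map arrays.
    keys = list(zip(*(m.tolist() for m in maps)))
    results = []
    for key in sorted(set(keys)):
        # AND together one boolean filter per key column, bops-style.
        mask = np.ones(len(values), dtype=bool)
        for column, part in zip(maps, key):
            mask &= column == part
        results.append(key + (reducer(values[mask]),))
    return results

def graduated(college_years):
    # Number of people in the group with more than 4 years in college.
    return int(np.count_nonzero(college_years > 4))

gender = np.array(["F", "M", "F", "M", "F"])
age = np.array([34, 22, 38, 27, 51])
college = np.array([6, 2, 5, 5, 8])

age_group = age // 10 * 10
gender_age_grad = map_reduce_batch([gender, age_group], college, graduated)
```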

alias(*args, **kwargs)

New in version 0.3.1.

This function allows you to alias functions and callables for use with the method missing functionality. For more information look in the section.

Parameters:	args (dict) – Dictionaries are the only objects used; everything else is skipped. Keys are the aliases and values are the callables. Keyword arguments (kwargs) are also permitted.
Returns:	bop.aliases - all existing aliases

Usage:

from bops import bop
import numpy as np

# Name the columns
cols = 'name,gender,age'

# Wrap the database results ('results', assumed to be a list of row tuples) in a bop
data = bop(results, cols)

# Takes a numpy array of 'F's and 'M's and turns it into an array of 'Female'/'Male' strings
def full_gender(array):
        gender = []
        for g in array:
                if g == 'F':
                        gender.append('Female')
                else:
                        gender.append('Male')
        return np.asarray(gender)

# This aliases the function name so it can be used with the underscore shortcut functionality
# NOTE: The keyword CANNOT have underscores in the name, however, the function name can.
data.alias(fullgender=full_gender)

# An example of this functionality
full_genders = data.gender_fullgender

# NOTE: data.gender_fullgender is the same as full_gender(data.gender)

Warning

Aliased names cannot contain underscores, i.e. data.alias(full_gender=full_gender) does NOT work. Aliased names must be single words, like fullgender.

Exceptions

Note

These exceptions need more documentation!!

exception bops.exceptions.MapperException(mapper): docstring for MapperException

exception bops.exceptions.TypeException(_placement='', _type='', text=''): docstring for TypeException

Legacy methods

These methods were written mainly because the author wasn’t very familiar with numpy and didn’t have any idea what he was doing. He was also tired of wrapping list after list after list in a numpy.array, so bops was started. He soon discovered the errors (yes, that is plural) of his ways and has now seen the light. These legacy methods accept lists or numpy arrays, but always return a numpy array.

The main usefulness of these methods is that they accept pure Python lists.
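Once data is already a numpy array, plain numpy comparisons give element-wise boolean arrays directly. The sketch below assumes only numpy; each mask should match the output shown for the corresponding legacy method in the examples that follow (the list is converted with numpy.asarray first, much as the legacy methods accept lists).

```python
import numpy as np

# A plain Python list, converted once up front
r = np.asarray(list(range(10)))

eq_mask = r == 2      # like bops.eq(r, 2)
gt_mask = r > 5       # like bops.gt(r, 5)
gtoe_mask = r >= 5    # like bops.gtoe(r, 5)
lt_mask = r < 5       # like bops.lt(r, 5)
ltoe_mask = r <= 5    # like bops.ltoe(r, 5)

print(gt_mask.tolist())
```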

bops.eq(a, eq)

Equals (i == eq)

This function returns a boolean array, with indexes representing matches and non-matches (True or False). True values represent indexes that are equal to the number given.

Usage:

>>> import bops
>>> r = range(10)
>>> b = bops.eq(r, 2)
>>> b
array([False, False,  True, False, False, False, False, False, False, False], dtype=bool)

bops.false(a)

False (i == 0 or not i)

Returns an array of the indexes that are False in the boolean array argument.

Usage:

>>> import bops
>>> r = range(10)
>>> b = bops.btw(r, 2, 8)
>>> b
array([False, False, False,  True,  True,  True,  True,  True, False, False], dtype=bool)
>>>
>>> bops.false(b)
array([0, 1, 2, 8, 9], dtype=int64)
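The same indexes can be obtained with plain numpy by inverting the boolean array with ~ and asking numpy.flatnonzero for the positions that are then True. The between mask is rebuilt here with strict comparisons, an assumption based on the output above (indexes 2 and 8 are False).

```python
import numpy as np

r = np.arange(10)
# Strict bounds reproduce the btw(r, 2, 8) output shown above
b = (r > 2) & (r < 8)

false_idx = np.flatnonzero(~b)  # positions where b is False
print(false_idx.tolist())  # [0, 1, 2, 8, 9]
```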

bops.gt(a, gt)

Greater-than (>)

This function returns a boolean array, with indexes representing matches and non-matches (True or False). True values represent indexes that are greater than the number given.

Usage:

>>> import bops
>>> r = range(10)
>>> bops.gt(r, 5)
array([False, False, False, False, False, False,  True,  True,  True,  True], dtype=bool)

bops.gtoe(a, gt)

Greater-than OR Equal to (>=)

This function returns a boolean array, with indexes representing matches and non-matches (True or False). True values represent indexes that are greater than or equal to the number given.

Usage:

>>> import bops
>>> r = range(10)
>>> bops.gtoe(r, 5)
array([False, False, False, False, False,  True,  True,  True,  True,  True], dtype=bool)

bops.l2a(_list)

This function converts a python list to a numpy array. It is an internal function used by the boolean functions so that they can accept regular lists.
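A minimal sketch of this kind of conversion, assuming it behaves like numpy.asarray (the helper name to_array is hypothetical, not part of bops): a list is copied into a new array, while an existing ndarray is passed through unchanged, which is what lets the boolean functions accept either.

```python
import numpy as np


def to_array(values):
    # Hypothetical stand-in for bops.l2a: lists become arrays,
    # and numpy.asarray returns an existing ndarray as-is (no copy)
    return np.asarray(values)


from_list = to_array([1, 2, 3])
passed_through = to_array(from_list)
print(passed_through is from_list)  # True
```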

bops.logand(*args)

This function provides the logical AND for all boolean arrays passed in.

Example Script:

# Import numpy for random number generation
import numpy

# Import bops for the boolean operations
import bops

# Use numpy to generate an array of 10 random numbers
rand1 = numpy.random.rand(10)

# 'gt' returns a boolean array: True where the value is > 0.3, False where it is not
gt1 = bops.gt(rand1, 0.3)

# 'lt' returns a boolean array: True where the value is < 0.6, False where it is not
lt2 = bops.lt(rand1, 0.6)

# 'logand' also returns a boolean array: the logical AND of the gt1 and lt2 arrays
# This finds values that are greater than 0.3 AND less than 0.6
bops.logand(gt1, lt2)
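In plain numpy, numpy.logical_and.reduce combines any number of equal-length boolean arrays in one call, which is a reasonable sketch of what logand does with *args (an assumption about its behavior, not its source). A fixed array replaces the random numbers so the result is predictable.

```python
import numpy as np

values = np.array([0.1, 0.35, 0.5, 0.65, 0.9])

gt1 = values > 0.3   # True where value > 0.3
lt2 = values < 0.6   # True where value < 0.6

# Logical AND across every mask in the list (works for more than two)
both = np.logical_and.reduce([gt1, lt2])
print(values[both].tolist())  # [0.35, 0.5]
```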

bops.logor(*args)

This function provides the logical OR for all boolean arrays passed in.

Example Script:

# Import numpy for random number generation
import numpy

# Import bops for the boolean operations
import bops

# Use numpy to generate an array of 10 random numbers
rand1 = numpy.random.rand(10)

# 'gt' returns a boolean array: True where the value is > 0.8, False where it is not
gt1 = bops.gt(rand1, 0.8)

# 'lt' returns a boolean array: True where the value is < 0.2, False where it is not
lt2 = bops.lt(rand1, 0.2)

# 'logor' returns a boolean array: the logical OR of the gt1 and lt2 arrays
# This finds values that are greater than 0.8 OR less than 0.2
bops.logor(gt1, lt2)
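The plain-numpy counterpart is numpy.logical_or.reduce, sketched here with three masks to show that more than two arrays can be combined. As with the AND sketch, the fixed data and the third mask are illustrative assumptions.

```python
import numpy as np

values = np.array([0.1, 0.35, 0.5, 0.65, 0.9])

gt1 = values > 0.8    # True where value > 0.8
lt2 = values < 0.2    # True where value < 0.2
eq3 = values == 0.5   # a third mask, to show more than two arrays

# Logical OR across every mask in the list
either = np.logical_or.reduce([gt1, lt2, eq3])
print(values[either].tolist())  # [0.1, 0.5, 0.9]
```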

bops.lt(a, lt)

Less-than (<)

This function returns a boolean array, with indexes representing matches and non-matches (True or False). True values represent indexes that are less than the number given.

Usage:

>>> import bops
>>> r = range(10)
>>> bops.lt(r, 5)
array([ True,  True,  True,  True,  True, False, False, False, False, False], dtype=bool)

bops.ltoe(a, lt)

Less-than OR Equal to (<=)

This function returns a boolean array, with indexes representing matches and non-matches (True or False). True values represent indexes that are less than or equal to the number given.

Usage:

>>> import bops
>>> r = range(10)
>>> bops.ltoe(r, 5)
array([ True,  True,  True,  True,  True,  True, False, False, False, False], dtype=bool)

bops.oband(a, **kwargs)

This function uses the kwargs hash to find the AND of multiple object attributes.

Example Usage:

import numpy

import bops

# Test class
class test(object):
  def __init__(self, x):
    self.x = x
    self.y = x*2
    self.r = numpy.random.rand(1)[0]
  def __str__(self):
    return "<Test: X: %(x)i Y: %(y)i >" % self.__dict__

# Generate 25 test objects
tests = [test(i) for i in range(25)]

# TRUE indexes with object AND
indexes = bops.true(bops.oband(tests, x=3, y=6))

In the above example, ‘tests’ is a list of objects that have attributes x and y. This function finds all the objects in the list that have an x value of 3 AND a y value of 6.
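A rough sketch of what oband appears to compute, assuming each keyword is compared against getattr(obj, name) with ==. The helper attrs_and and the Point class are hypothetical; Point is the test class above without the random attribute.

```python
import numpy as np


class Point(object):
    # Simplified version of the test class above (no random attribute)
    def __init__(self, x):
        self.x = x
        self.y = x * 2


def attrs_and(objects, **kwargs):
    # Hypothetical stand-in for bops.oband: True where EVERY given attribute matches
    return np.array([all(getattr(o, k) == v for k, v in kwargs.items())
                     for o in objects], dtype=bool)


points = [Point(i) for i in range(25)]
mask = attrs_and(points, x=3, y=6)
print(np.flatnonzero(mask).tolist())  # [3]
```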

bops.obor(a, **kwargs)

This function uses the kwargs hash to find the OR of multiple object attributes.

Example Usage:

import numpy

import bops

# Test class
class test(object):
  def __init__(self, x):
    self.x = x
    self.y = x*2
    self.r = numpy.random.rand(1)[0]
  def __str__(self):
    return "<Test: X: %(x)i Y: %(y)i >" % self.__dict__

# Generate 25 test objects
tests = [test(i) for i in range(25)]

# TRUE indexes with object OR
indexes = bops.true(bops.obor(tests, x=3, y=8))

In the above example, ‘tests’ is a list of objects that have attributes x and y. This function finds all the objects in the list that have an x value of 3 OR a y value of 8.
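The OR variant can be sketched the same way by swapping all() for any(), again assuming == comparison against each attribute. The helper attrs_or is hypothetical, and a namedtuple stands in for the test objects (y is always 2 * x), so the result should be the objects with x == 3 (y == 6) and x == 4 (y == 8).

```python
from collections import namedtuple

import numpy as np

# Lightweight stand-in for the test objects above: y is always 2 * x
Point = namedtuple('Point', ['x', 'y'])


def attrs_or(objects, **kwargs):
    # Hypothetical stand-in for bops.obor: True where ANY given attribute matches
    return np.array([any(getattr(o, k) == v for k, v in kwargs.items())
                     for o in objects], dtype=bool)


points = [Point(i, i * 2) for i in range(25)]
mask = attrs_or(points, x=3, y=8)
print(np.flatnonzero(mask).tolist())  # [3, 4]
```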

bops.true(a)

True (i == 1 or i)

Returns an array of the indexes that are True in the boolean array argument.

Usage:

>>> import bops
>>> r = range(10)
>>> b = bops.btw(r, 2, 8)
>>> b
array([False, False, False,  True,  True,  True,  True,  True, False, False], dtype=bool)
>>>
>>> bops.true(b)
array([3, 4, 5, 6, 7], dtype=int64)
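The indexes returned by true are typically used to pull matching values out of the data. In plain numpy, numpy.flatnonzero gives the same indexes, and the boolean array can also be used directly as a mask. The mask below reproduces the btw(r, 2, 8) output with strict comparisons, and the values array is illustrative.

```python
import numpy as np

r = np.arange(10)
b = (r > 2) & (r < 8)   # same mask as the btw(r, 2, 8) example above

true_idx = np.flatnonzero(b)
print(true_idx.tolist())      # [3, 4, 5, 6, 7]

# Indexing with the positions or with the mask itself selects the same values
values = r * 10
print(values[true_idx].tolist())  # [30, 40, 50, 60, 70]
print(values[b].tolist())         # [30, 40, 50, 60, 70]
```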
