nhlscrapi: NHL Scraper API¶

Purpose¶

Provide a Python API for accessing NHL game data, including play-by-play, game summaries, player stats, etc. The library hides the guts of the NHL website scraping process and encapsulates not only the data gathering but also the data output. This project is inspired by the R package nhlscrapr, an all-around must for NHL analytics geeks and R power users.

nhlscrapi is in its early stages but will be updated regularly. Currently, the package supports most of the game summary reports, including all of the essential ones.

Related projects:

Installation¶

Getting started is as easy as:

pip install nhlscrapi

For more information on the setup, see the PyPI page for nhlscrapi. The documentation for the package can be found at nhlscrapi: NHL Scraper API.

Usage Example¶

Scrape data for game 1226 of the 2013-2014 season, Ottawa vs. Pittsburgh.

from nhlscrapi.games.game import Game, GameKey, GameType
from nhlscrapi.games.cumstats import Score, ShotCt, Corsi, Fenwick

season = 2014                                    # 2013-2014 season
game_num = 1226                                  # game number within the season
game_type = GameType.Regular                     # regular season game
game_key = GameKey(season, game_type, game_num)

# define stat types that will be counted as the plays are parsed
cum_stats = {
  'Score': Score(),
  'Shots': ShotCt(),
  'Corsi': Corsi(),
  'Fenwick': Fenwick()
}
game = Game(game_key, cum_stats=cum_stats)

# HTTP requests and processing are lazy
# accumulators require play-by-play info, so accessing them parses the RTSS PBP report
print('Final         : {}'.format(game.cum_stats['Score'].total))
print('Shootout      : {}'.format(game.cum_stats['Score'].shootout.total))
print('Shots         : {}'.format(game.cum_stats['Shots'].total))
print('EV Shot Atts  : {}'.format(game.cum_stats['Corsi'].total))
print('Corsi         : {}'.format(game.cum_stats['Corsi'].share()))
print('FW Shot Atts  : {}'.format(game.cum_stats['Fenwick'].total))
print('Fenwick       : {}'.format(game.cum_stats['Fenwick'].share()))

# http req for roster report
# only parses the sections related to officials and coaches
print('\nRefs          : {}'.format(game.refs))
print('Linesman      : {}'.format(game.linesman))
print('Coaches')
print('  Home        : {}'.format(game.home_coach))
print('  Away        : {}'.format(game.away_coach))

# scrape all remaining reports
game.load_all()
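
To make the accumulator idea concrete, here is a minimal, self-contained sketch of how a Corsi-style counter could tally shot attempts as plays are parsed. It does not use nhlscrapi: the `ShotAttemptCounter` class, the play dictionaries and the event names are all hypothetical stand-ins, and the library's own `Corsi` and `Fenwick` classes may well behave differently (the labels above suggest they track even-strength attempts, which this sketch ignores). Corsi counts goals, shots on goal, misses and blocked shots; Fenwick leaves out blocked shots.

```python
from collections import Counter

# Event types that count as shot attempts (hypothetical names, not nhlscrapi's).
CORSI_EVENTS = {"GOAL", "SHOT", "MISS", "BLOCK"}
FENWICK_EVENTS = CORSI_EVENTS - {"BLOCK"}  # Fenwick ignores blocked shots


class ShotAttemptCounter:
    """Tally shot attempts per team, one play at a time."""

    def __init__(self, events):
        self.events = events
        self.total = Counter()

    def update(self, play):
        # Only plays whose event type is tracked change the tally.
        if play["event"] in self.events:
            self.total[play["team"]] += 1

    def share(self):
        # Each team's fraction of all tracked attempts.
        attempts = sum(self.total.values())
        if attempts == 0:
            return {}
        return {team: n / attempts for team, n in self.total.items()}


# Made-up plays for illustration; "team" is the team taking the attempt.
plays = [
    {"team": "OTT", "event": "SHOT"},
    {"team": "PIT", "event": "BLOCK"},
    {"team": "PIT", "event": "MISS"},
    {"team": "OTT", "event": "GOAL"},
    {"team": "OTT", "event": "FACEOFF"},
]

corsi = ShotAttemptCounter(CORSI_EVENTS)
fenwick = ShotAttemptCounter(FENWICK_EVENTS)
for play in plays:
    corsi.update(play)
    fenwick.update(play)

print(dict(corsi.total))
print(corsi.share())
print(fenwick.share())
```

Feeding every play to every accumulator in a single pass mirrors the idea of counting stats "as the plays are parsed", so the play-by-play only has to be walked once regardless of how many stats are requested.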

Current Release: v0.4.0¶

This is a pre-release; it is not stable and not yet fit for production. The first full stable release (v1.0.0) will be made available once the framework for all NHL game reports is completed. Currently, the Play-by-Play, Home/Away TOI, Roster, Face-off Comparison and Event Summary reports are functional.

License¶

The NHL Scraper API is a free Python library provided under Apache License version 2.0.

Free software: Apache License, v2.0

Documentation: nhlscrapi: NHL Scraper API

Contents¶

nhlscrapi package

Change log¶

v0.4.0¶

added support and associated unit test for event summary report

scraper in scrapr.eventsummary.EventSummRep

report wrapper and primary access object in games.eventsummary.EventSummary

the event summary report has the ability to filter and sort by player data

updated docs

updated README to reflect change

v0.3.7¶

messed up the prior upload. embarrassing. fixed remaining 3.x print issue.

v0.3.6¶

fixed a lot of python3.x compatibility issues

_tools.build_enum switch to items() from iteritems()

print statement switched to print() in scrapr.descparser

take out maketrans in scrapr.descparser and put in replace()

fully qualify the scrapr.eventparser import in scrapr.rtss

Game.plays property returns self.play_by_play.plays() but plays isn’t callable

v0.3.5¶

dropped urllib2 dependency because it’s 2015 and I’m tired of being a dinosaur

added requests to setup dependencies

fully qualified the scrapr.NHLCn import in scrapr.reportloader

consolidated cli_opts.py into gamedata.py ... that whole thing needs a rewrite anyway (TODO)

v0.3.4¶

setup script reference bug.

v0.3.3¶

true bug fix. messed up the pypi upload setup

forgot cfg etc.

v0.3.2¶

refactored Plays/Strength construct

moved Plays and Strength from games.plays to games.playbyplay

moved scrapr.rtss.playparser.PlayParser to scrapr.rtss

deleted games/plays.py and scrapr/playparser.py

reworked data structure of PlayParser to be purely a dict

parsed play data isn’t converted into the proper Play object until games.playbyplay.PlayByPlay gets it

refactored TOI/ShiftSummary construct

moved ShiftSummary from scrapr.toirep to games.toi

scrapr.toirep.TOIRepBase now stores by player shift info as dict

parsed shift summary isn’t made into a ShiftSummary object until in TOI

Goal of both big refactors was to keep scraping/raw web data as dicts and have object wrappers only exist in the games package

added a unittest for the time on ice and shift summary info

added docstrings to major report and scraper interfaces

built docs using Sphinx
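
The split described above, raw dicts in the scraping layer and object wrappers only in the games layer, might look roughly like the following sketch. The names `parse_shift_row`, `to_seconds` and `Shift`, as well as the field layout, are hypothetical and are not taken from nhlscrapi.

```python
from dataclasses import dataclass


def parse_shift_row(cells):
    """Scraper layer: turn a row of table cells into a plain dict."""
    shift_num, period, start, end = cells
    return {
        "shift_num": int(shift_num),
        "period": int(period),
        "start": start,
        "end": end,
    }


def to_seconds(clock):
    """Convert an 'MM:SS' clock string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


@dataclass
class Shift:
    """Games layer: typed wrapper built from the scraper's dict."""
    shift_num: int
    period: int
    start: int  # seconds elapsed in the period
    end: int

    @classmethod
    def from_raw(cls, raw):
        return cls(
            shift_num=raw["shift_num"],
            period=raw["period"],
            start=to_seconds(raw["start"]),
            end=to_seconds(raw["end"]),
        )

    @property
    def duration(self):
        return self.end - self.start


raw = parse_shift_row(["1", "1", "0:28", "1:15"])
shift = Shift.from_raw(raw)
print(raw)
print(shift.duration)
```

Keeping the scraper output as plain dicts means it can be tested or serialized without importing any wrapper classes, while the wrapper adds conveniences such as `duration` only where they are needed.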

v0.3.1¶

fixed play-by-play bug created when no cum_stats provided

deprecated extractors

refactored GameKey and GameType into nhlscrapi.games.game

updated unittests and README to reflect the refactoring

v0.3.0¶

added face-off comparison report, associated report loader (scraper) and unittest

gave Game object basic access/loading to face off comp

reworked testing framework

can now run tests with the standard python -m unittest discover

made versioning counter sane. structure is v(release).(feature).(bug)

added lxml to the install requirements in setup

added this change log
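
As a small illustration of the v(release).(feature).(bug) structure mentioned in this entry, the sketch below splits a version string into its three counters. The `parse_version` helper and the `Version` tuple are hypothetical and not part of nhlscrapi.

```python
from collections import namedtuple

Version = namedtuple("Version", ["release", "feature", "bug"])


def parse_version(text):
    """Parse a string such as 'v0.4.0' into a Version tuple of ints."""
    release, feature, bug = text.lstrip("v").split(".")
    return Version(int(release), int(feature), int(bug))


current = parse_version("v0.4.0")
print(current)
# Tuples compare element by element, so later versions sort higher.
print(parse_version("v0.3.7") < current)
```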