For this exercise I prefer bpython, since its Ctrl+S shortcut lets you save any «experiments» to a file.
It is pretty much a querying language in disguise.
Initially I did not plan to use it in a console or as a standalone module, so the API is not satisfying.
>>> context=notch(
'yahi/test/biggersample.log' ,'another_log',
include="yahi/test/include.json",
exclude='{ "ip" : "^(192\.168|10\.)"}',
output_format="csv"
)
# include.json contains : { "_country" : "GB","user" : "-" }
Here you parse two files, you want:
(Since no output file is set, output goes to stdout and errors go to stderr.)
Shoot has 2 inputs:
An extractor is a function that extracts and transforms data; since I love short circuits, it may also do some on-the-fly filtering :)
>>> from archery import Hankyu as _dict
>>> shoot(
... context,
... lambda data: _dict({ 'total_lines' : 1 })
... )
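To make the extractor idea concrete, here is a minimal sketch of the "extract each record, then add everything up" pattern. It is not yahi's implementation: `shoot_sketch` and the sample records are hypothetical, and `collections.Counter` stands in for an addable dict because it also supports `+` for plain counts.

```python
from collections import Counter
from functools import reduce
from operator import add


def shoot_sketch(records, extractor):
    """Hypothetical helper: apply extractor to each record and sum the results."""
    return reduce(add, (extractor(record) for record in records), Counter())


# Hypothetical parsed log lines; real records carry many more fields.
records = [
    {"ip": "1.2.3.4", "_country": "GB"},
    {"ip": "5.6.7.8", "_country": "FR"},
    {"ip": "1.2.3.4", "_country": "GB"},
]

# Every record contributes 1 to the same key, so the sum is the line count.
total = shoot_sketch(records, lambda data: Counter({"total_lines": 1}))
```

Each record yields a tiny dict, and all the work is done by the addition.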
Business hours being each weekday, Monday to Friday, between 8 am and 5 pm.
>>> from archery import Hankyu as _dict
>>> shoot(
... context,
... lambda data: _dict({ (
... 8 <= data["_datetime"].hour < 17 and
... data["_datetime"].weekday() < 5
... ) and "business_hour" or "other_hour" : 1 })
... )
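If the one-liner gets hard to read, the same test can live in a named function. This is only a sketch: `hour_bucket` is a hypothetical name, and I assume 5 pm itself no longer counts as business time.

```python
from datetime import datetime


def hour_bucket(dt):
    """Return "business_hour" for Monday to Friday, 08:00 to 16:59, else "other_hour"."""
    is_weekday = dt.weekday() < 5        # Monday is 0, Sunday is 6
    in_office_hours = 8 <= dt.hour < 17  # assumption: 17:00 is already outside
    return "business_hour" if is_weekday and in_office_hours else "other_hour"


monday_morning = hour_bucket(datetime(2024, 1, 1, 9, 30))   # 2024-01-01 is a Monday
saturday_morning = hour_bucket(datetime(2024, 1, 6, 9, 30))
```

The `cond and a or b` idiom used in the lambda works here only because `"business_hour"` is a truthy string; the conditional expression has no such caveat.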
Hankyu is a dict supporting addition.
>>> from archery import Hankyu as _dict
>>> shoot(
... context,
... lambda data: _dict({ data["_country"]: 1 })
... )
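What "a dict supporting addition" means can be sketched in a few lines. `AddDict` below is a hypothetical stand-in written for illustration, not archery's code: adding two dicts adds the values of the keys they share and keeps the others.

```python
class AddDict(dict):
    """Hypothetical stand-in for a dict supporting addition."""

    def __add__(self, other):
        result = AddDict(self)
        for key, value in other.items():
            # Keys present on both sides have their values added together.
            result[key] = result[key] + value if key in result else value
        return result


first = AddDict({"GB": 1})
second = AddDict({"GB": 1, "FR": 1})
merged = first + second
```

Because values are combined with `+`, anything that supports addition (numbers, or other addable dicts) can be used as a value.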
ToxicSet is a set that maps add to union.
>>> from archery import Hankyu as _dict
>>> from yahi import ToxicSet
>>> shoot(
... context,
... lambda data: _dict(distinct_ip = ToxicSet({ data["ip"]}))
... )
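To illustrate "a set that maps add to union", here is a minimal sketch. `UnionOnAddSet` is a hypothetical class, not yahi's ToxicSet; it only shows why summing one-element sets ends up with the distinct values.

```python
from functools import reduce
from operator import add


class UnionOnAddSet(set):
    """Hypothetical stand-in: `+` behaves like set union."""

    def __add__(self, other):
        return UnionOnAddSet(self | other)


ips = ["1.2.3.4", "5.6.7.8", "1.2.3.4"]
# Each log line contributes a one-element set; duplicates vanish in the union.
distinct = reduce(add, (UnionOnAddSet({ip}) for ip in ips))
```

`len(distinct)` then gives the number of distinct IPs.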
>>> date_formater= lambda dt :"%s-%s-%s" % ( dt.year, dt.month, dt.day)
>>> from archery import Hankyu as _dict
>>> shoot(
... context,
... lambda data: _dict({
... date_formater(data["_datetime"]) : 1
... }))
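The formatter above produces unpadded keys such as 2012-3-7. If you want keys that sort chronologically as plain strings, a zero-padded variant is one option. This is a sketch; `day_key` is a hypothetical name.

```python
from datetime import datetime


def day_key(dt):
    # Zero-padded date, e.g. 2012-03-07 rather than 2012-3-7.
    return dt.strftime("%Y-%m-%d")


keys = sorted(day_key(d) for d in (datetime(2012, 10, 1), datetime(2012, 3, 7)))
```

With unpadded keys, "2012-10-1" would sort before "2012-3-7".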
You can now parallelize all your requests by giving each one its own key in the aggregator dict.
Just beware of the memory consumption.
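As a rough sketch of the one-key-per-request idea, with plain dicts and a hypothetical `add_nested` helper standing in for the addable dict, two requests can be answered in a single pass like this:

```python
from functools import reduce


def add_nested(left, right):
    """Hypothetical helper: add two nested dicts of counters key by key."""
    out = dict(left)
    for key, value in right.items():
        if key not in out:
            out[key] = value
        elif isinstance(value, dict):
            out[key] = add_nested(out[key], value)
        else:
            out[key] = out[key] + value
    return out


def extractor(data):
    # Two independent requests, each under its own top-level key.
    return {
        "by_country": {data["_country"]: 1},
        "by_user": {data["user"]: 1},
    }


# Hypothetical parsed records.
records = [
    {"_country": "GB", "user": "alice"},
    {"_country": "GB", "user": "bob"},
    {"_country": "FR", "user": "alice"},
]
result = reduce(add_nested, map(extractor, records), {})
```

Every top-level key keeps its own result in memory until the end, which is where the memory consumption comes from.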
Sometimes regexps are not enough. Imagine you have a function that checks whether a user is an employee, and you want to find all the workaholics in your company who reach an authenticated realm outside working hours:
>>> context.data_filter= lambda data: (
... is_employee(data["user"]) and not working_hours(data["_datetime"])
... )
>>> shoot(context, lambda data: _dict(workaholicness=_dict({data["user"]: 1})))
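`is_employee` and `working_hours` are left to you. A minimal sketch of what they could look like, assuming a hard-coded set of logins and the Monday to Friday, 8 am to 5 pm rule (both assumptions are mine, for illustration only):

```python
from datetime import datetime

EMPLOYEES = {"alice", "bob"}  # hypothetical list of employee logins


def is_employee(user):
    return user in EMPLOYEES


def working_hours(dt):
    # Assumption: Monday to Friday, from 08:00 up to but excluding 17:00.
    return dt.weekday() < 5 and 8 <= dt.hour < 17


def data_filter(data):
    """Keep only employees seen outside working hours."""
    return is_employee(data["user"]) and not working_hours(data["_datetime"])


late_night = {"user": "alice", "_datetime": datetime(2024, 1, 6, 23, 0)}
office_time = {"user": "alice", "_datetime": datetime(2024, 1, 1, 10, 0)}
```

Since the filter is ordinary Python, it can be tried on hand-made records before being plugged into the context.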
Warning
data_filter will override any include/exclude rules given in notch.