server

usage: lupyne.server [-h] [-r] [-c CONFIG] [-p FILE] [-d]
                     [--autoreload SECONDS] [--autoupdate SECONDS]
                     [--autosync HOST{:PORT}{/PATH},...] [--real-time]
                     [directory [directory ...]]

Restful json cherrypy server.

positional arguments:
  directory             index directories

optional arguments:
  -h, --help            show this help message and exit
  -r, --read-only       expose only read methods; no write lock
  -c CONFIG, --config CONFIG
                        optional configuration file or json object of global
                        params
  -p FILE, --pidfile FILE
                        store the process id in the given file
  -d, --daemonize       run the server as a daemon
  --autoreload SECONDS  automatically reload modules; replacement for
                        engine.autoreload
  --autoupdate SECONDS  automatically update index version and commit any
                        changes
  --autosync HOST{:PORT}{/PATH},...
                        automatically synchronize searcher with remote hosts
                        and update
  --real-time           search in real-time without committing
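
The options above can be combined into a single launch command. The sketch below only assembles an argument list and does not start anything. It assumes the server is run as a module (`python -m lupyne.server`, inferred from the usage line); the index path, the port number, and the `server.socket_port` key are illustrative assumptions.

```python
import json
import sys

def server_command(directories, read_only=False, autoupdate=0, config=None):
    """Assemble an argument list for launching the server module."""
    args = [sys.executable, "-m", "lupyne.server"]
    if read_only:
        args.append("--read-only")
    if autoupdate:
        args += ["--autoupdate", str(autoupdate)]
    if config is not None:
        # -c accepts a configuration file or a json object of global params
        args += ["--config", json.dumps(config)]
    return args + list(directories)

command = server_command(
    ["/tmp/index"],                       # hypothetical index directory
    read_only=True,
    autoupdate=10,
    config={"server.socket_port": 9000},  # assumed CherryPy global setting
)
print(command[1:])
```

The resulting list could be passed to `subprocess.run(command)`.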

Restful json CherryPy server.

The server script mounts a WebSearcher (read_only) or WebIndexer root. Standard CherryPy configuration applies, and the provided custom tools are also configurable. All request and response bodies are application/json values.

WebSearcher exposes resources for an IndexSearcher. In addition to search requests, it provides access to term and document information in the index.

/

/search

/docs

/terms

/update

/queries

WebIndexer extends WebSearcher, exposing additional resources and methods for an Indexer. Single documents may be added, deleted, or replaced by a unique indexed field. Multiple documents may also be added, or deleted by query, at once. By default changes are not visible until the update resource is called to commit a new index version. If a near real-time Indexer is used, then changes are instantly searchable. In such cases a commit still hasn’t occurred, so the index-based last-modified header shouldn’t be used for caching.
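
As a rough illustration of the add-then-commit cycle described above, the following sketch builds (without sending) the two requests involved. The host, port, and document fields are assumptions, and sending an empty object to /update to mean "no extra options" is also an assumption.

```python
import json
from urllib.request import Request

BASE = "http://localhost:8080"  # assumed host and port

def json_request(method, path, body):
    """Build, without sending, a request whose body is an application/json value."""
    data = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(BASE + path, data=data, headers=headers, method=method)

# hypothetical documents; valid field names depend on the index's field settings
documents = [{"id": "1", "text": "hello world"}, {"id": "2", "text": "goodbye"}]
add = json_request("POST", "/docs", documents)  # add multiple documents at once
commit = json_request("POST", "/update", {})    # commit so the additions become visible

for request in (add, commit):
    print(request.get_method(), request.full_url)
```

Each request could then be sent with `urllib.request.urlopen(request)`. Until the second one succeeds, the new documents would not be searchable unless a near real-time Indexer is in use.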

/

/search

/docs

/fields

/update

Custom servers should create and mount WebSearchers and WebIndexers as needed. Caches and field settings can then be applied directly before starting the server. WebSearchers and WebIndexers can of course also be subclassed for custom interfaces.

CherryPy and Lucene VM integration issues:

Monitors (such as autoreload) are not compatible with the VM unless their threads are attached.
WorkerThreads must also be attached to the VM.
VM initialization must occur after daemonizing.
It is recommended that the VM ignore keyboard interrupts (-Xrs) for clean server shutdown.
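
Putting the custom-server guidance and the VM notes above together, a minimal sketch might look like the following. It assumes PyLucene's `lucene.initVM` is available and that the config dict uses CherryPy's `global` section; the function name `serve` and the port are hypothetical. Imports are deferred so the function can be defined without PyLucene installed.

```python
def serve(directory, port=8080):
    """Mount a read-only WebSearcher on one index directory and start serving."""
    # imported lazily so that this sketch can be loaded without PyLucene installed
    import lucene
    from lupyne import server

    # -Xrs asks the VM to ignore keyboard interrupts, as recommended above
    lucene.initVM(vmargs="-Xrs")
    root = server.WebSearcher(directory)
    # caches and field settings could be applied to root here, before starting
    server.start(root, config={"global": {"server.socket_port": port}})
```

When daemonizing, the VM would instead need to be initialized after the process has daemonized, for example from the `callback` argument of `start`.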

Note

Lucene doc ids are ephemeral; only use doc ids across requests for the same index version.

tools

CherryPy tools enabled by default: tools.{json_in,json_out,allow,timer,validate}.on

lupyne.server.json_in(process_body=None, **kwargs)

Handle request bodies in json format.

Parameters:
    content_type – request media type
    process_body – optional function to process body into request.params

lupyne.server.json_out(content_type='application/json', indent=None, **kwargs)

Handle responses in json format.

Parameters:
    content_type – response content-type header
    indent – indentation level for pretty printing

lupyne.server.allow(methods=None, paths=(), **kwargs): Only allow specified methods.

lupyne.server.timer(): Return response time in headers.

lupyne.server.validate(etag=True, last_modified=False, max_age=None, expires=None)

Return and validate caching headers.

Parameters:
    etag – return weak entity tag header based on index version and validate if-match headers
    last_modified – return last-modified header based on index timestamp and validate if-modified headers
    max_age – return cache-control max-age and age headers based on last update timestamp
    expires – return expires header offset from last update timestamp
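
On the client side, these headers allow cached responses to be revalidated. The sketch below builds a conditional GET without sending it. If-None-Match is the standard HTTP header for revalidating a cached GET; that this server honors it alongside the validation described above is an assumption, as are the URL and the entity tag value.

```python
from urllib.request import Request

def conditional_get(url, etag=None, last_modified=None):
    """Build a GET request that lets the server answer 304 Not Modified."""
    request = Request(url, method="GET")
    if etag is not None:
        request.add_header("If-None-Match", etag)
    if last_modified is not None:
        request.add_header("If-Modified-Since", last_modified)
    return request

request = conditional_get(
    "http://localhost:8080/search?q=text:hello",  # assumed host, port, and query
    etag='W/"7"',                                 # hypothetical weak entity tag
)
print(request.header_items())
```

With `urllib.request.urlopen`, a 304 response surfaces as an `HTTPError` whose `code` is 304, at which point the cached copy can be reused.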

WebSearcher

class lupyne.server.WebSearcher(*directories, **kwargs)

Dispatch root with a delegated Searcher.

Parameters:	hosts – ordered hosts to synchronize with

Changed in version 1.2: automatic synchronization and promotion

fields: optional field settings will trigger indexer promotion when synchronized hosts are exhausted

autoupdate: optional autoupdate timer for use upon indexer promotion

docs(name=None, value='', **options)

Return ids or documents.

GET /docs

Return array of doc ids.

return:	[int,... ]

GET /docs/[int|chars/chars]?

Return document mapping from id or unique name and value.

&fields=chars,... &fields.multi=chars,... &fields.indexed=chars[:chars],...: optionally select stored, multi-valued, and cached indexed fields
&fields.vector=chars,... &fields.vector.counts=chars,...: optionally select term vectors with term counts

return:	{string: null|string|number|array|object,... }

index(host='', path='')

Return index information and synchronize with remote index.

GET, POST /[index]

Return a mapping of the directory to the document count. Add new segments from remote host.

{"host": string[, "path": string]}

return:	{string: int,... }

queries(name='', value='')

Match a document against registered queries. Queries are cached by a unique name and value, suitable for document indexing.

New in version 1.4.

GET /queries

Return query set names.

return:	[string,... ]

GET, POST /queries/chars

Return query values and scores which match given document.

{string: string,... }

return:	{string: number,... }

GET, PUT, DELETE /queries/chars/chars

Return, create, or delete a registered query.

string

return:	string

search(q=None, count=None, start=0, fields=None, sort=None, facets='', group='', hl='', mlt=None, spellcheck=0, timeout=None, **options)

Run query and return documents.

GET /search?

Return array of document objects and total doc count.

&q=chars&q.type=[term|prefix|wildcard]&q.chars=...,: query, optional type to skip parsing, and optional parser settings: q.field, q.op,...

&filter=chars: cached filter applied to the query; if a previously cached filter is not found, the value will be parsed as a query

&count=int&start=0: maximum number of docs to return and offset to start at

&fields=chars,... &fields.multi=chars,... &fields.indexed=chars[:chars],...: only include selected stored fields; multi-valued fields returned in an array; indexed fields with optional type are cached

&sort=[-]chars[:chars],... &sort.scores[=max]: field name, optional type, minus sign indicates descending; optionally score docs, additionally compute maximum score

&facets=chars,... &facets.count=int&facets.min=0: include facet counts for given field names; facets filters are cached; optional maximum number of most populated facet values per field, and minimum count to return

&group=chars[:chars]&group.count=1: group documents by field value with optional type, up to given maximum count

Changed in version 1.6: grouping searches use count and start options

&hl=chars,... &hl.count=1&hl.tag=strong&hl.enable=[fields|terms]: stored fields to return highlighted; optional maximum fragment count and html tag name; optionally enable matching any field or any term

&mlt=int&mlt.fields=chars,... &mlt.chars=...,: doc index (or id without a query) to find MoreLikeThis; optional document fields to match; optional MoreLikeThis settings: mlt.minTermFreq, mlt.minDocFreq,...

&spellcheck=int: maximum number of spelling corrections to return for each query term, grouped by field; original query is still run; use q.spellcheck=true to affect query parsing

&timeout=number: timeout search after elapsed number of seconds

return:

{
"query": string|null,
"count": int|null,
"maxscore": number|null,
"docs": [{"__id__": int, "__score__": number, "__keys__": array,
"__highlights__": {string: array,... }, string: value,... },... ],
"facets": {string: {string: int,... },... },
"groups": [{"count": int, "value": value, "docs": [object,... ]},... ],
"spellcheck": {string: {string: [string,... ],... },... }
}
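
To make the parameter list above concrete, here is a small sketch that encodes a search request and walks a response of the documented shape. The host, port, field names, and the sample response are all hypothetical; only the parameter names and response keys come from the description above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def search_url(base, q, options=None):
    """Return a GET /search URL for the query q and optional parameters."""
    params = {"q": q}
    params.update(options or {})
    return base + "/search?" + urlencode(params)

url = search_url(
    "http://localhost:8080",  # assumed host and port
    "text:hello",             # hypothetical field and term
    {"count": 10, "start": 20, "fields": "name,date", "sort": "-date",
     "hl": "text", "hl.count": 2},
)
# decoding the query string recovers the original parameters
print(parse_qs(urlsplit(url).query)["q"])

# hypothetical response in the documented shape, as if decoded by json.loads
response = {
    "query": "text:hello",
    "count": 1,
    "maxscore": 1.0,
    "docs": [{"__id__": 0, "__score__": 1.0, "name": "sample",
              "__highlights__": {"text": ["<strong>hello</strong> world"]}}],
}
for doc in response["docs"]:
    print(doc["__id__"], doc["__score__"], doc.get("__highlights__", {}))
```

Dotted names such as `hl.count` are not valid Python keyword arguments, which is why the options are passed as a dict.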

terms(name='', value='*', *path, **options)

Return data about indexed terms.

GET /terms?

Return field names, with optional selection.

&indexed=true|false

return:	[string,... ]

GET /terms/chars[:int|float]?step=0

Return term values for given field name, with optional type and step for numeric encoded values.

return:	[string,... ]

GET /terms/chars/chars[*|:chars|~[int]]

Return term values (prefix, slices, or fuzzy terms) for given field name.

return:	[string,... ]

GET /terms/chars/chars[*|~[int]]?count=int

Return spellchecked term values ordered by decreasing document frequency. Prefixes (*) are optimized to be suitable for real-time query suggestions; all terms are cached.

return:	[string,... ]

GET /terms/chars/chars

Return document count for given term.

return:	int

GET /terms/chars/chars/docs

Return document ids for given term.

return:	[int,... ]

GET /terms/chars/chars/docs/counts

Return document ids and frequency counts for given term.

return:	[[int, int],... ]

GET /terms/chars/chars/docs/positions

Return document ids and positions for given term.

return:	[[int, [int,... ]],... ]
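
The path patterns for /terms can be easy to confuse, so the following sketch assembles a few of them. The helper name `terms_path` and the field and term values are hypothetical; the suffixes (`*`, `~int`, and the `/docs/...` sub-resources) follow the patterns listed above.

```python
from urllib.parse import quote

def terms_path(field, value="", suffix=""):
    """Build a /terms path; suffix is '*', ':stop', or '~distance' when given."""
    path = "/terms/" + quote(field, safe="")
    if value or suffix:
        # percent-encode the term itself but leave the pattern suffix readable
        path += "/" + quote(value, safe="") + suffix
    return path

print(terms_path("text"))                              # term values for a field
print(terms_path("text", "hel", "*"))                  # prefix match
print(terms_path("text", "hello", "~1"))               # fuzzy terms
print(terms_path("text", "hello"))                     # document count for a term
print(terms_path("text", "hello") + "/docs/counts")    # doc ids with frequency counts
```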

update(**caches)

Refresh index version.

POST /update

Reopen searcher, optionally reloading caches, and return document count.

{"filters"|"sorters"|"spellcheckers": true,... }

Changed in version 1.2: request body is an object instead of an array

return:	int

WebIndexer

class lupyne.server.WebIndexer(*args, **kwargs)

Bases: lupyne.server.WebSearcher

Dispatch root with a delegated Indexer, exposing write methods.

docs(name=None, value='', **options)

Add or return documents. See WebSearcher.docs() for GET method.

POST /docs

Add documents to index.

[{string: string|number|array,... },... ]

PUT, DELETE /docs/chars/chars

Set or delete document. Unique term should be indexed and is added to the new document.

{string: string|number|array,... }
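
A sketch of replacing and deleting a document by its unique term follows. The requests are built but not sent; the host, port, and the `id` field are assumptions. Because the unique term is added to the new document by the server, the PUT body here omits it.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE = "http://localhost:8080"  # assumed host and port

def unique_doc_request(method, name, value, document=None):
    """Build a PUT or DELETE request for /docs/<name>/<value>."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("only PUT and DELETE address a document by unique term")
    url = "{}/docs/{}/{}".format(BASE, quote(name, safe=""), quote(value, safe=""))
    if method == "PUT":
        data = json.dumps(document or {}).encode("utf-8")
        headers = {"Content-Type": "application/json"}
        return Request(url, data=data, headers=headers, method="PUT")
    return Request(url, method="DELETE")

replace = unique_doc_request("PUT", "id", "42", {"text": "updated text"})
delete = unique_doc_request("DELETE", "id", "42")
print(replace.get_method(), replace.full_url)
print(delete.get_method(), delete.full_url)
```

As with other writes, the change would not be visible to searches until the update resource commits it.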

fields(name='', **settings)

Return or store a field’s settings.

GET /fields

Return known field names.

return:	[string,... ]

GET, PUT /fields/chars

Set and return settings for given field name.

{"stored"|"indexed"|...: string|true|false,... }

Changed in version 1.6: lucene FieldType attributes used as settings

return:	{"stored"|"indexed"|...: string|true|false,... }

index()

Add indexes. See WebSearcher.index() for GET method.

POST /[index]

Add indexes without optimization.

[string,... ]

search(q=None, **options)

Run or delete a query. See WebSearcher.search() for GET method.

DELETE /search?q=chars: Delete documents which match query.

update(id='', name='', **options)

Commit index changes and refresh index version.

POST /update

Commit write operations and return document count. See WebSearcher.update() for caching options.

{"merge": true|int,... }

Changed in version 1.2: request body is an object instead of an array

return:	int

GET, PUT, DELETE /update/[snapshot|int]

Verify, create, or release unique snapshot of current index commit and return array of referenced filenames.

Changed in version 1.4: lucene identifies snapshots by commit generation; use location header

return:	[string,... ]

GET /update/int/chars

Download index file corresponding to snapshot id and filename.
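
The snapshot resources above suggest a backup sequence: create a snapshot, download each referenced file, then release the snapshot. The sketch below lists the download and release steps for a snapshot that already exists. The helper name, snapshot id, and filenames are hypothetical; in practice the id would come from the response to PUT /update/snapshot (see the location header) and the filenames from its body.

```python
from urllib.parse import quote

def snapshot_plan(snapshot_id, filenames):
    """List the (method, path) steps to copy and then release a snapshot."""
    base = "/update/{:d}".format(snapshot_id)
    steps = [("GET", base + "/" + quote(name, safe="")) for name in filenames]
    steps.append(("DELETE", base))  # release the snapshot once all files are copied
    return steps

for method, path in snapshot_plan(3, ["segments_3", "_0.cfs"]):
    print(method, path)
```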

start

lupyne.server.mount(root, path='', config=None, autoupdate=0, app=None)

Attach root and subscribe to plugins.

Parameters:
    root,path,config – see cherrypy.tree.mount
    autoupdate – see command-line options
    app – optionally replace root on existing app

lupyne.server.start(root=None, path='', config=None, pidfile='', daemonize=False, autoreload=0, autoupdate=0, callback=None)

Attach root, subscribe to plugins, and start server.

Parameters:
    root,path,config – see cherrypy.quickstart
    pidfile,daemonize,autoreload,autoupdate – see command-line options
    callback – optional callback function scheduled after daemonizing