server

usage: lupyne.server [-h] [-r] [-c CONFIG] [-p FILE] [-d]
                     [--autoreload SECONDS] [--autoupdate SECONDS]
                     [--autosync HOST{:PORT}{/PATH},...] [--real-time]
                     [directory [directory ...]]

Restful json cherrypy server.

positional arguments:
  directory             index directories

optional arguments:
  -h, --help            show this help message and exit
  -r, --read-only       expose only read methods; no write lock
  -c CONFIG, --config CONFIG
                        optional configuration file or json object of global
                        params
  -p FILE, --pidfile FILE
                        store the process id in the given file
  -d, --daemonize       run the server as a daemon
  --autoreload SECONDS  automatically reload modules; replacement for
                        engine.autoreload
  --autoupdate SECONDS  automatically update index version and commit any
                        changes
  --autosync HOST{:PORT}{/PATH},...
                        automatically synchronize searcher with remote hosts
                        and update
  --real-time           search in real-time without committing

Restful json CherryPy server.

The server script mounts a WebSearcher (read_only) or WebIndexer root. Standard CherryPy configuration applies, and the provided custom tools are also configurable. All request and response bodies are application/json values.

WebSearcher exposes resources for an IndexSearcher. In addition to search requests, it provides access to term and document information in the index.

WebIndexer extends WebSearcher, exposing additional resources and methods for an Indexer. Single documents may be added, deleted, or replaced by a unique indexed field. Multiple documents may also be added, or deleted by query, at once. By default changes are not visible until the update resource is called to commit a new index version. If a near real-time Indexer is used, then changes are instantly searchable. In that case a commit still hasn’t occurred, so the index-based last-modified header shouldn’t be used for caching.

Custom servers should create and mount WebSearchers and WebIndexers as needed. Caches and field settings can then be applied directly before starting the server. WebSearchers and WebIndexers can of course also be subclassed for custom interfaces.

CherryPy and Lucene VM integration issues:
  • Monitors (such as autoreload) are not compatible with the VM unless threads are attached.
  • WorkerThreads must also be attached to the VM.
  • VM initialization must occur after daemonizing.
  • It is recommended that the VM ignore keyboard interrupts (-Xrs) for a clean server shutdown.
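
A minimal sketch of a custom read-only server that follows the notes above. It assumes PyLucene, CherryPy, and lupyne are installed, that the process is not daemonized, and that passing `vmargs='-Xrs'` to `lucene.initVM` is an acceptable way to set the VM option; the function name `serve` is hypothetical.

```python
def serve(directory):
    """Serve one index directory read-only (sketch; needs PyLucene, CherryPy, lupyne)."""
    # Imports are deferred so that defining this function does not require a JVM.
    import lucene
    from lupyne import server

    # -Xrs asks the VM to ignore keyboard interrupts, as recommended above.
    # This sketch does not daemonize; when daemonizing, VM initialization has to
    # happen afterwards (see the callback parameter of lupyne.server.start).
    lucene.initVM(vmargs='-Xrs')

    # WebSearcher exposes only read resources; WebIndexer would add write methods.
    root = server.WebSearcher(directory)

    # Mounts the root, subscribes plugins, and blocks while serving.
    server.start(root)
```

Caches and field settings would be applied to `root` between construction and `server.start`.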

Note

Lucene doc ids are ephemeral; only use doc ids across requests for the same index version.

tools

CherryPy tools enabled by default: tools.{json_in,json_out,allow,timer,validate}.on

lupyne.server.json_in(process_body=None, **kwargs)

Handle request bodies in json format.

Parameters:
  • content_type – request media type
  • process_body – optional function to process body into request.params
lupyne.server.json_out(content_type='application/json', indent=None, **kwargs)

Handle responses in json format.

Parameters:
  • content_type – response content-type header
  • indent – indentation level for pretty printing
lupyne.server.allow(methods=None, paths=(), **kwargs)

Only allow specified methods.

lupyne.server.timer()

Return response time in headers.

lupyne.server.validate(etag=True, last_modified=False, max_age=None, expires=None)

Return and validate caching headers.

Parameters:
  • etag – return weak entity tag header based on index version and validate if-match headers
  • last_modified – return last-modified header based on index timestamp and validate if-modified headers
  • max_age – return cache-control max-age and age headers based on last update timestamp
  • expires – return expires header offset from last update timestamp
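
As an illustration of configuring these tools, the sketch below builds a json object for the `-c` option. It assumes tool parameters follow CherryPy's usual `tools.<name>.<parameter>` key naming and that the server is run as `python -m lupyne.server`; the index path is a placeholder.

```python
import json

# Tool parameters documented above, keyed in CherryPy's dotted config style.
params = {
    'tools.json_out.indent': 2,            # pretty-print responses
    'tools.validate.last_modified': True,  # also return last-modified headers
    'tools.validate.max_age': 60,          # cache-control max-age in seconds
}

# Hypothetical invocation; "/path/to/index" stands in for a real directory.
argv = ['python', '-m', 'lupyne.server', '--read-only',
        '-c', json.dumps(params), '/path/to/index']
```

The same dictionary could instead be written to a configuration file and passed by name.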

WebSearcher

class lupyne.server.WebSearcher(*directories, **kwargs)

Dispatch root with a delegated Searcher.

Parameters:
  • hosts – ordered hosts to synchronize with

Changed in version 1.2: automatic synchronization and promotion

fields

optional field settings will trigger indexer promotion when synchronized hosts are exhausted

autoupdate

optional autoupdate timer for use upon indexer promotion

docs(name=None, value='', **options)

Return ids or documents.

GET /docs

Return array of doc ids.

return:[int,... ]
GET /docs/[int|chars/chars]?

Return document mapping from id or unique name and value.

&fields=chars,... &fields.multi=chars,... &fields.indexed=chars[:chars],...
optionally select stored, multi-valued, and cached indexed fields
&fields.vector=chars,... &fields.vector.counts=chars,...
optionally select term vectors with term counts
return:{string: null|string|number|array|object,... }
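
A small sketch of building a document URL by unique name and value, with field selection. The base address, the `isbn` field, and the helper name `doc_url` are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode

BASE = 'http://localhost:8080'  # assumed server address


def doc_url(name, value, fields=(), multi=()):
    """Build a GET /docs/<name>/<value> URL with optional field selection."""
    path = '/docs/{}/{}'.format(quote(name, safe=''), quote(str(value), safe=''))
    params = {}
    if fields:
        params['fields'] = ','.join(fields)        # stored fields to return
    if multi:
        params['fields.multi'] = ','.join(multi)   # fields returned as arrays
    query = urlencode(params, safe=',')
    return BASE + path + ('?' + query if query else '')


url = doc_url('isbn', '0-306-40615-2', fields=['title'], multi=['author'])
```

Addressing documents by a unique indexed field avoids relying on doc ids, which are only stable within one index version.
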
index(host='', path='')

Return index information and synchronize with remote index.

GET, POST /[index]

Return a mapping of the directory to the document count. Add new segments from remote host.

{"host": string[, "path": string]}

return:{string: int,... }
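
The synchronization request can be sketched with the standard library as follows. Both addresses and the helper name `sync_request` are hypothetical, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def sync_request(base, host, path=''):
    """Build POST / asking the searcher to add new segments from a remote host."""
    body = {'host': host}
    if path:
        body['path'] = path  # optional path of the remote index
    return Request(base + '/', data=json.dumps(body).encode('utf-8'),
                   headers={'Content-Type': 'application/json'}, method='POST')


request = sync_request('http://localhost:8080', 'primary.example.com:8080')
# urllib.request.urlopen(request) would send it and return the json response.
```
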
queries(name='', value='')

Match a document against registered queries. Queries are cached by a unique name and value, suitable for document indexing.

New in version 1.4.

GET /queries

Return query set names.

return:[string,... ]
GET, POST /queries/chars

Return query values and scores which match given document.

{string: string,... }

return:{string: number,... }
GET, PUT, DELETE /queries/chars/chars

Return, create, or delete a registered query.

string

return:string
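
A sketch of registering a query and then matching a document against the set. The set name `alerts`, the value `python-jobs`, the query text, and the document are invented for illustration; the query syntax is assumed to be whatever the server's parser accepts.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE = 'http://localhost:8080'  # assumed server address


def json_request(method, path, body):
    """Build a request whose body is a json value, as the server expects."""
    return Request(BASE + path, data=json.dumps(body).encode('utf-8'),
                   headers={'Content-Type': 'application/json'}, method=method)


# PUT /queries/<name>/<value>: the body is a json string holding the query.
register = json_request(
    'PUT', '/queries/{}/{}'.format(quote('alerts'), quote('python-jobs')),
    'text:python')

# POST /queries/<name>: the body is the document to match against the set.
match = json_request('POST', '/queries/alerts', {'text': 'senior python developer'})
```

Per the return description above, the response to the second request maps matching query values to scores.
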
search(q=None, count=None, start=0, fields=None, sort=None, facets='', group='', hl='', mlt=None, spellcheck=0, timeout=None, **options)

Run query and return documents.

GET /search?

Return array of document objects and total doc count.

&q=chars&q.type=[term|prefix|wildcard]&q.chars=...,
query, optional type to skip parsing, and optional parser settings: q.field, q.op,...
&filter=chars
cached filter applied to the query
if a previously cached filter is not found, the value will be parsed as a query
&count=int&start=0
maximum number of docs to return and offset to start at
&fields=chars,... &fields.multi=chars,... &fields.indexed=chars[:chars],...
only include selected stored fields; multi-valued fields returned in an array; indexed fields with optional type are cached
&sort=[-]chars[:chars],... &sort.scores[=max]
field name, optional type, minus sign indicates descending
optionally score docs, additionally compute maximum score
&facets=chars,... &facets.count=int&facets.min=0
include facet counts for given field names; facets filters are cached
optional maximum number of most populated facet values per field, and minimum count to return
&group=chars[:chars]&group.count=1
group documents by field value with optional type, up to given maximum count

Changed in version 1.6: grouping searches use count and start options

&hl=chars,... &hl.count=1&hl.tag=strong&hl.enable=[fields|terms]
stored fields to return highlighted
optional maximum fragment count and html tag name
optionally enable matching any field or any term
&mlt=int&mlt.fields=chars,... &mlt.chars=...,
doc index (or id without a query) to find MoreLikeThis
optional document fields to match
optional MoreLikeThis settings: mlt.minTermFreq, mlt.minDocFreq,...
&spellcheck=int
maximum number of spelling corrections to return for each query term, grouped by field
original query is still run; use q.spellcheck=true to affect query parsing
&timeout=number
timeout search after elapsed number of seconds
return:
{
"query": string|null,
"count": int|null,
"maxscore": number|null,
"docs": [{"__id__": int, "__score__": number, "__keys__": array, "__highlights__": {string: array,... }, string: value,... },... ],
"facets": {string: {string: int,... },... },
"groups": [{"count": int, "value": value, "docs": [object,... ]},... ],
"spellcheck": {string: {string: [string,... ],... },... }
}
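
A sketch of composing a search URL and reading hits out of a response. The base address and field names are assumptions, and the sample response is hand-written to match the shape above rather than captured from a server.

```python
from urllib.parse import urlencode


def search_url(base, params):
    """Build GET /search? from a mapping of the options described above."""
    return base + '/search?' + urlencode(params)


def top_hits(result):
    """Return (doc id, score) pairs from a decoded search response."""
    return [(doc['__id__'], doc['__score__']) for doc in result['docs']]


url = search_url('http://localhost:8080',
                 {'q': 'text:lucene', 'count': 10, 'sort': '-year:int'})

# Hand-written sample with the documented keys; not real server output.
sample = {
    'query': 'text:lucene',
    'count': 2,
    'maxscore': 1.5,
    'docs': [{'__id__': 7, '__score__': 1.5, 'title': 'first'},
             {'__id__': 3, '__score__': 0.5, 'title': 'second'}],
}
hits = top_hits(sample)
```

Since doc ids are ephemeral, the ids in `hits` should only be reused against the same index version.
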
terms(name='', value='*', *path, **options)

Return data about indexed terms.

GET /terms?

Return field names, with optional selection.

&indexed=true|false

return:[string,... ]
GET /terms/chars[:int|float]?step=0

Return term values for given field name, with optional type and step for numeric encoded values.

return:[string,... ]
GET /terms/chars/chars[*|:chars|~[int]]

Return term values (prefix, slices, or fuzzy terms) for given field name.

return:[string,... ]
GET /terms/chars/chars[*|~[int]]?count=int

Return spellchecked term values ordered by decreasing document frequency. Prefixes (*) are optimized to be suitable for real-time query suggestions; all terms are cached.

return:[string,... ]
GET /terms/chars/chars

Return document count for given term.

return:int
GET /terms/chars/chars/docs

Return document ids for given term.

return:[int,... ]
GET /terms/chars/chars/docs/counts

Return document ids and frequency counts for given term.

return:[[int, int],... ]
GET /terms/chars/chars/docs/positions

Return document ids and positions for given term.

return:[[int, [int,... ]],... ]
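
The term resources differ only in their path shape, which the sketch below spells out. The field name `title` and the term values are made up, and `terms_path` is a hypothetical helper.

```python
from urllib.parse import quote


def terms_path(field, value=None, suffix=''):
    """Build a /terms path; suffix carries the *, ~int, or :stop marker."""
    path = '/terms/' + quote(field, safe='')
    if value is not None:
        path += '/' + quote(value, safe='') + suffix
    return path


fields = terms_path('title')                       # all terms for a field
prefix = terms_path('title', 'luc', '*')           # terms starting with "luc"
fuzzy = terms_path('title', 'lucine', '~1')        # fuzzy terms near "lucine"
sliced = terms_path('title', 'a', ':m')            # slice of terms from "a" to "m"
counts = terms_path('title', 'lucene') + '/docs/counts'  # doc ids with frequencies
```

Appending `?count=int` to the prefix or fuzzy forms selects the spellchecked variant ordered by document frequency.
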
update(**caches)

Refresh index version.

POST /update

Reopen searcher, optionally reloading caches, and return document count.

{"filters"|"sorters"|"spellcheckers": true,... }

Changed in version 1.2: request body is an object instead of an array

return:int
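
A sketch of the refresh request, naming the caches to reload in the body. The base address and the helper name `update_request` are assumptions; the request is built but not sent.

```python
import json
from urllib.request import Request


def update_request(base, *caches):
    """Build POST /update, asking the server to reload the named caches."""
    body = {name: True for name in caches}
    return Request(base + '/update', data=json.dumps(body).encode('utf-8'),
                   headers={'Content-Type': 'application/json'}, method='POST')


request = update_request('http://localhost:8080', 'filters', 'sorters')
```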

WebIndexer

class lupyne.server.WebIndexer(*args, **kwargs)

Bases: lupyne.server.WebSearcher

Dispatch root with a delegated Indexer, exposing write methods.

docs(name=None, value='', **options)

Add or return documents. See WebSearcher.docs() for GET method.

POST /docs

Add documents to index.

[{string: string|number|array,... },... ]

PUT, DELETE /docs/chars/chars

Set or delete document. Unique term should be indexed and is added to the new document.

{string: string|number|array,... }
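
The write methods might be exercised as in this sketch. It assumes an indexed unique field named `id`; the documents and base address are invented, and nothing is sent.

```python
import json
from urllib.request import Request

BASE = 'http://localhost:8080'  # assumed server address


def json_request(method, path, body):
    """Build a request with a json body."""
    return Request(BASE + path, data=json.dumps(body).encode('utf-8'),
                   headers={'Content-Type': 'application/json'}, method=method)


# POST /docs takes an array of documents.
add = json_request('POST', '/docs', [{'id': '1', 'title': 'first'},
                                     {'id': '2', 'title': 'second'}])

# PUT /docs/<name>/<value> replaces the document with that unique term;
# the term itself is added to the new document by the server.
replace = json_request('PUT', '/docs/id/1', {'title': 'first, revised'})

# DELETE /docs/<name>/<value> removes it and needs no body.
remove = Request(BASE + '/docs/id/2', method='DELETE')
```

Unless a near real-time Indexer is in use, none of these changes are searchable until the update resource is called.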

fields(name='', **settings)

Return or store a field’s settings.

GET /fields

Return known field names.

return:[string,... ]
GET, PUT /fields/chars

Set and return settings for given field name.

{"stored"|"indexed"|...: string|true|false,... }

Changed in version 1.6: lucene FieldType attributes used as settings

return:{"stored"|"indexed"|...: string|true|false,... }
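
A sketch of storing settings for one field. Only the `stored` and `indexed` keys shown above are used; the field name `title`, the base address, and the helper name are assumptions.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def field_request(base, name, **settings):
    """Build PUT /fields/<name> with the given settings as the json body."""
    return Request(base + '/fields/' + quote(name, safe=''),
                   data=json.dumps(settings).encode('utf-8'),
                   headers={'Content-Type': 'application/json'}, method='PUT')


request = field_request('http://localhost:8080', 'title', stored=True, indexed=True)
```
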
index()

Add indexes. See WebSearcher.index() for GET method.

POST /[index]

Add indexes without optimization.

[string,... ]

search(q=None, **options)

Run or delete a query. See WebSearcher.search() for GET method.

DELETE /search?q=chars
Delete documents which match query.
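
Deleting by query reuses the search resource with a different method, as this sketch shows. The query text and base address are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def delete_by_query(base, q):
    """Build DELETE /search?q=... to remove every matching document."""
    return Request(base + '/search?' + urlencode({'q': q}), method='DELETE')


request = delete_by_query('http://localhost:8080', 'status:expired')
```
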
update(id='', name='', **options)

Commit index changes and refresh index version.

POST /update

Commit write operations and return document count. See WebSearcher.update() for caching options.

{"merge": true|int,... }

Changed in version 1.2: request body is an object instead of an array

return:int
GET, PUT, DELETE /update/[snapshot|int]

Verify, create, or release unique snapshot of current index commit and return array of referenced filenames.

Changed in version 1.4: lucene identifies snapshots by commit generation; use location header

return:[string,... ]
GET /update/int/chars
Download index file corresponding to snapshot id and filename.
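
A backup client could combine these resources roughly as sketched below. It assumes the location header returned when a snapshot is created is a path or URL ending in the snapshot id, such as `/update/5`; that value and the filenames are invented.

```python
from urllib.parse import quote, urljoin
from urllib.request import Request

BASE = 'http://localhost:8080'  # assumed server address

# PUT /update/snapshot creates a snapshot; the response lists its filenames.
create = Request(BASE + '/update/snapshot', method='PUT')


def file_urls(base, location, filenames):
    """Build one download URL per file from the snapshot's location header."""
    snapshot = urljoin(base, location)
    return [snapshot + '/' + quote(name, safe='') for name in filenames]


def release(base, location):
    """Build DELETE for the snapshot so the server can release its files."""
    return Request(urljoin(base, location), method='DELETE')


# Hypothetical header value and file list, as a server might return them.
urls = file_urls(BASE, '/update/5', ['segments_5', '_0.cfs'])
done = release(BASE, '/update/5')
```

Releasing the snapshot after the downloads finish lets the index reclaim files that later commits no longer reference.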

start

lupyne.server.mount(root, path='', config=None, autoupdate=0, app=None)

Attach root and subscribe to plugins.

Parameters:
  • root,path,config – see cherrypy.tree.mount
  • autoupdate – see command-line options
  • app – optionally replace root on existing app
lupyne.server.start(root=None, path='', config=None, pidfile='', daemonize=False, autoreload=0, autoupdate=0, callback=None)

Attach root, subscribe to plugins, and start server.

Parameters:
  • root,path,config – see cherrypy.quickstart
  • pidfile,daemonize,autoreload,autoupdate – see command-line options
  • callback – optional callback function scheduled after daemonizing