server
usage: lupyne.server [-h] [-r] [-c CONFIG] [-p FILE] [-d]
                     [--autoreload SECONDS] [--autoupdate SECONDS]
                     [--autosync HOST{:PORT}{/PATH},...] [--real-time]
                     [directory [directory ...]]
Restful json cherrypy server.
positional arguments:
  directory             index directories
optional arguments:
  -h, --help            show this help message and exit
  -r, --read-only       expose only read methods; no write lock
  -c CONFIG, --config CONFIG
                        optional configuration file or json object of global
                        params
  -p FILE, --pidfile FILE
                        store the process id in the given file
  -d, --daemonize       run the server as a daemon
  --autoreload SECONDS  automatically reload modules; replacement for
                        engine.autoreload
  --autoupdate SECONDS  automatically update index version and commit any
                        changes
  --autosync HOST{:PORT}{/PATH},...
                        automatically synchronize searcher with remote hosts
                        and update
  --real-time           search in real-time without committing
Restful json CherryPy server.
The server script mounts a WebSearcher (read_only) or WebIndexer root. Standard CherryPy configuration applies, and the provided custom tools are also configurable. All request and response bodies are application/json values.
WebSearcher exposes resources for an IndexSearcher. In addition to search requests, it provides access to term and document information in the index.
WebIndexer extends WebSearcher, exposing additional resources and methods for an Indexer.
Single documents may be added, deleted, or replaced by a unique indexed field.
Multiple documents may also be added, or deleted by query, at once.
By default, changes are not visible until the update resource is called to commit a new index version.
If a near real-time Indexer is used, changes are searchable immediately.
In that case no commit has occurred yet, so the index-based last-modified header
should not be used for caching.
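The add-then-commit flow described above can be sketched with the standard library alone. The base URL and the helper name `json_request` are assumptions for illustration, and the document fields are made up; only the `/docs` and `/update` resources come from this page. The requests are built but not sent.

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # assumed address of a running WebIndexer


def json_request(method, path, body=None):
    """Build (but do not send) a request with an optional application/json body."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE + path, data=data, headers=headers, method=method)


# 1. Add documents; without a near real-time indexer they are not searchable yet.
add = json_request("POST", "/docs", [{"name": "sample", "text": "hello world"}])
# 2. Commit a new index version so the additions become visible to searches.
commit = json_request("POST", "/update")

# Sending is left to the caller, for example: urllib.request.urlopen(add)
```

Keeping the commit as a separate request means many additions can be batched before paying for a single index refresh.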
Custom servers should create and mount WebSearchers and WebIndexers as needed.
Caches and field settings can then be applied directly before starting the server.
WebSearchers and WebIndexers can also be subclassed for custom interfaces.
CherryPy and Lucene VM integration issues:
- Monitors (such as autoreload) are not compatible with the VM unless their threads are attached.
- WorkerThreads must also be attached to the VM.
- VM initialization must occur after daemonizing.
- The VM should ignore keyboard interrupts (-Xrs) for clean server shutdown.
Note
Lucene doc ids are ephemeral; only use doc ids across requests for the same index version.
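One way a client might respect this note is to store doc ids together with a token for the index version and discard them when the token changes. The sketch below is hypothetical: the class `DocIdCache` is not part of lupyne, and using the response's entity tag as the version token is an assumption based on the validate tool described on this page.

```python
class DocIdCache:
    """Hypothetical client-side cache: doc ids are kept only for one index version."""

    def __init__(self):
        self.version = None  # version token of the ids currently held
        self.ids = {}

    def store(self, version, key, doc_ids):
        if version != self.version:
            # A new index version invalidates every previously seen doc id.
            self.version, self.ids = version, {}
        self.ids[key] = list(doc_ids)

    def lookup(self, version, key):
        if version != self.version:
            return None  # ids belong to a different index version
        return self.ids.get(key)


cache = DocIdCache()
cache.store('W/"1"', "text:hello", [0, 3])
```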
tools
CherryPy tools enabled by default: tools.{json_in,json_out,allow,timer,validate}.on
lupyne.server.json_in(process_body=None, **kwargs)
Handle request bodies in json format.
Parameters:
- content_type – request media type
- process_body – optional function to process body into request.params
lupyne.server.json_out(content_type='application/json', indent=None, **kwargs)
Handle responses in json format.
Parameters:
- content_type – response content-type header
- indent – indentation level for pretty printing
lupyne.server.validate(etag=True, last_modified=False, max_age=None, expires=None)
Return and validate caching headers.
Parameters:
- etag – return weak entity tag header based on index version and validate if-match headers
- last_modified – return last-modified header based on index timestamp and validate if-modified headers
- max_age – return cache-control max-age and age headers based on last update timestamp
- expires – return expires header offset from last update timestamp
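Since the tools are configurable through standard CherryPy configuration, their parameters can presumably be set with keys of the form `tools.<name>.<parameter>`. The sketch below builds such a mapping and serializes it as a json object of the kind the `-c` option accepts. The chosen values are arbitrary, and the exact key spelling is an assumption derived from the tool signatures above, so check it against a running server.

```python
import json

# Assumed CherryPy-style keys: tools.<tool name>.<parameter name>
config = {
    "tools.json_out.indent": 2,            # pretty-print responses
    "tools.validate.last_modified": True,  # also emit last-modified headers
    "tools.validate.max_age": 30,          # cache-control max-age in seconds
}

# A json object like this could be given as the -c/--config argument.
config_arg = json.dumps(config)
```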
WebSearcher
class lupyne.server.WebSearcher(*directories, **kwargs)
Dispatch root with a delegated Searcher.
Parameters: hosts – ordered hosts to synchronize with
Changed in version 1.2: automatic synchronization and promotion
fields
  optional field settings will trigger indexer promotion when synchronized hosts are exhausted
autoupdate
  optional autoupdate timer for use upon indexer promotion
docs(name=None, value='', **options)
Return ids or documents.
- GET /docs
  Return array of doc ids.
  return: [int,... ]
- GET /docs/[int|chars/chars]?
  Return document mapping from id or unique name and value.
  - &fields=chars,... &fields.multi=chars,... &fields.indexed=chars[:chars],...
    optionally select stored, multi-valued, and cached indexed fields
  - &fields.vector=chars,... &fields.vector.counts=chars,...
    optionally select term vectors with term counts
  return: {string: null|string|number|array|object,... }
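As a rough illustration of the two document URL forms above, the following helper builds a `/docs` URL from either a doc id or a unique name and value, with optional field selection. The function name `doc_url`, the base address, and the field names are hypothetical; only the path and parameter shapes come from this page.

```python
from urllib.parse import quote, urlencode


def doc_url(base, name_or_id, value=None, fields=None):
    """Build /docs/<id> or /docs/<name>/<value>, with optional field selection."""
    parts = [name_or_id] if value is None else [name_or_id, value]
    path = "/".join(quote(str(part), safe="") for part in parts)
    # Each selection is a comma-separated list, e.g. fields=title,date
    query = urlencode({key: ",".join(names) for key, names in (fields or {}).items()}, safe=",")
    return f"{base}/docs/{path}" + (f"?{query}" if query else "")


by_id = doc_url("http://localhost:8080", 7)
by_name = doc_url(
    "http://localhost:8080", "name", "sample doc",
    fields={"fields": ["title", "date"], "fields.multi": ["tags"]},
)
```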
index(host='', path='')
Return index information and synchronize with remote index.
- GET, POST /[index]
  Return a mapping of the directory to the document count. Add new segments from remote host.
  {"host": string[, "path": string]}
  return: {string: int,... }
queries(name='', value='')
Match a document against registered queries. Queries are cached by a unique name and value, suitable for document indexing.
New in version 1.4.
- GET /queries
  Return query set names.
  return: [string,... ]
- GET, POST /queries/chars
  Return query values and scores which match given document.
  {string: string,... }
  return: {string: number,... }
- GET, PUT, DELETE /queries/chars/chars
  Return, create, or delete a registered query.
  string
  return: string
search(q=None, count=None, start=0, fields=None, sort=None, facets='', group='', hl='', mlt=None, spellcheck=0, timeout=None, **options)
Run query and return documents.
- GET /search?
  Return array of document objects and total doc count.
  - &q=chars&q.type=[term|prefix|wildcard]&q.chars=...,
    query, optional type to skip parsing, and optional parser settings: q.field, q.op,...
  - &filter=chars
    cached filter applied to the query; if a previously cached filter is not found, the value will be parsed as a query
  - &count=int&start=0
    maximum number of docs to return and offset to start at
  - &fields=chars,... &fields.multi=chars,... &fields.indexed=chars[:chars],...
    only include selected stored fields; multi-valued fields returned in an array; indexed fields with optional type are cached
  - &sort=[-]chars[:chars],... &sort.scores[=max]
    field name, optional type, minus sign indicates descending; optionally score docs, additionally compute maximum score
  - &facets=chars,... &facets.count=int&facets.min=0
    include facet counts for given field names; facet filters are cached; optional maximum number of most populated facet values per field, and minimum count to return
  - &group=chars[:chars]&group.count=1
    group documents by field value with optional type, up to given maximum count
    Changed in version 1.6: grouping searches use count and start options
  - &hl=chars,... &hl.count=1&hl.tag=strong&hl.enable=[fields|terms]
    stored fields to return highlighted; optional maximum fragment count and html tag name; optionally enable matching any field or any term
  - &mlt=int&mlt.fields=chars,... &mlt.chars=...,
    doc index (or id without a query) to find MoreLikeThis; optional document fields to match; optional MoreLikeThis settings: mlt.minTermFreq, mlt.minDocFreq,...
  - &spellcheck=int
    maximum number of spelling corrections to return for each query term, grouped by field; original query is still run; use q.spellcheck=true to affect query parsing
  - &timeout=number
    timeout search after elapsed number of seconds
  return: {
    "query": string|null,
    "count": int|null,
    "maxscore": number|null,
    "docs": [{"__id__": int, "__score__": number, "__keys__": array, "__highlights__": {string: array,... }, string: value,... },... ],
    "facets": {string: {string: int,... },... },
    "groups": [{"count": int, "value": value, "docs": [object,... ]},... ],
    "spellcheck": {string: {string: [string,... ],... },... },
  }
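To make the parameter list more concrete, here is a small sketch that assembles a search URL and reads ids and scores out of a result shaped like the documented return value. The helper names, the base address, the field names, and the sample result are all hypothetical; the sample is hand-written illustration data, not output from a real server.

```python
from urllib.parse import urlencode


def search_url(base, q, options=None):
    """Build a GET /search URL; option names may be dotted, e.g. 'facets.count'."""
    params = {"q": q, **(options or {})}
    return f"{base}/search?" + urlencode(params, safe=":,")


def ids_and_scores(result):
    """Pull (doc id, score) pairs out of a decoded search response."""
    return [(doc["__id__"], doc["__score__"]) for doc in result["docs"]]


url = search_url(
    "http://localhost:8080", "text:hello world",
    {"count": 10, "sort": "-date:int", "facets": "tag"},
)

# Hand-written example shaped like the documented response.
sample = {
    "query": "text:hello text:world",
    "count": 1,
    "maxscore": 1.5,
    "docs": [{"__id__": 0, "__score__": 1.5, "title": "greeting"}],
}
pairs = ids_and_scores(sample)
```

Remember that the `__id__` values are only meaningful for the index version that produced them.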
terms(name='', value='*', *path, **options)
Return data about indexed terms.
- GET /terms?
  Return field names, with optional selection.
  &indexed=true|false
  return: [string,... ]
- GET /terms/chars[:int|float]?step=0
  Return term values for given field name, with optional type and step for numeric encoded values.
  return: [string,... ]
- GET /terms/chars/chars[*|:chars|~[int]]
  Return term values (prefix, slices, or fuzzy terms) for given field name.
  return: [string,... ]
- GET /terms/chars/chars[*|~[int]]?count=int
  Return spellchecked term values ordered by decreasing document frequency. Prefixes (*) are optimized to be suitable for real-time query suggestions; all terms are cached.
  return: [string,... ]
- GET /terms/chars/chars
  Return document count for given term.
  return: int
- GET /terms/chars/chars/docs
  Return document ids for given term.
  return: [int,... ]
- GET /terms/chars/chars/docs/counts
  Return document ids and frequency counts for given term.
  return: [[int, int],... ]
- GET /terms/chars/chars/docs/positions
  Return document ids and positions for given term.
  return: [[int, [int,... ]],... ]
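The term resources nest by path segment, which a helper like the one below can mirror. `terms_url` and the field and term values are hypothetical; the sketch only shows how the documented paths compose, keeping `*`, `~`, and `:` unescaped because they carry meaning in the term segment.

```python
from urllib.parse import quote


def terms_url(base, field=None, term=None, *suffix):
    """Compose /terms[/field[/term[/docs[/counts|positions]]]]."""
    parts = ["terms"]
    if field is not None:
        parts.append(quote(field, safe=":"))
    if term is not None:
        parts.append(quote(term, safe="*~:"))
    parts.extend(suffix)
    return base + "/" + "/".join(parts)


BASE = "http://localhost:8080"  # assumed server address
field_names = terms_url(BASE)
prefix_terms = terms_url(BASE, "text", "hel*")
fuzzy_terms = terms_url(BASE, "text", "hello~1")
term_counts = terms_url(BASE, "text", "hello", "docs", "counts")
```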
WebIndexer
class lupyne.server.WebIndexer(*args, **kwargs)
Bases: lupyne.server.WebSearcher
Dispatch root with a delegated Indexer, exposing write methods.
docs(name=None, value='', **options)
Add or return documents. See WebSearcher.docs() for GET method.
- POST /docs
  Add documents to index.
  [{string: string|number|array,... },... ]
- PUT, DELETE /docs/chars/chars
  Set or delete document. Unique term should be indexed and is added to the new document.
  {string: string|number|array,... }
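Replacing or deleting a single document by its unique term might look like the sketch below. The helper `doc_request`, the base address, and the field names are hypothetical, and the requests are only constructed, not sent. Note that the body of the PUT omits the unique field, since the page states that the unique term is added to the new document.

```python
import json
import urllib.request
from urllib.parse import quote

BASE = "http://localhost:8080"  # assumed address of a running WebIndexer


def doc_request(method, name, value, document=None):
    """Build a PUT or DELETE request for /docs/<name>/<value>."""
    url = f"{BASE}/docs/{quote(name, safe='')}/{quote(value, safe='')}"
    data = None if document is None else json.dumps(document).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


replace = doc_request("PUT", "name", "sample", {"text": "hello again"})
delete = doc_request("DELETE", "name", "sample")
```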
fields(name='', **settings)
Return or store a field’s settings.
- GET /fields
  Return known field names.
  return: [string,... ]
- GET, PUT /fields/chars
  Set and return settings for given field name.
  {"stored"|"indexed"|...: string|true|false,... }
  Changed in version 1.6: lucene FieldType attributes used as settings
  return: {"stored"|"indexed"|...: string|true|false,... }
index()
Add indexes. See WebSearcher.index() for GET method.
- POST /[index]
  Add indexes without optimization.
  [string,... ]
search(q=None, **options)
Run or delete a query. See WebSearcher.search() for GET method.
- DELETE /search?q=chars
  Delete documents which match query.
update(id='', name='', **options)
Commit index changes and refresh index version.
- POST /update
  Commit write operations and return document count. See WebSearcher.update() for caching options.
  {"merge": true|int,... }
  Changed in version 1.2: request body is an object instead of an array
  return: int
- GET, PUT, DELETE /update/[snapshot|int]
  Verify, create, or release unique snapshot of current index commit and return array of referenced filenames.
  Changed in version 1.4: lucene identifies snapshots by commit generation; use location header
  return: [string,... ]
- GET /update/int/chars
  Download index file corresponding to snapshot id and filename.
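A backup client could combine the snapshot resources roughly as follows: create a snapshot, take its id from the location header, then fetch each referenced file. The sketch covers only the last step, turning an id and a filename array into download URLs. The function name, the base address, the snapshot id, and the filenames are hypothetical, and how the id is extracted from the location header is not specified on this page.

```python
from urllib.parse import quote


def snapshot_file_urls(base, snapshot_id, filenames):
    """Map a snapshot id and its referenced filenames to /update/<id>/<filename> URLs."""
    return [
        f"{base}/update/{int(snapshot_id)}/{quote(name, safe='')}"
        for name in filenames
    ]


# Illustrative values only; real filenames come from the snapshot response body.
urls = snapshot_file_urls("http://localhost:8080", 2, ["segments_2", "_0.cfs"])
```

Releasing the snapshot with DELETE once the files are copied lets the index reclaim the referenced files.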
start
lupyne.server.mount(root, path='', config=None, autoupdate=0, app=None)
Attach root and subscribe to plugins.
Parameters:
- root,path,config – see cherrypy.tree.mount
- autoupdate – see command-line options
- app – optionally replace root on existing app
lupyne.server.start(root=None, path='', config=None, pidfile='', daemonize=False, autoreload=0, autoupdate=0, callback=None)
Attach root, subscribe to plugins, and start server.
Parameters:
- root,path,config – see cherrypy.quickstart
- pidfile,daemonize,autoreload,autoupdate – see command-line options
- callback – optional callback function scheduled after daemonizing