More Like This (MLT) is a feature of Solr which provides for comparisons of documents; you can ask Solr to tell you about any More documents it has that are Like This one.
An MLT query can be part of a standard query (see More Like This), in which case you’re asking Solr to tell you not only about the immediate query results, but also about any other documents that are similar to those results.
Alternatively, you can feed Solr an entire document that is not already in its index, and ask to do an MLT query on that document.
The first case is covered above in More Like This; the second case we’ll show here.
Instead of calling the query method on the interface, we call the mlt_query method.
si.mlt_query(content=open("localfile").read())
We give the MLT handler some content (sourced in this case from a local file); the MLT query will take this text, analyze it, and retrieve documents that are similar according to the results of its analysis.
Because we haven’t specified which fields we care about, the similarity is calculated on whichever field your Solr configuration defines as the default search field.
The results are returned in the same format as illustrated in the mlt() method.
If we wanted similarity to be calculated with respect to a different field or fields, that can be specified too:
si.mlt_query(content=open("localfile").read(), fields="name")
si.mlt_query(content=open("localfile").read(), fields=["name", "author_t"])
We can understand a little more about why we get the results we do by asking for the result of the MLT document analysis.
si.mlt_query(content=open("localfile").read(), interestingTerms="list")
si.mlt_query(content=open("localfile").read(), interestingTerms="details")
“list” will return a list of the interesting terms extracted; “details” will also provide details of the boost used for each term.
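To make the field and interesting-terms options more concrete, here is a minimal sketch of how they might correspond to request parameters. It assumes the options end up as Solr's documented mlt.fl and mlt.interestingTerms parameters; the text above does not show how the library builds its requests, so mlt_params is a hypothetical helper for illustration and not part of the library.

```python
def mlt_params(fields=None, interesting_terms=None):
    """Hypothetical helper: map the options onto Solr MLT parameter names.

    Assumption: the library sends these as mlt.fl / mlt.interestingTerms.
    """
    params = {}
    if fields is not None:
        # Accept a single field name or a list of names.
        if isinstance(fields, str):
            fields = [fields]
        params["mlt.fl"] = ",".join(fields)
    if interesting_terms is not None:
        if interesting_terms not in ("list", "details"):
            raise ValueError("interesting_terms must be 'list' or 'details'")
        params["mlt.interestingTerms"] = interesting_terms
    return params


single = mlt_params(fields="name")
multiple = mlt_params(fields=["name", "author_t"], interesting_terms="details")
```

Accepting either a string or a list mirrors the two call styles shown for the fields option.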
If the document you’re supplying is not encoded in UTF-8 (or, equivalently, ASCII), then you need to specify the charset in use, using one of the names listed at http://docs.python.org/library/codecs.html#standard-encodings:
si.mlt_query(content=open("localfile").read(), content_charset="iso-8859-1")
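If you are not sure whether a file needs content_charset at all, you can check whether its bytes decode cleanly as UTF-8 before building the query. The following is a rough sketch using only the standard library; pick_charset and its fallback argument are hypothetical names, and the fallback is a guess you supply yourself rather than something that is detected.

```python
import codecs


def pick_charset(raw, fallback="iso-8859-1"):
    """Hypothetical helper: decide whether content_charset is needed.

    Returns None when the bytes are valid UTF-8 (which includes plain
    ASCII), otherwise returns the fallback codec name.
    """
    # Raises LookupError if the fallback is not a codec Python knows.
    codecs.lookup(fallback)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return None


utf8_bytes = "café".encode("utf-8")
latin1_bytes = "café".encode("iso-8859-1")
needs_nothing = pick_charset(utf8_bytes)
needs_charset = pick_charset(latin1_bytes)
```

A return value of None means the content_charset option can be left out; any other value is a candidate to pass as content_charset. Note that a successful UTF-8 decode does not prove the file was written as UTF-8, so treat this as a convenience check only.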
You can also choose to tell Solr to source the document from the web, by giving the URL for the content rather than supplying it yourself:
si.mlt_query(url="http://example.com/document")
All the other options above still apply to URL-sourced content, except for “content_charset”; that’s up to the webserver where the content is stored.
In all the cases above, you can also specify any of the other options shown in mlt(), apart from “count”.
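The rules above about which options combine can be collected into a small guard that runs before mlt_query is called. This is only a sketch: build_mlt_kwargs is a hypothetical helper, and treating content and url as mutually exclusive is an assumption drawn from the wording above, not confirmed library behaviour.

```python
def build_mlt_kwargs(content=None, url=None, content_charset=None, **options):
    """Hypothetical helper: assemble keyword arguments for mlt_query."""
    if content is not None and url is not None:
        # Assumption: the document comes from one source only.
        raise ValueError("supply either content or url, not both")
    if "count" in options:
        raise ValueError("'count' is not accepted by mlt_query")
    if content_charset is not None and content is None:
        # For URL-sourced content the charset is up to the web server.
        raise ValueError("content_charset only applies to supplied content")
    kwargs = dict(options)
    if content is not None:
        kwargs["content"] = content
        if content_charset is not None:
            kwargs["content_charset"] = content_charset
    elif url is not None:
        kwargs["url"] = url
    return kwargs


from_url = build_mlt_kwargs(url="http://example.com/document", fields="name")
from_text = build_mlt_kwargs(content="some text", content_charset="iso-8859-1")
```

The resulting dictionary could then be unpacked into the call, for example si.mlt_query(**from_url).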
You can perform an MLT query on indexed content in the following way:
si.mlt_query().query(...)
That is, initialize an otherwise empty mlt_query object, and then run queries on it as you would run normal queries. The full range of query operations is supported when composing the query for indexed content:
si.mlt_query().query(title='Whale').exclude(author='Melville').query(si.Q('Moby')|si.Q('Dick'))
The mlt_query() method is chainable in the same way as the query method. There are a few differences to note.
The mlt_query() method takes all of the mlt() options except “count”.