🔍 mSearch finds documents with similar meaning 🔍
Create and configure collections, load documents and query them interactively. mSearch retrieves documents relevant to textual queries or finds documents similar to another document.
mSearch combines the following search methods:
An mSearch collection is a set of documents. Collections are configured at creation time but most of their properties can be changed later. When a collection is created, it is assigned a fixed unique identifier (UUID), and documents can be uploaded to it. Collection configuration consists of search method parameters and of a schema that defines the names and types of document fields that are relevant for searching.
mSearch collections can be queried by:
Supported input document formats include JSON, JSON lines, Excel spreadsheets, PDFs, docx, HTML and others. For a full listing of formats and examples, see Ingesting documents. Documents can be POSTed as raw request bodies or uploaded as multipart/form-data files, in both cases accompanied by the matching content-type. Examples of uploaded documents include flat JSON documents like
{"id": "d1", "title": "Rapunzel", "text": "Once upon a time..."}
or lists of JSON documents like
{
"documents": [
{"doc_id": "d1", "title": "Rapunzel", "text": "Once..."},
{"doc_id": "d2", "title": "The Riddle", "text": "A prince..."}
]
}
or JSON lines like
{"doc_id": "d1", "title": "Rapunzel", "text": "Once..."}
{"doc_id": "d2", "title": "The Riddle", "text": "A prince..."}
or Excel spreadsheets (xlsx) with one document per row. The JSON keys or spreadsheet column names of the source documents are mapped to field names of the collection; these are matched either by their names or by mappings explained under Ingesting documents.
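As a rough sketch of the raw-body upload described above, the following Python snippet serializes a list of documents as JSON lines and prepares a POST request with a content type to match. The base URL, the `/collections/<id>/documents` path, the collection identifier and the `application/x-ndjson` content type are all assumptions made for illustration; they are not taken from the mSearch documentation, so check Ingesting documents for the actual endpoint and headers.

```python
import json
import urllib.request


def to_json_lines(documents):
    """Serialize documents as JSON lines: one JSON object per line, no commas between lines."""
    return "\n".join(json.dumps(doc, ensure_ascii=False) for doc in documents)


def build_upload_request(base_url, collection_id, documents):
    """Prepare (but do not send) a POST request carrying the documents as a raw body."""
    # Hypothetical endpoint path and content type; adjust both to the real API.
    url = f"{base_url}/collections/{collection_id}/documents"
    body = to_json_lines(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


docs = [
    {"doc_id": "d1", "title": "Rapunzel", "text": "Once..."},
    {"doc_id": "d2", "title": "The Riddle", "text": "A prince..."},
]
request = build_upload_request("http://localhost:8000", "my-collection-uuid", docs)
# The request is only constructed here; urllib.request.urlopen(request) would send it.
```

Building the request separately from sending it makes it easy to inspect the body and headers before anything reaches the server.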
mSearch outputs retrieved document records with the following fixed properties:
document_id - document identifier, the value of the document's field typed id in collection schema
score - relevancy score in the range 0..1 for most search methods
source - the search method(s) that retrieved this document
debug - explanation why this document was considered relevant and how its score was computed
Apart from the properties above, the actual content of each retrieved document can be returned in two formats:
sections, each having its type that corresponds to the name of each document field and its content.
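To illustrate how such records might be consumed, here is a minimal Python sketch that filters retrieved records by score and collapses their sections into a mapping from field name to content. The property names `document_id`, `score`, `source` and `sections` follow the description above; the exact JSON shape (a list of sections with `type` and `content` keys, `source` as a list) is an assumption, and the sample records, including the `method-a` source label, are made up for illustration and are not real mSearch output.

```python
def sections_to_fields(record):
    """Collapse a record's sections into a {field name: [content, ...]} mapping."""
    fields = {}
    for section in record.get("sections", []):
        # Assumed keys: "type" names the document field, "content" holds its value.
        fields.setdefault(section["type"], []).append(section["content"])
    return fields


def top_hits(records, min_score=0.5):
    """Keep records scoring at least min_score, ordered from best to worst."""
    kept = [r for r in records if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


# Made-up records for illustration only.
records = [
    {
        "document_id": "d2",
        "score": 0.41,
        "source": ["method-a"],
        "sections": [{"type": "title", "content": "The Riddle"}],
    },
    {
        "document_id": "d1",
        "score": 0.87,
        "source": ["method-a"],
        "sections": [
            {"type": "title", "content": "Rapunzel"},
            {"type": "text", "content": "Once..."},
        ],
    },
]

for hit in top_hits(records):
    print(hit["document_id"], hit["score"], sections_to_fields(hit))
```

Each field maps to a list because a document field could plausibly yield more than one section; if that never happens in practice, a plain string per field would be simpler.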