mSearch API

🔍 mSearch finds documents with similar meaning 🔍

Create and configure collections, load documents and query them interactively. mSearch retrieves documents relevant to textual queries or finds documents similar to another document.

Search methods

mSearch combines the following search methods:

Semantic search - recognizes that words with related meanings, e.g. "teacher" and "instructor", are similar
Full-text search - uses classic TF-IDF scoring to boost the relevance of exact term matches
Knowledge-based search - supports rules such as "two ticket documents with the same assignee are similar"
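
To make the three methods concrete, the following Python sketch scores a pair of documents with each one and blends the results. It illustrates the ideas under stated assumptions and is not mSearch's actual scoring: the word vectors are hand-made toys standing in for a trained embedding model, the TF-IDF variant is one common formulation, and the blend weights and helper names (embed, semantic_score, tfidf_score, rule_score, hybrid_score) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy word vectors standing in for a trained embedding model.
# "teacher" and "instructor" are deliberately placed close together.
WORD_VECTORS = {
    "teacher": (0.9, 0.1, 0.0),
    "instructor": (0.85, 0.15, 0.05),
    "printer": (0.0, 0.2, 0.9),
    "broken": (0.1, 0.1, 0.8),
}


def embed(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return (0.0, 0.0, 0.0)
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_score(query_tokens, doc_tokens):
    # Semantic search: similar meanings give nearby vectors, so cosine is near 1.
    return cosine(embed(query_tokens), embed(doc_tokens))


def tfidf_score(query_tokens, doc_tokens, corpus):
    # Full-text search: only exact term matches count; rarer terms weigh more.
    tf = Counter(doc_tokens)
    score = 0.0
    for term in set(query_tokens):
        df = sum(1 for d in corpus if term in d)
        if tf[term] and df:
            score += (tf[term] / len(doc_tokens)) * math.log(1 + len(corpus) / df)
    return score


def rule_score(doc_a, doc_b):
    # Knowledge-based search: tickets sharing an assignee count as similar.
    assignee = doc_a.get("assignee")
    return 1.0 if assignee is not None and assignee == doc_b.get("assignee") else 0.0


def hybrid_score(doc_a, doc_b, corpus, weights=(0.5, 0.3, 0.2)):
    # Weighted blend of the three signals; these weights are arbitrary placeholders.
    w_sem, w_txt, w_kb = weights
    return (w_sem * semantic_score(doc_a["tokens"], doc_b["tokens"])
            + w_txt * tfidf_score(doc_a["tokens"], doc_b["tokens"], corpus)
            + w_kb * rule_score(doc_a, doc_b))
```

With these toy vectors, semantic_score(["teacher"], ["instructor"]) is close to 1 even though the words differ, whereas tfidf_score only rewards tokens that appear verbatim in both documents.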

Collections

An mSearch collection is a set of documents. Collections are configured at creation time, but most of their properties can be changed later. When a collection is created, it is assigned a fixed unique identifier (UUID), after which documents can be uploaded to it. A collection's configuration consists of search method parameters and a schema that defines the names and types of the document fields relevant to searching.
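
The collection lifecycle can be modeled with a small Python class: the identifier is generated once and exposed read-only, while the schema and search parameters can be reconfigured. This is a hypothetical client-side model, not the mSearch API; the field type names in FIELD_TYPES, the search_params keys, and the choice to drop fields missing from the schema are all assumptions.

```python
import uuid

# Hypothetical field types; the real set is defined by the mSearch schema.
FIELD_TYPES = {"id", "text", "keyword", "number"}


class Collection:
    """Hypothetical client-side model of an mSearch collection."""

    def __init__(self, schema, search_params):
        self._id = str(uuid.uuid4())  # assigned once at creation, never changed
        self.documents = []
        self.configure(schema=schema, search_params=search_params)

    @property
    def id(self):
        # Read-only: there is no setter, so the identifier stays fixed.
        return self._id

    def configure(self, schema=None, search_params=None):
        # Most properties can be changed after creation; the identifier cannot.
        if schema is not None:
            unknown = {name: t for name, t in schema.items() if t not in FIELD_TYPES}
            if unknown:
                raise ValueError(f"unknown field types: {unknown}")
            self.schema = dict(schema)
        if search_params is not None:
            self.search_params = dict(search_params)

    def add_document(self, doc):
        # Assumption: keys not declared in the schema are simply dropped.
        self.documents.append({k: v for k, v in doc.items() if k in self.schema})
```

Attempting c.id = "other" on an instance raises AttributeError, because the property defines no setter.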

Queries

mSearch collections can be queried by:

Text - one or more words or sentences
Existing document - a document already in the searched collection, used to find other documents similar to it
Provided document - a document posted with the query, used to find documents similar to it
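
All three query kinds can be reduced to the same step: build a query vector, then rank the collection's documents by similarity to it. The sketch below assumes a collection held in memory as a dict of document id to text, and uses a bag-of-words vector with cosine similarity in place of mSearch's real semantic model; the search function and its keyword arguments are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    # Bag-of-words stand-in for a real embedding (assumption for this sketch).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(collection, *, text=None, doc_id=None, document=None, top_k=3):
    """Rank documents in `collection` (doc_id -> text) for one of three query kinds."""
    if text is not None:
        query = vectorize(text)                  # 1. free-text query
    elif doc_id is not None:
        query = vectorize(collection[doc_id])    # 2. document already in the collection
    elif document is not None:
        query = vectorize(document)              # 3. document posted with the query
    else:
        raise ValueError("provide text, doc_id or document")
    hits = [
        (other_id, cosine(query, vectorize(body)))
        for other_id, body in collection.items()
        if other_id != doc_id                    # never return the query document itself
    ]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]
```

When querying by an existing document, that document is excluded from the results so that it does not trivially rank first.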

Input document formats

Supported input formats include JSON, JSON lines, Excel spreadsheets, PDF, DOCX, HTML and others. For a full list of formats with examples, see Ingesting documents. Documents can be POSTed either as raw request bodies or as multipart/form-data file uploads, in both cases with the matching content type. Examples of uploadable documents include flat JSON documents like

  {"id": "d1", "title": "Rapunzel", "text": "Once upon a time..."}

or lists of JSON documents like

  {
    "documents": [
      {"doc_id": "d1", "title": "Rapunzel", "text": "Once..."},
      {"doc_id": "d2", "title": "The Riddle", "text": "A prince..."}
    ]
  }

or JSON lines like

  {"doc_id": "d1", "title": "Rapunzel", "text": "Once..."}
  {"doc_id": "d2", "title": "The Riddle", "text": "A prince..."}

or Excel spreadsheets (xlsx) with one document per row. The JSON keys or spreadsheet column names of the source documents are mapped to the collection's field names, either by matching names or through the mappings explained under Ingesting documents.
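
A client might normalize the JSON input shapes before uploading. The Python sketch below accepts a flat JSON document, a {"documents": [...]} wrapper, or JSON lines; renames keys through a mapping; and builds (without sending) a raw POST request with a JSON body. The mapping format, the rule that unmatched keys are dropped, the endpoint URL and the content type mSearch expects are all assumptions; Ingesting documents has the authoritative rules.

```python
import json
import urllib.request


def parse_documents(raw: str) -> list:
    """Accept a flat JSON document, a {"documents": [...]} wrapper, or JSON lines."""
    raw = raw.strip()
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not a single JSON value: treat the input as JSON lines, one object per line.
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    if isinstance(data, dict) and isinstance(data.get("documents"), list):
        return data["documents"]
    return [data]


def map_fields(doc: dict, schema_fields: set, mapping: dict) -> dict:
    # Hypothetical mapping format: source key -> collection field name.
    # Keys that already match a field name pass through; others are dropped (assumption).
    out = {}
    for key, value in doc.items():
        target = mapping.get(key, key)
        if target in schema_fields:
            out[target] = value
    return out


def build_upload_request(url: str, docs: list) -> urllib.request.Request:
    # Raw JSON request body with a matching Content-Type; the URL is hypothetical.
    body = json.dumps({"documents": docs}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

The request object can be passed to urllib.request.urlopen once the real endpoint is known; it is only constructed here.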

Output document formats

mSearch outputs retrieved document records with the following fixed properties:

document_id - the document identifier, i.e. the value of the document field whose type in the collection schema is id
score - relevance score, in the range 0..1 for most search methods
source - the search method(s) that retrieved this document
debug - an explanation of why this document was considered relevant and how its score was computed
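
On the client side, a retrieved record can be read into a typed structure. In this Python sketch the dataclass and function names are hypothetical, and two details are assumptions not stated above: that source may arrive either as a single method name or as a list, and that any keys beyond the four fixed properties carry the document content.

```python
from dataclasses import dataclass, field
from typing import Any

# The four fixed properties every retrieved record carries.
FIXED_PROPERTIES = ("document_id", "score", "source", "debug")


@dataclass
class ResultRecord:
    """Hypothetical typed view of one retrieved record."""
    document_id: str
    score: float   # 0..1 for most search methods
    source: list   # search method(s) that retrieved the document
    debug: Any     # relevance explanation; its structure is not specified here
    content: dict = field(default_factory=dict)


def parse_record(raw: dict) -> ResultRecord:
    # Assumption: `source` may be one method name or a list of names.
    source = raw["source"]
    methods = [source] if isinstance(source, str) else list(source)
    # Assumption: keys beyond the fixed properties hold the document content.
    content = {k: v for k, v in raw.items() if k not in FIXED_PROPERTIES}
    return ResultRecord(raw["document_id"], float(raw["score"]), methods,
                        raw["debug"], content)
```

Sorting parsed records with key=lambda r: r.score and reverse=True then orders them from most to least relevant.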

Apart from the properties above, the actual content of each retrieved document can be returned in two formats:

Sections format: an ordered list of sections, each with a type (the name of the corresponding document field) and its content.
Flat format: the same key-value format as described for the input formats.
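
The two content formats carry the same information, so converting between them is mechanical. This Python sketch assumes each section is a JSON object with "type" and "content" keys and that sections follow field order; the actual key names and ordering used by mSearch may differ.

```python
def flat_to_sections(doc: dict) -> list:
    # One section per field, in the dict's insertion order (assumed field order).
    return [{"type": name, "content": value} for name, value in doc.items()]


def sections_to_flat(sections: list) -> dict:
    # Inverse mapping; if a field name repeats, the last section wins (assumption).
    return {s["type"]: s["content"] for s in sections}
```

Round-tripping a flat document through both functions returns an equal dict, since no information is lost when each field becomes one section.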

API reference