Last commit: 1a22d1360c by Vasiliy on 2019-04-16 18:54:30 +03:00, "issue 526.9.1: implement swagger interface, add documentation (#8730)"

HTTP Interface for Analyzers

The REST API for Analyzers is accessible via HTTP requests to the /_api/analyzer endpoint.

Analyzer Operations

@startDocuBlock post_api_analyzer

@startDocuBlock get_api_analyzer

@startDocuBlock get_api_analyzers

@startDocuBlock delete_api_analyzer
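The following Python sketch illustrates the four operations above. It only builds the requests with the standard library and does not send them. The request shapes are inferred from the DocuBlock names and are assumptions: POST /_api/analyzer to create, GET /_api/analyzer/{name} to read one definition, GET /_api/analyzer to list all, and DELETE /_api/analyzer/{name} to remove one. The server address http://localhost:8529, the helper names, and the name and type body attributes are hypothetical. Only the /_api/analyzer endpoint and the properties attribute come from this page.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical server address; adjust for your deployment.
BASE_URL = "http://localhost:8529"


def _analyzer_url(base_url: str, name: str = "") -> str:
    url = f"{base_url}/_api/analyzer"
    # Percent-encode the name so characters such as ':' survive in the path.
    return f"{url}/{urllib.parse.quote(name, safe='')}" if name else url


def create_analyzer(name: str, type_: str, properties,
                    base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /_api/analyzer with an assumed {name, type, properties} body."""
    body = json.dumps({"name": name, "type": type_, "properties": properties})
    return urllib.request.Request(
        _analyzer_url(base_url),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_analyzer(name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /_api/analyzer/{name}: fetch one definition (assumed path)."""
    return urllib.request.Request(_analyzer_url(base_url, name), method="GET")


def list_analyzers(base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /_api/analyzer: list all definitions (assumed path)."""
    return urllib.request.Request(_analyzer_url(base_url), method="GET")


def delete_analyzer(name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """DELETE /_api/analyzer/{name}: remove one definition (assumed path)."""
    return urllib.request.Request(_analyzer_url(base_url, name), method="DELETE")


# Example: define an n-gram Analyzer using the properties described on this page.
req = create_analyzer("bigram", "ngram", {"min": 2, "max": 2, "preserveOriginal": False})
```

To actually send one of these requests to a running server, pass it to urllib.request.urlopen(). Authentication headers, which a server will typically require, are not shown.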

Analyzer Types

The currently implemented Analyzer types are:

  • identity
  • delimited
  • ngram
  • text

Identity

An analyzer applying the identity transformation, i.e. returning the input unmodified.

The value of the properties attribute is ignored.

Delimited

An analyzer capable of breaking up delimited text into tokens according to RFC 4180 (without starting new records on newlines).

The properties allowed for this analyzer are either:

  • a string containing the delimiter to use
  • an object whose delimiter attribute contains the delimiter string to use
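The following Python sketch shows what "according to RFC 4180 without starting new records on newlines" means in practice. It is a hypothetical local model, not ArangoDB's implementation. A delimiter inside a double-quoted field does not split it, a doubled quote "" inside quotes yields one literal quote, and newlines are ordinary characters, so the whole input is a single record. The helper delimiter_from_properties accepts both forms of properties listed above. How the server handles empty tokens, stray quotes, or multi-character delimiters is not specified here, so the sketch's choices on those points are assumptions.

```python
from typing import Union


def delimiter_from_properties(properties: Union[str, dict]) -> str:
    """Accept either properties form: a bare string or {"delimiter": ...}."""
    if isinstance(properties, str):
        return properties
    return properties["delimiter"]


def split_delimited(text: str, delimiter: str) -> list[str]:
    """Split text into tokens on delimiter, honouring RFC 4180 double quotes.

    Newlines get no special treatment, so the input is always one record.
    Hypothetical local model only; not ArangoDB's implementation.
    """
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    tokens: list[str] = []
    field: list[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if text.startswith('""', i):
                field.append('"')  # doubled quote inside quotes -> literal quote
                i += 2
                continue
            if ch == '"':
                in_quotes = False  # closing quote
            else:
                field.append(ch)  # delimiters and newlines are literal here
        elif ch == '"':
            in_quotes = True  # opening quote (this sketch accepts it anywhere)
        elif text.startswith(delimiter, i):
            tokens.append("".join(field))  # end of the current token
            field = []
            i += len(delimiter)
            continue
        else:
            field.append(ch)
        i += 1
    tokens.append("".join(field))  # the final token has no trailing delimiter
    return tokens
```

For example, split_delimited('a,"b,c",d', delimiter_from_properties({"delimiter": ","})) returns ['a', 'b,c', 'd'].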

N-gram

An analyzer capable of producing n-grams of a specified input, with n-gram lengths in the inclusive range [min, max]. It can optionally output the original input as well.

The properties allowed for this analyzer are an object with the following attributes:

  • max: unsigned integer (required) maximum n-gram length
  • min: unsigned integer (required) minimum n-gram length
  • preserveOriginal: boolean (required) output the original value as well

Example

With min = 4 and max = 5, the analyzer will produce the following n-grams for the input foobar:

  • foob
  • ooba
  • obar
  • fooba
  • oobar

With preserveOriginal enabled, it will additionally include foobar itself.
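The enumeration in the example above can be reproduced with the following Python sketch. It is a hypothetical local model rather than ArangoDB's implementation. It counts characters (whether the server counts characters or bytes for non-ASCII input is not specified on this page), lists all n-grams of length min first, then length min + 1 and so on, matching the order shown above, and with preserve_original it appends the whole input only when the input is not already one of the n-grams. Rejecting min < 1 is also an assumption of the sketch.

```python
def ngrams(text: str, min_n: int, max_n: int, preserve_original: bool) -> list[str]:
    """Hypothetical model of the ngram Analyzer: all substrings of length min_n..max_n."""
    if min_n < 1 or max_n < min_n:
        raise ValueError("expected 1 <= min_n <= max_n")  # assumed validation
    grams: list[str] = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the input.
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    # The whole input is already present iff its length lies within [min_n, max_n].
    if preserve_original and not (min_n <= len(text) <= max_n):
        grams.append(text)
    return grams
```

ngrams("foobar", 4, 5, False) returns ['foob', 'ooba', 'obar', 'fooba', 'oobar'], and passing True instead appends 'foobar'.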

Text

An analyzer capable of breaking up strings into individual words while also optionally filtering out stop-words, applying case conversion and extracting word stems.

The properties allowed for this analyzer are an object with the following attributes:

  • locale: string (required) in the format language[_COUNTRY][.encoding][@variant]
  • case_convert: string enum (optional) one of: lower, none, upper, default: lower
  • ignored_words: array of strings (optional) words to omit from result, default: load words from ignored_words_path
  • ignored_words_path: string (optional) path to a directory containing the language sub-directory with files of words to omit, default: if ignored_words is not provided, the value of the environment variable IRESEARCH_TEXT_STOPWORD_PATH, or the current working directory if that variable is undefined
  • no_accent: boolean (optional) apply accent removal, default: true
  • no_stem: boolean (optional) do not apply stemming on returned words, default: false
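To make the options above concrete, here is a deliberately simplified Python model of the text Analyzer's pipeline. It is a hypothetical sketch, not the server's behavior. Words are split with the regular expression \w+ instead of locale-aware word breaking, the locale and ignored_words_path options are not modelled, and no stemming is performed, so it behaves as if no_stem were true. The order of the steps (case conversion, then accent removal, then stop-word filtering) is also assumed, which means ignored_words must be given in the converted form. Accent removal uses Unicode NFD decomposition followed by dropping combining marks.

```python
import re
import unicodedata
from collections.abc import Iterable

CASE_MODES = ("lower", "none", "upper")


def text_tokens(text: str, case_convert: str = "lower",
                ignored_words: Iterable[str] = (), no_accent: bool = True) -> list[str]:
    """Simplified, hypothetical model of the text Analyzer (no stemming)."""
    if case_convert not in CASE_MODES:
        raise ValueError(f"case_convert must be one of {CASE_MODES}")
    stop_words = set(ignored_words)
    tokens: list[str] = []
    for word in re.findall(r"\w+", text):  # naive word breaking
        if case_convert == "lower":
            word = word.lower()
        elif case_convert == "upper":
            word = word.upper()
        if no_accent:
            # Decompose characters, then drop combining marks such as U+0301.
            decomposed = unicodedata.normalize("NFD", word)
            word = "".join(c for c in decomposed if not unicodedata.combining(c))
        if word not in stop_words:  # compared after case conversion (assumed)
            tokens.append(word)
    return tokens
```

For example, text_tokens("The Café is OPEN", ignored_words=["the", "is"]) returns ['cafe', 'open'] with the default options.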