!CHAPTER Fulltext queries

ArangoDB can run queries on text contained in document attributes. To use
this feature, a [fulltext index](../Glossary/README.html#fulltext_index) must be defined for the attribute of the collection that
contains the text. Creating the index parses the text in the specified
attribute for all documents of the collection. Only documents that contain a
textual value in the indexed attribute are indexed. For such documents, the
text value is parsed, and the individual words are inserted into the
fulltext index.

When a fulltext index exists, it can be queried using a fulltext query.

!SUBSECTION Fulltext
<!-- js/common/modules/org/arangodb/arango-collection-common.js-->
@startDocuBlock collectionFulltext

!SUBSECTION Fulltext Syntax:

In the simplest form, a fulltext query contains just the sought word. If
multiple search words are given in a query, they should be separated by commas.
All search words are combined with a logical AND by default, so only
documents that contain all search words are returned. This default behavior
can be changed by prefixing a search word with one of the following control
characters:

* *+*: logical AND (intersection)
* *|*: logical OR (union)
* *-*: negation (exclusion)

*Examples:*

* *"banana"*: searches for documents containing "banana"
* *"banana,apple"*: searches for documents containing both "banana" *AND* "apple"
* *"banana,|orange"*: searches for documents containing either "banana" *OR* "orange" *OR* both
* *"banana,-apple"*: searches for documents that contain "banana" but *NOT* "apple"

Logical operators are evaluated from left to right.

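The left-to-right evaluation can be illustrated with a small in-memory model. This is only a sketch of the semantics described above, not ArangoDB's implementation: the names `parseQuery`, `runQuery`, and `sampleDocs` are hypothetical, documents are modeled as pre-tokenized word lists, and the handling of a leading `-` term is an assumption.

```javascript
// Illustrative model of the operator semantics; NOT ArangoDB's implementation.
// All function and variable names are hypothetical.

// Split a query such as "banana,|orange,-apple" into { op, word } terms.
function parseQuery(query) {
  return query
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const first = part[0];
      if (first === "+" || first === "|" || first === "-") {
        return { op: first, word: part.slice(1).trim() };
      }
      return { op: "+", word: part }; // logical AND is the default
    });
}

// Apply the terms strictly from left to right and return the sorted ids
// of the matching documents. Each document is { id, words }.
function runQuery(docs, query) {
  const allIds = docs.map((doc) => doc.id);
  let result = null; // null means "no term processed yet"
  for (const { op, word } of parseQuery(query)) {
    const hits = new Set(
      docs.filter((doc) => doc.words.includes(word)).map((doc) => doc.id)
    );
    if (result === null) {
      // Assumption: the first term seeds the result; a leading "-" is
      // treated as "all documents except the matches".
      result = op === "-" ? new Set(allIds.filter((id) => !hits.has(id))) : hits;
    } else if (op === "+") {
      result = new Set([...result].filter((id) => hits.has(id))); // intersection
    } else if (op === "|") {
      result = new Set([...result, ...hits]); // union
    } else {
      result = new Set([...result].filter((id) => !hits.has(id))); // exclusion
    }
  }
  return result === null ? [] : [...result].sort();
}

const sampleDocs = [
  { id: "d1", words: ["banana", "apple"] },
  { id: "d2", words: ["banana"] },
  { id: "d3", words: ["orange"] },
  { id: "d4", words: ["apple", "orange"] },
];

console.log(runQuery(sampleDocs, "banana,|orange,-apple"));
```

In this model, `"banana,|orange,-apple"` first collects the "banana" documents, then adds the "orange" documents, and only then removes everything containing "apple", so the order of the terms matters.
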
Each search word can optionally be prefixed with *complete:* or *prefix:*, with
*complete:* being the default. This allows searching for complete words or for
word prefixes. Suffix searches or any other forms of partial-word matching are
currently not supported.

*Examples:*

* *"complete:banana"*: searches for documents containing the exact word "banana"
* *"prefix:head"*: searches for documents with words that start with the prefix "head"
* *"prefix:head,banana"*: searches for documents that contain words starting with the prefix
  "head" and that also contain the exact word "banana"

Complete match and prefix search options can be combined with the logical
operators.

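The difference between the two match modes can be sketched for a single search word as follows. This is a simplified illustration rather than ArangoDB code: `parseTerm` and `termMatches` are hypothetical helpers, and a document is again modeled as a list of already-tokenized words.

```javascript
// Illustrative model of "complete:" versus "prefix:" matching.
// Hypothetical helpers; NOT part of the ArangoDB API.

// Split one search word into its match mode and the word itself.
function parseTerm(term) {
  const trimmed = term.trim();
  const match = /^(complete|prefix):(.*)$/.exec(trimmed);
  if (match) {
    return { mode: match[1], word: match[2] };
  }
  return { mode: "complete", word: trimmed }; // complete is the default
}

// Check whether a list of indexed words satisfies the search word.
function termMatches(term, words) {
  const { mode, word } = parseTerm(term);
  if (mode === "prefix") {
    return words.some((candidate) => candidate.startsWith(word));
  }
  return words.includes(word); // complete: the whole word must be equal
}

const indexedWords = ["headline", "banana"];
console.log(termMatches("prefix:head", indexedWords));   // prefix match
console.log(termMatches("complete:head", indexedWords)); // no exact word "head"
```

Because only whole-word and prefix comparisons exist in this model, a term such as `"prefix:nana"` does not match "banana", which mirrors the statement that suffix searches are not supported.
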
Please note that only words with a minimum length will get indexed. This minimum
length can be defined when creating the fulltext index. For word tokenization,
the libicu text boundary analysis is used, which takes into account the default
language as defined at server startup (*--server.default-language* startup
option). Generally, the word boundary analysis will filter out punctuation but
will not do much more.

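The effect of the minimum word length can be approximated with a few lines of JavaScript. Note the assumptions: a Unicode-aware regular expression stands in for the libicu boundary analysis (which is language-sensitive and will differ in detail), and both the function name `indexableWords` and the minimum length of 3 used here are chosen purely for illustration.

```javascript
// Rough stand-in for tokenization plus minimum-length filtering.
// Assumption: a regular expression replaces the ICU boundary analysis
// that the server actually uses; the function name is hypothetical.
function indexableWords(text, minLength) {
  // Runs of letters and digits count as words; punctuation is dropped.
  const words = text.match(/[\p{L}\p{N}]+/gu) || [];
  // Count code points rather than UTF-16 units when measuring length.
  return words.filter((word) => [...word].length >= minLength);
}

console.log(indexableWords("A ripe banana, an apple; or two!", 3));
```

With a minimum length of 3, the short words "A", "an", and "or" are skipped, so a query for any of them could not match in this model.
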
In particular, no word normalization, stemming, or similarity analysis is
performed when indexing or searching. If any of these features is required, it
is suggested that the user does the text normalization on the client side and
provides for each document an extra attribute containing just a comma-separated
list of normalized words. This attribute can then be indexed with a fulltext
index, and the user can send fulltext queries for this index, with the fulltext
queries also containing the stemmed or normalized versions of words as required
by the user.

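A client-side normalization step along these lines might look like the following sketch. Everything in it is an assumption made for illustration: the normalization is limited to lowercasing and stripping diacritics (real stemming would need a dedicated library), and the names `normalizeText`, `addNormalizedAttribute`, and the target attribute `normalizedWords` are hypothetical.

```javascript
// Sketch of client-side normalization before storing a document.
// Assumptions: lowercasing plus diacritic removal is "normalization"
// enough for the use case; all names here are hypothetical.

// Lowercase the text and remove combining marks such as accents.
function normalizeText(text) {
  return text
    .normalize("NFD")        // separate base letters from combining marks
    .replace(/\p{M}/gu, "")  // drop the combining marks
    .toLowerCase();
}

// Return a copy of the document with an extra attribute that holds a
// comma-separated list of the unique normalized words.
function addNormalizedAttribute(doc, sourceAttribute, targetAttribute) {
  const value = doc[sourceAttribute];
  const text = typeof value === "string" ? normalizeText(value) : "";
  const words = text.match(/[\p{L}\p{N}]+/gu) || [];
  const unique = [...new Set(words)];
  return { ...doc, [targetAttribute]: unique.join(",") };
}

const stored = addNormalizedAttribute(
  { title: "Cr\u00e8me br\u00fbl\u00e9e, CR\u00c8ME!" },
  "title",
  "normalizedWords"
);
console.log(stored.normalizedWords);
```

The fulltext index would then be created on the extra attribute instead of the original one, and each search word would be passed through the same `normalizeText` function before it is placed into a fulltext query, so that indexed and queried forms agree.
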