arangodb

History

CoDEmanX a39b712efe Documentation: corrected typos and case, prefer American over British English		2015-09-01 17:19:13 +02:00
..
README.mdpp	Documentation: corrected typos and case, prefer American over British English	2015-09-01 17:19:13 +02:00

README.mdpp

!CHAPTER Glossary

!SUBSECTION Collection

A collection consists of documents. It is uniquely identified by its collection identifier.
It also has a unique name that clients should use to identify and access it.
Collections can be renamed. It will change the collection name, but not the collection identifier.
Collections contain documents of a specific type. There are currently two types: document (default) and edge. The type is specified by the user when the collection is created, and cannot be changed later.

!SUBSECTION Collection Identifier

A collection identifier identifies a collection in a database. It is a string value and is unique within the database. Up to including ArangoDB 1.1, the collection identifier has been a client's primary means to access collections. Starting with ArangoDB 1.2, clients should instead use a collection's unique name to access a collection instead of its identifier.

ArangoDB currently uses 64bit unsigned integer values to maintain collection ids internally. When returning collection ids to clients, ArangoDB will put them into a string to ensure the collection id is not clipped by clients that do not support big integers. Clients should treat the collection ids returned by ArangoDB as
opaque strings when they store or use it locally.

!SUBSECTION Collection Name

A collection name identifies a collection in a database. It is a string and is unique within the database. Unlike the collection identifier it is supplied by the creator of the collection. The collection name must consist of letters, digits, and the _ (underscore) and - (dash) characters only. Please refer to [NamingConventions](../NamingConventions/CollectionNames.html) for more information on valid collection names.

!SUBSECTION Database

ArangoDB can handle multiple databases in the same server instance. Databases can be used to logically group and separate data. An ArangoDB database consists of collections and dedicated database-specific worker processes.

A database contains its own collections (which cannot be accessed from other databases), Foxx applications, and replication loggers and appliers. Each ArangoDB database contains its own system collections (e.g. _users, _replication, ...).

There will always be at least one database in ArangoDB. This is the default database, named _system. This database cannot be dropped, and provides special operations for creating, dropping, and enumerating databases. Users can create additional databases and give them unique names to access them later. Database management operations cannot be initiated from out of user-defined databases.

When ArangoDB is accessed via its HTTP REST API, the database name is read from the first part of the request URI path (e.g. /_db/_system/...). If the request URI does not contain a database name, the database name is automatically derived from the endpoint. Please refer to [DatabaseEndpoint](../HttpDatabase/DatabaseEndpoint.html) for more information.

!SUBSECTION Database Name

A single ArangoDB instance can handle multiple databases in parallel. When multiple databases are used, each database must be given a unique name. This name is used to uniquely identify a database. The default database in ArangoDB is named _system.

The database name is a string consisting of only letters, digits and the _ (underscore) and - (dash) characters. User-defined database names must always start with a letter. Database names is case-sensitive.

!SUBSECTION Database Organization

A single ArangoDB instance can handle multiple databases in parallel. By default, there will be at least one database, which is named _system.

Databases are physically stored in separate sub-directories underneath the database directory, which itself resides in the instance's data directory.

Each database has its own sub-directory, named database-<database id>. The database directory contains sub-directories for the collections of the database, and a file named parameter.json. This file contains the database id and name.

In an example ArangoDB instance which has two databases, the filesystem layout could look like this:

```
data/ # the instance's data directory
databases/ # sub-directory containing all databases' data
database-<id>/ # sub-directory for a single database
parameter.json # file containing database id and name
collection-<id>/ # directory containing data about a collection
database-<id>/ # sub-directory for another database
parameter.json # file containing database id and name
collection-<id>/ # directory containing data about a collection
collection-<id>/ # directory containing data about a collection
```

Foxx applications are also organized in database-specific directories inside the application path. The filesystem layout could look like this:

```
apps/ # the instance's application directory
system/ # system applications (can be ignored)
_db/ # sub-directory containing database-specific applications
<database-name>/ # sub-directory for a single database
<mountpoint>/APP # sub-directory for a single application
<mountpoint>/APP # sub-directory for a single application
<database-name>/ # sub-directory for another database
<mountpoint>/APP # sub-directory for a single application
```

!SUBSECTION Document

Documents in ArangoDB are JSON objects. These objects can be nested (to any depth) and may contains lists. Each document is uniquely identified by its document handle.

!SUBSECTION Document ETag

The document revision enclosed in double quotes. The revision is returned by several HTTP API methods in the Etag HTTP header.

!SUBSECTION Document Handle

A document handle uniquely identifies a document in the database. It is a string and consists of the collection's name and the document key (_key attribute) separated by /.

!SUBSECTION Document Key

A document key uniquely identifies a document in a given collection. It can and should be used by clients when specific documents are searched. Document keys are stored in the _key attribute of documents. The key values are automatically indexed by ArangoDB in a collection's primary index. Thus looking up a document by its key is regularly a fast operation. The _key value of a document is immutable once the document has been created.

By default, ArangoDB will auto-generate a document key if no _key attribute is specified, and use the user-specified _key otherwise.

This behavior can be changed on a per-collection level by creating collections with the keyOptions attribute.

Using keyOptions it is possible to disallow user-specified keys completely, or to force a specific regime for auto-generating the _key values.

!SUBSECTION Document Revision

As ArangoDB supports MVCC, documents can exist in more than one revision. The document revision is the MVCC token used to identify a particular revision of a document. It is a string value currently containing an integer number and is unique within the list of document revisions for a single document. Document revisions can be used to conditionally update, replace or delete documents in the database. In order to find a particular revision of a document, you need the document handle and the document revision.

ArangoDB currently uses 64bit unsigned integer values to maintain document revisions internally. When returning document revisions to clients, ArangoDB will put them into a string to ensure the revision id is not clipped by clients that do not support big integers. Clients should treat the revision id returned by ArangoDB as an opaque string when they store or use it locally. This will allow ArangoDB to change the format of revision ids later if this should be required. Clients can use revisions ids to perform simple equality/non-equality comparisons (e.g. to check whether a document has changed or not), but they should not use revision ids to perform greater/less than comparisons with them to check if a document revision is older than one another, even if this might work for some cases.

!SUBSECTION Edge

Edges are special documents used for connecting other documents into a graph. An edge describe the connection between two documents using the internal attributes: _from and _to. These contain document handles, namely the start-point and the end-point of the edge.

The values of _from and _to are immutable once saved.

!SUBSECTION Edge Collection

Edge collections are collections that store edges.

!SUBSECTION Index

Indexes are used to allow fast access to documents in a collection. All collections have a primary index, which is the document's _key attribute. This index cannot be dropped or changed.

Edge collections will also have an automatically created edges index, which cannot be modified. This index provides quick access to documents via the _from and _to attributes.

Most user-land indexes can be created by defining the names of the attributes which should be indexed. Some index types allow indexing just one attribute (e.g. fulltext index) whereas other index types allow indexing multiple attributes.

Indexing system attributes such as _id, _key, _from, and _to in user-defined indexes is not supported by any index type. Manually creating an index that relies on any of these attributes is unsupported.

!SUBSECTION Edges Index

An edges index is automatically created for edge collections. It contains connections between vertex documents and is invoked when the connecting edges of a vertex are queried. There is no way to explicitly create or delete edge indexes.

!SUBSECTION Fulltext Index

A fulltext index can be used to find words, or prefixes of words inside documents. A fulltext index can be defined on one attribute only, and will include all words contained in documents that have a textual value in the index attribute. Since ArangoDB 2.6 the index will also include words from the index attribute if the index attribute is an array of strings, or an object with string value members.

For example, given a fulltext index on the `translations` attribute and the following documents, then searching for `лиса` using the fulltext index would return only the first document. Searching for the index for the exact string `Fox` would return the first two documents, and searching for `prefix:Fox` would return all three documents:

{ translations: { en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" } }
{ translations: "Fox is the English translation of the German word Fuchs" }
{ translations: [ "ArangoDB", "document", "database", "Foxx" ] }

If the index attribute is neither a string, an object or an array, its contents will not be indexed. When indexing the contents of an array attribute, an array member will only be included in the index if it is a string. When indexing the contents of an object attribute, an object member value will only be included in the index if it is a string. Other data types are ignored and not indexed.

Only words with a (specifiable) minimum length are indexed. Word tokenization is done using the word boundary analysis provided by libicu, which is taking into account the selected language provided at server start. Words are indexed in their lower-cased form. The index supports complete match queries (full words) and prefix queries.

!SUBSECTION Geo Index

A geo index is used to find places on the surface of the earth fast.

!SUBSECTION Index Handle

An index handle uniquely identifies an index in the database. It is a string and consists of a collection name and an index identifier separated by /.

!SUBSECTION Hash Index

A hash index is used to find documents based on examples. A hash index can be created for one or multiple document attributes.

A hash index will only be used by queries if all indexed attributes are present in the example or search query, and if all attributes are compared using the equality (== operator). That means the hash index does not support range queries.

If the index is declared unique, then access to the indexed attributes should be fast. The performance degrades if the indexed attribute(s) contain(s) only very few distinct values.

!SUBSECTION Skiplist Index

A skiplist is used to find ranges of documents.