mirror of https://gitee.com/bigwinds/arangodb
209 lines
9.3 KiB
Markdown
209 lines
9.3 KiB
Markdown
Manually Upgrading an _Active Failover_ Deployment
|
|
=========================================
|
|
|
|
This page will guide you through the process of a manual upgrade of an [_Active Failover_](../../Architecture/DeploymentModes/ActiveFailover/README.md)
|
|
setup. The different nodes can be upgraded one at a time without
|
|
incurring a _prolonged_ downtime of the entire system. The downtimes of the individual nodes
|
|
should also stay fairly low.
|
|
|
|
The manual upgrade procedure described in this section can be used to upgrade
|
|
to a new hotfix version, or to perform an upgrade to a new minor version of ArangoDB.
|
|
Please refer to the [Upgrade Paths](../GeneralInfo/README.md#upgrade-paths) section
|
|
for detailed information.
|
|
|
|
Preparations
|
|
------------
|
|
|
|
The ArangoDB installation packages (e.g. for Debian or Ubuntu) set up a
|
|
convenient standalone instance of `arangod`. During installation, this instance's
|
|
database will be upgraded (see [`--database.auto-upgrade`](../../Programs/Arangod/Database.md#auto-upgrade))
|
|
and the service will be (re)started.
|
|
|
|
You have to make sure that your _Active Failover_ deployment is independent of this
|
|
standalone instance. Specifically, make sure that the database directory as
|
|
well as the socket used by the standalone instance provided by the package are
|
|
separate from the ones in your _Active Failover_ configuration. Also, that you haven't
|
|
modified the init script or systemd unit file for the standalone instance in a way
|
|
that it would start or stop your _Active Failover_ instance instead.
|
|
|
|
### Install the new ArangoDB version binary
|
|
|
|
The first step is to install the new ArangoDB package.
|
|
|
|
**Note:** you do not have to stop the _Active Failover_ (_arangod_) processes before upgrading it.
|
|
|
|
For example, if you want to upgrade to `3.3.16` on Debian or Ubuntu, either call
|
|
|
|
```
|
|
$ apt install arangodb=3.3.16-1
|
|
```
|
|
|
|
(`apt-get` on older versions) if you have added the ArangoDB repository. Or
|
|
install a specific package using
|
|
|
|
```
|
|
$ dpkg -i arangodb3-3.3.16-1_amd64.deb
|
|
```
|
|
|
|
after you have downloaded the corresponding file from https://download.arangodb.com/.
|
|
|
|
|
|
#### Stop the Standalone Instance
|
|
|
|
As the package will automatically start the standalone instance, you might want to
|
|
stop that instance now, as otherwise it can create some confusion later. As you are
|
|
starting the _Active Failover_ processes manually
|
|
you will not need the automatically installed and started standalone instance,
|
|
and you should hence stop it via:
|
|
|
|
```
|
|
$ service arangodb3 stop
|
|
```
|
|
|
|
Also, you might want to remove the standalone instance from the default
|
|
_runlevels_ to prevent it from starting on the next reboots of your machine. How this
|
|
is done depends on your distribution and _init_ system. For example, on older Debian
|
|
and Ubuntu systems using a SystemV-compatible _init_, you can use:
|
|
|
|
```
|
|
$ update-rc.d -f arangodb3 remove
|
|
```
|
|
|
|
Set supervision into maintenance mode
|
|
-------------------------------------
|
|
|
|
**Important**: Supervision maintenance mode is supported from ArangoDB versions
|
|
3.3.8/3.2.14 or higher.
|
|
|
|
You have two main choices when performing an upgrade of the _Active Failover_ setup:
|
|
|
|
- Upgrade while incurring a leader-to-follower switch (with reduced downtime)
|
|
- An upgrade with no leader-to-follower switch.
|
|
|
|
Turning the maintenance mode _on_ will enable the latter case. You might have a short
|
|
downtime during the _leader_ upgrade, but there will be no potential loss of _acknowledged_ operations.
|
|
|
|
To enable the maintenance mode means to essentially disable the Agency supervision for a limited amount
|
|
of time during the upgrade procedure. The following API calls will
|
|
activate and deactivate the maintenance mode of the supervision job. You might use _curl_ to send the API calls.
|
|
The following examples assume there is an _Active Failover_ node running on `localhost` on port 7002.
|
|
|
|
### Activate Maintenance mode
|
|
|
|
`curl -u username:password <single-server>/_admin/cluster/maintenance -XPUT -d'"on"'`
|
|
|
|
For example:
|
|
```
|
|
curl -u "root:" http://localhost:7002/_admin/cluster/maintenance -XPUT -d'"on"'
|
|
|
|
{"error":false,"warning":"Cluster supervision deactivated.
|
|
It will be reactivated automatically in 60 minutes unless this call is repeated until then."}
|
|
```
|
|
**Note:** In case the manual upgrade takes longer than 60 minutes, the API call has to be resent.
|
|
|
|
|
|
### Deactivate Maintenance mode
|
|
|
|
The _cluster_ supervision resumes automatically 60 minutes after disabling it.
|
|
It can be manually reactivated earlier at any point using the following API call:
|
|
|
|
`curl -u username:password <single-server>/_admin/cluster/maintenance -XPUT -d'"off"'`
|
|
|
|
For example:
|
|
```
|
|
curl -u "root:" http://localhost:7002/_admin/cluster/maintenance -XPUT -d'"off"'
|
|
|
|
{"error":false,"warning":"Cluster supervision reactivated."}
|
|
```
|
|
|
|
Upgrade the _Active Failover_ processes
|
|
---------------------------------------
|
|
|
|
Now all the _Active Failover_ (_Agents_, _Single-Server_) processes (_arangod_) have to be
|
|
upgraded on each node.
|
|
|
|
**Note:** Please read the section regarding the maintenance mode above.
|
|
|
|
In order to stop an _arangod_ process we will need to use a command like `kill -15`:
|
|
|
|
```
|
|
kill -15 <pid-of-arangod-process>
|
|
```
|
|
|
|
The _pid_ associated to your _Active Failover setup_ can be checked using a command like _ps_:
|
|
|
|
|
|
```
|
|
ps -C arangod -fww
|
|
```
|
|
|
|
The output of the command above does not only show the process ids of all _arangod_
|
|
processes but also the used commands, which is useful for the following
|
|
restarts of all _arangod_ processes.
|
|
|
|
The output below is from a test machine where three _Agents_ and two _Single-Servers_
|
|
were running locally. In a more production-like scenario, you will find only one instance of each
|
|
type running per machine:
|
|
|
|
```
|
|
ps -C arangod -fww
|
|
UID PID PPID C STIME TTY TIME CMD
|
|
max 29075 8072 0 13:50 pts/2 00:00:42 arangod --server.endpoint tcp://0.0.0.0:5001 --agency.my-address=tcp://127.0.0.1:5001 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --log.file a1 --javascript.app-path /tmp --database.directory agent1
|
|
max 29208 8072 2 13:51 pts/2 00:02:08 arangod --server.endpoint tcp://0.0.0.0:5002 --agency.my-address=tcp://127.0.0.1:5002 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --log.file a2 --javascript.app-path /tmp --database.directory agent2
|
|
max 29329 16224 0 13:51 pts/3 00:00:42 arangod --server.endpoint tcp://0.0.0.0:5003 --agency.my-address=tcp://127.0.0.1:5003 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --log.file a3 --javascript.app-path /tmp --database.directory agent3
|
|
max 29824 16224 1 13:55 pts/3 00:01:53 arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:7001 --cluster.my-address tcp://127.0.0.1:7001 --cluster.my-role SINGLE --cluster.agency-endpoint tcp://127.0.0.1:5001 --cluster.agency-endpoint tcp://127.0.0.1:5002 --cluster.agency-endpoint tcp://127.0.0.1:5003 --log.file c1 --javascript.app-path /tmp --database.directory single1
|
|
max 29938 16224 2 13:56 pts/3 00:02:13 arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:7002 --cluster.my-address tcp://127.0.0.1:7002 --cluster.my-role SINGLE --cluster.agency-endpoint tcp://127.0.0.1:5001 --cluster.agency-endpoint tcp://127.0.0.1:5002 --cluster.agency-endpoint tcp://127.0.0.1:5003 --log.file c2 --javascript.app-path /tmp --database.directory single2
|
|
```
|
|
|
|
**Note:** The start commands of _Agent_ and _Single Server_ are required for restarting the processes later.
|
|
|
|
The recommended procedure for upgrading an _Active Failover_ setup is to stop, upgrade
|
|
and restart the _arangod_ instances one by one on all participating servers,
|
|
starting first with all _Agent_ instances, and then following with the _Active Failover_
|
|
instances themselves. When upgrading the _Active Failover_ instances, the followers should
|
|
be upgraded first.
|
|
|
|
To figure out the node containing the followers you can consult the cluster endpoints API:
|
|
```
|
|
curl http://<single-server>:7002/_api/cluster/endpoints
|
|
```
|
|
This will yield a list of endpoints, the _first_ of which is always the leader node.
|
|
|
|
|
|
### Stopping, upgrading and restarting an instance
|
|
|
|
To stop an instance, the currently running process has to be identified using the `ps`
|
|
command above.
|
|
|
|
Let's assume we are about to upgrade an _Agent_ instance, so we have to look in the `ps`
|
|
output for an agent instance first, and note its process id (pid) and start command.
|
|
|
|
The process can then be stopped using the following command:
|
|
|
|
```
|
|
kill -15 <pid-of-agent>
|
|
```
|
|
|
|
The instance then has to be upgraded using the same command that was used before (in the `ps` output),
|
|
but with the additional option:
|
|
|
|
```
|
|
--database.auto-upgrade=true
|
|
```
|
|
|
|
After the upgrade procecure has finishing successfully, the instance will remain stopped.
|
|
So it has to be restarted using the command from the `ps` output before
|
|
(this time without the `--database.auto-upgrade` option).
|
|
|
|
|
|
Once an _Agent_ was upgraded and restarted successfully, repeat the procedure for the
|
|
other _Agent_ instances in the setup and then repeat the procedure for the _Active Failover_
|
|
instances, there starting with the followers.
|
|
|
|
### Final words
|
|
|
|
The _Agency_ supervision then needs to be reactivated by issuing the following API call
|
|
to the leader:
|
|
|
|
`curl -u username:password <single-server>/_admin/cluster/maintenance -XPUT -d'"off"'`
|