We strongly recommend that you perform periodic backups of your DSS data. We also recommend that you back up before upgrading DSS.
Your DSS Data Directory (or DATA_DIR) contains your configuration, your projects (graphs, recipes, notebooks, etc.), your connections to databases, the filesystem_managed files, etc.
Note that this directory does not contain datasets stored outside of the server: SQL servers, cloud storage, Hadoop, etc.
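The simplest way to back up your data directory is to make a FULL backup of the whole data directory folder. Using -C makes the paths stored in the archive relative to DATA_DIR, so that the archive can later be extracted directly into a data directory:
tar -zcvf your_backup.tar.gz -C /path/to/DATA_DIR .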
The method above, using tar, is very simple but always performs full backups, which might not be practical with large data directories.
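If your system uses GNU tar, one way to avoid writing a full archive every time is its --listed-incremental mode, sketched below. This is only an illustration under assumptions: GNU tar is required (other tar implementations may not support this option), BACKUP_DIR must be outside DATA_DIR, and the function name incremental_backup, the snapshot file dss.snar and the archive naming scheme are hypothetical. The first call, when the snapshot file does not exist yet, creates a full (level-0) archive; each later call archives only what changed since the previous call.

```shell
#!/bin/sh
# Sketch: incremental backups of a DSS data directory with GNU tar.
# Assumptions: GNU tar is installed; BACKUP_DIR is outside DATA_DIR;
# incremental_backup, dss.snar and the archive names are hypothetical.

# incremental_backup DATA_DIR BACKUP_DIR
# Prints the path of the archive it created.
incremental_backup() {
    data_dir=$1
    backup_dir=$2
    # GNU tar records the state of DATA_DIR in this snapshot file after each
    # run and compares against it on the next run
    snapshot="$backup_dir/dss.snar"
    archive="$backup_dir/dss-$(date +%Y%m%d-%H%M%S).tar.gz"

    # No snapshot file yet: full (level-0) archive.
    # Snapshot file present: only files changed since the previous run.
    tar --listed-incremental="$snapshot" -czf "$archive" -C "$data_dir" . || return 1
    echo "$archive"
}
```

To restore from such a series, extract the full archive and then every incremental archive in chronological order (the timestamped names sort that way), passing --listed-incremental=/dev/null so that tar also replays file deletions, for example: tar --listed-incremental=/dev/null -xzf ARCHIVE -C DATA_DIR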
There are many other backup methods; listing them all is outside the scope of this document, but we can mention:
Important note: full consistency of the backup is currently only guaranteed if the backup is run while DSS is stopped. All critical DSS files are text files that are written atomically, so a backup run while DSS is running may be only partially consistent, but it will always be mostly recoverable.
To restore a backup, you need to restore the files that you backed up to their original location.
A pristine restore means a restoration of the backed-up DSS data:
For this kind of restoration, you simply need to replace the content of DATA_DIR with the content of the archive:
If applicable, stop the running DSS instance and move the current content of DATA_DIR out of the way
Restore the backup
cd DATA_DIR
tar -zxvf /path/to/your_backup.tar.gz
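The pristine restore steps above can be sketched as a small shell function. This is an illustration under assumptions: restore_pristine is a hypothetical name, DSS must already be stopped (for example with DATA_DIR/bin/dss stop), and the archive is assumed to contain paths relative to DATA_DIR, as produced by tar with -C DATA_DIR . rather than with an absolute path. The previous content of DATA_DIR is renamed rather than deleted, so it can be recovered if the restore goes wrong.

```shell
#!/bin/sh
# Sketch of a pristine restore. Assumptions: DSS is already stopped, and the
# archive holds paths relative to DATA_DIR (created with -C DATA_DIR .).
# restore_pristine is a hypothetical name.

# restore_pristine DATA_DIR ARCHIVE
restore_pristine() {
    data_dir=${1%/}   # drop a trailing slash so the rename target is a sibling
    archive=$2
    # Move the current content out of the way instead of deleting it
    if [ -e "$data_dir" ]; then
        mv "$data_dir" "$data_dir.before-restore-$(date +%Y%m%d-%H%M%S)" || return 1
    fi
    mkdir -p "$data_dir" || return 1
    # Extract the archive into the now-empty DATA_DIR
    tar -zxf "$archive" -C "$data_dir"
}
```

For example: stop DSS, run restore_pristine /path/to/DATA_DIR /path/to/your_backup.tar.gz, then start DSS again with DATA_DIR/bin/dss start.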
If restoring on another machine, download and uncompress the DSS software on the new machine
Restore the backup files
mkdir new_datadir_location
cd new_datadir_location
tar -zxvf /path/to/your_backup.tar.gz
Replay the installer in “upgrade” mode: this will “reattach” the restored data directory to the installation directory. If needed, it will also migrate the data directory to the newer DSS version:
INSTALL_DIR/installer.sh -d DATA_DIR -u
If you restored the data directory on a different machine or in a different location, you need to rebuild the Python environment. See the “Migrating the data directory” section of our documentation on migrations.
Replay the various “integration” setup scripts:
Here is an example shell script that you can run periodically from a cron job.
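A minimal sketch of such a script is shown below. It is an illustration rather than a reference implementation, and it rests on assumptions: the DATA_DIR and BACKUP_DIR values are placeholders to adapt (they can also be overridden through environment variables), BACKUP_DIR must be outside DATA_DIR, and DSS is stopped with DATA_DIR/bin/dss stop while the archive is written, so that the backup is fully consistent, then restarted with DATA_DIR/bin/dss start. The function name backup_dss and the archive naming scheme are hypothetical.

```shell
#!/bin/sh
# backupscript.sh: minimal sketch of a periodic DSS backup.
# Assumptions: placeholder paths below (or environment overrides);
# BACKUP_DIR lies outside DATA_DIR; backup_dss is a hypothetical name.
DATA_DIR="${DATA_DIR:-/path/to/DATA_DIR}"
BACKUP_DIR="${BACKUP_DIR:-/path/to/backups}"

# backup_dss DATA_DIR BACKUP_DIR
# Stops DSS, archives DATA_DIR with paths relative to it, restarts DSS,
# and prints the archive path. Returns non-zero on failure.
backup_dss() {
    data_dir=$1
    backup_dir=$2
    archive="$backup_dir/dss-backup-$(date +%Y%m%d-%H%M%S).tar.gz"

    # Stopping DSS is what guarantees a fully consistent backup;
    # its messages go to stderr so that stdout only carries the archive path
    "$data_dir/bin/dss" stop >&2 || return 1

    tar -zcf "$archive" -C "$data_dir" .
    tar_status=$?

    # Restart DSS whether or not the archive step succeeded
    "$data_dir/bin/dss" start >&2

    if [ "$tar_status" -ne 0 ]; then
        rm -f "$archive"   # do not keep a truncated archive
        return "$tar_status"
    fi
    echo "$archive"
}

# Only run when DATA_DIR exists; otherwise report the problem on stderr
# (cron typically mails a job's output to its owner)
if [ -d "$DATA_DIR" ]; then
    backup_dss "$DATA_DIR" "$BACKUP_DIR"
else
    echo "DATA_DIR not found: $DATA_DIR" >&2
fi
```

Stopping DSS makes the instance unavailable for the duration of the archive, which is why such a job is usually scheduled outside working hours.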
Save this script in a file named “backupscript.sh”, make it executable (chmod +x backupscript.sh), and set up a cron job like the following one (running Monday to Friday at 6:15 am):
15 6 * * 1-5 /path/to/backupscript.sh
The data directory contains some folders which can safely be excluded from the backup because they only contain temporary data which can be rebuilt:
In addition, the following folders only contain log data, which you might want to exclude from backup:
Datasets stored outside of DATA_DIR aren’t affected by a DSS upgrade: they will still be available after the upgrade.
The following folders contain data which you might consider excluding:
The following folders contain data built by DSS. This data can generally be rebuilt, but caution should be exercised when choosing whether to backup these folders:
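Excluding such folders with tar can be sketched as follows. The function name backup_excluding and the folder name used in the example are hypothetical placeholders: pass the folders listed above, as paths relative to DATA_DIR. Because the archive is built with -C DATA_DIR ., member names start with ./, so each exclusion pattern is written as ./FOLDER; the --exclude options are placed before the . operand because recent GNU tar versions treat them as positional and apply them only to the operands that follow.

```shell
#!/bin/sh
# Sketch: full backup of DATA_DIR that skips some top-level folders.
# Assumption: backup_excluding is a hypothetical name; folder names are
# given relative to DATA_DIR.

# backup_excluding DATA_DIR ARCHIVE [FOLDER...]
backup_excluding() {
    data_dir=$1
    archive=$2
    shift 2
    # Turn each FOLDER argument into --exclude=./FOLDER, rotating "$@"
    # in place (POSIX sh has no arrays)
    for folder in "$@"; do
        set -- "$@" "--exclude=./${folder%/}"
        shift
    done
    # The --exclude options come before the "." operand they apply to
    tar -zcf "$archive" "$@" -C "$data_dir" .
}
```

For example, backup_excluding /path/to/DATA_DIR your_backup.tar.gz FOLDER1 FOLDER2, where FOLDER1 and FOLDER2 stand for entries from the lists above.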