Gravwell supports an ageout mechanism called Cloud Archive. When Cloud Archive is enabled, indexers upload shards to the Cloud Archive server before deleting them from storage. Cloud Archive is well suited to long-term archival of data that must be retained but does not need to be actively searchable. The Cloud Archive service can be hosted on a variety of storage platforms and is designed to provide remote, low-cost storage. Cloud Archive is enabled on a per-well basis, so you can decide which data sets warrant long-term archival.
Indexers will not delete data until they have successfully uploaded it to the archive server. If an indexer cannot upload because of connectivity issues, misconfiguration, or poor network throughput, it will not delete data. The inability to delete data may cause the indexer to run out of storage and stop ingesting new data. If a Cloud Archive upload fails to complete, the user interface displays a notification describing the failure.
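Because a stalled upload keeps shards on disk, it can help to watch free space on the volume that holds the well storage. The following is a minimal sketch using the standard df and awk tools; the function names, the 80% threshold, and the path are illustrative assumptions and are not part of Gravwell.

```shell
# storage_used_percent PATH
# Prints the used-space percentage (integer, no % sign) of the
# filesystem that holds PATH, using POSIX-format df output.
storage_used_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# warn_if_full PATH THRESHOLD
# Prints a WARNING line when usage is at or above THRESHOLD percent,
# otherwise an OK line.
warn_if_full() {
    used=$(storage_used_percent "$1")
    if [ "$used" -ge "$2" ]; then
        echo "WARNING: $1 is ${used}% full"
    else
        echo "OK: $1 is ${used}% full"
    fi
}

# Replace / with the directory that holds your well storage.
warn_if_full / 80
```

A check like this could run from cron so that a failing archive upload is noticed before the indexer runs out of space.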
The Cloud Archive system compresses data while in transit, which requires some CPU resources when uploading. Pushing data to a remote system also takes time, depending on available bandwidth and CPU. Be sure to leave yourself a little headroom when defining ageout parameters to account for additional time consumed by Cloud Archive – if you are ingesting at 1Gbps but only have a 500Mbps uplink, you may not be able to archive shards as fast as new data comes in!
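To make the headroom estimate concrete, here is a rough sketch of the arithmetic in shell. The function names and the compression figures are illustrative assumptions; measure how well your own data compresses before relying on any number.

```shell
# required_uplink_mbps INGEST_MBPS COMPRESSED_PERCENT
# Prints the sustained uplink (in Mbps) needed to archive data as fast
# as it is ingested, when compressed data is COMPRESSED_PERCENT percent
# of its original size. Rounds up so the requirement is never understated.
required_uplink_mbps() {
    echo $(( ($1 * $2 + 99) / 100 ))
}

# archive_keeps_up INGEST_MBPS UPLINK_MBPS COMPRESSED_PERCENT
# Prints "yes" if the uplink can keep pace with ingest, "no" otherwise.
archive_keeps_up() {
    needed=$(required_uplink_mbps "$1" "$3")
    if [ "$needed" -le "$2" ]; then echo yes; else echo no; fi
}

# 1 Gbps ingest over a 500 Mbps uplink:
archive_keeps_up 1000 500 100   # no compression benefit: prints "no"
archive_keeps_up 1000 500 40    # data shrinks to 40% of its size: prints "yes"
```

The sketch ignores CPU time spent compressing and any other traffic on the link, so treat a "yes" close to the limit as a "no".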
Every indexer can define a single Cloud Archive configuration block which specifies the remote archive server and authentication token. The configuration block is specified using the [Cloud-Archive] section header. To enable Cloud Archive on a well, add the Archive-Deleted-Shards=true directive within that well.
Here is an example configuration with three wells:
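As a minimal sketch, a three-well configuration of this shape might look like the following. Only the [Cloud-Archive] header and the Archive-Deleted-Shards and Archive-Shared-Secret directives are taken from this page; the server address directive, host name, storage paths, durations, deletion directives, and tags are assumptions for illustration and should be checked against the indexer configuration reference.

```ini
[Cloud-Archive]
	# Assumed directive name and hypothetical host.
	Archive-Server-Address=archive.example.com:8886
	Archive-Shared-Secret="replace-with-your-passphrase"

# Hot and cold tiers: shards are archived when they age out of cold storage.
[Default-Well]
	Location=/opt/gravwell/storage/default/
	Cold-Location=/opt/gravwell/cold_storage/default/
	Hot-Duration=1D
	Cold-Duration=30D
	Delete-Frozen-Data=true
	Archive-Deleted-Shards=true

# Hot tier only: shards are archived at the 7-day deletion point.
[Storage-Well "netflow"]
	Location=/opt/gravwell/storage/netflow/
	Hot-Duration=7D
	Delete-Cold-Data=true
	Archive-Deleted-Shards=true
	Tags=netflow

# No Archive-Deleted-Shards directive, so nothing is uploaded.
[Storage-Well "raw"]
	Location=/opt/gravwell/storage/raw/
	Hot-Duration=7D
	Delete-Cold-Data=true
	Tags=pcap
```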
The above example has three configured wells (default, netflow, and raw). The default well uses both a hot and a cold storage tier, so its data is archived when it would normally roll out of the cold storage tier. The netflow well has only a hot storage tier; its data is uploaded when it would normally be deleted after 7 days. The raw well does not have Cloud Archive enabled (the default is Archive-Deleted-Shards=false), so its data is not uploaded.
Hosting Cloud Archive
The Cloud Archive service is designed to be self-hosted and potentially integrated into larger infrastructures. The code is open source and available at github.com/gravwell/cloudarchive. It is also packaged for Debian and Red Hat, and as a shell installer.
Installing the Server
To install on Debian:
apt install gravwell-cloudarchive-server
To install on Red Hat:
yum install gravwell-cloudarchive-server
As a standalone shell installer (downloaded from the downloads page):
These commands will install the server, but not configure it. Read on for instructions on configuration.
Use the gravwell_cloudarchive_usertool command to set up the password database with an entry for your customer number:
sudo su gravwell -c "/opt/gravwell/bin/gravwell_cloudarchive_usertool -action useradd -id 11111 -passfile /opt/gravwell/etc/cloud.passwd"
The tool will prompt for the passphrase to use for the specified customer number. You can find your customer number on the License page of the Gravwell UI.
Indexers authenticate to the Cloud Archive service using the customer license number on the indexer. In an Overwatch configuration, this number may differ from the license number deployed on the webservers.
A default configuration will be installed at /opt/gravwell/etc/cloudarchive_server.conf. This configuration stores archived shards on the local disk, in /opt/gravwell/cloudarchive/. It listens for clients on port 8886, using the specified TLS cert/key pair for encryption. The Password-File parameter points at the password database set up earlier.
The following config archives incoming data shards to an FTP server instead of the local disk. Note that the FTP-Password and related FTP fields must specify a valid account on that FTP server. The Storage-Directory parameter is still required; this directory will be used as temporary storage for archive operations.
If you don’t want to set up certificates for TLS, you can put the server into plaintext mode by setting Disable-TLS=true. Be aware that this is horribly insecure and a terrible idea unless your Cloud Archive server and your indexers are on the same trusted network!
Configure your Gravwell indexers as above, setting the Cloud-Archive stanza to point at your server. The Archive-Shared-Secret value should match the password you entered when running gravwell_cloudarchive_usertool.
If you disabled TLS on the server, set Insecure-Disable-TLS=true in the Cloud-Archive stanza. If you are using self-signed certs, set