Backblaze B2 Rsync



  • To Keep in Mind
  • The sync command
  • The check command
  • Appendix

The previous post detailed how Rclone, unlike other programs, can reliably upload large files with their checksums to Backblaze. This post will outline the workflow and some gotchas to keep in mind when doing massive data loads over the internet.

With trial and error, I was able to archive 8 TB of footage from my Synology NAS to Backblaze B2 in about a month.

To Keep in Mind

First, the overall workflow.

Remote to Remote is Possible

Keep in mind that Rclone supports copying between two remotes directly. The computer running Rclone streams the data through its RAM as it shuttles it between the two.

In fact that’s what I mainly did: transferred assets from a personal B2 bucket to the organization’s new B2 bucket. Pretty neat!

List Folders Syntax: lsd

After setting up your remote with rclone config, use the list directory command lsd to double check your source/target folders.

For example, if the B2 remote is named b2-remote1, then the command to list the root is:

rclone lsd b2-remote1:

Note the : at the end.

If a folder name contains spaces, wrap the path in double quotes rather than backticks.

Also use trailing forward slashes / instead of asterisks * to indicate the files inside.
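The listing rules above can be collected into a small sketch. The remote name b2-remote1 comes from the example above; the helper names and the bucket and folder names are hypothetical placeholders, not anything from my actual setup.

```shell
# Sketch only: "my-bucket" and "Raw Footage" below are hypothetical names.
# Build a remote path of the form <remote>:<path>/ (trailing slash
# included), or just <remote>: when no folder is given.
remote_path() {
    # $1 = remote name from `rclone config`, $2 = optional folder path
    if [ -n "$2" ]; then
        printf '%s:%s/\n' "$1" "${2%/}"
    else
        printf '%s:\n' "$1"
    fi
}

# List the top level of the remote (note the trailing colon).
list_root() {
    rclone lsd "$(remote_path "$1")"
}

# List a folder whose name may contain spaces; the double quotes keep
# the whole path together as a single argument.
list_folder() {
    rclone lsd "$(remote_path "$1" "$2")"
}
```

With these helpers, list_root b2-remote1 runs rclone lsd b2-remote1: and list_folder b2-remote1 "my-bucket/Raw Footage" runs rclone lsd "b2-remote1:my-bucket/Raw Footage/".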

Consider copy instead of sync

From the docs [1]:

  • rclone copy - Copy files from source to dest, skipping already copied.
  • rclone sync - Make source and dest identical, modifying destination only.

Depending on your intention, copy may be better.

Expect Errors and Verify

Although Rclone automatically retries upload errors (by default up to 10 times), there are a few reasons why files may never get uploaded. See the appendix for various scenarios.

Therefore, in a nutshell, always verify your transfer afterwards (see below).

Beware Quota Restrictions

Unexpected EOF (end of file) errors can occur when streaming from a remote because of Backblaze quota restrictions.

Double Check the Source Supports (and has) Checksums

Since Backblaze only supports SHA-1 checksums, the Rclone docs indicate the source must also support SHA-1 checksums [2].

For a large file to be uploaded with an SHA1 checksum, the source needs to support SHA1 checksums. The local disk supports SHA1 checksums so large file transfers from local disk will have an SHA1. See the overview for exactly which remotes support SHA1.

So B2 to B2 syncs should always populate checksums, right? Wrong. It will only if the source B2 bucket had checksums.

As detailed in the previous post, that means the large files will have checksums only if they were originally uploaded with Rclone.

Rclone Browser is Great (but Deprecated) for Local <-> Remote

Rclone Browser is a wrapper that uses the same config as the CLI. It does not support direct remote-to-remote syncs, but it is good for normal use. Unfortunately, the program has been deprecated in favor of the WebGUI, but the latter doesn’t yet let you upload things. 🤷🏾‍♂️

On Mac, Rclone Browser can be installed with Homebrew via brew cask install rclone-browser (newer Homebrew versions use brew install --cask rclone-browser instead).

⬆︎ Reliability by ⬆︎ Chunk Size (using ⬆︎ RAM)

The default settings seem to be optimized for small files, like webpages.

  • Single part upload cutoff of 200 MB
  • Chunk size of 96 MB
  • Four concurrent transfers

For whatever reason, the error rate with these defaults was higher than I expected (see below).

Instead, I found better stability for large video files with:

  • Cutoff of 1G
  • 1G <= chunk size <= 4G
  • Two concurrent transfers

Note that all concurrent chunks are buffered into memory, so there is significantly more RAM usage with larger chunk sizes. Hence the downgrade to two transfers.

More specifics in the sync section below.

Measure Twice, Cut Once: --dry-run

Before discussing the sync command, it’s imperative to mention the --dry-run flag, for the following reasons.

  • Backblaze bills by usage/throughput
  • B2 doesn’t support renaming files after they are uploaded

Therefore, when running rclone sync, always use the --dry-run option first.
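One way to make the preview a habit is to wrap both steps in a function. This is a minimal sketch, assuming an interactive shell; the function name sync_with_preview is made up for illustration. Rclone spells the flag --dry-run (short form -n).

```shell
# Sketch only: preview a sync with --dry-run, then run it for real
# only after an explicit confirmation.
sync_with_preview() {
    # $1 = source, $2 = destination
    # Step 1: show what would be copied or deleted, without doing it.
    rclone sync "$1" "$2" --dry-run -v || return 1

    # Step 2: ask before touching the destination.
    printf 'Proceed with the real sync? [y/N] '
    read -r answer
    case "$answer" in
        y|Y) rclone sync "$1" "$2" -v ;;
        *)   echo "Aborted; nothing was transferred." ;;
    esac
}
```

Call it as sync_with_preview <source> <dest>. Anything other than y or Y at the prompt leaves the destination untouched.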

The sync command

My go-to sync (or copy) command is:

rclone sync <source> <dest> --exclude .DS_Store -vv --b2-upload-cutoff 1G --b2-chunk-size 1G --transfers 2

Explanation of Flags

  • --exclude .DS_Store to exclude Mac-specific files
  • -vv to enable DEBUG logging for visibility into chunk retries, etc.
  • --b2-upload-cutoff files above this size will switch to a multipart chunked transfer
  • --b2-chunk-size the size of the chunks, buffered in memory
  • --transfers number of simultaneous transfers. b2-chunk-size x transfers must fit in RAM
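The last rule of thumb is easy to check before starting a long transfer. Below is a rough sketch; the function names are hypothetical, and the result is only an estimate, since Rclone may buffer more than this in practice.

```shell
# Sketch of the rule of thumb: chunk size x transfers must fit in RAM.
# Sizes are in MiB to keep the arithmetic in plain integers.
buffer_mib() {
    # $1 = chunk size in MiB, $2 = number of concurrent transfers
    echo $(( $1 * $2 ))
}

fits_in_ram() {
    # $1 = chunk size in MiB, $2 = transfers, $3 = RAM you can spare in MiB
    [ "$(buffer_mib "$1" "$2")" -le "$3" ]
}

buffer_mib 96 4      # defaults: 96M chunks, 4 transfers
buffer_mib 1024 2    # my settings: 1G chunks, 2 transfers
```

This prints 384 and 2048, so the larger chunks need roughly 2 GiB of buffer even with only two transfers. fits_in_ram 1024 2 4096 succeeds on a machine that can spare 4 GiB.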

Phased Approach with --max-size

Sometimes I found it helpful to transfer all files under a certain size limit first, say 1 GB, and then re-run the command for larger files.

To do so, add --max-size 1G to the rclone sync command.
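The two phases might look like the sketch below. It assumes the same flags as my go-to command and uses --min-size, the counterpart filter flag to --max-size, for the second pass. The function name phased_copy is hypothetical, and copy is used instead of sync so that neither phase deletes anything at the destination.

```shell
# Sketch only: transfer small files first, then the large ones.
phased_copy() {
    # $1 = source, $2 = destination
    # Phase 1: files up to 1G.
    rclone copy "$1" "$2" --exclude .DS_Store -vv --max-size 1G \
        --b2-upload-cutoff 1G --b2-chunk-size 1G --transfers 2 || return 1

    # Phase 2: files of 1G and larger. Files that already made it across
    # in phase 1 are skipped, so any overlap at the boundary is harmless.
    rclone copy "$1" "$2" --exclude .DS_Store -vv --min-size 1G \
        --b2-upload-cutoff 1G --b2-chunk-size 1G --transfers 2
}
```

Run it as phased_copy <source> <dest>. If phase 1 fails outright, phase 2 is not attempted.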

The check command

Always verify after a sync. Even if you think you don’t need to. The command is straightforward:

rclone check <source> <dest> --exclude .DS_Store

If there are discrepancies the output will look like:

Use error output to create diff file

By massaging the rclone check standard output into a new file with just the file names, it is possible to re-sync just these files. This saves us Backblaze read transactions on the files already copied.

Assuming a file mydiff.txt:

the sync command is:

Then, run rclone check again on all the files.
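The massaging step can be sketched with sed. This assumes each failing file shows up in the log as a line of the form "<timestamp> ERROR : <path>: <reason>"; check that against your own output before relying on it. The function names are hypothetical. Rclone writes its log to stderr, hence the 2>&1 in the usage note.

```shell
# Sketch only: pull the file names out of rclone check's error lines.
# Assumed line format: "<timestamp> ERROR : <path>: <reason>".
extract_failed_paths() {
    # Keep the text between "ERROR : " and the final ": <reason>";
    # lines that do not match (summaries, notices) are dropped.
    sed -n 's/^.*ERROR : \(.*\): [^:]*$/\1/p'
}

# Re-copy only the files named in the list (one path per line,
# relative to the source root).
recopy_failed() {
    # $1 = source, $2 = destination, $3 = file list
    rclone copy "$1" "$2" --files-from "$3" -vv \
        --b2-upload-cutoff 1G --b2-chunk-size 1G --transfers 2
}
```

Usage would be rclone check <source> <dest> --exclude .DS_Store 2>&1 | extract_failed_paths > mydiff.txt followed by recopy_failed <source> <dest> mydiff.txt.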

The cleanup command

If your buckets are created with default settings, the file lifecycle is set to Keep all versions.

To purge deleted files, use a similar syntax to the lsd command.

Also note that [3]:

Note that cleanup will remove partially uploaded files from the bucket if they are more than a day old.
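As a concrete illustration, the cleanup call takes the same remote:path form as lsd. The bucket name my-bucket and the function name below are hypothetical placeholders.

```shell
# Sketch only: "my-bucket" in the usage note is a hypothetical bucket name.
purge_old_versions() {
    # $1 = remote name from `rclone config`, $2 = bucket name
    rclone cleanup "$1:$2"
}
```

For example, purge_old_versions b2-remote1 my-bucket runs rclone cleanup b2-remote1:my-bucket.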

Appendix

Performance Logs

The exact command I used at first was

and it completed, roughly 3 days later with a 5% error rate.

Instead, by using a chunk size of 1G and a maximum of two transfers (2G in RAM at a time), transfers were noticeably more stable.

Upload cutoffs of “5G”

During my experiments, I once tried a 5G single-part cutoff: --b2-chunk-size 2G --b2-upload-cutoff 5G --max-size 5G. The docs state “This value should be set no larger than 4.657GiB (== 5GB)”, but the command still threw an error.

So apparently 5G is too high. 4G worked fine though.
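A likely explanation is units. Rclone reads size suffixes as binary units, so 5G means 5 GiB, while the documented ceiling is 5 GB in decimal. The arithmetic below is a quick sketch of that comparison; the variable names are made up for illustration.

```shell
# Rclone reads size suffixes as binary units, so 5G means 5 GiB.
limit_bytes=5000000000                      # 5 GB, the documented ceiling
cutoff_5g=$(( 5 * 1024 * 1024 * 1024 ))     # 5G as rclone parses it
cutoff_4g=$(( 4 * 1024 * 1024 * 1024 ))     # 4G as rclone parses it

echo "5G = $cutoff_5g bytes"
echo "4G = $cutoff_4g bytes"
[ "$cutoff_5g" -gt "$limit_bytes" ] && echo "5G exceeds the 5 GB limit"
[ "$cutoff_4g" -le "$limit_bytes" ] && echo "4G is within the 5 GB limit"
```

5G works out to 5368709120 bytes, which is over the limit, while 4G is 4294967296 bytes and fits.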

500 Internal Server Error

Something is wrong with Backblaze, usually a transient problem. Rclone will retry, by default up to 10 times with built-in rate limiting (pacer) as shown with the incident a7691a3d7f71-e47fc872d7ba below.

References