Git Large File Storage (LFS)

Git LFS

Git is well suited to text files such as source code (for instance .py files), but it is not well suited to storing large files such as large CSV files, databases, images, videos, or other binary files.

Git LFS was created to address this shortcoming. Large files (databases, images, or other binary files) are tracked and versioned in the Git repository through their hashes: the repository stores only a small pointer containing the hash, a short identifier that uniquely identifies the file's content. The file content itself is stored outside the Git repository, on an LFS server.
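To make "tracked by their hashes" concrete, here is a minimal Python sketch of the small pointer text that Git LFS commits in place of a large file: a spec version line, the SHA-256 hash of the content (the oid), and the content size in bytes. The helper names lfs_pointer and lfs_pointer_for_file are hypothetical and only illustrate the format; in practice Git LFS produces the pointer itself, and git lfs pointer --file=rates.sqlite should print the real one for comparison.

```python
import hashlib
from pathlib import Path

# First line of every Git LFS pointer file (pointer spec version)
LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(data: bytes) -> str:
    """Hypothetical helper: build Git LFS pointer text for some content.

    The pointer records the SHA-256 hash of the content (the "oid") and its
    size in bytes. This short text is what gets committed to the repository;
    the content itself is uploaded to the LFS server.
    """
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


def lfs_pointer_for_file(path: str) -> str:
    """Hypothetical helper: read a file from disk and build its pointer.

    Reads the whole file into memory, which is fine for a sketch but not
    for files that are larger than available RAM.
    """
    return lfs_pointer(Path(path).read_bytes())


if __name__ == "__main__":
    # Prints a three-line pointer for a tiny example payload
    print(lfs_pointer(b"hello\n"), end="")
```

However large the original file, the pointer stays three lines long, which is why the commit in the lecture example reports only 3 insertions for rates.sqlite.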

GitHub provides a Git LFS service (with a 1 GB free quota) that integrates seamlessly with GitHub repositories, so it can also be used to back up data files.

See the installation instructions and getting started guide at https://git-lfs.com/

After files are tracked with git lfs track, the tracked files are listed in a .gitattributes file inside the repository. The .gitattributes file must itself be added and committed to the Git repository.
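As a rough illustration of what git lfs track records, the Python sketch below appends a pattern to .gitattributes text with the same attributes that appear in the lecture's .gitattributes file, skipping the line if it is already present. The names add_lfs_pattern and track_file are hypothetical, and the sketch ignores details the real command handles (such as escaping spaces in patterns); use git lfs track itself in practice, then commit .gitattributes.

```python
from pathlib import Path

# Attributes written for each tracked pattern, copied from the lecture's
# .gitattributes file
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def add_lfs_pattern(gitattributes: str, pattern: str) -> str:
    """Hypothetical helper: return .gitattributes text with `pattern` tracked.

    Mimics the effect of `git lfs track <pattern>` on the file contents.
    Returns the text unchanged if the exact line is already there.
    """
    line = f"{pattern} {LFS_ATTRS}"
    if line in gitattributes.splitlines():
        return gitattributes
    # Make sure the new entry starts on its own line
    if gitattributes and not gitattributes.endswith("\n"):
        gitattributes += "\n"
    return gitattributes + line + "\n"


def track_file(repo_root: str, pattern: str) -> None:
    """Hypothetical helper: apply add_lfs_pattern to <repo_root>/.gitattributes."""
    path = Path(repo_root) / ".gitattributes"
    text = path.read_text() if path.exists() else ""
    path.write_text(add_lfs_pattern(text, pattern))
```

Calling add_lfs_pattern twice with the same pattern leaves the text unchanged, so repeated tracking does not produce duplicate lines.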

Example from lecture (rates.sqlite was staged with git add before the commit; that step is not shown):

$ git lfs track rates.sqlite
$ git commit -m "Backup one large file using LFS"
[main 887623e] Backup one large file using LFS
 1 file changed, 3 insertions(+)
 create mode 100644 rates.sqlite

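The commit reports 3 insertions because Git stores rates.sqlite as a three-line pointer file rather than the database itself. When running git push, note the first line of output, which shows the LFS objects being uploaded: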

$ git push

Uploading LFS objects: 100% (1/1), 8.2 KB | 0 B/s, done.
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 414 bytes | 414.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:stats-at-Rutgers/march-25-lecture-APIs.git

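After this, the .gitattributes file contains:

rates.sqlite filter=lfs diff=lfs merge=lfs -text
rates_backup.xlsx filter=lfs diff=lfs merge=lfs -text

Each line names a file tracked by Git LFS (rates_backup.xlsx, not shown in the transcript above, is tracked as well). The filter=lfs attribute routes the file through Git LFS, so the repository stores only the pointer while the content goes to the LFS server, and -text tells Git not to treat the file as text.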
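To check which patterns a .gitattributes file sends to Git LFS, a simplified Python parser can keep every entry whose attributes include filter=lfs. The function name lfs_tracked_patterns is hypothetical, and the parser deliberately ignores quoted patterns and attribute macros. From the command line, git lfs track with no arguments lists the tracked patterns, and git lfs ls-files lists the files actually stored in LFS.

```python
def lfs_tracked_patterns(gitattributes: str) -> list[str]:
    """Hypothetical helper: return patterns whose attributes include filter=lfs.

    Simplified parsing: skips blank lines and # comments and splits each
    line on whitespace (pattern first, attributes after).
    """
    patterns = []
    for raw in gitattributes.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


# The .gitattributes content from the lecture example
example = """\
rates.sqlite filter=lfs diff=lfs merge=lfs -text
rates_backup.xlsx filter=lfs diff=lfs merge=lfs -text
"""
print(lfs_tracked_patterns(example))  # ['rates.sqlite', 'rates_backup.xlsx']
```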