git Large File Storage (LFS)
Git LFS
git itself is well suited to text files (code, for instance .py files) but not to storing large files, such as large csv files, databases, images, videos or other binary files.
To address this shortcoming, git lfs was created so that large files (databases, images or binary files) can be tracked and versioned in a git repository by their hashes (a small unique identifier of the large file), with the content of the file itself hosted outside of the git repository.
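To make the hash-based tracking concrete, the sketch below prints text in the layout of a Git LFS pointer file (a version line, an oid line holding the SHA-256 hash, and a size line in bytes). This small text file is what the git repository stores in place of the large file. The helper name lfs_pointer_sketch is hypothetical, and the sketch assumes sha256sum from GNU coreutils is available; git lfs normally writes these pointers itself.

```shell
# Hypothetical helper: print what a Git LFS pointer for a file looks like.
# Assumes sha256sum (GNU coreutils); on macOS, "shasum -a 256" is the usual substitute.
lfs_pointer_sketch() {
    file="$1"
    # The object id is the SHA-256 hash of the file content.
    oid=$(sha256sum "$file" | cut -d ' ' -f 1)
    # The size is the byte count of the file content.
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}
```

Calling lfs_pointer_sketch rates.sqlite prints three short lines regardless of how large the file is. With git lfs installed, git lfs pointer --file=rates.sqlite should produce the real pointer for comparison.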
GitHub provides a git lfs service (with a 1GB free quota) so that the integration is seamless. This can be used for data backup.
See the installation instructions and getting started guide at https://git-lfs.com/
After tracking files with git lfs, the tracked files are listed in a .gitattributes file inside the repository. The .gitattributes file must itself be added to the git repository.
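The two steps described above (track, then add .gitattributes) can be sketched as a small helper. The function name track_with_lfs is hypothetical; the sketch assumes it is run inside a git repository on a machine where git lfs is already installed.

```shell
# Hypothetical helper: track a pattern with git lfs, then stage the
# .gitattributes file so the tracking rule is versioned with the repository.
# Assumes the current directory is inside a git repository with git lfs installed.
track_with_lfs() {
    pattern="$1"
    # Record the pattern in .gitattributes (the file is created if missing).
    git lfs track "$pattern" || return 1
    # Stage .gitattributes so the rule becomes part of the next commit.
    git add .gitattributes
}
```

For example, track_with_lfs "*.sqlite" tracks every SQLite file; the quotes keep the shell from expanding the pattern before git lfs sees it. The large files themselves still need to be added and committed as usual.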
Example from lecture
$ git lfs track rates.sqlite
$ git commit -m "Backup one large file using LFS"
[main 887623e] Backup one large file using LFS
 1 file changed, 3 insertions(+)
 create mode 100644 rates.sqlite
When performing git push, notice the first line indicating that LFS objects are being uploaded:
$ git push
Uploading LFS objects: 100% (1/1), 8.2 KB | 0 B/s, done.
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 414 bytes | 414.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:stats-at-Rutgers/march-25-lecture-APIs.git
After this, the .gitattributes file contains
rates.sqlite filter=lfs diff=lfs merge=lfs -text
rates_backup.xlsx filter=lfs diff=lfs merge=lfs -text
where we can see the files tracked by git lfs.
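As a quick way to read a .gitattributes file like the one above, the sketch below prints only the patterns that carry the filter=lfs attribute. The helper name lfs_patterns is hypothetical, and as a simplification it does not handle patterns that contain spaces.

```shell
# Hypothetical helper: print the patterns in a .gitattributes file that are
# handled by git lfs, that is, the lines carrying the filter=lfs attribute.
# Simplification: patterns containing spaces are not handled.
lfs_patterns() {
    attributes_file="${1:-.gitattributes}"
    awk '
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            # Field 1 is the pattern; the remaining fields are attributes.
            for (i = 2; i <= NF; i++) {
                if ($i == "filter=lfs") { print $1; break }
            }
        }
    ' "$attributes_file"
}
```

Run with no argument inside the repository, lfs_patterns reads ./.gitattributes. Running git lfs track with no arguments should report the same tracked patterns.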