Git prune, a child of git gc, maintains a repository by deleting unreachable objects. These objects are also referred to as orphaned objects.
Failure to understand the ins and outs of the command could prevent you from using it appropriately. This guide explains the internals of git prune. You will find it easy to identify when to use the prune command or ignore it by the end of the tutorial.
Here is a quick overview of how to apply the command.
git prune command cheat sheet
You can use git prune with several options. For instance, use the --dry-run and --verbose options
git prune --dry-run --verbose
to check which objects would be removed without deleting anything.
Use the --expire and --verbose options
git prune --expire=<time> --verbose
to prune only the unreachable objects older than the specified time.
Show a progress indicator while pruning with the --progress option.
git prune --progress
Additionally, you can prune the objects that neither your repository nor another repository borrowing objects from it still uses, by combining git prune with the rev-parse command.
git prune $(cd ../<another folder> && git rev-parse --all)
Apart from removing unreachable objects, pruning is also available as an option of git fetch and as a subcommand of git remote, where it discards stale remote-tracking branches and tags.
git fetch --prune
git remote prune <remote name>
git fetch --prune --prune-tags
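To see what pruning does on the fetch side, here is a minimal sketch that uses a local bare repository as a stand-in for the remote. It assumes git is installed; the directory names, the branch names main and feature, and the user identity are placeholders chosen for illustration.

```shell
#!/bin/sh
set -e

# Scratch area; every name below is a placeholder for illustration.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# A bare repository plays the role of the remote.
git init --quiet --bare remote.git

# A working repository that uses it as "origin".
git init --quiet local
cd local
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$workdir/remote.git"

# Publish one commit under two branch names on the remote.
echo "hello" > file.txt
git add file.txt
git commit --quiet -m "Initial commit"
git push --quiet origin HEAD:refs/heads/main HEAD:refs/heads/feature

# Delete "feature" directly on the remote, behind the local repository's back.
git -C "$workdir/remote.git" update-ref -d refs/heads/feature

# The local remote-tracking ref is now stale but still listed.
before=$(git branch -r)

# Preview what would be pruned, then prune while fetching.
git remote prune --dry-run origin
git fetch --quiet --prune origin

after=$(git branch -r)

printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

The stale origin/feature entry should appear in the first listing only, while origin/main survives in both.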
It would be best to deeply understand the concept of orphaned objects before applying git prune.
What are unreachable objects in git?
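Unreachable objects are objects that no branch, tag, or other reference leads to, either directly or through a chain of parent commits. Understanding git internals helps you picture how an object becomes unreachable.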
The arrangements in the .git subdirectory
Git tracks changes using three main object types: blob, tree, and commit. A blob contains file contents. The tree object references blobs and other trees. Lastly, a commit object references a tree object.
In the .git subdirectory, the objects directory stores these objects, each named after its SHA1 hash. Refs are names that point to commits, such as branches and tags. Inside the refs folder are the heads and tags folders. The heads folder holds the branches. A branch is a named reference to a commit.
How git staging and committing affect objects
On staging a file, two things happen. First, git creates a blob to store the new file's contents. Secondly, git records the blob in the index (the staging area); the tree object that references the blob is written when you commit. A similar scenario occurs on modifying and staging a file.
On committing a file, git creates a commit object to reference the tree object. The commit object records the tree's SHA1, the parent commit, the author and committer, the timestamps, and the commit message.
Git then moves the current branch to the new commit. The HEAD file names the current branch, so HEAD always resolves to the tip of that branch.
The key takeaway here is that two objects (blob and tree) build a commit object, which branches or tags reference. Git knows the active branch by checking the HEAD file in the .git subdirectory.
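The sequence above can be reproduced in a throwaway repository. The following is a minimal sketch, assuming git is installed; the directory name demo, the file name, and the user identity are placeholders. It records the type of each object as it appears and counts the files under .git/objects after staging and after committing.

```shell
#!/bin/sh
set -e

# Throwaway repository; names and identity are placeholders.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
git init --quiet demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

# Staging stores the file contents as a blob and records it in the index.
echo "File 1" > file1.txt
git add file1.txt
blob=$(git rev-parse :file1.txt)            # object ID of the staged blob
blob_type=$(git cat-file -t "$blob")
objects_after_stage=$(find .git/objects -type f | wc -l | tr -d ' ')

# Committing adds a tree and a commit that points to it.
git commit --quiet -m "Add file 1"
tree=$(git rev-parse 'HEAD^{tree}')
tree_type=$(git cat-file -t "$tree")
commit_type=$(git cat-file -t HEAD)
objects_after_commit=$(find .git/objects -type f | wc -l | tr -d ' ')

# HEAD names the current branch; the branch resolves to the new commit.
head_ref=$(git symbolic-ref HEAD)
branch_tip=$(git rev-parse "$head_ref")
head_commit=$(git rev-parse HEAD)

echo "staged object: $blob_type ($objects_after_stage file(s) in .git/objects)"
echo "after commit:  $tree_type + $commit_type ($objects_after_commit file(s))"
echo "HEAD -> $head_ref -> $branch_tip"

# Inspect the raw commit object: tree, author, committer, and message.
git cat-file -p HEAD
```

The output of git cat-file -p HEAD is a convenient way to see which fields a commit object actually carries.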
How to orphan a git object
Since committing changes creates a tree with nodes referencing parents and children, a new commit gets attached to a former one. Simply put, the former commit becomes the parent to the current one.
The tree continues growing until you reset commits, leaving a commit that no branch or tag leads to. Such a commit is orphaned: it no longer appears in git log, although the reflog keeps a record of it until that entry expires.
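A reset that orphans a commit can be tried safely in a scratch repository. This is a minimal sketch under stated assumptions: git is installed, and the file names, commit messages, and user identity are placeholders. It checks whether the discarded commit still shows up in the log and in the reflog, and asks git fsck for unreachable objects with and without the reflog taken into account.

```shell
#!/bin/sh
set -e
export LC_ALL=C   # keep git's messages in English so grep can match them

# Scratch repository; names and identity are placeholders.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
git init --quiet demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > a.txt
git add a.txt
git commit --quiet -m "First"

echo "two" > b.txt
git add b.txt
git commit --quiet -m "Second"
orphan=$(git rev-parse HEAD)

# Move the branch back by one commit; "Second" loses its last reference.
git reset --quiet --hard HEAD~1

# Is the discarded commit still visible in the history or the reflog?
in_log=$(git log --format=%H | grep -c "$orphan" || true)
in_reflog=$(git log -g --format=%H | grep -c "$orphan" || true)

# Unreachable objects, first honoring the reflog, then ignoring it.
with_reflog=$(git fsck --unreachable 2>/dev/null | grep -c '^unreachable' || true)
without_reflog=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep -c '^unreachable' || true)

echo "in log: $in_log, in reflog: $in_reflog"
echo "unreachable (reflog honored): $with_reflog"
echo "unreachable (reflog ignored): $without_reflog"
```

The gap between the last two numbers is the set of objects that only the reflog keeps alive.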
Why you should git prune orphaned objects
Unreachable objects serve no purpose, yet they keep occupying disk space. So, git provides housekeeping tools such as git gc and its children like git prune, git repack, and git pack-refs to reclaim that space.
What is git gc?
Garbage collection is a concept git borrows from programming languages with managed memory, which reclaim objects that are no longer in use. In git, it means removing unreachable objects and compressing the remaining ones.
Git triggers garbage collection automatically (git gc --auto) after some commands, such as commit, merge, or pull. Since a reset does not trigger automatic garbage collection, you can use the git gc command to maintain your repository after git reset.
Git gc removes any mess left in the repository after orphaning commits. It compresses the objects into a pack file. The command comes with many options, letting you control the extent of optimization. The most typical options are:
git gc --auto
which checks if there is any optimization needed before acting.
git gc --aggressive
which takes longer but optimizes the repository more thoroughly, so more disk space gets saved.
Before applying the gc command, you can check the number of objects and the disk space they use
git count-objects -v
and list the unreachable ones.
git fsck --unreachable
You may notice two new files (.pack and .idx) introduced in the .git/objects/pack directory after running the gc command. The .pack file is where the objects get compressed, whereas the .idx file is an index that records where each object sits inside the pack.
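The effect of git gc on the object store can be observed with git count-objects. Below is a minimal sketch, assuming git is installed and using placeholder file names and a placeholder user identity. It creates three small commits, then compares the loose and packed object counts before and after garbage collection.

```shell
#!/bin/sh
set -e

# Scratch repository; names and identity are placeholders.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
git init --quiet demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits, each adding one file: 3 blobs + 3 trees + 3 commits.
for i in 1 2 3; do
    echo "File $i" > "file$i.txt"
    git add "file$i.txt"
    git commit --quiet -m "Add file $i"
done

# Helper: read one field from "git count-objects -v".
field() { git count-objects -v | awk -v key="$1:" '$1 == key { print $2 }'; }

loose_before=$(field count)
packs_before=$(field packs)

git gc --quiet

loose_after=$(field count)
packed_after=$(field in-pack)
pack_files=$(find .git/objects/pack -name '*.pack' | wc -l | tr -d ' ')
idx_files=$(find .git/objects/pack -name '*.idx' | wc -l | tr -d ' ')

echo "before gc: $loose_before loose objects, $packs_before packs"
echo "after gc:  $loose_after loose objects, $packed_after packed objects"
echo "pack files: $pack_files, index files: $idx_files"
ls .git/objects/pack
```

Depending on the git version, the pack directory may contain additional helper files next to the .pack and .idx pair.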
Why git prune may not work for you and the solution
Git prune may not work as expected. The reason is that the reflog could still be holding on to the detached objects until its entries expire. Tip: By default, reflog entries expire after 90 days, or after 30 days when they point to commits that are no longer reachable from the current branch tip.
So, what is the solution?
First, you can force git reflog to expire its entries immediately
git reflog expire --expire=now --expire-unreachable=now --all
so that the reflog no longer holds on to the unreachable objects. Secondly, run the prune command with an immediate expiration date, adding --verbose to list what gets removed.
git prune --expire=now --verbose
But there is a catch.
Forcefully clearing the reflog is a dangerous operation, especially on a shared repository, because the reflog is the last way to recover discarded commits. Additionally, the git documentation advises running git gc, which calls git prune, in most housekeeping situations.
Nevertheless, you can still use git prune effectively, as you are about to see in the practice section.
Lab setup to practice git prune
I am creating a repo on GitHub called git_prune.
I copy its URL, clone it on my terminal, and then navigate into it.
Create three more commits as follows.
Commit one
echo "File 1" > file1.txt
git stage file1.txt
git commit -m "Add file 1"
Commit two
echo "File 2" > file2.txt
git stage file2.txt
git commit -m "Add file 2"
Commit three
echo "File 3" > file3.txt
git stage file3.txt
git commit -m "Add file 3"
We have four commits.
git log --pretty=oneline
Now we are ready to see a standard way to apply git prune independently.
A typical scenario to apply the git prune command
Let's inspect the available objects using the git log command,
git log --pretty=oneline
the find command,
find .git/objects -type f
and the cat-file command.
git cat-file --batch-check --batch-all-objects
We have four commits, four blobs, and four trees: 12 objects in total.
Let's detach the three commits we added in the setup section and recheck history.
git reset --hard HEAD~3
git log --pretty=oneline
We may think the hard reset discarded the objects from the repository. However, git still references objects in the reflog. So, let's clean the reflog.
git reflog expire --expire=now --expire-unreachable=now --all
We still have 12 objects in the .git/objects folder.
git cat-file --batch-check --batch-all-objects
Check the objects to git prune.
git prune --dry-run --verbose
git fsck --unreachable
Prune 9 out of the 12 objects.
git prune --expire=now --verbose
Then recheck the objects, confirming if there is anything to prune.
git cat-file --batch-check --batch-all-objects
git prune --dry-run --verbose
git fsck --unreachable
After pruning 9 out of 12 objects, we have 3 objects left in the .git/objects folder and no unreachable objects.
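The whole walkthrough can also be replayed without GitHub. The following is a minimal sketch that assumes git is installed and replaces the cloned repo with a locally initialized one; the README.md content, the directory name, and the user identity are placeholders. It counts the objects at each stage so the 12, 9, and 3 figures can be checked.

```shell
#!/bin/sh
set -e
export LC_ALL=C   # keep git's messages in English so grep can match them

# Local stand-in for the cloned git_prune repo; identity is a placeholder.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
git init --quiet git_prune
cd git_prune
git config user.name "Example User"
git config user.email "user@example.com"

# Helper: count every object in the repository.
count_objects() {
    git cat-file --batch-check --batch-all-objects | wc -l | tr -d ' '
}

# One initial commit plus the three commits from the lab setup.
echo "# git_prune" > README.md
git add README.md
git commit --quiet -m "Initial commit"
for i in 1 2 3; do
    echo "File $i" > "file$i.txt"
    git stage "file$i.txt"
    git commit --quiet -m "Add file $i"
done
total_before=$(count_objects)

# Orphan the last three commits, then empty the reflog.
git reset --quiet --hard HEAD~3
git reflog expire --expire=now --expire-unreachable=now --all
total_after_reset=$(count_objects)

# Preview, then prune for real.
would_prune=$(git prune --dry-run | wc -l | tr -d ' ')
git prune --expire=now
total_after_prune=$(count_objects)
unreachable_left=$(git fsck --unreachable 2>/dev/null | grep -c '^unreachable' || true)

echo "objects before reset:  $total_before"
echo "objects after reset:   $total_after_reset"
echo "objects to prune:      $would_prune"
echo "objects after prune:   $total_after_prune"
echo "unreachable remaining: $unreachable_left"
```

Running the reset without the reflog expire step and repeating the dry run is a quick way to see why the reflog has to be cleared first.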
We can then wrap up this tutorial by pushing the changes.
git push
Conclusion
You have just learned the roots of the git prune command and how to use it independently. Go ahead and safely apply it as recommended in this tutorial.