The ins and outs of the git prune command explained

Git prune, a child of git gc, maintains a repository by deleting unreachable objects from the object database. Unreachable objects are also referred to as orphaned objects.

Failure to understand the ins and outs of the command could prevent you from using it appropriately. This guide explains the internals of git prune. By the end of the tutorial, you will find it easy to decide when to use the prune command and when to skip it.


Here is a quick overview of how to apply the command.

 

git prune command cheat sheet

You can use git prune with several options. For instance, use the --dry-run and --verbose options

git prune --dry-run --verbose

to check which objects would be removed, without deleting anything.

Use the --expire and --verbose options

git prune --expire=<time> --verbose

to prune only the unreachable objects older than the specified time.
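To see the effect of the cutoff, here is a minimal sketch run in a throwaway repository. The temporary directory and the blob's content are made up for illustration; git hash-object -w is used only because it is a quick way to create an object that nothing references.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Write a blob straight into the object store. No ref or index entry
# points at it, so it is unreachable from the start.
orphan=$(echo "orphaned content" | git hash-object -w --stdin)

# The blob is only seconds old, so a two-week cutoff leaves it alone.
git prune --expire=2.weeks.ago --verbose
if git cat-file -e "$orphan" 2>/dev/null; then kept_by_cutoff=yes; else kept_by_cutoff=no; fi

# With --expire=now, the age of the object no longer protects it.
git prune --expire=now --verbose
if git cat-file -e "$orphan" 2>/dev/null; then kept_after_now=yes; else kept_after_now=no; fi

echo "kept with 2.weeks.ago: $kept_by_cutoff, kept with now: $kept_after_now"
```

The first prune should leave the blob in place, and the second should delete it.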

Display a progress indicator while pruning with the --progress option.

git prune --progress

Additionally, you can prune objects used neither by your repository nor by another repository that borrows objects from yours (through its .git/objects/info/alternates file) by combining git prune with the rev-parse command.

git prune $(cd ../<another folder> && git rev-parse --all)

Apart from removing unreachable objects, prune also appears as an option in the fetch and remote commands, where it deletes stale remote-tracking branches (and, with --prune-tags, local tags) that no longer exist on the remote.

git fetch --prune
git remote prune <remote name>
git fetch --prune --prune-tags
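The following sketch shows what pruning during a fetch does, using a local bare repository as a stand-in for the remote. The names origin.git, clone, and topic are hypothetical, and the sketch assumes fetch.prune is not already enabled in your git configuration.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the remote.
git init -q --bare origin.git
git clone -q origin.git clone 2>/dev/null
cd clone
git config user.name "Example User"
git config user.email "user@example.com"

# Publish the default branch and a short-lived topic branch.
git commit -q --allow-empty -m "Initial commit"
git push -q origin HEAD
git branch topic
git push -q origin topic

# Delete the branch directly on the remote, as a teammate might.
git --git-dir=../origin.git branch -q -D topic

# A plain fetch keeps the stale remote-tracking branch...
git fetch -q
before=$(git branch -r | grep -c "origin/topic" || true)

# ...while a pruning fetch removes it.
git fetch -q --prune
after=$(git branch -r | grep -c "origin/topic" || true)

echo "origin/topic before: $before, after: $after"
```

Note that this only deletes the remote-tracking reference. Any local branch named topic would stay untouched.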

It helps to understand the concept of orphaned objects before applying git prune.

 

What are unreachable objects in git?

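Unreachable objects are objects that git cannot reach by following any reference, such as a branch, a tag, HEAD, the index, or a reflog entry. Understanding git internals helps you picture how an object becomes unreachable.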

 

The arrangements in the .git subdirectory

Git tracks changes using three main object types: blob, tree, and commit (a fourth type, tag, stores annotated tags). A blob contains file contents. A tree object references blobs and other trees. Lastly, a commit object references a tree object.

In the .git subdirectory, the objects directory stores these objects, each named by its SHA-1 hash. The refs directory holds references such as branches and tags. Inside the refs folder are the heads and tags subdirectories, and heads holds the branches. A branch is a named reference to a commit.
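You can confirm these relationships yourself. The sketch below builds a one-commit throwaway repository (the file name greeting.txt and the identity settings are made up) and asks git for the type of each object, then checks where the branch points.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Resolve each object from the commit downwards.
commit_id=$(git rev-parse HEAD)
tree_id=$(git rev-parse "HEAD^{tree}")
blob_id=$(git rev-parse "HEAD:greeting.txt")

# Ask git what type each object is.
commit_type=$(git cat-file -t "$commit_id")
tree_type=$(git cat-file -t "$tree_id")
blob_type=$(git cat-file -t "$blob_id")
echo "$commit_type -> $tree_type -> $blob_type"

# The branch under refs/heads is just a name for the commit's hash.
branch=$(git symbolic-ref --short HEAD)
branch_target=$(git rev-parse "refs/heads/$branch")
echo "refs/heads/$branch points to $branch_target"
```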


 

How git staging and committing affect objects

On staging a file, two things happen. First, git creates a blob to store the new file's contents. Secondly, git records that blob in the index (the staging area). The same happens when you modify a file and stage it again.

On committing, git writes a tree object that references the staged blobs, and then creates a commit object that references that tree. The commit object, identified by its SHA-1, records the tree, the parent commit(s), the author, the committer, the timestamps, and the commit message.
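A quick way to watch objects appear is to count them after each step. This is a small sketch in a throwaway repository. The count_objects helper is a hypothetical convenience function, not a git command.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: number of objects currently in the object store.
count_objects() {
  git cat-file --batch-check --batch-all-objects | wc -l | tr -d ' '
}

echo "File 1" > file1.txt
git add file1.txt

# After staging, the store holds the blob with the file's contents.
after_stage=$(count_objects)
staged_type=$(git cat-file --batch-check='%(objecttype)' --batch-all-objects)

git commit -q -m "Add file 1"

# After committing, a tree and a commit have joined the blob.
after_commit=$(count_objects)

echo "after staging: $after_stage ($staged_type), after commit: $after_commit"
```

In this run, staging leaves a single blob in the store, and the tree and commit objects show up once git commit finishes.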

The HEAD file in the .git subdirectory normally names the current branch, and the branch reference in turn holds the SHA-1 of the latest commit on that branch. Each new commit moves the branch tip forward, and HEAD follows it.

The key takeaway here is that blobs and trees build up a commit object, which branches or tags reference. Git knows the active branch by checking the HEAD file in the .git subdirectory.
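To see how HEAD leads to the latest commit, you can read the file directly. The sketch assumes git's default file-based ref storage, where .git/HEAD is a plain text file. The repository and identity are throwaway values.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

# HEAD is a small text file naming the current branch.
head_file=$(cat .git/HEAD)

# git symbolic-ref reports the same branch reference.
branch_ref=$(git symbolic-ref HEAD)

# Resolving HEAD and resolving the branch give the same commit.
tip=$(git rev-parse HEAD)
branch_tip=$(git rev-parse "$branch_ref")

echo "$head_file"
echo "$branch_ref -> $branch_tip"
```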

 

How to orphan a git object

Since every commit records its parent, each new commit gets attached to the former one, forming a chain of history. Simply put, the former commit becomes the parent of the current one.

The history keeps growing until you, for example, reset commits, leaving a commit that no branch or tag leads to. Such a commit is orphaned: it no longer appears in git log, although git reflog still lists it for a while, and you can check it out or cherry-pick it by its SHA-1 until it gets pruned.
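Here is a minimal sketch of orphaning a commit with a reset, again in a throwaway repository with made-up commit messages. It checks three places for the discarded commit: the branch history, the reflog, and the object store.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "First"
git commit -q --allow-empty -m "Second"
orphan=$(git rev-parse HEAD)

# Move the branch back one commit. Nothing on the branch leads to "Second" now.
git reset -q --hard HEAD~1

# Is the commit still in the branch history?
in_log=$(git log --format=%H | grep -c "$orphan" || true)

# Is it still recorded in the reflog?
in_reflog=$(git reflog --format=%H | grep -c "$orphan" || true)

# Does the object still exist in the object store?
if git cat-file -e "$orphan" 2>/dev/null; then still_stored=yes; else still_stored=no; fi

echo "in log: $in_log, in reflog: $in_reflog, still stored: $still_stored"
```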


 

Why you should git prune orphaned objects

Unreachable objects waste disk space and can slow down repository operations. So, git provides housekeeping tools such as git gc and the commands it runs, like git prune, git repack, and git pack-refs, to reclaim the space the unneeded objects occupy.

 

What is git gc?

Garbage collection is a concept git shares with programming languages that manage memory automatically: it improves performance by removing unused objects and compressing the ones that remain.

Git runs garbage collection automatically (git gc --auto) after some commands, such as commit, merge, or pull, when it detects too many loose objects. You can also run the git gc command manually to maintain your repository, for example after git reset.

Git gc cleans up the repository after commits become orphaned. It compresses the objects into a pack file. The command comes with many options, letting you control the extent of optimization. The most typical options are:

git gc --auto

which checks if there is any optimization needed before acting.

git gc --aggressive

optimizes the repository more thoroughly at the expense of taking much longer.


Before applying the gc command, you can check the total number of objects and the disk space they use

git count-objects -v

and unreachable ones.

git fsck --unreachable

You may notice two new files (.pack and .idx) introduced in the .git/objects/pack directory after running the gc command. The .pack file holds the compressed objects, whereas the .idx file is an index used to locate objects inside the pack.
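The before-and-after effect of git gc can be sketched with git count-objects -v. The repository, file names, and identity below are throwaway values, and the awk filters simply pull the count and packs fields out of the command's output.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits, each adding one file: 3 commits + 3 trees + 3 blobs.
for n in 1 2 3; do
  echo "File $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Add file $n"
done

# "count" is the number of loose objects, "packs" the number of pack files.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_before=$(git count-objects -v | awk '/^packs:/ {print $2}')

git gc -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose: $loose_before -> $loose_after, packs: $packs_before -> $packs_after"
ls .git/objects/pack
```

Since every object here is reachable, git gc should move all nine loose objects into a single pack rather than delete anything.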

 

Why git prune may not work for you and the solution

Git prune may not work as expected. The reason is that the reflog may still reference the detached commits, which keeps their objects reachable until the reflog entries expire. Tip: By default, git gc expires reflog entries after 90 days, and entries for unreachable commits after 30 days.

So, what is the solution?

First, you can force git reflog to expire the entries for the unused objects immediately

git reflog expire --expire=now --expire-unreachable=now --all

before pruning the unreachable objects. Secondly, run the prune command with the expiration set to now, adding --verbose to list what gets removed.

git prune --expire=now --verbose
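The sketch below shows the reflog holding objects back. It runs a dry-run prune twice in a throwaway repository (file names and identity are made up): once while the reflog still lists the discarded commit, and once after expiring the reflog.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "File 1" > file1.txt
git add file1.txt
git commit -q -m "Add file 1"

echo "File 2" > file2.txt
git add file2.txt
git commit -q -m "Add file 2"

# Discard the second commit. Its commit, tree, and blob lose their branch.
git reset -q --hard HEAD~1

# The reflog still references the commit, so prune finds nothing to remove.
blocked=$(git prune --dry-run --expire=now | wc -l | tr -d ' ')

# Expire the reflog entries, then ask again.
git reflog expire --expire=now --expire-unreachable=now --all
candidates=$(git prune --dry-run --expire=now | wc -l | tr -d ' ')

echo "prunable before expiring the reflog: $blocked, after: $candidates"
```

Because both prune calls use --dry-run, nothing gets deleted here. The numbers only report what a real prune would remove.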

But there is a catch.

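Forcefully clearing the reflog is a dangerous operation, especially on a repository other people rely on, because you lose the safety net for recovering discarded commits. Additionally, the git documentation recommends running git gc rather than calling git prune directly in most housekeeping situations.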

Nevertheless, you can still use git prune effectively, as you are about to see in the practice section.

 

Lab setup to practice git prune

I am creating a repo on GitHub called git_prune.


I copy its URL, clone it on my terminal, and then navigate into it.


Create three more commits as follows.

Commit one

echo "File 1" > file1.txt
git stage file1.txt
git commit -m "Add file 1"

Commit two

echo "File 2" > file2.txt
git stage file2.txt
git commit -m "Add file 2"

Commit three

echo "File 3" > file3.txt
git stage file3.txt
git commit -m "Add file 3"

We now have four commits, including the one that came with the cloned repo.

git log --pretty=oneline


Now we are ready to see a standard way to apply git prune independently.

 

A typical scenario to apply the git prune command

Let's inspect the available objects using the git log command,

git log --pretty=oneline

the find command,

find .git/objects -type f

and the cat-file command.

git cat-file --batch-check --batch-all-objects

We have four commits, four trees, and four blobs: 12 objects in total.


Let's detach the three commits we added in the setup section and recheck history.

git reset --hard HEAD~3
git log --pretty=oneline

We may think the hard reset discarded the objects from the repository. However, git still references the commits in the reflog, which keeps their objects reachable. So, let's clean the reflog.

git reflog expire --expire=now --expire-unreachable=now --all

We still have 12 objects in the .git/objects folder.

git cat-file --batch-check --batch-all-objects

Check the objects to git prune.

git prune --dry-run --verbose
git fsck --unreachable


Prune 9 of the 12 objects.

git prune --expire=now --verbose

Then recheck the objects to confirm that nothing is left to prune.

git cat-file --batch-check --batch-all-objects
git prune --dry-run --verbose
git fsck --unreachable

After pruning 9 of the 12 objects, we have 3 objects left in the .git/objects folder and no unreachable objects.
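If you would like to replay the whole scenario without a GitHub repo, the sketch below approximates it locally. The README.md commit is an assumption standing in for the commit that came with the cloned repo, and count_objects is a hypothetical helper.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: number of objects in the object store.
count_objects() {
  git cat-file --batch-check --batch-all-objects | wc -l | tr -d ' '
}

# Stand-in for the initial commit of the cloned repo (assumed content).
echo "# git_prune" > README.md
git add README.md
git commit -q -m "Initial commit"

# The three commits from the lab setup.
for n in 1 2 3; do
  echo "File $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Add file $n"
done
total_before=$(count_objects)

# Detach the three commits, clear the reflog, then prune.
git reset -q --hard HEAD~3
git reflog expire --expire=now --expire-unreachable=now --all
git prune --expire=now

total_after=$(count_objects)
unreachable_after=$(git fsck --unreachable 2>/dev/null | grep -c "^unreachable" || true)

echo "objects: $total_before -> $total_after, unreachable left: $unreachable_after"
```

Everything happens inside a temporary directory, so you can rerun it as often as you like without touching a real repository.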


We can then wrap this tutorial by pushing the changes.

git push

 

Conclusion

You have just learned the roots of the git prune command and how to use it independently. Go ahead and safely apply it as recommended in this tutorial.

 
