The ins and outs of the git prune command explained



Author: Steve Alila
Reviewer: Deepak Prasad

Git prune, one of the commands git gc runs behind the scenes, maintains a repository by deleting unreachable loose objects. These orphaned objects are commits, trees, and blobs (file contents) left behind by earlier work, such as rewritten commits or deleted branches, that no ref, reflog entry, or index entry points to anymore. Removing them cleans up the repository and reduces its size. Note that git prune only deletes loose objects; unreachable objects already stored in pack files are left for git gc and git repack to handle.

Without a clear picture of how the command works, you may run it at the wrong time or expect results it cannot deliver. This guide explains the internals of git prune, so by the end of the tutorial you will know when to use it and when to leave it alone.

Here is a quick overview of how to apply the command.

 

git prune command cheat sheet

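Here is the basic syntax of the git prune command and its most common options:

git prune [options] [<head>...]
  • --expire <time>: Only prune loose objects older than the specified time (e.g., --expire=2.weeks.ago). Without this option, git prune removes every unreachable loose object regardless of age.
  • -n, --dry-run: Show what objects would be pruned without actually removing them.
  • -v, --verbose: Report every object that gets removed.
  • --progress: Show a progress meter while git prune scans the repository.
  • <head>...: Also keep every object reachable from the listed commits, in addition to those reachable from your own refs.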
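To see the difference between --dry-run and a real prune without touching an actual project, the following sketch builds a throwaway repository in a temporary directory (created with mktemp; the file contents are made up) and writes a blob straight into the object store with git hash-object -w, so nothing references it. It assumes git is on your PATH.

```shell
set -e
# Throwaway repository in a temporary directory; removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q

# Write a blob straight into the object store. No commit, index entry,
# ref, or reflog points to it, so it is unreachable from the start.
blob=$(echo "orphaned content" | git -C "$repo" hash-object -w --stdin)

# --dry-run only reports: one "<sha> <type>" line per object it would remove.
dry_run_output=$(git -C "$repo" prune --dry-run)
echo "$dry_run_output"

# The blob is still on disk after the dry run.
kept_after_dry_run=no
git -C "$repo" cat-file -e "$blob" && kept_after_dry_run=yes

# A real prune (no --expire, so age does not matter) deletes it;
# --verbose prints the same "<sha> blob" line as it goes.
git -C "$repo" prune --verbose
kept_after_prune=no
git -C "$repo" cat-file -e "$blob" 2>/dev/null && kept_after_prune=yes

echo "after dry run: $kept_after_dry_run, after prune: $kept_after_prune"
```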

Check which objects would be pruned without deleting them.

git prune --dry-run

You can combine the --dry-run and --verbose options for more detail:

git prune --dry-run --verbose

Use the --expire option to prune objects that have been unreachable for a specified time period.

git prune --expire=<time> --verbose

For example:

git prune --expire=2.weeks.ago
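The age check is easy to observe in a throwaway repository with made-up content. In this sketch, --expire is compared against the modification time of each loose object file, so a freshly written orphan survives --expire=2.weeks.ago but not --expire=now.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q

# A brand-new unreachable blob: its loose object file was written just now.
blob=$(echo "fresh orphan" | git -C "$repo" hash-object -w --stdin)

# The blob is younger than two weeks, so this prune keeps it.
git -C "$repo" prune --expire=2.weeks.ago
kept_with_2_weeks=no
git -C "$repo" cat-file -e "$blob" 2>/dev/null && kept_with_2_weeks=yes

# --expire=now treats every unreachable loose object as expired.
git -C "$repo" prune --expire=now
kept_with_now=no
git -C "$repo" cat-file -e "$blob" 2>/dev/null && kept_with_now=yes

echo "kept with 2.weeks.ago: $kept_with_2_weeks, kept with now: $kept_with_now"
```

This is also why Example Workflow 2 below may report nothing in a young repository: every orphan there is still inside the grace period.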

Show a progress meter while git prune scans the repository with the --progress option.

git prune --progress

If another repository borrows objects from yours through its .git/objects/info/alternates file, pass that repository's refs to git prune so the objects it still needs are kept:

git prune $(cd ../<another folder> && git rev-parse --all)
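As a rough sketch of that scenario, the script below creates a hypothetical lender repository and a borrower made with git clone --shared, which records the lender's object directory in its alternates file instead of copying objects. The lender then drops a branch the borrower still tracks, and passing the borrower's refs to git prune keeps that commit alive. Directory and branch names are illustrative only.

```shell
set -e
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT

# "lender" owns the objects; all names here are hypothetical.
git init -q "$base/lender"
cd "$base/lender"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "first"

# A commit on a side branch, then back to the original branch.
git checkout -q -b experiment
echo "v2" > file.txt
git commit -q -am "experiment"
borrowed=$(git rev-parse HEAD)
git checkout -q -

# "borrower" reads the lender's objects through .git/objects/info/alternates
# instead of copying them, and tracks origin/experiment.
git clone -q --shared "$base/lender" "$base/borrower"

# The lender drops the branch and its reflog entries, so the commit is
# unreachable from the lender's own refs.
git branch -q -D experiment
git reflog expire --expire=now --expire-unreachable=now --all

# Pass the borrower's refs as extra heads: objects reachable from them are kept.
git prune $(cd "$base/borrower" && git rev-parse --all)

kept=no
git cat-file -e "$borrowed" 2>/dev/null && kept=yes
echo "commit needed by the borrower kept: $kept"
```

Running a plain git prune at the last step instead would delete the commit and leave the borrower with a broken origin/experiment.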

Before pruning, you might want to identify which objects are unreachable:

git fsck --unreachable

Example Workflow 1

# Identify Unreachable Objects
git fsck --unreachable

# Dry Run to See What Would Be Pruned
git prune --dry-run

# Prune Unreachable Objects
git prune

Example Workflow 2

# Dry Run to Check
git prune --dry-run --expire=2.weeks.ago

# Prune Objects Older Than 2 Weeks
git prune --expire=2.weeks.ago

The word prune also appears as an option of git fetch and as a subcommand of git remote. These do not run git prune; instead, they delete remote-tracking branches (and, with --prune-tags, local tags) that no longer exist on the remote.

git fetch --prune
git remote prune <remote name>
git fetch --prune --prune-tags
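The following sketch shows the effect of git fetch --prune, with a local bare repository standing in for the remote (all paths and branch names are hypothetical). A branch deleted on the remote side leaves a stale origin/feature behind after a plain fetch, and git fetch --prune removes it; git remote prune origin would clean up the same remote-tracking branch.

```shell
set -e
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT

# A local bare repository plays the remote; "local" is our clone.
git init -q --bare "$base/remote.git"
git clone -q "$base/remote.git" "$base/local" 2>/dev/null
cd "$base/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "initial commit"

# Publish two branches and fetch them, creating origin/main and origin/feature.
git push -q origin HEAD:refs/heads/main HEAD:refs/heads/feature
git fetch -q origin

# Someone deletes "feature" directly on the remote.
git -C "$base/remote.git" branch -q -D feature

# A plain fetch leaves the stale remote-tracking branch in place.
git fetch -q origin
stale_before=no
git rev-parse -q --verify refs/remotes/origin/feature >/dev/null && stale_before=yes

# --prune deletes remote-tracking branches whose remote branch is gone.
git fetch -q --prune origin
stale_after=no
git rev-parse -q --verify refs/remotes/origin/feature >/dev/null && stale_after=yes

echo "origin/feature before --prune: $stale_before, after: $stale_after"
```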

Before applying git prune, it helps to understand what makes an object orphaned.

 

What are unreachable objects in git?

Unreachable objects in Git are objects that can no longer be reached from any branch, tag, HEAD, reflog entry, or the index. These objects can include commits, blobs (file contents), and trees (directory listings). They become "orphaned" when operations such as rebasing, resetting, amending, or deleting branches leave them without any references in the repository. Unreachable objects take up disk space, and a large number of loose objects can slow down some repository operations if they are never cleaned up.

A quick look at Git internals shows how an object becomes unreachable.

 

The arrangements in the .git subdirectory

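Git stores content using three main object types: blob, tree, and commit (annotated tags are a fourth). A blob contains file contents. A tree object lists blobs and other trees and maps them to file and directory names. Lastly, a commit object references one tree, the snapshot of the project, plus its parent commits.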
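To see the three object types side by side, this sketch builds a blob, a tree, and a commit by hand with Git's plumbing commands in a throwaway repository (greeting.txt and the commit message are made up), then asks git cat-file for each object's type. Because no branch points to the resulting commit, the commit itself is an unreachable object of exactly the kind git prune removes.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Blob: the raw contents of a file.
blob=$(echo "hello, git" | git hash-object -w --stdin)

# Tree: maps a file name (hypothetical greeting.txt) and mode to that blob.
git update-index --add --cacheinfo 100644 "$blob" greeting.txt
tree=$(git write-tree)

# Commit: points to the tree and records author, committer, and message.
commit=$(echo "hand-made commit" | git commit-tree "$tree")

# Print each object's ID and type.
for object in "$blob" "$tree" "$commit"; do
  echo "$object $(git cat-file -t "$object")"
done
```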

In the .git subdirectory, the objects directory stores the objects themselves, each in a file named after its SHA-1 hash, with the first two hex characters used as a subdirectory. Refs are human-readable names for commits. Inside the refs folder are the heads and tags folders: heads holds one file per branch, and tags one per tag. A branch is simply a named reference to a commit.
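The sketch below makes one commit in a throwaway repository and then reads the files directly: the commit's loose object file under .git/objects, and the branch file that HEAD points to. It assumes Git's default files ref backend; repositories using packed refs or the newer reftable format store refs differently.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "initial commit"

commit=$(git rev-parse HEAD)
# A loose object is stored at .git/objects/<first 2 hex chars>/<remaining chars>.
prefix=$(echo "$commit" | cut -c1-2)
rest=$(echo "$commit" | cut -c3-)
object_file=".git/objects/$prefix/$rest"
ls -l "$object_file"

# HEAD is normally a symbolic ref to the checked-out branch, e.g. refs/heads/main.
head_target=$(git symbolic-ref HEAD)
echo "HEAD -> $head_target"

# With the files backend, that branch is a small file holding the commit hash.
branch_hash=$(cat ".git/$head_target")
echo "$head_target -> $branch_hash"
```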

 

How git staging and committing affect objects

Staging a file with git add creates a blob that stores the file's contents and records that blob in the index. Git does not create tree objects at this stage; it writes them when you commit. Modifying and restaging a file creates another blob, and the previous blob becomes unreachable if no commit references it.
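To check this, the sketch stages two versions of a hypothetical notes.txt in an empty throwaway repository without committing, then lists the object types. It should find two blobs and no trees; the first blob is the orphan left behind by restaging.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
cd "$repo"

# Stage two versions of the same file without ever committing.
echo "version 1" > notes.txt
git add notes.txt
echo "version 2" > notes.txt
git add notes.txt

# Count objects by type: staging wrote blobs only, no trees or commits.
types=$(git cat-file --batch-check='%(objecttype)' --batch-all-objects)
echo "$types" | sort | uniq -c
blob_count=$(echo "$types" | grep -c '^blob$' || true)
tree_count=$(echo "$types" | grep -c '^tree$' || true)

# The index now points to the "version 2" blob only; the "version 1" blob
# is unreachable, so git fsck --unreachable should list it.
git fsck --unreachable || true
```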

On commit, Git builds tree objects from the index and creates a commit object that references the root tree. The commit object records its parent commit(s), the author and committer with timestamps, and the commit message. Its own SHA-1 is computed from all of that content.
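Pretty-printing a commit with git cat-file -p shows these fields. This sketch makes two commits in a throwaway repository (file name and messages are made up) and checks that the second commit starts with a tree line and names the first commit as its parent.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo "second" > notes.txt
git commit -q -am "second commit"

# The raw commit: tree, parent, author, committer, a blank line, then the message.
git cat-file -p HEAD

first_field=$(git cat-file -p HEAD | head -n 1 | cut -d' ' -f1)
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
```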

Git then moves the current branch ref to the new commit. The .git/HEAD file usually holds a symbolic reference to that branch (for example, ref: refs/heads/main), so HEAD always follows the tip of the checked-out branch.

The key takeaway is that a commit points to a tree, the tree points to blobs and subtrees, and branches or tags point to commits. Git knows the active branch by checking the HEAD file in the .git subdirectory.

 

How to orphan a git object

Each new commit records the previous commit as its parent, so history forms a chain in which every commit points back to the one before it. Simply put, the former commit becomes the parent of the current one.

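The chain keeps growing until you rewrite it, for example with git reset --hard HEAD~1. The branch then points to an older commit, and nothing in the branch history references the discarded commit anymore. The commit is not deleted yet: the reflog still records it, so you can recover it by hash with git checkout, git branch, or git cherry-pick. Only after its reflog entries expire do the commit, and any trees and blobs that only it referenced, become truly unreachable.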
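The recovery path is easy to try in a throwaway repository. This sketch makes two commits, resets one away, and then creates a hypothetical rescue branch from HEAD@{1}, the reflog entry recording where HEAD was just before the reset.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "one"
echo "two" > notes.txt
git commit -q -am "two"
discarded=$(git rev-parse HEAD)

# Rewrite history: the branch now points at "one", and "two" is off the branch.
git reset -q --hard HEAD~1

# The reflog still lists "two"; HEAD@{1} is where HEAD was before the reset.
git reflog
git branch rescue "HEAD@{1}"   # "rescue" is a hypothetical branch name
rescued=$(git rev-parse rescue)
echo "discarded: $discarded, rescued: $rescued"
```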

 

Why you should git prune orphaned objects

Unreachable objects do not affect your working files, but they still occupy disk space in .git/objects, and a large number of loose objects slows down some Git operations. Housekeeping tools such as git gc, and the commands it runs, including git prune, git repack, and git pack-refs, reclaim that space.

 

What is git gc?

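Garbage collection is a term Git borrows from programming languages with automatic memory management, where a runtime frees memory that nothing references anymore. Git applies the same idea to its object database: it removes unreferenced objects and compresses the rest into pack files.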

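Several commands, such as git commit, git merge, and git fetch (and therefore git pull), trigger automatic housekeeping when they finish (git gc --auto, or git maintenance run --auto in recent Git versions). The automatic run only does real work once the number of loose objects or pack files crosses a threshold. You can also run git gc yourself at any time, for example after a git reset has discarded commits.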

Git gc cleans up the repository after commits have been orphaned. It packs loose objects into a compressed pack file, expires old reflog entries, and prunes unreachable loose objects older than two weeks by default (the gc.pruneExpire setting). The command comes with several options that control how much work it does. The most typical options are:

git gc --auto

which checks whether housekeeping is needed (based on thresholds such as gc.auto and gc.autoPackLimit) and exits without doing anything if it is not.

git gc --aggressive

which repacks the repository more aggressively, recomputing deltas from scratch. It takes much longer but can produce a smaller pack.

Before running git gc, you can check how many objects the repository holds and how much disk space they use

git count-objects -v

and list the unreachable ones:

git fsck --unreachable

After running git gc, you may notice new .pack and .idx files in the .git/objects/pack directory. The .pack file holds the compressed objects, whereas the .idx file is an index that lets Git locate each object inside the pack.
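A throwaway repository is enough to watch git gc at work. This sketch commits one made-up file, prints git count-objects -v before and after git gc, and lists the pack directory. Recent Git versions may also write extra files there, such as a .rev reverse index, next to the .pack and .idx pair.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "initial commit"

echo "before gc:"
git count-objects -v      # "count" = loose objects, "in-pack" = packed objects

git gc -q

echo "after gc:"
git count-objects -v
ls .git/objects/pack/

loose_after=$(git count-objects -v | sed -n 's/^count: //p')
in_pack_after=$(git count-objects -v | sed -n 's/^in-pack: //p')
```

After gc, the commit, tree, and blob live in the pack, and their loose copies are gone.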

 

Why git prune may not work for you and the solution

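Git prune may appear to do nothing. The usual reason is the reflog: as long as a reflog entry still points to a discarded commit, git prune treats that commit and everything it references as reachable. Tip: git gc expires reflog entries after 90 days by default (gc.reflogExpire), and entries that are no longer reachable from the branch tip after 30 days (gc.reflogExpireUnreachable).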

So, what is the solution?

First, force the reflog to expire all of its entries right now:

git reflog expire --expire=now --expire-unreachable=now --all

Then run the prune command with the same immediate expiry, adding --verbose so it lists every object it removes.

git prune --expire=now --verbose
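The sketch below, in a throwaway repository with made-up commits, shows why the reflog step matters: right after git reset --hard, git prune --expire=now leaves the discarded commit in place because the reflog still references it, and only after git reflog expire does the same prune delete it.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > notes.txt
git add notes.txt
git commit -q -m "one"
echo "two" > notes.txt
git commit -q -am "two"
discarded=$(git rev-parse HEAD)
git reset -q --hard HEAD~1

# The reflog still mentions "two", so git prune treats it as reachable.
git prune --expire=now
kept_before_expire=no
git cat-file -e "$discarded" 2>/dev/null && kept_before_expire=yes

# Expire every reflog entry now, then prune again: this time the commit goes.
git reflog expire --expire=now --expire-unreachable=now --all
git prune --expire=now --verbose
kept_after_expire=no
git cat-file -e "$discarded" 2>/dev/null && kept_after_expire=yes

echo "kept before reflog expire: $kept_before_expire, after: $kept_after_expire"
```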

But there is a catch.

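Forcefully clearing the reflog throws away your safety net: once the entries are gone and the objects are pruned, commits you reset away can no longer be recovered, which is especially dangerous on a shared repository. Pruning with --expire=now while other Git processes are writing to the repository can also delete objects they have just created, which is why git gc keeps a two-week grace period by default. For routine housekeeping, prefer git gc.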

Nevertheless, you can still use git prune effectively, as you are about to see in the practice section.

 

A typical scenario to apply the git prune command

Let's inspect the available objects using the git log command,

git log --pretty=oneline

the find command,

find .git/objects -type f

and the cat-file command.

git cat-file --batch-check --batch-all-objects

We have four commits, four trees, and four blobs: 12 objects in total.


Let's discard the three most recent commits with a hard reset and recheck the history.

git reset --hard HEAD~3
git log --pretty=oneline

It may look as if the hard reset deleted the discarded objects from the repository. However, the reflog still references them, so let's expire the reflog.

git reflog expire --expire=now --expire-unreachable=now --all

Expiring the reflog deletes nothing by itself: we still have 12 objects in the .git/objects folder.

git cat-file --batch-check --batch-all-objects

Check which objects git prune would remove.

git prune --dry-run --verbose
git fsck --unreachable

Prune the 9 unreachable objects out of the 12.

git prune --expire=now --verbose

Then recheck the objects, confirming there is nothing left to prune.

git cat-file --batch-check --batch-all-objects
git prune --dry-run --verbose
git fsck --unreachable

After pruning 9 of the 12 objects, 3 objects remain in the .git/objects folder, and none of them is unreachable.
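The whole exercise can be reproduced in a throwaway repository. This sketch assumes a setup of four commits that each change a hypothetical notes.txt, which yields 4 commits, 4 trees, and 4 blobs. The hard reset, reflog expiry, and git prune --expire=now should then leave 3 objects: the first commit with its tree and blob.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Assumed setup: four commits that each change notes.txt,
# giving 4 commits + 4 trees + 4 blobs = 12 objects.
for n in 1 2 3 4; do
  echo "version $n" > notes.txt
  git add notes.txt
  git commit -q -m "commit $n"
done
before=$(git cat-file --batch-check --batch-all-objects | wc -l)
echo "objects before: $before"

# Discard the last three commits, expire the reflog, then prune immediately.
git reset -q --hard HEAD~3
git reflog expire --expire=now --expire-unreachable=now --all
git prune --expire=now --verbose

after=$(git cat-file --batch-check --batch-all-objects | wc -l)
echo "objects after: $after"
git fsck --unreachable    # expected to print nothing now
```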


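We can then wrap up this tutorial by pushing the changes. If the three discarded commits had already been pushed, the remote rejects a plain git push as a non-fast-forward update; overwriting it requires git push --force-with-lease, which rewrites shared history.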

git push

 

Conclusion

You have just learned how git prune works under the hood and how to run it on its own. For routine maintenance, prefer git gc; when you do need git prune directly, start with --dry-run and apply it safely as shown in this tutorial.

 

Steve Alila


He specializes in web design, WordPress development, and data analysis, with proficiency in Python, JavaScript, and data extraction tools. Additionally, he excels in web API development, AI integration, and data presentation using Matplotlib and Plotly. You can connect with him on his LinkedIn profile.


If my articles on GoLinuxCloud have helped you, kindly consider buying me a coffee as a token of appreciation.


For any other feedback or questions, you can send mail to admin@golinuxcloud.com
