Git Internals: How Understanding the Object Model Changed the Way I Use Git

Early in my career I accidentally deleted what I thought was a week of work. Understanding how Git stores data is what let me recover it — and it completely changed how I think about every rebase, merge, and reset since.

Norehan Norrizan
12 min read

About three months into my first engineering job, I ran git reset --hard HEAD~3 on the wrong branch. I had intended to rewrite my own feature branch. I had accidentally rewritten a shared development branch that three people were actively using. My stomach dropped. I genuinely thought I had destroyed work.

A senior engineer on the team walked over, ran two commands, and recovered everything in about ninety seconds. Later that day, she explained why it worked — and that explanation, about how Git actually stores data, is the most useful thing anyone has ever taught me about version control.

Git Stores Content, Not Diffs

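The most important thing to understand about Git is that its data model is built on snapshots, not on file changes. Every commit records the complete state of every file in your repository, or rather a pointer to each file's contents, so an unchanged file costs nothing extra. (Packfiles later delta-compress objects on disk, but that is a storage optimization beneath the model.) This is counterintuitive if you think of version control as "tracking changes", but it is why any commit can be reconstructed on its own, without replaying the history that led to it.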

Git is, at its foundation, a content-addressable filesystem. Every object is stored zlib-compressed and named by the SHA-1 hash of a short header (type and size) plus its contents. (Newer versions of Git can also create repositories that use SHA-256.) The same content always produces the same hash, and in practice different content produces a different hash. This gives Git a built-in integrity check: if the bytes of any object change, they no longer match its name, and because trees and commits refer to other objects by hash, the mismatch is detectable from everything that points at it. Corruption does not go unnoticed; git fsck will report it.
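
To make "content-addressable" concrete, here is a minimal in-memory sketch in Python. It is not how Git is implemented: the ObjectStore class and its method names are invented for illustration, and it hashes the raw bytes only, whereas Git hashes a small header plus the content. It only shows the two properties described above: identical content maps to one key, and altered bytes stop matching their name.

```python
import hashlib


class ObjectStore:
    """Toy content-addressable store (hypothetical, not Git's real code)."""

    def __init__(self):
        self._objects = {}  # hex digest -> raw bytes

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself.
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Re-hash on read: a mismatch means the stored bytes were altered.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError(f"object {key} is corrupt")
        return data


store = ObjectStore()
first = store.put(b"hello\n")
second = store.put(b"hello\n")   # same content, same key, stored once
other = store.put(b"hello!\n")   # different content, different key
```

Because the name is computed from the bytes, there is no separate "is this the same file?" check to get wrong. Deduplication and integrity both fall out of the naming scheme.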

The Four Object Types

Blob

A blob stores the raw bytes of a single file's contents — no filename, no permissions, no metadata. Just the bytes. The same file content across two different commits is stored as a single blob. This is why Git repositories are often smaller than people expect: identical files are deduplicated automatically.
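
The blob ID is simple enough to compute by hand. The sketch below assumes a SHA-1 repository; the function name git_blob_id is mine, not part of any Git API. It hashes the header "blob", a space, the size in bytes, and a NUL byte, followed by the file's bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a header ("blob <size>" plus a NUL byte), then the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The filename never enters the hash, so two files with the same bytes
# share one blob regardless of what they are called or where they live.
readme_id = git_blob_id(b"hello\n")
copy_id = git_blob_id(b"hello\n")
print(readme_id)             # ce013625030ba8dba906f756967f9e9ca394464a
print(readme_id == copy_id)  # True
```

You can check the result against Git itself: echo 'hello' | git hash-object --stdin prints the same ID.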

Tree

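A tree represents a directory. It is a list of entries, each with a mode, a type (blob or tree, in the common case), a hash, and a name. The mode is one of a small fixed set of values rather than full Unix permissions: 100644 for a regular file, 100755 for an executable, 120000 for a symlink, and 040000 for a subdirectory. Trees reference blobs (files) and other trees (subdirectories). The root tree of a commit is a complete, point-in-time snapshot of the entire repository.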

$ git cat-file -p HEAD^{tree}
100644 blob 8ab686...  README.md
100644 blob f3c214...  package.json
040000 tree a1b2c3...  src
040000 tree d4e5f6...  tests
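
The listing above is the pretty-printed form. As a rough sketch of what is actually hashed, assuming a SHA-1 repository, the code below serializes entries the way a tree object stores them: mode, space, name, NUL byte, then the 20 raw bytes of the entry's ID. The helper names object_id and tree_body are mine, chosen for illustration.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    """Hash a Git object: a header of type, size and a NUL byte, then the body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) entries the way a tree object stores them."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders directories as if their name ended with "/".
        return name + b"/" if mode == b"40000" else name

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Stored modes have no leading zero: "40000", shown as 040000 by cat-file.
        body += mode + b" " + name + b"\0" + bytes.fromhex(hex_id)
    return body


readme = object_id(b"blob", b"hello\n")
empty_tree = object_id(b"tree", tree_body([]))
root = object_id(b"tree", tree_body([(b"100644", b"README.md", readme)]))
renamed = object_id(b"tree", tree_body([(b"100644", b"NOTES.md", readme)]))
```

Here root and renamed differ even though both point at the same blob. The filename lives in the tree, not in the blob, which is why renaming a file creates a new tree but no new blob.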

Commit

A commit is surprisingly small. It contains a pointer to the root tree (the snapshot), a pointer to its parent commit (or parents, if it is a merge; the very first commit has none), author and committer metadata, and the message. That is all. Every file in the snapshot is reachable by walking down from the root tree.

$ git cat-file -p HEAD
tree a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
parent 1234567890abcdef1234567890abcdef12345678
author Norehan Norrizan <hello@norehan.dev> 1735000000 +0800
committer Norehan Norrizan <hello@norehan.dev> 1735000000 +0800

Fix authentication bug in session handler
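
The text printed by cat-file is essentially the whole object. As a sketch, again assuming SHA-1, the function below rebuilds that text from its parts and hashes it with a "commit" header. The function name commit_id is my own, and the values passed in are the placeholder ones from the example above, not a real commit.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Serialize a commit the way `git cat-file -p` prints it, then hash it."""
    lines = [f"tree {tree}"]
    # No parent line for a root commit, one for a normal commit, 2+ for a merge.
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


who = "Norehan Norrizan <hello@norehan.dev> 1735000000 +0800"
tree = "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0"
parent = "1234567890abcdef1234567890abcdef12345678"
message = "Fix authentication bug in session handler"

original = commit_id(tree, [parent], who, who, message)
# Same snapshot, same message, different parent: a different commit.
reparented = commit_id(tree, ["f" * 40], who, who, message)
```

Every field feeds the hash, including the parent. That is the detail that matters later: a commit's identity depends on its entire ancestry, not just on its content.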

Tag

An annotated tag is an object with a pointer to another object (almost always a commit), tagger metadata, and a message. Lightweight tags (created with git tag v1.0) are just references: plain files containing a commit hash. Annotated tags (created with git tag -a v1.0) are full objects with their own hash, and they can be cryptographically signed. For release management, always use annotated tags.

Branches Are Just Pointers — This Changes Everything

A branch in Git is a file in .git/refs/heads/ containing exactly one thing: the 40-character hash of a commit. (Git may later consolidate these files into a single .git/packed-refs file, but the idea is the same.) When you commit, Git creates a new commit object and updates the branch to point at it. That is the entire operation.

$ cat .git/refs/heads/main
bf5b9421c37d60d4e83ab12f456def8901234abc

This means branching in Git is O(1) — it creates a file. Merging in Git is fundamentally about creating a new commit with two parent pointers. Rebasing is about creating new commit objects (with updated parent pointers) that replay the same changes. None of these operations "move files around". They just create new objects and update pointers.
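
To show how little machinery is involved, here is a sketch that resolves HEAD and creates a branch using nothing but file reads and writes. The function names are mine, and the demo runs against a throwaway directory rather than a real repository. Writing refs by hand skips Git's locking and reflog, so in practice use git branch or git update-ref; this is only meant to illustrate the data layout.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit hash by reading plain files under .git."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a hash directly
    ref_name = head[len("ref: "):]  # e.g. "refs/heads/main"
    loose = git_dir / ref_name
    if loose.exists():
        return loose.read_text().strip()
    # Fall back to packed-refs, where Git may consolidate refs.
    packed = git_dir / "packed-refs"
    if packed.exists():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue  # header comment or peeled-tag line
            sha, _, name = line.partition(" ")
            if name == ref_name:
                return sha
    raise LookupError(f"{ref_name} does not point at a commit yet")


def create_branch(git_dir: Path, name: str, commit: str) -> None:
    """Creating a branch amounts to writing one small file."""
    path = git_dir / "refs" / "heads" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(commit + "\n")


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    git_dir.mkdir()
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    create_branch(git_dir, "main", "bf5b9421c37d60d4e83ab12f456def8901234abc")
    on_main = resolve_head(git_dir)

    # A new branch is just a second file holding the same hash.
    create_branch(git_dir, "feature", on_main)
    (git_dir / "HEAD").write_text("ref: refs/heads/feature\n")
    on_feature = resolve_head(git_dir)
```

Nothing was copied to create feature. Both branches name the same commit until one of them moves.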

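Understanding this removes the fear from the operations that scare most Git users. A rebase does not destroy your work: it creates new commit objects, and the old ones stay in the object store until their reflog entries expire and garbage collection prunes them. By default, reflog entries are kept for 90 days, or 30 days once the commit is no longer reachable from the branch's current tip, which is the case after a rebase or reset. A reset does not delete commits either: it moves the branch pointer. The commits are still there, accessible through the reflog. The one thing git reset --hard does throw away is uncommitted changes in your working tree, because there is no commit for the reflog to point at.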

The Reflog: Why I Could Recover That Mistake

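The reflog is Git's safety net. Every time a reference (a branch, or HEAD) is updated, Git appends the old and new values to a log kept in your local repository. The reflog is never pushed or fetched, so it only covers what happened in your own clone. After my git reset --hard disaster, the recovery was: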

# Find the commit that HEAD pointed to before the reset
$ git reflog
abc1234 HEAD@{0}: reset: moving to HEAD~3
def5678 HEAD@{1}: commit: Add user authentication
9a0b1c2 HEAD@{2}: commit: Fix session handling

# Recover by creating a new branch at the old position
$ git checkout -b recovery-branch def5678

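The work was never gone. The reset had moved the branch pointer back by three commits. The commits themselves were still in the object store, and the reflog told me exactly where they were. This is why the "I deleted my work" catastrophe is almost always recoverable if you catch it in time: by default Git keeps reflog entries for commits that are no longer reachable from the branch tip for 30 days before garbage collection is allowed to prune them.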
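
Under the hood, the reflog for HEAD is a plain text file at .git/logs/HEAD, written oldest entry first. Each line holds the old hash, the new hash, who made the change and when, then a tab and the message. The sketch below parses that format and finds where HEAD pointed just before the most recent reset. The function names, the identity, and the repeated-letter hashes are all placeholders I made up for the example.

```python
def parse_reflog(text: str):
    """Parse the contents of a reflog file such as .git/logs/HEAD."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        meta, _, message = line.partition("\t")
        old, new, _identity = meta.split(" ", 2)
        entries.append({"old": old, "new": new, "message": message})
    return entries


def before_last_reset(entries):
    """Return the hash HEAD pointed at just before the most recent reset."""
    for entry in reversed(entries):  # the file is oldest-first
        if entry["message"].startswith("reset:"):
            return entry["old"]
    return None


IDENT = "A U Thor <author@example.com> 1735000000 +0800"  # hypothetical identity
BASE, MID, TIP = "a" * 40, "c" * 40, "d" * 40             # placeholder hashes
SAMPLE = (
    f"{BASE} {MID} {IDENT}\tcommit: Fix session handling\n"
    f"{MID} {TIP} {IDENT}\tcommit: Add user authentication\n"
    f"{TIP} {BASE} {IDENT}\treset: moving to HEAD~3\n"
)

recovered = before_last_reset(parse_reflog(SAMPLE))
```

In this sample, recovered is the hash that HEAD held before the reset, which is exactly the value you would hand to git checkout -b to get the work back.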

Merging vs Rebasing: What Actually Happens

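When two branches have diverged, a merge creates a new commit with two parents: the tip of each branch being merged. The commit graph gains a diamond shape, and history is preserved exactly as it happened. (If the target branch has not moved since you branched, Git simply fast-forwards the pointer and creates no merge commit unless you pass --no-ff.)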

A rebase takes the commits from one branch and replays them on top of another. Each replayed commit gets a new SHA-1 hash (because its parent is different, even if the content diff is identical). The resulting history is linear — as if you had branched from the tip of the target branch all along.
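
The hash change is easy to demonstrate with a toy model. The sketch below is deliberately simplified and is not Git's rebase algorithm: it leaves out author and committer lines, and it keeps each commit's tree unchanged, whereas a real rebase usually produces new trees as well. The function names and the repeated-digit hashes are placeholders. It shows only the one point made above, that a different parent means a different commit ID, and that the change cascades down the chain.

```python
import hashlib


def simplified_commit_id(tree: str, parent: str, message: str) -> str:
    """Commit hash over tree, parent and message only (real commits carry more)."""
    body = f"tree {tree}\nparent {parent}\n\n{message}\n".encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def replay(commits, new_base: str):
    """Re-create (tree, message) pairs on top of new_base, oldest first."""
    ids, parent = [], new_base
    for tree, message in commits:
        parent = simplified_commit_id(tree, parent, message)
        ids.append(parent)
    return ids


feature = [("1" * 40, "Add login form"), ("2" * 40, "Validate password")]
old_ids = replay(feature, new_base="a" * 40)  # the branch as originally written
new_ids = replay(feature, new_base="b" * 40)  # the same work on a newer base

for old, new in zip(old_ids, new_ids):
    print(old[:7], "->", new[:7])
```

Every commit in the replayed chain gets a new ID, including the second one, whose own content did not change at all. The old IDs are not modified or removed by this; they simply stop being what the branch points at.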

Both are valid tools. I use merge for integrating long-lived branches (feature branches into main) where I want to preserve exactly when the integration happened. I use rebase for cleaning up local commits before sharing them: squashing "fix typo" commits, reordering logical units. The rule I never violate: never rebase commits that have been pushed to a remote branch others are using. Rebase rewrites hashes. Anyone who based work on the old commits is left with history that has diverged from the branch, and reconciling it is painful.

Further Reading

  • Pro Git, Chapter 10 (Git Internals): the authoritative free book; this chapter covers the object model in depth
  • GitHub Blog: Commits are snapshots, not diffs — a clear visual explanation of the snapshot model
  • git cat-file -p HEAD — run this in any repository and explore what you find. Fifteen minutes of hands-on exploration teaches more than an hour of reading.

Filed under

Git, version control, internals, tooling