
Git Internals — Objects, Refs & .git

DodaTech Updated 2026-06-24 8 min read


Git internals center on four object types (blob, tree, commit, and tag). They are stored in the .git/objects directory, identified by SHA-1 hashes, and reached through references such as branches and tags.

In this tutorial, you'll explore Git internals — how Git stores data as content-addressable objects, how references work, what the .git directory contains, and how basic Git operations work at the plumbing level. Understanding internals lets you recover from disasters, build Git tools, write custom hooks, and truly understand what Git commands do. By the end, you'll navigate the .git directory manually.

Real-world use: DodaTech's custom CI tools use Git plumbing commands directly for efficient repository analysis. Understanding internals helps DodaZIP's team recover from corrupted repositories and build custom automation scripts.

flowchart TD
  A[.git/ directory] --> B[objects/]
  A --> C[refs/]
  A --> D[HEAD]
  A --> E[index]
  B --> F[Blob: file content]
  B --> G[Tree: directory listing]
  B --> H[Commit: snapshot metadata]
  B --> I[Tag: annotated reference]
  C --> J[heads/ - branches]
  C --> K[tags/ - tags]
  C --> L[remotes/ - remote refs]
  D --> M[Current branch pointer]
  D --> N[Detached HEAD commit]

The .git Directory Structure

Every non-bare Git repository has a .git directory at the root of its working tree.

$ ls -la .git/
-rw-r--r--    HEAD              # Current branch reference
-rw-r--r--    config            # Repository configuration
-rw-r--r--    description       # Repository description (used by GitWeb)
drwxr-xr-x    hooks/            # Client/server hook scripts (samples by default)
-rw-r--r--    index             # Staging area (binary)
drwxr-xr-x    info/             # Additional metadata (e.g. info/exclude)
drwxr-xr-x    logs/             # Reflog history
drwxr-xr-x    objects/          # All objects (blobs, trees, commits, tags)
drwxr-xr-x    refs/             # Pointers (branches, tags, remotes)

Git Object Model

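Git stores repository content (file data, directory listings, commits, and annotated tags) as objects. Each object is identified by a SHA-1 hash of its contents.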

# Initialize a repo and create a file
mkdir git-internals-demo && cd git-internals-demo
git init
echo "Hello, Git Internals" > hello.txt
git add hello.txt

# Find the object hash
$ git hash-object hello.txt
3b18e5a5e6a4e3f2b1c0a9b8c7d6e5f4a3b2c1d0

# Where the object is stored
$ ls .git/objects/3b/
18e5a5e6a4e3f2b1c0a9b8c7d6e5f4a3b2c1d0

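Git does not hash the raw file alone. It hashes a short header (the object type, a space, the content length in bytes, and a NUL byte) followed by the content, so the ID will not match what sha1sum prints for the same file.

The first two hex characters of the hash become the subdirectory name, and the remaining 38 become the file name.

The hashes shown in this tutorial are placeholders. A blob's ID depends only on its bytes, so anyone hashing the same content gets the same ID, but yours will not match the values printed here.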
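To see the header in action, here is a minimal Node.js sketch that rebuilds a blob ID by hand. It assumes the default SHA-1 object format; repositories created with `git init --object-format=sha256` use SHA-256 instead. `hashBlob` and `looseObjectPath` are illustrative helper names, not part of Git.

```javascript
// Minimal sketch: compute a blob ID the way `git hash-object` does
// (assumes the default SHA-1 object format).
const crypto = require("crypto");

// Git hashes the header "blob <size in bytes>" + NUL, followed by the raw content.
function hashBlob(content) {
  const body = Buffer.from(content, "utf8");
  const header = Buffer.from(`blob ${body.length}\0`, "utf8");
  return crypto.createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

// A loose object lives at .git/objects/<first 2 hex chars>/<remaining 38 chars>.
function looseObjectPath(hash) {
  return `.git/objects/${hash.slice(0, 2)}/${hash.slice(2)}`;
}

// `echo "test content"` appends a newline, so the newline is part of the content.
const id = hashBlob("test content\n");
console.log(id);                  // d670460b4b4aece5915caf5c68d12f560a9fe3e4
console.log(looseObjectPath(id)); // .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```

Running `echo "test content" | git hash-object --stdin` in any repository should print the same ID, because blob IDs depend only on content.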

Blob Objects

Blobs store file content. A blob has no name or metadata — just content.

# Inspect a blob object
$ git cat-file -p 3b18e5a5e6a4e3f2b1c0a9b8c7d6e5f4a3b2c1d0
Hello, Git Internals

# Check the object type
$ git cat-file -t 3b18e5a5e6a4e3f2b1c0a9b8c7d6e5f4a3b2c1d0
blob

# Check the object size
$ git cat-file -s 3b18e5a5e6a4e3f2b1c0a9b8c7d6e5f4a3b2c1d0
21
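`git cat-file` answers `-t`, `-s`, and `-p` by inflating the stored file and splitting off the header. The sketch below round-trips an object through zlib to show this. `encodeLooseObject` and `decodeLooseObject` are hypothetical names. To inspect a real object, you could pass the bytes of any file under .git/objects/ to `decodeLooseObject`.

```javascript
// Minimal sketch: decode a loose object the way `git cat-file -t/-s/-p` reports it.
const zlib = require("zlib");

// On disk a loose object is zlib("<type> <size>\0<content>").
function encodeLooseObject(type, content) {
  const body = Buffer.from(content, "utf8");
  return zlib.deflateSync(Buffer.concat([Buffer.from(`${type} ${body.length}\0`), body]));
}

function decodeLooseObject(compressed) {
  const raw = zlib.inflateSync(compressed);
  const nul = raw.indexOf(0); // the header ends at the first NUL byte
  const [type, size] = raw.subarray(0, nul).toString("utf8").split(" ");
  return { type, size: Number(size), content: raw.subarray(nul + 1) };
}

const obj = decodeLooseObject(encodeLooseObject("blob", "Hello, Git Internals\n"));
console.log(obj.type); // blob
console.log(obj.size); // 21
```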

Tree Objects

Trees represent directories. They map filenames to blob or subtree hashes.

# Commit the file to create a tree
git commit -m "Add hello.txt"

# Find the tree hash (part of the commit)
$ git cat-file -p HEAD
tree 4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3
author Jane Doe <jane@example.com> 1719240000 +0000
committer Jane Doe <jane@example.com> 1719240000 +0000

Add hello.txt

# Inspect the tree
$ git cat-file -p 4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3
100644 blob 3b18e5a5e6a4e3f2b1c0a9b8c7d6e5f4a3b2c1d0    hello.txt

# Tree format: [mode] [type] [hash]    [filename]

Expected output: The tree shows one entry — hello.txt with blob hash 3b18e5..., mode 100644 (regular file), proving Git stores directory structure separately from file content.
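The `cat-file -p` listing is a pretty-printed view. On disk, each tree entry is `<mode> <name>`, then a NUL byte, then the 20-byte binary hash.

Here is a minimal sketch of serializing a tree. It assumes ASCII file names and the SHA-1 object format, and `buildTree` is an illustrative name.

```javascript
// Minimal sketch: serialize a tree object and compute its ID (SHA-1 object format).
const crypto = require("crypto");

function sha1Object(type, body) {
  const header = Buffer.from(`${type} ${body.length}\0`);
  return crypto.createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

// entries: [{ mode: "100644", name: "hello.txt", hash: "<40 hex chars>" }]
// Subdirectories use mode "40000" and point at another tree.
function buildTree(entries) {
  // Git orders entries by name, comparing directory names as if they ended in "/".
  const key = (e) => (e.mode === "40000" ? e.name + "/" : e.name);
  const sorted = [...entries].sort((a, b) => (key(a) < key(b) ? -1 : key(a) > key(b) ? 1 : 0));
  const body = Buffer.concat(
    sorted.map((e) =>
      // "<mode> <name>\0" followed by the 20-byte binary (not hex) object ID
      Buffer.concat([Buffer.from(`${e.mode} ${e.name}\0`), Buffer.from(e.hash, "hex")])
    )
  );
  return { body, hash: sha1Object("tree", body) };
}

// A tree with no entries is Git's well-known empty tree.
console.log(buildTree([]).hash); // 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```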

Commit Objects

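Commits are snapshots. Each commit points to exactly one tree and to zero or more parents: none for the first (root) commit, one for an ordinary commit, and two or more for a merge.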

# Inspect the commit object
$ git cat-file -p HEAD
tree 4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3
author Jane Doe <jane@example.com> 1719240000 +0000
committer Jane Doe <jane@example.com> 1719240000 +0000

Add hello.txt

# Create a second commit
echo "Line 2" >> hello.txt
git add hello.txt
git commit -m "Add second line"

# Commit has parent pointer
$ git cat-file -p HEAD
tree 5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3a4
parent a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b
author Jane Doe <jane@example.com> 1719240100 +0000
committer Jane Doe <jane@example.com> 1719240100 +0000

Add second line

A commit contains: a tree hash (snapshot), parent commit(s) (history), author/committer metadata, and the commit message.
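Here is a minimal sketch that assembles a commit object by hand in the SHA-1 format. It assumes the author and committer are the same person and that there are no extra headers such as `gpgsig`. `buildCommit` and its parameter names are illustrative, and the empty tree ID stands in for a real tree.

```javascript
// Minimal sketch: assemble a commit object's text and compute its ID (SHA-1 format).
const crypto = require("crypto");

function sha1Object(type, text) {
  const body = Buffer.from(text, "utf8");
  const header = Buffer.from(`${type} ${body.length}\0`, "utf8");
  return crypto.createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

// person: "Name <email>", when: Unix seconds, tz: offset such as "+0000"
function buildCommit({ tree, parents = [], person, when, tz, message }) {
  const lines = [`tree ${tree}`];
  for (const p of parents) lines.push(`parent ${p}`); // a root commit has none
  lines.push(`author ${person} ${when} ${tz}`);
  lines.push(`committer ${person} ${when} ${tz}`);
  const text = lines.join("\n") + "\n\n" + message + "\n"; // blank line, then the message
  return { text, hash: sha1Object("commit", text) };
}

const base = {
  tree: "4b825dc642cb6eb9a060e54bf8d69288fbee4904", // empty tree, used as a stand-in
  person: "Jane Doe <jane@example.com>",
  when: 1719240000,
  tz: "+0000",
  message: "Add hello.txt",
};
const first = buildCommit(base);
const second = buildCommit({ ...base, parents: [first.hash], message: "Add second line" });
console.log(second.text);
```

Because the second commit embeds the first commit's ID in its `parent` line, any change to the first commit changes the second commit's ID too.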

Tag Objects

Annotated tags are full objects. They point to another object (usually a commit) and add a tagger, date, and message.

# Create an annotated tag
git tag -a v1.0 -m "Release version 1.0"

# Inspect the tag object
$ git cat-file -p refs/tags/v1.0
object a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b
type commit
tag v1.0
tagger Jane Doe <jane@example.com> 1719240200 +0000

Release version 1.0
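The tag object has its own ID, separate from the commit it names. As a minimal sketch in the SHA-1 format, the text that `git cat-file -p` prints can be serialized and hashed like any other object. `buildTag` is an illustrative name, and the 40-character commit ID below is a made-up placeholder.

```javascript
// Minimal sketch: serialize an annotated tag object and compute its ID (SHA-1 format).
const crypto = require("crypto");

function sha1Object(type, text) {
  const body = Buffer.from(text, "utf8");
  const header = Buffer.from(`${type} ${body.length}\0`, "utf8");
  return crypto.createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

// Field order matches what `git cat-file -p` prints for a tag object.
function buildTag({ object, type = "commit", name, tagger, when, tz, message }) {
  const text =
    `object ${object}\n` +
    `type ${type}\n` +
    `tag ${name}\n` +
    `tagger ${tagger} ${when} ${tz}\n` +
    `\n${message}\n`;
  return { text, hash: sha1Object("tag", text) };
}

const tag = buildTag({
  object: "0123456789abcdef0123456789abcdef01234567", // placeholder commit ID
  name: "v1.0",
  tagger: "Jane Doe <jane@example.com>",
  when: 1719240200,
  tz: "+0000",
  message: "Release version 1.0",
});
console.log(tag.text);
```

For an annotated tag, .git/refs/tags/v1.0 stores `tag.hash` (the tag object), not the commit ID in the `object` field.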

References

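References (refs) are pointers stored as small files under .git/refs/. After git gc or git pack-refs, some refs may instead live in the single file .git/packed-refs.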

# Branch refs point to commits
$ cat .git/refs/heads/main
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b

# HEAD tells us which branch we're on
$ cat .git/HEAD
ref: refs/heads/main

# Tag refs point to tag objects (or commits for lightweight tags)
$ cat .git/refs/tags/v1.0
b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c

# Remote-tracking refs record what was last fetched (they exist only once the repository has a remote)
$ cat .git/refs/remotes/origin/main
c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d
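Resolving HEAD means following the symbolic `ref:` line to a branch and reading the hash stored there. Once refs have been packed (for example by git gc), a branch may exist only as a line in .git/packed-refs.

Here is a minimal sketch of that lookup. It assumes `files` is a hypothetical map from paths inside .git to their text contents, and `resolveRef` is an illustrative name.

```javascript
// Minimal sketch: resolve a ref name to an object ID, assuming `files` maps paths
// inside .git (e.g. "HEAD", "refs/heads/main", "packed-refs") to their text.
function resolveRef(files, name, depth = 0) {
  if (depth > 5) throw new Error(`symbolic ref loop at ${name}`);
  const loose = files[name];
  if (loose !== undefined) {
    const value = loose.trim();
    // HEAD usually holds "ref: refs/heads/<branch>"; a detached HEAD holds a hash
    if (value.startsWith("ref: ")) return resolveRef(files, value.slice(5), depth + 1);
    return value;
  }
  // packed-refs: "<hash> <refname>" lines, plus "#" comments and "^<hash>" peeled-tag lines
  for (const line of (files["packed-refs"] || "").split("\n")) {
    if (line.startsWith("#") || line.startsWith("^")) continue;
    const [hash, ref] = line.split(" ");
    if (ref === name) return hash;
  }
  return null; // unknown ref
}

const demo = {
  HEAD: "ref: refs/heads/main\n",
  "packed-refs": "0123456789abcdef0123456789abcdef01234567 refs/heads/main\n",
};
console.log(resolveRef(demo, "HEAD")); // 0123456789abcdef0123456789abcdef01234567
```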

The Index (Staging Area)

The index (.git/index) is a binary file that tracks what will go into the next commit.

# Inspect the index with plumbing commands
$ git ls-files --stage
100644 3b18e5a5e6a4e3f2b1c0a9b8c7d6e5f4a3b2c1d0 0       hello.txt

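# ls-files --stage shows: mode, hash, stage number, and path
# (the index also caches file metadata such as size and modification time)
# Stage 0 means unconflicted. Stages 1-3 (base, ours, theirs) appear only during merge conflicts.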
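The index is binary, but its header is simple: the 4-byte signature `DIRC`, then a 32-bit version, then a 32-bit entry count, both big-endian.

Here is a minimal sketch that reads only that header. `readIndexHeader` is an illustrative name, and parsing the individual entries is left out.

```javascript
// Minimal sketch: read the 12-byte header of .git/index (big-endian fields).
const fs = require("fs");

function readIndexHeader(buf) {
  if (buf.toString("ascii", 0, 4) !== "DIRC") throw new Error("not a Git index file");
  return {
    version: buf.readUInt32BE(4), // typically 2, 3 or 4
    entries: buf.readUInt32BE(8), // one entry per path and stage
  };
}

// Print the header when run from the root of a repository that has an index.
if (fs.existsSync(".git/index")) {
  console.log(readIndexHeader(fs.readFileSync(".git/index")));
}
```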

Object Storage and Compression

Git compresses objects using zlib and stores them individually or in packfiles.

# Check object count and size
$ git count-objects -v
count: 12
size: 4
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0

# git gc packs loose objects into a packfile (Git also runs git gc --auto once enough loose objects accumulate)
$ git gc
$ git count-objects -v
count: 0
size: 0
in-pack: 12
packs: 1
size-pack: 3  # Smaller total size after packing!
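Packfiles live in .git/objects/pack/. Each one starts with a 12-byte header: the signature `PACK`, a version number, and the number of objects it holds. That count is one source of the `in-pack` figure above.

Here is a minimal sketch that reads just that header. `readPackHeader` and `readPackFileHeader` are illustrative names, and decoding the objects and deltas that follow is out of scope.

```javascript
// Minimal sketch: read the header of each packfile in .git/objects/pack/.
const fs = require("fs");
const path = require("path");

function readPackHeader(buf) {
  if (buf.toString("ascii", 0, 4) !== "PACK") throw new Error("not a packfile");
  return { version: buf.readUInt32BE(4), objects: buf.readUInt32BE(8) };
}

// Read only the first 12 bytes instead of loading a possibly large pack.
function readPackFileHeader(file) {
  const fd = fs.openSync(file, "r");
  try {
    const buf = Buffer.alloc(12);
    fs.readSync(fd, buf, 0, 12, 0);
    return readPackHeader(buf);
  } finally {
    fs.closeSync(fd);
  }
}

const packDir = path.join(".git", "objects", "pack");
if (fs.existsSync(packDir)) {
  for (const name of fs.readdirSync(packDir).filter((n) => n.endsWith(".pack"))) {
    console.log(name, readPackFileHeader(path.join(packDir, name)));
  }
}
```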

Plumbing Commands

Git's plumbing commands give you direct access to the object database.

# Hash content without adding it to the object database
$ echo "test content" | git hash-object --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Add -w to also write the blob into .git/objects
$ echo "test content" | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Resolve the tree of the current commit
$ git rev-parse HEAD^{tree}
5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3a4

# List all reachable objects (abridged: commits first, then trees and blobs with their paths)
$ git rev-list --objects --all
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b
4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3
3b18e5a5e6a4e3f2b1c0a9b8c7d6e5f4a3b2c1d0 hello.txt

# Show the diff of staged changes
$ git diff --cached

# Find the commit that introduced a blob
$ git log --all --find-object=3b18e5a5e6a4e3f2b1c0a9b8c7d6e5f4a3b2c1d0
commit a1b2c3d... (HEAD -> main)
Author: Jane Doe <jane@example.com>
Date:   Mon Jun 24 10:00:00 2026 +0000
    Add hello.txt
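`git rev-parse HEAD^{tree}` resolves HEAD to a commit and then reads that commit's `tree` header. Here is a minimal sketch of the parsing step. `parseCommit` is an illustrative name, and multi-line headers such as signatures are simply skipped.

```javascript
// Minimal sketch: parse commit object text into its main fields.
function parseCommit(text) {
  const split = text.indexOf("\n\n"); // headers end at the first blank line
  const headers = text.slice(0, split).split("\n");
  const commit = { tree: null, parents: [], message: text.slice(split + 2) };
  for (const line of headers) {
    const space = line.indexOf(" ");
    const key = line.slice(0, space); // continuation lines start with a space, so key is ""
    const value = line.slice(space + 1);
    if (key === "tree") commit.tree = value;
    else if (key === "parent") commit.parents.push(value);
    else if (key === "author" || key === "committer") commit[key] = value;
  }
  return commit;
}

const sample =
  "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n" +
  "author Jane Doe <jane@example.com> 1719240000 +0000\n" +
  "committer Jane Doe <jane@example.com> 1719240000 +0000\n" +
  "\n" +
  "Add hello.txt\n";
console.log(parseCommit(sample).tree); // 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```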

Object Graph Traversal

Understand how Git traverses the object graph for operations like log and diff.

// Pseudocode: how git log walks history (simplified).
// readObject, printCommit, emptyTree and diffTrees are illustrative helpers, not Git APIs.
function gitLog(commitHash) {
  let hash = commitHash;
  while (hash) {
    const commit = readObject(hash, 'commit');
    printCommit(commit);
    hash = commit.parents[0]; // follow the first parent (undefined at the root commit)
  }
  // Real git log also visits the other parents of merge commits.
}

// How git diff compares a commit with its first parent
function gitDiff(commitHash) {
  const commit = readObject(commitHash, 'commit');
  const parentTree = commit.parents.length > 0
    ? readObject(readObject(commit.parents[0], 'commit').tree, 'tree')
    : emptyTree(); // a root commit is compared with the empty tree
  const currentTree = readObject(commit.tree, 'tree');

  return diffTrees(parentTree, currentTree);
}
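The pseudocode above follows only one parent per commit. Here is a runnable sketch that also handles merges. It assumes a hypothetical in-memory `Map` from commit IDs to pre-parsed `{ parents, time, message }` records, and the short IDs are made up.

It always shows the newest pending commit next, roughly like git log's default reverse-chronological order. Real Git offers further orderings such as `--topo-order`.

```javascript
// Minimal sketch: walk history like a simplified `git log`, visiting every parent.
function gitLog(store, startHash) {
  const seen = new Set([startHash]);
  const queue = [startHash];
  const out = [];
  while (queue.length > 0) {
    // Pick the newest pending commit first (highest commit time).
    queue.sort((a, b) => store.get(b).time - store.get(a).time);
    const hash = queue.shift();
    const commit = store.get(hash);
    out.push(`${hash} ${commit.message}`);
    for (const parent of commit.parents) {
      // Merge commits have several parents; the Set keeps shared history from repeating.
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  return out;
}

const store = new Map([
  ["c1", { parents: [], time: 100, message: "Add hello.txt" }],
  ["c2", { parents: ["c1"], time: 200, message: "Add second line" }],
  ["c3", { parents: ["c1"], time: 250, message: "Work on a branch" }],
  ["m1", { parents: ["c2", "c3"], time: 300, message: "Merge branch" }],
]);
console.log(gitLog(store, "m1").join("\n")); // m1, c3, c2, c1 (newest first)
```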

Common Errors

  1. fatal: Not a valid object name — The hash or reference doesn't exist. Check your typing. Git hashes are 40-character hex strings. Use git rev-parse to verify references.
  2. object file is empty — A .git/objects file is corrupted. Try git fsck to find corrupted objects and restore from a remote or backup.
  3. dangling blob or dangling commit — These are objects not referenced by any branch or tag. They're normal garbage waiting for git gc. Not an error.
  4. fatal: bad object HEAD — The HEAD file is corrupted or points to a non-existent reference. Check .git/HEAD content and compare with .git/refs/heads/.
  5. refs/heads/main: file exists when switching branches — The ref file exists but has incorrect content. Use git symbolic-ref to fix HEAD.
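Several of these errors come down to an object file whose bytes no longer match its name. Here is a minimal sketch of that check in the spirit of git fsck (not its actual implementation): inflate the file, validate the header, and recompute the SHA-1. `checkLooseObject` is an illustrative name, and its messages are not Git's exact wording.

```javascript
// Minimal sketch: an fsck-style integrity check for one loose object (SHA-1 format).
const crypto = require("crypto");
const zlib = require("zlib");

function checkLooseObject(expectedHash, compressed) {
  if (compressed.length === 0) return "object file is empty";
  let raw;
  try {
    raw = zlib.inflateSync(compressed);
  } catch {
    return "object is not valid zlib data";
  }
  const nul = raw.indexOf(0);
  if (nul < 0) return "missing object header";
  const [type, size] = raw.subarray(0, nul).toString("utf8").split(" ");
  if (Number(size) !== raw.length - nul - 1) return `size mismatch for ${type}`;
  // The object's name must equal the SHA-1 of header + content.
  const actual = crypto.createHash("sha1").update(raw).digest("hex");
  return actual === expectedHash ? "ok" : `hash mismatch (content hashes to ${actual})`;
}
```

In a real repository, you would pass the file's name (directory plus file name) and `fs.readFileSync` of that file.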

Practice Questions

What are the four types of Git objects?

Blob (file content), Tree (directory listing), Commit (snapshot with metadata), and Tag (annotated reference). All are identified by SHA-1 hashes and stored in .git/objects/.

How does Git store file content vs file names?

File content is stored as a blob object (content only, no name). File names are stored in tree objects, which map names to blob hashes. This means renaming a file creates a new tree but reuses the same blob — no duplicate storage.
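You can check the rename claim by hashing both trees. This is a minimal sketch assuming the SHA-1 format, a single directory, and ASCII names. `blobId` and `treeId` are illustrative names.

```javascript
// Minimal sketch: renaming a file reuses its blob but produces a new tree.
const crypto = require("crypto");

function sha1Object(type, body) {
  const header = Buffer.from(`${type} ${body.length}\0`);
  return crypto.createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

const blobId = (text) => sha1Object("blob", Buffer.from(text, "utf8"));

// Regular-file entries only: "100644 <name>\0" + 20-byte binary blob ID, sorted by name.
function treeId(files) {
  const sorted = [...files].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  const body = Buffer.concat(
    sorted.map((f) => Buffer.concat([Buffer.from(`100644 ${f.name}\0`), Buffer.from(f.blob, "hex")]))
  );
  return sha1Object("tree", body);
}

const blob = blobId("Hello, Git Internals\n");
const before = treeId([{ name: "hello.txt", blob }]);
const after = treeId([{ name: "greeting.txt", blob }]);
console.log(before === after); // false: same blob, different tree
```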

What is the difference between a branch and a tag at the storage level?

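Both are refs: small files under .git/refs/ (or lines in .git/packed-refs) that hold an object hash. A branch (refs/heads/main) moves forward with each new commit. A lightweight tag (refs/tags/v1.0) points directly at a commit and is not expected to move. An annotated tag's ref points to a tag object, which in turn points to the commit and stores the tagger, date, and message.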

What happens during git gc?

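git gc (garbage collection) packs loose objects into packfiles and packs refs into .git/packed-refs. It also expires old reflog entries and prunes unreachable loose objects older than two weeks (the default gc.pruneExpire). Packfiles store multiple objects efficiently using delta compression, which stores an object as the differences from a similar object, such as an earlier version of the same file.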

How does Git compute a commit hash?

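The commit hash is the SHA-1 of the commit object. That object is a header ("commit", a space, the byte length, and a NUL byte) followed by:

- the tree hash
- the parent commit hash(es)
- the author and committer lines (name, email, timestamp, and timezone offset)
- any extra headers, such as a GPG signature
- the commit message

The same content always produces the same hash, and changing any byte produces a different one. Because each commit includes its parents' hashes, altering an old commit also changes the hash of every commit after it. This is how Git detects tampering and corruption.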

Challenge

Create a repository entirely using plumbing commands (no git add or git commit):

1. Create a blob from a string.
2. Create a tree pointing to that blob.
3. Create a commit pointing to that tree.
4. Point the current branch at that commit (for example with git update-ref) so HEAD resolves to it.

Verify with git log and git show. Then create a second commit that modifies the file, run git gc, and inspect the resulting packfile.

Real-World Task

The DodaZIP team suspects a corrupted object in their repository. Use git fsck to verify object integrity. Locate any dangling or corrupted objects. If corrupted, find the correct object from a team member's clone or from a backup. Use git cat-file to inspect objects and git replace to fix corrupted commits. Document the recovery process for the team's disaster recovery playbook.



Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
