.4 Git Tools - Rewriting History
Many times, when working with Git, you may want to revise your commit history for some reason. One of the great things about Git is that it allows you to make decisions at the last possible moment. You can decide what files go into which commits right before you commit with the staging area, you can decide that you didn’t mean to be working on something yet with the stash command, and you can rewrite commits that already happened so they look like they happened in a different way. This can involve changing the order of the commits, changing messages or modifying files in a commit, squashing together or splitting apart commits, or removing commits entirely — all before you share your work with others.
In this section, you’ll cover how to accomplish these very useful tasks so that you can make your commit history look the way you want before you share it with others.
Changing your last commit is probably the most common rewriting of history that you’ll do. You’ll often want to do two basic things to your last commit: change the commit message, or change the snapshot you just recorded by adding, changing and removing files.
If you only want to modify your last commit message, it’s very simple:
$ git commit --amend
That drops you into your text editor, which has your last commit message in it, ready for you to modify the message. When you save and close the editor, the editor writes a new commit containing that message and makes it your new last commit.
If you’ve committed and then you want to change the snapshot you committed by adding or changing files, possibly because you forgot to add a newly created file when you originally committed, the process works basically the same way. You stage the changes you want by editing a file and running
git add on it or
git rm to a tracked file, and the subsequent
git commit --amend takes your current staging area and makes it the snapshot for the new commit.
You need to be careful with this technique because amending changes the SHA-1 of the commit. It’s like a very small rebase — don’t amend your last commit if you’ve already pushed it.
To modify a commit that is farther back in your history, you must move to more complex tools. Git doesn’t have a modify-history tool, but you can use the rebase tool to rebase a series of commits onto the HEAD they were originally based on instead of moving them to another one. With the interactive rebase tool, you can then stop after each commit you want to modify and change the message, add files, or do whatever you wish. You can run rebase interactively by adding the
-i option to
git rebase. You must indicate how far back you want to rewrite commits by telling the command which commit to rebase onto.
For example, if you want to change the last three commit messages, or any of the commit messages in that group, you supply as an argument to
git rebase -i the parent of the last commit you want to edit, which is
HEAD~3. It may be easier to remember the
~3 because you’re trying to edit the last three commits; but keep in mind that you’re actually designating four commits ago, the parent of the last commit you want to edit:
$ git rebase -i HEAD~3
Remember again that this is a rebasing command — every commit included in the range
HEAD~3..HEAD will be rewritten, whether you change the message or not. Don’t include any commit you’ve already pushed to a central server — doing so will confuse other developers by providing an alternate version of the same change.
Running this command gives you a list of commits in your text editor that looks something like this:
pick f7f3f6d changed my name a bit pick 310154e updated README formatting and added blame pick a5f4a0d added cat-file # Rebase 710f0f8..a5f4a0d onto 710f0f8 # # Commands: # p, pick = use commit # r, reword = use commit, but edit the commit message # e, edit = use commit, but stop for amending # s, squash = use commit, but meld into previous commit # f, fixup = like "squash", but discard this commit's log message # x, exec = run command (the rest of the line) using shell # # These lines can be re-ordered; they are executed from top to bottom. # # If you remove a line here THAT COMMIT WILL BE LOST. # # However, if you remove everything, the rebase will be aborted. # # Note that empty commits are commented out
It’s important to note that these commits are listed in the opposite order than you normally see them using the
log command. If you run a
log, you see something like this:
$ git log --pretty=format:"%h %s" HEAD~3..HEAD a5f4a0d added cat-file 310154e updated README formatting and added blame f7f3f6d changed my name a bit
Notice the reverse order. The interactive rebase gives you a script that it’s going to run. It will start at the commit you specify on the command line (
HEAD~3) and replay the changes introduced in each of these commits from top to bottom. It lists the oldest at the top, rather than the newest, because that’s the first one it will replay.
You need to edit the script so that it stops at the commit you want to edit. To do so, change the word pick to the word edit for each of the commits you want the script to stop after. For example, to modify only the third commit message, you change the file to look like this:
edit f7f3f6d changed my name a bit pick 310154e updated README formatting and added blame pick a5f4a0d added cat-file
When you save and exit the editor, Git rewinds you back to the last commit in that list and drops you on the command line with the following message:
$ git rebase -i HEAD~3 Stopped at 7482e0d... updated the gemspec to hopefully work better You can amend the commit now, with git commit --amend Once you’re satisfied with your changes, run git rebase --continue
These instructions tell you exactly what to do. Type
$ git commit --amend
Change the commit message, and exit the editor. Then, run
$ git rebase --continue
This command will apply the other two commits automatically, and then you’re done. If you change pick to edit on more lines, you can repeat these steps for each commit you change to edit. Each time, Git will stop, let you amend the commit, and continue when you’re finished.
You can also use interactive rebases to reorder or remove commits entirely. If you want to remove the "added cat-file" commit and change the order in which the other two commits are introduced, you can change the rebase script from this
pick f7f3f6d changed my name a bit pick 310154e updated README formatting and added blame pick a5f4a0d added cat-file
pick 310154e updated README formatting and added blame pick f7f3f6d changed my name a bit
When you save and exit the editor, Git rewinds your branch to the parent of these commits, applies
310154e and then
f7f3f6d, and then stops. You effectively change the order of those commits and remove the "added cat-file" commit completely.
It’s also possible to take a series of commits and squash them down into a single commit with the interactive rebasing tool. The script puts helpful instructions in the rebase message:
# # Commands: # p, pick = use commit # r, reword = use commit, but edit the commit message # e, edit = use commit, but stop for amending # s, squash = use commit, but meld into previous commit # f, fixup = like "squash", but discard this commit's log message # x, exec = run command (the rest of the line) using shell # # These lines can be re-ordered; they are executed from top to bottom. # # If you remove a line here THAT COMMIT WILL BE LOST. # # However, if you remove everything, the rebase will be aborted. # # Note that empty commits are commented out
If, instead of "pick" or "edit", you specify "squash", Git applies both that change and the change directly before it and makes you merge the commit messages together. So, if you want to make a single commit from these three commits, you make the script look like this:
pick f7f3f6d changed my name a bit squash 310154e updated README formatting and added blame squash a5f4a0d added cat-file
When you save and exit the editor, Git applies all three changes and then puts you back into the editor to merge the three commit messages:
# This is a combination of 3 commits. # The first commit's message is: changed my name a bit # This is the 2nd commit message: updated README formatting and added blame # This is the 3rd commit message: added cat-file
When you save that, you have a single commit that introduces the changes of all three previous commits.
Splitting a commit undoes a commit and then partially stages and commits as many times as commits you want to end up with. For example, suppose you want to split the middle commit of your three commits. Instead of "updated README formatting and added blame", you want to split it into two commits: "updated README formatting" for the first, and "added blame" for the second. You can do that in the
rebase -i script by changing the instruction on the commit you want to split to "edit":
pick f7f3f6d changed my name a bit edit 310154e updated README formatting and added blame pick a5f4a0d added cat-file
When you save and exit the editor, Git rewinds to the parent of the first commit in your list, applies the first commit (
f7f3f6d), applies the second (
310154e), and drops you to the console. There, you can do a mixed reset of that commit with
git reset HEAD^, which effectively undoes that commit and leaves the modified files unstaged. Now you can take the changes that have been reset, and create multiple commits out of them. Simply stage and commit files until you have several commits, and run
git rebase --continue when you’re done:
$ git reset HEAD^ $ git add README $ git commit -m 'updated README formatting' $ git add lib/simplegit.rb $ git commit -m 'added blame' $ git rebase --continue
Git applies the last commit (
a5f4a0d) in the script, and your history looks like this:
$ git log -4 --pretty=format:"%h %s" 1c002dd added cat-file 9b29157 added blame 35cfb2b updated README formatting f3cc40e changed my name a bit
Once again, this changes the SHAs of all the commits in your list, so make sure no commit shows up in that list that you’ve already pushed to a shared repository.
There is another history-rewriting option that you can use if you need to rewrite a larger number of commits in some scriptable way — for instance, changing your e-mail address globally or removing a file from every commit. The command is
filter-branch, and it can rewrite huge swaths of your history, so you probably shouldn’t use it unless your project isn’t yet public and other people haven’t based work off the commits you’re about to rewrite. However, it can be very useful. You’ll learn a few of the common uses so you can get an idea of some of the things it’s capable of.
Removing a File from Every Commit
This occurs fairly commonly. Someone accidentally commits a huge binary file with a thoughtless
git add ., and you want to remove it everywhere. Perhaps you accidentally committed a file that contained a password, and you want to make your project open source.
filter-branch is the tool you probably want to use to scrub your entire history. To remove a file named passwords.txt from your entire history, you can use the
--tree-filter option to
$ git filter-branch --tree-filter 'rm -f passwords.txt' HEAD Rewrite 6b9b3cf04e7c5686a9cb838c3f36a8cb6a0fc2bd (21/21) Ref 'refs/heads/master' was rewritten
--tree-filter option runs the specified command after each checkout of the project and then recommits the results. In this case, you remove a file called passwords.txt from every snapshot, whether it exists or not. If you want to remove all accidentally committed editor backup files, you can run something like
git filter-branch --tree-filter "rm -f *~" HEAD.
You’ll be able to watch Git rewriting trees and commits and then move the branch pointer at the end. It’s generally a good idea to do this in a testing branch and then hard-reset your master branch after you’ve determined the outcome is what you really want. To run
filter-branch on all your branches, you can pass
--all to the command.
Making a Subdirectory the New Root
Suppose you’ve done an import from another source control system and have subdirectories that make no sense (trunk, tags, and so on). If you want to make the
trunk subdirectory be the new project root for every commit,
filter-branch can help you do that, too:
$ git filter-branch --subdirectory-filter trunk HEAD Rewrite 856f0bf61e41a27326cdae8f09fe708d679f596f (12/12) Ref 'refs/heads/master' was rewritten
Now your new project root is what was in the
trunk subdirectory each time. Git will also automatically remove commits that did not affect the subdirectory.
Changing E-Mail Addresses Globally
Another common case is that you forgot to run
git config to set your name and e-mail address before you started working, or perhaps you want to open-source a project at work and change all your work e-mail addresses to your personal address. In any case, you can change e-mail addresses in multiple commits in a batch with
filter-branch as well. You need to be careful to change only the e-mail addresses that are yours, so you use
$ git filter-branch --commit-filter ' if [ "$GIT_AUTHOR_EMAIL" = "schacon@localhost" ]; then GIT_AUTHOR_NAME="Scott Chacon"; GIT_AUTHOR_EMAIL="email@example.com"; git commit-tree "$@"; else git commit-tree "$@"; fi' HEAD
This goes through and rewrites every commit to have your new address. Because commits contain the SHA-1 values of their parents, this command changes every commit SHA in your history, not just those that have the matching e-mail address.
The Very Fast Nuclear Option: Big Friendly Giant Repo Cleaner (BFG)
Roberto Tyley has written a similar tool to
filter-branch called the BFG. BFG cannot do as much as
filter-branch, but it is very fast and on a large repository this can make a big difference. If the change you want to make is in the scope of BFG capability, and you have performance issues, then you should consider using it.
See the BFG website for details.