9/26/2013 - 9:59 PM

tabs to spaces

HEY: I've turned this into a blog post, which is a little more in depth.

Death to tabs, long live spaces

Do this

  1. Fix any inconsistent indentation in your existing files first, or Python code will break: Python treats a tab as advancing to the next multiple of 8 columns, and we're about to make it 4.

  2. Populate .gitattributes in your repository, as below.

     *.py filter=spabs

    You may want more filetypes; just add more lines with different extensions.

    Optionally, commit it. DO NOT PUSH YET.

  3. Run expand manually on your entire repository. (TODO how to do this, and/or how to make git do it.) Commit. DO NOT PUSH YET.

  4. By hand of God, big scary emails, or perhaps by editing /etc/gitconfig on all your developers' machines, give the chunk of .gitconfig below to all of your contributors.

     [filter "spabs"]
         clean = expand --initial -t 4
         smudge = expand --initial -t 4
     [merge]
         renormalize = true

  5. Now you push.
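For step 1, Python's standard library already ships a checker, python -m tabnanny. If you want a check aimed specifically at the 8-to-4 change, here is a minimal sketch (the function names are made up for illustration). It replays the indentation stack that Python's tokenizer keeps, once with tab stops every 8 columns and once every 4, and reports the lines whose block depth differs between the two. It only looks at leading whitespace, so lines inside brackets or multi-line strings can confuse it.

```python
def indent_depths(lines, tabsize):
    """Map line number -> block depth, replaying Python's indentation stack.

    Blank and comment-only lines are skipped, as the tokenizer does.
    None marks a dedent that matches no enclosing level (an IndentationError).
    """
    stack = [0]
    depths = {}
    for lineno, line in enumerate(lines, start=1):
        body = line.lstrip(" \t")
        if not body.strip() or body.startswith("#"):
            continue
        # str.expandtabs honours column positions, like real tab stops
        width = len(line[: len(line) - len(body)].expandtabs(tabsize))
        if width > stack[-1]:
            stack.append(width)
        else:
            while width < stack[-1]:
                stack.pop()
        depths[lineno] = len(stack) - 1 if width == stack[-1] else None
    return depths


def ambiguous_lines(source, old_tabsize=8, new_tabsize=4):
    """Line numbers whose nesting changes when tabs go from old_tabsize to new_tabsize."""
    lines = source.splitlines()
    before = indent_depths(lines, old_tabsize)
    after = indent_depths(lines, new_tabsize)
    return [n for n in before if before[n] != after[n]]


# "y = 2" is indented with 8 spaces: a sibling of "if b:" at tab width 8,
# but at the same depth as "x = 1" at tab width 4.
print(ambiguous_lines("if a:\n\tif b:\n\t\tx = 1\n        y = 2\n"))  # [4]
```

A file indented only with tabs, or only with spaces, comes back empty; trouble only shows up where the two are mixed.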
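Step 3 leaves the bulk conversion as a TODO. One option is a small script. Below is a minimal Python sketch that mimics expand --initial -t 4 (only tabs in a line's leading whitespace are expanded, with tab stops every 4 columns) and applies it to every file under a directory whose name ends in a given suffix. The helper names are hypothetical, it assumes UTF-8 text files, and it matches plain filename suffixes rather than real .gitattributes patterns. Git 2.16 and later can also restage tracked files through the clean filter with git add --renormalize ., which stages the cleaned content; the files on disk keep their tabs until they are checked out again.

```python
import os


def expand_initial(text, tabsize=4):
    """Expand tabs in each line's leading whitespace only (like expand --initial)."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        out.append(indent.expandtabs(tabsize) + body)  # tabs after the indent are kept
    return "".join(out)


def detab_tree(root, suffixes=(".py",), tabsize=4):
    """Rewrite matching files under root in place; return the paths that changed.

    Assumes UTF-8 files; anything else raises UnicodeDecodeError.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # never touch git's own files
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # newline="" keeps the file's original line endings intact
            with open(path, encoding="utf-8", newline="") as f:
                original = f.read()
            expanded = expand_initial(original, tabsize)
            if expanded != original:
                with open(path, "w", encoding="utf-8", newline="") as f:
                    f.write(expanded)
                changed.append(path)
    return changed
```

Running detab_tree(".", suffixes=(".py",)) from the top of a checkout and then committing gives the single "detab everything" commit that step 3 asks for.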

Here's what will happen

Note that this will not keep tabs in the repository and spaces in a checkout or whatever other nonsense. This will convert tabs to spaces, permanently, period, everywhere.

  • Anyone checking the repository out will just get spaces, because that's what git is storing now. The clean filter runs whenever a file is staged, so any new tabs are replaced before they can be committed.

  • Anyone with an inflight branch will see tabs on that branch, because the .gitattributes file doesn't exist there yet.

  • Anyone who merges an inflight branch with master will have their branch transparently renormalized before git tries to merge, thanks to merge.renormalize. After the merge, the branch will have spaces. Most likely the developer will never notice anything changed at all. (This also applies in the other direction: if other work happens on master while you're detabbing in a branch, you can merge master in seamlessly. Either way, .gitattributes ends up in the merged result, and that's what Git uses.)

  • Anyone who rebases an inflight branch is totally fucked, because merge.renormalize doesn't apply to rebasing. So you must send out another BIG SCARY EMAIL informing all your rebasing jerks that they must pass -Xrenormalize any time they rebase a tabbed branch. This explicitly does the same thing that happens automatically when merging. (It works for merging, too, but since there's a config flag there's not much reason to use it there.) Cherry-pick and other ways of rearranging commits need -Xrenormalize as well.

  • New files on inflight branches WILL NOT be de-tabbed during the merge—they were only changed on one side, so git sees no reason to merge them! But git will still consider their "canon" representations to be spaces, so git diff will claim that every single indented line has "changed" from tabs to spaces, even if the file on disk still contains tabs. git checkout or git reset --hard will not make the "changed" files go away.

    It's possible to fix this with a clever git hook that applies the filter to new files during a merge, but it's not that huge a problem in practice: git status will report the files as modified immediately following the merge and they can be committed then. If more work is done before someone notices, git diff -w will still confirm the "useful" part of the change.

  • Stashes will not apply cleanly, and git stash apply seems to ignore -X. There are two workarounds:

    • Convert the stash to a branch with git stash branch, then merge or rebase it in.

    • Apply the stash manually with e.g. git cherry-pick 'stash@{0}' -n -m 1 -Xrenormalize. You need the -m 1 because a stash is actually a merge of several distinct commits that hold different parts of the stash, and cherry-pick wants to know which parent to diff against. -n just prevents committing, so you don't end up with "WIP: ..." as a commit message.
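Renormalizing helps merges and -Xrenormalize rebases because both sides go through the clean filter before they are compared, so a tabs-vs-spaces difference disappears and only real edits remain. Here is a rough illustration with difflib; clean is a hypothetical stand-in for the spabs filter, and this only shows the idea, not how git's merge machinery is implemented.

```python
import difflib


def clean(text, tabsize=4):
    """Stand-in for the spabs clean filter: expand tabs in leading whitespace only."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip(" \t")
        out.append(line[: len(line) - len(body)].expandtabs(tabsize) + body)
    return "".join(out)


def added_lines(old, new):
    """Lines that difflib.ndiff marks as added when going from old to new."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


master = "def f():\n    x = 1\n    return x\n"   # already detabbed
branch = "def f():\n\tx = 1\n\treturn x + 1\n"    # inflight: tabs plus one real edit

# Compared as-is, every indented line looks different.
print(added_lines(master, branch))                # ['\tx = 1', '\treturn x + 1']
# Normalize both sides first, as renormalize does, and only the real edit is left.
print(added_lines(clean(master), clean(branch)))  # ['    return x + 1']
```

The first comparison is roughly the situation described for new files on inflight branches: the committed copy still has tabs while git sees the filtered copy as spaces, so every indented line is flagged.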

Of course, anyone without the filter definition somewhere in git's configuration will be utterly confused. So this probably only works for fairly centralized development or very small teams.

Other considerations

  • .gitattributes is cool but is not a magic bullet. Whenever you ask git to look at a file, it will always report seeing spaces—but if you put tabs in a file on disk, they'll stay there until you ask git to update the file (via merge, etc.). Confusion will abound, especially in Python files. You can force a checkout with git checkout-index --force [files...].

  • Eventually you should let your developers know that they can drop whatever .vimrc et al. hacks they've been using to force tabs within your codebase.

  • This may balloon the size of your repository, since every indented file gets a new copy, but git compresses objects with zlib, long runs of the same character compress extremely well, and git gc will eventually repack the old and new versions as deltas.

  • If you feel particularly destructive, you can also put the attribute stuff in /etc/gitattributes and have it apply to files in all git repositories on the entire machine.

  • Blame is not, in fact, totally wrecked. Use git blame -w to ignore whitespace-only changes.
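git diff -w and git blame -w see through the conversion because they compare lines with all whitespace ignored, so a line whose only change is tabs becoming spaces counts as unchanged. A minimal Python approximation of that comparison using difflib; the function names are invented, and git's real diff code handles more than this.

```python
import difflib


def ws_key(line):
    # Remove every whitespace character, like --ignore-all-space (-w)
    return "".join(line.split())


def hunks_ignoring_whitespace(old, new):
    """(tag, old_lines, new_lines) for each hunk that survives a whitespace-blind diff."""
    a, b = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(
        None, [ws_key(l) for l in a], [ws_key(l) for l in b], autojunk=False
    )
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


before = "def f():\n\tx = 1\n\treturn x\n"
after = "def f():\n    x = 1\n    return x + 1\n"
print(hunks_ignoring_whitespace(before, after))
# [('replace', ['\treturn x'], ['    return x + 1'])]
```

Only the line with a real edit survives; the detabbed "x = 1" line compares equal, which is why blame can keep attributing it to its original author.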
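On the size worry: git compresses stored objects with zlib, and runs of spaces compress very well. This quick Python check uses an invented sample (1000 short lines indented with 0 to 3 tabs) and compares how much detabbing grows the raw text against how much it grows the zlib-compressed text. Real repositories will differ, and packfile delta compression is ignored entirely.

```python
import zlib

# Made-up sample: 1000 short lines, indented with 0 to 3 tabs in rotation.
tabbed = "".join("\t" * (i % 4) + f"x = {i}\n" for i in range(1000)).encode()
# Safe here because tabs only ever appear as indentation in this sample.
spaced = tabbed.replace(b"\t", b"    ")

raw_growth = len(spaced) - len(tabbed)  # 1500 tabs, 3 extra bytes each
zlib_growth = len(zlib.compress(spaced, 9)) - len(zlib.compress(tabbed, 9))
print(f"raw: +{raw_growth} bytes, zlib-compressed: {zlib_growth:+d} bytes")
```

The raw text grows by 4500 bytes; the compressed growth is a small fraction of that, because each run of spaces ends up as a back-reference to an earlier identical run.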