Tigraine

Daniel Hoelbling-Inzko talks about programming

Git isn't magic

One thing I am seeing over the years is that Git has become too commonplace - too ubiquituous so that people don't care how it actually works or why it works that way. It's just this magical thing that keeps your sourcecode safe - and as long as push and pull work they just don't care why or how it works. Until git pull messes up your history, or you accidentally merged into the wrong branch or just a case of something went wrong™ and then they hope their Git client can bail them out.

So here is a very short guide on why Git is the way it is and why you should not be afraid of anything when it comes to working with git.

First thing to understand is that Git stores all of your committed files in a blob storage living inside your .git folder. Each and every file you ever committed got slashed into tiny files and placed in the .git folder. This is important because Git does not think in deltas or changes like other SCM systems but it thinks in snapshots. Each commit you do will commit the whole file to the blob storage. To save space git will internally map your files to multiple smaller files and have a blob object reference these smaller chunks (everything is addressed as a hash in git). So whenever you change something in that file git will not save the whole file a second time, but rather create a new blob object that references all the old chunks, alongside the new one you just changed.

Now that we know that Git has a storage that can return you every file ever committed in your repository we can talk about the commit and the tree. Files in itself are not very useful as we usually have a lot of these in our repository, so there is a object above the blobs that describes which versions of each blob belong together into a directory tree. Aptly named tree this is just a file that is more or less a mapping from the filename to the underlying blob hash. This is also important because if we change a file in a huge directory tree, a new version of that tree will just change one of these references to a new blob object (leaving all the other ones pointing to the old objects) - and that blob object will possible reference older chunks alongside the few changes we did in that file.

Why is this important? This also explains why git is so lightning fast. For every checkout operation git has to just look at the tree object, go into it's blob storage and store each file at the name it was referenced in the tree. Unlike other delta based SCMs where you had to get the last snapshot and then replay all changes since that snapshot to get to the current version. For git this is a constant operation that always takes the same amount of time - no matter which commit you check out.

The last concept to understand with git is the commit. A commit is basically a very small file that contains a few important pieces of information. It contains the SHA1 hash of the tree it is representing, the name of the author, the committer and the commit message. The git commit hash is exactly what the name implies: A shas1sum over this commit file. Thus a commit hash uniquely represents a commit, which unambiguously and cryptographically securely references a tree which represents (using the same sha1 concept) all files. If I tell you to check out version **d0ce065** of the project there is no doubt about what you are getting. Either your commit sha1 matches it's commit files contents or not. There is no way to corrupt that unit of your source code. Everything attached to a commit is securely identified by that one commit (since blobs, trees and chunks are all also in turn identified by their SHA1 sums).

So we end with a diagram like this (graphic by Scott Shacon):

https://git-scm.com/book

What is missing from this picture is the history. Where are the other commits? This is no SCM yet. Well I left out one very important detail in my previous description. The commit file also contains the sha1 of it's parent commit. So if you remember that everything about a git commit is a sha1 hash, one commit also cryptographically secures the whole git history before it and makes it impossible to miss a commit or add ones without all hashes upstream to change. Again this is best described with a diagram:

alt

So this now also explains why in every git presentation ever there where arrows between the commits that always went backwards. Git is a acyclic directed graph. This also means that git commits only know about their parents - never about their children. So if you did something you are not proud of, just check out the parent commit and continue working there - your child commit will linger on in the git object storage for some time, but eventually git will forget about it and it's gone.

No branches?

If you think about all of the above for a second you will realize branches are there just for human comprehension. They are not needed in git. Everything git needs to know or reason about (and you) is stored in single commit sha1 hashes. The git datastructure has no concept what master or develop means, it can only reason about 9b09404973ca743d0f5e11367034250dff219637.

Branches are just cosmetic to make us stupid humans not have to remember sha1sums when we work with git. A branch is nothing more than a file that lives inside .git/refs/heads that contains exactly the shasum of the commit your branch is currently pointing to. Branches are pointers - they serve no (real) purpose and can be deleted at no cost without anything happening to the commit tree. Try this:

$ git clone https://github.com/dotless/dotless.git
$ cd dotless/.git/refs/heads/
$ cat master
=> 9b09404973ca743d0f5e11367034250dff219637 <= This is the commit sha1.

Go ahead and do a git checkout 9b09404973ca743d0f5e11367034250dff219637 and you will get exactly the same working directory as if you had done git checkout master. This is so important because most of the time when stuff goes wrong with git people don't realize that having a commit on a branch where it does not belong is no big deal - just delete the branch and re-create it at the point where it actually should be.

Coincidentally remote branches are also just the same thing: pointers. Locally you can merge with your origin/master exactly because when you do a git pull or git fetch git updates the files inside .git/remotes/origin to the sha1sums that are on the server. These things are just local operations since all changes of the server are already living inside your .git directory.

So if you ever accidentally deleted a branch and forgot the commit sha1 it was at - that branch is not lost. It's still right there in your .git folder - you just have no clue how to ask git for it. That's the point where you can just run git reflog and you will see all the recent pointer/branch changes you did recently along with their sha1 sum for you to check out and recover.

$ git reflog
858b6db HEAD@{0}: merge laedit/Adding-support-for-'unit'-function: Merge made by recursive.
715db85 HEAD@{1}: checkout: moving from laedit-Adding-support-for-unit-function to master
715db85 HEAD@{2}: checkout: moving from master to laedit-Adding-support-for-unit-function
715db85 HEAD@{3}: merge nevett: Fast-forward
2a933ed HEAD@{4}: checkout: moving from nevett to master
715db85 HEAD@{5}: pull https://github.com/Nevett/dotless.git fix-mixin-important-recursive: Fast-forward
2a933ed HEAD@{6}: checkout: moving from master to nevett
2a933ed HEAD@{7}: checkout: moving from d1d2a822561fcd2c52a54b6bb8799000e4efecf9 to master
d1d2a82 HEAD@{8}: checkout: moving from MarkOSIndustries-master to d1d2a822561fcd2c52a54b6bb8799000e4efecf9

As you can see, each operation I did (even when I did not commit anything) is noted here and I can find the missing sha1sum to recover.

Filed under git

Don&apos;t commit when you rebase

One common mistake and my team seem to be doing at times with Git is that we routinely commit after resolving a merge conflict during a rebase.

The idea is simple: Instead of doing thousands of merges (thus cluttering the timeline of the project) whenever someone wants to check something into the central repository he fetches the remote changes and applies his local changes onto the remote ones.

Thanks to Visual Studio and it's way of having every source file referenced from the .csproj files (I hate that btw), we routinely get merge conflicts on the project files that need to be manually resolved. During a rebase this means the rebase will stop on the commit that caused the conflict and you have to fix it.

After you are done with fixing the conflict (usually it's just removing the conflict markers Git inserted) you then have to add the now merged file to the index (aka staging area) and hit git rebase --continue. Problem is: Since it's just a special state of a repository you can just as easily do the following:

git add -A git commit

Good job you just put your repository into limbo mode and your merge is gone .. You recorded a commit onto your rebase-HEAD and you can't continue the rebase because the rebase head is no longer where it belongs to.. So once you try to do git rebase --continue git will tell you: "No changes - did you forget to use 'git add'?" And when you run git status you'll see: "Not currently on any branch. nothing to commit"

Your only avenue here is:

git rebase --abort git checkout master git rebase origin/master ... redo the merge git rebase --continue

Filed under programmierung, git

Why I moved to Git from Mercurial

Believe it or not, my first encounter with distributed version control was through Mercurial. At the time I was used to SVN and Mercurial (hg) opened my eyes to the painless ways of the DVCS. And while I was very content with the way Mercurial worked, I still had to make sacrifices to the way the tool worked.

After seeing some .NET OSS projects on GitHub, and following GitHub and BitBucket for some time it became clear that GitHub is much more active than BitBucket. So I decided to try out Git and see for myself why everyone is making a fuss about GitHub, and behold: I liked it!

While hg pretty much feels like Subversion but more awesome, Git manages to loose that feeling. Especially the flexibility that git allows through it’s index made it immediately feel “right”. Here is why:

Forgettable me

Sadly, I forget things all the time. And this is especially true while working. Sometimes I commit something and forget to save my Visual Studio project, so my commit only contains loose files. In hg, like in SVN, you are screwed and have to commit a second time to make things right. It’s embarrassing when others see your history littered with “Doh, forget to save XY” commits.

Git solved this with it’s git commit --amend that allows me to add stuff to the previous commit with no hassle. Hg treats all commits as immutable, and that’s just not how I tend to work. I’m stupid, and my tools have to reflect that!

Caveat: You should NEVER mess around with published history. If you amend a commit that is already in a public repository, everyone that pulled that commit will have a very hard time merging with any upstream changes you start to introduce. Just don’t do it, once you push you are committed!

Staging

Same story, different setting. Besides forgetting to commit something, I sometimes even forget to commit at all. You get into the flow of things, and once you look up you finished 3 things that better go into individual commits. With git you can simply add individiual files (or even individual lines inside the files) to the index, commit, repeat until no uncommitted files are left.

This feature is awesome, and it allows a great deal of flexibility without having to think about “damn am I doing too much for one commit?”. It’s like above: I don’t have to think how the tool wants me to behave, I can make my tool behave like I need it do easily.

Mercurial can also do this, but you end up with hg commit --include fileA fileB ... and that’s just not as simple as the Git index.

And that’s just all about it. I went with hg first because people told me that it’s better on Windows, but when I moved to Git afterwards I didn’t notice any problems. Speed is pretty much the same, so you can just pick and choose what you like regardless of platform.

I simply chose Git because it does not interfere with the way I work, and because it makes it easy to fix my screwups afterwards.

Filed under programmierung, git

Git: Unstaging all changes and reset the working tree

When you commit something to git you first have to stage (add to the index) your changes. This means you have to git add all the files you want to have included in this commit before git considers them part of the commit. To some this added complexity feels like a burden, but to me it’s a blessing.

git working dir / index / repository diagram

Ever forgot that you just did 2 distinct features that really should be in 2 different commits? Well, just add the changes of the first feature, commit, repeat for the second commit and you have everything organized the way you want it to show up in your history.

But: You usually don’t do individual git add commands anyway, if you are anything like me you just hit git add –A (adding all changes in the working directory) and while writing the commit message you realize that there are two files that should not go into this commit. But how do you remove something from the index?

Since you used git add to put them in the index it would be logical to use git rm? Wrong! Git rm will simply delete the file and add the deletion to the index. Meet:

git reset

As with all things in git, git reset can be used in a number of ways. So here is the most common usage in day to day usage:

git reset : Clears your index, leaves your working directory untouched. (simply unstaging everything)

git reset --hard : Clears your index, reverts all changes in your working directory to the last commit. (the same thing as svn revert would do)

Filed under programmierung, git

Using git from Powershell just got easier: Posh-git

Whenever someone asks me at the end of my git presentation about what tools to use with git my answer was always be the same:

Learn to use the commandline. It’s by far the most convenient way to get stuff done. That’s the way git was intended to be used, and with msysgit and Powershell you get a pretty powerful shell to do your stuff. I work exclusively from the commandline, rarely using gitk to take a look at the history.
Unfortunately the basic Windows commandline (cmd) is just awful and outdated. And while some people swear by the MinGW stuff, I loathe the Unix commandline. So Powershell was the only good way for me on Windows to use git.

Now, thanks to some great work from Keith Dahlby, Jeremy Skinner and Mark Embling using git from Powershell just got a whole lot more comfortable with Posh-Git.

Documentation is still a bit sparse, but Posh-Git at it’s core gives you two things: It modifies your Powershell prompt to display relevant git information (branch name and staged/unstaged changes) and adds tab-completion to all git commands. Tab completion also works on branches so you can even hit tab on a git checkout and it will auto-complete to a branch. Really really nice I have to say.

Here is how my git shell looks like now with Posh-Git:

image

Once you have changes it will display them also on the prompt:

image

(Meaning: 1 new file, 1 deleted and 0 old ones changed)

Installing

Unfortunately not everyone is a Powershell buff like Keith and friends, so I had a bit of trouble finding out how to set up Posh-Git. Actually, it’s dead simple, but nobody told me: Just put all files from the Posh-Git repository into the folder:

C:\Users\<username>\Documents\WindowsPowerShell\

And rename the file profile.example.ps1 to Microsoft.PowerShell_profile.ps1. If you already had a PowerShell_profile.ps1 file set up with some custom settings, you can just add the code from profile.example.ps1 to your existing profile (assuming you didn’t change your prompt before).

After that, restart your Powershell prompt and enjoy!

Important:

The latest Posh-Git is only working with git 1.7.1 and higher, so if you are still running the 1.7.0.2 release you have to use the v0.2 Posh-Git release.

Filed under git, tools

Barcamp Vienna 2010

I just booted my PC after 4 hours of drive through heavy rain and thunderstorms back from Vienna where I attended Barcamp. I have to say it was just fantastic! All Barcamps I attended before had a very diverse crowd, but usually lacking developers thus the social media enthusiasts usually dominated the attendees.

Barcamp Vienna was different, maybe it was the awesome location at Microsoft Austria headquarters or just the fact that it was in Vienna.. But I met more coders there in 2 days than in the last 2 yeas in Klagenfurt.

Coolest thing, I even met a Subsonic developer: Saintedlama! That was really awesome and funny when we met during breakfast randomly chatting about our stuff and I noted that I’ll be presenting dotless when he said: “Wow that’s you? I wanted to contact you for some time now about dotless. I’m working on Subsonic btw.. " (Imagine my jaw dropping right there.. ). He showed me some really cool demos of the simple repository they introduced in SubSonic 3 and it’s uses with MVC.. and I have to tell you: wow.. Using a ORM was really never so easy..

Anyway, I really had a great time either chatting up really interesting people or doing my two presentations.
On Saturday I talked about dotless while on Sunday I talked about Git. Both talks went great in my opinion, but if anyone was there and has additional feedback on my presentations I’d be glad to hear them. I uploaded both slide decks to http://www.docs.com and you can find them here:

image

dotless – CSS done right

image

Git

At any rate: Thanks to Max and Rolf for organizing this awesome event and to Microsoft for so generously hosting it!

Filed under barcamp, dotless, git, personal

Git Source Control Provider for Visual Studio 2010

During my presentation at Barcamp Vienna today I got asked what GUI to use with Git. My plain and simple answer was: “Use the command line, its just better that way.. “. Well, after the talk Andreas approached me and raised a very good point: Mainstream adoption of tools (like Git) often depends on GUIs to help people during the transition phase.

Besides the obvious Guis like TortoiseGit or GitGui there is now one more tool aimed at .NET developers to make the transition easier: Git Source Control Provider for Visual Studio 2010. It looks like Microsoft got VS2010 extensibility right this time, and someone managed to implement Git integration right into the VS2010 solution explorer:

Git Source Control Provider

The Plugin is available through Visual Studio Gallery and seems to be Ms-Pl licensed and free, although I couldn’t find the code anywhere, it’s still worth checking out: Git Source Control Provider.

Filed under net, programmierung, git

Displaying git branch in your powershell prompt

When I started out with git I didn’t start using the commandline right away. I fiddled around with TortoiseGit and GitGui quite a bit before I found out that it’s just so much faster to do all those things from the commandline. As we all know, the windows commandline is not the most powerful thing on the planet, but I also loathe the unix commandline that comes installed with git (gitbash). Obviously, the only alternative is Powershell so I went with that and am very happy with it.

One thing though: I envied Linux users who could extend their bash prompt to display git specific information directly on the shell. Well, after a bit of digging and through some Stackoverflow articles, I managed to find this:

image

Powershell is quite extensible, and by placing a file called Microsoft.PowerSehll_profile.ps1 in your Documents\WindowsPowerShell folder you can define a function called prompt that allows you to modify your prompt text.

I’ve been using this now for quite some time, so I can’t recall where I found the script before I modified it, but here it is anyway (credit goes to whoever created it in the first place): Microsoft.PowerShell_profile.ps1.

Filed under programmierung, git

Disable AutCrlf in MsysGit!

Ok, today I got this really cool pull request from Jon Galloway for .less. He did some improvements to the T4 Template for .less so that it will run the template on every build if you want that, or that Visual Studio opens up .less.css files. Cool stuff, and thanks a lot Jon! Of course I quickly looked through the changesets and then went ahead and committed it, but I noticed one novice mistake many people run into with MsysGit on Windows.

image

I am pretty sure Jon only changed a few lines of code, yet the changeset logs every file in the document as changed, thus making it pretty unreadable. This happened to me a lot when I started out with GIT, and finding the solution to this wasn’t easy either. So here is how to fix your MsysGit.

This is caused by a little setting called core.AutoCrlf that is set in MsysGit by default to true. MsysGit will then go ahead and localize the line-ending depending on what machine you are running on. This happens on checkout, so even with abolutely zero changes to the repository you’ll see a lot of local changes where there haven’t been any.

Needless to say how stupid this setting is, it only generates a ton of useless insertions and deletions while sticking to the pipe dream that everyone on the team will use the same tools to generate consistent line feeds (I’ve even had issues with only Windows machines and this setting).

Now the solution to this is rather simple:

image

By running “git config –global core.autocrlf false” you disable this and the text files will not get changed upon checkout. This now means that Visual Studio will maybe ask you to fix inconsistent line endings when opening a file, but it is still better than having your source control screw you over upon checkout.

Filed under programmierung, git

My Photography business

Projects

dynamic css for .NET

Archives

more