Git tracks its history as a graph of commits behind the scenes. Each commit in the graph is uniquely identified by a SHA hash. When playing with Git you may have seen something like:
A list of commits, each with one parent
What you are looking at is a stack of commits, each identified by its SHA hash. Think of the hash as a unique identifier, like a primary key in a database.
Try the following to see some commits in your repository.
git log --oneline
The oldest is at the bottom and the newest at the top. What is not obvious is that each child commit has a pointer to its parent(s). In the above scenario each child has one parent, so the history looks just like a list. Note: In the diagram below we are using letters of the alphabet instead of SHA hashes.
B has a pointer to its parent A
A commit can have one or more parents (only the very first commit in a repository has none). When a commit has two or more parents it is essentially a merge, where we have taken content from A and B and made C.
C has a pointer to its parents, A and B
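To see these parent pointers for yourself, here is a minimal sketch, assuming Git is installed and that a throwaway temporary directory is acceptable. The branch name side, the file names and the demo identity are all made up for illustration. It builds two diverging commits A and B, merges them into C, and then asks Git for C's parents.

```shell
set -e
# Hypothetical demo identity, so commits work even without global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init --quiet

# A starting commit that both A and B will descend from
echo "base" > base.txt
git add base.txt
git commit --quiet -m "base"
start_branch=$(git symbolic-ref --short HEAD)

# Commit B on a side branch
git checkout --quiet -b side
echo "b" > b.txt
git add b.txt
git commit --quiet -m "B"
b=$(git rev-parse HEAD)

# Commit A back on the starting branch
git checkout --quiet "$start_branch"
echo "a" > a.txt
git add a.txt
git commit --quiet -m "A"
a=$(git rev-parse HEAD)

# Merging side creates commit C, which points at both A and B
git merge --quiet --no-edit side
parents=$(git log -1 --format=%P)

echo "A:           $a"
echo "B:           $b"
echo "C's parents: $parents"
```

The last line should list the same two hashes printed for A and B, which is all a merge commit really is: a commit with more than one parent pointer.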
When we download a Git repository we can think of it as being almost two separate items. First there are all the files in your project, and then there is a database (kept in the hidden .git folder) that tracks the SHA references and which files they each relate to. Not only does it contain these SHA references, it also contains tags (which are human-readable labels applied to a single SHA reference) and pointers that identify the SHA reference of each branch in the repository.
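As a rough illustration of tags and branch pointers, the sketch below creates one commit in a temporary repository, then attaches a tag and a branch to it. The names v1.0 and experiment, the file A.txt and the demo identity are assumptions made for the example.

```shell
set -e
# Hypothetical demo identity, so commits work even without global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init --quiet
echo "hello" > A.txt
git add A.txt
git commit --quiet -m "First commit"

git tag v1.0             # a human-readable label for the current commit
git branch experiment    # a branch is just another pointer to a commit

head_sha=$(git rev-parse HEAD)
tag_sha=$(git rev-parse v1.0)
branch_sha=$(git rev-parse experiment)

# List every reference Git is tracking, alongside the SHA it points to
git for-each-ref --format='%(objectname:short) %(refname)'
```

All of the listed references should show the same short SHA, because the tag and both branches point at the single commit that exists so far.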
At a high level we will be discussing two separate instances of Git. Firstly there is the instance on your own machine, which we can refer to as the local instance. Then there is the instance on some remote machine, which we can refer to as the remote instance. There aren’t really any differences between the local and the remote except for being on different machines. The remote instance could, for example, be the place where the team pushes their code to. Ultimately there are usually at least 2 Git copies: your local one and the one that you push to. Think of it like committing your changes to a remote Subversion repository.
Side note: quite often in documentation you will see the remote referred to as “origin” (the default name Git gives to the remote you cloned from).
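A small sketch of where the name comes from, assuming a second folder on the same machine stands in for the remote instance. The folder names remote and local and the demo identity are made up.

```shell
set -e
# Hypothetical demo identity, so commits work even without global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# A stand-in for the remote instance (here just another folder on this machine)
git init --quiet "$work/remote"
git -C "$work/remote" commit --quiet --allow-empty -m "First commit"

# Cloning creates the local instance and records where it came from
git clone --quiet "$work/remote" "$work/local"
cd "$work/local"

remote_name=$(git remote)    # names of the remotes this repository knows about
echo "$remote_name"
git remote -v                # the same, with the locations used for fetch and push
```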
Creating a Git Repository
Let’s start off by creating a brand new Git Repository.
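Assuming a folder named Demo (the name used in this walkthrough), the commands would be along these lines:

```shell
mkdir Demo    # create the project folder
cd Demo       # step into it
git init      # turn it into an empty Git repository
```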
This creates our Demo folder and then creates a new Git repository. Simple!
Logical sections within Git
Within an instance, Git is best visualized as 3 separate buckets: the Working Directory, the Staging Area and the Repository.
The working directory is what you see in your folders and file system. You make code changes here.
In the Working Directory, you might create a new file called A.txt, add some content and save it.
If we run the command
git status
we ask Git what the current status is. Has anything changed? Are there any files waiting to be added to Git?
You can see in this example, Git isn’t tracking A.txt. It also tells us how to add the file.
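For a compact view of the same information, here is a minimal sketch that uses the --short flag in a throwaway temporary repository. The file content is made up.

```shell
set -e
cd "$(mktemp -d)"
git init --quiet

# Create a new file in the working directory
echo "some content" > A.txt

# --short prints one line per file; "??" marks a file Git is not tracking
status=$(git status --short)
echo "$status"    # prints: ?? A.txt
```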
Next you would tell Git to track this new file.
git add A.txt
Alternatively you could add all files within the directory by specifying the dot syntax.
git add .
Running git status again shows A.txt listed under “Changes to be committed”.
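As a rough sketch of what to expect in the compact --short format, a newly staged file is marked with an A in the first column. The example again assumes a throwaway temporary repository.

```shell
set -e
cd "$(mktemp -d)"
git init --quiet
echo "some content" > A.txt

git add A.txt

# "A" in the first column means the file has been added to the Staging Area
status=$(git status --short)
echo "$status"    # prints: A  A.txt
```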
The whole point of the Staging Area is to create a candidate for committing to the repository. Take for example fixing a bug. It would be far cleaner and easier to read 1 commit, e.g. “#424 Bug fixed”, rather than 10 piecemeal commits all contributing to the same fix. In Git we can stage and manipulate our commits to make history more logical and easier to read.
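One way to picture the Staging Area as a candidate is to stage only part of your work. In this sketch (file names fix.txt and notes.txt are hypothetical) two files exist in the working directory but only one is put forward for the next commit.

```shell
set -e
cd "$(mktemp -d)"
git init --quiet

# Two new files in the working directory
echo "the bug fix" > fix.txt
echo "rough notes" > notes.txt

# Only the fix goes into the candidate for the next commit
git add fix.txt

staged=$(git diff --cached --name-only)    # what the next commit would contain
untracked=$(git ls-files --others)         # what is still only in the working directory
echo "staged:     $staged"
echo "not staged: $untracked"
```

Committing at this point would record fix.txt only; notes.txt stays in the working directory until you decide to add it.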
Once you are happy with your staged work, you would commit it to the actual repository.
git commit -m "#424 Bug fixed"
This moves our staged changes into the Repository, essentially freezing those changes in place.
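Putting the three buckets together, here is a minimal end-to-end sketch in a throwaway temporary repository. The demo identity and the file content are assumptions; the file name and commit message are the ones used in this walkthrough.

```shell
set -e
# Hypothetical demo identity, so commits work even without global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init --quiet

echo "some content" > A.txt               # 1. Working Directory
git add A.txt                             # 2. Staging Area
git commit --quiet -m "#424 Bug fixed"    # 3. Repository

status=$(git status --short)              # empty: nothing left to stage or commit
commit_count=$(git rev-list --count HEAD) # number of commits in the history
git log --oneline
```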
That’s it! That is a very simple example that touches upon all 3 buckets within Git.
To help visualize the bigger picture, the typical day-to-day workflow would be something like this:
1. Fetch from the remote (this brings down all the latest information from the centralized repository)
2. Pull to update your local copy with what is in the remote repository
3. Make some local changes
4. Add them to the Staging Area
5. Commit your Staging Area to the repository
6. Finally, when you are happy with your commits, Push them to the remote instance
7. And go to step 1 again…
Don’t worry about the verbs that we haven’t discussed yet; we’ll touch upon them in later topics.
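The day-to-day workflow can be sketched end to end as follows. This is only an illustration under stated assumptions: a bare repository in a temporary folder stands in for the team's remote instance, and the folder names seed, remote.git and local, the demo identity and the commit messages are all made up.

```shell
set -e
# Hypothetical demo identity, so commits work even without global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Set up a stand-in remote instance that already has one commit
git init --quiet "$work/seed"
git -C "$work/seed" commit --quiet --allow-empty -m "Initial commit"
git clone --quiet --bare "$work/seed" "$work/remote.git"

# Your local instance
git clone --quiet "$work/remote.git" "$work/local"
cd "$work/local"

git fetch --quiet                    # bring down information from the remote
git pull --quiet                     # update the local copy
echo "hello" > A.txt                 # make a local change
git add A.txt                        # add it to the Staging Area
git commit --quiet -m "Add A.txt"    # commit the Staging Area to the repository
git push --quiet                     # push the commit to the remote instance

# Ask the remote instance for its newest commit message
remote_subject=$(git --git-dir="$work/remote.git" log -1 --format=%s)
echo "$remote_subject"
```

If everything worked, the remote's newest commit message is the one you just committed locally.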
Git is a Distributed Version Control System (DVCS), which can be seen as a step up from a regular Version Control System (VCS). The “Distributed” in DVCS means that multiple copies of a repository are held rather than just a single repository.
This is good because:
Enhanced failover capabilities: If you lose your main repository through a catastrophic incident, you still have copies of the repository held on other machines.
It opens the door for collaboration: In a centralized version control system, everyone has to commit and merge on the single source of truth. With a DVCS, we can have sub-teams of people working on a long-running sub-project, reading from and writing to each other’s repositories on machines separate from the main repository. They can integrate their work over and over again without disrupting people working on the main repository.
Offline work: You can take your work home for the night and, rather than having to wait until the next morning to commit all of your work, commit there and then. The next day you just re-sync your local repository with your company’s main repository.
Encourages forking: GitHub has exploded in popularity and levelled up Open Source software. The reason is that you can take someone’s work, create a copy (a Fork in GitHub land) and use that as a starting place to improve the original software (or build your own software). A DVCS allows you to download the repository, create your own branch and submit it back to the original author, who can include your additions in their own work.
An example of a distributed system: one central remote supplying 3 repositories, with a side project going on.
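To make the "multiple copies" and "offline work" points concrete, here is a minimal sketch, assuming two folders on one machine play the roles of the company's main repository and your own copy. The folder names main and laptop, the demo identity and the commit messages are hypothetical.

```shell
set -e
# Hypothetical demo identity, so commits work even without global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the company's main repository, with two commits of history
git init --quiet "$work/main"
git -C "$work/main" commit --quiet --allow-empty -m "First commit"
git -C "$work/main" commit --quiet --allow-empty -m "Second commit"

# Your own copy: the clone carries the whole history, not just the latest files
git clone --quiet "$work/main" "$work/laptop"
main_count=$(git -C "$work/main" rev-list --count HEAD)
laptop_count=$(git -C "$work/laptop" rev-list --count HEAD)

# Committing needs no connection to the main repository
git -C "$work/laptop" commit --quiet --allow-empty -m "Work done at home"
offline_count=$(git -C "$work/laptop" rev-list --count HEAD)

echo "main: $main_count, laptop after clone: $laptop_count, laptop after offline commit: $offline_count"
```

The clone starts with the same number of commits as the main repository and then moves ahead by one, without the main repository being touched until you choose to push.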