Git guide

This guide is a basic tutorial that introduces the core concepts of Git. For more advanced topics, see the official book: https://git-scm.com/book/en/v2/

Git is a Version Control System (VCS): it records changes to a file or set of files over time so that you can recall specific versions later. It allows you to revert selected files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more. Using a VCS also generally means that if you break something or lose files, you can easily recover, and you get all this for very little overhead. What types of files can be tracked? In practice nearly any type of file on a computer, but version control systems were designed primarily to version source code.

Warning

Garnatxa provides a remote repository to which you can push Git projects. It should only be used to store source code, so avoid uploading non-text files such as binaries, databases or compressed files. Files of these types may be rejected when you try to store them in the repository.

Git is a Distributed Version Control System (DVCS). In a DVCS (such as Git, Mercurial, Bazaar or Darcs), clients don’t just check out the latest snapshot of the files; rather, they fully mirror the repository, including its full history. Thus, if a server dies and these systems were collaborating through it, any of the client repositories can be copied back to the server to restore it. Every clone is really a full backup of all the data.

Main features

  • Backup & Restore: Files are saved as they are edited, and any saved version can be restored later.

  • Synchronization: Lets you share source code and update your local repository to the latest version.

  • Undo changes: Any change can be undone to go back to an earlier version.

  • Track changes: You can follow how files evolve and see the differences between versions. Each change can carry a message that explains it.

  • Track ownership: Git is a multi-user platform, so everybody can see who modified a file and when.

  • Sandboxing: Lets you develop experimental code without interfering with stable code.

  • Branching & Merging: Experimental code (or any other line of development) can be merged into the main line of development and vice versa.

  • Distributed and offline availability: Every user keeps a copy of the full history (the local repository). After committing a change they can push it to a remote repository, but if the remote location fails the data remains available in the local repositories.

../_images/gitlab8.png

Figure 1. Example of workflow in Git.

During the life cycle of a project a set of files can be modified many times. It is also possible to work on several lines of development of the same files in parallel (branches) and then merge the results into a stable release. Git lets you track changes in files and recover the state of the project at any committed point in time, so you can return to a specific version of a project. In addition, Git supports multi-user collaborative development: several users can modify the same source code without interfering with each other. The changes are reviewed, agreed upon and merged before a stable version is produced.

../_images/gitlab9.png

Figure 2. Files A, B and C are modified several times over time. Git lets you commit snapshots of the project and tag them in order to preserve the changes. You can return to each version of the project and compare the differences.

It is very important to understand the stages and work areas that a file goes through in Git. We always work with files stored on our personal computer and, from there, we can identify the following areas:

  • Working directory: The entire project workspace. We can modify files as many times as we want; those changes are not yet committed.

  • Staging area: The set of files modified since the last commit that we have selected to be included in the next commit.

  • Git repository: Committed files. This stage includes the files that were recorded as a new snapshot of the project. You can return to old commits.

  • Remote server: Used to push commits to an external server. This server acts as a backup of the whole project, and the Git project can be downloaded from it to other locations.

Note

Garnatxa provides a remote Git repository to upload your code project. See the section Garnatxa’s Gitlab service to add a new Git repository on the remote server.

../_images/gitlab10.png

Figure 3. Stages in git.
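As a rough illustration of these areas, the sketch below follows a single file from the working directory to the staging area and then to the local repository. It is a minimal example under assumptions: the scratch directory, the file name job.sh and the user identity are hypothetical, and git status --porcelain is used only because its short output is easy to compare.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q
git config user.name "Example User"          # identity local to this repository
git config user.email "user@example.com"

echo 'echo hello' > job.sh

# Working directory only: Git sees the file but does not track it yet.
state_untracked=$(git status --porcelain)    # "?? job.sh"

# Staging area: the file is selected for the next commit.
git add job.sh
state_staged=$(git status --porcelain)       # "A  job.sh"

# Local repository: the snapshot is recorded as a commit.
git commit -q -m "Add job.sh"
state_committed=$(git status --porcelain)    # empty, nothing left to commit

printf '%s\n' "$state_untracked" "$state_staged" "[$state_committed]"
```

The remote server is left out of this sketch because reaching it requires a configured remote and git push.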

Using Git in Garnatxa

The general procedure to develop in Garnatxa is:

1- Develop your software on your personal computer or workstation. Test your code and commit a stable version in Git. Upload this release to the remote Git server (provided by Garnatxa).

2- Once you have uploaded a stable release to the Garnatxa repository (Gitlab), connect to Garnatxa and clone the project or download the latest changes from the remote Git repository. Test the code in an interactive session or with jobs in the queue system.

3- Finally, submit your jobs to Garnatxa.

The following sections provide a summary of the basic actions to handle Git on your local computer and upload changes to a remote server. You only need a personal computer with Git installed.

Preliminary actions

For this tutorial we will use a directory that already contains some source files. We assume here that the project already exists, but it is also possible to create a project and start developing in Git from scratch.

USERNAME@localhost:~$ scp -r USERNAME@garnatxa.uv.es:/doc/test .
USERNAME@localhost:~$ cd test
USERNAME@localhost:~/test$ ls

Create an empty project in Gitlab:

We need to create a new project in Gitlab (the remote Git repository provided by Garnatxa). For more information about how to initialize a Gitlab project, see the section Garnatxa’s Gitlab service. Follow the steps in that guide to create a new project named test (do not create a README.md file).

Initialize the local project

You only have to do the following the first time you configure a new project in Git:

1. Configure some global variables in Git and initialize the local project. The git init command creates a .git directory with some internal configuration settings inside it.

USERNAME@localhost:~/test$ git config --global init.defaultBranch main
USERNAME@localhost:~/test$ git config --global user.name "USERNAME"
USERNAME@localhost:~/test$ git config --global user.email user.surname@example.com
USERNAME@localhost:~/test$ git init
Initialized empty Git repository in /home/USERNAME/test/.git/

2. Add the remote address of your test project in Gitlab to the local Git configuration. To get the address, open Gitlab, go to Projects->test->Clone->Clone with SSH and copy (ctrl+C) the text: git@garnatxagitlab.uv.es:USER_FIRSTNAME.USER_LASTNAME/test.git

USERNAME@localhost:~/test$ git remote add origin git@garnatxagitlab.uv.es:user.surname/test.git

3. A branch in Git is a line of development that can coexist in parallel with other versions of the same code. For example, we can have a main branch where we only keep stable content and release stable versions (the branch from which our users should download the stable versions) while, at the same time, we work in another branch to solve a problem in the code. This way we can work on the bug without interfering with the stable version. Once the bug is fixed, the two branches can be merged into one, usually the main one. Gitlab and GitHub use main as the name of the default branch. We make sure that our branch is called main with: git branch -m main

USERNAME@localhost:~/test$ git branch -m main
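The idea of fixing a bug in a parallel branch and then joining it back into main can be sketched as follows. This is only an illustration under assumptions: the branch name bugfix, the file app.txt and the user identity are hypothetical, and git checkout -b and git merge are used here although the rest of this guide does not cover them.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# A stable version on the main branch.
echo 'v1' > app.txt
git add app.txt
git commit -q -m "Stable version"
git branch -M main                   # make sure the branch is called main

# Work on the bug in a parallel branch.
git checkout -q -b bugfix
echo 'v1 fixed' > app.txt
git commit -q -a -m "Fix bug"

# The main branch is untouched by the work done in bugfix.
git checkout -q main
before_merge=$(cat app.txt)

# Join the two lines of development.
git merge -q bugfix
after_merge=$(cat app.txt)

echo "main before merge: $before_merge"
echo "main after merge:  $after_merge"
```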

Working with Git

git status

The first command we will use tells us the state of our project files with respect to Git. Use git status whenever you want to know which files have been modified, staged or committed with respect to the latest version stored in the local repository.

USERNAME@localhost:~/test$ git status
On branch main

No commits yet

Untracked files:
(use "git add <file>..." to include in what will be committed)
    ArrayJob.sh
    ArrayJob_List.sh
    FileJob.sh
    FileJob_List.sh
    MPIJob.sh
    MultiThreadJob.sh
    OpenMPJob.sh
    SequentialJob.sh
    SingularityJob.sh
    data/
    executables/
    list_of_cmd.txt
    ref/

nothing added to commit but untracked files present (use "git add" to track)

The output tells us that there are no commits yet and that nothing has been staged. The list shows all the files and directories of test as untracked: Git sees them in the working directory but is not recording their changes yet. Before adding them we must understand the following rule in Git.

.gitignore

The Git repository in Garnatxa should only be used to manage source code, so we must avoid uploading files outside of that scope: binaries, databases, data files (fasta, fastq, zip, etc.) or directories of input/output files. To keep these files and directories out of the repository, create a special file named .gitignore and list in it the files and directories to omit. See https://git-scm.com/docs/gitignore to learn more about ignoring files. In this test example the directories data, out and executables should be ignored. Use vim or another text editor to create the file .gitignore and add the lines:

USERNAME@localhost:~/test$ vim .gitignore
data
executables
out
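The effect of these three lines can be checked with a small experiment. The sketch below is an assumption-laden example: it builds a hypothetical scratch project with made-up file names, writes the same .gitignore, and uses git status --porcelain and git check-ignore to confirm what Git hides.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q

# One script plus the three directories that should stay out of Git.
mkdir data executables out
touch data/reads.fastq executables/tool out/result.txt
echo 'echo run' > Job.sh

printf '%s\n' data executables out > .gitignore

# Only .gitignore and Job.sh are reported as untracked.
visible=$(git status --porcelain)

# git check-ignore echoes a path when an ignore rule matches it.
ignored=$(git check-ignore data/reads.fastq)

echo "$visible"
echo "ignored: $ignored"
```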

git add

Now the files exist only locally, but we want to upload them to the Gitlab server so that they are accessible later in Garnatxa. To upload the files to the server (push in Git terminology) we must first complete two steps.

  1. Add the files to the staging area. This tells Git that these files are ready to be committed as a new version of our software. The git add command lets you select which directories or files to move to the staging area. To stage every file in the working directory that has been modified or added since the last commit, run: git add .

USERNAME@localhost:~/test$ git add .
USERNAME@localhost:~/test$ git status
On branch main

No commits yet

Changes to be committed:
(use "git rm --cached <file>..." to unstage)
    new file:   .gitignore
    new file:   ArrayJob.sh
    new file:   ArrayJob_List.sh
    new file:   FileJob.sh
    new file:   FileJob_List.sh
    new file:   MPIJob.sh
    new file:   MultiThreadJob.sh
    new file:   OpenMPJob.sh
    new file:   README.md
    new file:   SequentialJob.sh
    new file:   SingularityJob.sh
    new file:   list_of_cmd.txt
    new file:   ref/chr8.fa
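git add does not have to take everything at once. As a minimal sketch (the file names a.sh and b.sh and the scratch directory are hypothetical), the example below stages a single file and then lists what is staged and what is still untracked.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q

echo 'echo a' > a.sh
echo 'echo b' > b.sh

# Stage only one of the two new files.
git add a.sh

# Files in the staging area, ready for the next commit.
staged=$(git diff --cached --name-only)

# Files that Git sees but does not track yet.
untracked=$(git ls-files --others --exclude-standard)

echo "staged:    $staged"
echo "untracked: $untracked"
```

Running git add . afterwards would stage b.sh as well.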

git commit

  2. If we are sure that these are all the changes that will go into the next version of the software, we must commit the files in the staging area before we can upload them to the Gitlab server. A commit is like taking a snapshot of the state of your development files. We can commit when we want to publish a new release of the software, or when we have fixed a bug and want to upload the fix to a remote server so that other users can download the corrected version.

Use git commit -m 'description', where the description explains what the commit includes.

USERNAME@localhost:~/test$ git commit -m "My first commit"
[main (root-commit) d0068b2] My first commit
13 files changed, 2927504 insertions(+)
create mode 100644 .gitignore
create mode 100644 ArrayJob.sh
create mode 100644 ArrayJob_List.sh
create mode 100644 FileJob.sh
create mode 100644 FileJob_List.sh
create mode 100644 MPIJob.sh
create mode 100644 MultiThreadJob.sh
create mode 100644 OpenMPJob.sh
create mode 100644 README.md
create mode 100644 SequentialJob.sh
create mode 100644 SingularityJob.sh
create mode 100644 list_of_cmd.txt
create mode 100644 ref/chr8.fa
USERNAME@localhost:~/test$ git status
On branch main
nothing to commit, working tree clean
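A commit records only what is in the staging area at that moment. The sketch below tries to make this visible with two hypothetical files: both are modified, only one is staged, and the commit leaves the other one pending. The file names, the identity and the use of git diff-tree to list the files of a commit are choices made for this example.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > a.txt
echo one > b.txt
git add .
git commit -q -m "Initial version"

# Modify both files but stage only a.txt.
echo two > a.txt
echo two > b.txt
git add a.txt
git commit -q -m "Update a.txt only"

# Files recorded in the last commit.
in_commit=$(git diff-tree --no-commit-id --name-only -r HEAD)

# Changes still waiting in the working directory.
pending=$(git status --porcelain)

echo "in last commit: $in_commit"
echo "still pending: $pending"
```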

git push

  3. The last step is to upload the new commit to the Gitlab server in Garnatxa. Use git push -u origin main. The -u option records origin/main as the upstream of the local main branch.

USERNAME@localhost:~/test$ git push -u origin main
Enumerating objects: 16, done.
Counting objects: 100% (16/16), done.
Delta compression using up to 16 threads
Compressing objects: 100% (14/14), done.
Writing objects: 100% (16/16), 44.73 MiB | 3.25 MiB/s, done.
Total 16 (delta 3), reused 0 (delta 0), pack-reused 0
To garnatxagitlab.uv.es:user.surname/test.git
* [new branch]      main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.
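The same push can be rehearsed without network access. In the sketch below a bare repository in a temporary directory stands in for the Gitlab project; this stand-in, the paths and the identity are assumptions made for the example, not part of the Garnatxa setup.

```shell
set -eu
work=$(mktemp -d)                           # hypothetical scratch area

# A bare repository plays the role of the remote server.
git init -q --bare "$work/remote.git"

git init -q "$work/local"
cd "$work/local"
git config user.name "Example User"
git config user.email "user@example.com"

echo 'Test project' > README.md
git add README.md
git commit -q -m "My first commit"
git branch -M main

# Register the remote and check the address that was stored.
git remote add origin "$work/remote.git"
remote_url=$(git remote get-url origin)

# Upload the commit and set origin/main as upstream.
git push -q -u origin main

local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$work/remote.git" rev-parse refs/heads/main)

echo "remote: $remote_url"
echo "local  main: $local_head"
echo "remote main: $remote_head"
```

After the push both repositories point to the same commit, which is what the real Gitlab server holds after the step above.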

git log

Use git log to review the history of commits. Each commit has an associated hash that can be referred to when we want to go back to a previous version of our code.

USERNAME@localhost:~/test$ git log
commit d0068b214af7c7efe580da766e1001bcdfd6a108 (HEAD -> main, origin/main)
Author: User  <user.surname@example.com>
Date:   Wed Jul 5 14:51:56 2023 +0200

My first commit

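The same history with one line per commit: git log --pretty=oneline. The sample output was captured after the second commit (the README file added next), which is why it lists two commits.

USERNAME@localhost:~/test$ git log --pretty=oneline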
98132db3ef221ea9043cc04702f81157e5b85bc2 (HEAD -> main, origin/main) Add a README file
d0068b214af7c7efe580da766e1001bcdfd6a108 My first commit
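The history can also be read from a script. As a small sketch on a hypothetical two-commit project (file names and identity are made up), the example below extracts the commit subjects with a custom format and counts the commits with git rev-list.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 'echo run' > Job.sh
git add Job.sh
git commit -q -m "My first commit"

echo 'Test project' > README.md
git add README.md
git commit -q -m "Add a README file"

# Commit subjects, newest first.
subjects=$(git log --pretty=format:%s)

# Number of commits reachable from HEAD.
count=$(git rev-list --count HEAD)

echo "$subjects"
echo "commits: $count"
```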

Now we can create a new file called README.md. Most Git projects have this file: it contains a brief description of the project and lets us explain what the software does as well as the steps for its installation, use, etc. Use a text editor to create the file and add these lines:

USERNAME@localhost:~/test$ vim README.md
This is a testing project. We include some sbatch templates in SLURM.
The sbatch scripts allow to submit jobs to the queue system in Garnatxa.
Multiple types of jobs are implemented: Sequential, multithreads, MPI, arrays, background and singularity.

git status shows that the new file README.md is untracked (that is, it has not been added to the staging area or committed yet). You could continue editing files and run git add at the end.

USERNAME@localhost:~/test$ git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)

Untracked files:
(use "git add <file>..." to include in what will be committed)
    README.md

nothing added to commit but untracked files present (use "git add" to track)

Run git add, git commit and git push to send the new commit to Gitlab.

USERNAME@localhost:~/test$ git add .
USERNAME@localhost:~/test$ git commit -m "Add a README file"
[main 98132db] Add a README file
1 file changed, 3 insertions(+)
create mode 100644 README.md
USERNAME@localhost:~/test$ git push -u origin main
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 16 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 521 bytes | 521.00 KiB/s, done.
Total 5 (delta 3), reused 0 (delta 0), pack-reused 0
To garnatxagitlab.uv.es:user.surname/test.git
d0068b2..98132db  main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.
USERNAME@localhost:~/test$ git log
commit 98132db3ef221ea9043cc04702f81157e5b85bc2 (HEAD -> main, origin/main)
Author: User  <user.surname@example.com>
Date:   Thu Jul 6 10:06:59 2023 +0200

Add a README file

commit d0068b214af7c7efe580da766e1001bcdfd6a108
Author: User  <user.surname@example.com>
Date:   Wed Jul 5 14:51:56 2023 +0200

My first commit

git show

git show displays the content of the last commit. It is also possible to obtain the differences between the last commit and older ones.

USERNAME@localhost:~/test$ git show
commit 98132db3ef221ea9043cc04702f81157e5b85bc2 (HEAD -> main, origin/main)
Author: User  <user.surname@example.com>
Date:   Thu Jul 6 10:06:59 2023 +0200

Add a README file

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..47f41b3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,3 @@
+This is a testing project. We include some sbatch templates in SLURM.
+The sbatch scripts allow to submit jobs to the queue system in Garnatxa.
+Multiple types of jobs are implemented: Sequential, multithreads, MPI, arrays, background and singularity.
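To compare the last commit with an older one, one possible approach is git diff with two commit references, and git show with a commit:path argument to read an old version of a file. The sketch below assumes a hypothetical file notes.txt with two committed versions; HEAD~1 means the commit before the last one.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 'line 1' > notes.txt
git add notes.txt
git commit -q -m "First version"

echo 'line 2' >> notes.txt
git commit -q -a -m "Second version"

# Which files differ between the previous commit and the last one?
changed=$(git diff --name-only HEAD~1 HEAD)

# Lines added by the last commit (the "+++" header line is skipped).
added=$(git diff HEAD~1 HEAD | grep '^+line')

# Content of the file as it was in the previous commit.
old_content=$(git show HEAD~1:notes.txt)

echo "changed: $changed"
echo "added:   $added"
echo "old:     $old_content"
```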

git tag

Some commits can be tagged to make it easier to refer to them in subsequent actions. The git tag command allows you to define tags for an already made commit. The tags are often used to reference a milestone achieved in software development, such as the release of a new version, the fixing of a bug, or the creation of a parallel development branch.

USERNAME@localhost:~/test$ git tag -a v1.0 -m "First version of test" d0068b
USERNAME@localhost:~/test$ git log --pretty=oneline
98132db3ef221ea9043cc04702f81157e5b85bc2 (HEAD -> main, origin/main) Add a README file
d0068b214af7c7efe580da766e1001bcdfd6a108 (tag: v1.0) My first commit

When you create tags, remember to push them to the remote Gitlab repository with git push origin --tags. Until you do, tags exist only in the local repository.

USERNAME@localhost:~/test$ git push origin --tags
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 167 bytes | 167.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
To garnatxagitlab.uv.es:user.surname/test.git
* [new tag]         v1.0 -> v1.0
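A tag is just a readable name for a commit. The following sketch, built on a hypothetical two-commit project, tags the first commit and then checks which commit the tag resolves to. The way the first commit is located (git rev-list --max-parents=0) is a convenience for the example; in practice you would copy the hash from git log.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 'echo run' > Job.sh
git add Job.sh
git commit -q -m "My first commit"

echo 'Test project' > README.md
git add README.md
git commit -q -m "Add a README file"

# Hash of the initial commit (the only commit without a parent).
first_commit=$(git rev-list --max-parents=0 HEAD)

# Annotated tag on that older commit.
git tag -a v1.0 -m "First version of test" "$first_commit"

# List the tags and resolve v1.0 back to its commit.
tags=$(git tag -l)
tagged_commit=$(git rev-parse 'v1.0^{commit}')

echo "tags: $tags"
echo "v1.0 -> $tagged_commit"
```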

git checkout

At any time we can return to a previous version of our code (this is only possible for committed snapshots). For example, to go back to the initial version before adding the README.md file, we use the git checkout <commit> command.

Check the content of the test directory: the file README.md is there because we committed it.

USERNAME@localhost:~/test$ ls
ArrayJob_List.sh  ArrayJob.sh  data  executables  FileJob_List.sh  FileJob.sh  list_of_cmd.txt  MPIJob.sh  MultiThreadJob.sh  OpenMPJob.sh  out  README.md  ref  SequentialJob.sh  SingularityJob.sh

Review the list of commits. Remember that we tagged the initial commit as v1.0.

USERNAME@localhost:~/test$ git log --pretty=oneline
98132db3ef221ea9043cc04702f81157e5b85bc2 (HEAD -> main, origin/main) Add a README file
d0068b214af7c7efe580da766e1001bcdfd6a108 (tag: v1.0) My first commit

We can use a tag or an abbreviated commit identifier (here its first 6 characters; any unambiguous prefix works) to switch to that commit.

USERNAME@localhost:~/test$ git checkout d0068b   # git checkout v1.0 has the same effect

Now README.md is gone from the test directory. Notice how the HEAD pointer now points to the initial commit.

USERNAME@localhost:~/test$ ls
ArrayJob_List.sh  ArrayJob.sh  data  executables  FileJob_List.sh  FileJob.sh  list_of_cmd.txt  MPIJob.sh  MultiThreadJob.sh  OpenMPJob.sh  out  ref  SequentialJob.sh  SingularityJob.sh

USERNAME@localhost:~/test$ git log --pretty=oneline
d0068b214af7c7efe580da766e1001bcdfd6a108 (HEAD, tag: v1.0) My first commit

The important thing here is that the file README.md has not been removed from Git. We have only returned to an older snapshot of our project in which the README.md file did not yet exist. This is useful, for example, if we want to return to the point in the code where an error was introduced and create a parallel branch to solve it.

We can return to the last commit in the main branch:

USERNAME@localhost:~/test$ git checkout main
Previous HEAD position was d0068b2 My first commit
Switched to branch 'main'
Your branch is up to date with 'origin/main'

USERNAME@localhost:~/test$ ls
ArrayJob_List.sh  ArrayJob.sh  data  executables  FileJob_List.sh  FileJob.sh  list_of_cmd.txt  MPIJob.sh  MultiThreadJob.sh  OpenMPJob.sh  out  README.md  ref  SequentialJob.sh  SingularityJob.sh

USERNAME@localhost:~/test$ git log --pretty=oneline
98132db3ef221ea9043cc04702f81157e5b85bc2 (HEAD -> main, origin/main) Add a README file
d0068b214af7c7efe580da766e1001bcdfd6a108 (tag: v1.0) My first commit
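The round trip above can be reproduced on a scratch project. The sketch below is a simplified stand-in for the test project (hypothetical files, a lightweight tag instead of an annotated one) and records, at each step, whether README.md exists and whether HEAD points to a branch.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 'echo run' > Job.sh
git add Job.sh
git commit -q -m "My first commit"
git branch -M main
git tag v1.0                         # lightweight tag on the first commit

echo 'Test project' > README.md
git add README.md
git commit -q -m "Add a README file"

# Go back to the tagged snapshot: README.md did not exist yet.
git checkout -q v1.0
readme_at_v1=$( [ -e README.md ] && echo present || echo absent )
# HEAD now points at a commit, not a branch ("detached HEAD").
head_at_v1=$(git symbolic-ref -q --short HEAD || echo detached)

# Return to the tip of main: README.md is back.
git checkout -q main
readme_at_main=$( [ -e README.md ] && echo present || echo absent )
head_at_main=$(git symbolic-ref --short HEAD)

echo "at v1.0: README $readme_at_v1, HEAD $head_at_v1"
echo "at main: README $readme_at_main, HEAD $head_at_main"
```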

git mv and git rm

To move or delete a tracked file or directory, use the Git commands rather than plain mv or rm. If you move or delete files directly, Git does not stage those changes, so they will not be included in the next commit until you stage them yourself.

Create a new directory and move the file SingularityJob.sh there with git mv.

USERNAME@localhost:~/test$ mkdir others
USERNAME@localhost:~/test$ git mv SingularityJob.sh others
USERNAME@localhost:~/test$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
(use "git restore --staged <file>..." to unstage)
    renamed:    SingularityJob.sh -> others/SingularityJob.sh

Now try to remove the file with git rm:

USERNAME@localhost:~/test$ git rm others/SingularityJob.sh
error: the following file has changes staged in the index:
others/SingularityJob.sh
(use --cached to keep the file, or -f to force removal)

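The command returns an error because the move has been staged but not yet committed. We can force the removal with -f; since the file was the only content of the others directory, that directory disappears from the working directory as well.

USERNAME@localhost:~/test$ git rm -f others/SingularityJob.sh

To restore a deleted file, use git restore. Because the deletion is staged, restore the file from the last commit in both the staging area and the working directory:

USERNAME@localhost:~/test$ git restore --staged --worktree SingularityJob.sh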

Repeat the process, but this time commit the others directory and its contents.

USERNAME@localhost:~/test$ mkdir others
USERNAME@localhost:~/test$ git mv SingularityJob.sh others
USERNAME@localhost:~/test$ git add .
USERNAME@localhost:~/test$ git commit -m "Move SingularityJob.sh to new directory others."
[main ef5dacc] Move SingularityJob.sh to new directory others.
1 file changed, 0 insertions(+), 0 deletions(-)
rename SingularityJob.sh => others/SingularityJob.sh (100%)

Finally, upload all the commits in your local repository to Gitlab in Garnatxa.

USERNAME@localhost:~/test$ git push -u origin main
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 16 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 344 bytes | 344.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
To garnatxagitlab.uv.es:user.surname/test.git
98132db..ef5dacc  main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.
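The move, remove and restore cycle can be tried safely on a scratch project. The sketch below is a minimal version under assumptions: the project and its single script are hypothetical stand-ins for the test directory, and git restore is given --staged --worktree so that a deletion already staged by git rm is undone in both places.

```shell
set -eu
repo=$(mktemp -d)                    # hypothetical scratch project
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 'echo "singularity job"' > SingularityJob.sh
git add SingularityJob.sh
git commit -q -m "Add job script"

# Move the file with git mv: the rename is staged automatically.
mkdir others
git mv SingularityJob.sh others/
staged_move=$(git status --porcelain)
git commit -q -m "Move SingularityJob.sh to new directory others."

# Remove the committed file: it leaves the index and the working directory.
git rm -q others/SingularityJob.sh
after_rm=$( [ -e others/SingularityJob.sh ] && echo present || echo absent )

# Undo the staged deletion from the last commit.
git restore --staged --worktree others/SingularityJob.sh
after_restore=$( [ -e others/SingularityJob.sh ] && echo present || echo absent )
pending=$(git status --porcelain)

echo "$staged_move"
echo "after rm: $after_rm, after restore: $after_restore"
```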

Summary of commands

==========================================  ==========================================
Git command                                 Description
==========================================  ==========================================
git init                                    Initialize a new Git project. It only needs to be run once, at the start of the project.
git status                                  Show the status and stage of the project files.
git log                                     Show the history of commits made in the project.
git log --pretty=oneline                    The same history, with one line per commit.
git log --graph                             The same history, drawn as a graph of the commit timeline.
git show                                    Show a commit together with the differences it introduced.
git add .                                   Add files or directories to the staging area. This is the step before a commit.
git commit -m '<description>'               Commit a snapshot of the project. The files in the staging area are the ones included in the commit. Commits are stored on the local machine; use git push to send them to a remote repository.
git push -u origin main                     Upload local commits to an external repository. Use this remote repository as a backup and as a file sharing area.
git tag -a <tag> -m <description> <commit>  Label a commit to make it easier to refer to in the future. Tags usually mark milestones in the development of your code, for example the release of a new version.
git checkout <branch> or <commit> or <tag>  Switch the working directory to another branch, commit or tag.
git rm                                      Remove a file from the local repository.
git mv                                      Move or rename a file.
git restore                                 Restore a file deleted with git rm.
==========================================  ==========================================