Bio2RDF Git Workflow

Michel Dumontier edited this page Aug 9, 2013 · 5 revisions

#Git and the Bio2RDF Project

This tutorial explains how to use Git to contribute to the Bio2RDF project. An introduction to Git can be found here. The Bio2RDF project takes advantage of "forks" on GitHub to enable anyone to propose changes to the Open Source scripts and server code that are used to generate data and run our web application. In this model, each contributor maintains their own copy of the Bio2RDF repositories under their user account. In your copy you can make any changes you want, address issues in the main repository, and work on new rdfizers. Once you have completed and tested your changes, you can open a Pull Request, which starts a peer review so that your changes can be merged into the main repository.

If you have found an error in our data or web application, but do not know how to fix it, open a new Issue in our GitHub Issue Tracker.

A few tips on getting started can be found in our guide to creating RDF for scientific databases. The guide details the structure of the RDF that a script should generate and best practices for leveraging the php-lib RDF creation package.

So you Forked a Project

Congratulations! You forked the Bio2RDF project (if not, see the GitHub help on doing so). Now you can get to work helping the life sciences Linked Data community!

Once you have Git installed locally, the first thing to do before anything else is to add the main Bio2RDF repo to the list of remotes. This is usually called the 'upstream' repository. If you are using the command line Git, this is done like so:

git remote add upstream git://github.com/bio2rdf/bio2rdf-scripts.git

or, if you prefer to have a custom name for the upstream remote, replace "upstream" with something you will remember easily.

I would recommend sticking with 'upstream': once you start working on other projects, a consistent name makes it easier to recognize which remote is the main project.
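To confirm that the remote was registered, you can list the configured remotes. The following is a minimal sketch that uses a throwaway repository so it can run without a real fork; the `demo` directory and the `fork` folder name are assumptions for illustration only. In practice you would run `git remote -v` inside your own clone.

```shell
# Throwaway repository used only for illustration (hypothetical location).
demo=$(mktemp -d)
git init -q "$demo/fork"

# Same command as above; registering a remote needs no network access.
git -C "$demo/fork" remote add upstream git://github.com/bio2rdf/bio2rdf-scripts.git

# List every configured remote with its fetch and push URL.
git -C "$demo/fork" remote -v

# Read back only the URL stored for 'upstream'.
upstream_url=$(git -C "$demo/fork" config --get remote.upstream.url)
echo "$upstream_url"
```

If 'upstream' does not appear in the list, the `git remote add` step did not run in the repository you are inspecting.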

##The workflow

Using Git can be confusing at first, until you develop a workflow that you follow every time you start working. These instructions detail which commands to use with command line Git. If you are using a graphical client, you will need to refer to the documentation for the client to recognise which commands to use.

###Updating

The first command you should learn is to update your repo when you start working. This is one way to prevent running into conflicts when you go to merge new work or start a new feature.

git fetch upstream
git checkout master
git merge --ff-only upstream/master

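These commands check out the master branch of YOUR repository and fast-forward it to include any new updates from the upstream repo (Bio2RDF main repo). Because of --ff-only, Git refuses the merge instead of creating a merge commit if your master contains commits that upstream lacks. From here you can start your new feature development by creating a new branch in Git that will contain the changes: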

git checkout -b myfeature_local
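The update-then-branch sequence above can be tried offline before touching a real fork. The sketch below is an illustration under stated assumptions: a local bare repository stands in for the Bio2RDF main repo, the directory names `maintainer` and `fork` are hypothetical, and the `Demo` identity exists only so the sample commits can be created.

```shell
demo=$(mktemp -d)

# Stand-in for the main repository (assumption: a local bare repo).
git init -q --bare "$demo/upstream.git"
git -C "$demo/upstream.git" symbolic-ref HEAD refs/heads/master

# A hypothetical maintainer publishes the first commit.
git init -q "$demo/maintainer"
git -C "$demo/maintainer" symbolic-ref HEAD refs/heads/master
git -C "$demo/maintainer" -c user.name=Demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "Initial commit"
git -C "$demo/maintainer" push -q "$demo/upstream.git" master

# Your working copy; in a real setup 'origin' would be your GitHub fork.
git clone -q "$demo/upstream.git" "$demo/fork"
git -C "$demo/fork" remote add upstream "$demo/upstream.git"
before=$(git -C "$demo/fork" rev-parse master)

# Upstream moves ahead while you are away.
git -C "$demo/maintainer" -c user.name=Demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "Upstream change"
git -C "$demo/maintainer" push -q "$demo/upstream.git" master

# The update commands from this section.
git -C "$demo/fork" fetch -q upstream
git -C "$demo/fork" checkout -q master
git -C "$demo/fork" merge -q --ff-only upstream/master

# Start the feature branch from the freshly updated master.
git -C "$demo/fork" checkout -q -b myfeature_local

local_head=$(git -C "$demo/fork" rev-parse master)
upstream_head=$(git -C "$demo/fork" rev-parse upstream/master)
current_branch=$(git -C "$demo/fork" rev-parse --abbrev-ref HEAD)
echo "master=$local_head upstream/master=$upstream_head branch=$current_branch"
```

After the merge, master and upstream/master should point at the same commit, which is a quick way to check that your local master carries no stray commits.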

Now you can start work on fixing the bug or implementing the new feature. You should commit often, as Git does not need to communicate with the server when you perform commits:

    git add newfeature/myrdfiser.php
    git commit -a -m "Implement feature X, fixes issue #1234"
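Before committing, it can help to check what is actually staged. The sketch below runs in a throwaway repository; the placeholder file contents and the `Demo` identity are assumptions for illustration, while the path and commit message are the ones used above.

```shell
demo=$(mktemp -d)
git init -q "$demo/repo"

# Create a placeholder script at the path used in the example above.
mkdir -p "$demo/repo/newfeature"
echo '<?php // placeholder rdfizer' > "$demo/repo/newfeature/myrdfiser.php"
git -C "$demo/repo" add newfeature/myrdfiser.php

# Short status: an "A" in the first column marks a newly staged file.
staged=$(git -C "$demo/repo" status --short)
echo "$staged"

# Commit, then read back the subject line of the latest commit.
git -C "$demo/repo" -c user.name=Demo -c user.email=demo@example.org \
    commit -q -m "Implement feature X, fixes issue #1234"
subject=$(git -C "$demo/repo" log -1 --format=%s)
echo "$subject"
```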

You can push this feature branch, even while it is not completed, to your personal fork on GitHub using:

git push origin myfeature_local

###Done and time to Merge with Bio2RDF Main

So you have addressed an issue on Bio2RDF Main, made the necessary changes, and now want to push back to the main repository from your forked version. The first thing to check is that you have pushed the most up-to-date version of your changes to your forked repository:

git push origin myfeature_local

Now you can initiate a Pull Request for that branch. When you browse to your fork on GitHub, you can click on the Pull Request button at the top. On the left will be the upstream repo and on the right is your repo. On the right, select the feature branch you wish to merge into the upstream repo. On the left, select the "master" branch of the upstream repo. Check that the changes are as you expect, and then enter a description of the changes in the text box.

You can notify a specific person to do the code review while you are describing the changes: type the "@" symbol in the text box, start entering the person's GitHub username, and select it in the popup.

At this point you can still make changes to the branch to fix any other bugs you find, but it may confuse the person doing the code review if you do this while they are doing the review! If the reviewer suggests changes then you can make the changes, commit them, and push them to your "origin", and the reviewer will then be able to comment on it again. Once the code is merged you can delete your branch via:

# Commit any outstanding changes first, then switch back to master
git checkout master
# Delete the branch locally
git branch -d name_of_branch
# Delete the branch in your GitHub remote 
git push origin :name_of_branch
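The cleanup can be rehearsed offline. The sketch below makes several assumptions: a local bare repository stands in for your GitHub fork, a fast-forward merge simulates the accepted pull request, and `g` is a hypothetical helper that only shortens the commands. It uses `git push origin --delete`, which is an equivalent, more explicit spelling of the `:name_of_branch` form.

```shell
demo=$(mktemp -d)

# Stand-in for your GitHub fork (assumption: a local bare repo).
git init -q --bare "$demo/origin.git"
git init -q "$demo/work"

# Hypothetical helper: run git in the work tree with a demo identity.
g() { git -C "$demo/work" -c user.name=Demo -c user.email=demo@example.org "$@"; }

g symbolic-ref HEAD refs/heads/master
g commit -q --allow-empty -m "Initial commit"
g remote add origin "$demo/origin.git"
g push -q origin master

# Feature work, pushed to the stand-in fork.
g checkout -q -b name_of_branch
g commit -q --allow-empty -m "Feature work"
g push -q origin name_of_branch

# Simulate the accepted pull request by fast-forwarding master.
g checkout -q master
g merge -q --ff-only name_of_branch
g push -q origin master

# Branches fully merged into master; the feature branch is listed here.
g branch --merged master

# -d refuses to delete a branch that has not been merged.
g branch -d name_of_branch
g push -q origin --delete name_of_branch

local_left=$(g branch --list name_of_branch)
remote_left=$(git -C "$demo/origin.git" branch --list name_of_branch)
```

Checking `git branch --merged master` first is a cheap safeguard: if your branch is missing from that list, its commits are not yet on master.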

###Resolve issues

So you set out to solve a specific issue raised on the tracker. Having fixed it, you can let your pull request close the issue for you. When you open a pull request against the Bio2RDF upstream, simply add "fixes #number_of_issue_here" to the description and the issue will be closed automatically when the merge is accepted.

Useful Tips

####1 - Seeing the Branch name

One tip that makes my life easier is to put a little code in my .bash_profile that tells me which branch I have checked out. This snippet only works on Unix or Mac. If someone has a Windows solution, please post it below.

function parse_git_branch_and_add_brackets {
	git branch --no-color 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/\ \[\1\]/'
}

PS1="\h:\W \u\[\033[0;32m\]\$(parse_git_branch_and_add_brackets) \[\033[0m\]\$ "

####2 - Changing a Remote Name

If for some reason you want to change the URL that a remote points to (to change the name itself, use `git remote rename <old> <new>` instead):

git remote set-url origin git://new.url.here
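Changing a remote's URL and changing its name are separate operations. The sketch below shows both in a throwaway repository; `git://old.url.here` and the new name `myfork` are placeholders chosen for illustration.

```shell
demo=$(mktemp -d)
git init -q "$demo/repo"

# Start with a remote that points at a placeholder URL.
git -C "$demo/repo" remote add origin git://old.url.here

# Change where 'origin' points; the name stays the same.
git -C "$demo/repo" remote set-url origin git://new.url.here

# Change the name; the URL stays the same.
git -C "$demo/repo" remote rename origin myfork

remote_names=$(git -C "$demo/repo" remote)
remote_url=$(git -C "$demo/repo" config --get remote.myfork.url)
echo "$remote_names -> $remote_url"
```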