A case for a bioinformatics manifesto

Software development can be a complex process, and several software development approaches, philosophies and principles have emerged as a result. Many organisations subscribe to one or more of these approaches depending on the size of the development team and the nature of the problem or task. The Agile and Unix philosophies have been widely adopted by organisations and by individual developers alike.

The Unix philosophy advocates programs that do one thing very well, work together and communicate through a universal interface, in particular text streams. The idea stems from years of experience developing the Unix tools. The Unix philosophy is not a written manifesto that you have to subscribe to or sign; it is a respected tradition that grew organically from writing the early Unix operating system and its associated tools.

The Agile manifesto aims at delivering working software. It emphasizes continuous testing and deployment, and places special emphasis on working with the client, consulting regularly and avoiding formal, cast-in-stone methods. It is not surprising that Agile initially appeared as an anti-establishment, disruptive approach to formal software development methods. The Agile process was quickly adopted by early disruptive technologies like Ruby on Rails, and their architects came to be Agile's staunchest evangelists.

Given its maturation as a scientific discipline, its pride in the Unix and Linux tradition and its rich client base of biologists, you might expect most bioinformatics tools to embrace the Unix philosophy and apply the Agile process. You would be mistaken. Instead, the bioinformatics software field is often a thorny garden. The field is comparable to a half-sunken ship that magically remains afloat and somehow manages to deliver cargo. Often the passengers take the wheel for an adrenaline surge, with undesirable consequences.

An attempt to overcome the current challenges has led to the establishment of a bioinformatics software manifesto. According to this manifesto, bioinformatics tools are fragmented due to challenges in the integration, interaction and visualization of biological data, as well as in the scaling of calculations. It emphasizes developing what it calls "small tools for bioinformatics".

The manifesto challenges and criticizes the current trend where "institutes and companies create monolithic software solutions for end users". These tools are often exceedingly expensive and not interoperable with other tools within the same domain. It calls for the adoption of the Unix philosophy and the use of transparent open source licenses in bioinformatics.

The bioinformatics software packaging system is broken and fragmented as well. A web resource is often neglected as soon as the article describing it is accepted in a journal. Where a tool exists, releases can be few and far between, and collaboration is difficult or non-existent. The worst case is where tools are developed without any revision control mechanism. Small tools for bioinformatics calls for a standard approach to bioinformatics software packaging and distribution.

So will small tools for bioinformatics deliver? That remains to be seen. First it has to gain wide adoption and acceptance. As of this writing, 48 bioinformaticians have publicly signed the manifesto. It sounds like a rational way to start weeding and pruning the thorny garden of bioinformatics software.


Publication quality plots with cowplot

I often turn to the ggplot2 R package for creating statistical plots (read Hadley's excellent tutorial on how to build a plot with ggplot2). ggplot2 is an implementation of the grammar of graphics, a concise approach for describing the components of a graphic.

It often takes extra effort to arrange ggplot2 objects on a grid. Sometimes the code looks like an unpolished hack. This is partly because R does not offer very clean syntax, so code can be difficult to read and understand. Here is a snippet of code that I was using to print four ggplot2 objects on a 2 x 2 grid.

  library(grid)

  # start a new page with a layout of 2 rows and 2 columns
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(2, 2)))

  # helper that returns the viewport for a given cell of the layout
  vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)

  # print each plot in its respective cell of the grid
  print(plot1, vp = vplayout(1, 1))
  print(plot2, vp = vplayout(1, 2))
  print(plot3, vp = vplayout(2, 1))
  print(plot4, vp = vplayout(2, 2))

Using the recent cowplot R package, the seven lines of code are reduced to a single line! This may not seem like a lot, but for the sake of brevity, beauty and clarity of intent, I think it is a big deal.
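
A minimal sketch of that single line, assuming cowplot's plot_grid function is the call in question. The four plots built from the mtcars data set and the name grid_plot are placeholders for illustration only; any four ggplot2 objects would do.

```r
library(ggplot2)
library(cowplot)

# Placeholder plots from the built-in mtcars data set (illustrative only)
plot1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
plot2 <- ggplot(mtcars, aes(hp, mpg)) + geom_point()
plot3 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
plot4 <- ggplot(mtcars, aes(mpg)) + geom_histogram(bins = 10)

# The single line: arrange the four plots on a 2 x 2 grid and label each panel
grid_plot <- plot_grid(plot1, plot2, plot3, plot4, labels = c("A", "B", "C", "D"), ncol = 2)
```

Printing grid_plot draws the combined figure on the active graphics device.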


The above code is more legible than the previous code. It also places a label against each plot, which is a winning feature for publication quality graphics: many journals and reviewers expect well-labelled graphics. Cowplot offers several other features.


On science and mankind

It would be possible to describe everything scientifically, but it would make no sense; it would be without meaning, as if you described a Beethoven symphony as a variation of wave pressure. -- Albert Einstein

My good friend Siwo discussed some limitations of science in a blog post. He outlined limitations that emanate from human nature, limits in current knowledge, and the extent of the observable and measurable universe. Some authorities claim that the limits of science lie in the inability to:

  • Assess the aesthetic value of an object,
  • Measure morality, and
  • Account for the supernatural

Aesthetics, morality and supernatural concepts emerge to preserve or structure a community. More often they propagate a strongly held opinion that strengthens socio-political and geopolitical interests and agendas. The supernatural, as an abstract concept, explains natural phenomena and is riddled with unchallenged dogma. But dogma is not limited to the supernatural; it forms a central component of the scientific method. The difference is that scientific dogma is open to criticism and is reviewed in the light of new data, evidence or proof.

Humans make observations and measurements that depend on their senses and on their interpretation of the world, or universe if you like. What cannot be observed, felt or measured directly is often interpreted as "abstract" or "non-existent". The limit to understanding our universe therefore lies in the extent to which we can "visualize" the abstract. That is why mathematical tools are indispensable in understanding the laws that govern the universe. Abstract notions and equations yield novel theories, insights and perspectives.

The human mind is a fantastic logical machine, but it is limited by the number of calculations it can process per unit time. Now we have powerful numerical machines that are a cause of both excitement and concern, and that are going to stir the moral and cultural pot for a while.


Beginning afresh

"Sometimes it's important to work for that pot of gold. But other times it's essential to take time off and to make sure that your most important decision in the day simply consists of choosing which color to slide down on the rainbow." ― Douglas Pagels, These Are the Gifts I'd Like to Give to You: A Sourcebook of Joy and Encouragement

Over the last couple of years, I have learnt a lot, made mistakes and sometimes missed opportunities. It is refreshing to "burn" the past and start on a "clean slate", although a few old slates are difficult to discard entirely.

I took bold steps to enroll in a PhD program, fall in love, start a family and finish the PhD program even when funding was exhausted. Now I am very close to submitting the written thesis.

During this period, I have interacted with some of the smartest and most gifted people in the world and made lifelong friends with very talented persons. I have travelled a little and found amazing people, places and cultures. My greatest journey has been fatherhood. It is far from over and I hope to chronicle relevant bits and pieces here.

Along the way, I have missed a few opportunities: the startup company that I declined to join, the startup ideas that I never followed up or completed, and the friendships that I let decay.

It is easy to consider life as a choice between a blue and a red pill, but I think it is better to prefer the daily excitement, appreciate the uncertainty of life, grapple with chaos and usher in the illusion of order.
