Bioruby Resources
Posted: April 17, 2008 Filed under: Uncategorized
As a member of the Bioruby mailing list, my first post asked: where is the Bioruby documentation?
Why is Bioruby not well documented? This question pops up so often on the mailing list that I am going to list a few resources here:
The best place to get started is the new Bioruby website. For some reason the old website is still active.
An excellent tutorial written by Katayama Toshiaki, among others, is available here. Sample scripts for common sequence manipulation tasks can be accessed from here.
Some Bioruby presentations can be downloaded from here as well.
The following blogs are dedicated, in one way or another, to Bioruby and bioinformatics:
http://www.bioinformaticszen.com/
http://saaientist.blogspot.com/
http://bioinforuby.blogspot.com/
http://rubyonwindows.blogspot.com/
http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/
If you have a specific question, post it on the mailing list, and you may have the opportunity to have your question or problem solved by the experts!
Bioruby workshop
Posted: April 17, 2008 Filed under: tutorials
I would like to announce the first Bioruby workshop here in East Africa.
Please note the following important points:
1. The course will be held on Thursday 22nd May at the International Livestock Research Institute (ILRI), Nairobi, Kenya.
2. Send your application form to bioinfoafrica@gmail.com. Applications not sent to this email address will NOT be processed.
3. The deadline for applications is May 1st 2008; successful applicants will be notified by the 8th of May 2008.
4. Currently we are not able to offer travel fellowships for applicants from outside Nairobi or Kenya. However, this is just the beginning of RSG East Africa organizing training for its members, and we hope that in due course such initiatives will attract funding for travel.
Bioruby mini-series: The Bio::Sequence::Common class
Posted: February 12, 2008 Filed under: bioinformatics, bioruby, tutorials | Tags: bioruby, code, ruby
Sequence Transformation
Let's have a look at the Bio::Sequence::Common module, which provides most of the sequence transformation methods for biological sequences.
Bio::Sequence::Common
This module implements the methods that are common to both Bio::Sequence::AA and Bio::Sequence::NA.
A Bio::Sequence object is easily created like this:
require 'bio'
my_dna = Bio::Sequence.auto("actagatatttgat") #=> actagatatttgat
my_dna is now a Bio::Sequence object, and you can use the various methods available for this class, which we are going to explore shortly.
Bio::Sequence::Common non-modifying methods
- to_s
This method returns the sequence as a string. It does not modify the original sequence.
puts my_dna.to_s #=> actagatatttgat
puts my_dna.to_s.class #=> String
An alias for this method is the to_str method.
my_dna.to_str #=> actagatatttgat
- seq
This method returns a new Bio::Sequence::NA or Bio::Sequence::AA object. The original sequence remains unchanged. For example, to create a second object, my_dna2, from the my_dna object that we created above, you would write:
my_dna2 = my_dna.seq
puts my_dna2 #=> actagatatttgat
puts my_dna2.class #=> Bio::Sequence::NA
Bio::Sequence::Common modifying methods
- normalize!
This method removes all white space and converts every position to uppercase if the sequence is an amino acid (AA) sequence, or to lowercase if it is a nucleic acid (NA) sequence. The original sequence is modified in place.
For example:
test_seq = Bio::Sequence::NA.new("ACTG")
puts test_seq.normalize! #=> actg
- Concatenating
We often want to append a new sequence or a set of bases/residues, e.g. a poly-A sequence, to the end of an existing sequence, modifying the original sequence. This is achieved with the concat method, which is also available as the << operator.
test_seq = Bio::Sequence::NA.new("actg")
test_seq << "acagat"
# equivalent to: test_seq.concat("acagat")
puts test_seq #=> actgacagat
Note that to create a new sequence that adds to an existing sequence without altering the original, you would use the + operator. In the example below, Ruby joins the two adjacent string literals into a single string before + is applied.
test_seq = Bio::Sequence::NA.new("actg")
test_seq2 = test_seq + ("cttcccttttt" "tatatata")
puts test_seq2 #=> actgcttcccttttttatatata
puts test_seq #=> actg
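The difference between modifying and non-modifying concatenation can also be seen with plain Ruby strings. The following is only a minimal sketch: it assumes that ordinary strings are an acceptable stand-in for sequence objects, and the helper names append_in_place and append_copy are made up for illustration (they are not Bioruby methods).

```ruby
# Hypothetical helpers (not Bioruby methods); plain Ruby strings stand in
# for sequence objects so the sketch runs without the bio gem.

# Appends extra to seq in place, the way << and concat do.
def append_in_place(seq, extra)
  seq << extra
end

# Returns a new string and leaves seq untouched, the way + does.
def append_copy(seq, extra)
  seq + extra
end

original = "actg".dup
append_in_place(original, "acagat")
puts original                          # actgacagat

base = "actg".dup
combined = append_copy(base, "cttcccttttt" "tatatata")
puts combined                          # actgcttcccttttttatatata
puts base                              # actg
```

Use the in-place form when the original sequence is no longer needed, and the copying form when it must be kept for later steps.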
Working with subsequences
Please note that biological sequence numbering conventions are one-based, as opposed to Ruby's zero-based indexing. The coordinate convention for BioSQL and Chado, however, is zero-based.
- subseq
This method returns a new sequence containing the subsequence identified by the start and end positions given as parameters. It works in a similar way to the string slice method, except that positions are counted from one and the end position is included. For example:
my_seq = Bio::Sequence::NA.new("agggatttc")
puts my_seq.subseq(2,5) #=> ggga
The first argument denotes the start and the second argument denotes the end of the subsequence. Both arguments must be positive integers.
When this method is called without arguments, the start defaults to 1 and the end defaults to the last position of the sequence. It therefore returns a new sequence identical to the original sequence.
puts my_seq.subseq #=> agggatttc
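To make the one-based versus zero-based difference concrete, here is a small sketch in plain Ruby. The helper one_based_subseq is hypothetical and written only for this illustration; it mimics the behaviour described above on an ordinary string and is not Bioruby's own code.

```ruby
# Hypothetical helper (not part of Bioruby): extract positions first..last,
# counted from 1 and inclusive at both ends, from a plain Ruby string.
def one_based_subseq(sequence, first = 1, last = sequence.length)
  raise ArgumentError, "positions must be positive" if first < 1 || last < 1
  sequence[(first - 1)..(last - 1)]
end

seq = "agggatttc"
puts one_based_subseq(seq, 2, 5)   # ggga
puts seq[1, 4]                     # ggga (zero-based start, length 4)
puts one_based_subseq(seq)         # agggatttc
```

The same four bases are addressed as positions 2 to 5 in biological numbering and as index 1 with length 4 in Ruby's own string indexing.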
- window_search
This method is typically used with a block. It steps through a sequence, yielding one subsequence (window) of a given length at a time. It accepts two arguments: window_size, which defines the length of each subsequence, and step_size, which defines the size of your 'steps'. step_size is optional and defaults to one. Any remaining sequence at the terminal end is returned.
For example, to print the GC% of each 100 bp window of a nucleic acid sequence s, you can write:
s.window_search(100) do |subseq|
  puts subseq.gc_percent
end
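The stepping logic can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not Bioruby's implementation: each_window and gc_percent_of are hypothetical helpers that work on ordinary strings, and the short test sequence is made up.

```ruby
# Hypothetical stand-in for a window search over a plain Ruby string
# (not Bioruby's implementation). Yields each full window and returns the
# tail of the sequence beginning one step past the start of the last
# window (the leftover when step_size equals window_size).
def each_window(sequence, window_size, step_size = 1)
  next_start = 0
  0.step(sequence.length - window_size, step_size) do |start|
    yield sequence[start, window_size]
    next_start = start + step_size
  end
  sequence[next_start..-1].to_s
end

# Hypothetical helper: percentage of G and C characters in a string.
def gc_percent_of(window)
  return 0.0 if window.empty?
  100.0 * window.count("gcGC") / window.length
end

remainder = each_window("atggcatgata", 3, 3) do |codon|
  puts "#{codon} #{gc_percent_of(codon).round(1)}"
end
puts "left over: #{remainder}"   # left over: ta
```

With a window size and step size of 3 the windows are non-overlapping codons; with the default step size of 1 the windows overlap and slide along the sequence one base at a time.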
Bioruby mini-series: The Sequence class
Posted: November 23, 2007 Filed under: bioruby, tutorials
Bioruby is a Ruby bioinformatics package for the analysis of biological sequences. In my quest to become a Bioruby guru, I have decided to poke around the Bioruby API and all the available tutorials to better understand this fantastic library written by the Bioruby team of developers. My journey will be logged here as the Bioruby mini-series. We start with an introductory overview of the Sequence class.
To use the library you need to have a Ruby interpreter installed, preferably Ruby 1.8.5 or above. To install Bioruby as a gem, do:
sudo gem install bio
This will install Bioruby version 1.1.0, which comes with its own shell as well.
Type bioruby at the command prompt and you will see this:
Loading config (/.bioruby/shell/session/config) … done
Loading object (/.bioruby/shell/session/object) … done
Loading history (/.bioruby/shell/session/history) … done
. . . B i o R u b y i n t h e s h e l l . . .
Version : BioRuby 1.1.0 / Ruby 1.8.6
bioruby>
Now we are ready to rock and roll! I dug into the API and extracted some useful information for us.
The Bio::Sequence class
This is the primary sequence class and deals with sequence translation and transformations. It inherits from Ruby's String class, which means that you can use Ruby's string methods with the Bio::Sequence class just like you would with a string.
A Bio::Sequence object is a wrapper around the actual sequence, which is represented as either a Bio::Sequence::NA or a Bio::Sequence::AA object, and it responds to all the methods that are defined for the NA and AA classes. This class has the following methods:
- auto – Guesses the type of the sequence provided and returns the appropriate Bio::Sequence class for the given string, either a Bio::Sequence::AA or a Bio::Sequence::NA.
- new – Creates a new Bio::Sequence object. It does not initialize the object into any of the Bioruby objects. It returns a string.
- aa – Transforms your current Bio::Sequence object into a Bio::Sequence::AA object. It changes your current object, i.e. it will transform a Bio::Sequence::NA into a Bio::Sequence::AA, which is undesirable. It should therefore be used only when you are sure of the type of sequence you are working with.
- na – Works the same as the aa method above, but the returned object is a Bio::Sequence::NA.
- output – Returns a string with the current Bio::Sequence object formatted in the given style. The supported styles are fasta, genbank and embl. The style argument is passed as a Ruby symbol, e.g. :fasta.
- to_s – Returns the sequence as a string, leaving the original sequence unaltered. to_str is an alias for this method.
Bio::Sequence::NA class
This class wraps a nucleic acid sequence. It provides a number of methods to work with a DNA sequence as demonstrated in the example below.
Dr Optimist has finally finished his long awaited sequencing project code named Sikwensi. The nucleic acid sequence for a chromosome for which he won’t reveal any further details is shown below.
“gacagatggacatggactagagctgct”
He calls his trusted Ruby programmer to help analyze the sequence and tear it apart base by base. The guy gets to work.
require 'bio'
bio_seq = Bio::Sequence.auto('gacagatggacatggactagagctgct') #=> bio_seq is now a Bio::Sequence::NA object
# print each codon (window size 3, step size 3)
bio_seq.window_search(3,3) {|codon| puts codon}
# complementary sequence (returns a Bio::Sequence::NA object)
bio_seq.complement
# get the subsequence from positions 4 to 14
bio_seq.subseq(4,14) # he thinks the subsequence is interesting and worth extracting!
bio_seq.gc_percent # what is the GC content?
bio_seq.composition # nucleic acid composition (returns a Hash)
bio_seq.translate # translation (returns a Bio::Sequence::AA object)
bio_seq.translate(2) # translation from frame 2 (the default is frame 1)
bio_seq.translate(1,11) # using codon table No. 11 (bacteria)
bio_seq.translate.codes # shows three-letter codes (returns an Array)
bio_seq.translate.names # shows amino acid names (returns an Array)
bio_seq.translate.composition # amino acid composition (returns a Hash)
bio_seq.translate.molecular_weight # calculates the molecular weight (returns a Float)
bio_seq.complement.translate # translation of the complementary strand
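To get a feel for what a few of these calls compute, here is a rough plain-Ruby sketch of base composition, GC content, the reverse complement and codon splitting for Dr Optimist's sequence. The helper names are hypothetical and the code is only an approximation written for this post; Bioruby's own methods may differ in return types, rounding and handling of ambiguous bases.

```ruby
# Hypothetical plain-Ruby helpers (not Bioruby's implementation).

# Counts how often each base occurs; returns a Hash such as {"a" => 2}.
def base_composition(sequence)
  counts = Hash.new(0)
  sequence.downcase.each_char { |base| counts[base] += 1 }
  counts
end

# Percentage of positions that are G or C, as a Float.
def gc_content_percent(sequence)
  return 0.0 if sequence.empty?
  100.0 * sequence.count("gcGC") / sequence.length
end

# Reverses the sequence and swaps each base for its partner.
def reverse_complement(sequence)
  sequence.downcase.reverse.tr("acgt", "tgca")
end

# Splits the sequence into complete, non-overlapping triplets.
def codons(sequence)
  sequence.scan(/.{3}/)
end

sikwensi = "gacagatggacatggactagagctgct"
p base_composition(sikwensi)
puts gc_content_percent(sikwensi).round(2)
puts reverse_complement(sikwensi)
puts codons(sikwensi).length   # 9
```

None of this replaces the library; it only shows that the basic sequence statistics are simple counting and string operations underneath.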
A tutorial written by Katayama Toshiaki and translated into English by Naohisa Goto can be found here. (Thank you guys!)
A Ruby algorithms resource
Posted: November 17, 2007 Filed under: algorithms, ruby | Tags: algorithms, ruby
An algorithm is a procedure for accomplishing a specific task. Algorithms solve general, well-specified problems and are the ideas behind computer programs.
Rubyquiz is an interesting repository of Ruby programs that implement quite interesting algorithms. Even though the programs may not have anything to do with biology, some of the algorithms definitely do. It is a good place to find a starting point for your own Ruby algorithm implementation. Browsing the different sets of problems can give a lot of insight into how to approach some common programming problems, e.g. writing an inference engine, a hidden Markov chain, a dictionary matcher, etc.
The site has about 147 quizzes as of this writing. Take a look!
PlasmoDB 5.4 released
Posted: November 11, 2007 Filed under: databases, malaria | Tags: malaria
The ApiDB team has announced a new release of the PlasmoDB database, version 5.4. The database hosts genomic and proteomic data for different species of the parasitic eukaryote Plasmodium, the causative agent of malaria. It brings together data provided by numerous laboratories worldwide. From an email sent to registered database users:
New data in the new release include:
- A slightly modified reference genome for P. falciparum
- P. berghei gametocyte proteomics data
- Many additional P. falciparum SNPs
- Additional ESTs
- Expression profiling data for antigenic and adherent variants of P. falciparum 3D7
- User comments submitted prior to June 2007 have now been incorporated into the official annotation.
A brief list of new features includes:
- Faster loading of Gene and Genome Browser pages.
- Improved synteny views in the Genome Browser.
- Browser views of rodent malaria genomes colored to indicate chromosomes.
- Gene page links to various external data sources (including PlasmoMAP, TDRtargets, UCSC P. falciparum genome browser, Ontology-based Pattern Identification and literature databases).
- More convenient access to help … please click the “Ask us a Question” link on the left of every page, or the “Contact Us” at bottom to report problems or suggest improvements to the database.
Many thanks to the PlasmoDB team and the entire ApiDB team for the recent improvements and for the new datasets.
A ruby micro review
Posted: November 10, 2007 Filed under: ruby, technology
Ruby is a reflective, dynamic, object-oriented programming language created by Yukihiro Matsumoto and released to the public in 1995. It is an extremely pragmatic language, less concerned with formalities and more concerned with ease of development and valid results. You will see Agile principles running through Ruby, and particularly through Rails. Above all, TDD and BDD concepts and philosophies are well supported for Ruby developers. Ruby differs from most programming languages in syntax, culture, grammar and customs. It has more in common with Lisp and Smalltalk than with languages such as C++ and PHP.
If you can program in languages such as Perl, PHP, C or Pascal, learning and using Ruby is quite easy, but the problem-solving perspectives that Ruby uses may throw you off at first.
The popular and much-hyped Ruby on Rails DSL (domain-specific language) is a framework for developing web applications and currently powers hundreds of large websites around the world.
Bioruby is an excellent bioinformatics library for Ruby. Though it is not as thoroughly documented as its sister, Bioperl, efforts are being made to improve its level of documentation. The Bioruby community is also really nice and friendly. Not a single question that I have posted on the mailing list has gone unanswered.
Hundreds of libraries for performing different tasks have been written for Ruby, packaged as gems and hosted at RubyForge.
So far my favorite Ruby editor is the NetBeans IDE, whose current release is beta 2. The final release is slated for the 3rd of December 2007. (I am waiting!) It features auto-completion and syntax highlighting, among other cool things that make programming a joy. It also comes bundled with JRuby, a Java implementation of Ruby that is starting to rock the world, so you can use either native Ruby or JRuby; the choice is all yours!
Ruby can be downloaded here.
Plasmodium falciparum re-annotation workshop opens
Posted: October 22, 2007 Filed under: bioinformatics, databases, malaria | Tags: annotation, bioinformatics, malaria, plasmodium
The Plasmodium genome re-annotation workshop opened on 21st October at the Sanger center. The workshop runs until the 26th and aims to re-annotate the P. falciparum genome. In a welcome message, Prof. David Roos pointed out that a major goal of the workshop is to ascribe new or updated functions to gene models, reflecting the current state of knowledge in the wider malaria community.
The Plasmodium falciparum sequencing project was completed in 2002, and since then the PlasmoDB database, which is currently at version 5, has been the primary source of P. falciparum data and genomic information. With 60% of the P. falciparum genes annotated as hypothetical, it is time to reduce the number of hypothetical genes by providing annotations where known and possible.
Issues that will be addressed and visited include:
- standards for the use of structured gene ontologies in gene/genome annotation
- naming conventions for “hypothetical proteins”, “conserved hypotheticals”, “putative kinases”, etc
- naming conventions for large gene families
- standards for inferring function from orthology, motif/domain conservation, or ‘guilt by association’ based on functional genomics data
- standards for transferring annotations to orthologs in other Plasmodium species
- plans and proposals for further Plasmodium sequencing and other genomics resources
- pipelines for ensuring currency and consistency of data in GenBank/EMBL, GeneDB, PlasmoDB, etc
- future requirements and needs for Plasmodium informatics resources
- annotation projects not completed during the workshop … and strategies for ensuring completion
The workshop is sponsored by the Sanger Institute and PlasmoDB.
Standalone BLAST with Ruby (windows)
Posted: October 3, 2007 Filed under: bioinformatics, blast, ruby, technology | Tags: bioinformatics, blast, ruby, technology
# create a query sequence
myseq = "pcaatcacatyyawwqqffgghhhkllkl"
# create a temporary file
require 'tempfile'
temp = Tempfile.new("seqfile")
# get the path of the temporary file
name = temp.path
# write your sequence to the temporary file
temp.puts myseq
temp.close
# Since we have a protein query sequence, we will run blastp. Please note that you need a valid
# database to query against. Use the formatdb command to create your database before executing the lines below.
@program = 'blastp'
# path to your database
@database = 'c:/path_to_databasefile'
# name of your query file
@input = name
# your BLAST output file
@output = 'c:/path_output_file'
# assuming your BLAST is in a folder called NCBI_Blast, execute
system("c:/NCBI_Blast/bin/blastall.exe -p #{@program} -d #{@database} -i #{@input} -o #{@output}")
# To capture the output in a variable, execute this command instead.
# Note that we have omitted the blastall -o parameter.
result = %x(c:/NCBI_Blast/bin/blastall.exe -p #{@program} -d #{@database} -i #{@input})
# remember to delete the temporary file!
temp.close(true)
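The script above can also be arranged so that the temporary file is cleaned up automatically and the command is built as a list of arguments. What follows is only a sketch under assumptions: blastall_command and with_query_file are hypothetical helper names, the FASTA header "query" is made up, and the paths are the same placeholders used above. The blastall flags (-p, -d, -i, -o) are the ones already used in the script.

```ruby
require 'tempfile'

# Hypothetical helper: builds the blastall argument list used above.
# The -o flag is only added when an output path is given.
def blastall_command(blastall_path, program, database, query_path, output_path = nil)
  command = [blastall_path, "-p", program, "-d", database, "-i", query_path]
  command += ["-o", output_path] if output_path
  command
end

# Hypothetical helper: writes the sequence as FASTA to a temporary file,
# yields its path, and removes the file when the block finishes.
def with_query_file(sequence, header = "query")
  Tempfile.create(["seqfile", ".fasta"]) do |file|
    file.puts ">#{header}"
    file.puts sequence
    file.flush
    yield file.path
  end
end

with_query_file("pcaatcacatyyawwqqffgghhhkllkl") do |query_path|
  command = blastall_command("c:/NCBI_Blast/bin/blastall.exe", "blastp",
                             "c:/path_to_databasefile", query_path)
  puts command.join(" ")
  # system(*command) would run BLAST here, if blastall and the database exist.
end
```

Passing the arguments to system as separate strings avoids shell quoting problems when a path contains spaces, which is common on Windows.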