Standalone BLAST with Ruby revisited

Earlier I showed a very simple way to perform a BLAST search using Ruby. Today I would like to revisit that topic for two reasons.

  1. The “using ruby with blast” search term seems to be very common and actually one of the ways that people reach my blog.
  2. The original post was not very thorough.

BLAST, the Basic Local Alignment Search Tool, is used to search a query sequence (DNA or protein) against a database of sequences (all nucleotide or all protein) in order to identify similar sequences. BLAST comes in several flavors: it can search DNA against DNA or protein against protein, and it can translate a nucleotide query and search it against a protein database, or vice versa. It can also compute a “profile” for the query sequence and use that for further searches, as well as search the query against a database of profiles.
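As a quick reference, the flavors described above can be written down as a small lookup table. This is only a sketch: the constant BLAST_PROGRAMS and the helper programs_for are names made up for illustration, and the profile-based searches are left out.

```ruby
# Query and database sequence types for the classic BLAST programs.
# :translated records which side is translated before the comparison.
BLAST_PROGRAMS = {
  "blastn"  => { query: :nucleotide, database: :nucleotide, translated: :none },
  "blastp"  => { query: :protein,    database: :protein,    translated: :none },
  "blastx"  => { query: :nucleotide, database: :protein,    translated: :query },
  "tblastn" => { query: :protein,    database: :nucleotide, translated: :database },
  "tblastx" => { query: :nucleotide, database: :nucleotide, translated: :both }
}.freeze

# Hypothetical helper: list the programs that fit a query/database pairing.
def programs_for(query_type, database_type)
  BLAST_PROGRAMS.select do |_name, types|
    types[:query] == query_type && types[:database] == database_type
  end.keys
end

p programs_for(:nucleotide, :protein)
```

A table like this is handy for validating the program argument before handing it to a wrapper.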

The BLAST tool is fundamental to molecular biologists and bioinformaticians. There are excellent books and tutorials on how and when to use BLAST, so I will assume that all you need is to automate your work and parse the results. The actual algorithm is implemented in C and is freely available from the NCBI website. The first thing to do is to download the appropriate binaries for your platform. Instructions for setting up and installing BLAST are also available from NCBI.

Once BLAST is installed on your system, the primary method of interaction is the command line. Use formatdb to create BLAST databases and blastall to search a given sequence against a given BLAST database for homologous sequences.

In Ruby, there are two ways to call the BLAST program: through the BioRuby library, or by writing your own Ruby wrapper that assembles the BLAST command-line parameters and executes the command. Most often, one executes BLAST from the command line and then processes the results file, which is in one of the many BLAST output formats. BioRuby is excellent at parsing the results file. Using BioRuby with BLAST is very straightforward:

# BLASTing the BioRuby way
#   program:       the BLAST program to call: blastp, blastn, tblastn, etc.
#   database_path: path to the BLAST-formatted database
#   query_file:    query sequences in FASTA format
require 'bio'

def bio_blast(program, database_path, query_file)
  factory = Bio::Blast.local(program, database_path)

  ff = Bio::FlatFile.open(Bio::FastaFormat, query_file)
  ff.each do |entry|
    report = factory.query(entry) # report will be a Bio::Blast::Report object
    # iterate through the hits
    report.each do |hit|
      puts hit.bit_score    # bit score (*)
      puts hit.query_seq    # homologous region of the query sequence
      puts hit.midline      # middle line of the alignment of the homologous region (*)
      puts hit.target_seq   # homologous region of the hit sequence
      puts hit.evalue       # E-value
      puts hit.identity     # % identity
      puts hit.overlap      # length of overlapping region
      puts hit.query_id     # identifier of query sequence
      puts hit.query_def    # definition (comment line) of query sequence
      puts hit.query_len    # length of query sequence
      puts hit.target_id    # identifier of hit sequence
      puts hit.target_def   # definition (comment line) of hit sequence
      puts hit.target_len   # length of hit sequence
      puts hit.query_start  # start of homologous region in query sequence
      puts hit.query_end    # end of homologous region in query sequence
      puts hit.target_start # start of homologous region in hit (target) sequence
      puts hit.target_end   # end of homologous region in hit (target) sequence
      puts hit.lap_at       # array of the above four numbers
      # print the query start coordinate of each HSP in this hit
      hit.each do |hsp|
        puts hsp.query_from
      end
    end
  end
end

The method runs BLAST and prints the details of each hit, followed by the query start coordinate of each high-scoring segment pair (HSP) in that hit. However, you may want to run BLAST without the BioRuby overhead. The snippet below will work as well:

  input = query_path
  # Execute BLAST and store the results in the blast_result variable.
  # -p    BLAST program to run
  # -d    BLAST database to query against
  # -e    expectation value (E-value) threshold
  # -M    scoring matrix
  # -i    query file path
  # -T T  produces HTML output

  # execution (the trailing backslash keeps this a single shell command)
  blast_result = %x(blastall -p #{program} -d #{database} -e #{expectation} -M #{matrix} \
                   -i #{input} -T T)
  # blast_result holds the output of the command above. You can write it to a
  # file as is. To process it with Bio::Blast::Report, drop -T T and request
  # XML (-m 7) or tabular (-m 8) output, since that class does not parse HTML.
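If the paths may contain spaces or other shell metacharacters, one option is to pass the arguments as an array and skip the shell entirely. The sketch below uses Ruby's standard Open3 library. The helper names blastall_args and run_blastall are made up for illustration, and the defaults shown (E-value 10, BLOSUM62) simply mirror blastall's own defaults.

```ruby
require "open3"

# Hypothetical helper: build the blastall argument list as an array, so each
# value reaches blastall as one argument with no shell quoting involved.
def blastall_args(program:, database:, input:, expectation: 10, matrix: "BLOSUM62", html: false)
  args = ["blastall",
          "-p", program,
          "-d", database,
          "-i", input,
          "-e", expectation.to_s,
          "-M", matrix]
  args += ["-T", "T"] if html # HTML output only when asked for
  args
end

# Hypothetical helper: run blastall and return its standard output.
# Raises if blastall exits with a non-zero status.
def run_blastall(**options)
  stdout, stderr, status = Open3.capture3(*blastall_args(**options))
  raise "blastall failed: #{stderr}" unless status.success?
  stdout
end

# Printing the argument list is a cheap way to check a command before running it.
p blastall_args(program: "blastp", database: "db/proteins", input: "query.fasta")
```

Keeping the argument builder separate from the execution step means the command can be inspected or logged without BLAST being installed.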

You can use a command in the same style as the one above to create BLAST databases with formatdb.
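A minimal sketch of such a formatdb call follows. The helper names formatdb_args and format_database are hypothetical. The flags used are formatdb's -i (input FASTA file), -p (T for protein, F for nucleotide) and -n (base name for the database files).

```ruby
# Hypothetical helper: argument list for formatdb.
def formatdb_args(fasta_path, protein: true, name: nil)
  args = ["formatdb", "-i", fasta_path, "-p", protein ? "T" : "F"]
  args += ["-n", name] if name # optional base name for the database files
  args
end

# Hypothetical helper: run formatdb without going through the shell.
# Returns true only when formatdb exits successfully.
def format_database(fasta_path, **options)
  system(*formatdb_args(fasta_path, **options)) == true
end

p formatdb_args("proteins.fasta")
p formatdb_args("genome.fasta", protein: false, name: "genome")
```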

I would recommend using the BioRuby BLAST report parsing classes to automate the process. Please look at the BioRuby API documentation for more details.
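For quick scripts where installing BioRuby is not an option, blastall's tabular output (-m 8) is simple enough to read by hand: each line holds twelve tab-separated columns. The sketch below is a lightweight alternative and not a replacement for the BioRuby parsers. TabularHit and parse_tabular are names invented here, and the sample row is made up purely for illustration.

```ruby
# One row of blastall's tabular output (-m 8), in column order.
TabularHit = Struct.new(:query_id, :target_id, :identity, :alignment_length,
                        :mismatches, :gap_openings, :query_start, :query_end,
                        :target_start, :target_end, :evalue, :bit_score)

# Hypothetical helper: turn tabular BLAST output into an array of TabularHit.
# Blank lines and comment lines starting with '#' are skipped.
def parse_tabular(text)
  rows = text.each_line.map(&:chomp).reject { |line| line.empty? || line.start_with?("#") }
  rows.map do |line|
    f = line.split("\t")
    TabularHit.new(f[0], f[1], f[2].to_f, f[3].to_i, f[4].to_i, f[5].to_i,
                   f[6].to_i, f[7].to_i, f[8].to_i, f[9].to_i, f[10].to_f, f[11].to_f)
  end
end

# Made-up sample row, for illustration only.
sample = "query1\tsubject1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t370\n"
parse_tabular(sample).each do |hit|
  puts "#{hit.query_id} -> #{hit.target_id} (#{hit.identity}% identity over #{hit.alignment_length} positions)"
end
```

The field names follow the BioRuby hit methods used earlier (query_start, target_end and so on), so code written against one is easy to adapt to the other.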


2 Responses to “Standalone BLAST with Ruby revisited”

  1. Anonymous Says:

    Thank you for the informative post.
    Could you also give an example of how to use Ruby to make an informative image from a BLAST result?
