<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Biorelated &#187; bioinformatics</title>
	<atom:link href="http://biorelated.com/category/bioinformatics/feed/" rel="self" type="application/rss+xml" />
	<link>http://biorelated.com</link>
	<description></description>
	<lastBuildDate>Sat, 28 Apr 2012 00:12:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='biorelated.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/41054b22bbe7debbf1d63972772e21fa?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Biorelated &#187; bioinformatics</title>
		<link>http://biorelated.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://biorelated.com/osd.xml" title="Biorelated" />
	<atom:link rel='hub' href='http://biorelated.com/?pushpress=hub'/>
		<item>
		<title>Announcing scribl-rails</title>
		<link>http://biorelated.com/2012/02/20/announcing-scribl-rails/</link>
		<comments>http://biorelated.com/2012/02/20/announcing-scribl-rails/#comments</comments>
		<pubDate>Mon, 20 Feb 2012 17:12:25 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[biographics]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[ruby on rails]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://biorelated.com/?p=468</guid>
		<description><![CDATA[Some time ago I mentioned the Scribl JavaScript framework for drawing bioinformatics glyphs on an HTML5 canvas. If you are a Rails developer, you will be happy to know that I have written scribl-rails, an asset helper for including Scribl in your application's asset pipeline. Usage Add the following to your Gemfile: gem 'scribl-rails' Then run bundle [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_470" class="wp-caption aligncenter" style="width: 650px"><a href="http://biorelated.files.wordpress.com/2012/02/screen-shot-2012-02-20-at-20-00-09.png"><img class="size-full wp-image-470 " title="scribl example glyphs" src="http://biorelated.files.wordpress.com/2012/02/screen-shot-2012-02-20-at-20-00-09.png?w=590" alt=""   /></a><p class="wp-caption-text">PfEMP1 domains drawn with scribl</p></div>
<p>Some time ago I mentioned the <a title="Scribl" href="http://chmille4.github.com/Scribl/" target="_blank">Scribl JavaScript framework</a> for drawing bioinformatics glyphs on an HTML5 canvas. If you are a Rails developer, you will be happy to know that I have written <a title="Scribl-rails" href="https://rubygems.org/gems/scribl-rails" target="_blank">scribl-rails</a>, an asset helper for including Scribl in your application's asset pipeline.</p>
<h3>Usage</h3>
<p>Add the following to your Gemfile:</p>
<pre><code>gem 'scribl-rails' </code></pre>
<p>Then run <code>bundle install</code> from the application directory.</p>
<p>Add the following directive to your JavaScript manifest file (application.js):</p>
<pre><code>//= require scribl </code></pre>
<p>Enjoy using scribl-rails and creating cute bio-graphics! Many thanks to Chase Miller for the awesome library!</p>
<p>For more information, or to follow development, check out <a title="Scribl-rails" href="https://github.com/georgeG/scribl-rails" target="_blank">scribl-rails on GitHub</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2012/02/20/announcing-scribl-rails/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>

		<media:content url="http://biorelated.files.wordpress.com/2012/02/screen-shot-2012-02-20-at-20-00-09.png" medium="image">
			<media:title type="html">scribl example glyphs</media:title>
		</media:content>
	</item>
		<item>
		<title>Use Scribl to draw genomic glyphs on HTML5 canvas</title>
		<link>http://biorelated.com/2011/09/20/use-scribl-to-draw-genomic-glyphs-on-html5-canvas/</link>
		<comments>http://biorelated.com/2011/09/20/use-scribl-to-draw-genomic-glyphs-on-html5-canvas/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 13:49:36 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[biographics]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[ruby on rails]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://biorelated.com/?p=345</guid>
		<description><![CDATA[The Scribl library by Chase Miller is an awesome and promising JavaScript library for visualizing biological sequence information and rendering it on the web. Scribl generates biological charts of genomic regions, alignments, and assembly data. The library is under continuous development, and I have been able to use it for some internal projects! A [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://biorelated.files.wordpress.com/2011/09/screen-shot-2011-09-20-at-16-51-58.png"><img class="aligncenter size-medium wp-image-431" title="Scrible glyphs" src="http://biorelated.files.wordpress.com/2011/09/screen-shot-2011-09-20-at-16-51-58.png?w=449&h=173" alt="" width="449" height="173" /></a><a title="Scribl" href="https://github.com/chmille4/Scribl" target="_blank">The Scribl library</a> by Chase Miller is an awesome and promising JavaScript library for visualizing biological sequence information and rendering it on the web. <a title="Scribl" href="http://chmille4.github.com/Scribl/" target="_blank">Scribl</a> generates biological charts of genomic regions, alignments, and assembly data. The library is under continuous development, and I have been able to use it for some internal projects!</p>
<p>A nice set of examples and an introduction are available at the <a title="Scribl home page" href="http://chmille4.github.com/Scribl/" target="_blank">home page</a>, and the <a title="Scribl wiki" href="https://github.com/chmille4/Scribl/wiki" target="_blank">wiki provides a detailed user guide</a>!</p>
<p>Happy biology!</p>
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2011/09/20/use-scribl-to-draw-genomic-glyphs-on-html5-canvas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>

		<media:content url="http://biorelated.files.wordpress.com/2011/09/screen-shot-2011-09-20-at-16-51-58.png?w=300" medium="image">
			<media:title type="html">Scrible glyphs</media:title>
		</media:content>
	</item>
		<item>
		<title>Translating a nucleotide sequence in six frames with bioruby</title>
		<link>http://biorelated.com/2011/02/02/translating-a-nucleotide-sequence-in-six-frames-with-bioruby/</link>
		<comments>http://biorelated.com/2011/02/02/translating-a-nucleotide-sequence-in-six-frames-with-bioruby/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 12:18:53 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[bioruby]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://biorelated.com/?p=297</guid>
		<description><![CDATA[BioRuby offers a very simple way to translate nucleotide sequences. seq = Bio::Sequence::NA.new("acctatagctctagcta") seq.translate We know that there are six possible reading frames for any given nucleotide sequence. Generally the longest open reading frame is taken to be the correct frame when we do not have information about the possible protein that is encoded [...]]]></description>
			<content:encoded><![CDATA[<p>Bioruby offers a very easy and simple way to translate nucleotide sequences.</p>
<pre>seq= Bio::Sequence::NA.new("acctatagctctagcta")</pre>
<pre>seq.translate</pre>
<p>We know that there are six possible reading frames for any given nucleotide sequence. When we have no information about the protein encoded by a gene, the longest open reading frame is generally taken to be the correct frame. By default the translate method translates in the first frame, but it accepts an argument that selects the translation frame:</p>
<pre>seq.translate(2) # translate using the second reading frame</pre>
<p>Given a long list of sequences, how do we quickly determine the correct reading frame? We would want a method that translates a given sequence in all frames and picks the longest reading frame. Assuming that the correct reading frame has no stop codons, we can write a quick method to perform the six-frame translation.</p>
<pre>require 'bio'

# Translate a sequence in all six frames and return the longest
# translation that contains no stop codon (nil if there is none).
def longest_reading_frame(sequence)
  orfs = [] # a container for ORFs (open reading frames)
  6.times do |frame|
    # frames 1-3 are forward, frames 4-6 are on the reverse complement
    translated = Bio::Sequence::NA.new(sequence).translate(frame + 1)
    stop_codons = translated.scan(/\*/).size
    orfs &lt;&lt; translated if stop_codons == 0
  end
  orfs.max_by { |orf| orf.length }
end</pre>
<p>This method uses an array to collect all translated sequences that contain no stop codons and returns the longest of them. This might not scale very well for very long sequences, but that will be a post for another day!</p>
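<p>To see what a six-frame translation does under the hood, here is a minimal pure-Ruby sketch that does not depend on BioRuby. It is only an illustration under stated assumptions: it uses the standard genetic code, treats frames 4 to 6 as frames 1 to 3 of the reverse complement, and the helper names (<code>translate_frame</code>, <code>six_frame_translations</code>, <code>longest_stop_free_frame</code>) are hypothetical names of mine, not BioRuby methods.</p>

```ruby
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = %w[t c a g].freeze
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'.freeze
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AMINO_ACIDS.chars).to_h.freeze

# Reverse complement of a DNA string (returned in lower case).
def reverse_complement(dna)
  dna.downcase.reverse.tr('acgt', 'tgca')
end

# Translate one forward frame (1, 2 or 3); unknown codons become 'X'.
def translate_frame(dna, frame)
  codons = dna.downcase[(frame - 1)..].to_s.scan(/.{3}/)
  codons.map { |codon| CODON_TABLE.fetch(codon, 'X') }.join
end

# Hash of frame number (1..6) => translated protein string.
def six_frame_translations(dna)
  reverse = reverse_complement(dna)
  forward_frames = (1..3).map { |f| [f, translate_frame(dna, f)] }
  reverse_frames = (1..3).map { |f| [f + 3, translate_frame(reverse, f)] }
  (forward_frames + reverse_frames).to_h
end

# Longest translation without a stop codon, or nil if every frame has one.
def longest_stop_free_frame(dna)
  candidates = six_frame_translations(dna).values.reject { |protein| protein.include?('*') }
  candidates.max_by(&:length)
end
```

<p>Calling <code>six_frame_translations("acctatagctctagcta")</code> returns all six translations keyed by frame number, which makes it easy to inspect why a particular frame was rejected.</p>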
<p>Happy Biology!</p>
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2011/02/02/translating-a-nucleotide-sequence-in-six-frames-with-bioruby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>
	</item>
		<item>
		<title>My first Bioruby plugin calculates the isoelectric point of a protein</title>
		<link>http://biorelated.com/2011/01/06/my-first-bioruby-plugin-calculates-the-isoelectric-point-of-a-protein/</link>
		<comments>http://biorelated.com/2011/01/06/my-first-bioruby-plugin-calculates-the-isoelectric-point-of-a-protein/#comments</comments>
		<pubDate>Thu, 06 Jan 2011 18:15:07 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[bioruby]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://biorelated.com/?p=255</guid>
		<description><![CDATA[Late last year, there was a lot of talk about creating a plugin system for BioRuby. The idea is that more people can start to develop bioinformatics libraries using the Ruby language, and the libraries can build on the BioRuby framework. BioRuby maintainers can then concentrate on yet to be defined &#8220;core&#8221; parts of the [...]]]></description>
			<content:encoded><![CDATA[<p>Late last year, there was a lot of talk about creating a <a title="bioruby plugins" href="http://bioruby.open-bio.org/wiki/Plugins" target="_blank">plugin system for BioRuby</a>. The idea is that more people can start to develop bioinformatics libraries using the Ruby language, and the libraries can build on the BioRuby framework. BioRuby maintainers can then concentrate on yet to be defined &#8220;core&#8221; parts of the library to ensure compatibility and support for the plugins. Together with Pascal Bentz, I have created a library to calculate the isoelectric point of a protein given a pKa set and the amino acid sequence of a peptide or protein. The project lay dormant for a while at GitHub until now! I am happy to release my first BioRuby plugin, bio-isoelectric_point! <a title="Bio-isoelectric_point" href="https://rubygems.org/gems/bio-isoelectric_point" target="_blank">Download it at rubygems.org</a> or <a title="bioruby-isoelectric-point" href="https://github.com/georgeG/bioruby-isoelectric_point" target="_blank">fork it and check the usage at GitHub</a>.</p>
<p>Examples</p>
<pre>require 'bio'
require 'bio-isoelectric_point'

protein_seq = Bio::Sequence::AA.new("KKGFTCGELA")

# What is the protein charge at pH 14?
charge = protein_seq.charge_at(14) #=&gt; -2.999795857467562

# Calculate the isoelectric point using the dtaselect pKa set, rounded to 3 decimal places
isoelectric_point = protein_seq.isoelectric_point('dtaselect', 3) #=&gt; 8.219

# Calculate the isoelectric point with a custom pKa set
custom_pka_set = { "N_TERMINUS" =&gt; 8.1,
                   "K" =&gt; 10.1,
                   "R" =&gt; 12.1,
                   "H" =&gt; 6.4,
                   "C_TERMINUS" =&gt; 3.15,
                   "D" =&gt; 4.34,
                   "E" =&gt; 4.33,
                   "C" =&gt; 8.33,
                   "Y" =&gt; 9.5 }
iep_ph = protein_seq.isoelectric_point(custom_pka_set, 3) #=&gt; 8.193</pre>
<p>This gem supports the following pKa sets, and also allows a user to provide a custom pKa set.</p>
<pre>    * dta_select
    * emboss
    * rodwell
    * wikipedia
    * sillero</pre>
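<p>For readers curious about what a pKa set is used for, here is a rough pure-Ruby sketch of the usual textbook approach: sum the Henderson-Hasselbalch charge of every ionizable group, then search for the pH where the net charge crosses zero. I am not claiming this is how the gem is implemented. The helper names <code>net_charge</code> and <code>estimate_pi</code> are hypothetical, and the pKa values are simply the custom set from the example above.</p>

```ruby
# pKa values copied from the custom set in the post (an assumption for this sketch).
PKA = {
  'N_TERMINUS' => 8.1, 'K' => 10.1, 'R' => 12.1, 'H' => 6.4,
  'C_TERMINUS' => 3.15, 'D' => 4.34, 'E' => 4.33, 'C' => 8.33, 'Y' => 9.5
}.freeze

# Groups that carry a +1 charge when protonated; all others carry -1 when deprotonated.
POSITIVE_GROUPS = %w[N_TERMINUS K R H].freeze

# Net charge of a peptide at a given pH, using the Henderson-Hasselbalch equation.
def net_charge(sequence, ph, pka = PKA)
  side_chains = sequence.upcase.chars.select { |residue| pka.key?(residue) }
  groups = side_chains + %w[N_TERMINUS C_TERMINUS]
  groups.sum do |group|
    if POSITIVE_GROUPS.include?(group)
      1.0 / (1.0 + 10.0**(ph - pka[group]))
    else
      -1.0 / (1.0 + 10.0**(pka[group] - ph))
    end
  end
end

# Bisection search between pH 0 and 14 for the pH where the net charge is zero.
def estimate_pi(sequence, pka = PKA, precision = 3)
  low = 0.0
  high = 14.0
  while high - low > 1e-6
    mid = (low + high) / 2.0
    if net_charge(sequence, mid, pka) > 0
      low = mid   # still positively charged, so the pI is at a higher pH
    else
      high = mid  # negatively charged (or neutral), so the pI is at a lower pH
    end
  end
  ((low + high) / 2.0).round(precision)
end
```

<p>Bisection works here because the net charge only decreases as the pH rises, so there is a single crossing point between pH 0 and pH 14.</p>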
<p>Happy biology!</p>
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2011/01/06/my-first-bioruby-plugin-calculates-the-isoelectric-point-of-a-protein/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>
	</item>
		<item>
		<title>Standalone BLAST with Ruby revisited</title>
		<link>http://biorelated.com/2009/12/15/standalone-blast-with-ruby-revisited/</link>
		<comments>http://biorelated.com/2009/12/15/standalone-blast-with-ruby-revisited/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 09:32:19 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[blast]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://biorelated.wordpress.com/?p=211</guid>
		<description><![CDATA[Earlier I showed a very simple way to perform a BLAST search using Ruby. Today I would like to revisit that topic for two reasons. The &#8220;using ruby with blast&#8221; search term seems to be very common and is actually one of the ways that people reach my blog. The original post was not very thorough. BLAST [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier I showed a very simple way to <a href="http://biorelated.wordpress.com/2007/10/03/standalone-blast-with-ruby-part-1/" target="_blank">perform a BLAST search using Ruby</a>. Today I would like to revisit that topic for two reasons.</p>
<ol>
<li>The &#8220;using ruby with blast&#8221; search term seems to be very common and is actually one of the ways that people reach my blog.</li>
<li>The original post was not very thorough.</li>
</ol>
<p>BLAST, the Basic Local Alignment Search Tool, is used to search a sequence (either DNA or protein) against a database of other sequences (either all nucleotide or all protein) in order to identify similar sequences. BLAST comes in many flavors: it can search DNA against DNA or protein against protein, and it can also translate a nucleotide query and search it against a protein database, and vice versa. It can also compute a “profile” for the query sequence and use that for further searches, as well as search the query against a database of profiles.</p>
<p>The BLAST tool is fundamental to molecular biologists and bioinformaticians. There are excellent books and tutorials on how and when to use BLAST, so I will assume all you need is to automate your work and parse the results. The actual algorithm is implemented in C and is freely available from the NCBI website. The first thing to do is to download the appropriate binaries for your platform. See the <a title="installing blast" href="http://bioinfolab.unl.edu/emlab/documents/blast_readme/README.bls.html" target="_blank">instructions for setting up and installing BLAST</a>.</p>
<p>Once BLAST is installed on your system, the primary way to interact with it is the command line. Use formatdb to create BLAST databases and blastall to search a given sequence against a given BLAST database for homologous sequences.</p>
<p>In Ruby, there are two ways you can call the BLAST program: first, using the <a href="http://bioruby.org/" target="_blank">BioRuby library</a>, and second, by writing your own Ruby wrapper for the BLAST command-line parameters and execution. Most often, one executes BLAST from the command line and then processes the results file, which is in one of the many BLAST output formats. BioRuby is excellent at parsing the results file. Using BioRuby with BLAST is very straightforward:</p>
<pre>require 'bio'

# Blasting the BioRuby way
# program:       the BLAST program to call: blastp, blastn, tblastn, etc.
# database_path: a path to the actual BLAST-formatted database
# query_file:    a list of query sequences in FASTA format
def bio_blast(program, database_path, query_file)
  factory = Bio::Blast.local(program, database_path)
  ff = Bio::FlatFile.open(Bio::FastaFormat, query_file)
  ff.each do |entry|
    report = factory.query(entry) # report will be a Bio::Blast::Report object
    # iterate through the hits
    report.each do |hit|
      puts hit.bit_score        # bit score (*)
      puts hit.query_seq        # query sequence
      puts hit.midline          # middle line string of alignment of homologous region (*)
      puts hit.target_seq       # hit sequence
      puts hit.evalue           # E-value
      puts hit.identity         # % identity
      puts hit.overlap          # length of overlapping region
      puts hit.query_id         # identifier of query sequence
      puts hit.query_def        # definition (comment line) of query sequence
      puts hit.query_len        # length of query sequence
      puts hit.target_id        # identifier of hit sequence
      puts hit.target_def       # definition (comment line) of hit sequence
      puts hit.target_len       # length of hit sequence
      puts hit.query_start      # start position of homologous region in query sequence
      puts hit.query_end        # end position of homologous region in query sequence
      puts hit.target_start     # start position of homologous region in hit (target) sequence
      puts hit.target_end       # end position of homologous region in hit (target) sequence
      puts hit.lap_at           # array of above four numbers
      hit.each do |hsp|
        puts hsp.query_from
      end
    end
  end
end</pre>
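<p>The second approach, writing your own Ruby wrapper around the command line, can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the legacy <code>blastall</code> binary is on your PATH and asks for tabular output with <code>-m 8</code>; the newer BLAST+ suite uses different commands and flags. The method names <code>blastall_command</code>, <code>parse_tabular_line</code> and <code>run_blast</code> are hypothetical helpers, not part of any library.</p>

```ruby
require 'open3'

# Column names of blastall's tabular output (-m 8), in order.
TABULAR_FIELDS = %i[query_id subject_id percent_identity alignment_length mismatches
                    gap_openings query_start query_end subject_start subject_end
                    evalue bit_score].freeze
FLOAT_FIELDS = %i[percent_identity evalue bit_score].freeze
INTEGER_FIELDS = %i[alignment_length mismatches gap_openings query_start query_end
                    subject_start subject_end].freeze

# Build the argument list for a legacy blastall search with tabular output.
def blastall_command(program, database_path, query_file, evalue: 10)
  ['blastall', '-p', program, '-d', database_path, '-i', query_file,
   '-e', evalue.to_s, '-m', '8']
end

# Turn one tab-separated result line into a hash with numeric fields converted.
def parse_tabular_line(line)
  hit = TABULAR_FIELDS.zip(line.chomp.split("\t")).to_h
  FLOAT_FIELDS.each { |field| hit[field] = hit[field].to_f }
  INTEGER_FIELDS.each { |field| hit[field] = hit[field].to_i }
  hit
end

# Run blastall and return an array of hit hashes; raises if blastall fails.
def run_blast(program, database_path, query_file)
  command = blastall_command(program, database_path, query_file)
  output, error, status = Open3.capture3(*command)
  raise "blastall failed: #{error}" unless status.success?

  result_lines = output.each_line.reject { |line| line.start_with?('#') || line.strip.empty? }
  result_lines.map { |line| parse_tabular_line(line) }
end
```

<p>Passing the command as an argument array rather than a single string avoids shell quoting problems when database paths or file names contain spaces.</p>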
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2009/12/15/standalone-blast-with-ruby-revisited/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>
	</item>
		<item>
		<title>A ruby class for screen-scraping plasmodb database</title>
		<link>http://biorelated.com/2009/12/09/a-ruby-class-for-screen-scraping-plasmodb-database/</link>
		<comments>http://biorelated.com/2009/12/09/a-ruby-class-for-screen-scraping-plasmodb-database/#comments</comments>
		<pubDate>Wed, 09 Dec 2009 15:25:50 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[bioruby]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[malaria]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://biorelated.wordpress.com/?p=182</guid>
		<description><![CDATA[PlasmoDB is the primary resource for retrieving Plasmodium falciparum genomic data and information. Unfortunately, this database has no API or XML service that lets a programmer query its information or automate the retrieval of sequence information. Recently I needed to download a long list of Plasmodium falciparum genomic, protein [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Plasmodb" href="http://www.plasmodb.org" target="_blank">PlasmoDB</a> is the primary resource for retrieving <em>Plasmodium falciparum</em> genomic data and information. Unfortunately, this database has no API or XML service that lets a programmer query its information or automate the retrieval of sequence information. Recently I needed to download a long list of <em>Plasmodium falciparum</em> genomic, protein and other information for a set of genes. Being too lazy to click through and open the web page for each gene in my list, I wrote this class in Ruby.</p>
<p>It would be great if PlasmoDB provided an easy way to automate sequence retrieval. A web service or an XML output format would do. Screen scraping is not a very efficient approach. Here we use <a title="ScrAPI" href="http://blog.labnotes.org/tag/scrapi/" target="_self">Scrapi</a>, which is an HTML scraping toolkit for Ruby. It uses CSS selectors to write easy, maintainable scraping rules to select, extract and store data from HTML content.</p>
<pre><span style="color:#969696;">#A class to fetch information from plasmodb using the scrapi API
</span><span style="color:#969696;">#</span><span style="color:#969696;">#TODO handle  Scraper::Reader::HTTPUnspecifiedError
</span><span style="color:#0000e6;">class</span> <span style="color:#000000;">Plasmodb</span>
   <span style="color:#969696;">#retrieves information using the gene_id
</span>   <span style="color:#969696;">#returns a structure obj
</span>  <span style="color:#0000e6;">def</span> fetch_by_gene_id(var_name)
    <span style="color:#0000e6;">begin</span>
      scraper = <span style="color:#000000;">Scraper</span>.define <span style="color:#0000e6;">do</span>
        process <span style="color:#ce7b00;">"</span><span style="color:#ce7b00;">div#genomicSequence pre</span><span style="color:#ce7b00;">"</span>,    <span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">genomic_sequence</span>  =&gt; <span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">text</span>
        process <span style="color:#ce7b00;">"</span><span style="color:#ce7b00;">div#transcriptSequence pre</span><span style="color:#ce7b00;">"</span>, <span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">mrna_sequence</span> =&gt;<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">text</span>
        process <span style="color:#ce7b00;">"</span><span style="color:#ce7b00;">div#proteinSequence pre</span><span style="color:#ce7b00;">"</span>,    <span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">protein_sequence</span>  =&gt;<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">text</span><span style="color:#969696;">
</span>        process <span style="color:#ce7b00;">"</span><span style="color:#ce7b00;">div#Aliases td&gt;table</span><span style="color:#ce7b00;">"</span>,       <span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">aliases</span> =&gt;<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">text</span>
        result <span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">protein_sequence</span>,<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">aliases</span>,<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">mrna_sequence</span>,<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">genomic_sequence</span>
        <span style="color:#0000e6;">end
</span>
     search_link = "http://plasmodb.org/plasmo/showRecord.do?name=GeneRecordClasses.GeneRecordClass&amp;source_id=" + var_name + "&amp;project_id=PlasmoDB"
     uri = <span style="color:#000000;">URI</span>.parse(search_link)
     <span style="color:#009900;">@query</span> = scraper.scrape(uri)

    <span style="color:#0000e6;">rescue</span> <span style="color:#000000;">Scraper</span>::<span style="color:#000000;">Reader</span>::<span style="color:#000000;">HTTPUnspecifiedError</span>
      <span style="color:#ce7b00;">"</span><span style="color:#ce7b00;">None</span><span style="color:#ce7b00;">"</span>
    <span style="color:#0000e6;">end</span>
  <span style="color:#0000e6;">end</span>
  <span style="color:#969696;">#returns the predicted protein sequence
</span>  <span style="color:#0000e6;">def</span> protein_sequence
    <span style="color:#009900;">@query</span>.protein_sequence.chomp
  <span style="color:#0000e6;">end</span>
<span style="color:#969696;">#  Returns the genomic sequence
</span>  <span style="color:#0000e6;">def</span> genomic_sequence
    <span style="color:#009900;">@query</span>.genomic_sequence.chomp
  <span style="color:#0000e6;">end</span>
  <span style="color:#969696;">#returns Aliases
</span>  <span style="color:#0000e6;">def</span> aliases
    <span style="color:#009900;">@query</span>.aliases
  <span style="color:#0000e6;">end</span>
  <span style="color:#969696;">#returns the mrna sequence
</span>  <span style="color:#0000e6;">def</span> mrna_sequence
    <span style="color:#009900;">@query</span>.mrna_sequence.chomp
  <span style="color:#0000e6;">end</span>
<span style="color:#0000e6;">end</span>

<span style="color:#969696;">#Use the class to fetch information.</span><span style="color:#969696;">
</span>require <span style="color:#ce7b00;">'</span><span style="color:#ce7b00;">rubygems</span><span style="color:#ce7b00;">'</span>
require <span style="color:#ce7b00;">'</span><span style="color:#ce7b00;">bio</span><span style="color:#ce7b00;">'</span>
require <span style="color:#ce7b00;">'</span><span style="color:#ce7b00;">scrapi</span><span style="color:#ce7b00;">'</span>

file = <span style="color:#ce7b00;">"</span><span style="color:#ce7b00;">/home/george/genes_list.txt</span><span style="color:#ce7b00;">"</span> <span style="color:#969696;">#a file containing a list of accession numbers.
#one accession number per line
</span>
plasmo = <span style="color:#000000;">Plasmodb</span>.new <span style="color:#969696;">#initialize a plasmodb class instance
</span>
<span style="color:#969696;">#Read the file and process each accession number.
</span><span style="color:#000000;">File</span>.readlines(file).each <span style="color:#0000e6;">do</span> |line|
  line.chomp!
  plasmo.fetch_by_gene_id(line)  <span style="color:#969696;">#fetches the information from Plasmodb.
</span>  <span style="color:#969696;">#print a fasta entry for the protein sequence
</span>  puts <span style="color:#000000;">Bio</span>::<span style="color:#000000;">Sequence</span>.new(plasmo.protein_sequence).output(<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">fasta</span>,<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">header</span>=&gt;line)
  puts <span style="color:#000000;">Bio</span>::<span style="color:#000000;">Sequence</span>.new(plasmo.genomic_sequence).output(<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">fasta</span>,<span style="color:#2e92c7;">:</span><span style="color:#2e92c7;">header</span>=&gt;line)
<span style="color:#0000e6;">end</span>

<span style="color:#969696;">#another example</span><span style="color:#969696;">
#p = Plasmodb.new
#p.fetch_by_gene_id("PFD0020c")
#puts p.genomic_sequence
</span></pre>
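<p>The record URL built inside <code>fetch_by_gene_id</code> can also be assembled with Ruby's standard <code>uri</code> library, which escapes the gene ID for you. What follows is only a minimal sketch and not part of the class above: the helper name <code>plasmodb_record_url</code> is made up for illustration, and the host, path and query parameters are simply copied from the code above.</p>

```ruby
require 'uri'

# Hypothetical helper: assembles the PlasmoDB record URL for one gene ID.
# Host, path and parameter names are copied from fetch_by_gene_id above.
def plasmodb_record_url(gene_id)
  # encode_www_form escapes each value and joins the pairs into a query string
  query = URI.encode_www_form(
    name: 'GeneRecordClasses.GeneRecordClass',
    source_id: gene_id,
    project_id: 'PlasmoDB'
  )
  URI::HTTP.build(host: 'plasmodb.org',
                  path: '/plasmo/showRecord.do',
                  query: query)
end

puts plasmodb_record_url('PFD0020c')
```

<p>The returned object is already a <code>URI</code>, so it could be handed to <code>scraper.scrape</code> without a separate <code>URI.parse</code> step.</p>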
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2009/12/09/a-ruby-class-for-screen-scraping-plasmodb-database/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>

		<media:content url="http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/images/clear.png" medium="image" />
	</item>
		<item>
		<title>PC vs Apple Mac (Not the war!)</title>
		<link>http://biorelated.com/2009/12/09/pc-vs-apple-mac-not-the-war/</link>
		<comments>http://biorelated.com/2009/12/09/pc-vs-apple-mac-not-the-war/#comments</comments>
		<pubDate>Wed, 09 Dec 2009 13:45:58 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[laboratory]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[Mac]]></category>
		<category><![CDATA[PC]]></category>

		<guid isPermaLink="false">http://biorelated.wordpress.com/?p=184</guid>
		<description><![CDATA[My good old PC running Linux is showing its age and recently started failing. The optical drive no longer works and the machine occasionally freezes. The top cover no longer holds, and the TFT screen needs to be supported carefully. While this particular computer has served me well, I am at that [...]]]></description>
			<content:encoded><![CDATA[<p>My good old PC running Linux is showing its age and recently started failing. The optical drive no longer works and the machine occasionally freezes. The top cover no longer holds, and the TFT screen needs to be supported carefully.</p>
<p>While this particular computer has served me well, I am at the point where I need a new machine, but I am torn between an Apple Mac and a PC running Linux. First, my work involves the following:</p>
<ol>
<li>Compiling and running bioinformatics software developed using open source standards and technologies</li>
<li>Programming</li>
<li>Word processing and document editing</li>
<li>Occasional mathematical modeling</li>
<li>Administering Unix-based servers</li>
</ol>
<p>I have tried to come up with a model-agnostic specification for my needs.</p>
<p><strong>Hardware</strong></p>
<p>* High processor speed (2.60 GHz or above)</p>
<p>* Plenty of memory (4 GB or above)</p>
<p>* Medium hard-disk space (160 GB or above)</p>
<p>* Long battery life (5 hours or more)</p>
<p>* Durable external cover</p>
<p>* Ergonomic keys and mouse</p>
<p>* Support for multiple external devices (printers, cameras, microphones, storage devices, monitors)</p>
<p>* Excellent support for wireless technologies</p>
<p>* Support for running multiple operating systems on the same machine</p>
<p><strong>Size and weight</strong></p>
<p>* Lightweight</p>
<p>* Convenient to carry while traveling</p>
<p><strong> Operating System</strong></p>
<p>* A Unix or Linux derived operating system</p>
<p>* Easy to upgrade at zero or minimal cost</p>
<p>* Free patches against known security holes and problems.</p>
<p><strong> Software</strong></p>
<p>* Support for Open software standards</p>
<p>* Support for products from Microsoft, Adobe and other proprietary software vendors</p>
<p><strong>Security</strong></p>
<p>* Excellent inbuilt support against Malware, Trojans and viruses at minimal or no cost</p>
<p>* Support for locking the machine while away or against unauthorized login</p>
<p>* Ability to easily &#8216;tag&#8217; the machine in case of theft</p>
<p><strong>Price :</strong> Affordable and reasonable</p>
<p>Based on these specifications, I have evaluated two kinds of computer that could satisfy my needs.</p>
<p>1. A PC laptop computer running a Linux based operating system</p>
<p>2. An Apple Macintosh laptop computer</p>
<p>I have ruled out a Windows-based machine because Microsoft Windows does not offer good support for open source standards and technologies, OS upgrades are expensive, and the OS is highly prone to malware, Trojans and viruses. Besides, most bioinformatics software and tools are developed in a Unix or Linux environment.</p>
<p>A PC can run Linux, even though one loses out on hardware optimization. Linux has a relatively poor graphical user interface and functionality compared to Mac OS or Windows. Support for document processing, graphics and rich multimedia applications is limited. Linux does not run any of the Microsoft applications natively. There are open source equivalents, but most lack good support.</p>
<p>Apple Macintosh computers are based on Unix and open source technologies, and they support both closed source and open source standards. The hardware is optimized for Mac OS. They offer an excellent graphical user interface and a powerful terminal for interacting with the OS, they are less prone to virus attacks, and they combine long battery life with portability and good ergonomics, at a price roughly equivalent to a PC of the same specifications.</p>
<p>Given my budget constraints, I think a 2.53GHz 13-inch Apple Macintosh with 4GB of memory is best for my needs. There is little price difference between the PC and Macintosh models that meet my specifications. PC models are not built with Linux in mind, and Linux hardware support is not guaranteed. They do, however, come in a wider range of prices depending on the manufacturer, vendor, quality and specifications.</p>
<p>I will keep Linux to run my server applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2009/12/09/pc-vs-apple-mac-not-the-war/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>
	</item>
		<item>
		<title>Bio-graphics, BioSQL and Rails part 2</title>
		<link>http://biorelated.com/2009/01/08/bio-graphics-biosql-and-rails-part-2/</link>
		<comments>http://biorelated.com/2009/01/08/bio-graphics-biosql-and-rails-part-2/#comments</comments>
		<pubDate>Thu, 08 Jan 2009 12:23:50 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[biographics]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[bioruby]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[ruby on rails]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://biorelated.wordpress.com/?p=114</guid>
		<description><![CDATA[In part 1 of this series we created a Rails application and connected it to a BioSQL database. We also overrode the Rails conventions to accommodate our legacy schema. To understand the BioSQL schema, please review the documentation here. A brief overview is as follows. Every record we enter into our database is a [...]]]></description>
			<content:encoded><![CDATA[<p>In <a title="biographics,biosql and rails" href="http://biorelated.wordpress.com/2009/01/07/bio-graphics-biosql-and-rails-part-1/" target="_blank">part 1 of this series</a> we created a Rails application and connected it to a BioSQL database. We also overrode the Rails conventions to accommodate our legacy schema.</p>
<p>To understand the BioSQL schema, please review <a title="biosql overview" href="http://www.biosql.org/wiki/Schema_Overview" target="_blank">the schema documentation</a>. A brief overview is as follows. Every record we enter into our database is a &#8216;bioentry&#8217; and goes into the <span style="color:#0000ff;">bioentry table.</span> A bioentry is composed of the following fields: the record&#8217;s <em>public name</em>, its public accession and version, its description and an identifier field.</p>
<p>The actual sequence data is stored in the <span style="color:#0000ff;">biosequence table</span>, which contains the raw sequence associated with a bioentry together with its alphabet (&#8216;protein&#8217;, &#8216;dna&#8217;, &#8216;rna&#8217;). The sequence lives in its own table because not all records in our database need to be associated with a raw sequence. Additional sequence information (features) is stored in the <span style="color:#0000ff;">seqfeature table</span> together with other qualifiers.</p>
<p>The location of each <span style="color:#0000ff;">seqfeature</span> (or sub-seqfeature) is defined by a location entity, describing the stop and start coordinates and strand. This information is stored in the <span style="color:#0000ff;">location table. </span></p>
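<p>To make the relationships between these tables concrete, here is a small plain-Ruby sketch. It uses <code>Struct</code> rather than ActiveRecord, the field lists are simplified assumptions rather than the full BioSQL columns, and the example records are made up.</p>

```ruby
# Plain-Ruby sketch of the BioSQL entities described above.
# Illustrative only: the real tables have more columns than these Structs.
Location    = Struct.new(:start_pos, :end_pos, :strand)
Seqfeature  = Struct.new(:name, :locations)
Biosequence = Struct.new(:alphabet, :seq)
Bioentry    = Struct.new(:name, :accession, :version, :description,
                         :biosequence, :seqfeatures)

# A made-up bioentry with a raw sequence and one feature at one location
exon  = Seqfeature.new('exon', [Location.new(1, 12, 1)])
entry = Bioentry.new('example_gene', 'ABC123', 1, 'a made-up record',
                     Biosequence.new('dna', 'ATGGCCATTGTA'), [exon])

# A bioentry may have no raw sequence at all, hence the separate table
note = Bioentry.new('note_only', 'XYZ789', 1, 'record without sequence', nil, [])

[entry, note].each do |e|
  seq = e.biosequence
  seq_info = seq ? "#{seq.alphabet}, #{seq.seq.length} residues" : 'no sequence'
  puts "#{e.name}: #{seq_info}, #{e.seqfeatures.size} feature(s)"
end
```

<p>In the Rails application the same links are expressed with <code>has_one</code>, <code>has_many</code> and <code>belongs_to</code> instead of plain attributes.</p>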
<p><span style="color:#000000;">In our Rails application we are going to create some models and a few controllers. In RESTful language, we are actually creating resources. In this example we will keep things simple and create only the biodatabase, taxon, bioentry, biosequence, seqfeature and location resources. We will also create associations between them in their model classes. But before that, </span>delete the <span style="color:#0000ff;">index.html</span> file from your Rails application's public folder and add the following line to your config/routes.rb file:</p>
<pre> <span style="color:#993366;">map.root :controller =&gt; "biosequences"</span></pre>
<p><span style="color:#000000;">To quickly create the model, controller, associated views and a test suite for each of our resources, run the Rails scaffold generator from the application root, passing the name of the model as an argument. For example,</span></p>
<pre><span style="color:#993366;">ruby script/generate scaffold Bioentry</span></pre>
<p><span style="color:#000000;">will create a bioentry model, a bioentries_controller, the associated views (index, show, edit and new) and a migration file, which we do not need in our case. When you finish scaffolding, the routes.rb file should have the following resources declared.</span></p>
<pre><span style="color:#0000ff;"><span style="color:#000000;">  <span style="color:#993366;">map.resources :seqfeatures
  map.resources :locations
  map.resources :bioentries
  map.resources :biosequences
  map.resources :taxon
  map.resources :biodatabases

</span></span></span></pre>
<p>Let us create some mandatory associations for the models.</p>
<p>Edit the /models/biodatabase.rb file by adding the following</p>
<pre> <span style="color:#993366;">has_many :bioentries #a biodatabase is associated with many bioentries
 validates_uniqueness_of :name  #The name for each biodatabase is unique!</span></pre>
<p>Edit the /models/bioentry.rb file by adding the following</p>
<pre>    <span style="color:#993366;">belongs_to :biodatabase
    belongs_to :taxon
    has_one :biosequence</span></pre>
<p>Edit the /models/taxon.rb and add</p>
<pre>   <span style="color:#993366;">has_one :bioentry</span></pre>
<p>Edit the /models/biosequence.rb file by adding:</p>
<pre>  <span style="color:#993366;">set_primary_key :bioentry_id</span> #biosequence uses bioentry_id as a primary key!
<span style="color:#993366;">  belongs_to :bioentry</span></pre>
<p>Edit the /models/location.rb file by adding:</p>
<pre> <span style="color:#993366;">belongs_to :seqfeature</span></pre>
<p>Edit the /models/seqfeature.rb file by adding:</p>
<pre>  <span style="color:#993366;">belongs_to :bioentry
  has_many :locations</span></pre>
<p>Note that you will most likely be adding huge files to the database. BioSQL comes with a set of Perl scripts for doing that. Until BioRuby 1.3 is released, you will have to use these Perl scripts to load large datasets. All the documentation for them is available from the BioSQL website. I tried the Perl script load_ncbi_taxonomy.pl, which ships with BioSQL, to load taxon data into my database. (It did not seem to work on my system; I will sort that out later.)</p>
<p>To keep this post short and get to the meat of it, I will assume that you have some existing data in your BioSQL database. If not, create some dummy data to populate the biodatabase, bioentry, biosequence, seqfeature and location tables. In Part 3, I will show you how to create the necessary views to populate the database. After all, biologists don&#8217;t want to interact with raw SQL queries and sometimes have no idea how to run scripts; however, they are very web savvy!</p>
<p>Edit the /biosequences/show.html.erb to look as follows:</p>
<pre><span style="color:#993366;">&lt;h2&gt;&lt;%= @biosequence.bioentry.name%&gt;(&lt;%= @biosequence.alphabet %&gt;)&lt;/h2&gt;</span>
<span style="color:#993366;">&lt;p&gt;Sequence&lt;/p&gt;</span>
<span style="color:#993366;">&lt;%= @biosequence.seq %&gt;&lt;br/&gt;</span>
<span style="color:#993366;">
</span>
<span style="color:#993366;">&lt;%= link_to 'Edit', edit_biosequence_path(@biosequence) %&gt; </span></pre>
<p>Now navigate to http://localhost:3000/biosequences/1</p>
<p>and then navigate to http://localhost:3000/biosequences/1.xml. The XML version of your sequence is also available!</p>
<p>Let&#8217;s add the ability to render graphics for the sequences.</p>
<p>Add the following lines at the top of the <strong>biosequence.rb </strong>model file</p>
<pre> <span style="color:#993366;">require 'stringio'
 require 'base64' </span></pre>
<p>In the <strong>biosequence.rb</strong> model class, create a new method called <span style="color:#993366;">draw_graphic</span>.</p>
<pre><span style="color:#993366;">def self.draw_graphic(value)
      #get the name and length of the main feature to be drawn
     main_feature = Bioentry.find(value)
     len = main_feature.biosequence.length.to_i
     name = main_feature.name

    #create a Biographics panel and add a track
      </span><span style="color:#993366;">@my_panel = Bio::Graphics::Panel.new(len,:width=&gt; 900)</span><span style="color:#993366;">
      @track = @my_panel.add_track("#{name}",:glyph=&gt;'directed_generic')

     #specify the range for the main feature
     main_feature_range = "1..#{len}"
      @track.add_feature(Bio::Feature.new("#{name}",main_feature_range), :label=&gt;" ")

    #write the output to memory
        output = StringIO.new
        @my_panel.draw(output)
        return output.string
  end

</span></pre>
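<p>The "write the output to memory" step at the end of <code>draw_graphic</code> relies on <code>StringIO</code> behaving like a file that lives in memory. The sketch below isolates that pattern. <code>FakePanel</code> is a made-up stand-in for <code>Bio::Graphics::Panel</code> so that the example runs without the gem, and it writes a line of text rather than a real image.</p>

```ruby
require 'stringio'

# Hypothetical stand-in for Bio::Graphics::Panel: any object with a
# draw(io) method that writes its output to the IO-like object it is given.
class FakePanel
  def initialize(width)
    @width = width
  end

  def draw(io)
    # A real panel would write image data here; this writes placeholder text
    io.write("panel #{@width}px wide\n")
  end
end

# Same pattern as draw_graphic: draw into an in-memory buffer,
# then return the buffer contents as a String.
def render_to_string(panel)
  output = StringIO.new
  panel.draw(output)
  output.string
end

puts render_to_string(FakePanel.new(900))
```

<p>Because nothing is written to disk, the resulting string can be passed straight to <code>send_data</code> in a controller.</p>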
<p>This method will be called by an action method in the <strong>biosequences_controller.rb</strong> file.</p>
<pre>  <span style="color:#993366;">def to_image
    begin
      image = Biosequence.draw_graphic(Biosequence.find(params[:id]))
      send_data(image, :filename =&gt; "graphic.svg", :disposition =&gt; "inline")
    rescue  ActiveRecord::RecordNotFound
      add_error("Error:Attempt to call image without specifying a biosequence  ID")
      redirect_to :action=&gt;'index'
    end
  end</span></pre>
<p>We add a rescue block to capture record-not-found errors. In a RESTful application, map.resources only generates routes for the seven standard actions (index, show, new, create, edit, update and destroy), so our extra to_image action needs a route of its own. We declare it as a collection route on the biosequences resource in <strong>routes.rb</strong>:</p>
<pre>  <span style="color:#993366;">map.resources :biosequences,:collection=&gt;{:to_image=&gt;:get}</span></pre>
<p>Now we need to modify our /biosequences/show.html.erb file, to enable rendering of the graphic. For that we will create a helper method so that our show.html.erb view is &#8216;clean&#8217;. In helpers/<strong>biosequences_helper.rb</strong> file, add the following code</p>
<pre>  <span style="color:#993366;">def render_image(feature_obj)
     image_tag(url_for({:action=&gt;'to_image',:id=&gt;feature_obj}))
  end</span></pre>
<p>And in the /views/biosequences/show.html.erb file add the following line of code</p>
<pre><span style="color:#993366;">&lt;%= render_image(@biosequence) %&gt;&lt;br/&gt;</span></pre>
<p>Now assuming  that you have a biosql database with valid data, navigate to</p>
<p>http://localhost:3000/biosequences/show/1</p>
<div id="attachment_145" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-145" title="screenshot-biosequences-show-mozilla-firefox" src="http://biorelated.files.wordpress.com/2009/01/screenshot-biosequences-show-mozilla-firefox.png?w=300&h=221" alt="screenshot" width="300" height="221" /><p class="wp-caption-text">screenshot</p></div>
<p>The above is a screenshot from my example application, taken while I was writing this tutorial.</p>
<p>The source code for this example application <a title="source code" href="http://github.com/georgeG/biosql_rails_example/tree/master" target="_blank">is available from github</a>.</p>
<p>For a full review of the methods available for <a title="bio-graphics git repository" href="http://github.com/jandot/bio-graphics/tree/master" target="_blank">biographics please check the project&#8217;s git repository and the rdoc. </a></p>
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2009/01/08/bio-graphics-biosql-and-rails-part-2/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>

		<media:content url="http://biorelated.files.wordpress.com/2009/01/screenshot-biosequences-show-mozilla-firefox.png?w=300" medium="image">
			<media:title type="html">screenshot-biosequences-show-mozilla-firefox</media:title>
		</media:content>
	</item>
		<item>
		<title>Bio-graphics, BioSQL and Rails part 1</title>
		<link>http://biorelated.com/2009/01/07/bio-graphics-biosql-and-rails-part-1/</link>
		<comments>http://biorelated.com/2009/01/07/bio-graphics-biosql-and-rails-part-1/#comments</comments>
		<pubDate>Wed, 07 Jan 2009 11:50:28 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[biographics]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[bioruby]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[ruby on rails]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://biorelated.wordpress.com/?p=25</guid>
		<description><![CDATA[In this series I will show you how to quickly add graphics support to a bioinformatics database Rails application. We are going to use the biographics library by Jan Aerts, the BioSQL database schema, and Rails 2.2.2 (also works with 2.3.2). In this simple example we want to represent a sequence as a graphic, such [...]]]></description>
			<content:encoded><![CDATA[<p>In this series I will show you how to quickly add graphics support to a bioinformatics database Rails application. We are going to use the <a title="Biographics" href="http://github.com/jandot/bio-graphics/tree/master" target="_blank">biographics library</a> by <a title="Jan Aerts" href="http://saaientist.blogspot.com/" target="_blank">Jan Aerts</a>, the <a title="Biosql download" href="http://www.biosql.org/wiki/Downloads" target="_blank">BioSQL database schema</a>, and <a title="ruby on rails" href="http://www.rubyonrails.com" target="_blank">Rails</a> 2.2.2 (also works with 2.3.2). In this simple example we want to represent a sequence as a graphic that we can view in a web browser, more or less the way <a title="Gbrowse" href="http://gmod.org/wiki/Gbrowse" target="_blank">Gbrowse</a> works. Each main feature has different subfeatures at different locations along it.</p>
<pre>------------------------------------------------------------------- main feature
------- -------- ------------ -------- ------                       subfeatures</pre>
<p>We need the following installed: rails 2.1.1, bio 2.1 and biographics 1.4 (all available as gems), plus a database based on the BioSQL schema.</p>
<p>We need to download the BioSQL schema <a title="download bioSQL" href="http://www.biosql.org/wiki/Downloads" target="_blank">located here</a>. The latest version as of this writing is v1.0.1 of the BioSQL v1.0 release (code-named Tokyo). Create a database called biosql_development. I am on Ubuntu Linux with MySQL 5.0.</p>
<pre><span style="color:#993366;">george:&gt;mysql -u george -p</span>
<span style="color:#993366;">:enter password</span>
<span style="color:#993366;">Welcome to the MySQL monitor.  Commands end with ; or \g.</span>
<span style="color:#993366;">Your MySQL connection id is 27</span>
<span style="color:#993366;">Server version: 5.0.51a-3ubuntu5.4 (Ubuntu)</span>
<span style="color:#993366;">
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.</span>
<span style="color:#993366;">
mysql&gt; create database biosql_development;
Query OK, 1 row affected (0.02 sec)

mysql&gt; 

</span></pre>
<p><span style="color:#000000;">I have created a database called biosql_development. Why am I not using migrations? The reason is that BioSQL has agreed standards for table names and schema conventions that are not compatible with the Rails conventions for creating databases and naming tables. However, Rails allows us to override its default conventions when working with legacy databases, as is the case here.<br />
</span></p>
<p><span style="color:#000000;">After creating the database, load the BioSQL schema to the empty database. First we need to tell mysql which database to use.</span></p>
<pre><span style="color:#993366;">mysql&gt; use biosql_development;</span></pre>
<p><span style="color:#000000;">then load the schema</span></p>
<pre><span style="color:#993366;">mysql&gt; source /home/george/Desktop/downloadsfolder/biosql-1.0.1/sql/biosqldb-mysql.sql;
Query OK, 0 rows affected, 1 warning (0.48 sec)

Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected, 1 warning (0.01 sec)

Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected, 1 warning (0.01 sec)

 ........ truncated</span>
<span style="color:#993366;">mysql&gt;</span></pre>
<p><span style="color:#000000;">Now we need to create a rails application and connect to this database.<br />
</span></p>
<p><span style="color:#000000;">I use the <a title="Netbeans" href="http://www.netbeans.com" target="_blank">NetBeans IDE</a> for creating Ruby and Rails applications. Go ahead and create a Rails application, specifying MySQL as the database adapter.</span></p>
<p><span style="color:#000000;">To connect to our legacy database, we need to override some conventions. First, disable table name pluralization, and tell Rails that the primary key of each table is named tablename_id rather than the plain id column Rails expects. To do that:<br />
</span></p>
<p><span style="color:#000000;">Create a new file in your application's config/initializers directory called <strong>override_rails.rb</strong> (you can call it whatever you like).</span></p>
<pre><span style="color:#993366;">class ActiveRecord::Base
  # BioSQL table names are singular (e.g. bioentry), so disable pluralization
  self.pluralize_table_names = false
  # BioSQL primary keys are named tablename_id (e.g. bioentry_id)
  self.primary_key_prefix_type = :table_name_with_underscore
end</span></pre>
<p>The two lines above tell ActiveRecord not to expect plural table names and that the primary key of each table follows the tablename_id format.</p>
<p><span style="color:#000000;">Also create another file called <strong>external_libraries.rb</strong> in the initializers directory. As you can tell from the name, this is where I put the require statements for loading external libraries.<br />
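<p>To make the effect of these two settings concrete, here is a small plain-Ruby sketch. The helper <code>expected_table_and_key</code> is hypothetical and only mimics the naming rules for illustration; ActiveRecord works the names out internally, and the pluralization below is deliberately naive.</p>

```ruby
# Hypothetical helper for illustration only; it is not part of ActiveRecord.
# The pluralization here is deliberately naive (it just appends "s").
def expected_table_and_key(model_name, pluralize_table_names, primary_key_prefix_type)
  base = model_name.downcase
  table = pluralize_table_names ? "#{base}s" : base
  key = primary_key_prefix_type == :table_name_with_underscore ? "#{base}_id" : "id"
  [table, key]
end

# Rails defaults: plural table name, primary key called id
p expected_table_and_key("Seqfeature", true, nil)

# With the overrides from override_rails.rb, matching the BioSQL seqfeature table
p expected_table_and_key("Seqfeature", false, :table_name_with_underscore)
```

<p>With the overrides in place, a model named Seqfeature lines up with the BioSQL seqfeature table and its seqfeature_id primary key without any per-model configuration.</p>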
</span></p>
<pre><span style="color:#993366;">require 'rubygems'

#load the bioinformatics library
require 'bio'

#load the biographics library
require 'bio-graphics'

#load the sql views extension library
gem 'rails_sql_views'
require 'rails_sql_views'
</span></pre>
<p><span style="color:#000000;">This file loads our gems. The rails_sql_views gem allows us to create views and access them by creating models corresponding to the views. </span></p>
<p><span style="color:#000000;">At this point, if you run rake db:schema:dump, you get a Rails-based version of the BioSQL schema, which you can conveniently use to create a BioSQL database on any relational database that Rails supports, including Microsoft SQL Server, DB2, Oracle, SQLite and a host of others. All that is required is to change the database.yml file to suit the adapter of choice and then run rake db:schema:load to load the BioSQL schema.</span></p>
<p><span style="color:#000000;">Please note that if you are using Rails 2.2.2, you may want to comment out the lines</span></p>
<pre><span style="color:#993366;">unless Kernel.respond_to?(:gem)
  Kernel.send :alias_method, :gem, :require_gem</span>
end</pre>
<p><span style="color:#000000;">in rails_sql_views (0.6.1); </span>otherwise running db:schema:dump will cause rake to abort.</p>
<p><span style="color:#000000;">In the next part I will describe how to create the necessary resources for our <a title="REST" href="http://en.wikipedia.org/wiki/Representational_State_Transfer" target="_blank">RESTful</a> (Representational State Transfer) bioinformatics web application and how to render the graphics.<br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2009/01/07/bio-graphics-biosql-and-rails-part-1/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>
	</item>
		<item>
		<title>Approximate string matching metrics with amatch</title>
		<link>http://biorelated.com/2009/01/06/approximate-string-matching-metrics-with-amatch/</link>
		<comments>http://biorelated.com/2009/01/06/approximate-string-matching-metrics-with-amatch/#comments</comments>
		<pubDate>Tue, 06 Jan 2009 14:18:49 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[bioruby]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://biorelated.wordpress.com/?p=75</guid>
		<description><![CDATA[In sequence analysis we often want to know how similar two sequences are. How can we quantify similarity with a metric? That was my question yesterday, and I went hunting for a Ruby implementation of such metrics. Luckily I found a library called amatch, which is an approximate string matching extension for Ruby! [...]]]></description>
			<content:encoded><![CDATA[<p>In sequence analysis we often want to know how similar two sequences are. How can we quantify similarity with a metric? That was my question yesterday, and I went hunting for a Ruby implementation of such metrics. Luckily I found a library called <a title="amatch" href="http://amatch.rubyforge.org/" target="_blank">amatch</a>, which is an approximate string matching extension for Ruby! amatch implements the following metrics:</p>
<p>Hamming distance, Levenshtein edit distance, the longest subsequence common to two strings, the longest substring common to two strings, Sellers distance, and pair distance, which is based on the number of adjacent character pairs that are contained in both strings.</p>
<p><strong>Hamming distance</strong></p>
<p>This is the number of positions at which two strings of the same length differ. It is not recommended for the majority of string-based information retrieval, because very similar strings can sometimes be given high Hamming distances: a single inserted character, for example, shifts every character after it out of position.</p>
<p><strong>Levenshtein edit distance</strong></p>
<p>This is defined as the minimal cost of transforming one string into the other by deleting, inserting and substituting single characters. The algorithm can associate a cost with each of these operations; for this metric the cost of each is usually 1.</p>
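<p>As a rough illustration of why a small shift hurts this metric, here is a plain-Ruby sketch. The helper <code>hamming_distance</code> is my own hypothetical name and is not how amatch implements the metric; it also assumes both strings have the same length and raises otherwise.</p>

```ruby
# Hypothetical helper, not part of amatch: counts the positions at which
# two equal-length strings differ.
def hamming_distance(a, b)
  raise ArgumentError, "strings must have equal length" unless a.length == b.length
  a.chars.zip(b.chars).count { |x, y| x != y }
end

original    = "actagatatttgat"
substituted = "actagatatttgaa"            # the last base is changed
shifted     = "t" + original[0...-1]      # one base inserted in front, last base dropped

puts hamming_distance(original, substituted)
puts hamming_distance(original, shifted)
```

<p>The substituted sequence differs in a single position, while the shifted one, which still contains almost the whole original sequence, mismatches at most positions.</p>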
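<p>The definition above can be sketched in plain Ruby with the usual dynamic programming table, kept one row at a time. This is only an illustration with every operation costing 1; <code>levenshtein_distance</code> is a hypothetical name and the code is not taken from amatch.</p>

```ruby
# Hypothetical helper, not amatch's implementation: unit-cost edit distance.
def levenshtein_distance(a, b)
  # previous[j] is the distance between the prefix of a handled so far and b[0, j]
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]  # turning the first i characters of a into "" takes i deletions
    b.each_char.with_index(1) do |char_b, j|
      substitution = previous[j - 1] + (char_a == char_b ? 0 : 1)
      deletion     = previous[j] + 1
      insertion    = current[j - 1] + 1
      current.push([substitution, deletion, insertion].min)
    end
    previous = current
  end
  previous.last
end

puts levenshtein_distance("kitten", "sitting")
```

<p>Unlike the Hamming distance, this metric copes with strings of different lengths, because insertions and deletions are allowed.</p>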
<p><strong>Longest common substring</strong></p>
<p>This is defined as the longest contiguous run of characters that exists in both strings. The longer the substring, the better the match between the two strings. The problem with this approach is that a difference introduced in the middle of one string makes the two strings look further apart than the same difference introduced at the beginning, because it cuts the longest shared run roughly in half.</p>
<p><strong>Longest common subsequence</strong></p>
<p>The longer the common subsequence is, the more similar the two strings are. In this case a subsequence does not have to be contiguous.</p>
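<p>To see how the position of a difference matters, here is a plain-Ruby sketch that finds the longest common substring with a simple dynamic programming pass. The name <code>longest_common_substring</code> is hypothetical and the code is an illustration only, not amatch's implementation.</p>

```ruby
# Hypothetical helper, not amatch's implementation.
# Returns the first longest contiguous run of characters shared by a and b.
def longest_common_substring(a, b)
  best_length = 0
  best_end = 0  # index in a just past the end of the best run found so far
  previous = Array.new(b.length + 1, 0)
  a.each_char.with_index(1) do |char_a, i|
    current = Array.new(b.length + 1, 0)
    b.each_char.with_index(1) do |char_b, j|
      next unless char_a == char_b
      # extend the run that ended at the previous character of both strings
      current[j] = previous[j - 1] + 1
      if current[j] > best_length
        best_length = current[j]
        best_end = i
      end
    end
    previous = current
  end
  a[best_end - best_length, best_length]
end

original = "actagatatttgat"
puts longest_common_substring(original, "gctagatatttgat")  # first base changed
puts longest_common_substring(original, "actagatgtttgat")  # a middle base changed
```

<p>Changing the first base leaves a shared run of 13 bases, whereas changing a base in the middle leaves only 7, even though both sequences differ from the original by a single base.</p>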
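<p>For comparison, the length of the longest common subsequence can be sketched in plain Ruby as follows. Again this is only an illustration; <code>longest_common_subsequence_length</code> is a hypothetical name and the code is not amatch's implementation.</p>

```ruby
# Hypothetical helper, not amatch's implementation.
# Returns the length of the longest (not necessarily contiguous) common subsequence.
def longest_common_subsequence_length(a, b)
  previous = Array.new(b.length + 1, 0)
  a.each_char do |char_a|
    current = [0]
    b.each_char.with_index(1) do |char_b, j|
      if char_a == char_b
        current.push(previous[j - 1] + 1)
      else
        current.push([previous[j], current[j - 1]].max)
      end
    end
    previous = current
  end
  previous.last
end

original = "actagatatttgat"
puts longest_common_subsequence_length(original, "actagatgtttgat")  # a middle base changed
```

<p>Because gaps are allowed, a single changed base costs only one character of the common subsequence, wherever in the sequence it occurs.</p>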
<p>Look at the <a title="amatch documentation" href="http://amatch.rubyforge.org/doc/index.html" target="_blank">documentation</a> for more explanations of the metrics and algorithms.</p>
<p>To use the library you need to first install the gem. I installed it on my Linux box running Ubuntu and ruby 1.8.6.</p>
<pre><span style="color:#993366;">sudo gem install amatch</span></pre>
<p>Then, in a script:</p>
<pre><span style="color:#993366;">require 'rubygems'
</span>
<span style="color:#993366;">require 'amatch'</span></pre>
<pre><span style="color:#993366;">include Amatch</span><span style="color:#993366;">
require  'bio'</span></pre>
<pre><span style="color:#993366;">#with bioruby it would be easy to compare two sequence entries  for example</span></pre>
<pre><span style="color:#993366;">seq_obj1 = Bio::Sequence.auto("actagatatttgat")
seq_obj2 = Bio::Sequence.auto("gccagatagttaat")

#calculate the hamming distance
m = Hamming.new(seq_obj1.seq)
m.match(seq_obj2.seq)
#=&gt;

#calculate pair-distances between the two sequences
pair_distance_obj = PairDistance.new(seq_obj1.seq)
pair_distance_obj.match(seq_obj2.seq)
#=&gt;
# note that you can pass plain strings directly to the metric constructors
# without creating the sequence objects!
</span></pre>
<p>Note that amatch failed to install on Windows XP with the following error:</p>
<p><span style="color:#ff6600;">Building native extensions.  This could take a while&#8230;<br />
ERROR:  Error installing amatch:<br />
ERROR: Failed to build gem native extension.</span></p>
<p><span style="color:#ff6600;">C:/ruby-1.8.6/ruby/bin/ruby.exe extconf.rb install amatch<br />
creating Makefile</span></p>
<p><span style="color:#ff6600;">nmake<br />
&#8216;nmake&#8217; is not recognized as an internal or external command,<br />
operable program or batch file.</span></p>
<p>This happened even though I have nmake installed on my Windows machine. I will look into that later.</p>
<p>Happy string matching!</p>
]]></content:encoded>
			<wfw:commentRss>http://biorelated.com/2009/01/06/approximate-string-matching-metrics-with-amatch/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d9e14f1be0972ff1f393cc87dbd072e1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">george_g</media:title>
		</media:content>
	</item>
	</channel>
</rss>
