Plasmodium falciparum re-annotation workshop opens
Posted: October 22, 2007 Filed under: bioinformatics, databases, malaria | Tags: annotation, bioinformatics, malaria, plasmodium
The Plasmodium genome re-annotation workshop opened on 21 October at the Sanger Institute. The workshop runs until the 26th and aims to re-annotate the P. falciparum genome. In a welcome message, Prof David Roos pointed out that a major goal of the workshop is to ascribe new or updated functions to gene models, reflecting the current state of knowledge in the wider malaria community.
The Plasmodium falciparum sequencing project was completed in 2002. Since then, the PlasmoDB database, currently at version 5, has been the primary source of P. falciparum data and genomic information. With 60% of P. falciparum genes annotated as hypothetical, it is time to reduce that number by providing annotations where they are known and possible.
Issues to be addressed include:
- standards for the use of structured gene ontologies in gene/genome annotation
- naming conventions for “hypothetical proteins”, “conserved hypotheticals”, “putative kinases”, etc
- naming conventions for large gene families
- standards for inferring function from orthology, motif/domain conservation, or ‘guilt by association’ based on functional genomics data
- standards for transferring annotations to orthologs in other Plasmodium species
- plans and proposals for further Plasmodium sequencing and other genomics resources
- pipelines for ensuring currency and consistency of data in GenBank/EMBL, GeneDB, PlasmoDB, etc
- future requirements and needs for Plasmodium informatics resources
- annotation projects not completed during the workshop … and strategies for ensuring completion
The workshop is sponsored by the Sanger Institute and PlasmoDB.
Standalone BLAST with Ruby (windows)
Posted: October 3, 2007 Filed under: bioinformatics, blast, ruby, technology | Tags: bioinformatics, blast, ruby, technology
#create a query sequence
myseq="pcaatcacatyyawwqqffgghhhkllkl"
#create a temporary file
require 'tempfile'
temp=Tempfile.new("seqfile")
#get the name of the temporary file
name=temp.path
#write your sequence to this temporary file
temp.puts myseq
temp.close
#since we have a protein query sequence, we will run a blastp.
#Please note that you will need a valid database to query against.
#Use the formatdb command to create your database before executing the lines below.
@program = 'blastp'
#path to the BLAST database
@database = 'c:/path_to_databasefile'
#name of your query file
@input= name
#your blast output file
@output='c:/path_output_file'
#assume your blast is in a folder called NCBI_Blast, execute
system( "c:/NCBI_Blast/bin/blastall.exe -p #{@program} -d #{@database} -i #{@input} -o #{@output}")
#To capture the output in a variable, execute this command instead.
#Note that we have omitted the -o parameter, so blastall writes its report to standard output.
result=%x(c:/NCBI_Blast/bin/blastall.exe -p #{@program} -d #{@database} -i #{@input} )
#remember to delete the temporary file!
temp.close(true)
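The steps above can also be wrapped into small reusable methods. What follows is only a sketch under a few assumptions: the method names (blast_command, with_query_file, run_blast) are hypothetical and not part of any BLAST or Ruby library, the query is written with a FASTA header line (">query"), which the original script does not do, and the block syntax and keyword arguments need a more recent Ruby than the one current when this post was written. The ensure clause deletes the temporary file even if BLAST fails, and passing the command as an array avoids problems with spaces in Windows paths.

```ruby
require 'tempfile'

# Hypothetical helper: build the blastall argument list.
# The -o flag is added only when an output file is given.
def blast_command(blastall, program, database, input, output = nil)
  cmd = [blastall, '-p', program, '-d', database, '-i', input]
  cmd += ['-o', output] if output
  cmd
end

# Hypothetical helper: write the sequence to a temporary file,
# yield the file's path, and always delete the file afterwards.
# The FASTA header (">query") is an assumption, not in the original script.
def with_query_file(sequence, id = 'query')
  temp = Tempfile.new('seqfile')
  temp.puts ">#{id}"
  temp.puts sequence
  temp.close                     # close before BLAST reads it (needed on Windows)
  yield temp.path
ensure
  temp.close(true) if temp       # close and delete the temporary file
end

# Hypothetical helper: run blastall and return its report as a string.
def run_blast(sequence, blastall:, program:, database:)
  with_query_file(sequence) do |path|
    IO.popen(blast_command(blastall, program, database, path), &:read)
  end
end

# Example call (paths are the placeholders used in the post):
# result = run_blast("pcaatcacatyyawwqqffgghhhkllkl",
#                    blastall: 'c:/NCBI_Blast/bin/blastall.exe',
#                    program:  'blastp',
#                    database: 'c:/path_to_databasefile')
```

The example call is left commented out because it only works once blastall is installed and the database has been formatted with formatdb.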