Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane
From: Jason Stajich <jason.stajich <at> gmail.com>
Subject: Re: Parsing Blast-Report extracting "Features flanking .."
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Tuesday 29th January 2013 19:00:16 UTC (over 3 years ago)
We don't parse the NCBI feature info from the BLAST reports per your query.
To look up a specific feature you can use Bio::DB::GenBank to query for
sequence from a specific feature by accession number - see the HOWTOs for
that.

However, most people use tools that generate SAM/BAM files with short reads
- then you can use a tool like bedtools to find overlaps of reads with the
locations of features.

basically:
- download the genome and GFF for arabidopsis
- align your sRNA to the genome with a short read aligner - bowtie, bwa,
others
- convert your sam to bam file with SAMtools or picard
- compare the location of features with the reads to get expression
summaries or individuals reads with BEDTools


On Jan 25, 2013, at 2:20 AM, jobu  wrote:

> Am 22.01.2013 19:03, schrieb Mgavi Brathwaite:
>> What upstream and downstream elements are you interested in?
> 
> 
> I've got a huge pile of short RNA reads.
> Part of the question now is whether those RNA fragments originate from
> siRNA events,
> or may represent miRNAs / parts of pre-miRNAs.
> 
> So I did an online  blast search against database nt.
> The resulting report quite often just gives subject information like
this:
> 
> -----
>> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
> Length=23459830
> -----
> 
> Now I would like to get the hit's neighbouring regions  for further
> analysis.
> Preferably I would like to do that  in an automized way, but the only
> possible action with this kind of subject gi | description would be to
> fetch the entire chromosomal  sequence I guess ?
> 
> However,
> right below the line above, the report states more precisely:
> 
> ------
> Features flanking this part of subject sequence:
>   8872 bp at 5' side: cytochrome P450 90B1
>   402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K
> ------
> 
> Still I would like to have the possibility to automatically fetch the
> subject's sequence(s),
> as of now I think  parsing the report with SearchIO won't let me aquire
> that information, because SearchIO does not recognize report sections
> like those.
> 
> I hope I did not miss any of SearchIOs capabilities, but I could not
> find any method covering my wish?!
> 
> Right now maybe the only way to get the information I want is to
> construct my own parser and write it out into a separate file, which in
> turn again  I could read into a hash before processing the Blast-Report
> with SearchIO to combine both data for further automized work.
> 
> I am aware though that even successfully getting the flanking features
> would leave me with the more or less wide  intergenic gap my hsp is
> located in.
> 
> However I'm in need of a way to get the flanking features including
> their annotation and the region spanning between them.
> But I hope I do not have to get complete sequences to accomplish that,
> as this would be kind of an overkill.
> 
> with kind regards
> Jochen
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
[email protected]
[email protected]
 
CD: 15ms