jobu | 25 Jan 11:20 2013
Picon

Re: Parsing Blast-Report extracting "Features flanking .."

Am 22.01.2013 19:03, schrieb Mgavi Brathwaite:
> What upstream and downstream elements are you interested in?

I've got a huge pile of short RNA reads.
Part of the question now is whether those RNA fragments originate from
siRNA events,
or may represent miRNAs / parts of pre-miRNAs.

So I did an online  blast search against database nt.
The resulting report quite often just gives subject information like this:

-----
> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
Length=23459830
-----

Now I would like to get the hit's neighbouring regions  for further
analysis.
Preferably I would like to do that  in an automized way, but the only
possible action with this kind of subject gi | description would be to
fetch the entire chromosomal  sequence I guess ?

However,
right below the line above, the report states more precisely:

------
 Features flanking this part of subject sequence:
   8872 bp at 5' side: cytochrome P450 90B1
   402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K
------

Still I would like to have the possibility to automatically fetch the
subject's sequence(s),
as of now I think  parsing the report with SearchIO won't let me aquire
that information, because SearchIO does not recognize report sections
like those.

I hope I did not miss any of SearchIOs capabilities, but I could not
find any method covering my wish?!

Right now maybe the only way to get the information I want is to
construct my own parser and write it out into a separate file, which in
turn again  I could read into a hash before processing the Blast-Report
with SearchIO to combine both data for further automized work.

I am aware though that even successfully getting the flanking features
would leave me with the more or less wide  intergenic gap my hsp is
located in.

However I'm in need of a way to get the flanking features including
their annotation and the region spanning between them.
But I hope I do not have to get complete sequences to accomplish that,
as this would be kind of an overkill.

with kind regards
Jochen

Gmane