Subject: Re: Parsing Blast-Report extracting "Features flanking .."
Date: Friday 25th January 2013 10:20:22 UTC
On 22.01.2013 19:03, Mgavi Brathwaite wrote:
> What upstream and downstream elements are you interested in?

I've got a huge pile of short RNA reads. Part of the question now is whether those RNA fragments originate from siRNA events or may represent miRNAs / parts of pre-miRNAs. So I ran an online BLAST search against the nt database. The resulting report quite often gives only subject information like this:

-----
> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
Length=23459830
-----

Now I would like to get the hit's neighbouring regions for further analysis, preferably in an automated way. But with only this kind of subject GI / description to go on, the only possible action would be to fetch the entire chromosomal sequence, I guess?

However, right below the line above, the report states more precisely:

------
Features flanking this part of subject sequence:
   8872 bp at 5' side: cytochrome P450 90B1
   402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K
------

I would still like to be able to fetch the subject's sequence(s) automatically. As of now I think parsing the report with SearchIO won't let me acquire that information, because SearchIO does not recognize report sections like those. I hope I did not miss any of SearchIO's capabilities, but I could not find any method covering my wish.

Right now maybe the only way to get the information I want is to write my own parser and have it write the flanking features to a separate file, which I could then read into a hash before processing the BLAST report with SearchIO, combining both sets of data for further automated work.

I am aware, though, that even successfully getting the flanking features would leave me with the more or less wide intergenic gap my HSP is located in. However, I need a way to get the flanking features, including their annotation, and the region spanning between them.
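
The separate-parser idea could look roughly like the sketch below. It is written in plain Python rather than Perl, purely for illustration, and it assumes the text report looks like the two excerpts quoted above: a subject line starting with ">" and the flanking lines directly under the "Features flanking" header. The function name and the dictionary keys are made up for this sketch and are not part of SearchIO or any other library.

```python
import re

# Subject header as in the excerpt, e.g.
# "> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence"
SUBJECT_RE = re.compile(r"^>\s*(\S+)\s+(.*)$")
# One flanking feature, e.g. "8872 bp at 5' side: cytochrome P450 90B1"
FLANK_RE = re.compile(r"^\s*(\d+)\s+bp at ([35])' side:\s*(.+?)\s*$")
FLANK_HEADER = "Features flanking this part of subject sequence:"


def parse_flanking_features(lines):
    """Map each subject ID to the list of flanking features found below it.

    Hypothetical helper: assumes the layout shown in the report excerpts.
    """
    flanks = {}
    subject = None
    in_block = False
    for line in lines:
        m = SUBJECT_RE.match(line)
        if m:
            # A new subject starts; remember its ID for the lines that follow.
            subject = m.group(1)
            in_block = False
            continue
        if line.strip() == FLANK_HEADER:
            in_block = True
            continue
        if in_block:
            m = FLANK_RE.match(line)
            if m and subject is not None:
                flanks.setdefault(subject, []).append({
                    "distance_bp": int(m.group(1)),
                    "side": m.group(2) + "'",
                    "feature": m.group(3),
                })
            else:
                # First non-matching line ends the flanking block.
                in_block = False
    return flanks


if __name__ == "__main__":
    excerpt = [
        "> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence",
        "Length=23459830",
        " Features flanking this part of subject sequence:",
        "   8872 bp at 5' side: cytochrome P450 90B1",
        "   402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K",
    ]
    print(parse_flanking_features(excerpt))
```

The resulting dictionary plays the role of the hash mentioned above: keyed by subject ID, it can be looked up while the same report is walked through with the regular BLAST parser. If a subject has several HSPs, each with its own flanking section, all entries end up in the same list in report order.
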
But I hope I do not have to get complete sequences to accomplish that, as this would be kind of overkill.

With kind regards,
Jochen
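
On the worry about downloading whole chromosomes: NCBI's E-utilities efetch accepts seq_start and seq_stop parameters, so one possibility is to request only the hit plus its flanking gap. The sketch below (Python, for illustration only) just builds such a URL and does not contact the server. The hit coordinates in the usage example are invented, since the post does not give them, and the sketch assumes a plus-strand hit where the 5' distance lies before the hit start and the 3' distance after the hit end. The function name subregion_url is hypothetical.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def subregion_url(accession, hit_start, hit_end, flank_5p, flank_3p,
                  margin=0, rettype="fasta"):
    """Build an efetch URL covering the hit and the gap up to both flanking features.

    Coordinates are 1-based and inclusive. Assumes a plus-strand hit; for
    minus-strand hits the two distances may have to be swapped.
    """
    start = max(1, hit_start - flank_5p - margin)  # never go below position 1
    stop = hit_end + flank_3p + margin
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,   # "fasta" for sequence only, "gb" for an annotated record
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
    }
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Made-up hit coordinates (1000000..1000021) combined with the
    # flanking distances from the report excerpt (8872 bp and 402 bp).
    print(subregion_url("CP002686.1", 1000000, 1000021, 8872, 402))
```

Setting margin to a few kilobases would extend the window into the flanking features themselves, and requesting rettype="gb" instead of "fasta" should return the annotation for that window together with the sequence.
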