29 Jan 2013 20:00
Re: Parsing Blast-Report extracting "Features flanking .."
Jason Stajich <jason.stajich <at> gmail.com>
2013-01-29 19:00:16 GMT
2013-01-29 19:00:16 GMT
We don't parse the NCBI feature info from the BLAST reports per your query. To look up a specific feature you can use Bio::DB::GenBank to query for sequence from a specific feature by accession number - see the HOWTOs for that. However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features. basically: - download the genome and GFF for arabidopsis - align your sRNA to the genome with a short read aligner - bowtie, bwa, others - convert your sam to bam file with SAMtools or picard - compare the location of features with the reads to get expression summaries or individuals reads with BEDTools On Jan 25, 2013, at 2:20 AM, jobu <buschj <at> hhu.de> wrote: > Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: >> What upstream and downstream elements are you interested in? > > > I've got a huge pile of short RNA reads. > Part of the question now is whether those RNA fragments originate from > siRNA events, > or may represent miRNAs / parts of pre-miRNAs. > > So I did an online blast search against database nt. > The resulting report quite often just gives subject information like this: > > ----- >> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence > Length=23459830 > ----- > > Now I would like to get the hit's neighbouring regions for further > analysis. > Preferably I would like to do that in an automized way, but the only > possible action with this kind of subject gi | description would be to > fetch the entire chromosomal sequence I guess ? > > However, > right below the line above, the report states more precisely: > > ------ > Features flanking this part of subject sequence: > 8872 bp at 5' side: cytochrome P450 90B1 > 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K > ------ > > Still I would like to have the possibility to automatically fetch the > subject's sequence(s), > as of now I think parsing the report with SearchIO won't let me aquire > that information, because SearchIO does not recognize report sections > like those. > > I hope I did not miss any of SearchIOs capabilities, but I could not > find any method covering my wish?! > > Right now maybe the only way to get the information I want is to > construct my own parser and write it out into a separate file, which in > turn again I could read into a hash before processing the Blast-Report > with SearchIO to combine both data for further automized work. > > I am aware though that even successfully getting the flanking features > would leave me with the more or less wide intergenic gap my hsp is > located in. > > However I'm in need of a way to get the flanking features including > their annotation and the region spanning between them. > But I hope I do not have to get complete sequences to accomplish that, > as this would be kind of an overkill. > > with kind regards > Jochen > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l <at> lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich <at> gmail.com jason <at> bioperl.org