30 Jan 2013 22:40
Re: Bioperl-l Digest, Vol 117, Issue 13
Dan kilburn <dr_kilburn59 <at> yahoo.com>
2013-01-30 21:40:26 GMT
2013-01-30 21:40:26 GMT
Hi Jason, Are there any plans to keep SearchIO up to date with ncbi blast? I know they change formats ridiculously often, but I had to write my own parser to get sequence identity, which I would rather not have done. I realize that this job would be a big load on anyone who takes it, but it's so fundamental. Maybe I can help. --Dan Sent from my iPhone On Jan 30, 2013, at 12:00 PM, bioperl-l-request <at> lists.open-bio.org wrote: > Send Bioperl-l mailing list submissions to > bioperl-l <at> lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/bioperl-l > or, via email, send a message with subject or body 'help' to > bioperl-l-request <at> lists.open-bio.org > > You can reach the person managing the list at > bioperl-l-owner <at> lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Bioperl-l digest..." > > > Today's Topics: > > 1. Re: Parsing Blast-Report extracting "Features flanking .." > (Jason Stajich) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 29 Jan 2013 11:00:16 -0800 > From: Jason Stajich <jason.stajich <at> gmail.com> > Subject: Re: [Bioperl-l] Parsing Blast-Report extracting "Features > flanking .." > To: buschj <at> hhu.de > Cc: bioperl-l <at> lists.open-bio.org > Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D <at> gmail.com> > Content-Type: text/plain; charset=us-ascii > > We don't parse the NCBI feature info from the BLAST reports per your query. To look up a specific feature you can use Bio::DB::GenBank to query for sequence from a specific feature by accession number - see the HOWTOs for that. > > However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features. > > basically: > - download the genome and GFF for arabidopsis > - align your sRNA to the genome with a short read aligner - bowtie, bwa, others > - convert your sam to bam file with SAMtools or picard > - compare the location of features with the reads to get expression summaries or individuals reads with BEDTools > > > On Jan 25, 2013, at 2:20 AM, jobu <buschj <at> hhu.de> wrote: > >> Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: >>> What upstream and downstream elements are you interested in? >> >> >> I've got a huge pile of short RNA reads. >> Part of the question now is whether those RNA fragments originate from >> siRNA events, >> or may represent miRNAs / parts of pre-miRNAs. >> >> So I did an online blast search against database nt. >> The resulting report quite often just gives subject information like this: >> >> ----- >>> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence >> Length=23459830 >> ----- >> >> Now I would like to get the hit's neighbouring regions for further >> analysis. >> Preferably I would like to do that in an automized way, but the only >> possible action with this kind of subject gi | description would be to >> fetch the entire chromosomal sequence I guess ? >> >> However, >> right below the line above, the report states more precisely: >> >> ------ >> Features flanking this part of subject sequence: >> 8872 bp at 5' side: cytochrome P450 90B1 >> 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K >> ------ >> >> Still I would like to have the possibility to automatically fetch the >> subject's sequence(s), >> as of now I think parsing the report with SearchIO won't let me aquire >> that information, because SearchIO does not recognize report sections >> like those. >> >> I hope I did not miss any of SearchIOs capabilities, but I could not >> find any method covering my wish?! >> >> Right now maybe the only way to get the information I want is to >> construct my own parser and write it out into a separate file, which in >> turn again I could read into a hash before processing the Blast-Report >> with SearchIO to combine both data for further automized work. >> >> I am aware though that even successfully getting the flanking features >> would leave me with the more or less wide intergenic gap my hsp is >> located in. >> >> However I'm in need of a way to get the flanking features including >> their annotation and the region spanning between them. >> But I hope I do not have to get complete sequences to accomplish that, >> as this would be kind of an overkill. >> >> with kind regards >> Jochen >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l <at> lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.stajich <at> gmail.com > jason <at> bioperl.org > > > > > ------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l <at> lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > End of Bioperl-l Digest, Vol 117, Issue 13 > ******************************************
RSS Feed