I think the answer is yes, as long as others are doing the work - I am not
in a position to be much of a main coder.
I don't know which format you mean here, or whether you had to write
something for the text BLAST changes or for something else. Specific bug
reports on formats that aren't working are always helpful. The XML format
has been pretty stable, so I would suggest using that if you are simply
parsing reports rather than reading them by eye.
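As a rough illustration of why the XML route is less fragile, here is a
minimal sketch of getting percent identity per HSP out of a BLAST XML
report. It is in Python with only the standard library, not BioPerl. The
element names (Hit, Hit_id, Hsp, Hsp_identity, Hsp_align-len) are the ones
used by the classic BLAST XML output; the SAMPLE fragment and the function
name hsp_identities are made up for this example.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed fragment in the shape of classic BLAST XML output.
# The numbers are invented for illustration only.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>gb|CP002686.1|</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_identity>45</Hsp_identity>
              <Hsp_align-len>50</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def hsp_identities(xml_text):
    """Yield (hit_id, percent_identity) for every HSP in a BLAST XML report."""
    root = ET.fromstring(xml_text)
    for hit in root.iter("Hit"):
        hit_id = hit.findtext("Hit_id")
        for hsp in hit.iter("Hsp"):
            identical = int(hsp.findtext("Hsp_identity"))
            align_len = int(hsp.findtext("Hsp_align-len"))
            # Percent identity = identical positions / alignment length.
            yield hit_id, 100.0 * identical / align_len


if __name__ == "__main__":
    for hit_id, pct in hsp_identities(SAMPLE):
        print(hit_id, pct)
```

For a report on disk, reading the file and passing its text to
hsp_identities should work the same way, assuming the report was written
in the XML output format.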
Chris posted instructions on how to contribute, and the move to GitHub
simplifies this. Having to write a whole new parser seems a bit severe - I
hope that in the future people can raise problems sooner. If I hit a wall
with something, I usually write the code to fix it and contribute it back,
but I don't play follow-the-format-changes with the tools anymore.
Hopefully others like yourself can make the
If you are referring to the response I made to the question below, I don't
think anyone will try to support NCBI's additional markup describing the
upstream and downstream features, as laid out in the text files, without
some serious effort. Perhaps in the future that information will be
reported in the XML format and thus be easier to parse.
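For anyone who needs that information from the text report in the
meantime, a stopgap along the following lines might do. It is only a
sketch, in Python rather than BioPerl, and it is not part of SearchIO. It
assumes the block looks exactly like the example quoted further down (a
"Features flanking this part of subject sequence:" header followed by
"<N> bp at 5' side: ..." lines). The names FLANK_RE and parse_flanking are
invented here, and the pattern will break if NCBI changes the layout.

```python
import re

# Matches lines such as "8872 bp at 5' side: cytochrome P450 90B1".
# The layout is an assumption based on the report excerpt in this thread.
FLANK_RE = re.compile(r"^\s*(\d+)\s+bp at ([35])' side:\s*(.+?)\s*$")
HEADER = "Features flanking this part of subject sequence"


def parse_flanking(lines):
    """Return (distance_bp, side, description) tuples from a text report."""
    features = []
    in_block = False
    for line in lines:
        if line.strip().startswith(HEADER):
            in_block = True
            continue
        if in_block:
            match = FLANK_RE.match(line)
            if match:
                distance, side, description = match.groups()
                features.append((int(distance), side + "'", description))
            else:
                # First non-matching line ends the flanking-features block.
                in_block = False
    return features


if __name__ == "__main__":
    report = [
        "Features flanking this part of subject sequence:",
        "   8872 bp at 5' side: cytochrome P450 90B1",
        "   402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K",
        "",
    ]
    for feature in parse_flanking(report):
        print(feature)
```

The result could be kept in a dictionary keyed by hit and combined with
whatever the regular report parser returns for the same hit.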
On Jan 30, 2013, at 1:40 PM, Dan kilburn wrote:
> Hi Jason,
> Are there any plans to keep SearchIO up to date with NCBI BLAST? I know
> they change formats ridiculously often, but I had to write my own parser to
> get sequence identity, which I would rather not have done. I realize that
> this job would be a big load on anyone who takes it, but it's so
> fundamental. Maybe I can help.
> On Jan 30, 2013, at 12:00 PM, firstname.lastname@example.org wrote:
>> Today's Topics:
>> 1. Re: Parsing Blast-Report extracting "Features flanking .."
>> (Jason Stajich)
>> Message: 1
>> Date: Tue, 29 Jan 2013 11:00:16 -0800
>> From: Jason Stajich
>> Subject: Re: [Bioperl-l] Parsing Blast-Report extracting "Features
>> flanking .."
>> To: email@example.com
>> Cc: firstname.lastname@example.org
>> Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D@gmail.com>
>> Content-Type: text/plain; charset=us-ascii
>> We don't parse the NCBI feature info from the BLAST reports per your
>> query. To look up a specific feature you can use Bio::DB::GenBank to query
>> for sequence from a specific feature by accession number - see the HOWTOs.
>> However, most people use tools that generate SAM/BAM files with short
>> reads - then you can use a tool like bedtools to find overlaps of reads
>> with the locations of features.
>> - download the genome and GFF for Arabidopsis
>> - align your sRNA to the genome with a short read aligner - bowtie, bwa,
>> - convert your SAM file to BAM with SAMtools or Picard
>> - compare the location of features with the reads to get expression
>>   summaries or individual reads with BEDTools
>> On Jan 25, 2013, at 2:20 AM, jobu wrote:
>>> On 22.01.2013 19:03, Mgavi Brathwaite wrote:
>>>> What upstream and downstream elements are you interested in?
>>> I've got a huge pile of short RNA reads.
>>> Part of the question now is whether those RNA fragments originate from
>>> siRNA events,
>>> or may represent miRNAs / parts of pre-miRNAs.
>>> So I did an online blast search against database nt.
>>> The resulting report quite often just gives subject information like
>>>> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
>>> Now I would like to get the hit's neighbouring regions for further
>>> Preferably I would like to do that in an automated way, but the only
>>> possible action with this kind of subject gi | description would be to
>>> fetch the entire chromosomal sequence, I guess?
>>> Right below the line above, the report states more precisely:
>>> Features flanking this part of subject sequence:
>>> 8872 bp at 5' side: cytochrome P450 90B1
>>> 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K
>>> Still, I would like to be able to fetch the subject's sequence(s)
>>> automatically.
>>> As of now I think parsing the report with SearchIO won't let me acquire
>>> that information, because SearchIO does not recognize report sections
>>> like those.
>>> I hope I did not miss any of SearchIO's capabilities, but I could not
>>> find any method covering my wish.
>>> Right now maybe the only way to get the information I want is to
>>> construct my own parser and write its output into a separate file, which
>>> I could then read into a hash before processing the BLAST report
>>> with SearchIO, combining both data sets for further automated work.
>>> I am aware though that even successfully getting the flanking features
>>> would leave me with the more or less wide intergenic gap my hsp is
>>> located in.
>>> However I'm in need of a way to get the flanking features including
>>> their annotation and the region spanning between them.
>>> But I hope I do not have to get complete sequences to accomplish that,
>>> as this would be kind of an overkill.
>>> with kind regards
>>> Bioperl-l mailing list
>> Jason Stajich
>> End of Bioperl-l Digest, Vol 117, Issue 13