From: Alexey Morozov <alexeymorozov1991 <at> gmail.com>
Subject: Fwd: can't get seq with bioperl
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Monday 7th October 2013 04:08:18 UTC
Warren Gallin submitted this temporary hack to fix problems with WP seqs
but accidentally sent this to me only. Resending to the list.

---------- Forwarded message ----------
From: Warren Gallin <[email protected]>
Date: 2013/10/5
Subject: Re: [Bioperl-l] can't get seq with bioperl
To: Alexey Morozov 


This is another case of the new RefSeq WP series of protein entries that
does not have a link to the underlying nucleotide sequence.

NCBI has changed the way that highly redundant protein sequences from
bacterial genomes are stored.  Although a sequence appears when you access
the NCBI web site, that protein sequence is not retrieved by the
up-to-now-functional BioPerl approaches.

The give-away is the line:

CONTIG      join(WP_015639704.1:1..205)

The WP designation is for these problematic sequences.
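
For anyone who wants to flag these records before trying to pull a sequence
out of them, a minimal sketch of the check is below. It assumes the record is
available as flat-file text and that the CONTIG line always has the shape
quoted above; the function name and the regular expression are illustrative
only, not part of BioPerl or any NCBI specification.

```python
import re

# Assumed pattern: a CONTIG line whose join() points at a WP_ accession,
# modelled on the single example line quoted in this thread.
WP_CONTIG = re.compile(
    r"^CONTIG\s+join\((WP_\d+(?:\.\d+)?):\d+\.\.\d+\)",
    re.MULTILINE,
)


def wp_contig_accession(genbank_text):
    """Return the WP_ accession named on the CONTIG line, or None if absent."""
    match = WP_CONTIG.search(genbank_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    record = "CONTIG      join(WP_015639704.1:1..205)\n//\n"
    print(wp_contig_accession(record))
```

A record that returns an accession here is a candidate for the fallback
retrieval described next.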

The work-around that I used was to do the sequence retrieval within an eval
block and, if there was no sequence forthcoming, then use the gi number to
retrieve the sequence in FASTA format and grab it that way.

Not pretty, but it will make your pipeline work.
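
The control flow of that work-around might look roughly like the sketch
below. It is written in Python with try/except standing in for the Perl eval
block, and it is not the code actually used. The two fetch functions are
hypothetical callables that you would supply yourself (for example, wrappers
around whatever retrieval calls your pipeline already makes); nothing here
talks to NCBI.

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # only the first record is wanted
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def get_sequence(gi, fetch_record, fetch_fasta):
    """Try the usual retrieval; fall back to FASTA if no sequence comes back.

    fetch_record(gi) -> sequence string (may be empty, or may raise)
    fetch_fasta(gi)  -> FASTA-formatted text for the same gi
    Both callables are hypothetical and supplied by the caller.
    """
    try:
        seq = fetch_record(gi)
    except Exception:
        seq = None  # treat a failed retrieval like an empty one
    if seq:
        return seq
    _, seq = parse_fasta(fetch_fasta(gi))
    return seq


if __name__ == "__main__":
    # Toy stand-ins with made-up data, only to exercise the fallback path.
    empty_record = lambda gi: ""
    toy_fasta = lambda gi: ">toy header\nMKV\nLAA\n"
    print(get_sequence("12345", empty_record, toy_fasta))
```

Catching every exception is deliberately blunt, in the same spirit as the
eval block: anything that stops the first retrieval sends you to the FASTA
route.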

Warren Gallin



-- 
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.
 