Alexey Morozov | 7 Oct 06:08 2013

Fwd: can't get seq with bioperl

Warren Gallin submitted this temporary workaround for the problems with WP
seqs but accidentally sent it only to me. Resending to the list.

---------- Forwarded message ----------
From: Warren Gallin <wgallin <at>>
Date: 2013/10/5
Subject: Re: [Bioperl-l] can't get seq with bioperl
To: Alexey Morozov <alexeymorozov1991 <at>>

This is another case of the new RefSeq WP series of protein entries that
does not have a link to the underlying nucleotide sequence.

NCBI has changed the way that highly redundant protein sequences from
bacterial genomes are stored.  Although a sequence appears when you access
the NCBI web site, that protein sequence is not retrieved by the BioPerl
approaches that have worked up to now.

The give-away is the line:

CONTIG      join(WP_015639704.1:1..205)

The WP designation is for these problematic sequences.
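
As a rough illustration, a record of this kind can be spotted by looking for
that CONTIG line in the GenBank flat-file text. The sketch below is plain
Python, not BioPerl; the function name has_wp_contig is hypothetical, and the
pattern assumes the line always has the shape shown above.

```python
import re

# Matches a CONTIG line whose join(...) points at a WP_ accession, e.g.
#   CONTIG      join(WP_015639704.1:1..205)
# Assumption: the accession is "WP_" followed by digits and an optional version.
_WP_CONTIG = re.compile(r"^CONTIG\s+join\(WP_\d+(?:\.\d+)?:", re.MULTILINE)


def has_wp_contig(genbank_text: str) -> bool:
    """Return True if the GenBank text contains a CONTIG line for a WP_ entry."""
    return _WP_CONTIG.search(genbank_text) is not None
```

A pipeline could use a check like this to decide up front which records need
special handling, instead of waiting for the retrieval to come back empty.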

The work-around that I used was to do the sequence retrieval within an eval
block and, if no sequence was forthcoming, to use the gi number to retrieve
the sequence in FASTA format and grab it that way.

Not pretty, but it will make your pipeline work.
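
The control flow of that work-around can be sketched roughly as follows. This
is plain Python, not the actual BioPerl code: try/except stands in for the
Perl eval block, and get_sequence, parse_fasta and fetch_fasta_from_ncbi are
hypothetical names chosen for illustration. The only external interface
assumed is the NCBI E-utilities efetch URL with rettype=fasta.

```python
import urllib.parse
import urllib.request
from typing import Callable

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_fasta_from_ncbi(identifier: str) -> str:
    """Hypothetical helper: download one protein record as FASTA text."""
    query = urllib.parse.urlencode(
        {"db": "protein", "id": identifier, "rettype": "fasta", "retmode": "text"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}") as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> str:
    """Return the residues of the first record in a FASTA-formatted string."""
    residues = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # a second header starts another record; stop here
            seen_header = True
        elif seen_header and line:
            residues.append(line)
    return "".join(residues)


def get_sequence(
    identifier: str,
    primary: Callable[[str], str],
    fasta_fallback: Callable[[str], str] = fetch_fasta_from_ncbi,
) -> str:
    """Try the usual retrieval first; fall back to FASTA if it yields nothing."""
    try:
        seq = primary(identifier)  # the retrieval that normally works
    except Exception:
        seq = ""  # treat a failed retrieval like an empty result
    if seq:
        return seq
    # No sequence came back (the WP case), so fetch FASTA and extract residues.
    return parse_fasta(fasta_fallback(identifier))
```

Here primary would wrap whatever retrieval the pipeline already uses. Passing
the two fetchers in as arguments is only a convenience of the sketch, so the
fallback logic can be exercised without touching the network.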

Warren Gallin


Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.