I fell foul of this change recently while testing GenPept
format records from NCBI Entrez with Biopython, but I'd
assumed it might have been a short-term glitch:
CC'ing the cross project list in case BioRuby or BioJava
are also impacted.
On Tue, Jul 30, 2013 at 10:18 PM, Scott Markel wrote:
> According to today's "[Refseq-announce] Post-release 60: human
> supplemental files & bacterial record format" both CONTIG and ORIGIN are
> now allowed in a GenBank-formatted entry. See below (*) or the second
> bullet of http://www.ncbi.nlm.nih.gov/mailman/pipermail/refseq-announce/2013q3/000110.html
> This change breaks Bio::SeqIO::genbank in the sense that the existence of
> the CONTIG line means that the sequence data following ORIGIN will not be
> read and $seq->seq() will not return a sequence string. See lines 713-741
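
For anyone patching a parser, the behaviour described above can be sketched in a few lines of Python. This is not the Bio::SeqIO::genbank or Biopython code: read_flatfile_sequence is a hypothetical helper, and the toy record (including its WP_000000000.1 accession) is invented for illustration. It only shows that seeing a CONTIG line need not stop the parser from reading the residues after ORIGIN.

```python
import re


def read_flatfile_sequence(text):
    """Return (contig, sequence) for one GenBank/GenPept-style record.

    Hypothetical helper: the CONTIG value is recorded if present, but
    residues listed after ORIGIN are still read.
    """
    contig = None
    residues = []
    section = None
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line.startswith("CONTIG"):
            contig = line[len("CONTIG"):].strip()
            section = "CONTIG"
        elif line.startswith("ORIGIN"):
            section = "ORIGIN"
        elif section == "ORIGIN":
            # Drop the leading coordinate and the spacing between blocks.
            residues.append(re.sub(r"[\s\d]", "", line))
        elif section == "CONTIG" and line[:1].isspace():
            contig += line.strip()  # continuation of a wrapped CONTIG line
        else:
            section = None
    return contig, "".join(residues)


# Toy record, invented for this example (not a real NCBI entry).
record = """\
LOCUS       TOY_0001                  12 aa            linear   BCT 01-JAN-2013
CONTIG      join(WP_000000000.1:1..12)
ORIGIN
        1 mvfykysgsg nd
//
"""

contig, seq = read_flatfile_sequence(record)
print(contig)
print(seq)
```

A real fix would of course live in each project's existing state machine; the sketch just treats CONTIG and ORIGIN as independent sections rather than mutually exclusive ones.
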
> Note that this is related to the "Protein Records without Sequence"
> (*) Details on the change
>  Bacterial NP/YP proteins with CONTIG and ORIGIN lines.
> Under the new data model for bacterial proteins, a subset of records
> continue to provide an organism-oriented package of protein records. These
> records use traditional RefSeq accession prefixes (NP, YP) and include a
> pointer to the identical non-redundant WP protein record. Those NP and YP
> records that have been updated to refer to a non-redundant WP protein
> record, such as YP_008335932.1, include the following flat file display
> - Genome Annotation Data structured comment is also displayed on protein
>   records for the subset of bacterial genomes that have gone through the
>   updated NCBI prokaryotic annotation pipeline.
> - Records include both a CONTIG line, which refers to the non-redundant
>   WP protein accession, and also an ORIGIN with the sequence residues
>   following. The sequence shown is from the WP protein record.
> CONTIG join(WP_015644991.1:1..273)
> 1 mvfykysgsg ndflivqsfk kkdfsnlakq vchrhegfga dglvvvlpsk
> 61 sdgskagmcg nasrcvglfa yqhaiasknh vflagkreis icieepniie
> 121 vipalrcekf ftnnsvleni ptfylidtgv phlvgfvenk ewlnslntle
> 181 niniafienk etiflqtyer gvedftlacg tgmaavfiaa rifyntpkka
> 241 elslkndeif ykgavryigm svlgmgvfdr yfl
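
Since these records now carry both a CONTIG line and ORIGIN residues, one cheap sanity check a parser or test suite could apply is that the length implied by CONTIG agrees with the number of residues read. The following is a rough Python sketch under stated assumptions: contig_length and origin_matches_contig are hypothetical names, and only simple accession:start..end spans are handled (gap() elements and anything else are ignored).

```python
import re


def contig_length(contig):
    """Sum the lengths of the accession:start..end spans in a CONTIG value.

    Assumption: only simple spans are counted; gap(...) and other
    elements that can appear in a CONTIG line are ignored.
    """
    spans = re.findall(r":(\d+)\.\.(\d+)", contig)
    return sum(int(end) - int(start) + 1 for start, end in spans)


def origin_matches_contig(contig, sequence):
    """True if the CONTIG spans add up to the length of the ORIGIN sequence."""
    return contig_length(contig) == len(sequence)


# The CONTIG value quoted in the announcement above.
print(contig_length("join(WP_015644991.1:1..273)"))
```

For the CONTIG line quoted above this gives 273, which is the number of residues the ORIGIN block of the full record would be expected to hold.
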
> Scott Markel, Ph.D.