From: zhou li <zhouli <at> tll.org.sg>
Subject: problem with bp_genbank2gff.pl
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Thursday 28th November 2013 07:08:15 UTC
Dear BioPerl people,
I am using BioPerl-1.6.1 on Mac OS X version 10.8.5.
I am trying to convert a local GenBank file to a GFF file with
bp_genbank2gff.pl, using the following command:
$ bp_genbank2gff.pl M21017.gb --stdout > M21017.gff3
I got the following messages, and I am not sure whether they indicate an
error:
Replacement list is longer than search list at
/Users/zhouli/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Bio/Range.pm
line 251.
UNIVERSAL->import is deprecated and will be removed in a future perl at
/Users/zhouli/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Bio/Tree/TreeFunctionsI.pm
line 94.
# working on region:M21017, Drosophila melanogaster, 09-MAY-1994,
D.melanogaster 18S, 5.8S 2S and 28S rRNA genes, complete, and 18S rRNA
gene, 5' end, clone pDm238.

***************************************************************************
The output file M21017.gff3 is attached. Its first lines are:

$ head M21017.gff3
##gff-version 3
M21017	Genbank	region	1	12026	.	.	.	ID=M21017;Note=D.melanogaster%2018S%2C%205.8S%202S%20and%2028S%20rRNA%20genes%2C%20complete%2C%20and%2018S%20rRNA%20gene%2C%205%27%20end%2C%20clone%20pDm238.;Alias=M29800
M21017	Genbank	region	1	12026	.	+	.	ID=Drosophila%20melanogaster;db_xref=taxon%3A7227;mol_type=genomic%20DNA
M21017	Genbank	gene	1	12026	.	+	.	ID=18S%20rRNA
M21017	Genbank	RNA	1	7232	.	+	.	ID=18S%20rRNA;note=rRNA%20primary%20transcript
M21017	Genbank	rRNA	1	1995	.	+	.	ID=18S%20rRNA;product=18S%20ribosomal%20RNA
M21017	Genbank	gene	2722	2844	.	+	.	ID=5.8S%20rRNA
M21017	Genbank	rRNA	2722	2844	.	+	.	ID=5.8S%20rRNA;product=5.8S%20ribosomal%20RNA
M21017	Genbank	gene	2873	2902	.	+	.	ID=2S%20rRNA
M21017	Genbank	rRNA	2873	2902	.	+	.	ID=2S%20rRNA;product=2S%20ribosomal%20RNA


When I tested another GenBank file:
$ bp_genbank2gff.pl WSSV-AF369029-GenBank.gb --stdout > WSSV-AF369029-GenBank.gff3
I got the same messages:
Replacement list is longer than search list at
/Users/zhouli/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Bio/Range.pm
line 251.
UNIVERSAL->import is deprecated and will be removed in a future perl at
/Users/zhouli/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Bio/Tree/TreeFunctionsI.pm
line 94.
$ head WSSV-AF369029-GenBank.gff3
##gff-version 3
AF369029	Genbank	region	1	292967	.	.	.	ID=AF369029;Alias=AY864671;Note=White%20spot%20syndrome%20virus%2C%20complete%20genome.
AF369029	Genbank	region	1	292967	.	+	.	ID=White%20spot%20syndrome%20virus;mol_type=genomic%20DNA;isolate=WSSV-TH;country=Thailand;db_xref=taxon%3A342409
AF369029	Genbank	gene	1	615	.	+	.	ID=VP28;experiment=experimental%20evidence%2C%20no%20additional%20details%20recorded;note=envelope%20protein
AF369029	Genbank	CDS	1	615	.	+	.	Parent=VP28.t00;translation=MDLSFTLSVVSAILAITAVIAVFIVIFRYHNTVTKTIETHTDNIETNMDENLRIPVTAEVGSGYFKMTDVSFDSDTLGKIKIRNGKSDAQMKEEDADLVITPVEGRALEVTVGQNLTFEGTFKVWNNTSRKINITGMQMVPKINPSKAFVGSSNTSSFTPVSIDEDEVGTFVCGTTFGAPIAATAGGNLFDMYVHVTYSGTETE;db_xref=GI%3A15021393;protein_id=AAK77670.1;product=ORF1%2C%20VP28%2C%20gene%20family%201;note=envelope%20protein;codon_start=1
AF369029	Genbank	CDS	710	2902	.	-	.	Parent=AAK77671.1.t00;translation=MEGGDQRTKLTPATVMGLYQSKTPGEGEGGEGGGQFKIPSAIAVKSCCSKNATRRSPPSDSPYSLRPMKRLKKNNGEVGGKAPPPVTLRLREDYESTPYNFNRNKKKRPITIDENQFATLNPTYATDIIKKQQLPSVSAASVLRKHRANADTQYRKRFSHPNCAKFSTVNLKARDYTPLSVLRSHVKGPKHLKSSCDTVTETNVVKRNFSSIDKWVKLEKPPCYFAVAEADTNIAAGLESPFHLIRQAAKLGLISDVQDVSSNYETIKQSCIDAKEKASKFLWSNNRTKQPPSSWWPVGFGSKNLSVLDTSPLLNWNRLCKNNGKGWIKTMSIDHMAKNVFKLSPGACESILEKKTTLLGEVTAQCKKWESYRRNIPVPAHVQPEYASQVVMIGPSELYLEVKVGVYYMLETGKVIKFMTDKEMYCEFVFETVFSHALEGRMKGAVGVRKMCVEGFCVEMDFAGISVIDVLNGDLKCKMDENVVQQPNPSTTSSKPAAELMQDHGSLCRMRDTLYGVRMLQATGRLPEGLQSKCKKPITDSISAIAIVGKMRERMLNQLPFVLVEIVNIVTRLSQQGLVNPDIKSDNIVIDGITGQPKMIDFGLIVPCKKYYNFKCWGTDERFFSNHPHTAPEFINSELCSETAMTFGLAYLLIDMLSILIKRTADLSANSIYTNIPFLSIVSKMYDQEKTNRPRAYEIAPVIGACFPFKDNIAKLFQSPKHSLYSKKVK;db_xref=GI%3A15021394;codon_start=1;product=ORF2%2C%20putative%20serine%2Fthreonine%20protein%20kinase%20%28PK1%29%2C%20gene%20family%202
AF369029	Genbank	CDS	3118	4989	.	-	.	Parent=AAK77672.1.t00;codon_start=1;product=ORF3;db_xref=GI%3A15021395;translation=MAWTVMALKDAFTERLVVNKVGSGTDMAPVVEDDRQKSLFQKVENLYRVLVVEQKNSAITLSGNKNTNKRQCRQVEEDKVIFEGEDRTVSNLPQAVKETIAANAESILDYWYKNVIPLLDTKKERSGKSDTFLRTAVICLVRCCVSYKDMKTCSLIYEFEHKILNKSTLDPLLKDILDNKQELLHMDSKYGSKTTSPELAKETIEALYTTVYNHWTNAFKLYQASLTHKPVTGKKYASVIHFIRTWRKIVKAYVSKHNNVERDLSLKNIMKNESADNANVLTIEKMYKKIGNSVKNTNNNSAHQMSDSEDDDDDDDDDCEGMDVCDEASEREKKHQESLYPINTPVTTITGDYIFKVLLELVLSPHIHPEWKIPMCDFVNRNIPKLMKAMETDISNAVIEVRASKVNPVQILPIAANFWDFCKSGKPPSDVKFCMMFNEPSSNETLSSGAGVFGRFIGGPFSHKSKELDIISNCLRSLLLNKEADNLSTRIWREGGSVVCFNYCPITARGAVLGYGEQLSERSIKALWAKKIQDAVTESVKRQRNAADKNSRNCDLLGDEGVVSMKTVTFGCANMLKTQNGMGKFNVVVSFEDSIQANKEGAARQYMSQQVFTHSFPALDQGK

The output file is very verbose because the full protein translation is
included for every CDS, and I do not need it.
1. Is there a way to make the output more succinct by leaving out the
translation?
2. Is there a way to split the output into two files, one GFF3 file and
one DNA FASTA sequence file?
3. When I import the WSSV-AF369029-GenBank.gff3 file into IGV, features of
type "CDS" display the protein ID and features of type "gene" display the
gene ID. It seems the protein ID is shown when there is no gene name for
the sequence. Is this the way it works? I want to display the ORF ID
instead; what should I do?
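
To make questions 1 and 2 concrete, here is a minimal Python sketch of the
post-processing I have in mind. It is not a BioPerl feature and does not use
any bp_genbank2gff.pl option. The function names (strip_attribute,
split_gff3, main) are made up for illustration. It assumes that any sequence
section in the converter output starts at a "##FASTA" line or at the first
line beginning with ">"; I have not confirmed that the script emits the
sequence this way.

```python
def strip_attribute(line, key="translation"):
    """Drop one key=value pair from column 9 of a tab-separated GFF3 line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        # Not a feature line (blank or malformed): return it unchanged.
        return line.rstrip("\n")
    kept = [a for a in cols[8].split(";")
            if a and not a.startswith(key + "=")]
    # GFF3 uses "." for an empty attribute column.
    cols[8] = ";".join(kept) if kept else "."
    return "\t".join(cols)


def split_gff3(lines, drop_key="translation"):
    """Return (gff_lines, fasta_lines) from combined converter output.

    Assumption: the sequence section begins at "##FASTA" or at the
    first line that starts with ">".
    """
    gff, fasta = [], []
    in_fasta = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_fasta and (line == "##FASTA" or line.startswith(">")):
            in_fasta = True
            if line == "##FASTA":
                continue  # the directive itself belongs to neither file
        if in_fasta:
            fasta.append(line)
        elif line.startswith("#") or not line.strip():
            gff.append(line)  # directives, comments, blank lines
        else:
            gff.append(strip_attribute(line, drop_key))
    return gff, fasta


def main(in_path, gff_path, fasta_path):
    """Read in_path; write the trimmed GFF3 and, if present, the FASTA."""
    with open(in_path) as handle:
        gff, fasta = split_gff3(handle)
    with open(gff_path, "w") as out:
        out.write("\n".join(gff) + "\n")
    if fasta:
        with open(fasta_path, "w") as out:
            out.write("\n".join(fasta) + "\n")
```

For example, main("WSSV-AF369029-GenBank.gff3", "WSSV.trimmed.gff3",
"WSSV.fa") would write a GFF3 file without the translation attributes and,
if a sequence section is found, a separate FASTA file. The two output file
names are placeholders.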
 


Your help is greatly appreciated.
Thank you very much!
Regards,
Zhou Li




