From: Jason Stajich <jason.stajich <at> gmail.com>
Subject: Fwd: [Bioperl-guts-l] [BioPerl - Bug #3328] (New) segregating sites calculation fails on gapped sequences
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Friday 17th February 2012 17:42:29 UTC
This should be an easy bug for someone to fix. I am pretty sure the
solution is to ignore gapped columns, but I haven't looked deeper and I
don't have any time right now to work on BioPerl fixes, so it would be
great if someone wanted to help out here.
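
To make the "ignore gapped columns" idea concrete, here is a minimal Python sketch. It is not BioPerl code and not the actual fix: the function name site_counts, the skip_gapped flag, and the set of gap characters are all assumptions made for illustration. It counts a column as segregating when it holds more than one state, and as a singleton site when some state is carried by exactly one sequence. With skip_gapped=False the gap character is treated as just another state, which is one plausible way a gapped prefix could inflate the counts.

```python
from collections import Counter

# Characters treated as gaps or missing data (an assumption for this sketch).
GAP_CHARS = set("-.?")


def site_counts(seqs, skip_gapped=True):
    """Return (segregating_sites, singleton_sites) for aligned sequences."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    segregating = 0
    singletons = 0
    for column in zip(*seqs):
        column = [c.upper() for c in column]
        if skip_gapped and any(c in GAP_CHARS for c in column):
            continue  # complete deletion: drop the whole column
        counts = Counter(column)
        if len(counts) < 2:
            continue  # monomorphic column, nothing to count
        segregating += 1
        # Singleton site: at least one state occurs in exactly one sequence.
        if 1 in counts.values():
            singletons += 1
    return segregating, singletons


# The example alignment from the bug report.
alignment = [
    "ATGATCGTAGCTGATGCTGTGATCGATCGCTAGCTAGCTCGA",
    "------------GATGCTGTGATCGATCGCTAGCTAGCTCGA",
    "------------GATGCTGTGATCGATCGCTAGCTAGCTCGA",
    "------------GATGCTGTGATCGATCGCTAGCTAGCTCGA",
]

print("gaps counted as a state:", site_counts(alignment, skip_gapped=False))
print("gapped columns skipped: ", site_counts(alignment, skip_gapped=True))
```

On the example alignment, every column of the gapped prefix is counted as a singleton site when gaps are treated as a state, and nothing is counted once gapped columns are skipped, since the ungapped columns are identical in all four sequences. Dropping whole columns (complete deletion) is only one option; a real fix might instead ignore just the gapped individuals at each site.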

The redmine bug info is appended below.

Jason

Begin forwarded message:

> From: [email protected]
> Subject: [Bioperl-guts-l] [BioPerl - Bug #3328] (New) segregating sites calculation fails on gapped sequences
> Date: February 17, 2012 9:39:42 AM PST
> To: [email protected]
> 
> 
> Issue #3328 has been reported by Jason Stajich.
> 
> ----------------------------------------
> Bug #3328: segregating sites calculation fails on gapped sequences
> https://redmine.open-bio.org/issues/3328
> 
> Author: Jason Stajich
> Status: New
> Priority: Normal
> Assignee: Bioperl Guts
> Category: Bio::PopGen
> Target version: 
> URL: 
> 
> 
> 
>   I am Cheng-Ruei Lee, a graduate student in Duke Biology. I'm analyzing
> many DNA alignments of a plant species.
>   I first used Bio::PopGen::Utilities->aln_to_population() to read in
> the FASTA-format alignments, and then used Bio::PopGen::Statistics to
> calculate some statistics without an outgroup. Most genes work fine, but I
> think a bug is triggered by alignments like this:
> 
> >Genotype1
> ATGATCGTAGCTGATGCTGTGATCGATCGCTAGCTAGCTCGA
> >Genotype2
> ------------GATGCTGTGATCGATCGCTAGCTAGCTCGA
> >Genotype3
> ------------GATGCTGTGATCGATCGCTAGCTAGCTCGA
> >Genotype4
> ------------GATGCTGTGATCGATCGCTAGCTAGCTCGA
> 
>   I got this data set from other people. I guess that, because of the
> annotation program they used, the annotated coding sequence is much longer
> in genotype 1 than in the other genotypes. This creates a long stretch of
> gaps at the very beginning. Whenever Bio::PopGen encounters this kind of
> gene, the singleton count is greatly inflated; it seems that the long
> stretch of gapped sites is also counted as singletons. Some Fu & Li
> statistics are inflated as well. The number of segregating sites does not
> seem to be affected. (As a result, there are genes with hundreds of
> singleton sites but only a few segregating sites in total.)
>   Might this be a bug in Bio::PopGen::Utilities when reading in the
> data? Or in the singleton calculation?
> 
> Sincerely,
> Cheng-Ruei Lee 
> 
> 

Jason Stajich
[email protected]
[email protected]
 