Jason Stajich | 17 Feb 18:42 2012

Fwd: [Bioperl-guts-l] [BioPerl - Bug #3328] (New) segregating sites calculation fails on gapped sequences

This should be an easy bug for someone to fix. I am pretty sure the solution is to ignore gapped columns, but I
haven't looked deeper and I don't have any time right now to work on BioPerl fixes, so it would be great if
someone wanted to help out here.
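
As a rough illustration of the "ignore gapped columns" idea, here is a minimal Python sketch. It is not BioPerl code and does not reproduce the Bio::PopGen code path. The function name site_counts, the set of gap characters, and the toy alignment are all assumptions made up for this example, and a "singleton site" is taken here to mean a segregating column in which some allele occurs in exactly one sequence.

```python
from collections import Counter

# Assumption: these characters mark a gap or missing data in the alignment.
GAP_CHARS = frozenset("-.?")


def site_counts(seqs, skip_gapped=True):
    """Return (segregating_sites, singleton_sites) for aligned sequences."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    segregating = singletons = 0
    for column in zip(*seqs):
        if skip_gapped and any(base in GAP_CHARS for base in column):
            continue  # drop the whole column if any sequence has a gap
        counts = Counter(base.upper() for base in column)
        if len(counts) > 1:  # more than one allele: the site segregates
            segregating += 1
            if 1 in counts.values():  # some allele is seen exactly once
                singletons += 1
    return segregating, singletons


# Hypothetical toy alignment: the first sequence extends past the others
# at the 5' end, so the first five columns are gapped in three sequences.
alignment = [
    "ATGCCATGAAC",
    "-----ATGAAC",
    "-----ATGAAT",
    "-----ATGAAT",
]

print(site_counts(alignment, skip_gapped=True))   # gapped columns ignored
print(site_counts(alignment, skip_gapped=False))  # gaps treated as an allele
```

With gapped columns skipped, only the last column is counted. If the gap character is treated as just another allele, each of the five leading columns looks like a singleton site, which is the kind of inflation described in the report below.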

The redmine bug info is appended below.


Begin forwarded message:

> From: redmine <at> redmine.open-bio.org
> Subject: [Bioperl-guts-l] [BioPerl - Bug #3328] (New) segregating sites calculation fails on gapped sequences
> Date: February 17, 2012 9:39:42 AM PST
> To: bioperl-guts-l <at> lists.open-bio.org
> Issue #3328 has been reported by Jason Stajich.
> ----------------------------------------
> Bug #3328: segregating sites calculation fails on gapped sequences
> https://redmine.open-bio.org/issues/3328
> Author: Jason Stajich
> Status: New
> Priority: Normal
> Assignee: Bioperl Guts
> Category: Bio::PopGen
> Target version: 
> URL: 
>   I am Cheng-Ruei Lee, a graduate student in Duke Biology. I'm analyzing many DNA alignments of a plant species.
>   I first used Bio::PopGen::Utilities->aln_to_population() to read in the FASTA-format alignment,
> and then used Bio::PopGen::Statistics to calculate some statistics without an outgroup. Most genes work
> fine, but I think a bug occurs when it meets alignments like this:
>> Genotype1
>> Genotype2
>> Genotype3
>> Genotype4
>   I got this data set from other people. I guess that, due to the annotation program they used, the defined
> coding sequence is much longer in genotype 1 than in the other genotypes. This creates a long stretch of gaps
> at the very beginning. Whenever Bio::PopGen meets this kind of gene, the singleton count increases a
> lot; it seems that the long stretch of gapped sites is also counted as singletons. Some of the Fu & Li
> statistics are inflated as well. The "number of segregation sites" does not seem to be affected. (As a result,
> there are genes with hundreds of singleton sites but only a few total segregating sites.)
>   Could this be a bug in Bio::PopGen::Utilities when reading in the data, or in the singleton calculation?
> Sincerely,
> Cheng-Ruei Lee <cl134 <at> duke.edu>

Jason Stajich
jason.stajich <at> gmail.com
jason <at> bioperl.org