Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane
From: Jason Stajich <jason.stajich <at> gmail.com>
Subject: Re: counting gaps in a column of alignment
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Friday 26th July 2013 04:04:21 UTC (over 3 years ago)
This is pretty inefficient - the multiple calls to slice - if you don't
need the random access aspect of slice, but instead are going to process
every column, I would just operate on the strings themselves.

Look at how gap_col_matrix function is implemented  - it takes all the
strings from the seqs and does the calculation in one pass.

you just want to walk through each sequence and then each position in each
sequence and update an array which has a hash in each slot. The hash
contains the nucleotide counts e.g.

 $counts[0]->{A}  would tell you the number of A's in Column 0.

you would build up this datastructure with 

for my $seq ( $aln->each_seq ) {
 my @seqarray = split(//,$seq->seq);
  for my $col ( 0..$aln->length ) {
   my $base = $seqarray[$col]
    $counts[$col]->{$base}++;
  }
}

Or you can use substr instead of split to operate on the string itsself.




On Jul 24, 2013, at 9:37 PM, subha kalyanamoorthy 
wrote:

> Hi there,
> 
> I am a new bioperl user.  I made a script to count the nucleotide
> composition in each column of the alignment. I am able to get the count
for
> the nucleotides, but not for the gap characters from the following
program.
> I would greatly appreciate any suggestions to correct this program.
> 
> Thanks.
> 
> #***************************My Program**********************************
> 
> #!/bin/perl -w
> use strict;
> use warnings;
> 
> use List::Util 'max';
> use Bio::SimpleAlign;
> use Bio::Align::AlignI;
> use Bio::AlignIO;
> use Bio::SeqIO;
> 
> my $in= Bio::AlignIO->new( -file => "seq.fst", -format => "fasta");
> 
> my $align = $in->next_aln();
> 
> print "column\tA's\tT's\C's\G's\n";
> 
> for (my $i = 1; $i <= $align->length; $i++) {
> 
> my %count;
> 
> my $seqs = $align->slice($i,$i);
> 
> my $gap_char = $seqs->gap_char();
> 
>    my $count_A=0;
>    my $count_C=0;
>    my $count_T=0;
>    my $count_G=0;
>    my $count_N=0;
>    my $count_gap=0;
> 
>    foreach my $seq ($seqs->each_seq) {
> 
>    my $col=$seq->seq;
> 
>        if ($col eq 'A'){
>        $count_A++;
>        }elsif ($col eq 'C'){
>        $count_C++;
>        }elsif ($col eq 'T'){
>        $count_T++;
>        }elsif ($col eq 'G'){
>        $count_G++;
>        }elsif ( $col eq 'N'){
>        $count_N++;
>        }elsif ($col =~ m/^\Q$gap_char\E$/){
>        $count_gap++;
> }
>      $count{$seq->seq} += 1;
> }
> 
> print"$i\t$count_A\t$count_T\t$count_C\t$count_G\n";
> 
> }
> 
> #***********************************************************************`
> _______________________________________________
> Bioperl-l mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
[email protected]
[email protected]
 
CD: 21ms