Joel Martin | 7 May 21:15 2012

Re: codon usage

I think you're under-appreciating the perl regex engine, it operates
as a state machine not by matching all occurrences every time.  Here
is a nice excerpt from pro perl parsing
http://www.devshed.com/c/a/Perl/Parsing-and-Regular-Expression-Basics/2/

Joel

On Mon, May 7, 2012 at 8:28 AM, Jim Hu <jimhu <at> tamu.edu> wrote:
> I was looking at Bio::Tools::SeqStats.  It reminds me of the very first bioinformatics program I worked
on back when I was in grad school, sequencing was done by Maxam-Gilbert chemistry on gels with radioactive
DNA (we were doing short reads, but we didn't call it that and the reads per run was something like 8).  We
were writing a program in Apple II basic to find restriction sites.  Everyone in the group was doing this
by putting the target sequence in a string variable and looking for the site as a substring match by
whatever it was BASIC used.  Loop over all the sites you are looking for.  This strikes me as the
equivalent of how Bio::Tools::SeqStats works, only with regexes.
>
> My roommate at the time, who was a math PhD student doing an MS in CompSci, pointed out to me that this would be
more efficient using a discrete finite automaton algorithm, where each site we were looking for would be a
state automaton. This has the advantage of being able to process the sequence as a stream.  Back when we
were working with computers with RAM measured in Kbytes, this was a big help.  I'm not sure if it would be
worth it today.  The slow interpreted implementation of the state machines would likely lose to the fast
internal implementation of the regex routines for sequence lengths we are looking at these days.
>
> But it might be interesting to compare.
>
> Jim
>
> On May 6, 2012, at 3:52 PM, Smithies, Russell wrote:
>
>> Or use Bio::Tools::SeqStats
>> (this is straight from the perldocs
http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Tools/SeqStats.pm )
>>
>>         $seqobj = Bio::PrimarySeq->new(-seq      => 'ACTGTGGCGTCAACTG',-alphabet => 'dna',-id
      => 'test');
>>         $seq_stats  =  Bio::Tools::SeqStats->new(-seq => $seqobj);
>>
>>         $hash_ref = $seq_stats-> count_codons();  # for nucleic acid sequence
>>         foreach $base (sort keys %$hash_ref) {
>>             print "Number of codons of type ", $base, "= ",  %$hash_ref->{$base},"\n";
>>         }
>>
>>
>> --Russell
>>
>>
>> -----Original Message-----
>> From: bioperl-l-bounces <at> lists.open-bio.org [mailto:bioperl-l-bounces <at> lists.open-bio.org] On
Behalf Of subarna thakur
>> Sent: Saturday, 21 April 2012 3:00 p.m.
>> To: bioperl-l <at> bioperl.org
>> Subject: [Bioperl-l] codon usage
>>
>> I am writing a script for determining number of genes containing a particular codon. The codons are
mentioned in a separate file. The output is coming all right for the first codon mentioned in the file but
for the other codons , the script is not working. Please suggest the error in the script. The script is as
follows ----
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::SeqIO;
>>
>> $file2="table.txt";
>>
>> $codon=0;
>>
>> open OUT, "&gt;out-test.txt" or die $!;
>>
>> $seqio_obj = Bio::SeqIO-&gt;new( -file =&gt; "gopi2.txt" , '-format' =&gt; 'Fasta');
>>
>> open( my $fh2, $file2 ) or die "$!";
>>
>> while( my $line = &lt;$fh2&gt; ){
>>
>> $acc=$line;
>>
>> chomp $acc;
>>
>> while ($seq1= $seqio_obj-&gt;next_seq){
>>
>> my  <at> output = $seq1-&gt;id;
>>
>> my $string = $seq1-&gt;seq;
>>
>> $v=0;
>>
>> $l= length($string);
>>
>> $t=$l/3;
>>
>> $k=0;
>>
>> for ($i=1; $i &lt;= $t; $i++){
>>
>>  <at> array2 = substr($string, $k, 3);
>>
>> $k=$k+3;
>>
>> foreach $value ( <at> array2)
>>
>> {
>>
>> if ($value eq "$acc")
>>
>> {
>>
>> print OUT " The sequence id is  <at> output\n";
>>
>> print OUT "$acc codon found in position $i\n\n";
>>
>> $v=$v+1;
>>
>> }
>>
>> }
>>
>> }
>>
>> if ($v==0)
>>
>> {
>>
>> $h=0;
>>
>> }
>>
>> else
>>
>> {
>>
>> $h=1;
>>
>> }
>>
>> $codon=$codon+$h;
>>
>> }
>>
>> print OUT "Total number of sequences with $acc codon";
>>
>> print OUT "\t";
>>
>> print OUT $codon;
>>
>> }
>>
>> exit;
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l <at> lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> =======================================================================
>> Attention: The information contained in this message and/or attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or privileged
>> material. Any review, retransmission, dissemination or other use of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by AgResearch
>> Limited. If you have received this message in error, please notify the
>> sender immediately.
>> =======================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l <at> lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> =====================================
> Jim Hu
> Professor
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4054
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l <at> lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Gmane