9 Oct 19:06
scf version 2 traces
From: Anthony Underwood <Anthony.Underwood <at> hpa.org.uk>
Subject: scf version 2 traces
Newsgroups: gmane.comp.lang.perl.bio.general
Date: 2008-10-09 17:08:31 GMT
Hi all,

A long time ago (March 2004) I had a discussion with Chad about reading SCF files in Bioperl, and I noticed then that there might be some problems with version 2 files. I now mostly code in Ruby and so am contributing to BioRuby. I have been writing code to extract trace information from SCF files, starting from some code by another BioRubyist for reading ABI files and then looking at the code in Bioperl. I now have this working, and a much better understanding of reading binary files.

I believe I have found the bugs in Bioperl's reading of version 2 SCF traces. Both are in scf.pm.

In the _parse_v2_traces method, I believe the lines that enter the information into the traces array should be as below, since the order is specified here: http://staden.sourceforge.net/manual/formats_unix_4.html#SEC4

    push @{$traces->{'a'}},$read[$offset2];
    push @{$traces->{'t'}},$read[$offset2+1];
    push @{$traces->{'g'}},$read[$offset2+3];
    push @{$traces->{'c'}},$read[$offset2+2];

Also, the $buffer passed to this method from the next_seq method is incorrect, because the offset used to read it is wrong. In the next_seq method, the last of the following lines should be changed:

    $creator->{header} = $self->_get_header($buffer);
    if ($creator->{header}->{'version'} lt "3.00") {
        $self->debug("scf.pm is working with a version 2 scf.\n");
        # first gather the trace information
        $length = $creator->{header}->{'samples'} * $creator->{header}->{sample_size}*4;
        $buffer = $self->read_from_buffer($fh, $buffer, $length, $creator->{header}->{samples_offset});

to

        $buffer = $self->read_from_buffer($fh, $buffer, $length, $creator->{header}->{sample_offset});

Note: sample_offset, not samples_offset.

I have tested these corrections against other sequence viewers (Chromas, FinchTV), and with these changes the output is now correct. Can these be applied to the live code and the next release?
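A minimal standalone Perl sketch of the two points, written from scratch rather than as a patch to scf.pm: parse_scf_header and read_v2_traces are hypothetical helpers, not Bioperl methods. The header layout (a 128-byte block of big-endian 4-byte fields with a 4-character version string) is assumed from the Staden SCF description linked above, and the hash keys mirror the sample_offset name used in scf.pm. In Perl, a hash key that was never set yields undef, which counts as 0 in numeric context, so a read using the misspelt samples_offset key presumably starts at the beginning of the file rather than at the trace data; the sketch dies instead. The channel storage order is a parameter rather than hard-coded, so it can be set to whatever the specification gives.

```perl
use strict;
use warnings;

# Hypothetical helper: unpack the 128-byte SCF header. "N" is an unsigned
# 32-bit big-endian integer; "a4" is the 4-byte magic number or version string.
sub parse_scf_header {
    my ($data) = @_;
    die "data shorter than the 128-byte SCF header\n" if length($data) < 128;
    my %h;
    @h{qw(magic samples sample_offset bases bases_left_clip bases_right_clip
          bases_offset comments_size comments_offset version
          sample_size code_set private_size private_offset)}
        = unpack 'a4 N8 a4 N4', $data;
    return \%h;
}

# Hypothetical helper: extract version 2 trace samples, which are stored
# interleaved (four values per sample point, one per channel). $order lists
# the channel names in the order they are stored in the file.
sub read_v2_traces {
    my ($data, $h, $order) = @_;
    my $size = $h->{sample_size};
    die "unsupported sample_size\n" unless defined $size && ($size == 1 || $size == 2);
    my $length = $h->{samples} * $size * 4;

    # Read from sample_offset; a misspelt key would give undef here.
    my $offset = $h->{sample_offset};
    die "sample_offset missing from header\n" unless defined $offset;
    die "trace data runs past end of data\n" if $offset + $length > length $data;

    my $buffer = substr($data, $offset, $length);
    # 1-byte samples are unsigned bytes, 2-byte samples big-endian shorts.
    my @read = unpack($size == 1 ? 'C*' : 'n*', $buffer);

    my %traces;
    $traces{$_} = [] for @$order;
    for (my $i = 0; $i < @read; $i += 4) {
        push @{ $traces{ $order->[$_] } }, $read[$i + $_] for 0 .. 3;
    }
    return \%traces;
}
```

For example, read_v2_traces($data, parse_scf_header($data), [qw(a c g t)]) returns a hash reference holding one array of sample values per base; the qw(a c g t) order there is only illustrative.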
Thanks,
Anthony

Dr Anthony Underwood
Bioinformatics Unit | Statistics, Modelling and Bioinformatics Department
Centre for Infections
Health Protection Agency
61 Colindale Avenue
London NW9 5HT
t: 0208 3276466
f: 0208 3276738
e: anthony.underwood <at> hpa.org.uk