2 Jul 2006 06:48
Re: Fasta parser
Iddo Friedberg <idoerg <at> burnham.org>
2006-07-02 04:48:50 GMT
By (lack of?) design, my own Biopython-using code seems to be using both the Martel and non-Martel parsers. I imagine others may be in the same situation. Point being: any design change should make sure that we stay backward compatible.

Thanks very much for your work on the Biopython release.

Cheers,

./I

--
Iddo Friedberg, PhD
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
T: +1 858 646 3100 x3516
http://iddo-friedberg.org
http://BioFunctionPrediction.org

-----Original Message-----
From: Michiel de Hoon [mailto:mdehoon <at> c2b2.columbia.edu]
Sent: Sat 7/1/2006 9:43 PM
To: Iddo Friedberg
Cc: biopython-dev <at> biopython.org
Subject: Re: [Biopython-dev] Fasta parser

Thanks Iddo!

I tried the parser in Bio.SeqIO.FASTA and it is indeed a lot faster than the Martel-based one in Bio.Fasta. It would be nice to merge these two modules. However, it raises a bunch of design questions (such as Fasta.Record versus SeqRecord, and Seq versus string), so it's probably better to wait with that until after the next Biopython release. Which, by the way, will be coming up soon.

Thanks,

--Michiel.

Iddo Friedberg wrote:
> Michiel,
>
> There is actually a simple minded fasta reader/writer that does not use
> Martel. Bio.SeqIO.FASTA
>
> ./I
>
> --
> Iddo Friedberg, PhD
> Burnham Institute for Medical Research
> 10901 N. Torrey Pines Rd.
> La Jolla, CA 92037 USA
> T: +1 858 646 3100 x3516
> http://iddo-friedberg.org
> http://BioFunctionPrediction.org
>
>
>
> -----Original Message-----
> From: biopython-dev-bounces <at> lists.open-bio.org on behalf of Michiel de Hoon
> Sent: Sat 7/1/2006 2:47 PM
> To: biopython-dev <at> biopython.org
> Subject: [Biopython-dev] Fasta parser
>
> Hi everybody,
>
> The Biopython shows the following approach to parsing a Fasta file:
>
> >>> from Bio import Fasta
> >>> parser = Fasta.RecordParser()
> >>> file = open("ls_orchid.fasta")
> >>> iterator = Fasta.Iterator(file, parser)
> >>> cur_record = iterator.next()
>
> But for large Fasta files, it's very slow, compared to file.read(),
> which may be due to going through Martel (I believe the same was true
> for large GenBank files).
>
> So I'm thinking about writing a simple-minded Fasta parser for better
> performance with large files. What I'm wondering about:
> 1) Is there some advantage that I overlooked of using Martel for parsing
> Fasta files?
> 2) Why is it necessary to create a parser first and passing it to
> Fasta.Iterator? Are there any cases where Fasta.Iterator uses something
> other than a Fasta.RecordParser?
>
> --Michiel.
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev <at> lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
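
For illustration, the "simple-minded Fasta parser" idea raised in the thread could look roughly like the sketch below. It is written in plain Python with no Martel dependency and reads the file line by line. The function name `iter_fasta` and the choice to yield `(title, sequence)` string tuples are assumptions made for this example only; this is not the code in Bio.SeqIO.FASTA or Bio.Fasta, and it sidesteps the Fasta.Record versus SeqRecord question rather than answering it.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) tuples from a FASTA-formatted text handle.

    Hypothetical helper for illustration; not part of Biopython.
    """
    title = None   # title of the record currently being read
    chunks = []    # sequence lines collected for that record
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            # Sequence line (blank lines contribute nothing).
            chunks.append(line.strip())
        # Lines before the first ">" header are ignored.
    if title is not None:
        yield title, "".join(chunks)


if __name__ == "__main__":
    example = io.StringIO(">seq1 first record\nACGT\nTTGA\n>seq2\nGGCC\n")
    for title, sequence in iter_fasta(example):
        print(title, len(sequence))
```

Because it is a generator over the open handle, only one record is held in memory at a time, which is the property that matters for the large files mentioned above.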