2 Jul 2006 21:12
Re: BioPython Design
Colosimo, Marc E. <mcolosimo <at> mitre.org>
2006-07-02 19:12:23 GMT
Michiel,

When will this next release be made and what is going into it?

Since you brought up the issue of design questions, I'll have my little rant now. But first, I would like to say that I think it is great that people contribute code and, more importantly, their time to this project. Without all of the core developers there would be no BioPython. So, kudos to anyone who has contributed code. Now on to my rant....

<rant>
I'm not a big user of either BioPerl or BioJava. However, they are well structured and more consistent than BioPython. This FastaIO issue is one of several design issues that really need to be addressed. For example, both BioPerl and BioJava use a SeqIO object structure. Our SeqIO module is heavily underused. For example, we have Fasta, GenBank, LocusLink, NBRF, SwissProt, and UniGene as main modules. Interestingly, there is a writers.SeqRecord.embl, but I can't quickly find something to read in an EMBL file! Just look at what BioPerl can read in <http://www.bioperl.org/wiki/HOWTO:SeqIO> and how easy it is to find this out (even without the doc page, all of these are listed under Bio::SeqIO::*).

There is a very short "Coding Convention" <http://biopython.org/wiki/Contributing#Coding_conventions>, which doesn't seem to be followed all that well.

My suggestion is that if enough people are going to ISMB this year (which I am not), time should be made to think about a road map for BioPython. My suggestions are:

1) split off a branch for ver 2.0 that supports Python 2.4 only (this would suck for Mac people, like me, but it's time to move on)
2) clean house - remove deprecated items, restructure IO, etc...
3) move to SciPy/NumPy versus Numeric (could try "numpy/lib/convertcode.py")
4) use Cheese Shop for missing modules
5) documentation
</rant>

marc

On 7/2/06 12:43 AM, "Michiel de Hoon" <mdehoon <at> c2b2.columbia.edu> wrote:

> Thanks Iddo!
> I tried the parser in Bio.SeqIO.FASTA and it is indeed a lot faster than
> the Martel-based one in Bio.Fasta.
>
> It would be nice to merge these two modules. However, it raises a bunch
> of design questions (such as Fasta.Record versus SeqRecord, and Seq
> versus string), so it's probably better to wait with that until after
> the next Biopython release. Which, by the way, will be coming up soon.
>
> Thanks,
>
> --Michiel.
>
> Iddo Friedberg wrote:
>> Michiel,
>>
>> There is actually a simple minded fasta reader/writer that does not use
>> Martel. Bio.SeqIO.FASTA
>>
>> ./I
>>
>> --
>> Iddo Friedberg, PhD
>> Burnham Institute for Medical Research
>> 10901 N. Torrey Pines Rd.
>> La Jolla, CA 92037 USA
>> T: +1 858 646 3100 x3516
>> http://iddo-friedberg.org
>> http://BioFunctionPrediction.org
>>
>>
>>
>> -----Original Message-----
>> From: biopython-dev-bounces <at> lists.open-bio.org on behalf of Michiel de Hoon
>> Sent: Sat 7/1/2006 2:47 PM
>> To: biopython-dev <at> biopython.org
>> Subject: [Biopython-dev] Fasta parser
>>
>> Hi everybody,
>>
>> The Biopython shows the following approach to parsing a Fasta file:
>>
>> >>> from Bio import Fasta
>> >>> parser = Fasta.RecordParser()
>> >>> file = open("ls_orchid.fasta")
>> >>> iterator = Fasta.Iterator(file, parser)
>> >>> cur_record = iterator.next()
>>
>> But for large Fasta files, it's very slow, compared to file.read(),
>> which may be due to going through Martel (I believe the same was true
>> for large GenBank files).
>>
>> So I'm thinking about writing a simple-minded Fasta parser for better
>> performance with large files. What I'm wondering about:
>> 1) Is there some advantage that I overlooked of using Martel for parsing
>> Fasta files?
>> 2) Why is it necessary to create a parser first and passing it to
>> Fasta.Iterator? Are there any cases where Fasta.Iterator uses something
>> other than a Fasta.RecordParser?
>>
>> --Michiel.
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev <at> lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev <at> lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
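
For readers of the archive: the "simple-minded" Fasta parser discussed in the quoted thread might look roughly like the sketch below. It is only an illustration, not the code in Bio.SeqIO.FASTA or Bio.Fasta. The function name iter_fasta and the sample data are hypothetical, and it is written for current Python 3 rather than the Python 2.x of the thread. It reads line by line and yields plain (title, sequence) tuples of strings, which sidesteps the Fasta.Record versus SeqRecord question raised above.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) string tuples from an open FASTA text handle.

    Hypothetical helper for illustration; not part of Biopython.
    """
    title = None   # title of the record currently being read
    chunks = []    # sequence lines collected for that record
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None and line.strip():
            # Non-blank line inside a record: part of the sequence.
            chunks.append(line.strip())
    # Emit the final record once the input is exhausted.
    if title is not None:
        yield title, "".join(chunks)


# Made-up sample data, used only to exercise the sketch.
example = io.StringIO(">seq1 first record\nACGT\nTTGA\n\n>seq2\nGGCC\n")
records = list(iter_fasta(example))
```

Because it is a generator, only one record is held in memory at a time, which is the property that matters for the large files mentioned in the thread. With a real file, the handle would come from open("ls_orchid.fasta") instead of io.StringIO.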