26 May 2007 04:08
Re: is this supposed to be really slow?
W. Bryan Smith <wbsmith <at> gmail.com>
2007-05-26 02:08:42 GMT
On 5/25/07, Titus Brown <titus <at> caltech.edu> wrote:
>
> Hi, Bryan,
>
> I'm not too familiar with the underlying code, but I believe that
> BioPython enforces a three second wait between record retrieval attempts
> from NCBI. This is by request of NCBI; see
>
> http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

i did see this constraint for only one request per 3 seconds, but did not realize each time i went through my loop that this was a separate request. you're probably correct, that this is (partly) the source of my slow code. i guess i really didn't understand the nature of how this piece of code was working... i thought that the text data were dropped to memory when i called the PubMed.Dictionary function, so i was thinking that was the one request/3 seconds i had to worry about.

i'm sure traffic for these sorts of things can get pretty high, but it does seem to be a bit ridiculous that if i want to retrieve 50 records, it will take a minimum of 2.5 minutes to do so. each record must be only about 10 KB or so (in xml format), so it seems a little ridiculous that i can only pull ~3 KB/s from the ncbi servers. can anyone verify that this is the case? is there anything to do about this constraint?

> I personally tend to just use the NCBI retrieval URLs directly, but
> that's kind of ugly.

you mean you just use the pubmed ids and then pull down the text of the corresponding url to process separately? not sure i understand if that is what you mean or not, but i don't really know how to parse and process text in python. maybe this is a good opportunity to learn. :)

all i really want is a way to count publications per year for some key word... at least that is all i am trying to accomplish right now. seems like there should be an easy and relatively fast way to do this.

> There may be a higher volume retrieval system
> built directly into BioPython, too.

any experts out there care to weigh in on this?
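For the publications-per-year count, one possible approach is to skip record retrieval entirely and ask NCBI's ESearch utility for just the hit count, one request per year. The following is only a minimal sketch under stated assumptions, not BioPython's own method: it assumes the current E-utilities endpoint URL (which differs from the 2007 help page linked above), assumes the three-second wait discussed in this thread still applies, and uses hypothetical helper names (`build_count_url`, `parse_count`, `counts_per_year`) and a made-up example keyword.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the present-day E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# Assumption: the per-request wait mentioned in this thread.
DELAY_SECONDS = 3.0


def build_count_url(keyword, year):
    """Build an ESearch URL that asks only for the hit count in one year."""
    params = {
        "db": "pubmed",
        "term": keyword,
        "datetype": "pdat",      # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",      # return the count, not the id list
    }
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Read the integer inside the <Count> element of an ESearch reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def fetch(url):
    """Download the raw reply body for a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def counts_per_year(keyword, years, fetch=fetch, delay=DELAY_SECONDS):
    """Return {year: hit count}, pausing between consecutive requests."""
    counts = {}
    for i, year in enumerate(years):
        if i > 0:
            time.sleep(delay)  # wait before every request after the first
        counts[year] = parse_count(fetch(build_count_url(keyword, year)))
    return counts


if __name__ == "__main__":
    # Hypothetical keyword and year range, for illustration only.
    print(counts_per_year("zebrafish", range(2000, 2005)))
```

Under these assumptions the cost is one small request per year rather than one per record, so five years would take roughly 12 seconds of waiting regardless of how many papers match. The `fetch` and `delay` parameters exist only so the loop can be tried offline with a stand-in function.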
thanks so much for the input,
bryan

_______________________________________________
BioPython-announce mailing list - BioPython-announce <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-announce