From: W. Bryan Smith <wbsmith <at> gmail.com>
Subject: Re: is this supposed to be really slow?
Newsgroups: gmane.comp.python.bio.announce
Date: Saturday 26th May 2007 02:08:42 UTC
On 5/25/07, Titus Brown wrote:
>
>
> Hi, Bryan,
>
> I'm not too familiar with the underlying code, but I believe that
> BioPython enforces a three second wait between record retrieval attempts
> from NCBI.  This is by request of NCBI; see
>
>         http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html


i did see this constraint of only one request per 3 seconds, but did
not realize that each pass through my loop was a separate
request.  you're probably correct that this is (partly) the source of my
slow code.  i guess i really didn't understand the nature of how this
piece of code was working... i thought that the text data were dropped
to memory when i called the PubMed.Dictionary function, so i was
thinking that was the one request/3 seconds i had to worry about.  i'm
sure traffic for these sorts of things can get pretty high, but it does
seem to be a bit ridiculous that if i want to retrieve 50 records, it will
take a minimum of 2.5 minutes to do so.  each record must be only
about 10 KB or so (in xml format), so it seems a little ridiculous that i
can only pull ~3 KB/s from the ncbi servers.  can anyone verify that
this is the case?  is there anything to do about this constraint?
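
The arithmetic above can be checked with a short sketch. It assumes the 3 second delay applies to every request, as described in this thread. The second half shows one possible way around the per-record cost: the NCBI E-utilities efetch endpoint accepts a comma-separated list of ids, so many records can share one request. The helper names, the base URL constant, and the example ids are illustrative assumptions, not part of BioPython.

```python
from urllib.parse import urlencode

DELAY_SECONDS = 3.0  # wait between requests, as described in this thread

# Assumed base URL for the NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def minimum_wait(n_requests, delay=DELAY_SECONDS):
    """Lower bound on total seconds if every request waits `delay` seconds."""
    return n_requests * delay


def throughput_kb_per_s(record_kb, delay=DELAY_SECONDS):
    """Effective transfer rate when one record arrives per `delay` seconds."""
    return record_kb / delay


def batched_efetch_url(pmids):
    """Build one efetch URL that asks for several PubMed ids at once."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": "xml",
    }
    return EFETCH + "?" + urlencode(params)


print(minimum_wait(50) / 60)              # minutes needed for 50 single requests
print(round(throughput_kb_per_s(10), 1))  # KB/s for 10 KB records
print(batched_efetch_url([101, 102, 103]))  # made-up ids, for illustration only
```

With one request per record, 50 records cost 150 seconds of waiting alone. With a batched URL, the same 50 ids would cost a single request.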

> I personally tend to just use the NCBI retrieval URLs directly, but
> that's kind of ugly.


you mean you just use the pubmed ids and then pull down the text of
the corresponding url to process separately?  not sure i understand if
that is what you mean or not, but i don't really know how to parse and
process text in python.  maybe this is a good opportunity to learn. :)
all i really want is a way to count publications per year for some key
word... at least that is all i am trying to accomplish right now.  seems
like there should be an easy and relatively fast way to do this.
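
A minimal sketch of counting publications per year for a keyword, assuming the NCBI E-utilities esearch endpoint is called directly rather than through BioPython. It asks only for the hit count of each year (`rettype=count` plus a `[pdat]` publication-date tag), so no records need to be downloaded or parsed. The function names and the base URL constant are assumptions for illustration, and the 3 second pause mirrors the limit discussed in this thread.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed base URL for the NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(keyword, year):
    """Build an esearch URL that returns only the number of hits for one year."""
    term = f"{keyword} AND {year}[pdat]"
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Read the <Count> element from an esearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def counts_per_year(keyword, years, delay=3.0):
    """Query one year at a time, pausing `delay` seconds between requests."""
    counts = {}
    for year in years:
        with urllib.request.urlopen(count_url(keyword, year)) as resp:
            counts[year] = parse_count(resp.read())
        time.sleep(delay)
    return counts
```

A call such as `counts_per_year("kinase", range(2000, 2007))` would make seven requests, one per year, and return a dict mapping each year to its hit count. Even with the 3 second pause that is roughly 20 seconds of waiting, regardless of how many records match.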

> There may be a higher volume retrieval system
> built directly into BioPython, too.


any experts out there care to weigh in on this?

thanks so much for the input,
bryan
_______________________________________________
BioPython-announce mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biopython-announce
 