On 5/25/07, Titus Brown wrote:
> Hi, Bryan,
> I'm not too familiar with the underlying code, but I believe that
> BioPython enforces a three second wait between record retrieval attempts
> from NCBI. This is by request of NCBI; see
I did see the constraint of only one request per 3 seconds, but I did
not realize that each pass through my loop was a separate request.
You're probably correct that this is (partly) the source of my slow
code. I guess I really didn't understand how this piece of code works:
I thought the text data were loaded into memory when I called the
PubMed.Dictionary function, so I assumed that was the one request per
3 seconds I had to worry about. I'm sure traffic for these sorts of
things can get pretty high, but it does seem a bit ridiculous that
retrieving 50 records takes a minimum of 2.5 minutes (50 x 3 s). Each
record must be only about 10 KB or so (in XML format), so it seems odd
that I can only pull ~3 KB/s from the NCBI servers. Can anyone verify
that this is the case? Is there anything to be done about this
constraint?
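To make the loop behaviour concrete, here is a minimal sketch, assuming PubMed.Dictionary acts like a lazy mapping that fetches one record per lookup (that is how this thread describes it; the real class may differ). The names Throttle, LazyRecordDict and FakeClock are hypothetical, and the fake clock stands in for real time so the 3-second spacing can be checked without actually waiting or touching the network.

```python
import time


class Throttle:
    """Enforce a minimum gap (in seconds) between successive requests."""

    def __init__(self, delay=3.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request; None before the first

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


class LazyRecordDict:
    """Mapping that only fetches a record when its key is looked up.

    `fetch` is any callable that takes an ID and returns the record text,
    so every lookup is its own throttled request.
    """

    def __init__(self, fetch, throttle):
        self.fetch = fetch
        self.throttle = throttle

    def __getitem__(self, record_id):
        self.throttle.wait()
        return self.fetch(record_id)


class FakeClock:
    """Simulated clock: sleeping just advances the time counter."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


# Simulate 50 lookups with a stand-in fetch function instead of a network call.
clock = FakeClock()
records = LazyRecordDict(lambda pmid: "<record %d>" % pmid,
                         Throttle(3.0, clock=clock.now, sleep=clock.sleep))
texts = [records[pmid] for pmid in range(50)]
print(clock.t)  # 147.0
```

With a 3-second gap and the first request sent immediately, 50 lookups spend 49 x 3 = 147 seconds just waiting, which is roughly the 2.5 minutes estimated above.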
> I personally tend to just use the NCBI retrieval URLs directly, but
> that's kind of ugly.
You mean you just take the PubMed IDs and then pull down the text at
the corresponding URL to process separately? I'm not sure whether that
is what you mean, but I don't really know how to parse and process
text in Python. Maybe this is a good opportunity to learn. :)
All I really want is a way to count publications per year for some
keyword... at least that is all I am trying to accomplish right now.
It seems like there should be an easy and relatively fast way to do
this.
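If the goal is only counts per year, one option is to skip fetching records entirely and ask NCBI's ESearch E-utility for the hit count, one small request per year. A minimal sketch, assuming the ESearch parameters retmax, datetype, mindate and maxdate behave as described in NCBI's E-utilities documentation; the helper names are hypothetical, and the 3-second default delay follows the limit mentioned earlier in this thread (NCBI's current guidelines should be checked).

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, year):
    """Build an ESearch URL that asks only for the hit count in one year."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": 0,         # no ID list, just the <Count>
        "datetype": "pdat",  # filter on publication date
        "mindate": year,
        "maxdate": year,
    }
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Return the top-level <Count> from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    # findtext("Count") only looks at direct children, so any per-term
    # counts nested inside <TranslationStack> are ignored.
    return int(root.findtext("Count"))


def counts_per_year(term, years, delay=3.0):
    """One ESearch request per year, spaced `delay` seconds apart."""
    counts = {}
    for i, year in enumerate(years):
        if i:
            time.sleep(delay)
        with urllib.request.urlopen(esearch_url(term, year)) as resp:
            counts[year] = parse_count(resp.read())
    return counts
```

For example, counts_per_year("gene therapy", range(2000, 2007)) would make seven small requests instead of downloading every matching record.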
> There may be a higher volume retrieval system
> built directly into BioPython, too.
Any experts out there care to weigh in on this?
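On the higher-volume question: EFetch accepts a comma-separated list of IDs in its id parameter, so a batch of records can come back in a single request instead of one request per record. A minimal sketch under that assumption; the batch size of 200 is an assumption chosen because NCBI's documentation recommends HTTP POST over GET for larger ID lists, and the helper names are hypothetical. (NCBI's history server, via usehistory=y on ESearch, is another route not shown here.)

```python
import time
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(ids, size):
    """Yield successive lists of at most `size` IDs."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def efetch_url(pmids):
    """One EFetch URL that returns XML for every PubMed ID in `pmids`."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": "xml",
    }
    return EFETCH + "?" + urllib.parse.urlencode(params)


def fetch_in_batches(pmids, size=200, delay=3.0):
    """Fetch many records with one request per batch, not one per record."""
    chunks = []
    for i, batch in enumerate(batches(pmids, size)):
        if i:
            time.sleep(delay)  # still respect the per-request spacing
        with urllib.request.urlopen(efetch_url(batch)) as resp:
            chunks.append(resp.read())
    return chunks
```

With 50 PubMed IDs and size=200, fetch_in_batches makes a single request, so the 3-second spacing no longer multiplies by the number of records.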
Thanks so much for the input,
BioPython-announce mailing list - BioPythonemail@example.com