17 Jun 09:37 2013
Re: sets of sequences - how to read?
Carnë Draug <carandraug+dev <at> gmail.com>
2013-06-17 07:37:43 GMT
On 17 May 2013 05:08, Fields, Christopher J <cjfields <at> illinois.edu> wrote:
> On May 15, 2013, at 8:53 PM, Carnë Draug <carandraug+dev <at> gmail.com> wrote:
>> Hi
>>
>> when accessing entrez gene using eutils to get multiple genes, NCBI
>> now returns an Entrezgene-Set rather than a list of EntrezGene.
>> This change must have happened sometime in the last 2 months.
>>
>> [...]
>>
>> Carnë
>>
>>  http://0-www.ncbi.nlm.nih.gov.elis.tmu.edu.tw/IEB/ToolBox/CPP_DOC/asn_spec/Entrezgene-Set.html
>
> This doesn't surprise me too much; I know there have been some changes
> brewing, but didn't know when they would land. I guess that would be...
> <looks at watch>... now.

Hi,

for those interested, I have contacted NCBI about this and they have
reverted the change (see conversation below). Still, Entrezgene-Set is a
valid format, so the question of how to read such files remains.

Carnë

---------- Forwarded message ----------
Date: 17 May 2013 00:36
Subject: Entrezegene-Set: recent changes to E-utilities

Hi

I believe there was a recent change to the E-utilities service. When
fetching multiple ASN.1 Entrezgene records from the gene database, it now
returns an Entrezgene-Set instead of the usual list of Entrezgene records,
one after the other. For example, here is an Entrezgene-Set:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3014,85235&rettype=asn1&retmode=text

which used to be the same as a concatenation of:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3014&rettype=asn1&retmode=text
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=85235&rettype=asn1&retmode=text

This is something new. I don't know exactly when it was introduced, but it
must have been sometime in the last 2 months. I don't know about other
programming languages, but at least in Perl there is no module able to
parse these files. I have already sent a patch to the author of the module
responsible for reading the non-set Entrezgene, but who knows when it will
be made available. The only workaround is to make multiple requests, one
for each UID, which will obviously annoy your servers.

As far as I am aware, there was no notification of this change to
E-utilities. We have a lot of code that worked fine for years, until it
started to fail last month, and no one using Perl will be able to parse
these files until a fix is released. Is there any way this change can be
reverted?

---------- Forwarded message ----------
Date: 23 May 2013 04:53
Subject: Re: Entrezegene-Set: recent changes to E-utilities

Thanks very much for your report. I will discuss this with the Gene
development team to see why this change occurred and get back to you.

Out of curiosity, have you considered using the XML format for Gene
(&retmode=xml)? There are a variety of XML parsers for Perl that should be
able to read Gene XML.

---------- Forwarded message ----------
Date: 24 May 2013 13:01
Subject: Re: Entrezegene-Set: recent changes to E-utilities

Thank you for looking into this. While there are several XML parsers for
Perl, none of them will return a Bio::Seq object (that is, none is
Bio::SeqIO compliant). Of course I could use one of the XML parsers to
write my own, but then I could just as well fix the entrezgene parser to
deal with Entrezgene-Sets, which is what I'm doing. I have already proposed
a patch for it, but a new concept such as a set of sequences does not
really fit the design of Bio::Seq.

Please do let me know of any more news on this.

Thank you again,

---------- Forwarded message ----------
Date: 13 June 2013 22:08
Subject: Re: Entrezegene-Set: recent changes to E-utilities

The fix for this should now be live. Let us know if you have further
problems with this.

Regards,
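
On the remaining question of how to read an Entrezgene-Set: one possible
approach, sketched here in Python purely as an illustration, is to split
the set back into standalone Entrezgene records before handing them to an
existing single-record parser. This is not the patch mentioned above. It
assumes the text output has the shape "Entrezgene-Set ::= { {...}, {...} }",
with each inner top-level brace group being one record, and that ASN.1 text
strings are double-quoted. The function name split_entrezgene_set is
hypothetical.

```python
def split_entrezgene_set(text):
    """Split ASN.1 text of an Entrezgene-Set into standalone records.

    Assumption: the input looks like 'Entrezgene-Set ::= { {...}, {...} }'.
    Each returned string is prefixed with 'Entrezgene ::= ' so that it
    resembles what a single-record fetch would return.
    """
    header = "Entrezgene-Set ::="
    start = text.find(header)
    if start == -1:
        raise ValueError("no Entrezgene-Set header found")

    records = []
    depth = 0            # current brace nesting level
    in_string = False    # True while inside a double-quoted string
    record_start = None  # index of the '{' opening the current record

    for i in range(start + len(header), len(text)):
        ch = text[i]
        if ch == '"':
            # A doubled quote ("") inside a string toggles twice,
            # so the state is still correct afterwards.
            in_string = not in_string
        elif in_string:
            continue     # braces inside strings do not count
        elif ch == "{":
            depth += 1
            if depth == 2:
                record_start = i
        elif ch == "}":
            if depth == 2:
                records.append("Entrezgene ::= " + text[record_start:i + 1])
            depth -= 1
            if depth == 0:
                break    # closing brace of the set itself
    return records
```

Each returned string could then be written out or passed on, one record at
a time, which avoids making one request per UID.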