17 Jun 14:48 2013
Re: sets of sequences - how to read?
Fields, Christopher J <cjfields <at> illinois.edu>
2013-06-17 12:48:56 GMT
The best thing to do in this case is to contact the author of
Bio::ASN1::EntrezGene to see if the code can be updated. Last time I
heard, he was interested in putting the code on GitHub and giving
BIOPERLML co-maint; we could easily do that, but I'm not sure whether
it is currently hosted anywhere else.

chris

On Jun 17, 2013, at 9:37 AM, Carnë Draug <carandraug+dev <at> gmail.com> wrote:

> On 17 May 2013 05:08, Fields, Christopher J <cjfields <at> illinois.edu> wrote:
>> On May 15, 2013, at 8:53 PM, Carnë Draug <carandraug+dev <at> gmail.com> wrote:
>>> Hi
>>>
>>> when accessing Entrez Gene using eutils to get multiple genes, NCBI
>>> now returns an Entrezgene-Set rather than a list of Entrezgene records.
>>> This change must have happened sometime in the last 2 months.
>>>
>>> [...]
>>>
>>> Carnë
>>>
>>> http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/asn_spec/Entrezgene-Set.html
>>
>> This doesn't surprise me too much; I know there have been some changes
>> brewing, but didn't know when they would land. I guess that would
>> be... <looks at watch>... now.
>
> Hi,
>
> for those interested, I have contacted NCBI about this and they have
> reverted the change (see conversation below). Still, Entrezgene-Set is
> a thing, so the issue of reading such sets still exists.
>
> Carnë
>
>
> ---------- Forwarded message ----------
> Date: 17 May 2013 00:36
> Subject: Entrezgene-Set: recent changes to E-utilities
>
> Hi
>
> I believe there was a recent change to the E-utilities service. When
> fetching multiple ASN.1 Entrezgene records from the gene database, it
> now returns an Entrezgene-Set instead of the typical list of
> Entrezgene records, one after the other.
>
> For example, here is an Entrezgene-Set:
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3014,85235&rettype=asn1&retmode=text
>
> which used to be the same as a concatenation of:
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3014&rettype=asn1&retmode=text
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=85235&rettype=asn1&retmode=text
>
> This is something new. I don't know exactly when it was introduced, but
> it must have been sometime in the last 2 months.
>
> I don't know about other programming languages, but at least in Perl
> there is no module able to parse these files. I have already contacted
> the author of the module responsible for reading non-set Entrezgene
> records with a patch, but who knows when it will be made available. The
> only workaround is to make multiple requests, one for each UID, which
> will obviously annoy your servers.
>
> As far as I am aware, there was no notification of this change to
> E-utilities, which worked fine for many years. We have a lot of code
> that worked fine for years, until it started to fail last month, and
> no one using Perl will be able to parse these files until a fix is
> released. Is there any way this change can be reverted?
>
>
> ---------- Forwarded message ----------
> Date: 23 May 2013 04:53
> Subject: Re: Entrezgene-Set: recent changes to E-utilities
>
> Thanks very much for your report. I will discuss this with the Gene
> development team to see why this change occurred and get back to you.
> Out of curiosity, have you considered using the XML format for Gene
> (&retmode=xml)? There are a variety of XML parsers for Perl that should
> be able to read Gene XML.
>
>
> ---------- Forwarded message ----------
> Date: 24 May 2013 13:01
> Subject: Re: Entrezgene-Set: recent changes to E-utilities
>
> Thank you for looking into this.
>
> While there are several XML parsers for Perl, there is none that will
> return a Bio::Seq object (that is, none that is Bio::SeqIO compliant).
> Of course I could use one of the XML parsers to write my own, but then
> I could just as well fix the Entrezgene parser to deal with
> Entrezgene-Sets, which is what I'm doing. I already proposed a patch to
> them, but the inclusion of a new concept, a set of sequences, does not
> really fit in the design of Bio::Seq.
>
> Please do let me know of any news on this. Thank you again,
>
>
> ---------- Forwarded message ----------
> Date: 13 June 2013 22:08
> Subject: Re: Entrezgene-Set: recent changes to E-utilities
>
> The fix for this should now be live. Let us know if you have further
> problems with this.
>
> Regards,
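The batched request and the one-request-per-UID workaround discussed in the thread differ only in the `id` parameter of the efetch URL. The following sketch rebuilds both forms so the difference is explicit. It is Python for illustration only, not Perl and not part of BioPerl; the helper names `efetch_url` and `per_uid_urls` are hypothetical. It only builds strings and makes no network calls.

```python
from urllib.parse import urlencode

# Base URL as quoted in the thread (plain http); current NCBI endpoints
# may require https instead.
EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(uids):
    """Build one efetch URL asking for ASN.1 text from the gene database.

    Several UIDs are joined with commas in a single request; while the
    change described in the thread was live, such a request returned an
    Entrezgene-Set instead of concatenated Entrezgene records.
    """
    params = {
        "db": "gene",
        "id": ",".join(str(uid) for uid in uids),
        "rettype": "asn1",
        "retmode": "text",
    }
    # safe="," keeps the comma-separated ID list readable instead of %2C.
    return EFETCH + "?" + urlencode(params, safe=",")


def per_uid_urls(uids):
    """The workaround from the thread: one request per UID (more server load)."""
    return [efetch_url([uid]) for uid in uids]


if __name__ == "__main__":
    print(efetch_url([3014, 85235]))
    for url in per_uid_urls([3014, 85235]):
        print(url)
```

With the two gene IDs from the thread, `efetch_url` reproduces the batched URL quoted above, and `per_uid_urls` reproduces the two single-record URLs whose concatenation it used to equal.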
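The thread notes that even after NCBI's revert, reading an Entrezgene-Set remains an open issue. One possible approach is to split the set into individual records so that a record-at-a-time reader can consume them. Bio::ASN1::EntrezGene is such a reader, and this sketch assumes it expects each record to start with `Entrezgene ::=`. The code is a minimal sketch in Python, and `split_entrezgene_set` is a hypothetical helper, not part of BioPerl or any NCBI tool. It assumes ASN.1 text value notation of the form `Entrezgene-Set ::= { { ... }, { ... } }`, with strings in double quotes and embedded quotes doubled (`""`). It tracks brace depth and ignores braces inside strings.

```python
def split_entrezgene_set(text):
    """Split 'Entrezgene-Set ::= { {...}, {...} }' into a list of
    'Entrezgene ::= {...}' strings (assumed ASN.1 text value notation)."""
    if not text.lstrip().startswith("Entrezgene-Set"):
        return [text]  # not a set: return the input unchanged

    start = text.index("{")  # opening brace of the whole set
    records = []
    depth = 0          # brace depth relative to the inside of the set
    in_string = False  # inside a double-quoted ASN.1 string?
    rec_start = None

    for i in range(start + 1, len(text)):
        ch = text[i]
        if ch == '"':
            # A doubled "" toggles twice, so it correctly stays in the string.
            in_string = not in_string
        elif in_string:
            continue  # braces inside strings do not count
        elif ch == "{":
            if depth == 0:
                rec_start = i  # a new top-level record begins
            depth += 1
        elif ch == "}":
            if depth == 0:
                break  # closing brace of the set itself
            depth -= 1
            if depth == 0:
                # Re-label the record so a single-record reader accepts it.
                records.append("Entrezgene ::= " + text[rec_start:i + 1])
    return records


if __name__ == "__main__":
    sample = (
        "Entrezgene-Set ::= {\n"
        "  {\n    track-info {\n      geneid 3014 },\n"
        '    comment "a { brace" },\n'
        "  {\n    track-info {\n      geneid 85235 } }\n"
        "}\n"
    )
    for record in split_entrezgene_set(sample):
        print(record)
        print("---")
```

Counting braces avoids writing a full ASN.1 parser, but it does not validate the records. Whatever reads the output still has to parse the contents of each record.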