10 Dec 2005 19:39
Bio.Geo for NCBI's GEO microarray SOFT files
Peter <biopython-dev <at> maubp.freeserve.co.uk>
2005-12-10 18:39:13 GMT
I've just been looking at the Bio.Geo module by Katharine Lindner, contributed back in 2002, which is meant to parse the NCBI's Gene Expression Omnibus (GEO) microarray data files.

http://www.ncbi.nlm.nih.gov/geo/

Is anyone using Bio.Geo at the moment?

The NCBI seem to call these SOFT files (*.soft), and the format is documented here:

http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTformat

Apparently in 2005 they began a switch to a revised file format. New format files are here:

ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/

Old format files are here:

ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old/
ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old_gz/

As far as I can tell, neither the "old" nor the "new" files work in Bio.Geo, so there may have been another format change between 2002 and 2005.

In addition, the 2005 change introduces new lines before and after the actual data:

!dataset_table_begin
!dataset_table_end

These are definitely not supported in the current Martel grammar for GEO files.

Peter
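
P.S. To illustrate the two new marker lines, here is a minimal sketch in plain Python of pulling out the rows that sit between them. This is not how Bio.Geo or its Martel grammar works; the function name extract_dataset_table is hypothetical, the sample lines are made up rather than taken from a real GEO file, and I am assuming the table rows are tab-delimited.

```python
def extract_dataset_table(lines):
    """Return the rows found between the table begin/end marker lines.

    Assumes each row inside the table is tab-delimited (an assumption,
    not something checked against every SOFT file).
    """
    rows = []
    in_table = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "!dataset_table_begin":
            in_table = True
        elif line == "!dataset_table_end":
            in_table = False
        elif in_table and line:
            rows.append(line.split("\t"))
    return rows


# Made-up sample lines for illustration only.
sample = [
    "some header line before the table",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tSAMPLE_A",
    "1\tgeneX\t0.5",
    "!dataset_table_end",
    "some trailing line after the table",
]
table = extract_dataset_table(sample)
```

For a real file you would pass an open file handle instead of the list, since the function only needs something that yields lines.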