2 Mar 2009 17:35
PostgreSQL schema support for Chado
Scott Cain <scott <at> scottcain.net>
2009-03-02 16:35:44 GMT
2009-03-02 16:35:44 GMT
Hello all, Last week I went traipsing around the country talking to people about using Chado for community annotation efforts, specifically in an educational setting, but I think the ideas may be more generally useful. Today I am specifically going to write about support for PostgreSQL schemas in Chado to help with data management when a variety of people are using Chado to annotate a genome. The idea here is pretty simple: we can create multiple schemas on a one per user basis to give them a "private" work space in Chado to work on their annotations before someone (the user or an authority of some sort) promotes them to the main Chado schema (the "public" schema). I've spent a little time on the details of making this work but would like feedback on my thoughts so far as there are probably things I didn't think of. The idea is pretty simple: when a user wants a private work space, a schema is set up for him and it is populated with a ddl that mirrors Chado. It is not exactly the same as Chado though, so that data can be viewed in context of both the public and private data. It is still fairly simple though: for each table in Chado, there are a few relations created: 1. A table that has the same same structure as its corresponding table in the main schema, but with an underscore at the beginning of the name (ie, "feature" becomes "_feature"). It uses the public sequence for generating new ids for the private table though, to avoid conflicts when the private data is promoted to public. Other conflicts are possible to, noteably unique constraints. I don't know how to best handle this: throw it back to the user to resolve? 2. A view with the same name as the main tables name which is a union of the main table and the private, underscore-named table. 3. Rules on the view to allow inserts, updates and deletes on the view to affect (only) the private, underscore-named table. I have experimented with most of the above in the psql shell to make sure I understand how it would work (the only thing I haven't tried yet is the rule for deletes, but I'm hoping that won't be too difficult to do either). What I am hoping to do now is also fairly straight forward: 1. Write SQL::Translator based scripts that will generate these private schemas on demand. 2. Modify the GFF3 bulk loader to be aware of the private schemas, adding command line flags to insert into private schemas when desired. What I don't know is how easy it will be to make the queries in the loader and other Chado aware tools compatible with this. What I am hoping is that I can just do a "SET search_path" when the database handle is created and be good to go. Does anybody have any experience with this? 3. Modify the GBrowse Chado adaptor to be aware of the public schemas as well (same caveat/question as in number 2 applies here as well) 4. Twist Ed Lee's arm to get him to do the same with Apollo. So, what haven't I thought of? And what do you think of this idea in general? Thanks, Scott -- -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research ------------------------------------------------------------------------------ Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise -Strategies to boost innovation and cut costs with open source participation -Receive a $600 discount off the registration fee with the source code: SFAD http://p.sf.net/sfu/XcvMzF8H
RSS Feed