Scott Cain | 2 Mar 17:35 2009
Picon

PostgreSQL schema support for Chado

Hello all,

Last week I went traipsing around the country talking to people about
using Chado for community annotation efforts, specifically in an
educational setting, but I think the ideas may be more generally
useful.  Today I am specifically going to write about support for
PostgreSQL schemas in Chado to help with data management when a
variety of people are using Chado to annotate a genome.  The idea here
is pretty simple: we can create multiple schemas on a one per user
basis to give them a "private" work space in Chado to work on their
annotations before someone (the user or an authority of some sort)
promotes them to the main Chado schema (the "public" schema).  I've
spent a little time on the details of making this work but would like
feedback on my thoughts so far as there are probably things I didn't
think of.

The idea is pretty simple: when a user wants a private work space, a
schema is set up for him and it is populated with a ddl that mirrors
Chado.  It is not exactly the same as Chado though, so that data can
be viewed in context of both the public and private data.  It is still
fairly simple though: for each table in Chado, there are a few
relations created:

  1. A table that has the same same structure as its corresponding
table in the main schema, but with an underscore at the beginning of
the name (ie, "feature" becomes "_feature").  It uses the public
sequence for generating new ids for the private table though, to avoid
conflicts when the private data is promoted to public.  Other
conflicts are possible to, noteably unique constraints.  I don't know
how to best handle this: throw it back to the user to resolve?

  2. A view with the same name as the main tables name which is a
union of the main table and the private, underscore-named table.

  3. Rules on the view to allow inserts, updates and deletes on the
view to affect (only) the private, underscore-named table.

I have experimented with most of the above in the psql shell to make
sure I understand how it would work (the only thing I haven't tried
yet is the rule for deletes, but I'm hoping that won't be too
difficult to do either).

What I am hoping to do now is also fairly straight forward:

1. Write SQL::Translator based scripts that will generate these
private schemas on demand.

2. Modify the GFF3 bulk loader to be aware of the private schemas,
adding command line flags to insert into private schemas when desired.
 What I don't know is how easy it will be to make the queries in the
loader and other Chado aware tools compatible with this.  What I am
hoping is that I can just do a "SET search_path" when the database
handle is created and be good to go.  Does anybody have any experience
with this?

3. Modify the GBrowse Chado adaptor to be aware of the public schemas
as well (same caveat/question as in number 2 applies here as well)

4. Twist Ed Lee's arm to get him to do the same with Apollo.

So, what haven't I thought of?  And what do you think of this idea in general?

Thanks,
Scott

--

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H

Gmane