7 Dec 2004 12:10
Re: just a question..
Joachim Kupke <joachim.kupke <at> inf.ethz.ch>
2004-12-07 11:10:53 GMT
Thomas Leonard:
> > But perhaps, we could make the helper application listen(2) to a socket
> > that the underlying file system (or pseudo NFS server) connect(2)s to in
> > order to disseminate its wishes? This would have the upshot that when
> > the helper accept(2)s a connection, we are given the extra file handle
> > for free.
>
> Sounds reasonable. Not sure how easy it is to implement, but it might make
> writing the helper in another language easier too.

I haven't yet found the time to test this, but it should only be a matter
of replacing the lines from "len = read(helper, ...)" through
"uid = strtolen(...)" in read_from_helper(int) by something like:

    struct ucred uc;
    socklen_t uc_len = sizeof uc;  /* getsockopt() wants a pointer to the length */

    request_fd = accept(helper, 0, 0);
    getsockopt(request_fd, SOL_SOCKET, SO_PEERCRED, &uc, &uc_len);
    uid = uc.uid;

(Error handling gracefully ignored.) However, send_to_helper(...) in
lazyfs.c would have to be rewritten. Plus, I'm not 100% sure whether user
credentials should be passed this way, because

- this is a Linux-only solution; for FreeBSD, for example, you would use
  getpeereid() rather than getsockopt(), and because
- it's not actually the uid of the connecting process that the helper is
  interested in at all. (It's a different story when ZeroProgress connects
  to the helper.) What the helper is interested in is the uid of the
  process on whose behalf it's going to fetch files.

Since current->uid happens to contain the correct value in lazyfs, we
wouldn't necessarily notice this design flaw. lazynfsd, which won't run as
root, of course, would have to go to extra lengths, fork()ing and
execvp()ing a setuid helper, just in order to get one integer of
information across.

So maybe the interface should be changed and the uid should be sent just
before the file name is sent? Alternatively, you could try and send
out-of-band data in order to maintain compatibility.

["..." files]

> > Whether it's particularly easy to read them, I wouldn't want to decide.
> > What I was talking about weren't the "..." files themselves, but the
> > fact that when you access /uri/0install/www-i1.informatik.rwth-aachen.de
> > for the first time (or after some major directory restructuring), then
> > zero install will create almost 3000 directories (each with a "..." file
> > inside, of course).
>
> That's not actually necessary from the design. In fact, you can delete all
> 3000 directories right afterwards, and it'll still work just fine. There

Okay, you are absolutely right. It still feels awkward to use the cache
directory to two ends at the same time: to store the directory hierarchy
(basically, the contents of the index file, unpacked) on the one hand and
to store actual files on the other hand. Of course, this reflects the
situation as it should look from the user's (or even the packager's) point
of view, but you really want to use the cache directory only to cache the
contents of actual files: Ask yourself what would (or should) happen in
the event that there is, e.g., a symlink in the cache directory, which
cannot be presented in /uri because it would have to be listed in the
index file, then. Similarly, file permissions, hard links, and pretty much
everything that is stored at the dentry level (rather than the inode
level; file permissions being a bad example, granted) doesn't make much
sense to propagate to /uri.

Bottom line: /var/cache/zero-install should contain the (non-mutable) data
that will only ever be read by reading from /uri files. Since it's
non-mutable, it's more or less canonical to store it by hash values.
Plus...

> were two reasons for doing it this way:
>
> - If a directory already exists, you need to update it (otherwise you get
>   stale ... files).

Here, you basically cope with the non-atomicity of directory updates as
opposed to the atomicity of an index file update. You deal with it
gracefully, but it still smells like working around things.

> - Since we don't cache the index files in memory, reloading it each time
>   another ... file is needed is slow.

Sure, but /uri should really be deemed a RAM disk, except, of course, when
it comes to actually reading from files.

At the end of the day, let's face it: Eventually, there will be demand for
making multiple revisions of sites available simultaneously. If you stick
to storing cache files by the names under which they are accessible, you
will have to store files that remain identical across revisions twice.
Which, needless to say, nobody wants to do.

[...]

> > Of course, you might prefer to store a file whose hash value is
> > 012345678.... at a location 012/345/678/.... rather than using one flat
> > directory for every file.
>
> One (minor) problem with this is that files get spread out by their
> randomly-distributed hashes. Which means, accessing many files in the
> same directory (eg, all the .py files in a program) requires seeking to
> 123/456/789, 425/134/543, 456/234/123, ..., which is a lot of (slow)
> directory lookups.

How slow these are depends only on the underlying file system. On the
other hand, I could argue that accessing large directories would be sped
up on file systems that use linear lists for their directory entries.

Look at it this way: Making the cache directory a simple associative array
of hash values to (possibly large) files enables you to plug in a highly
efficient single-file (database, sort-of) implementation, should the need
for better efficiency ever arise.

[Nesting sub-sites]

> You don't need Dynamic for this. The kernel module will send a message to
> the helper whenever the '...' file is missing. Dynamic directories are
> only needed when you want to allow a lookup on any name. That is, when you
> access 'site/apps/subsite' we can fetch the subsite index then, and then
> we know what goes inside 'subsite'.
>
> Dynamic is only needed if [...]

Sorry, my bad. So, things are even easier, aren't they?

> However, this assumes that subsites are updated along with their parents.

I don't quite understand what you mean by that.

> ie, updates for everything under /uri/0install/site are still atomic, but

Why?

> we just split the index for efficiency. If you want to allow accessing a

Yes, "splitting the index" would be a good way to look at sub-sites.
However, it's not (strictly) only for efficiency: When it comes to dated
snapshots again, a split index should prove useful there, too.

> subsite which the master index doesn't (yet) list, then you need Dynamic
> (but that causes the problems with bogus lookups).

Okay, I think we should go without Dynamic for anything below the DNS name
level, unless we want to do things like dynamically insert time stamps or
the like (as discussed earlier).

[...]

> > /uri/0install/rox.sourceforge.net/apps/ROX-Filer,=6c9b4545d2af7520981b13a17916eb3a
> >
> > would refer to the ROX-Filer sub-site, continue to refer to it when the
> > rox.sourceforge.net site changes (an application is added), but fail to
> > refer to anything when the sub-site changes.
>
> This doesn't quite work (to prevent upstream authors from unpublishing
> programs). Say I update the main site, so that 'apps' is now
> 'applications'. When you try to access 'apps/ROX-Filer,=6c9b4545d2...' you
> get told that 'apps' doesn't exist.

Okay, then this should really be a case where you would want both
checksums, as in:

    /uri/0install/rox.sourceforge.net,=a3be61971a31b1890257fa2d5454b9c6/apps/ROX-Filer,=6c9b4545d2af7520981b13a17916eb3a

Naturally, such path names would become a bit tedious to use (although,
still, virtually nobody would consciously use them; most people would just
follow a bunch of symlinks).

But the question is, would a directory of "unpacked index files,
identified by their checksums" make sense? As in:

    /uri/0install/rox.sourceforge.net
        -> /uri/0index/a3be61971a31b1890257fa2d5454b9c6
    /uri/0index/a3be61971a31b1890257fa2d5454b9c6/apps/ROX-Filer
        -> /uri/0index/6c9b4545d2af7520981b13a17916eb3a
    /uri/0index/6c9b4545d2af7520981b13a17916eb3a/AppRun
        -> platform/latest/AppRun

To recap: Any index file (that may either belong to a site or a sub-site
or a sub-sub-site...) can only make its contents available at
/uri/0index/<hash value of this very file>/, and in order to access a
site, you would go through the /uri/0install directory, which will only
contain (dynamically generated) symlinks to the appropriate /uri/0index
directories.

As a consequence, if you know you want to use your locally installed
version of ROX-Filer and nothing that its upstream author may ever think
may be more useful for you, you would just use the /uri/0index/....
directory, and you would only have to deal with ONE checksum.

However, in the above scenario I didn't specify how zero install should
ever know that /uri/0index/6c9b4545d2af7520981b13a17916eb3a is
downloadable from rox.sourceforge.net (at a suitable location that may
belong to a sub-site). It may thus even make sense to have a third
structure of 0sites, as in:

    /uri/0install/rox.sourceforge.net
        -> /uri/0sites/rox.sourceforge.net/.contents
    /uri/0sites/rox.sourceforge.net
        -> /uri/0index/rox.sourceforge.net#a3be61971a31b1890257fa2d5454b9c6
    /uri/0sites/rox.sourceforge.net/ROX-Filer
        -> /uri/0index/rox.sourceforge.net#6c9b4545d2af7520981b13a17916eb3a
    /uri/0index/rox.sourceforge.net#a3be61971a31b1890257fa2d5454b9c6/apps/ROX-Filer
        -> /uri/0sites/rox.sourceforge.net/ROX-Filer/.contents

Hence, /uri/0sites would contain a forest of zero install hierarchies
where ".contents" is a reserved name such that any (sub-) site below
/uri/0sites will have its actual contents available in a sub-directory
called ".contents", while all other sub-directories will correspond to
sub-sites.
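
To make the three-level layout above a little more tangible, here is a minimal sketch in C that builds it below a scratch directory and checks where the paths end up. It is an illustration under assumptions, not the helper's code: the function names (build_forest, same_target) are made up, the links are relative so that the tree works outside a real /uri, and ".contents" is modelled as the symlink itself (the listing above is looser about that).

```c
#define _XOPEN_SOURCE 700  /* for symlink(), realpath(), mkdtemp() */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define HOST     "rox.sourceforge.net"
#define SITE_IDX HOST "#a3be61971a31b1890257fa2d5454b9c6"  /* site index file */
#define SUB_IDX  HOST "#6c9b4545d2af7520981b13a17916eb3a"  /* ROX-Filer index file */

/* Create root/rel as a directory; an already existing one is not an error. */
static int make_dir(const char *root, const char *rel)
{
    char path[PATH_MAX];
    assert(root != NULL && rel != NULL);
    snprintf(path, sizeof path, "%s/%s", root, rel);
    return (mkdir(path, 0755) == 0 || errno == EEXIST) ? 0 : -1;
}

/* Create the symlink root/rel -> target (target is relative to the link). */
static int make_link(const char *root, const char *rel, const char *target)
{
    char path[PATH_MAX];
    assert(root != NULL && rel != NULL && target != NULL);
    snprintf(path, sizeof path, "%s/%s", root, rel);
    return symlink(target, path);
}

/* Build the 0index / 0sites / 0install forest below the scratch dir root. */
int build_forest(const char *root)
{
    static const char *const dirs[] = {
        "0index", "0sites", "0install",
        "0index/" SITE_IDX, "0index/" SITE_IDX "/apps", "0index/" SUB_IDX,
        "0sites/" HOST, "0sites/" HOST "/ROX-Filer",
    };
    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++)
        if (make_dir(root, dirs[i]) != 0)
            return -1;

    /* ".contents" of a (sub-)site names the index directory in use. */
    if (make_link(root, "0sites/" HOST "/.contents",
                  "../../0index/" SITE_IDX) != 0 ||
        make_link(root, "0sites/" HOST "/ROX-Filer/.contents",
                  "../../../0index/" SUB_IDX) != 0 ||
        /* Inside the site index, the sub-site is just a pointer into 0sites. */
        make_link(root, "0index/" SITE_IDX "/apps/ROX-Filer",
                  "../../../0sites/" HOST "/ROX-Filer/.contents") != 0 ||
        /* Compatibility entry point. */
        make_link(root, "0install/" HOST,
                  "../0sites/" HOST "/.contents") != 0)
        return -1;
    return 0;
}

/* 1 if root/a and root/b resolve to the same place, 0 if not, -1 on error. */
int same_target(const char *root, const char *a, const char *b)
{
    char pa[PATH_MAX], pb[PATH_MAX], ra[PATH_MAX], rb[PATH_MAX];
    snprintf(pa, sizeof pa, "%s/%s", root, a);
    snprintf(pb, sizeof pb, "%s/%s", root, b);
    if (realpath(pa, ra) == NULL || realpath(pb, rb) == NULL)
        return -1;
    return strcmp(ra, rb) == 0;
}
```

With a directory obtained from mkdtemp(), build_forest() followed by same_target(root, "0install/" HOST "/apps/ROX-Filer", "0index/" SUB_IDX) should report that the friendly path and the checksum-named directory are the same place, which is the whole point of the indirection.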
Directories below /uri/0index will record the relevant host name and the
appropriate checksum, but not the respective sub-site. (Otherwise, the
/uri/0sites/rox.sourceforge.net/ROX-Filer link above would have had to
point to
/uri/0index/rox.sourceforge.net#ROX-Filer#6c9b4545d2af7520981b13a17916eb3a.
This should not be necessary.) Naturally, /uri/0index/HOSTNAME#HASH will
contain anything that some index file available at HOSTNAME and whose hash
value is HASH may advertise.

For compatibility (and convenience), /uri/0install/HOSTNAME will point to
/uri/0sites/HOSTNAME/.contents for all values of HOSTNAME.

[...]

> Another important goal of subsites we should remember is that you have to
> be able to create a subsite without access to the top level (for user home
> directories, etc). Something like:
>
> /uri/0package/site/~bob.subsite/prog-1.2.5,md5=xyzzY

Au contraire. I think this should be kept completely separate from split
index files since it serves a completely different purpose. I am rather
comfortable with the current /uri/0install/hostname#user syntax, although
it would be desirable to support any conceivable URL syntax (as in
/uri/0install/hostname%2F~user).

If you mix up user home pages with nested zero install sites, you will
have to answer a couple of questions: How does a non-root (or non-www-data
or whatever) user create sub-sites? Without a site administrator's
cooperation, how do we resolve /uri/0index/HOSTNAME#HASH to anything
useful if HASH may actually be the hash value of one of the index files of
one of the users of host HOSTNAME? (And without making path names
unnecessarily long?)

Of course, you might want to use the file system hierarchy to split URLs
at slashes, regardless of implementation difficulties for now. But even
then, that's a completely different thing; you might want to consider:

    /uri/0sites/rox.sourceforge.net
    /uri/0sites/rox.sourceforge.net/subsite
    /uri/0sites/rox.sourceforge.net/subsite/ROX-Filer
    /uri/0sites/rox.sourceforge.net/dirsep
    /uri/0sites/rox.sourceforge.net/dirsep/~tal
    /uri/0sites/rox.sourceforge.net/dirsep/~tal/subsite
    /uri/0sites/rox.sourceforge.net/dirsep/~tal/subsite/foo

Again, this exposes the usual difficulty of there being only a single type
of sub-directory relationship in Unix file systems (and those of most
operating systems).

> Yes, there needs to be a more flexible way of specifying preferred
> versions. It's a complicated problem, though. We have:
>
> - Latest known version. (GTK 2.4.14)
> - Latest cached version. (GTK 2.4.10)
> - Upstream recommended version. (GTK 2.4.12)
> - Distribution recommended version. (GTK 2.4.6)
> - Program's preferred version. (ROX-Filer wants GTK 2.4.13)
> - User's preferred version. (User wants ROX-Filer to use GTK 2.4.11)

I wouldn't want to split hairs, but it seems to me that the term
"preference" more or less only relates to users' preferences. Likewise, we
should think of eligible "revisions" (i.e. checksums, time stamps or time
stamp ranges) rather than "versions," although both will probably coincide
for a well-maintained site. But let's check your list:

- The latest known version (latest published revision) of GTK I would
  expect at /uri/0sites/gtk.org/v2/.refreshnow, which is a symlink to the
  current time.
- The latest cached version (latest checked-out revision) of GTK I would
  expect at /uri/0sites/gtk.org/v2/.contents, which is a symlink to the
  maximum number (time stamp) of all the numerical entries of
  /uri/0sites/gtk.org/v2/.
- Upstream may recommend versions by posting appropriate symlinks, e.g.
  /uri/0install/gtk.org/recommend/v2/stable -> /uri/0sites/gtk.org/v2/12345
  and /uri/0install/gtk.org/recommend/v2/experimental -> ...
- A distribution will recommend versions similarly, e.g. Debian may use
  /uri/0install/debian.org/ourlibs/gtk.org/... (Probably, they would roll
  their own version of libgtk, which they would publish there.)
- When a program prefers a certain version (actually, its author will),
  why doesn't it just link against it?
- Okay, a user might prefer a version of a library different to the one an
  application is linked against. And setting LD_LIBRARY_PATH only helps if
  the library is not looked up by its absolute name, which we currently
  seem to recommend.

So, the last point may need some more thought. I would think, though, that
if I as a user want to run ROX-Filer, and I find it linked against a
specific version of libgtk, why would I even think of supplying a
different library? Okay, perhaps, ld-linux.so should become a bit more
flexible. Then again, no currently available packaging mechanism seems to
allow for this kind of flexibility. (Yes, Gentoo will let you compile
everything fine-tuned to your needs.) The other five items from your list
seem to have solutions, though.

Note that in the above scenario, we have now got these three /uri
directories:

- /uri/0index/ENCODEDURL#HASHVALUE (immutably) contains data, originally
  published at http://ENCODEDURL/.0inst-index* and organized by an index
  file whose hash value is HASHVALUE.
- /uri/0sites/ENCODEDURL/sub/sub/site/.contents are symlinks to
  /uri/0index/ENCODEDURL#HASHVALUE where HASHVALUE is the hash value of
  the most recently downloaded index file.
- /uri/0install/ENCODEDURL are symlinks to /uri/0sites/ENCODEDURL/.contents
  (for compatibility).
- /uri/0sites/ENCODEDURL/sub/sub/site/.refreshnow is a symlink to the
  current time, and /uri/0sites/ENCODEDURL/sub/sub/site/NUMBER is a
  symlink to /uri/0index/ENCODEDURL#HASHVALUE where HASHVALUE is the hash
  value of the index file as it was downloaded at time NUMBER.

Note that /uri/0sites now contains more than what I had described earlier.
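
The ".contents points at the largest NUMBER entry" rule from the list above could be sketched roughly as follows. This is only an illustration under assumptions: the function names (parse_snapshot, latest_snapshot, update_contents_link) are hypothetical, NUMBER is taken to be a plain decimal time stamp, and swapping the link via rename(2) is my choice here rather than anything zero install is known to do.

```c
#define _XOPEN_SOURCE 700  /* for symlink(), readlink(), mkdtemp() */
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the time stamp encoded in name, or -1 if name is not all digits. */
long long parse_snapshot(const char *name)
{
    assert(name != NULL);
    if (*name == '\0')
        return -1;
    for (const char *p = name; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return -1;
    errno = 0;
    long long value = strtoll(name, NULL, 10);
    return errno == ERANGE ? -1 : value;
}

/* Largest numeric entry of site_dir; -1 if there is none or on error. */
long long latest_snapshot(const char *site_dir)
{
    assert(site_dir != NULL);
    DIR *dir = opendir(site_dir);
    if (dir == NULL)
        return -1;
    long long best = -1;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* ".contents", ".refreshnow", "." and ".." all parse as -1. */
        long long value = parse_snapshot(entry->d_name);
        if (value > best)
            best = value;
    }
    closedir(dir);
    return best;
}

/* Point site_dir/.contents at the latest snapshot; 0 on success, -1 else. */
int update_contents_link(const char *site_dir)
{
    char target[32], tmp[PATH_MAX], link_path[PATH_MAX];
    long long latest = latest_snapshot(site_dir);
    if (latest < 0)
        return -1;
    snprintf(target, sizeof target, "%lld", latest);
    snprintf(tmp, sizeof tmp, "%s/.contents.tmp", site_dir);
    snprintf(link_path, sizeof link_path, "%s/.contents", site_dir);

    unlink(tmp);  /* leftover of an interrupted run; failure is harmless */
    if (symlink(target, tmp) != 0)
        return -1;
    /* rename(2) replaces an existing .contents in a single step. */
    if (rename(tmp, link_path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Creating the new link under a temporary name and renaming it over the old one means a reader never sees a missing ".contents", which matters if /uri is being read while the helper updates it.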

Personally, I begin to like the "supersession" solution now, where
/uri/0sites, as described above, contains the sub-site hierarchy and
contents links only. The thing is, whenever you (the helper application)
create a new /uri/0index/ENCODEDURL#HASHVALUE directory (corresponding to
an updated index file), you would create symlinks

    /uri/0index/ENCODEDURL>SUPERSEDED -> ENCODEDURL#HASHVALUE

for all hash values SUPERSEDED of superseded index files (according to a
special section of the new index file). This would make it easy for
packagers to declare their updated versions as backward-compatible (if
only to a given, explicit extent), while still allowing the use of older
versions that never get updated. At the same time, we wouldn't have to
cope with time stamps or the like.

Any thoughts?

Joachim