11 Nov 2005 20:33
ANNOUNCE: Beagle 0.1.2
Joe Shaw <joeshaw <at> novell.com>
2005-11-11 19:33:57 GMT
[ Jon is having some email problems, so I'm sending this out on his behalf. -j ]

I'm pleased to announce the release of Beagle 0.1.2.

Since 0.1.1, we've been focusing on fixing bugs and reducing our memory consumption. We still use too much memory, but the situation is definitely improving. This is, without a doubt, our best release yet.

This release does contain one major new feature: date-range queries are now supported. A side effect of this change is that Beagle's search indexes will need to be rebuilt. But no need to worry... everything will happen automatically when you run the new daemon.

Also, this version of Beagle requires the just-released Mono 1.1.10. Upgrading is definitely worth your while: all earlier versions of Mono contained serious bugs in the io-layer that could cause Beagle to crash or lock up, and 1.1.10 also has optimizations that help reduce Beagle's memory consumption. All of the cool kids are running it, so why aren't you?

OUR MANY URLS
-------------

To download the 0.1.2 tarball or learn more, visit the Beagle wiki at:
  http://www.beagle-project.org

The latest gossip is available at:
  http://www.planetbeagle.org

Nat Friedman made some cool movies that demonstrate Beagle in action:
  http://nat.org/demos

We still talk about Beagle on the dashboard-hackers mailing list:
  http://mail.gnome.org/mailman/listinfo/dashboard-hackers

It is possible to find the k-th largest element of an unsorted list of N numbers in O(N) time:
  http://en.wikipedia.org/wiki/Selection_algorithm

WHAT IS BEAGLE?
---------------

Beagle is a tool for indexing and searching your data. Beagle is improving rapidly on many fronts, and should work well enough for everyday use.

The Beagle daemon transparently monitors your data and updates the index to reflect any changes. On an inotify-enabled system, these updates happen more-or-less in real time.
So for example,

* Files are immediately indexed when they are created, are re-indexed when they are modified, and are dropped from the index upon deletion.
* E-mails are indexed upon arrival.
* IM conversations are indexed as you chat, a line at a time.

Beagle uses the Lucene indexing system from the prodigious Doug Cutting.

Best is a graphical tool for searching the index that the daemon creates. Best doesn't query the index directly; it passes the search terms to the daemon and the daemon sends any matches back to Best. Best then renders the results and allows you to perform useful actions on the matching objects.

Indexing your data requires a fair amount of computing power, but the Beagle daemon tries to be as unobtrusive as possible. It contains a scheduler that works to prioritize tasks and control CPU usage, based on whether or not you are actively using your workstation.

DEPENDENCY HECK
---------------

Beagle has many dependencies, and thus can be difficult to compile. It requires:

* Mono 1.1.10 or better, along with the full Mono stack
* gtk-sharp 2.3.90 or better
* Gecko-sharp 2.0
* Gmime 2.1.16
* Libexif 0.5.7 or better

For the best possible Beagle experience, you should also have:

* Evolution-sharp 0.10.2
* libgsf 1.12.1 and gsf-sharp 0.6
* Either wv 1.2.0, or a *patched* wv 1.0.3 --- the patch is available from
  http://users.avafan.com/~fredrik/beagle/wv-libole2-readonly.patch
* An inotify 0.24-enabled kernel. Inotify is in the mainline Linux kernel as of 2.6.13.

CHANGES SINCE 0.1.1
-------------------

Daemon/Infrastructure:

* Added date range searches. (Joe Shaw, Jon Trowbridge)
* Fixed a bug where sending a query to the daemon would cause a helper process to start. (Joe)
* Fixed a bug in libbeagle to use our query parser rather than plain text. (Joe)
* Added keyword based query support e.g. title:beagle. (D Bera)
* Updated pruning of old log files. (Lukas Lipka)
* Updated to dotLucene 1.9 RC1. (Daniel Drake)
* Lucene locking bug fix.
  (Daniel)
* Small lucene optimizations. (Daniel)
* Consolidated glue code into two libraries: libbeagleglue and libbeagleuiglue. (Daniel)
* Fixed two scheduler-related crashes associated with a null task source. (Daniel)
* New Uri serialization scheme. (Daniel)
* Allow inotify to be built conditionally. (Daniel)
* Updated our local copy of SqliteClient with recent upstream changes. (Daniel)
* Switched from Mono.Posix (which is deprecated) to Mono.Unix. (Daniel)
* Fixed some possible instances of unhandled exceptions in the inotify and scheduler code. (Joe)
* Fixed a bug in the scheduler's immediate priority throttling code. (Joe)
* Added catch-all exception handlers to beagled, the index helper, and beagle-build-index so that if an unhandled exception happens, the program exits immediately and doesn't leave around a hung process. (Joe)
* Fixed a race in which the user could start a query against an empty index, documents could be added to the index, but live queries would never be updated until the query was rerun. (Joe)
* Fixed a bug in which the document count in the daemon wasn't being updated after a flush in the helper. (Joe)
* Converted all times stored in the index and file attribute store to UTC, to avoid time zone issues. (Jon, Joe, Bera)
* Added a status infrastructure to the daemon, which allows clients to see if the daemon is in the process of indexing. (Joe)
* Fixed some bugs related to date range searches, particularly the start date. (Joe)
* Reuse StringBuilders in scheduler and query code. (Daniel)
* Fixed handling of dangling locks. (Daniel)
* Fixed another lucene leak. (Marcus)
* Switched to thread-local static buffers when writing indexables out to temporary files. This avoids a lot of allocations. (Jon)
* Reuse StringBuilders in the IM Log parser and DirectoryWalker. (Jon)
* Small improvements to the beagled and index helper wrapper scripts. (Jon)
* Revamped logging. (Jon)
* Fixed marshaling of C strings to StringBuilders in sys_readdir.
  (Joe, Jon)
* Relative paths in BEAGLE_HOME and BEAGLE_STORAGE now work. (Jon)
* Added --indexing-test-mode to the daemon, which causes it to shut down automatically when indexing is complete. (Jon)

Backends:

* Moved gaim and kopete log parsing into a filter. This can dramatically decrease the amount of memory used by the daemon. (Daniel)
* If a shutdown is requested while the Evolution mail crawler is running, short-circuit for a faster shutdown. (Joe)
* Add a new inotify-based method of writing data out to the indexing service. (Joe)
* Fixed a bug in which certain backends would index much more slowly if they had already seen data, like mail. (Joe)
* Changed Liferea, Blam and Akregator backends to use stream parsing instead of serializer. (D Bera)
* Better warning if kmail backend finds bad kmail mfolder. (D Bera)
* Index the full name in the addressbook backend, so that searches on middle names match. (Joe)
* In the file system backend, we now store the file extension in a property and drop the file extension in the property containing the "textified" name. (Jon)

Filters:

* Fixed 316120 - PPT filter crash due to gsf-sharp. (Veerapuram Varadhan)
* Fixed DOC filter/wv1-glue to be compatible with wv-1.2.0. (Varadhan)
* Better infrastructure for PPT and DOC filter. (Varadhan)
* Exclude "include", "main", and "NULL" from the C filter, as those are very common, albeit not language keywords. (Joe)

UI/Tools:

* Ported the Firefox extension to use the new indexing service, which speeds up the extension and uses vastly fewer resources. (Joe)
* Bumped the supported version of the Firefox extension to 1.5, as it is reported to work with the Firefox 1.5 betas. (Joe)
* Adding clear function to best. (Dennis Snell)
* Save Best window position, dimension and search history across sessions. (D Bera)
* Fixed a bug in mail tiles for determining if the hit is a mail-attachment. (D Bera)
* Sanitize beagle-index-url to our standards. (Lukas)
* Show artist in music tile.
  (Lukas)
* ImLogViewer tweaks and improvements. (Daniel)
* Use more gtk-sharp bindings for better GNOME integration. (Daniel)
* Make the number of items displayed in Best configurable through beagle-settings. (Mario Manno, Joe)
* I18nize the .desktop files. (Gabor Kelemen)
* Fixed best to start up correctly on AMD64. (Jack Miller)
* Added beagle-dump-index tool. (Jon)
* Fixed beagle-extract-content to always report the mime type (even if there is no matching filter) and to always sort the properties before printing them. (Jon)

Web Services:

* Enable beagled with web services to support --replace option, and to shut down and restart cleanly. (KN Vijay)

Translations:

* Updated Canadian English translation. (Adam Weinberger)
* Updated Bulgarian translation. (Vladimir Petkov)
* Updated Spanish translation. (Francisco Javier F. Serrador)
* Updated Simplified Chinese translation. (fwang)

KNOWN ISSUES
------------

Yes, we know we use too much memory. We are working on it.

Extreme spikes in memory usage have been observed in some cases. Certain extremely large documents (particularly large HTML files) can temporarily degrade your system's performance while they are being indexed. In most of these cases, the memory is reclaimed by the system relatively quickly after the document is indexed. There are other still-unexplained cases of excessive memory use, particularly on SMP systems.

The file system is now much more robust than ever before. However, there are still race conditions that can occur with certain combinations of file system operations. In some cases it might be necessary to stop and restart the daemon.

At this point in development, we cannot commit to stable APIs or file formats. You will almost certainly need to delete your indexes and start again at some point in the future.