Joe Shaw | 11 Nov 20:33 2005

ANNOUNCE: Beagle 0.1.2

[ Jon is having some email problems, so I'm sending this out on his
behalf.  -j ]

I'm pleased to announce the release of Beagle 0.1.2.

Since 0.1.1, we've been focusing on fixing bugs and reducing our memory
consumption.  We still use too much memory, but the situation is definitely
improving.  This is, without a doubt, our best release yet.

This release does contain one major new feature: date-range queries are now
supported.  A side effect of this change is that Beagle's search indexes
will need to be rebuilt.  But no need to worry... everything will happen
automatically when you run the new daemon.
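
For the curious, here is a rough sketch of how an automatic rebuild like this can work. The idea is to keep a format version next to the index and wipe the index whenever that version doesn't match what the daemon expects. The version file, its name, INDEX_FORMAT_VERSION and open_index() are all hypothetical. This is not Beagle's actual code, which is written in C#.

```python
import os
import shutil

# Hypothetical: bumped whenever the on-disk index layout changes
# (for example, when date-range support adds new fields).
INDEX_FORMAT_VERSION = 2

def open_index(index_dir):
    """Return True if the index was (re)created, False if it was reusable."""
    version_file = os.path.join(index_dir, "version")
    try:
        with open(version_file) as f:
            current = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        current = None  # missing or unreadable version: treat as stale
    if current == INDEX_FORMAT_VERSION:
        return False
    # Stale or missing index: throw it away and start fresh.
    shutil.rmtree(index_dir, ignore_errors=True)
    os.makedirs(index_dir, exist_ok=True)
    with open(version_file, "w") as f:
        f.write(str(INDEX_FORMAT_VERSION))
    return True
```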

Also, this version of Beagle requires the just-released Mono 1.1.10.
Upgrading is definitely worth your while: all earlier versions of Mono
contained serious bugs in the io-layer that could cause Beagle to crash or
lock up, and 1.1.10 also has optimizations that help reduce Beagle's memory
consumption.  All of the cool kids are running it, so why aren't you?

OUR MANY URLS
-------------

To download the 0.1.2 tarball or learn more, visit the Beagle wiki at:
http://www.beagle-project.org

The latest gossip is available at:
http://www.planetbeagle.org

Nat Friedman made some cool movies that demonstrate Beagle in action:
http://nat.org/demos

We still talk about Beagle on the dashboard-hackers mailing list:
http://mail.gnome.org/mailman/listinfo/dashboard-hackers

It is possible to find the k-th largest element of an unsorted list
of N numbers in O(N) time:
http://en.wikipedia.org/wiki/Selection_algorithm
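
A minimal quickselect sketch is below. Note that quickselect with a random pivot runs in *expected* O(N) time; the worst-case O(N) bound needs median-of-medians pivot selection, which the linked page also covers. kth_largest() is a hypothetical helper name.

```python
import random

def kth_largest(values, k):
    """Return the k-th largest element (k=1 is the maximum) via quickselect.

    Each pass is linear and the candidate set shrinks geometrically on
    average, giving expected O(N) time; the worst case is O(N^2).
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    while True:
        pivot = random.choice(items)
        greater = [x for x in items if x > pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(greater):
            items = greater  # answer is among the larger elements
        elif k <= len(greater) + equal_count:
            return pivot     # answer is the pivot itself
        else:
            # Skip past the larger and equal elements, search the smaller ones.
            k -= len(greater) + equal_count
            items = [x for x in items if x < pivot]
```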

WHAT IS BEAGLE?
---------------

Beagle is a tool for indexing and searching your data.  Beagle is improving
rapidly on many fronts, and should work well enough for everyday use.

The Beagle daemon transparently monitors your data and updates the index
to reflect any changes.  On an inotify-enabled system, these updates happen
more-or-less in real time.  For example:

* Files are immediately indexed when they are created, are re-indexed
  when they are modified, and are dropped from the index upon deletion.
* E-mails are indexed upon arrival.
* IM conversations are indexed as you chat, a line at a time.
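
A rough sketch of how the file cases above map to index operations. The event names mirror inotify's IN_CREATE, IN_MODIFY and IN_DELETE masks, but here they arrive as plain strings; apply_event(), the dict-based index and the read_text callback are hypothetical stand-ins for the daemon's real machinery.

```python
def apply_event(index, event, path, read_text):
    """Update a toy path -> text index in response to one file event."""
    if event in ("IN_CREATE", "IN_MODIFY"):
        # New files are indexed; modified files are simply re-indexed.
        index[path] = read_text(path)
    elif event == "IN_DELETE":
        # Deleted files are dropped from the index.
        index.pop(path, None)
    else:
        raise ValueError(f"unhandled event: {event}")
```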

Beagle uses the Lucene indexing system from the prodigious Doug
Cutting.

Best is a graphical tool for searching the index that the daemon creates.
Best doesn't query the index directly; it passes the search terms to the
daemon and the daemon sends any matches back to Best.  Best then renders the
results and allows you to perform useful actions on the matching objects.
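
Here is a toy version of that round trip, assuming a socket between the client and the daemon and newline-delimited JSON messages. The wire format, the in-memory INDEX, daemon_handle() and best_query() are all made up for illustration; they are not Beagle's actual protocol.

```python
import json
import socket

# Hypothetical in-memory index: document URI -> indexed text.
INDEX = {
    "file:///home/user/notes.txt": "beagle release notes",
    "file:///home/user/todo.txt": "buy dog food",
}

def daemon_handle(conn):
    """Daemon side: read one query, search the index, send the hits back."""
    with conn.makefile("rb") as reader, conn.makefile("wb") as writer:
        request = json.loads(reader.readline())
        terms = [t.lower() for t in request["terms"]]
        hits = [uri for uri, text in INDEX.items()
                if all(t in text.split() for t in terms)]
        writer.write(json.dumps({"hits": hits}).encode() + b"\n")
        writer.flush()

def best_query(terms):
    """Client side: never touches INDEX, only talks to the daemon."""
    client, daemon = socket.socketpair()
    with client, daemon:
        client.sendall(json.dumps({"terms": terms}).encode() + b"\n")
        daemon_handle(daemon)  # in real life this runs in another process
        with client.makefile("rb") as reader:
            return json.loads(reader.readline())["hits"]
```

Keeping the index behind the daemon means only one process ever reads or writes it, and any number of front ends can share it.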

Indexing your data requires a fair amount of computing power, but the Beagle
daemon tries to be as unobtrusive as possible.  It contains a scheduler that
works to prioritize tasks and control CPU usage, based on whether or not
you are actively using your workstation.
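
A toy illustration of that idea: tasks sit in a priority queue, and low-priority work is held back while the user is active. The priority levels, the Scheduler class and the user_active flag are hypothetical; this is not Beagle's actual scheduler.

```python
import heapq
import itertools

class Scheduler:
    """Run urgent tasks first; defer idle-priority work while the user is active."""
    IMMEDIATE, NORMAL, IDLE = 0, 1, 2  # lower number = more urgent

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def add(self, priority, name):
        heapq.heappush(self._queue, (priority, next(self._counter), name))

    def run_step(self, user_active):
        """Return the task to run now, or None to leave the CPU alone."""
        if not self._queue:
            return None
        priority, _, name = self._queue[0]
        if user_active and priority == self.IDLE:
            return None  # wait until the workstation is idle
        heapq.heappop(self._queue)
        return name
```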

DEPENDENCY HECK
---------------

Beagle has many dependencies, and thus can be difficult to compile.
It requires:
* Mono 1.1.10 or better, along with the full Mono stack
* gtk-sharp 2.3.90 or better
* Gecko-sharp 2.0
* Gmime 2.1.16
* Libexif 0.5.7 or better

For the best possible Beagle experience, you should also have:
* Evolution-sharp 0.10.2
* libgsf 1.12.1 and gsf-sharp 0.6
* Either wv 1.2.0, or a *patched* wv 1.0.3 --- the patch is available from
  http://users.avafan.com/~fredrik/beagle/wv-libole2-readonly.patch
* An inotify 0.24-enabled kernel.  Inotify is in the mainline Linux
  kernel as of 2.6.13.

CHANGES SINCE 0.1.1
-------------------

Daemon/Infrastructure:
* Added date range searches. (Joe Shaw, Jon Trowbridge)
* Fixed a bug where sending a query to the daemon would cause a helper
  process to start.  (Joe)
* Fixed libbeagle to use our query parser rather than treating queries as
  plain text.  (Joe)
* Added keyword-based query support, e.g. title:beagle. (D Bera)
* Updated pruning of old log files. (Lukas Lipka)
* Updated to dotLucene 1.9 RC1. (Daniel Drake)
* Lucene locking bug fix. (Daniel)
* Small lucene optimizations. (Daniel)
* Consolidated glue code into two libraries: libbeagleglue and
  libbeagleuiglue. (Daniel)
* Fixed two scheduler-related crashes associated with a null task source.
  (Daniel)
* New Uri serialization scheme. (Daniel)
* Allow inotify to be built conditionally. (Daniel)
* Updated our local copy of SqliteClient with recent upstream changes.
  (Daniel)
* Switched from Mono.Posix (which is deprecated) to Mono.Unix. (Daniel)
* Fixed some possible instances of unhandled exceptions in the
  inotify and scheduler code. (Joe)
* Fixed a bug in the scheduler's immediate priority throttling code.
  (Joe)
* Added catch-all exception handlers to beagled, the index helper, and
  beagle-build-index so that if an unhandled exception happens, the
  program exits immediately and doesn't leave around a hung process.
  (Joe)
* Fixed a race in which the user could start a query against an empty
  index, documents could be added to the index, but live queries would
  never be updated until the query was rerun. (Joe)
* Fixed a bug in which the document count in the daemon wasn't being
  updated after a flush in the helper. (Joe)
* Converted all times stored in the index and file attribute store to UTC,
  to avoid time zone issues. (Jon, Joe, Bera)
* Added a status infrastructure to the daemon, which allows clients to see
  if the daemon is in the process of indexing.  (Joe)
* Fixed some bugs related to date range searches, particularly the start
  date.  (Joe)
* Reuse StringBuilders in scheduler and query code. (Daniel)
* Fixed handling of dangling locks. (Daniel)
* Fixed another lucene leak. (Marcus)
* Switched to thread-local static buffers when writing indexables out to
  temporary files.  This avoids a lot of allocations. (Jon)
* Reuse StringBuilders in the IM Log parser and DirectoryWalker. (Jon)
* Small improvements to the beagled and index helper wrapper scripts. (Jon)
* Revamped logging. (Jon)
* Fixed marshaling of C strings to StringBuilders in sys_readdir. (Joe,
  Jon)
* Relative paths in BEAGLE_HOME and BEAGLE_STORAGE now work. (Jon)
* Added --indexing-test-mode to the daemon, which causes it to shut
  down automatically when indexing is complete. (Jon)
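
Storing times in UTC pays off as soon as you compare them. Here is a sketch, assuming timestamps are kept as fixed-width UTC strings (a common trick for string-based indexes like Lucene, but not necessarily Beagle's format), so that a date-range check is plain string comparison. index_date() and in_date_range() are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def index_date(dt):
    """Store an aware datetime as a fixed-width UTC string (YYYYMMDDHHMMSS).

    Because every value is in the same zone and the same width,
    string order matches chronological order.
    """
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")

def in_date_range(stored, start=None, end=None):
    """Inclusive range check; a None bound leaves that side open."""
    if start is not None and stored < index_date(start):
        return False
    if end is not None and stored > index_date(end):
        return False
    return True
```

For example, a file saved at 23:30 on 10 Nov in UTC-5 is stored as "20051111043000", so it correctly falls on 11 Nov for a UTC range query.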

Backends:
* Moved gaim and kopete log parsing into a filter.  This can dramatically
  decrease the amount of memory used by the daemon. (Daniel)
* If a shutdown is requested while the Evolution mail crawler is running,
  short-circuit for a faster shutdown. (Joe)
* Added a new inotify-based method of writing data out to the indexing
  service. (Joe)
* Fixed a bug in which certain backends would index much more slowly if
  they had already seen data, like mail. (Joe)
* Changed Liferea, Blam and Akregator backends to use stream parsing
  instead of a serializer. (D Bera)
* Better warning if kmail backend finds bad kmail mfolder. (D Bera)
* Index the full name in the addressbook backend, so that searches on
  middle names match.  (Joe)
* In the file system backend, we now store the file extension in a separate
  property and drop it from the property containing the "textified"
  name. (Jon)
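
Roughly what the extension split looks like. The property names, filename_properties() and the textify rule (turning underscores, hyphens and dots into spaces) are assumptions for illustration, not the backend's actual rules.

```python
import os
import re

def filename_properties(filename):
    """Split a file name into an extension property and a 'textified' name."""
    stem, ext = os.path.splitext(filename)
    # Hypothetical textify rule: separators become spaces so words match.
    textified = re.sub(r"[_\-.]+", " ", stem).strip()
    return {"extension": ext, "textified_name": textified}
```

Note that os.path.splitext only splits off the last extension, so "archive.tar.gz" yields ".gz" and a textified name of "archive tar".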

Filters:
* Fixed 316120 - PPT filter crash due to gsf-sharp. (Veerapuram Varadhan)
* Fixed DOC filter/wv1-glue to be compatible with wv-1.2.0. (Varadhan)
* Better infrastructure for PPT and DOC filter. (Varadhan)
* Exclude "include", "main", and "NULL" from the C filter, as those are
  very common, albeit not language keywords.  (Joe)
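
A sketch of that kind of stopword filtering over identifiers in C source. C_KEYWORDS here is only a partial list, and c_index_terms() is a hypothetical name; it isn't the filter's actual code.

```python
import re

# Partial list of C keywords, for illustration only.
C_KEYWORDS = {"int", "char", "void", "return", "if", "else", "for", "while",
              "struct", "static", "const", "unsigned", "sizeof"}
# Not keywords, but so common that indexing them adds only noise.
EXTRA_STOPWORDS = {"include", "main", "NULL"}

def c_index_terms(source):
    """Return identifier-like tokens worth indexing from C source text."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    return [t for t in tokens
            if t not in C_KEYWORDS and t not in EXTRA_STOPWORDS]
```

This naive tokenizer also picks up words inside string literals and header names, which may or may not be what you want.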

UI/Tools:
* Ported the Firefox extension to use the new indexing service, which
  speeds up the extension and uses vastly fewer resources.  (Joe)
* Bumped the supported version of the Firefox extension to 1.5, as it is
  reported to work with the Firefox 1.5 betas. (Joe)
* Added a clear function to Best. (Dennis Snell)
* Save Best window position, dimension and search history across
  sessions. (D Bera)
* Fixed a bug in mail tiles for determining if the hit is a
  mail-attachment. (D Bera)
* Sanitize beagle-index-url to our standards. (Lukas)
* Show artist in music tile. (Lukas)
* ImLogViewer tweaks and improvements. (Daniel)
* Use more gtk-sharp bindings for better GNOME integration. (Daniel)
* Make the number of items displayed in Best configurable through
  beagle-settings.  (Mario Manno, Joe)
* I18nize the .desktop files.  (Gabor Kelemen)
* Fixed Best to start up correctly on AMD64. (Jack Miller)
* Added beagle-dump-index tool. (Jon)
* Fixed beagle-extract-content to always report the mime type (even if
  there is no matching filter) and to always sort the properties
  before printing them. (Jon)

Web Services:
* Enabled beagled with web services to support the --replace option, and
  to shut down and restart cleanly. (KN Vijay)

Translations:
* Updated Canadian English translation. (Adam Weinberger)
* Updated Bulgarian translation. (Vladimir Petkov)
* Updated Spanish translation. (Francisco Javier F. Serrador)
* Updated Simplified Chinese translation. (fwang)

KNOWN ISSUES
------------

Yes, we know we use too much memory.  We are working on it.

Extreme spikes in memory usage have been observed in some cases.  Certain
extremely large documents (particularly large HTML files) can temporarily
degrade your system's performance while they are being indexed.  In most
of these cases, the memory is reclaimed by the system relatively quickly after
the document is indexed.  There are other still-unexplained cases of excessive
memory use, particularly on SMP systems.

The file system backend is now more robust than ever before.  However, there
are still race conditions that can occur with certain combinations of
file system operations.  In some cases it might be necessary to stop and
restart the daemon.

At this point in development, we cannot commit to stable APIs or file formats.
You will almost certainly need to delete your indexes and start again at some
point in the future.
