Norm Pierce | 28 Sep 01:59 2012

Re: Patch to provide support for application/x-zerosize


On Sat, 22 Sep 2012, Thomas Leonard wrote:

> . . .
> Are you going to upstream the zero-length fix?
> 

I reported that bug upstream last week, and suggested the fix.  See the
last URL in my previous post.

> . . .
> I haven't looked at why caching is off. Should we enable it?
>

Last week I took a look into this, to see if it would be worth trying to
get ROX-Filer working with mime.cache.

I got it going (with the old xdgmime code that was still in ROX-Filer last
week) and did some time trials, to see if the response improved
significantly.  Although there was some improvement, it was so minor that,
in my view, it did not justify the work involved: getting the cache going,
exposing users to whatever new bugs that introduces, fixing those bugs,
and maintaining two sets of functions (one used with the cache, and one
used without).

My opinion: In theory, cache is wonderful, and worthwhile when speed is
improved significantly, but in reality the level of complication
introduced, and the problems that can arise, make cache a detriment if the
speed improvement is only minor.

But I'll give you the results of my investigation, so you can decide for
yourself if the speed improvement is worth it.

In getting ROX-Filer to use the mime.cache file (using the old xdgmime
code that was still in ROX-Filer last week), I ran into three issues:

1.  xdgmimecache.c needs to be compiled with HAVE_MMAP or
_xdg_mime_cache_new_from_file() does nothing but return NULL.

2.  The old xdgmime code supported only mime.cache file format version
1.0, but the current version is 1.2.

3.  Text/binary guessing didn't work with cache because
_rox_buffer_looks_like_text() is called from
_xdg_mime_magic_lookup_data(), which is called from
xdg_mime_get_mime_type_for_data() but not from
_xdg_mime_cache_get_mime_type_for_data().
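
To illustrate issue #1, here is a minimal sketch of a loader whose whole
body sits inside an #ifdef HAVE_MMAP guard.  This is not the xdgmime
source; DemoCache and demo_cache_new_from_file() are hypothetical names
made up for the example.  Built without -DHAVE_MMAP, the function compiles
cleanly and silently returns NULL for every file, which is the behaviour
described above.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef HAVE_MMAP
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Hypothetical stand-in for a cache handle: a read-only mapping. */
typedef struct {
    void  *buffer;
    size_t size;
} DemoCache;

/* Returns a new cache, or NULL on failure.  Without HAVE_MMAP the body
 * is compiled out and the result is always NULL. */
DemoCache *demo_cache_new_from_file(const char *file_name)
{
    DemoCache *cache = NULL;

#ifdef HAVE_MMAP
    int fd = open(file_name, O_RDONLY);
    struct stat st;
    void *buffer;

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size < 4) {
        close(fd);
        return NULL;
    }

    buffer = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (buffer == MAP_FAILED)
        return NULL;

    cache = malloc(sizeof *cache);
    if (cache == NULL) {
        munmap(buffer, (size_t)st.st_size);
        return NULL;
    }
    cache->buffer = buffer;
    cache->size = (size_t)st.st_size;
#else
    (void)file_name;            /* nothing to do without mmap support */
#endif

    return cache;
}
```

Since a NULL return is also what a missing or unreadable file produces, a
build without the define is easy to mistake for "no cache file found".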

The new xdgmime code should take care of issue #2; the code looks like it
should now support version 1.1 or 1.2.

The new xdgmime code should also take care of issue #3, assuming that
David Faure's 2011 patch is working properly.

(Do note, however, that Faure's code has only a simple test for unexpected
binary bytes.  Your _rox_buffer_looks_like_text() function looked to be
doing a more extensive test, which is why I chose to retain it, rather
than replacing it with Faure's code.)
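
For readers unfamiliar with this kind of check, here is a minimal sketch
of a "simple test for unexpected binary bytes".  It is neither Faure's
patch nor _rox_buffer_looks_like_text(); the name demo_looks_like_text()
and the exact set of allowed control characters are assumptions chosen for
illustration.

```c
#include <stddef.h>

/* Returns 1 if the buffer contains no control bytes other than common
 * whitespace, 0 otherwise.  An empty buffer counts as text. */
int demo_looks_like_text(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        /* Bytes below 0x20 are control characters; allow only tab,
         * newline, form feed and carriage return. */
        if (c < 0x20 && c != '\t' && c != '\n' && c != '\f' && c != '\r')
            return 0;
    }
    return 1;
}
```

A more extensive test might, for example, also validate multi-byte
sequences instead of accepting every byte at or above 0x20.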

Anyway, I got ROX-Filer working using a version 1.0 mime.cache file, and
proceeded to run some tests.

I temporarily modified dir.c so that it prints the time it takes to
process dir->recheck_list, which is the closest I can get to timing just
the processing done to work through all of the files in one directory.
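
A sketch of that kind of instrumentation, assuming a POSIX system with
clock_gettime().  This is not the actual change I made to dir.c;
demo_now_ms() is a hypothetical helper.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Current monotonic clock reading in milliseconds.  CLOCK_MONOTONIC is
 * not affected by changes to the system clock, so the difference of two
 * readings is elapsed "real time". */
double demo_now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Typical use around the code being measured:
 *
 *     double start = demo_now_ms();
 *     ... process the list ...
 *     fprintf(stderr, "elapsed: %.3f ms\n", demo_now_ms() - start);
 */
```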

Then, with no instances of ROX-Filer running, I started it from the
command line and gave one directory as an argument.

I did this multiple times, using three different directories as test
subjects.

One directory held 2000 files which had neither globs nor magic that
matched anything in the shared-mime-info database.  I figured that would
be a worst-case scenario, since the code would have to work its way
through all of the globs and all of the magic signatures for each of the
2000 files.

Another directory was more of a real-world case.  It was just my /tmp/
directory which contained 124 items, 15 of which were directories, 2 were
pipes, and 1 was a socket, leaving 106 actual files of various shapes and
sizes (including two symlinks to files).

The third contained 2000 .txt files -- actually the same directory as the
first, but I had renamed the files by adding the .txt extension.  This was
to see how fast it was when it actually found a matching glob.

Here is a summary of the results (times are in milliseconds):

directories:

1:  2000 files, no globs no magic
2:  124 items (106 files) various
3:  2000 .txt files

directory  first run (ms)                subsequent scans (ms)
---------  ----------------------------  ----------------------------
           no cache  cache  saved    %   no cache  cache  saved    %
           --------  -----  -----   --   --------  -----  -----   --
1:              570    419    151   26        540    367    173   32
2:               92     86      6    7         36     25     11   31
3:              126    123      3    2         78     70      8   10

The four columns under "first run" are times (and percent improvement)
when starting from the command line.  The four columns under "subsequent
scans" are times (and percent improvement) when clicking the rescan
button.

Each time listed in the table is an average of at least nine (and usually
ten or more) samples.
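
For anyone who wants to check the arithmetic, the "saved" and "%" columns
follow from the two averaged times.  A small sketch (the function names
are made up for this example, and rounding to the nearest whole percent is
an assumption that happens to match every row of the table):

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be greater than zero). */
double demo_mean(const double *samples, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Percent of the no-cache time saved by the cache, rounded to the
 * nearest whole percent.  Assumes cache_ms <= no_cache_ms. */
int demo_percent_saved(double no_cache_ms, double cache_ms)
{
    assert(no_cache_ms > 0.0);
    return (int)((no_cache_ms - cache_ms) * 100.0 / no_cache_ms + 0.5);
}
```

For directory 1, first run: 570 - 419 = 151 ms saved, and 151 / 570 is
about 26%.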

These times are elapsed "real time".  I also have numbers for user and sys
times, but haven't summarized them.  (The resolution of those numbers was
apparently only centiseconds, so they are not as useful for testing on a
small sample size.)

As you can see, even with the "worst-case" scenario with 2000 files that
had no glob or magic match, the total savings was less than 2/10 of a
second.  Using the "real world" test on a smaller directory of assorted
stuff only saved about 1/100 of a second.  And the large directory of 2000
files that matched the .txt glob saved even less than that.

Certainly saving 2/10 of a second in a loop that is repeated many times
would be a big savings, but for something that only happens once when the
user clicks a button, it is not a lot, especially since waiting for this
processing is just a small fraction of the total time the user waits for
the directory to be displayed.  (Possibly this doesn't save any time at
all if this processing happens, finishes, and then waits while another
process, like maybe X, is busy, but I certainly don't know the code well
enough to know if that is the case.)

Anyway, bottom line: no, I don't think xdgmime cache should be enabled.

Thanks for adding the application/x-zerosize support.

Norm Pierce
