17 Apr 2012 07:47
Re: [xapian] a bug fixed in brass_database.cc
Weixian Zhou <ideazwx <at> gmail.com>
2012-04-17 05:47:38 GMT
2012-04-17 05:47:38 GMT
Thanks Olly,
First I am sad to be the first negative case study, my fault to rush coding in order to deliver asap :(
The attachment is my revised patch which looks better.
The attachment is my revised patch which looks better.
1. The code appears well format in github, but appears mess in my vim after clone (maybe sth. wrong with tab/space indent)
2. So I now generate the new patch with '-w';
3. Currently the patch have not been tested and I still am working on that;
4. The modified simpleindex was intend for my own testing, which I think maybe helpful for other developers.
--
Weixian Zhou
Department of Computer Science and Engineering
University at Buffalo, SUNY
On Mon, Apr 16, 2012 at 10:14 PM, Olly Betts <olly <at> survex.com> wrote:
On Sat, Apr 14, 2012 at 07:02:58PM -0400, Weixian Zhou wrote:Thanks for your patches.
> I fixed a bug in brass_database.cc.
One minor but important issue - your indentation doesn't match the
existing code. Set your editor to indent by 4 spaces, with tabs
displayed as 8 spaces, and indents of 8 or more spaces tab-filled.
Full details are in HACKING.
Also, in a patch, it's better to just remove old code rather than
commenting it out. If people want to see the old code, they can look in
the version control system.
Looking at the flush threshold patch first, don't put your initials or
name in comments you add (or at least not as a matter of course) - if
people want to know who wrote a method, the version control system can
tell them (e.g. git blame or svn blame). Updating copyright attribution
is the big exception to that.
It's better to make comments like that one in brass_inverter.h doxygen
format comments so they appear in the generated documentation for
internal APIs. Mostly that means use "/**" or "///" to start them,
though there are various formatting rules too.
There's no need to define a destructor if it's empty - you get an
empty destructor by default.
Inverter::get_inverter_size() is just an accessor method, so should
really be const.
The tracking of posting_count seems incorrect to me:
* You increment it for each call to add_posting() (OK so far)
* But you don't increment it at all for remove_posting(), which often
also adds entries.
* You only decrease it by own when a posting list is written out, not
by the number of entries written.
* The default threshold of 10000 is going to be too low now.
It would also really be better (though much harder) to track the
memory actually used, since that's what matters more to users.
You don't remove the existing change_count variable, but just remove the
code which updates it when documents are added or updated or removed, so
it will always be zero (unless the user asks for allterms) which will
cause various issues (like commit() not actually flushing changes).
This causes a number of testcases to fail with your changes applied -
try "make check" to see.Interesting, but we already have omindex for indexing a directory tree,
> I also modified the simpleindex.cc so that it now supports batch files
> indexing.
while simpleindex is intended to be *simple*. So I'm not sure we want
to change it to scan directories, particularly since the code to do that
depends on the OS (we have a adaptor layer for this for the MSVC build,
but examples should ideally not have to drag in such code as it makes
them harder to understand).
A list of text files on the command line might be a reasonable
alternative (or perhaps better as a separate example).
Cheers,
Olly
Weixian Zhou
Department of Computer Science and Engineering
University at Buffalo, SUNY
_______________________________________________ Xapian-devel mailing list Xapian-devel <at> lists.xapian.org http://lists.xapian.org/mailman/listinfo/xapian-devel
RSS Feed