2 Feb 2006 17:00
Re: get_data not fast enough for query matches
Salem Berhanu <salemb4 <at> hotmail.com>
2006-02-02 16:00:29 GMT
I'm sorry. I was trying to explain; I guess it wasn't clear enough.

Basically, I want users to be able to search different parts of a document. For instance, I want them to be able to search for a title that contains the term 'data compression' and a description that contains 'Rate-Distortion theory'. This is the main reason I'm using several dbs. In addition, I read that it's better to have smaller dbs for better performance. (Maybe that's wrong.)

I don't actually run out of space when I grab the data; it just takes a long time. For instance, I wrote a small query script that searches for a term, tells me how many matches it finds, and then loops through the matches getting the data. When I search for the word 'theory' in description, within the first 7 seconds it tells me it found 137480 matches, which is good, but then it takes 2m15s to grab the data for all of the matches.

Salem

>From: James Aylett <james-xapian <at> tartarus.org>
>To: Salem Berhanu <salemb4 <at> hotmail.com>, xapian-discuss <at> lists.xapian.org
>Subject: Re: [Xapian-discuss] get_data not fast enough for query matches
>Date: Thu, 2 Feb 2006 15:15:48 +0000
>
>On Thu, Feb 02, 2006 at 03:00:54PM +0000, Salem Berhanu wrote:
>
> > I am not storing anything in the document data other than the
> > ids. Eventually I will link to an external database but I will do it
> > in ranges of not more than 50 at a time. However, I will need the
> > initial complete ids to combine with results from other xapian
> > dbs. This is because each document is broken up into chunks (since
> > the information can be logically divided) and indexed in separate
> > dbs. (eg. there is a title db, a description db ... ) I want to be
> > able to combine the results across these dbs using boolean
> > expressions (since I am assuming there isn't a built in way of doing
> > this).
>
>I'm sorry, there's not much point in trying to answer questions like
>this without understanding what you're actually trying to achieve. I
>don't know why you're using several different databases, for
>instance.
>
>Let's go back to the beginning. What sort of performance issues are
>you actually seeing? Have you investigated performance using the
>standard performance tools for your platform? For instance, are you
>running out of file buffer cache at the point that you start accessing
>the document data? Other people have handled larger data sets than
>you, so it may be that your hardware or configuration isn't quite
>right for what you're trying to do.
>
>(And please keep the discussion on list where others can help you as
>well.)
>
>Cheers,
>James
>
>--
>/--------------------------------------------------------------------------\
> James Aylett xapian.org
> james <at> tartarus.org uncertaintydivision.org
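
For readers following the thread, here is a minimal sketch in plain Python of the step described above: combining the id lists returned by separate per-field databases with boolean logic, then taking a range of at most 50 ids for the external lookup. It does not call Xapian at all. The names combine and page, the results dictionary, and the ids in it are hypothetical, made up for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Set


def combine(results: Dict[str, Iterable[int]],
            require: Sequence[str],
            exclude: Sequence[str] = ()) -> Set[int]:
    """AND together the id sets named in `require`, then drop ids in `exclude`."""
    if not require:
        raise ValueError("at least one required field is needed")
    # Sort by size so the intersection starts from the smallest set.
    sets = sorted((set(results[name]) for name in require), key=len)
    matched = set.intersection(*sets)
    for name in exclude:
        matched -= set(results[name])
    return matched


def page(ids: Iterable[int], start: int, size: int = 50) -> List[int]:
    """Return one sorted slice of ids, e.g. for looking up 50 records at a time."""
    return sorted(ids)[start:start + size]


# Made-up ids standing in for what each field database returned.
results = {
    "title": [3, 7, 11, 42],         # title matched 'data compression'
    "description": [7, 9, 42, 100],  # description matched 'Rate-Distortion theory'
}

both = combine(results, require=["title", "description"])
first_page = page(both, start=0)
print(len(both), first_page)
```

Note that this approach still needs the complete id list from every database before anything can be combined, which is exactly the per-match data fetch that is slow in the timings reported above.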