Zend Lucene exhausts memory when indexing

Posted by topdog 255 days ago Questions| zend lucene search All

An oldish site I'm maintaining uses Zend Lucene (ZF 1.7.2) as its search engine. I recently added two new tables to be indexed, together containing about 2000 rows of text data ranging between 31 bytes and 63 kB.

The indexing worked fine a few times, but after the third run or so it started terminating with a fatal error due to exhausting its allocated memory. The PHP memory limit was originally set to 16M, which was enough to index all the other content: 200 rows of text at a few kilobytes each. I gradually increased the memory limit to 160M, but it still isn't enough and I can't raise it any higher.

When indexing, I first need to clear the previously indexed results. I can't look up and replace individual entries, because the path scheme contains numbers, which Lucene seems to treat as stopwords, so this search returns every entry:

$this->index->find('url:/tablename/12345');

After clearing all of the results, I reinsert them one by one:

foreach ($urls as $v) {
   $doc = new Zend_Search_Lucene_Document();
   $doc->addField(Zend_Search_Lucene_Field::UnStored('content', $v['data']));
   $doc->addField(Zend_Search_Lucene_Field::Text('title', $v['title']));
   $doc->addField(Zend_Search_Lucene_Field::Text('description', $v['description']));
   $doc->addField(Zend_Search_Lucene_Field::Text('url', $v['path']));
   $this->index->addDocument($doc);
}

After about a thousand iterations the indexer runs out of memory and crashes. Strangely, doubling the memory limit only gets it a few dozen rows further.

I've already tried adjusting the MergeFactor and MaxMergeDocs parameters (to values of 5 and 100 respectively) and calling $this->index->optimize() every 100 rows, but neither helps consistently.
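For reference, the tuning attempt described above might look roughly like the sketch below. It assumes Zend Framework 1.x is on the include path; the index path and the sample row are hypothetical placeholders, and a standalone $index variable stands in for $this->index. It illustrates the settings that were tried and is not a confirmed fix for the memory problem.

```php
<?php
// Assumption: Zend Framework 1.x is available on the include path.
require_once 'Zend/Search/Lucene.php';

// Hypothetical index location and sample data, for illustration only.
$index = Zend_Search_Lucene::create('/tmp/example-index');
$urls = array(
    array(
        'data'        => 'Body text of the row',
        'title'       => 'Example title',
        'description' => 'Example description',
        'path'        => '/tablename/12345',
    ),
);

// The values tried in the question: merge segments more eagerly and
// cap the number of documents per merged segment.
$index->setMergeFactor(5);
$index->setMaxMergeDocs(100);

$count = 0;
foreach ($urls as $v) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnStored('content', $v['data']));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $v['title']));
    $doc->addField(Zend_Search_Lucene_Field::Text('description', $v['description']));
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $v['path']));
    $index->addDocument($doc);

    // Optimize every 100 documents, as described in the question.
    if (++$count % 100 === 0) {
        $index->optimize();
    }
}

// Flush any documents still buffered in memory to disk.
$index->commit();
```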

Clearing the whole search index and rebuilding it from scratch seems to succeed most of the time, but I'd prefer a more elegant and less CPU-intensive solution. Is there something I'm doing wrong? Is it normal for indexing to use this much memory?

Originally asked by: Kaivosukeltaja on Stack Overflow
