Comment 51 for bug 131094

Jamie McCracken (jamiemcc-blueyonder) wrote :

Martin,

It does not work like that.

The 16MB hit buffer is sufficient for about 64MB of text (as we only store unique and valid words), so your 6MB of text easily fits into it. The index would only be updated once all the new stuff is indexed or the buffer overflows.
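To illustrate why a small buffer covers much more text, here is a toy sketch in Python. It is not the real hit buffer: the class name `HitBuffer`, the regex used to decide what a "valid" word is, and the 4-bytes-per-hit cost are all assumptions made up for the example. It only shows the idea that repeated words cost nothing extra and that a flush is signalled when the buffer fills up.

```python
import re


class HitBuffer:
    """Toy in-memory buffer: each unique word maps to the documents containing it."""

    def __init__(self, max_bytes=16 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.hits = {}        # word -> set of document ids
        self.used_bytes = 0   # rough estimate only, not real memory usage

    def add_document(self, doc_id, text):
        """Record the unique words of one document; return True if the buffer is full."""
        # Assumption: a "valid" word is simply a run of ASCII letters.
        for word in set(re.findall(r"[a-z]+", text.lower())):
            docs = self.hits.get(word)
            if docs is None:
                self.hits[word] = docs = set()
                self.used_bytes += len(word)   # a new word costs its own length
            if doc_id not in docs:
                docs.add(doc_id)
                self.used_bytes += 4           # assumed cost of one hit entry
        return self.used_bytes >= self.max_bytes

    def drain(self):
        """Hand back the accumulated hits and reset the buffer."""
        hits, self.hits, self.used_bytes = self.hits, {}, 0
        return hits
```

A caller would keep calling `add_document` and only write `drain()` to the index when it returns True or when there is nothing left to index.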

The problem is updating an existing index: each word (and you could have 100,000+ words that need updating) requires a seek and then a write. Ext3 performs really badly with such seek-read-seek-write patterns.

If we do it in one shot then pdflush could hog the disk and deny access to other apps, but this would be the fastest way to update, with the least thrashing.

We currently do it incrementally, 1000-5000 words at a time followed by an fsync, so it takes longer but should not delay other apps' access to the disk for more than a few seconds.
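The incremental approach can be sketched roughly as follows. This is only a Python illustration of the pattern, not the actual indexer code: the function name `update_in_batches` and the representation of an update as an `(offset, data)` pair are assumptions for the example. Each update is a seek followed by a write, and the file is fsynced after every batch so that the amount of dirty data waiting to hit the disk stays bounded.

```python
import os


def update_in_batches(path, updates, batch_size=1000):
    """Apply (offset, data) updates to an existing file, fsyncing after each batch.

    Returns the number of fsync calls made.
    """
    syncs = 0
    with open(path, "r+b") as f:
        for start in range(0, len(updates), batch_size):
            for offset, data in updates[start:start + batch_size]:
                f.seek(offset)   # one seek per updated word
                f.write(data)    # followed by one write
            f.flush()            # push Python's buffer to the kernel
            os.fsync(f.fileno())  # force this batch to disk before continuing
            syncs += 1
    return syncs
```

A larger `batch_size` moves towards the "one shot" behaviour (faster overall, longer stalls for other apps); a smaller one trades total time for shorter stalls.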

At the moment we cannot really improve things further here until ext3, or whatever is causing the bad performance, is fixed.