I think this would be nice to address. We have a snafu in the code around how underscores are treated (Xapian's TermGenerator does not break on underscores - which we shamefully exploit in some cases...). Breaking on transitions between numerals and letters also seems like a sensible thing to do.
This is also closely related to breaking words in CamelCase, for which I was certain there was a bug logged, but now I can't find it. I'll be using the tag 'text-analysis' to track these kinds of bugs.
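
To make the three kinds of break concrete (underscores, numeral/letter transitions, CamelCase), here is a rough sketch of what a pre-splitting step might look like. It is plain Python and does not touch Xapian at all; `split_term` is a hypothetical name, it only handles ASCII letters and digits, and where such a step would hook in relative to TermGenerator is left open.

```python
import re

# Hypothetical helper for illustration only; not part of Xapian or of our code.
# Each alternative matches one "piece" of a term (ASCII only, an assumption):
#   [A-Z]+(?![a-z])  a run of capitals not followed by lowercase ("HTTP" in "HTTPServer")
#   [A-Z]?[a-z]+     an optionally capitalised lowercase word ("Camel", "case")
#   [0-9]+           a run of digits
# Underscores (and anything else) match nothing, so they act as separators.
_PIECE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_term(term):
    """Split a term on underscores, letter/digit transitions and CamelCase."""
    return _PIECE.findall(term)


if __name__ == "__main__":
    for sample in ("foo_bar", "CamelCase", "abc123def", "HTTPServer2"):
        print(sample, split_term(sample))
```

Whether the original unsplit term should also be indexed alongside the pieces (so that the places currently relying on underscores not breaking keep working) is a separate decision that this sketch does not cover.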