In 15.10 I've added code to "quarantine" records that Elasticsearch won't index. That is, if Elasticsearch errors out while processing a batch of records, then I re-try each record individually. And if it errors out while processing one of those individual records, I mark the record as quarantined, and keep it in the search_elasticsearch_queue table.
I've backported that to one of our large 15.04 sites, and since then I've taken a look at the data in the records that have caused Elasticsearch to choke. They all contain non-ASCII characters, i.e. Unicode characters. These can be as simple as "e with an accent over it", all the way up to exotic ones like emoji and the Unicode snowman.
I was not able to replicate this when testing on my local machine, but it is certainly in place on our production servers, and bugs such as Bug 1408577 make me think it's probably also present on some other servers as well.
For testing purposes, here are a few sample words (in page titles, artefact titles, and user names) that have caused Elasticsearch to choke:
João
Jiménez
Māori
It's not clear from our situation whether the problem lies in our Elasticsearch setup, or in Mahara's code. I think it may be something peculiar to our server setup because I haven't been able to replicate the problem on my local machine.