Comment 3 for bug 71386

Luzius Thöny (lucius-antonius) wrote :

ok, let me explain my purpose: in a nutshell, i want to compile some statistics on the frequencies of the individual symbols in a specific text. for example, i want to know how much more frequent an 's' is compared to a 't'. the way to achieve this is to split the text up so that every letter/symbol sits on its own line, then sort the lines, and finally count the lines with the same symbol using 'uniq -c'. my sed script is intended to do just this (except for the 'uniq -c' part), and i believe it is correct the way i wrote it.
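
for comparison, here is a minimal python sketch of the same counting idea. it is not the sed script from this report, only an independent cross-check that does not depend on the locale's sort order. the function name `symbol_frequencies` and the sample string are made up for illustration, and skipping whitespace is an assumption about what should be counted.

```python
from collections import Counter


def symbol_frequencies(text):
    """Count how often each non-whitespace symbol (code point) occurs."""
    # Assumption: whitespace is not interesting for the statistics.
    return Counter(ch for ch in text if not ch.isspace())


# Hypothetical sample mixing plain letters and IPA symbols.
sample = "ʃʌt ʃʌts"
freqs = symbol_frequencies(sample)

# Print "count symbol" pairs, ordered by code point, similar in
# spirit to the output of 'sort | uniq -c'.
for symbol, count in sorted(freqs.items()):
    print(count, symbol)
```

note that this counts unicode code points, so a base letter followed by a combining diacritic would be counted as two separate symbols.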

the result i'm currently getting from running the script on the above text is attached, and it just looks very wrong to me. you can see that the normal letters (like 'n', 'r', or 's') are correctly sorted onto adjacent lines in the result, but the IPA symbols like 'ʃ' or 'ʌ' are not: their occurrences are scattered across different places in the result file.