scp cuts UTF8 filenames by bytes instead of characters
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openssh (Ubuntu) | Fix Released | Low | Unassigned |
Bug Description
Binary package hint: openssh-client
This is for up-to-date Ubuntu Hardy.
I left some files copying today using scp (from the openssh-client package). I happened to look at the output and noticed some “bad character” symbols on the terminal, as pasted below. (These were copy-pasted from the console, in a completely UTF8-based environment. Note that the weird characters may confuse your browser, so make sure it has detected the correct encoding.)
21 Lied pentru voce și pian „Regen”.flac 100% 9938KB 1.9MB/s 00:05
03 Lieduri pentru tenor și pian, op. 15 „S 100% 2524KB 2.5MB/s 00:01
18 Lied pentru voce și pian „Frauenberuf�� 100% 11MB 1.3MB/s 00:09
06 Lieduri pentru tenor și pian, op. 15 „S 100% 8961KB 2.2MB/s 00:04
09 Lieduri pentru bas și pian, op. 4 „Troi 100% 11MB 1.4MB/s 00:08
[after resizing the window]
10 Suita nr. 3 pentru orchestră, op. 27 „Săteasca”_ „Pârâu sub lun� 100% 13MB 2.6MB/s 00:05
As it happens, the next character in the filename on each of the two weird lines was, respectively, ” and ă, both of which are of course displayed correctly in other places in the same output. Based on this and the misalignment of the last columns, I think scp cuts names that are too long by counting bytes rather than characters. This is obviously wrong in UTF8, since some characters take up several bytes, in which case the lines are cut too early, and occasionally in the “middle” of a character, thus displaying garbage.
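To illustrate the suspected cause, here is a minimal Python sketch (not scp's actual code) comparing a cut made by bytes with a cut made by characters. The helper names `cut_by_bytes` and `cut_by_chars`, the sample filename and the limit of 20 are made up for the illustration.

```python
def cut_by_bytes(name: str, limit: int) -> str:
    """Truncate the UTF-8 encoding of name to at most `limit` bytes.

    A multi-byte character can be split in the middle; decoding with
    errors="replace" then shows U+FFFD, the "bad character" symbol.
    """
    raw = name.encode("utf-8")[:limit]
    return raw.decode("utf-8", errors="replace")


def cut_by_chars(name: str, limit: int) -> str:
    """Truncate name to at most `limit` characters (code points)."""
    return name[:limit]


if __name__ == "__main__":
    name = "Lied pentru voce și pian „Regen”.flac"  # hypothetical sample
    print(repr(cut_by_bytes(name, 20)))
    print(repr(cut_by_chars(name, 20)))
    # The byte count is larger than the character count, which is why a
    # byte-based cut is shorter on screen and misaligns the columns.
    print(len(name.encode("utf-8")), len(name))
```

In UTF-8, ș takes two bytes and „ takes three, so a byte-based limit can land inside either of them.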
Changed in openssh:
importance: Undecided → Low
status: New → Confirmed
status: Confirmed → Triaged
This particular bug is easy to fix in part, but complicated to fix completely. The problem lies with the function refresh_progress_meter in progressmeter.c, which indeed cuts strings by bytes (C chars) rather than by characters.
The problem with the cut-up characters can be solved relatively simply by measuring more carefully where to cut the string, using a locale-sensitive function. The alignment problem is harder, because it depends on the terminal's ability to display combining characters and full-width ones.
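As a rough sketch of both parts, the Python below cuts a name at a character boundary and pads it to a fixed number of terminal columns. It is only an approximation under stated assumptions: combining characters are taken to occupy zero columns and East Asian wide or full-width characters two, which a real terminal may or may not honour. The names `char_width` and `fit_to_columns` are hypothetical; in C, the analogous locale-sensitive calls would probably be `mbrtowc` and `wcwidth`, and this is not a claim about how OpenSSH fixed it.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Estimate how many terminal columns one character occupies."""
    if unicodedata.combining(ch):
        return 0  # assumed to be drawn on top of the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and full-width characters
    return 1


def fit_to_columns(name: str, columns: int) -> str:
    """Cut name at a character boundary so it fits in `columns`,
    then pad with spaces so the following columns stay aligned."""
    kept = []
    used = 0
    for ch in name:
        width = char_width(ch)
        if used + width > columns:
            break
        kept.append(ch)
        used += width
    return "".join(kept) + " " * (columns - used)


if __name__ == "__main__":
    for sample in ("Lieduri pentru tenor și pian", "日本語のファイル.flac"):
        print("[" + fit_to_columns(sample, 12) + "]")
```

A wide character that would only half fit is dropped and replaced by a padding space, so the result always takes up exactly the requested number of columns under the width assumptions above.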