scp cuts UTF8 filenames by bytes instead of characters

Bug #218741 reported by Bogdan Butnaru
This bug affects 1 person
Affects: openssh (Ubuntu)
Status: Fix Released
Importance: Low
Assigned to: Unassigned

Bug Description

Binary package hint: openssh-client

This is for up-to-date Ubuntu Hardy.

I set some files copying today using scp (from the openssh-client package). I happened to look at the output and noticed some “bad character” symbols on the terminal, as pasted below. (These were copy-pasted from the console, in a completely UTF-8-based environment. Note that the odd characters may confuse your browser; make sure it has detected the correct encoding.)

21 Lied pentru voce și pian „Regen”.flac 100% 9938KB 1.9MB/s 00:05
03 Lieduri pentru tenor și pian, op. 15 „S 100% 2524KB 2.5MB/s 00:01
18 Lied pentru voce și pian „Frauenberuf�� 100% 11MB 1.3MB/s 00:09
06 Lieduri pentru tenor și pian, op. 15 „S 100% 8961KB 2.2MB/s 00:04
09 Lieduri pentru bas și pian, op. 4 „Troi 100% 11MB 1.4MB/s 00:08
[after resizing the window]
10 Suita nr. 3 pentru orchestră, op. 27 „Săteasca”_ „Pârâu sub lun� 100% 13MB 2.6MB/s 00:05

As it happens, the next character in the filenames on the two garbled lines was ” and ă, respectively, both of which are displayed correctly elsewhere in the same output. Based on this and the misalignment of the last columns, I think scp truncates overly long names by counting bytes rather than characters. This is wrong for UTF-8, where a single character can take several bytes: such names are cut too early, and occasionally in the “middle” of a character, which shows up as garbage.

Bogdan Butnaru (bogdanb) wrote :

This particular bug is relatively easy to understand but complicated to fix. The problem lies in the function refresh_progress_meter in progressmeter.c, which does indeed cut strings by bytes rather than by characters.

The problem with cut-up characters can be solved relatively simply by choosing the cut point with a locale-sensitive function, so that it never falls inside a multibyte character. The alignment problem is harder, because it depends on the terminal's ability to display combining characters and full-width ones.

Colin Watson (cjwatson)
Changed in openssh:
importance: Undecided → Low
status: New → Confirmed
status: Confirmed → Triaged
Colin Watson (cjwatson) wrote :

I did some experimentation today with different "stty cols" settings, and I'm pretty sure this is fixed. I believe the commit that fixed it was https://anongit.mindrot.org/openssh.git/commit/?id=0e059cdf5fd86297546c63fa8607c24059118832 (note in particular "take character display widths into account for the progressmeter"). If so, this has been fixed since OpenSSH 7.3p1, and therefore since Ubuntu 16.10.

Changed in openssh (Ubuntu):
status: Triaged → Fix Released