`find -name` ignores files with non-printable characters in the filename

Bug #1742011 reported by H.-Dirk Schmitt
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| findutils | Unknown | Unknown | | |
| findutils (Ubuntu) | New | Undecided | Unassigned | |

Bug Description

Downloading some files from ARTE via Mediathekview I retrieved files like this:

ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Gr�s_du_Jazz-1734575236.mp4

`find .` shows the file:

> $ find .
> .
> ./ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Gr?s_du_Jazz-1734575236.mp4

But if I invoke `find . -name "*mp4"` nothing is found.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: findutils 4.6.0+git+20160126-2
ProcVersionSignature: Ubuntu 4.10.0-43.47~16.04.1-generic 4.10.17
Uname: Linux 4.10.0-43-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
CurrentDesktop: GNOME
Date: Tue Jan 9 01:12:36 2018
SourcePackage: findutils
UpgradeStatus: No upgrade log present (probably fresh install)

c42-bugnr: #3922

H.-Dirk Schmitt (dirk-computer42) wrote :

In case the Unicode character was lost in the report, here is the hexdump:

echo ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Gr?s_du_Jazz-1734575236.mp4 |hexdump
0000000 5241 4554 435f 6e6f 6563 7472 2d5f 4a5f
0000010 7a61 2d7a 7641 7369 6168 5f69 6f43 6568
0000020 5f6e 415f 5f75 7247 73c3 645f 5f75 614a
0000030 7a7a 312d 3337 3534 3537 3332 2e36 706d
0000040 0a34
0000042
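
The default `hexdump` output above groups bytes into 16-bit little-endian words, so the word `73c3` corresponds to the bytes `c3 73` in the filename: a lone `0xc3` followed by `s`. A byte-wise dump avoids the swap. A minimal sketch, using only the four bytes around the problem spot rather than the full filename:

```shell
# Feed the four bytes "G r 0xC3 s" to od and print them one hex byte at a time.
# printf interprets the octal escape \303 as the single byte 0xC3.
bytes=$(printf 'Gr\303s' | od -An -tx1)
echo "$bytes"
```

`0xc3` is a UTF-8 lead byte that must be followed by a continuation byte in the range `0x80` to `0xbf`. Here it is followed by `s` (`0x73`), so the name is not valid UTF-8.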

H.-Dirk Schmitt (dirk-computer42) wrote :

find -D search -name "*4"
consider_visiting (early): ".": fts_info=FTS_D , fts_level= 0, prev_depth=-2147483648 fts_path=".", fts_accpath="."
consider_visiting (late): ".": fts_info=FTS_D , isdir=1 ignore=0 have_stat=1 have_type=1
consider_visiting (early): "./ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Gr\303s_du_Jazz-1734575236.mp4": fts_info=FTS_NSOK, fts_level= 1, prev_depth=0 fts_path="./ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Gr\303s_du_Jazz-1734575236.mp4", fts_accpath="ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Gr\303s_du_Jazz-1734575236.mp4"
consider_visiting (late): "./ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Gr\303s_du_Jazz-1734575236.mp4": fts_info=FTS_NSOK, isdir=0 ignore=0 have_stat=0 have_type=1
consider_visiting (early): ".": fts_info=FTS_DP, fts_level= 0, prev_depth=1 fts_path=".", fts_accpath="."
consider_visiting (late): ".": fts_info=FTS_DP, isdir=1 ignore=1 have_stat=1 have_type=1

H.-Dirk Schmitt (dirk-computer42) wrote :

Finally, proof that the one mysterious Unicode character is the cause:

$ mv ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Gr?s_du_Jazz-1734575236.mp4 ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Grs_du_Jazz-1734575236.mp4
$ find . -name "*4"
./ARTE_Concert_-_Jazz-Avishai_Cohen__Au_Grs_du_Jazz-1734575236.mp4

summary: - find can't find several files with unicode characters
+ find can't find several files with unicode characters if --name is used
description: updated
H.-Dirk Schmitt (dirk-computer42) wrote :

On a second look, I found a mitigation: cleaning up the filename with the help of `tr --delete --complement '[:print:]'`.
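
The mitigation can be sketched as a small helper. The function name `sanitize_name` is hypothetical, and `LC_ALL=C` is an assumption added here so that `[:print:]` means printable ASCII regardless of the caller's locale. As a side effect it also strips valid non-ASCII characters such as accented letters.

```shell
# sanitize_name (hypothetical helper): print NAME with every byte that is
# not printable ASCII removed. -c complements the set, -d deletes it.
sanitize_name() {
    printf '%s' "$1" | LC_ALL=C tr -cd '[:print:]'
}

# Example: a name containing a lone 0xC3 byte, as in the simple test case.
broken=$(printf 'ERR\303OR')
clean=$(sanitize_name "$broken")
echo "$clean"
# To rename an existing file: mv -- "$broken" "$clean"
```

The `mv` is left as a comment so the sketch does not touch any files when pasted into a shell.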

So the characters that break `find … -name` are evidently the non-printable ones.

summary: - find can't find several files with unicode characters if --name is used
+ `find --name` ignores files with non-printable character in the filename
H.-Dirk Schmitt (dirk-computer42) wrote :

Simple test case:

touch $(echo -e ERR'\0303'OR )
touch NON_ERROR
find . -name "*ERR*"
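
One plausible explanation, not confirmed in this report, is that `-name` matches patterns with `fnmatch()` in the current locale, and in a UTF-8 locale a lone `0xc3` byte is an invalid multibyte sequence that `*` does not match. Under that assumption, running the same test in the `C` locale, where every byte is a single character, should find both files. A self-contained sketch using a scratch directory (it assumes a filesystem that accepts arbitrary bytes in names, as typical Linux filesystems do):

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
dir=$(mktemp -d)
cd "$dir"

# Same two files as the test case; printf emits the raw byte 0xC3.
touch "$(printf 'ERR\303OR')"
touch NON_ERROR

# In the C locale every byte is one character, so "*" can match 0xC3.
matches=$(LC_ALL=C find . -name '*ERR*' | wc -l)
echo "matches in C locale: $matches"
```

Comparing this count with the one from a plain `find . -name "*ERR*"` in a UTF-8 locale shows whether the locale is what makes the difference.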

H.-Dirk Schmitt (dirk-computer42) wrote :

I can reproduce the problem in artful and bionic.

tags: added: artful bionic