Special characters in directory and file names

Bug #596472 reported by Fa10175
This bug affects 1 person
Affects: webtrees
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

If a directory or file name contains accents or characters like àéèùçâïâ ... (as in French), the media page reports "file not found"; however, the thumbnail is displayed, and MediaViewer also displays the file.

fisharebest (fisharebest) wrote :

What operating system is the client?
What operating system is the server?
Were the files uploaded using HTTP or FTP?
If HTTP, which page in webtrees was used for the upload?
What are the steps to reproduce?

Fa10175 (fa10175) wrote :

My operating system is Windows XP SP3.
I work locally with WampServer (Apache 2.2.14, PHP 5.3.1, MySQL 5.1.41).
webtrees 8916

The files are under /Media/dira/dirà/file.pdf or .jpg
I put Dira/dirà/file there by copying (because it is local).

Inside my GEDCOM I have, for this person:
1 FORM jpg
1 FILE ../../dira/dirà/file

I select the person and see what is shown in capture-image1.jpg.
I click on the file name.
I then see what is shown in capture-image2-pb.jpg.

When I check under /thumbs/dira/dirà/file, the file exists.
So webtrees finds the file well enough to create the thumbnail, but not to display it from /Media/dira/dirà/file.

I hope my explanation is understandable.

fisharebest (fisharebest) wrote :

OK - thanks for the information. I will investigate later.

I think the problem is possibly that the filename (in the gedcom, in the database, in webtrees, etc.) is UTF-8.

However, the filename on disk is Windows ISO-8859-1.

So, these are different "à" characters: the same letter is stored as different bytes.

fisharebest (fisharebest) wrote :

I still cannot reproduce this problem.

Fa10175 (fa10175) wrote :

The problem also exists with PhpGedView svn6965.
Inside the GEDCOM I have:
1 CHAR UTF-8

The GEDCOM file itself is encoded as UTF-8 without BOM (Notepad++).

I don't know if there is a link.

kiwi (kiwi3685-deactivatedaccount) wrote :

Are you sure this is a character problem?

As you copied the files by FTP, are you sure that the /media folder and ALL its sub-folders and files have the correct (full read/write) permissions?

Do you have the correct number of "Multimedia directory Levels to keep" set (in Gedcom Administration/Configuration/Multimedia)? If that is still at the default '0', you will get a similar "File not found" response.

Fa10175 (fa10175) wrote :

My test setup is:
Windows XP (French)
WampServer (Apache 2.2.14, PHP 5.3.1, MySQL 5.1.41)
Everything is local, so I don't need to transfer files using FTP.
All files are full read/write.
Multimedia directory levels is set to 10.
The GEDCOM declares UTF-8; when I edit it with Notepad++ I use UTF-8 without BOM.
The file is declared as:
1 FORM jpg
1 FILE /dira/dirè/dirb/file.jpg

The file is placed in
media/dira/dirè/dirb/file.jpg
by a Windows copy.

The copy to the thumbs directory works, and display from the thumbs directory also works.
The problem appears only as shown in the capture.zip file.

If I replace è with c, I have no problem.

fisharebest (fisharebest) wrote :

[[ 1 FILE /dira/dirè/dirb/file.jpg ]]

The GEDCOM file is encoded in UTF-8. This means that dirè is stored as five bytes: 64-69-72-C3-A8

[[ media/dira/dirè/dirb/file.jpg ]]

Windows filenames are encoded in CP1252. This means that dirè is stored as four bytes: 64-69-72-E8
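
The two byte sequences quoted above can be checked with a short sketch. This is only an illustration in Python (not webtrees or PHP code); `hex_dump` is a helper name invented for this example, and CP1252 is the filename encoding assumed in this comment.

```python
def hex_dump(data: bytes) -> str:
    """Format bytes as dash-separated uppercase hex, e.g. 64-69-72."""
    return "-".join(f"{b:02X}" for b in data)


# "dirè", with è written as an escape so the source file encoding does not matter
name = "dir\u00e8"

utf8_form = hex_dump(name.encode("utf-8"))     # as stored in the UTF-8 GEDCOM
cp1252_form = hex_dump(name.encode("cp1252"))  # as assumed for the Windows filename

print(utf8_form)    # 64-69-72-C3-A8
print(cp1252_form)  # 64-69-72-E8
```

A byte-for-byte comparison of the two forms fails, which is consistent with a lookup that cannot find the file.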

But I still cannot reproduce your problem. I create a file media/dirè.jpg (on windows XP), and load this gedcom

0 HEAD
1 CHAR UTF-8
0 @I1@ INDI
1 NAME test
1 OBJE @M1@
0 OBJE @M1@
1 FILE media/dirè.jpg
0 TRLR

NOTE: this gedcom is encoded in UTF-8. It was created with notepad.exe, and saved in UTF-8 format. If I open the file with a different editor, I see this: "1 FILE media/dirÃ¨.jpg"

Now, webtrees cannot find the file at all (because of the different encodings, as described above). It does not create a thumbnail, etc.

Perhaps your gedcom has the header "1 CHAR UTF-8", but actually contains CP-1252?
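
One rough way to test that suspicion is to check whether the raw bytes of the GEDCOM decode cleanly as UTF-8. The sketch below is a Python illustration, not part of webtrees; `is_valid_utf8` is a hypothetical helper name. A file containing only ASCII passes under either encoding, so the check is only informative when accented characters are present.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same GEDCOM line, encoded two different ways
line = "1 FILE media/dir\u00e8.jpg"

print(is_valid_utf8(line.encode("utf-8")))   # True
print(is_valid_utf8(line.encode("cp1252")))  # False: byte E8 is not followed by UTF-8 continuation bytes
```

To apply it to a real file, read it in binary mode and pass the bytes to the function.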

Start from the beginning, with a completely new/empty gedcom. Attach the file so we can examine it.

Fa10175 (fa10175) wrote :

I tested with the attached files Famille.ged and fichier.jpg,
on Windows XP (French)
with WampServer,
and I still have the problem.

fisharebest (fisharebest) wrote :

I copied fichier.jpg to the directory: C:\users\greg\.....\webtrees\media\Actes\Famille\Mèze\Naissance

I imported the gedcom: Famille.GED

I visited the page: individual.php?pid=1493I&ged=Famille.GED

In Capture-1.JPG and Capture-2.JPG, you have a thumbnail image - I do not have a thumbnail.

1) Did webtrees create the thumbnail - or did the file media\thumbs\Actes\Famille\Mèze\Naissance\fichier.jpg already exist ?

2) What is the creation-time of the thumbnail file ?

3) Do you have a directory called webtrees\media\thumbs\Actes\Famille\Mèze ?

Fa10175 (fa10175) wrote :

1) webtrees itself created media\thumbs\Actes\Famille\Mèze\Naissance\fichier.jpg on 8 July 2010 at 18:29; the same goes for index.php.
The size of fichier.jpg is 1 KB.
2) The thumbnail file was created on 8 July 2010 at 18:29.
3) The directory webtrees\media\thumbs\Actes\Famille\Mèze\Naissance\ exists, and webtrees created it with the accent, matching webtrees\media\Actes\Famille\Mèze\Naissance, where the file fichier.jpg is 28 KB.

Fa10175 (fa10175) wrote :

On 9053, when I go to manage multimedia, I find:

Correct read/write/execute permissions
Permissions Set [777] [media/Actes/Famille/M�ze]
Permissions Set [777] [media/Actes/Famille/M�ze/Naissance]
Permissions Set [666] [media/Actes/Famille/M�ze/Naissance/fichier.jpg]
Permissions Set [777] [media/Actes/Famille/Ville]
Permissions Set [777] [media/Actes/Famille/Ville/Naissance]
Permissions Set [666] [media/Actes/Famille/Ville/Naissance/fichier1.jpg]
Permissions Set [666] [media/Actes/Famille/Ville/Naissance/fichier2.jpg]
Permissions Set [777] [media/Actes/Famille/M�ze]
Permissions Set [777] [media/Actes/Famille/M�ze/Naissance]
Permissions Set [666] [media/Actes/Famille/M�ze/Naissance/fichier.jpg]
Permissions Set [777] [media/Actes/Famille/Ville]
Permissions Set [777] [media/Actes/Famille/Ville/Naissance]
Permissions Set [666] [media/Actes/Famille/Ville/Naissance/fichier1.jpg]
Permissions Set [666] [media/Actes/Famille/Ville/Naissance/fichier2.jpg]

The Mèze directory is written as M�ze.
I don't know if this helps you.
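
One plausible explanation for the M�ze output, offered as an assumption rather than a confirmed diagnosis, is that the directory name arrives as CP1252 bytes and is then displayed on a page that treats it as UTF-8. The Python sketch below reproduces that effect; it does not show how webtrees or PHP actually handle the name.

```python
# "Mèze" as it would be stored if the filename bytes are CP1252: 4D E8 7A 65
on_disk = "M\u00e8ze".encode("cp1252")

# Interpreting those bytes as UTF-8: E8 starts a multi-byte sequence that never
# completes, so it is replaced with U+FFFD (the replacement character)
shown = on_disk.decode("utf-8", errors="replace")

print(shown)  # M�ze
```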

fisharebest (fisharebest) wrote :

OK, I have borrowed a different windows machine, and can see the problem.

It seems to be a problem with Windows versions of PHP.

According to this bug report, it cannot be fixed in PHP5, but it will be fixed in PHP6.
https://bugs.php.net/bug.php?id=46990

When we write a filename on windows, we must assume it is ISO-8859-XXX.
When we read a filename on windows, I think that PHP attempts to convert it to ISO-8859-1 (?), but we cannot know if it tried or if it succeeded.

Some workarounds appear to be available, but they will only work for the small subset of UTF* characters that also exist in ISO-8859-1
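
To illustrate the limit of such workarounds: a name can only survive a conversion to ISO-8859-1 if every character in it exists in that character set. The following is a Python sketch of that check, not a webtrees or PHP fix; `fits_latin1` is a hypothetical helper name.

```python
def fits_latin1(name: str) -> bool:
    """Return True if every character in the name exists in ISO-8859-1."""
    try:
        name.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1("M\u00e8ze"))          # True: è (U+00E8) is in ISO-8859-1
print(fits_latin1("\u0141\u00f3d\u017a"))  # False: Ł (U+0141) and ź (U+017A) are not
```

Names that fail this check would need a different approach entirely, which is why the workarounds only cover part of the problem.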

I do not think that this can be fixed. Sorry.

If you want to use non-ascii filenames, then use Linux or MacOS.

Changed in webtrees:
status: New → Confirmed
kiwi (kiwi3685-deactivatedaccount) wrote :

This item has now been added to the wiki pages as an FAQ. I will therefore close the bug. It will be marked as "Won't fix" as that is the closest option we have to the real answer, which is "Can't fix".

Changed in webtrees:
status: Confirmed → Won't Fix