Could not open file that its path contains Unicode characters.

Bug #1374142 reported by Razi Alavizadeh
Affects: qpdfview
Status: Fix Released
Importance: Medium
Assigned to: Razi Alavizadeh
Milestone: 0.4.12

Bug Description

Steps to reproduce:
1- Rename a PDF (or DjVu, etc.) file to, for example, Čech.pdf
2- Open it; qpdfview reports 'Could not open file: /path/to/Čech.pdf'

Strangely, defining QT_NO_CAST_FROM_ASCII and QT_NO_CAST_TO_ASCII did not help either.

Adam Reichold (adamreichold) wrote :

Hello Razi,

I suspect a Windows-specific problem because I cannot reproduce this on two different Linux distributions. Did you try to open the file from the command line or via the GUI?

Regards, Adam.

Razi Alavizadeh (srazi) wrote :

Hello Adam,

I have the same problem when opening the file from the command line and from the GUI.

I borrowed the following code from DjView, and it works for DjVu files (I'm using DDJVUAPI_VERSION >= 19):
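#if DDJVUAPI_VERSION >= 19
    // Newer DjVuLibre accepts the file name as UTF-8 and decodes it itself.
    ddjvu_document_t* document = ddjvu_document_create_by_filename_utf8(context, filePath.toUtf8(), FALSE);
#else
    // Older DjVuLibre expects the file name in the local 8-bit encoding,
    // which may not be able to represent every character of the path.
    ddjvu_document_t* document = ddjvu_document_create_by_filename(context, QFile::encodeName(filePath), FALSE);
#endif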

> I cannot reproduce this on two different Linux distributions.

Maybe you could test it after setting the locale codec to the one my Windows system uses, for example with the following code?
QTextCodec::setCodecForLocale(QTextCodec::codecForName("Windows-1256"));

Maybe it's a Qt bug? Displaying the result of "QFile::encodeName(filePath)" in a QMessageBox doesn't show the correct string!

Best Regards,
Razi.

Adam Reichold (adamreichold) wrote : Re: [Bug 1374142] Re: Could not open file that its path contains Unicode characters.

Hello again,

On 27.09.2014 at 22:13, S. Razi Alavizadeh wrote:
> Hello Adam,
>
> I have the same problem when opening file from command-line and
> GUI.
>
> I borrowed the following code from DjView, and it works for DjVu files
> (I'm using DDJVUAPI_VERSION >= 19):
> #if DDJVUAPI_VERSION >= 19
>     ddjvu_document_t* document = ddjvu_document_create_by_filename_utf8(context, filePath.toUtf8(), FALSE);
> #else
>     ddjvu_document_t* document = ddjvu_document_create_by_filename(context, QFile::encodeName(filePath), FALSE);
> #endif

This seems like something we should consider including in any case?

>> I cannot reproduce this on two different Linux distributions.
>
> Maybe you could test it after setting the locale codec to the one
> my Windows system uses, for example with the following code?
> QTextCodec::setCodecForLocale(QTextCodec::codecForName("Windows-1256"));

If I try this, then yes, I can't open files with two-byte UTF-8
characters in their names, but I think this is only because the codec
chosen by Qt does not match the system locale. Indeed, if I run qpdfview
in a locale that does not match the system locale, I get exactly the
same effect (and various warnings from the Gtk+-based file dialog that
my text encoding is messed up).

Also, if I change my system locale to e.g. a latin1-based locale
instead of UTF-8, I can't open files with two-byte UTF-8 characters in
their names, even though I can open them if I replace those characters
with the proper latin1 characters (e.g. the two-byte UTF-8 'ä' by the
single-byte latin1 'ä').
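The latin1 effect can be illustrated without Qt. The following is a minimal sketch, assuming the file name is held as UTF-16 (which is how QString stores text) and that the narrow, locale-based API can only represent Latin-1 (ISO-8859-1), where code points U+0000 to U+00FF map to one byte of the same value. The helpers `toLatin1Lossy` and `survivesLatin1` are hypothetical, and replacing unrepresentable characters with '?' is a choice made for this sketch, not necessarily what Qt or any library does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 code units into Latin-1 (ISO-8859-1).
// Units U+0000..U+00FF become a single byte of the same value; any other
// unit (including each half of a surrogate pair) cannot be represented
// and is replaced by '?'.
std::string toLatin1Lossy(const std::u16string& utf16)
{
    std::string out;
    out.reserve(utf16.size());
    for (char16_t unit : utf16) {
        if (unit <= 0xFF) {
            out.push_back(static_cast<char>(unit));
        } else {
            out.push_back('?');  // the original character is lost here
        }
    }
    return out;
}

// Returns true if every unit survives the narrow encoding unchanged,
// i.e. a library receiving the resulting char* would see the real name.
bool survivesLatin1(const std::u16string& utf16)
{
    const std::string narrow = toLatin1Lossy(utf16);
    for (std::size_t i = 0; i < narrow.size(); ++i) {
        if (static_cast<unsigned char>(narrow[i]) != utf16[i]) {
            return false;
        }
    }
    return true;
}
```

With this sketch, "Čech.pdf" (U+010C followed by "ech.pdf") becomes "?ech.pdf", so a library would be asked to open a file that does not exist, while a name containing 'ä' (U+00E4) survives. On Windows the narrow APIs use the ANSI code page selected by the system locale, so the same kind of loss can occur there; Windows-1256, for example, has no 'Č'.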

What I would conclude from this for now is that the most probable
explanation for what you're seeing is an inconsistent codec
configuration somewhere in your software stack.

> Maybe it's a Qt bug? Displaying the result of "QFile::encodeName(filePath)"
> in a QMessageBox doesn't show the correct string!

Possible, but I suspect that this is unlikely. I think it is more
likely that we do not handle the encoding issues properly within
qpdfview or that some other library like DjVuLibre or Poppler is
configured incorrectly.

My suggestion for isolating this would be to write a minimal Qt test
program that tries to open such a file using QFile and then tries to
make a library like Poppler open it. (Please also check whether the
encoding of the file name actually matches your system text codec.)
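The suggested isolation test would be written against QFile and Poppler. As a rough non-Qt analogue, the sketch below assumes a C++17 compiler with std::filesystem; the function `canReopenUtf8Name` is hypothetical. It creates a file from a UTF-8 name in the temporary directory and checks that it can be reopened, so it only exercises the standard-library path, not QFile or Poppler.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical isolation check (not Qt, not Poppler): create a file whose
// name is given as UTF-8 in the temporary directory, then try to reopen it.
// fs::u8path() interprets the name as UTF-8 rather than in the local 8-bit
// encoding; on Windows, fs::path stores its native name as wide characters.
bool canReopenUtf8Name(const std::string& utf8Name)
{
    const fs::path file = fs::temp_directory_path() / fs::u8path(utf8Name);

    {
        std::ofstream out(file, std::ios::binary);
        if (!out) {
            return false;  // the name could not even be used to create the file
        }
    }  // out is closed here

    bool opened = false;
    {
        std::ifstream in(file, std::ios::binary);
        opened = static_cast<bool>(in);
    }

    std::error_code ec;
    fs::remove(file, ec);  // best-effort cleanup; errors are ignored
    return opened;
}
```

If a check like this succeeds on the affected machine while qpdfview still fails, that would point further down the stack, for example at how the path is handed to the rendering library.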

> Best Regards, Razi.
>

Best regards, Adam.

Razi Alavizadeh (srazi) wrote :

Hello again,

> > I borrowed the following code from DjView, and it works for DjVu files
> > (I'm using DDJVUAPI_VERSION >= 19):
> > #if DDJVUAPI_VERSION >= 19
> >     ddjvu_document_t* document = ddjvu_document_create_by_filename_utf8(context, filePath.toUtf8(), FALSE);
> > #else
> >     ddjvu_document_t* document = ddjvu_document_create_by_filename(context, QFile::encodeName(filePath), FALSE);
> > #endif
> This seems like something we should consider including in any case?

Yes, I think it would completely prevent future locale issues like this one.

After some reading and testing, I found that "QFile(filePath).open()" works
correctly because Qt's I/O layer uses the Win32 API to create the file
handle, and for this purpose it uses the wide-character Win32 functions,
which take a "wchar_t*" that can be obtained, for example, via
"(wchar_t*)filePath.utf16()".
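To make the wchar_t* route concrete: QString keeps its text as UTF-16, and on Windows wchar_t is a 16-bit UTF-16 code unit, which is why the cast of filePath.utf16() works there. The sketch below uses a hypothetical helper, `utf8ToUtf16` (not part of Qt or Win32), to decode a UTF-8 name into UTF-16 by hand, showing that 'Č' (UTF-8 bytes C4 8C) becomes the single unit 0x010C with nothing lost.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-16 code units, the form a
// QString holds internally and the form the wide Win32 file functions take.
std::u16string utf8ToUtf16(const std::string& in)
{
    // Smallest code point allowed for 1-, 2-, 3- and 4-byte sequences,
    // used to reject overlong encodings.
    static const char32_t minForLength[] = {0x0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes
        if (lead < 0x80)                      { cp = lead;        extra = 0; }
        else if (lead >= 0xC2 && lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead >= 0xE0 && lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else if (lead >= 0xF0 && lead < 0xF5) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < minForLength[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // one UTF-16 unit
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Code points outside the Basic Multilingual Plane come out as surrogate pairs, which is also how they appear in a QString.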

Fortunately, Poppler (at least since version 0.12) has a constructor
designed specifically for Windows (i.e. PDFDoc(wchar_t* fileName,...)).
Unfortunately, the Qt interface for Poppler doesn't use it in older
versions (again, 0.12 is the version I'm using), but newer versions do
use this constructor (I just looked at the code of version 0.26), and I'm
fairly sure this issue doesn't occur with them.

However, none of the methods I tried for converting "wchar_t*" to "char*"
worked, so I don't know a solution for the other plugins (PS, Fitz).

> What I would conclude from this for now is that the most probable
> explanation for what you're seeing is an inconsistent codec
> configuration somewhere in your software stack.

But I tested on different machines (Windows 7 and 8.1), although their
locale is also set to "Persian", like my computer's.

> Please also check whether the encoding of the file name does actually
> match your system text codec.

I renamed the file on my own system, so its new name should already use the system text codec.

Best Regards,
Razi.

Adam Reichold (adamreichold) wrote :

Hello again,

OK, so the situation on Windows is probably unrelated to what I am seeing on Linux, which is of course rather unfortunate. But from what you describe, the real cause seems to be outdated versions of the various format libraries. Could you test with a recent version of Poppler?

I will apply the DjVu fix to trunk. As for Fitz and libspectre, I think the only solution would be for these libraries to become Unicode-aware, since a "wchar_t*" name in a wide encoding such as UTF-16 cannot simply be passed on as a "char*" (in contrast to UTF-8, where you can just pass the "char*" along as long as nothing interprets it as characters on the way). One could, however, check how SumatraPDF handles this on Windows with Fitz, for example.

Best regards, Adam.

Adam Reichold (adamreichold) wrote :

Hello again,

At least for Fitz, it looks like one should explicitly pass UTF-8-encoded file names, which Fitz then converts to Unicode itself. I pushed this to trunk as well. Could you try that out using Fitz on Windows?
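Passing UTF-8 works because UTF-8 can encode every Unicode code point, unlike a single-byte code page, so turning a wide path into a "char*" this way is lossless as long as the receiver decodes it as UTF-8. In qpdfview this conversion is simply QString::toUtf8(); the standalone sketch below spells it out with a hypothetical helper, `utf16ToUtf8`, to show that 'Č' becomes the two bytes C4 8C.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-16 code units (what QString holds) as
// UTF-8. Every code point has a UTF-8 form, so no character is lost.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {            // 1 byte: plain ASCII
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {    // 2 bytes, e.g. 'Č' and 'ä'
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes: outside the BMP
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

A library that receives such a name on Windows still has to decode it back to wide characters before calling the Win32 file functions.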

As there seems to be no proper solution for libspectre, would you agree to consider this issue resolved if you can verify that current versions of Poppler, DjVuLibre and Fitz work with the latest trunk on Windows?

Best regards, Adam.

Changed in qpdfview:
status: New → Triaged
Razi Alavizadeh (srazi) wrote :

Hello again,

I'll test it once I find (or manage to build) binaries for Fitz and Poppler,
and then report back the result here.

Now the DjVu plugin works.

Best Regards,
Razi.

2014-09-29 23:56 GMT+03:30 Adam Reichold <email address hidden>:

> Hello again,
>
> at least for Fitz, it looks like one should pass explicitly utf-8
> encoded file names which will then be converted to Unicode by Fitz
> itself. I pushed this to trunk as well. Could you try that out using
> Fitz on Windows?
>
> As there seems no proper solution for libspectre, if you can verify that
> current versions of Poppler, DjVuLibre and Fitz work with the latest
> trunk on Windows, I would consider this issue resolved?
>
> Best regards, Adam.
>

--
Alavizadeh, Sayed Razi
My Blog: http://pozh.org
Saaghar (poetry software): http://saaghar.pozh.org/
Saaghar Fan Page: http://www.facebook.com/saaghar.p
Saaghar Mailing List: http://groups.google.com/group/saaghar

Razi Alavizadeh (srazi) wrote :

Hello Adam,
Today I downloaded a binary of Poppler 0.22.3, and it works well.

Best Regards,
Razi.

Razi Alavizadeh (srazi) wrote :

Hello again,
Unfortunately, I couldn't get the Fitz plugin built against libmupdf (the static libmupdf build itself compiled successfully, but building the Fitz plugin failed with lots of link errors). However, after reading its source code, I saw that on Windows it converts the file name from char* to wchar_t* and uses the Win32 API to open the file, so I think your patch fixes this issue for the Fitz plugin.

Best Regards,
Razi.

Adam Reichold (adamreichold) wrote :

Hello again,

Since the Fitz plug-in is still considered experimental anyway and the other open issues have been reported as resolved, I'll consider this fixed.

Best regards, Adam.

Changed in qpdfview:
status: Triaged → Fix Committed
importance: Undecided → Medium
assignee: nobody → S. Razi Alavizadeh (srazi)
milestone: none → 0.4.12
Changed in qpdfview:
status: Fix Committed → Fix Released