I used MSVC 2005 Standard Edition to build the test program, whose final version is below. It shows the same failures as a build made with 2008 Express; see the comments in the code.
I think this is not really a bug, though it is definitely a limitation of Windows, which has to support codepage-based character sets. The best solution is clearly to pass only Unicode file names to fstream constructors; then the codepage issues disappear.
--Tom
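Tom's advice above, to hand fstream constructors only Unicode file names, can be followed portably with C++17, which neither MSVC 2005 nor 2008 supports. The sketch below assumes a C++17 compiler and keeps file names as UTF-8 inside the program; `open_utf8` is a hypothetical helper, not a hugin or standard function. `std::filesystem::u8path` converts the UTF-8 name to the native path format: UTF-16 on Windows, where the stream then opens the file through the wide API without consulting the ANSI codepage, and plain bytes on typical POSIX systems.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Sketch, assuming C++17 (not available in MSVC 2005/2008).
// open_utf8 is a hypothetical helper name used only for illustration.
// u8path interprets the string as UTF-8; on Windows the resulting path holds
// UTF-16, so the ANSI codepage ("Language for non-Unicode programs") is never
// involved in opening the file.
inline std::ifstream open_utf8(const std::string& utf8_name)
{
    return std::ifstream(std::filesystem::u8path(utf8_name), std::ios::binary);
}
```

Names that arrive as `wchar_t` (for example from `CommandLineToArgvW`) can also be wrapped directly in a `std::filesystem::path`. Note that `u8path` is deprecated in C++20 in favour of constructing a `path` from a `char8_t` string.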
/* RussianBug.cpp
Test program to investigate a bug reported in hugin, where
std::ifstream fails to open files with Cyrillic names.
Bug tracker link: http://sourceforge.net/tracker/index.php?func=detail&aid=1908349&group_id=77506&atid=550441
Original by Pablo d'Angelo and MaxTee. MaxTee, who reported the bug,
says:
"
I'm using WinXP, English version, Regional settings as follows:
(Control Panel/Regional Options) Locale set to: Russian;
(Control panel/Advanced) Language for non-unicode programs: Russian
I believe these settings set OEM codepage 866 and ANSI codepage 1251.
"
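TKS mods:
- Report the effective codepage number.
- Try ifstream() with 4 filename flavors:
    argv[]  as received (ANSI);
    Targv   argv[] translated to Unicode with the current codepage;
    Rargv   argv[] translated to Unicode with the Russian codepage (1251);
    Wargv   the same argument read from the command line as Unicode.
- For comparison, also try fopen() on argv[].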
TKS findings:
The global Windows locale setting (Regional Options) has no effect on
this program. The codepage it reports (GetACP()) is the one selected
by the Advanced option, "Language for non-Unicode programs".
That codepage determines whether the command-line arguments are read
correctly into argv[], and apparently also whether ifstream() can
translate them correctly into Unicode (the format eventually passed
to the OS). If they are not read correctly, unknown characters are
replaced by '?', and the translation to Unicode fails.
The MSVC ifstream constructor is overloaded to accept either ANSI
(char *) or Unicode (wchar_t *) file names; the wchar_t overload is a
Microsoft extension. ANSI names, however, must be representable in
the effective codepage.
When translation fails, the eventual result is a "file not found"
error from the OS, so nobody notices the string-format problem.
If the command line is read as Unicode, the special codepage is not
needed, and ifstream( unicode_name ) always succeeds.
*/
#include <fstream>
#include <stdio.h>
#include <windows.h>
#include <shellapi.h>                  // CommandLineToArgvW
#pragma comment(lib, "shell32.lib")    // MSVC: CommandLineToArgvW lives in shell32

int main(int argc, char * argv[])
{
    unsigned codepage = GetACP();
    printf("\nThe current Windows code page is %u\n", codepage);

    /* read the command line as Unicode; GetCommandLineW() is used
       explicitly so this also compiles when UNICODE is not defined */
    int Wargc = 0;
    LPWSTR * Wargv = CommandLineToArgvW( GetCommandLineW(), & Wargc );
    if ( Wargv == NULL ) {
        printf("CommandLineToArgvW failed, error %lu\n", GetLastError());
        return 1;
    }

    for ( int i = 1; i < argc && i < Wargc; i++ ) {
        printf("\n(ANSI) argv[%d] is '%s'\n", i, argv[i]);
        printf("Targv = argv translated to Unicode with current codepage\n");
        printf("Rargv = argv translated to Unicode with Russian codepage\n");
        printf("Wargv = argv read as Unicode\n");
        printf("\n");

        wchar_t Targv[200];
        int k = MultiByteToWideChar(
            CP_ACP,   // use the current ANSI codepage
            0,        // DWORD dwFlags
            argv[i],  // LPCSTR lpMultiByteStr
            -1,       // input is null terminated
            Targv,    // LPWSTR lpWideCharStr
            200       // int cchWideChar
        );
        if ( k == 0 ) Targv[0] = L'\0';  // conversion failed; the open below will FAIL

        wchar_t Rargv[200];
        k = MultiByteToWideChar(
            1251,     // use the Russian (Cyrillic) codepage
            0,        // DWORD dwFlags
            argv[i],  // LPCSTR lpMultiByteStr
            -1,       // input is null terminated
            Rargv,    // LPWSTR lpWideCharStr
            200       // int cchWideChar
        );
        if ( k == 0 ) Rargv[0] = L'\0';  // conversion failed; the open below will FAIL

        printf(" fopen( argv ) ");
        FILE * f = fopen( argv[i], "rb" );
        if ( f ) {
            printf("OK\n");
            fclose( f );
        } else {
            printf("FAIL\n");
        }

        printf(" ifstream( argv ) ");
        std::ifstream fin0( argv[i], std::ios::binary );
        if ( fin0.good() ) {
            printf("OK\n");
        } else {
            printf("FAIL\n");
        }

        // the wchar_t * constructor is an MSVC extension
        printf(" ifstream( Targv ) ");
        std::ifstream fin1( Targv, std::ios::binary );
        if ( fin1.good() ) {
            printf("OK\n");
        } else {
            printf("FAIL\n");
        }

        printf(" ifstream( Rargv ) ");
        std::ifstream fin3( Rargv, std::ios::binary );  // was Targv: tested the wrong name
        if ( fin3.good() ) {
            printf("OK\n");
        } else {
            printf("FAIL\n");
        }

        printf(" ifstream( Wargv ) ");
        std::ifstream fin2( Wargv[i], std::ios::binary );
        if ( fin2.good() ) {
            printf("OK\n");
        } else {
            printf("FAIL\n");
        }
    }

    LocalFree( Wargv );  // the array from CommandLineToArgvW is freed with LocalFree
    return 0;
}
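The findings in the header comment (characters the ANSI codepage lacks become '?' before argv[] is filled, so converting argv[] to Unicode afterwards cannot repair the name) can be reproduced without Windows. This is a simplified model, not the Windows conversion code: `encode` and `decode_1251` are hypothetical helpers, codepage 1251 is modelled only for ASCII and the Cyrillic letters, codepage 1252 only for ASCII and U+00A0 to U+00FF, and any "best fit" substitutions Windows may apply are ignored.

```cpp
#include <string>

// Hypothetical helpers modelling the conversions in RussianBug.cpp.
// Simplifying assumptions:
//  - codepage 1251: ASCII plus Cyrillic letters only
//    (0xC0-0xFF <-> U+0410-U+044F, 0xA8 <-> U+0401, 0xB8 <-> U+0451);
//  - codepage 1252: ASCII plus U+00A0-U+00FF only;
//  - possible Windows "best fit" substitutions are ignored.

// Wide -> narrow: every character the codepage lacks becomes '?'.
// Models the damage done to a file name before it reaches argv[].
std::string encode(const std::u16string& wide, int codepage)
{
    std::string out;
    for (char16_t c : wide) {
        if (c < 0x80) {                                        // ASCII
            out += static_cast<char>(c);
        } else if (codepage == 1251 && c >= 0x0410 && c <= 0x044F) {
            out += static_cast<char>(0xC0 + (c - 0x0410));     // Cyrillic A..ya
        } else if (codepage == 1251 && c == 0x0401) {
            out += static_cast<char>(0xA8);                    // Cyrillic capital IO
        } else if (codepage == 1251 && c == 0x0451) {
            out += static_cast<char>(0xB8);                    // Cyrillic small io
        } else if (codepage == 1252 && c >= 0x00A0 && c <= 0x00FF) {
            out += static_cast<char>(c);                       // Latin-1 range
        } else {
            out += '?';                                        // information lost here
        }
    }
    return out;
}

// Narrow -> wide with the simplified 1251 table, in the spirit of
// MultiByteToWideChar(1251, ...), which the program uses to build Rargv.
std::u16string decode_1251(const std::string& bytes)
{
    std::u16string out;
    for (unsigned char b : bytes) {
        if (b < 0x80)       out += static_cast<char16_t>(b);
        else if (b >= 0xC0) out += static_cast<char16_t>(0x0410 + (b - 0xC0));
        else if (b == 0xA8) out += static_cast<char16_t>(0x0401);
        else if (b == 0xB8) out += static_cast<char16_t>(0x0451);
        else                out += static_cast<char16_t>(0xFFFD);  // not modelled
    }
    return out;
}
```

With the ANSI codepage set to 1251, `decode_1251(encode(u"Файл.txt", 1251))` gives back the original name. With 1252, `encode` already yields `"????.txt"`, and decoding that with 1251 can only return `u"????.txt"`, which is why Rargv fails whenever argv[] is wrong and only the Unicode command line (Wargv) helps.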