firefox ftp client don't understand links in cyrillic letters

Bug #222635 reported by sky
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Fix Released
Medium
firefox (Ubuntu)
Fix Released
Wishlist
Unassigned

Bug Description

Binary package hint: firefox-3.0

Ubuntu 8.04
 3.0~b5+nobinonly-0ubuntu2

When ftp session is open and I want to follow a link that contains cyrillic letters in, Firefox getting an error

ProblemType: Bug
Architecture: i386
Date: Sat Apr 26 19:59:05 2008
DistroRelease: Ubuntu 8.04
NonfreeKernelModules: nvidia
Package: firefox-3.0 3.0~b5+nobinonly-0ubuntu2
PackageArchitecture: i386
ProcEnviron:
 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
 LANG=ru_RU.UTF-8
 SHELL=/bin/bash
SourcePackage: firefox-3.0
Uname: Linux 2.6.24-16-generic i686

Tags: apport-bug
Revision history for this message
In , Warrensomebody (warrensomebody) wrote :

-> Jud

Revision history for this message
In , Jud-me (jud-me) wrote :

-> dougt

Revision history for this message
In , Doug-turner (doug-turner) wrote :

*** Bug 64700 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Doug-turner (doug-turner) wrote :

unsetting milestone

Revision history for this message
In , Doug-turner (doug-turner) wrote :

*** Bug 20292 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Benc-meer (benc-meer) wrote :

mass move, v2.
qa to me.

Revision history for this message
In , Benc-frame (benc-frame) wrote :

->ftp

Revision history for this message
In , Doug-turner (doug-turner) wrote :

reassigning to <email address hidden>.

Revision history for this message
In , Bradley Baetz (bbaetz) wrote :

Are you aware of any servers which implement this?

Revision history for this message
In , Benc-meer (benc-meer) wrote :

will test against:
ftp://kaze/pub/

Revision history for this message
In , Momoi (momoi) wrote :

What are we exactly testing? Maybe I can help better. kaze is my server
and I set up the original test case for some other bug. I am not
sure if it is a proper test case for this bug.

Revision history for this message
In , Bradley Baetz (bbaetz) wrote :

This definately won't work. (jbetak tested it as part of my directory viewer
changes, and it was broken to start with). This really can't be done until file
urls display with the proper character set, though. I cna't fix this unless I
get access to a server which implements this extention. Can someone please
telnet into kaze on the ftp port (21), and give me the results of typing, in order:

FEAT
USER anonymous
PASS <email address hidden>
FEAT
SYST

This will at least let me confirm what server it is, and whether it is, in fact,
following this RFC.

Revision history for this message
In , Momoi (momoi) wrote :

kaze is running AIX 4.2.1 and its FTP cannot be compliant
with RFC 2640.
But here is the output you requested:

500 'FEAT': command not understood.
SYST
215 UNIX Type: L8 Version: BSD-44

Revision history for this message
In , Bradley Baetz (bbaetz) wrote :

Does kaze use UTF8? Can you create a new directory with a file called fooé
(thats f-o-o-(e-actute)), and attach a packet trace of us trying to download
this file?

I tried just assuming all ftp data was UTF8 by default, but wuftpd uses
iso-8859-1, and so that broke. Can you try using the html ftp view, and changing
the charset using the view menu? does that help? Can you load files with non
ascii names - ie is this just a display problem? Do you know of clients which
support this server? If so, can you get a packet trace of one of them connecting
and downloading a file with non-iso-8859-1 chars, and another one of us doing
the same? On today's trunk build theres a bug which I would like you to exploit
- can you load the html directory listing, and then use file->send-page to mail
me the raw output of what we think we get? I won't fix that bug til the weekend..

benc: Can I get you to help momoi to do this? I assume that momoi doesn't have a
packet trace tool installed.

Revision history for this message
In , Benc-meer (benc-meer) wrote :

momoi: lets find some time to look at this problem....

Revision history for this message
In , Momoi (momoi) wrote :

Yes, benc, let's do that next week. It should not be that
hard to come up with what we need.

Revision history for this message
In , R-contact2009-awhlink-com (r-contact2009-awhlink-com) wrote :

Adding intl keyword.

Revision history for this message
In , Bradley Baetz (bbaetz) wrote :

*** Bug 144717 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Richard W.M. Jones (rich-annexia) wrote :

I'm the author of bug 144717 which is a dup of this one. I have put up a test
FTP site where you can all test this out.

Machine: annexia.org
Username: mozilla
Password: mozilla

Or use the following URL: ftp://mozilla:<email address hidden>/

(Sorry my machine is quite slow).

There are three files in this directory. All file names are in UTF-8 format. The
first is Greek, the second is Chinese and the third is Japanese. Go to the
UTF-8 sampler page (http://www.columbia.edu/kermit/utf8.html) to see what
the original strings *should* look like.

NCFTP gets this right. Mozilla and IE get it wrong. NCFTP correctly sends the
FEAT command which this FTP server understands. The response from FEAT is:

211-Extensions supported:
 HOST
 LANG EN*
 MDTM
 MLST TYPE*;SIZE*;MODIFY*;PERM*;UNIX.MODE*;
 REST STREAM
 SIZE
 TVFS
 UTF8
211 END

(Note the "UTF8").

See also:

http://www.zvon.org/tmRFC/RFC2640/Output/chapter3.html

Revision history for this message
In , Bradley Baetz (bbaetz) wrote :

What server is this running? SYST just says: "UNIX Type: L8", which doesn't
help. I'm curious, mainly because none of the 'big' ftp servers, (ie wuftpd,
iis, proftpd, and a few others I tested) support this.

Its probably not too difficult to do, I guess, once the patch bug 118179 lands.
We don't send FEAT at all, so we'd have to send that, and then parse the
response, though.

Revision history for this message
In , Richard W.M. Jones (rich-annexia) wrote :

This particular server is Net::FTPServer/1.108
(http://www.cpan.org/modules/by-authors/id/R/RW/RWMJ/Net-FTPServer-1.108.readme).
NcFTPD also supports FEAT and UTF-8.

Our website and FTP service uses UTF-8 extensively because we need to support
a world-wide market with a mix of languages on the same pages.

Yes, you would need to issue the FEAT command and parse the output to see
if it contains the string "UTF8". If so, display the page in UTF-8 encoding.
If not, you're on your own (it's pretty much undefined: probably best to just
assume ISO-8859-1 as now).

Revision history for this message
In , Momoi (momoi) wrote :

> If not, you're on your own (it's pretty much undefined:
> probably best to just assume ISO-8859-1 as now).

I don't think we can make such an assumption for this
type of situation. It would have to be the same encoding as
the system default or the browser window endcoding to be
safe.

If we pay attention to both LANG and encoding (=UTF-8),
I think we need to use the proper font for the LANG
value to display the result. Hope this is automatically
done the the layout and rendering components of Mozilla.

Revision history for this message
In , Richard W.M. Jones (rich-annexia) wrote :

If FEAT is not supported by the server, or does not return the UTF8 feature,
then you'd need to guess the encoding on the server. Ouch :-) But assuming the
browser's default encoding might be better than just guessing ISO-8859-1.

I'm only really interested in the case where FEAT *is* supported and it
returns the UTF8 feature.

Note that RFC 2640 *requires* that clients now support and send the FEAT command
and recognise UTF8 encoding, so Mozilla is in non-compliance (from my reading
of the RFC anyway).

Revision history for this message
In , Momoi (momoi) wrote :

My point is that UTF-8 encoding itself does not give
sufficient info on what font to use for Unifiied Ideograph
range for CJK. LANG info is important for proper display of
characters in such a case.

Revision history for this message
In , Tetsuroy (tetsuroy) wrote :

>If FEAT is not supported by the server, or does not return the UTF8 feature,
>then you'd need to guess the encoding on the server. Ouch :-) But assuming the
>browser's default encoding might be better than just guessing ISO-8859-1.
Doesn't this bug get fixed by 118179?
I have a patch to use the Default char coding in the Pref.

Revision history for this message
In , Bradley Baetz (bbaetz) wrote :

I have no time to work on mozilla at the moment, so dougt is taking over FTP

open ftp bugs -> him

Revision history for this message
In , Christian Biesinger (cbiesinger) wrote :

*** Bug 335099 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Doug-turner (doug-turner) wrote :

mass reassigning to nobody.

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 311568
a patch for RFC2640 support. test w/ pure-ftped

Revision history for this message
sky (wsky) wrote :

Binary package hint: firefox-3.0

Ubuntu 8.04
 3.0~b5+nobinonly-0ubuntu2

When ftp session is open and I want to follow a link that contains cyrillic letters in, Firefox getting an error

ProblemType: Bug
Architecture: i386
Date: Sat Apr 26 19:59:05 2008
DistroRelease: Ubuntu 8.04
NonfreeKernelModules: nvidia
Package: firefox-3.0 3.0~b5+nobinonly-0ubuntu2
PackageArchitecture: i386
ProcEnviron:
 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
 LANG=ru_RU.UTF-8
 SHELL=/bin/bash
SourcePackage: firefox-3.0
Uname: Linux 2.6.24-16-generic i686

Revision history for this message
sky (wsky) wrote :
Revision history for this message
In , Jan Darmochwal (jdarmochwal) wrote :

*** Bug 325318 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Egor-pelevin (egor-pelevin) wrote :

(In reply to comment #29)
> Created an attachment (id=311568) [details]
> a patch for RFC2640 support. test w/ pure-ftped
>

Why don't you ask for review?

People on Mozilla Russia's forum keep asking about proper Cyrillic folder names support. At the moment they are stuck with the need to change 'network.standard-url.encode-utf8' to 'false' (see https://bugzilla.mozilla.org/show_bug.cgi?id=297395#c13).

UTF-8 in FTP would be a good addition to Fx 3.1

Revision history for this message
In , M-kato (m-kato) wrote :

(In reply to comment #31)
> (In reply to comment #29)
> > Created an attachment (id=311568) [details] [details]
> > a patch for RFC2640 support. test w/ pure-ftped
> >
>
> Why don't you ask for review?

Because mozilla-central tree (for 3.x and 4.0) isn't opened yet. After opened, I will send r/sr.

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 323826
a patch for RFC2640 support

RFC2640 (I18N FTP support). I tested using pure-ftpd (it is supported FEAT/OPTS)

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

I don't think I'm qualified to review this patch. Dougt, would you be willing to take a look?

Revision history for this message
In , Doug-turner (doug-turner) wrote :

Comment on attachment 323826
a patch for RFC2640 support

cool! m_kato, thanks for looking into this (a 5 digit bug, no less!)

got any test cases -- or a public server that I can test this against?

Does the ftp feature UTF8 imply that the paths that we send to the ftp server? (/me needs to refresh himself with rfc 2640!)

Revision history for this message
In , M-kato (m-kato) wrote :

Although pure-ftpd (http://www.pureftpd.org/project/pure-ftpd) supports RFC2640, I don't know where is public test server. I will be looking for it.

- Spec
FTP serer must support FEAT command, client sends FEAT command, server must send UTF8 as result if supporting RFC2640. Next, client sends OPTS UTF8 ON commands to enable UTF8.

It supports Windows shell (aka FTP folder). And Filezilla supports it too.

Revision history for this message
In , Tim Kosse (tim-kosse) wrote :

Not quite. RFC 2640 does not specify OPTS UTF8 ON. As long as UTF8 is in the FEAT response, everything already is using UTF-8.

OPTS UTF-8 ON is from a long-expired IETF draft (http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00).

An RFC2640 compliant server uses UTF-8 regardless of whether OPTS UTF-8 ON gets sent or not, and a compliant client assumes the server uses UTF-8 even if a OPTS UTF-8 ON command fails.

Note that using any encoding other than ASCII (RFC 959) or UTF-8 (RFC 2640) without prior explicit negotiation is not covered by the specs.

I've summarized this in my wiki: http://wiki.filezilla-project.org/Character_Set

Revision history for this message
In , Tim Kosse (tim-kosse) wrote :

A quick glance at the proposed patch reveals that it is not RFC2640 compliant as it depends on the unspecified OPTS UTF-8 ON command.

Revision history for this message
In , Doug-turner (doug-turner) wrote :

Makoto, can you fix this before I review?

Revision history for this message
In , M-kato (m-kato) wrote :

I know that "OPTS UTF8 ON" is Microsoft's FTP folder spec. Since Microsoft implements foolish spec (not RFC2640), many servers do it to support FTP folder of Explorer.

Doug,
OK. After I change a code, then I will send review to you.

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 326913
a patch v2 for RFC2640 support

Revision history for this message
In , Doug-turner (doug-turner) wrote :

Comment on attachment 326913
a patch v2 for RFC2640 support

instead of ftpcharset, lets change it to useUTF8. Lets not imply that there are multiple charsets available to ftp.

I am concerned that just setting the control channel's content charset isn't enough. There is code in ftp that deals with filepaths, conversion from filepaths to VMS paths, and other suspect code. Did you review this to see if anything needs to change?

I think I would be happier if there were public servers we could test against. Do any of these servers work:

http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/ftp/test/frametest/menu.html?force=1

Revision history for this message
In , Richard W.M. Jones (rich-annexia) wrote :

Here's a public server to test against:

by HTTP: http://www.annexia.org/tmp/mozilla-test/
by FTP: ftp://mozilla:<email address hidden>/

Or:
  Hostname: annexia.org
  Username: mozilla
  Password: mozilla

You should see three files in the directory. The two interesting
ones are the letter "ka" in two different languages:

http://en.wikipedia.org/wiki/Ka_%28kana%29
http://en.wikipedia.org/wiki/%E0%A4%95

Mozilla currently renders this wrong. NCFTP gets it right:

$ ncftp -u mozilla -p mozilla annexia.org
NcFTP 3.2.1 (Jul 29, 2007) by Mike Gleason (http://www.NcFTP.com/contact/).
Connecting to 80.68.91.176...
furbychan.cocan.org FTP server (Net::FTPServer/1.120-1) ready.
Logging in...
Welcome mozilla.
Logged in to annexia.org.
ncftp / > ls
か .htaccess क

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 326924
a patch v3

previous patch is a lack of streamconv code

Revision history for this message
In , M-kato (m-kato) wrote :

(In reply to comment #42)
> I am concerned that just setting the control channel's content charset isn't
> enough. There is code in ftp that deals with filepaths, conversion from
> filepaths to VMS paths, and other suspect code. Did you review this to see if
> anything needs to change?

Current code uses ISO-8859-1 for all FTP channel. So even if current is VMS style, it is no problem to convert from ISO-8859-1 to UTF-8. (7bit character map is same. So "[", "]", "/" and "." is same mapping. And, if problem occurs with FEAT - UTF8 enable, it is broken characters as UTF8.)

And, I don't think that VMS style server supports new feature such as RFC2640.

Also, I will create new patch for changing metadata.

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 327207
a patch v4 for RFC2640 support

change metadata name of utf8 support into cache

Revision history for this message
In , Honzab-moz (honzab-moz) wrote :
Download full text (7.8 KiB)

I applied the patch 327207 and was testing my patch for bug 296528 if it helps handling files with diacritics in name when dropped to firefox from windows explorer. I encountered assertion failure in:

  msvcr80d.dll!strtoxl(localeinfo_struct * plocinfo=0x00000000, const char * nptr=0x00000000, const char * * endptr=0x00000000, int ibase=10, int flags=0) Line 94 + 0x2f bytes C++
  msvcr80d.dll!strtol(const char * nptr=0x00000000, char * * endptr=0x00000000, int ibase=10) Line 236 + 0x15 bytes C++
  msvcr80d.dll!atol(const char * nptr=0x00000000) Line 56 + 0xd bytes C
  msvcr80d.dll!atoi(const char * nptr=0x00000000) Line 100 + 0x9 bytes C
> necko.dll!nsFtpState::ReadCacheEntry() Line 2145 + 0xf bytes C++
  necko.dll!nsFtpState::OnCallbackPending() Line 2119 + 0x26 bytes C++
  necko.dll!nsBaseContentStream::AsyncWait(nsIInputStreamCallback * callback=0x05bb28f4, unsigned int flags=0, unsigned int requestedCount=0, nsIEventTarget * target=0x00bc1ec8) Line 171 C++
  necko.dll!nsInputStreamPump::EnsureWaiting() Line 152 + 0x48 bytes C++
  necko.dll!nsInputStreamPump::AsyncRead(nsIStreamListener * listener=0x036b0064, nsISupports * ctxt=0x00000000) Line 362 + 0x8 bytes C++
  necko.dll!nsBaseChannel::BeginPumpingData() Line 232 + 0x3a bytes C++
  necko.dll!nsBaseChannel::AsyncOpen(nsIStreamListener * listener=0x05bb26b8, nsISupports * ctxt=0x00000000) Line 488 + 0xb bytes C++
  docshell.dll!nsURILoader::OpenURI(nsIChannel * channel=0x036b0058, int aIsContentPreferred=0, nsIInterfaceRequestor * aWindowContext=0x045e3920) Line 840 + 0x19 bytes C++
  docshell.dll!nsDocShell::DoChannelLoad(nsIChannel * aChannel=0x036b0058, nsIURILoader * aURILoader=0x03608380, int aBypassClassifier=0) Line 7638 + 0x41 bytes C++
  docshell.dll!nsDocShell::DoURILoad(nsIURI * aURI=0x05bb15a0, nsIURI * aReferrerURI=0x00000000, int aSendReferrer=1, nsISupports * aOwner=0x00000000, const char * aTypeHint=0x00000000, nsIInputStream * aPostData=0x00000000, nsIInputStream * aHeadersData=0x00000000, int aFirstParty=1, nsIDocShell * * aDocShell=0x00000000, nsIRequest * * aRequest=0x0012d5e8, int aIsNewWindowTarget=0, int aBypassClassifier=0) Line 7484 + 0x29 bytes C++
  docshell.dll!nsDocShell::InternalLoad(nsIURI * aURI=0x05bb15a0, nsIURI * aReferrer=0x00000000, nsISupports * aOwner=0x00000000, unsigned int aFlags=1, const wchar_t * aWindowTarget=0x04a63f88, const char * aTypeHint=0x00000000, nsIInputStream * aPostData=0x00000000, nsIInputStream * aHeadersData=0x00000000, unsigned int aLoadType=1, nsISHEntry * aSHEntry=0x00000000, int aFirstParty=1, nsIDocShell * * aDocShell=0x00000000, nsIRequest * * aRequest=0x00000000) Line 7209 + 0x7f bytes C++
  docshell.dll!nsDocShell::LoadURI(nsIURI * aURI=0x05bb15a0, nsIDocShellLoadInfo * aLoadInfo=0x05bb1780, unsigned int aLoadFlags=0, int aFirstParty=1) Line 904 + 0x56 bytes C++
  docshell.dll!nsDocShell::LoadURI(const wchar_t * aURI=0x05bab670, unsigned int aLoadFlags=0, nsIURI * aReferringURI=0x00000000, nsIInputStream * aPostStream=0x00000000, nsIInputStream * aHeaderStream=0x00000000) Line 2934 + 0x2a bytes C++
  xpcom_core.dll!NS_InvokeByIndex_P(nsISupports * that=0x0012db98, unsigned int methodIndex=21151274, uns...

Read more...

Revision history for this message
In , Honzab-moz (honzab-moz) wrote :

This happens when I want to access a file with diacritics with name a second time. (The first time I get "file could not be found" and the name of that file seems to be pure UTF-8 or native encoding of its name, like there would be instead of AUTF8String used ACString in some interface method declaration or decoding would be applied two times)

Revision history for this message
In , M-kato (m-kato) wrote :

(In reply to comment #48)
> This happens when I want to access a file with diacritics with name a second
> time. (The first time I get "file could not be found" and the name of that file
> seems to be pure UTF-8 or native encoding of its name, like there would be
> instead of AUTF8String used ACString in some interface method declaration or
> decoding would be applied two times)
>

I will add simply null check. Then, I will submit new patch this week.

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 331315
patch v5

Revision history for this message
In , Honzab-moz (honzab-moz) wrote :

IMHO better and cleaner would be to test result of mCacheEntry->GetMetaDataElement. And also could be interesting to figure out why this happens and why exactly on the second try...

Further, maybe I misunderstood what this bug had to fix but I was testing international file names while fixing bug 296528 with patch (v4) of this bug applied. It didn't help to open files with diacritics in names when written directly to the address bar nor dropped from the windows explorer (attachment 330841 applied) and nor when click on the file name in FF FTP directory browser. I was testing on winxpsp2-en and with files with czech diacritics in name.

And one more nit, when building on windows I got: d:/mozilla/HG/src/netwerk/streamconv/converters/nsIndexedToHTML.cpp(169) : error C2373: 'nsIndexedToHTML::OnHeaderAvailable' : redefinition; different type modifiers

nsIndexedToHTML::OnHeaderAvailable must be defined with 'nsresult' return type and not with NS_IMETHODIMP. Sorry, I forgot to mention this sooner...

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 334663
patch v6

Revision history for this message
In , Doug-turner (doug-turner) wrote :

michal, could you take a first look at this patch?

Revision history for this message
In , Michal-novotny (michal-novotny) wrote :

Comment on attachment 334663
patch v6

This patch reintroduces partially bug 427089. For example try to follow link "Ñûñîâ" on page ftp://ftp.asu.ru/incoming/. It works without this patch.

I think that we should implement LANG command too. At the end of 4.1 section in RFC 2640 is:
 "If user-PI issues a HOST command, and the server's default language is acceptable, it need not issue a LANG command. However, if the implementation does not support the HOST command, a LANG command MUST be issued. Until server-PI is presented with either a HOST or LANG command it SHOULD assume that the user-PI does not comply with this specification."

Revision history for this message
In , M-kato (m-kato) wrote :

Michal,

"ftp.asu.ru" doesn't seem to support LANG command. So your idea isn't good.

This Russian server doesn't support FEAT and UTF-8, so URI by necko uses original charset (it will become old behavior. If URI is non-UTF8 and platform is non-UTF8, it uses platform charset with standard-uri setting).

There is same issue with Japanese IIS FTP Server. Until IIS 5.0 FTP server, although this can use Shift-JIS charset as directory and filename, it doesn't support FEAT and UTF-8.

Also, this patch cannot apply current tree because of bug 427089. I will make
new patch with considering bug 427089.

Revision history for this message
In , Michal-novotny (michal-novotny) wrote :

(In reply to comment #55)
> "ftp.asu.ru" doesn't seem to support LANG command. So your idea isn't good.

There are 2 independent comments in my reply #54. I didn't say that LANG command would fix the problem with ftp.asu.ru. I'm just saying that this patch breaks what was fixed in bug 427089.

And it is good idea to implement LANG command just because RFC2640 demands it.

Revision history for this message
In , Michal-novotny (michal-novotny) wrote :

Following check doesn't detect UTF8 in reply from proftpd:

+ if (mResponseMsg.Find(NS_LITERAL_CSTRING("UTF8" CRLF), PR_TRUE) > -1) {

The most correct check would be (CRLF " UTF8" CRLF), but proftpd is buggy and sends LF only instead of CRLF. So we should use some more relaxed check like:

(mResponseMsg.Find(NS_LITERAL_CSTRING(CRLF " UTF8" CRLF), PR_TRUE) > -1 ||
 mResponseMsg.Find(NS_LITERAL_CSTRING(LFSTR " UTF8" LFSTR), PR_TRUE) > -1)

The reason why there are problems with server ftp.asu.ru after applying your patch is that mPath is converted to originCharset when response to FEAT is received. But this happens only for the first request on the connection. Other requests that reuses the connection don't send FEAT command...

And I've found another problem. When server says that it supports UTF-8 but some filenames aren't valid UTF-8 strings, then conversion fails at http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsDirIndexParser.cpp#297 and such files aren't in the listing at all.

Revision history for this message
Alexander Sack (asac) wrote :

is this still a problem with the latest ffox 3? If so, this is best dealt upstream as its not a ubuntu specific issue. first, verify that this is not caused by some extensions (disable all in tools -> addons and restart). Then please search in http://bugzilla.mozilla.org to ensure that your bug isn't filed; then open a bug in http://bugzilla.mozilla.org and post your bug id (or the one you found matching your issue) here. Thanks!

Changed in firefox-3.0:
importance: Undecided → Medium
status: New → Incomplete
Revision history for this message
Alexander Sack (asac) wrote :

this is https://bugzilla.mozilla.org/show_bug.cgi?id=26767 which deals with internationalization of ftp. Track the issue there.

Changed in firefox-3.0:
status: Incomplete → Triaged
importance: Medium → Wishlist
Changed in firefox:
importance: Undecided → Unknown
status: New → Unknown
Changed in firefox:
status: Unknown → In Progress
Revision history for this message
In , M-kato (m-kato) wrote :

Michal,

> And I've found another problem. When server says that it supports UTF-8 but
> some filenames aren't valid UTF-8 strings, then conversion fails at
> http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsDirIndexParser.cpp#297
> and such files aren't in the listing at all.

Why? If server return UTF8 as FEAT result, server should return "UTF8 filename".
If server doesn't return it, I should not consider error. Why does client consider it?

If server doesn't return UTF8 as FEAT, I can make sense it.

Revision history for this message
In , Michal-novotny (michal-novotny) wrote :

Server that supports FEAT UTF8 can return non-UTF8 filename and according to "RFC2640 Annex A.1 paragraph 4" I think that it happens quite often:
   - A server that copies pathnames transparently from a local
     filesystem may continue to do so. It is then up to the local file
     creators to use UTF-8 pathnames.

And according to "RFC2640 3.1 last paragraph" client SHOULD handle these cases:
   - There may be cases when the code set / encoding presented to the
     server or client cannot be determined. In such cases the raw bytes
     SHOULD be used.

Revision history for this message
In , Doug-turner (doug-turner) wrote :

Comment on attachment 334663
patch v6

makoto, where we on this? Can you resolve michal's issues with your patch? Specifically, what are we going to do about invalid utf-8 strings being return by the server?

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 391260
patch v7

work in progress...

Revision history for this message
In , Juan Simón (simonbcn) wrote :

*** Bug 536861 has been marked as a duplicate of this bug. ***

Changed in firefox:
importance: Unknown → Wishlist
Revision history for this message
In , Jo-hermans (jo-hermans) wrote :

*** Bug 686440 has been marked as a duplicate of this bug. ***

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 568615
Part 1. client detects UTF-8 if FTP server returns UTF8 on FEAT

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 568616
Part 2. 301: <encoding> works on nsIndexedToHTML correctly

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 568617
Part 3. FTP directory list sets the encoding when current FTP server supports UTF-8

Revision history for this message
In , Honzab-moz (honzab-moz) wrote :

Comment on attachment 568615
Part 1. client detects UTF-8 if FTP server returns UTF8 on FEAT

Redirecting to Michal, he knows this area much better then me.

Revision history for this message
In , Michal-novotny (michal-novotny) wrote :

Makoto, I've tested your patch and before going any further with the review I'd like to get back to comment #54. With this patch applied I can't browse directories in ftp://ftp.asu.ru/incoming/

The situation is now different in that ftp.asu.ru already supports FEAT command and returns UTF8. Those problematic files are in cp1251 encoding which is IMO allowed (see comment #59) and probably quite common.

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 570960
Part 3. FTP directory list sets the encoding when current FTP server supports UTF-8 v2

Revision history for this message
In , M-kato (m-kato) wrote :

Also, I will add part 4 patch to work CWD or MDTM by non-UTF8 path on UTF8 server.

Revision history for this message
In , M-kato (m-kato) wrote :

Created attachment 571265
Part 2. "301: <encoding>" works on nsIndexedToHTML correctly v2

work on non UTF-8 path on UTF-8 FTP server.

Revision history for this message
In , M-kato (m-kato) wrote :

(In reply to Michal Novotny (:michal) from comment #68)
> Makoto, I've tested your patch and before going any further with the review
> I'd like to get back to comment #54. With this patch applied I can't browse
> directories in ftp://ftp.asu.ru/incoming/
>
> The situation is now different in that ftp.asu.ru already supports FEAT
> command and returns UTF8. Those problematic files are in cp1251 encoding
> which is IMO allowed (see comment #59) and probably quite common.

New patch series will work on this ftp server.

Revision history for this message
In , Michal-novotny (michal-novotny) wrote :

Comment on attachment 571265
Part 2. "301: <encoding>" works on nsIndexedToHTML correctly v2

> +nsresult
> +nsIndexedToHTML::OnHeaderAvailable(nsIRequest* request, nsISupports *aContext,
> + nsString& aBuffer)

I think the name of the method is confusing. A name like WriteHeader would be IMO more appropriate. You could also move the check of mWroteHeader inside the method.

> + bool mWroteHeader;

This member isn't initialized in the constructor.

> + bool failbackCharset = encoding.Equals("ISO-8859-1");

s/failback/fallback/ ?

> if (mExpectAbsLoc &&
> - NS_SUCCEEDED(net_ExtractURLScheme(utf8UnEscapeSpec, nsnull, nsnull, nsnull))) {
> + NS_SUCCEEDED(net_ExtractURLScheme(convertedUnEscapeSpec, nsnull, nsnull, nsnull))) {
> // escape as absolute.
> - escFlags = esc_Forced | esc_OnlyASCII | esc_AlwaysCopy | esc_Minimal;
> + escFlags |= esc_Forced | esc_AlwaysCopy | esc_Minimal;
> }

As far as I can see, this can be removed completely. mExpectAbsLoc was set to true only in case of gopher and support for this protocol was removed.

Revision history for this message
In , Michal-novotny (michal-novotny) wrote :

Comment on attachment 570960
Part 3. FTP directory list sets the encoding when current FTP server supports UTF-8 v2

> + channel->GetContentCharset(mCharset);
> + if (mCharset.EqualsLiteral("UTF-8")) {
> + // For RFC 2640 support, charset is into HTML, not channel.
> + // So we need clear charset on channel.
> + channel->SetContentCharset(NS_LITERAL_CSTRING(""));
> + }

Could you please make the comment more verbose? Why it is needed to clear the charset and why only in case of UTF8?

> + if (mCharset.EqualsLiteral("UTF-8") && !IsUTF8(mPendingBuffer)) {
> + // RFC 2640 section 3.3 says the following.
> + //
> + // If a client detects that a server is non UTF-8,
> + // it SHOULD change its display appropriately.
> + //
> + // So we need failback encoding.
> + mCharset.AssignLiteral("ISO-8859-1");
> + }

There is still a problem when the directory is empty and its name isn't a valid UTF8 string. In this case call to mTextToSubURI->UnEscapeAndConvert() in nsIndexedToHTML::OnHeaderAvailable() fails.

Revision history for this message
In , Matti-mversen (matti-mversen) wrote :

*** Bug 716189 has been marked as a duplicate of this bug. ***

affects: firefox-3.0 (Ubuntu) → firefox (Ubuntu)
Revision history for this message
In , M-kato (m-kato) wrote :

*** Bug 942495 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

Created attachment 8398921
Part1. FTP client detects UTF-8 if server returns UTF8 on FEAT

- Rebased to tip.
- Although RFC 2640 does not define "OPTS UTF8 ON", Some ftp servers violates the RFC. ftp.asu.ru is one of such servers. I restored the OPTS support from the older patch. Sending "OPTS UTF8 ON" to compliant servers should be harmless.

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

Created attachment 8398922
Part 2: Read charset from the channel

- Bug 948901 made the ftp directry view byte-transparent, so this former part 2 & 3 could be greatly simplified.

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

Created attachment 8398925
Part 2: Read charset from the channel

Escape (percent-encode) URLs in case some totally broken servers return non-UTF-8 URLs.

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

Created attachment 8398993
Part1. FTP client detects UTF-8 if server returns UTF8 on FEAT

Fixed a build failure on gcc.

https://tbpl.mozilla.org/?tree=Try&rev=7a83b378b920

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

Comment on attachment 8398925
Part 2: Read charset from the channel

This part is no longer needed with the follow-up patch of bug 989576.

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

michal, ping?

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

Comment on attachment 8398993
Part1. FTP client detects UTF-8 if server returns UTF8 on FEAT

No response from Michal.
Honza, could you take a look at?

Revision history for this message
In , Michal-novotny (michal-novotny) wrote :

Comment on attachment 8398993
Part1. FTP client detects UTF-8 if server returns UTF8 on FEAT

Review of attachment 8398993:
-----------------------------------------------------------------

Sorry for the delay, r+ with fixed indentation

::: netwerk/protocol/ftp/nsFtpConnectionThread.cpp
@@ +657,5 @@
> + case FTP_S_FEAT:
> + rv = S_feat();
> +
> + if (NS_FAILED(rv))
> + mInternalError = rv;

wrong indentation

@@ +675,5 @@
> + case FTP_S_OPTS:
> + rv = S_opts();
> +
> + if (NS_FAILED(rv))
> + mInternalError = rv;

wrong indentation

Revision history for this message
In , Vyv03354 (vyv03354) wrote :
Revision history for this message
In , Neil-httl (neil-httl) wrote :

(In reply to Masatoshi Kimura from comment #82)
> Comment on attachment 8398925
> Part 2: Read charset from the channel
>
> This part is no longer needed with the follow-up patch of bug 989576.

Although, would it be nice to emit a <meta charset="UTF-8"> tag if we know that the charset is UTF-8 so that we don't have to detect it later on (and report that annoying warning to the console)?

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

The charset information will be taken from the channel which is more preferable than the meta element.
Neither auto-detect nor console warnings are involved.

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

I tested with the tinderbox build and verified that no console warnings are displayed.
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1398183266/

Revision history for this message
In , KWierso (kwierso) wrote :
Changed in firefox:
status: In Progress → Fix Released
Revision history for this message
Paul White (paulw2u) wrote :

Upstream bugs showing "RESOLVED FIXED" on 2014-04-22 and 2014-03-29
Closing by marking "Fix Released"

Changed in firefox (Ubuntu):
status: Triaged → Fix Released
Changed in firefox:
importance: Wishlist → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.