Incorrect result from localeconv()->int_*_sep_by_space in at least en_US.UTF-8 locale

Bug #936773 reported by Jeffrey Yasskin
This bug affects 1 person
Affects: langpack-locales (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

In the en_US.UTF-8 locale, localeconv() reports int_p_sep_by_space and int_n_sep_by_space as 1, which C99 §7.11.2.1p5 defines as "If the currency symbol and sign string are adjacent, a space separates them from the value; otherwise, a space separates the currency symbol from the value." However, int_curr_symbol is "USD " (note the trailing space), so C++'s money parsing functions only accept strings like "USD  1.23" (note the two spaces). Judging by some failures in libc++'s test suite, I suspect that many other locales have the same problem, but I haven't tested them directly. Here's a test program and its output:

$ cat test.cc
#include <iostream>
#include <locale>
#include <string>
#include <locale.h>

struct my_money_get : std::money_get<char, const char*> {
};

std::ostream& operator<<(std::ostream& out, std::ios_base::iostate state) {
  if (state == std::ios_base::goodbit)
    return out << "goodbit";
  const char* sep = "";
  if (state & std::ios_base::badbit) {
    out << sep << "badbit";
    sep = "|";
  }
  if (state & std::ios_base::failbit) {
    out << sep << "failbit";
    sep = "|";
  }
  if (state & std::ios_base::eofbit) {
    out << sep << "eofbit";
    sep = "|";
  }
  return out;
}

const char en_us[] = "en_US.UTF-8";

void test(const std::string& test_string) {
  std::ios ios(0);
  ios.imbue(std::locale(en_us));
  long double result = 1.2345;
  my_money_get mget;
  std::ios_base::iostate err = std::ios_base::goodbit;
  const char* iter = mget.get(test_string.data(),
                              test_string.data() + test_string.size(),
                              /*International=*/true,
                              ios, err, result);
  std::cout << "'" << test_string << "' reads as: '" << result << "'\n";
  std::cout << "And advances the iterator from " << (const void*)test_string.data()
            << " to " << (const void*)iter << ".\n";
  std::cout << "And leaves the stream in state " << err << ".\n";
}

int main() {
  test("1.23");
  test("USD 1.23");
  test("USD  1.23");

  setlocale(LC_ALL, en_us);
  const lconv* lc = localeconv();
  // In C99, the sep_by_space values mean:
  // The values of p_sep_by_space, n_sep_by_space, int_p_sep_by_space,
  // and int_n_sep_by_space are interpreted according to the following:
  // 0 No space separates the currency symbol and value.
  // 1 If the currency symbol and sign string are adjacent, a space separates them from the
  // value; otherwise, a space separates the currency symbol from the value.
  // 2 If the currency symbol and sign string are adjacent, a space separates them;
  // otherwise, a space separates the sign string from the value.
  std::cout << "lc->currency_symbol == '" << lc->currency_symbol << "'\n";
  std::cout << "lc->p_sep_by_space == " << int(lc->p_sep_by_space) << "\n";
  std::cout << "lc->n_sep_by_space == " << int(lc->n_sep_by_space) << "\n";
  std::cout << "lc->int_curr_symbol == '" << lc->int_curr_symbol << "'\n";
  std::cout << "lc->int_p_sep_by_space == " << int(lc->int_p_sep_by_space) << "\n";
  std::cout << "lc->int_n_sep_by_space == " << int(lc->int_n_sep_by_space) << "\n";
}
$ g++ -g3 -Wall test.cc -o test && ./test
'1.23' reads as: '0'
And advances the iterator from 0x2566028 to 0x2566028.
And leaves the stream in state failbit.
'USD 1.23' reads as: '0'
And advances the iterator from 0x2566028 to 0x256602c.
And leaves the stream in state failbit.
'USD  1.23' reads as: '123'
And advances the iterator from 0x2566028 to 0x2566031.
And leaves the stream in state eofbit.
lc->currency_symbol == '$'
lc->p_sep_by_space == 0
lc->n_sep_by_space == 0
lc->int_curr_symbol == 'USD '
lc->int_p_sep_by_space == 1
lc->int_n_sep_by_space == 1

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: locales 2.13+git20110622-2
ProcVersionSignature: Ubuntu 3.0.0-16.28-generic 3.0.17
Uname: Linux 3.0.0-16-generic x86_64
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
Date: Sun Feb 19 23:30:24 2012
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111012)
PackageArchitecture: all
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: langpack-locales
UpgradeStatus: No upgrade log present (probably fresh install)

Jeffrey Yasskin (jyasskin) wrote :

Here's the actual test program. Pasting it into the bug report lost an important double space inside "USD  1.23".

Jeffrey Yasskin (jyasskin) wrote :

I now think that glibc is returning the right values for C11 (although possibly the wrong values for C99), and that gcc is interpreting them incorrectly: http://gcc.gnu.org/PR52486
