libplymouth2_0.8.2-2ubuntu6 and later give ragged splash and text rendering

Bug #685352 reported by Luke on 2010-12-05
172
This bug affects 34 people
Affects Status Importance Assigned to Milestone
Linaro GCC
Fix Released
High
Ulrich Weigand
Plymouth
Fix Released
Medium
gcc
Fix Released
Medium
gcc-4.5 (Ubuntu)
High
Canonical Foundations Team
Natty
High
Canonical Foundations Team
plymouth (Ubuntu)
Medium
Martin Pitt
Natty
Medium
Martin Pitt

Bug Description

Binary package hint: plymouth

With these 2ubuntu6 and 2ubuntu7, and ONLY these two versions so far, the boot splash theme looks ragged, with poorly-rendered graphics and unreadable text on messages and passphrase prompts. This problem appears with both Intel and ATI r-600 graphics, and appeared in the default ubuntu-logo theme, the ubuntustudio theme, and in my customized theme based on ubuntustudio. The problem is visible in an X11 rendering of the splash as well.

With ubuntu-logo, the defective rendering is visible on a close examination but this theme is not as sensitive to rendering problems as ubuntustudio and derivatives of it.. The ubuntustudio theme is severely affected, and text characters in any theme are so poorly rendered as to be unreadable.

The workaround for now is to drop in the library files that install directly into /lib from an older version of libplymouth, or roll everything back to the 2ubuntu5 version. Obviously, the problem is in the libplymouth files that do NOT install into the /lib/plymouth directory. This is as far as i have been able to trace the problem so far.

libplymouth2_0.8.2-2ubuntu7_i386.deb and libplymouth2_0.8.2-2ubuntu6_i386.deb

Related branches

Hernando Torque (htorque) wrote :

Confirming with Nvidia's BLOB + framebuffer and Nouveau. Screenshot attached.

Changed in plymouth (Ubuntu):
status: New → Confirmed
Quackers (quackers) wrote :

Me too. Even with 1920x1200 graphics configured for Plymouth it still looks a little "rough around the edges". Using nvidia-current.

dino99 (9d9) wrote :

same issue on natty i386 with genuine nvidia-current 260.19.21-0ubuntu1

while fsck is checking the drives, the 2 lines are pixelized black/white and mostly unreadable. I've seen this for the first time this week (> 12/1th) and wonder about plymouth-X11 which was installed on my system. Now i've removed it and will see the difference on next fsck run.

Plymouth-X11 contains the renderer to allow testing the splash screen from the desktop. To my knowledge nothing in it should be running during boot, so removing it should have no effect on plymouth actually running as a boot splash. Test this anyway, though-that's the only way to be really sure.
If removing plymouth-X11 changes anything, that would seem to me to be something running when it should not be and getting into the initramfs where it should not be. I run plymouth from the initramfs to prompt for encryption passphrases and see this same bad text at that point.

On my system, replacing the libraries installed directly into /lib and not into /lib/plymouth by libplymouth , using the older version mentioned before(2ubuntu5) fixes the problem.

> Date: Sun, 5 Dec 2010 11:10:30 +0000
> From: <email address hidden>
> To: <email address hidden>
> Subject: [Bug 685352] Re: plymouth2_0.8.2-2ubuntu6 and later give ragged splash and text rendering
>
> same issue on natty i386 with genuine nvidia-current 260.19.21-0ubuntu1
>
> while fsck is checking the drives, the 2 lines are pixelized black/white
> and mostly unreadable. I've seen this for the first time this week (>
> 12/1th) and wonder about plymouth-X11 which was installed on my system.
> Now i've removed it and will see the difference on next fsck run.
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/685352
>
> Title:
> plymouth2_0.8.2-2ubuntu6 and later give ragged splash and text rendering

@Luke
you're right about plymouth-X11, its now removed on my system and this issue still exist. By the way its not a show stopper.

Not a showstopper, you can still boot fine even with a passphrase, but could give us all a bad name it it gets to the release. Also, could be quite a headache for someone unlocking multiple encrypted disks with separate passphrases, as that would need readable text.
> Date: Tue, 7 Dec 2010 09:35:26 +0000
> From: <email address hidden>
> To: <email address hidden>
> Subject: [Bug 685352] Re: plymouth2_0.8.2-2ubuntu6 and later give ragged splash and text rendering
>
> @Luke
> you're right about plymouth-X11, its now removed on my system and this issue still exist. By the way its not a show stopper.
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/685352
>
> Title:
> plymouth2_0.8.2-2ubuntu6 and later give ragged splash and text rendering

Still present with 0.8.2-2ubuntu8 with intel i915.

Changed in plymouth (Ubuntu):
importance: Undecided → High
Jamie Strandboge (jdstrand) wrote :

Triaging to 'High' as it would be very bad if we released with this bug.

Benjamin Drung (bdrung) wrote :

It's a butterfly in your ice cream (even in your virtual machine).

cryptsetup text is unreadable

Martin Pitt (pitti) wrote :

Setting to triaged, as this is easily reproducible and not hardware dependent. Confirming "high" priority as it makes fsck and cryptsetup messages absolutely unreadable.

My first hunch was that this was related to natty's PNG optimization, but it's not. Just downgrading libplymouth2 (which doesn't have any graphics) to the maverick version makes it work, even with the optimized PNGs in plymouth-ubuntu-theme-logo; conversely, building plymouth without pkgbinarymangler doesn't fix it.

For the record, this can be used to test it with plymouth-x11, which is easier than rebooting or starting plymouth on a VT:

--------------------- 8< -----------------
[ `id -u` = 0 ] || {
    echo "Please run this as root" >&2
    exit 1
}

(plymouthd --no-daemon --debug &)
sleep 1;
plymouth --show-splash
sleep ${1:-3}
plymouth --quit
--------------------- 8< ------------------

Changed in plymouth (Ubuntu Natty):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
milestone: none → ubuntu-11.04-beta
status: Confirmed → Triaged
summary: - plymouth2_0.8.2-2ubuntu6 and later give ragged splash and text rendering
+ libplymouth2_0.8.2-2ubuntu6 and later give ragged splash and text
+ rendering
Martin Pitt (pitti) wrote :

Merely rebuilding maverick's ubuntu5.1 on natty causes the bug (even without pkgbinarymangler), so it looks like a toolchain issue.

Martin Pitt (pitti) wrote :

Confirmed workaround: Build with CC=gcc-4.4 (I didn't upload that, though).

I also tested with --no-as-needed, this doesn't help.

So as far as I can see this is either a regression in gcc-4.5, or a bug in libplymouth which didn't manifest with gcc < 4.5?

Martin-Éric Racine (q-funk) wrote :

Not directly related to this bug, but the same jagged graphics issue seems to affect GRUB2 since a few days, whenever holding the Shift key to make the GRUB menu appear.

Steve Langasek (vorlon) wrote :

This problem is reported with builds of plymouth 0.8.2-2ubuntu6 and later, which was built using gcc-4.5 4.5.1-10ubuntu3.

Changed in gcc-4.5 (Ubuntu Natty):
importance: Undecided → High
Matthias Klose (doko) wrote :

does the behaviour change with plymouth built with -O1, -O0? when built with gcc-snapshot? If yes, can the object file be identified which may cause the issue?

Martin Pitt (pitti) wrote :

Specifically, the problem is in libply-splash-core.so. Building that one with gcc-4.4 works, while using the other three shared libs from the current natty libplymouth2 package. In particular, I tested this by building locally (but not installing), and running

  sudo LD_LIBRARY_PATH=`pwd`/src/libply-splash-core/.libs ~/test-plymouth.sh

= Compiler juggling =

gcc-4.4, -O2: works

gcc-4.5, -O0: works
gcc-4.5, -O1: works
gcc-4.5, -O2: fails
gcc-4.5, -O3: fails
gcc-4.5, -02, purging hardening-wrapper: fails
/usr/lib/gcc-snapshot/bin/gcc, -O2: fails
/usr/lib/gcc-snapshot/bin/gcc, -O1: works

Martin Pitt (pitti) wrote :

If I just build ply-pixel-buffer.c with -O1, and everything else with -O2, it works. Cross-checking, building everything with -O1 and just build ply-pixel-buffer.c with -O2, it fails, so the problem happens with this file.

Martin Pitt (pitti) wrote :

Keeping notes here, mostly for myself, but also in case someone else is looking at this.

I bisected it down to ply_pixel_buffer_fill_with_argb32_data_at_opacity_with_clip(), using gcc 4.4's shiny new #pragma GCC optimize ("1") feature. That doesn't work within a function, so I now had to resort to adding some debug statements and diff'ing logs.

It seems fine up until this point:

  x += cropped_area.x - fill_area->x;
  y += cropped_area.y - fill_area->y;
  opacity_as_byte = (uint8_t) (opacity * 255.0);

The call to make_pixel_value_translucent() is not to blame either.

Problem happens in ply_pixel_buffer_blend_value_at_pixel() in the if statement: apparently blend_two_pixel_values() delivers a different result under -O2 and -O1:

-ply_pixel_buffer_blend_value_at_pixel old: 4281008158 new: 4281073951
+ply_pixel_buffer_blend_value_at_pixel old: 4281008158 new: 4281008414

Indeed, making blend_two_pixel_values() non-inline and just building this function with -O1 makes it work.

Within blend_two_pixel_values(), I only get hits for the "then" branch. The decomposition of the pixel_value_{1,2} input args into rgba are identical in both cases, but the output value is not in many cases:

$ diff -u /tmp/good /tmp/bad | wdiff -d
# some examples only:
THEN rgba1(0 0 0 1) rgb2(43 0 30) rgb(43 0 30)
THEN rgba1(1 1 1 2) rgb2(43 0 30) [-rgb(44-] {+rgb(43+} 1 [-31)-] {+30)+}
THEN rgba1(3 3 3 4) rgb2(43 0 30) [-rgb(45 3 33)-] {+rgb(42 1 30)+}

With that we can hopefully construct a minimal test case. To be continued tomorrow...

Martin Pitt (pitti) wrote :

This is a broken-out test case which is a lot easier to debug than having to start plymouth-x11 and diffing log files.

It extracts the pixel blending function and prints out some additional intermediate steps and the result. They differ depending on whether or not optimizing:

$ gcc -o /tmp/out test-685352.c && /tmp/out
input: rgba1(40 40 40 0) rgb2(40 40 40)
  calculation to int: rgb(7f80 7f80 7f80)
  reduction to byte: rgb(80 80 80)
blend(alpha=00) = FF808080

input: rgba1(40 40 40 80) rgb2(40 40 40)
  calculation to int: rgb(5f80 5f80 5f80)
  reduction to byte: rgb(60 60 60)
blend(alpha=80) =: FF606060

input: rgba1(40 40 40 ff) rgb2(40 40 40)
  calculation to int: rgb(3fc0 3fc0 3fc0)
  reduction to byte: rgb(40 40 40)
blend(alpha=FF) = FF404040

This is the correct one. With an alpha of 255, the second pixel value sholdn't be taken into account at all, and the result of the last call should be identical to the first pixel. With an alpha of 0, the result should be a pure addition (first case). With an alpha of 128 (50%), it should be half of the value of the first pixel, plus the second.

But with -O2 it's all messed up:

$ gcc -o /tmp/out -O2 test-685352.c && /tmp/out

input: rgba1(40 40 40 0) rgb2(40 40 40)
  calculation to int: rgb(3f80 3f80 3f80)
  reduction to byte: rgb(40 40 40)
blend(alpha=00) = FF404040

input: rgba1(40 40 40 80) rgb2(40 40 40)
  calculation to int: rgb(1f80 1f80 1f80)
  reduction to byte: rgb(20 20 20)
blend(alpha=80) =: FF202020

input: rgba1(40 40 40 ff) rgb2(40 40 40)
  calculation to int: rgb(ffc0 ffc0 ffc0)
  reduction to byte: rgb(1 1 1)
blend(alpha=FF) = FF010101

So we can state that there is definitively a difference of the arithmethics here when building with -O2 vs. -O1. It might be because the code is doing some undefined or underdefined operations?

The reduction of bytes looks ok, so it looks like the problem is in here:

      red = red_1 * 255 + red_2 * (255 - alpha_1);
      green = green_1 * 255 + green_2 * (255 - alpha_1);
      blue = blue_1 * 255 + blue_2 * (255 - alpha_1);

results are uint_least16_t, inputs are all uint8_t.

Martin Pitt (pitti) wrote :

I notice one strange thing in that code: the result will only fit in 16 bits if red1 and red2*alpha_1 together aren't bigger than 0x80.

In fact, if I bump the data type of red, green, blue to uint32_t, it works fine. I'm just not sure why it would work with least16_t without optimization, as the intermediate values in the test cases are all fitting into 16 bits, and gcc actually uses a 16 bit value in both optimization cases (verified with sizeof).

If I replace uint_least16_t with uint16_t, I still see the same bug.

To be honest I don't fully understand what the reduction to byte does:

      red = (uint8_t) ((red + (red >> 8) + 0x80) >> 8);

but it looks to me as it would merely do rounding up. The initial calculation is done in the high byte of red, with a tad of underestimating (255, not 256), which explains why rounding up is necessary.

For the "good" case, and alpha=0, we want to get 0x40 * 255 + 0x40 * 255 = 32640 = 0x7F80, and that's indeed what the intermediate value is with -O0 (see above). In the faulty case, it is computed as 3f80 instead.

Martin Pitt (pitti) wrote :

This is a test case which is as small as I could get it. It would be nice if we could get a review of that from the gcc guys to see whether it's a bug or undefined behaviour?

$ gcc -o /tmp/out test-685352-minimal.c && /tmp/out
7f80

$ gcc -o /tmp/out -O2 test-685352-minimal.c && /tmp/out
3f80

Changed in gcc-4.5 (Ubuntu Natty):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Martin Pitt (pitti) wrote :

Now that we have a separate test case, I'd like to work around this issue in plymouth.

Changed in plymouth (Ubuntu Natty):
assignee: Canonical Foundations Team (canonical-foundations) → Martin Pitt (pitti)
importance: High → Medium
milestone: ubuntu-11.04-beta → natty-alpha-2
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package plymouth - 0.8.2-2ubuntu9

---------------
plymouth (0.8.2-2ubuntu9) natty; urgency=low

  [ Steve Langasek ]
  * debian/control: set Vcs-Bzr to forestall future mis-uploads

  [ Martin Pitt ]
  * Add gcc45-arith-workaround.patch: Work around a gcc-4.5 behaviour
    change/bug(?). Also, this change is actually correct, as the arithmethics
    can easily overflow a 16 bit value. (LP: #685352)
 -- Martin Pitt <email address hidden> Fri, 14 Jan 2011 12:09:05 -0600

Changed in plymouth (Ubuntu Natty):
status: Triaged → Fix Released
Martin Pitt (pitti) wrote :

Forwarded the patch to the plymouth folks, as it is actually a bug fix there.

Ulrich Weigand (uweigand) wrote :

doko asked me to look into this. This is definitely a compiler bug. The minimal test case I get is:

unsigned short test (unsigned char val) __attribute__ ((noinline));

unsigned short
test (unsigned char val)
{
  return val * 255;
}

int
main(int argc, char**argv)
{
  printf ("test(val=40) = %x\n", test(0x40));
  return 0;
}

When built without optimization, we get (correctly):
test(val=40) = 3fc0

When built with optimization, we get instead:
test(val=40) = ffc0

There is nothing undefined in the source code (unsigned arithmetic has defined overflow behaviour in C, and in any case, there actually isn't any overflow in this particular example).

The bug is that GCC realizes that for 8-bit integers, multiplying by 255 is actually the same as just taking the negative value (which is true as far as it goes, if you only use 8 bits of the results) -- and then erroneously does the same optimization when we actually need 16 bits of the result.

This bug seems to have been introduced sometime in the 4.5 cycle, and is actually still present on current mainline ...

Building the following test case with current mainline on i386:

unsigned short test (unsigned char val) __attribute__ ((noinline));

unsigned short
test (unsigned char val)
{
  return val * 255;
}

int
main(int argc, char**argv)
{
  printf ("test(val=40) = %x\n", test(0x40));
  return 0;
}

We get the following (correct) output with -O0:
test(val=40) = 3fc0

and the following incorrect output with -O2:
test(val=40) = ffc0

The problem appears to be related to this piece of code in expand_expr_real2, case WIDEN_MULT_EXPR:

                  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
                                   EXPAND_NORMAL);
                  temp = expand_widening_mult (mode, op0, op1, target,
                                               unsignedp, this_optab);

expand_operands will expand the constant 255 into QImode and return a (const_int -1) for op1. Passing this constant into expand_widening_mult then apparently generates a simple negation operation in HImode instead (via expand_const_mult) ...

It seems this code came in here:
http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01327.html
Any suggestions how this ought to be handled?

Ulrich Weigand (uweigand) wrote :

Opened upstream gcc bug report PR rtl-optimzation/47299

On Fri, 14 Jan 2011, uweigand at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47299
>
> Summary: Widening multiply optimization generates bad code
> Product: gcc
> Version: 4.5.0
> Status: UNCONFIRMED
> Keywords: wrong-code
> Severity: normal
> Priority: P3
> Component: rtl-optimization
> AssignedTo: <email address hidden>
> ReportedBy: <email address hidden>
> CC: <email address hidden>, <email address hidden>
>
>
> Building the following test case with current mainline on i386:
>
> unsigned short test (unsigned char val) __attribute__ ((noinline));
>
> unsigned short
> test (unsigned char val)
> {
> return val * 255;
> }
>
> int
> main(int argc, char**argv)
> {
> printf ("test(val=40) = %x\n", test(0x40));
> return 0;
> }
>
> We get the following (correct) output with -O0:
> test(val=40) = 3fc0
>
> and the following incorrect output with -O2:
> test(val=40) = ffc0
>
> The problem appears to be related to this piece of code in expand_expr_real2,
> case WIDEN_MULT_EXPR:
>
> expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
> EXPAND_NORMAL);
> temp = expand_widening_mult (mode, op0, op1, target,
> unsignedp, this_optab);
>
> expand_operands will expand the constant 255 into QImode and return a
> (const_int -1) for op1. Passing this constant into expand_widening_mult then
> apparently generates a simple negation operation in HImode instead (via
> expand_const_mult) ...
>
> It seems this code came in here:
> http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01327.html
> Any suggestions how this ought to be handled?

I think we need to pass the narrow mode explicitly.

Richard.

Phillip Susi (psusi) wrote :

It looks to me like what it is doing is promoting the 255 to a signed short, which sign extends to 0xFFFF, then multiplying that by 0x0040 to get the 0xffc0 result. I think that this is allowed by the standard since the signedness of the literal 255 is undefined. Try adding adding a 'U' to the 255 to force it to unsigned.

Phillip Susi [2011-01-15 5:40 -0000]:
> It looks to me like what it is doing is promoting the 255 to a signed
> short, which sign extends to 0xFFFF

That sounds wrong, though. 0xFFFF is not 255 in any kind of data type.

Phillip Susi (psusi) wrote :

255 is 0xFF, which when you sign extend to 16 bits is 0xFFFF.

Martin Pitt (pitti) wrote :

Phillip Susi [2011-01-15 19:35 -0000]:
> 255 is 0xFF, which when you sign extend to 16 bits is 0xFFFF.

Yes, but gcc claiming that FFFF would be a valid representation of 255
sounds wrong to me.
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Colin Watson (cjwatson) wrote :

The integer constant 255 has type int, not signed char or anything like that. See C99 section 6.4.4.1. Treating it as a signed char and then sign-extending it certainly does not seem to be valid.

Phillip Susi (psusi) wrote :

Oh wow, this is the disassembly I get:

Dump of assembler code for function test:
=> 0x0000000000400530 <+0>: movzbl %dil,%eax
   0x0000000000400534 <+4>: neg %eax
   0x0000000000400536 <+6>: retq

So it looks like it is doing all 8 bit math and since 255 in 8 bits is -1, optimizes it to a negation. Definitely looks like an optimizer bug. It should be promoting them to at least unsigned shorts. Even (unsigned short)x * (unsigned short)255UL does not change it.

Phillip Susi (psusi) wrote :

Changing the argument from an unsigned char to an unsigned short works around the bug.

4.5/4.6 regression

It is a 4.6 regression. 4.5 branch is OK.

Created attachment 22994
gcc46-pr47299.patch

Untested fix.

Matthias Klose (doko) wrote :

seen on 4.5 linaro, not 4.5 fsf

Matthias Klose (doko) wrote :

the backport of the proposed upstream patches fixes the test case on 4.5 linaro.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gcc-4.5 - 4.5.2-1ubuntu6

---------------
gcc-4.5 (4.5.2-1ubuntu6) natty; urgency=low

  * Update to SVN 20110117 (r168896) from the gcc-4_5-branch.
    - Fix PR target/43309, PR fortran/46874, PR tree-optimization/47286,
      PR tree-optimization/44592, PR target/47201, PR c/47150, PR target/46880,
      PR middle-end/45852, PR tree-optimization/43655, PR debug/46893,
      PR rtl-optimization/46804, PR rtl-optimization/46865, PR target/41082,
      PR tree-optimization/46864, PR fortran/45777.
  * Apply proposed patch for PR rtl-optimization/47299. LP: #685352.
 -- Matthias Klose <email address hidden> Mon, 17 Jan 2011 18:03:28 +0100

Changed in gcc-4.5 (Ubuntu Natty):
status: New → Fix Released
Changed in gcc-linaro:
status: New → Confirmed
importance: Undecided → High

a biarch build for yesterday's trunk on i686 with the proposed patch applied succeeds and doesn't show any regressions.

Author: jakub
Date: Tue Jan 18 07:45:12 2011
New Revision: 168944

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168944
Log:
 PR rtl-optimization/47299
 * expr.c (expand_expr_real_2) <case WIDEN_MULT_EXPR>: Don't use
 subtarget. Use normal multiplication if both operands are
 constants.
 * expmed.c (expand_widening_mult): Don't try to optimize constant
 multiplication if op0 has VOIDmode. Convert op1 constant to mode
 before using it.

 * gcc.c-torture/execute/pr47299.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr47299.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expmed.c
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog

Fixed.

Ulrich Weigand (uweigand) wrote :

Upstream GCC patch committed here:
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168944

I'll backport to Linaro GCC 4.5.

Changed in gcc-linaro:
status: Confirmed → In Progress
assignee: nobody → Ulrich Weigand (uweigand)
Changed in plymouth:
status: Unknown → Fix Released
Changed in gcc:
importance: Unknown → Medium
status: Unknown → Fix Released
Changed in plymouth:
importance: Unknown → Medium
Changed in gcc-linaro:
milestone: none → 4.5-2011.02-0
status: In Progress → Fix Committed
Michael Hope (michaelh1) on 2011-02-08
Changed in gcc-linaro:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.