a java program that causes X windows to hang

Bug #513482 reported by Mike Fairbank
This bug affects 1 person
Affects: X.Org X server | Status: Won't Fix | Importance: High
Affects: sun-java6 (Ubuntu) | Status: Confirmed | Importance: Undecided | Assigned to: Unassigned

Bug Description

I have a simple java program that causes X windows / Gnome to hang with my hardware if you perform a few simple steps.

I'm using 64 bit ubuntu, Linux Kernel 2.6.31-17-generic, with java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode). I've had the bug happening on older versions of java too (back to 1.6.0_12).

If you want me to submit this bug report to java then let me know, but this bug doesn't happen on other operating systems with java, so I think it's possibly a linux/gnome bug.

Here are the steps to reproduce the problem. You need a recent version of java 1.6 installed.
1. unzip the attached folder and cd into it.
2. run the java debugger by typing "jdb"
3. in the java debugger, type "stop at src.MainClass:45"
4. in the java debugger, type "run src.MainClass"

A window should open that displays a drop down item. Change the drop-down item from A to B. This causes X windows/Gnome to hang. The mouse pointer still moves but you can't click on anything (i.e. nothing responds to mouse-clicks).

The java code attached is the minimal code I could create that causes the problem. It took hours of crashing my OS and rebooting to get it that far, but it probably can go smaller.

ProblemType: Bug
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: NVidia [HDA NVidia], device 0: ALC660-VD Analog [ALC660-VD Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: NVidia [HDA NVidia], device 0: ALC660-VD Analog [ALC660-VD Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mike 2288 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'NVidia'/'HDA NVidia at 0xfe028000 irq 22'
   Mixer name : 'Realtek ALC660-VD'
   Components : 'HDA:10ec0660,1019c601,00100001'
   Controls : 21
   Simple ctrls : 13
CurrentDmesg:
 [ 30.192529] eth0: no IPv6 routers present
 [ 32.227378] NET: Registered protocol family 8
 [ 32.227382] NET: Registered protocol family 20
 [ 173.920021] Clocksource tsc unstable (delta = -180718118 ns)
Date: Wed Jan 27 21:15:41 2010
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=cbbd2f63-d2e7-4544-8995-82c9d961cacc
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 003: ID 03ee:5617 Mitsumi
 Bus 002 Device 002: ID 05bc:0102 3G Green Green Globe Co., Ltd
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
NonfreeKernelModules: nvidia
Package: linux-image-2.6.31-17-generic 2.6.31-17.54
ProcCmdLine: root=UUID=af17a3ba-39c0-4374-8f2e-5713f1f31689 ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-17.54-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-17-generic N/A
 linux-firmware 1.25
RfKill:

SourcePackage: linux
Uname: Linux 2.6.31-17-generic x86_64
WpaSupplicantLog:

XsessionErrors:
 (gnome-settings-daemon:2312): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (polkit-gnome-authentication-agent-1:2400): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed
 (nautilus:2366): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
 (firefox:2802): GLib-WARNING **: g_set_prgname() called multiple times
 (nautilus:3057): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
dmi.bios.date: 05/31/2007
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: NF-MCP61
dmi.chassis.type: 3
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd05/31/2007:svn:pn:pvr:rvn:rnNF-MCP61:rvr:cvn:ct3:cvr:

Revision history for this message
Mike Fairbank (michael-fairbank) wrote :
Dave Walker (dogatemycomputer) wrote :

I'm not sure if this is a Java issue or a X issue.

affects: linux (Ubuntu) → sun-java6 (Ubuntu)
Changed in sun-java6 (Ubuntu):
status: New → Confirmed
Mike Fairbank (michael-fairbank) wrote :

Regarding whether this is a java issue or an X issue: I've tested this on an old Windows XP machine with java 1.6 and the bug does not happen there.

Mike Fairbank (michael-fairbank) wrote :

Attached is a smaller version of the java code that causes the crash. Hopefully that will make it easier to analyse. I don't think there's much point in going any smaller, since the cause of the bug is probably in X11/Gnome/Java and the size of the code is insignificant to those packages, but I can shrink it further if someone asks me.

The instructions to trigger the bug are the same as before, i.e. put the breakpoint on line src.MainClass:45 and run it in "jdb".

After the crash, alt+ctrl+f1 works, if that helps with system recovery.

Mike Fairbank (michael-fairbank) wrote :

Please can the importance of this bug be raised to "severe", because that is what a reproducible total X Windows crash is?

Please can it then be pushed upstream to x-windows? Whether it is java that is causing X-windows to crash or not, X-windows should handle the error more gracefully than this, so there is at least an enhancement that could be done to X-windows here, and possibly to java too.

My system occasionally crashes with symptoms like this (definitely not while using the java debugger, but I might have had some java applications running in the background, I'm not sure), so I hope fixing this bug will make my linux experience generally more robust.

Mike Fairbank (michael-fairbank) wrote :

This bug is reproducible on a Linux distro called crux (http://www.crux.nu) with a lightweight window manager named IceWM.

So this bug is nothing to do with Ubuntu or Gnome.

Please can it be pushed upstream to X11 and/or Java.

In , Mike Fairbank (michael-fairbank) wrote :

Created an attachment (id=34303)
the java program that causes the error

When this java program is run, it causes X Windows to hang (i.e. No windows respond to mouse clicks at all, even though the mouse pointer still can move.) Even though the bug may be caused by Java, I still think there is a bug with X windows, because it should handle the problem more gracefully.

To reproduce the error take the following steps (this works every time for me):

1. unzip the attached folder and cd into it.
2. run the java debugger by typing "jdb"
3. in the java debugger, type "stop at src.MainClass:45"
4. in the java debugger, type "run src.MainClass"

A window should open that displays a drop down item. Change the drop-
down item from A to B. This causes X windows/Gnome to hang. The mouse
pointer still moves but you can't click on anything (i.e. nothing
responds to mouse-clicks).

The java code attached is the minimal code I could create that causes
the problem. It took hours of crashing my OS and rebooting to get it
that far, but it probably can go smaller.

Sorry I haven't managed to test this bug on the latest version of X, but I've had several people verify this bug on different linux OS combinations, so I hope you can reproduce it too.

Thanks, I hope this helps!

End of bug report

Further Details of my OS + Java:

X.Org X Server 1.6.4
Release Date: 2009-9-27
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.24-23-server x86_64 Ubuntu
Current Operating System: Linux mike-desktop 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 04:38:19 UTC 2010 x86_64

java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode).

In , Peter Hutterer (peter-hutterer) wrote :

Could this be a dupe of bug 25400?

In , Mike Fairbank (michael-fairbank) wrote :

I couldn't get bug 25400 to run, but superficially it sounds different because that bug states "Clicking anywhere else on the button traps the mouse cursor within the bounds of the button, with no means of escape other than switching to a console and killing the application.", whereas with this bug (27232), when X windows hangs you can freely move the mouse pointer over the whole screen, it's just that no windows or buttons respond to mouse clicks.

By the way:
1. The computer is still running fine despite not being able to mouse-click on anything, for example my CPU graph animation continues running in the background.
2. You can escape from this hang with alt+f7 (in Ubuntu, gnome anyway).

In , Peter Hutterer (peter-hutterer) wrote :

(In reply to comment #2)
> I couldn't get bug 25400 to run, but superficially it sounds different because
> that bug states "Clicking anywhere else on the button traps the mouse cursor
> within the bounds of the button, with no means of escape other than switching
> to a console and killing the application.", whereas with this bug (27232), when
> X windows hangs you can freely move the mouse pointer over the whole screen,
> it's just that no windows or buttons respond to mouse clicks.

whether the pointer is trapped is just one parameter in the grab request (ConfineTo) and no-one but motif still uses it. So it could still be the same bug.

> By the way:
> 1. The computer is still running fine despite not being able to mouse-click on
> anything, for example my CPU graph animation continues running in the
> background.
> 2. You can escape from this hang with alt+f7 (in Ubuntu, gnome anyway).

I couldn't reproduce it with the test program though (that's on 1.8 with the patches from 25400 applied). Do you have the chance of trying the patches?

In , Mike Fairbank (michael-fairbank) wrote :

(In reply to comment #3)

> I couldn't reproduce it with the test program though (that's on 1.8 with the
> patches from 25400 applied).

That's good news, just to double check: you did try to reproduce the error using the java debugger (jdb) with the breakpoint in the correct place, as described in the original post?

> Do you have the chance of trying the patches?

I'd like to, but I'm a novice/intermediate on linux in general, how easy is it? Last time I had problems with X starting, I had to reinstall my whole OS distribution, so I'm a bit wary. But if there's a simple guide you can link me to (for both the xorg upgrade, that will work under ubuntu; and how to install a patch), then I'll give it a go.

Mike Fairbank (michael-fairbank) wrote :

I've raised this upstream at xorg ( https://bugs.freedesktop.org/show_bug.cgi?id=27232 ).

So this bug could be closed now. Thank you.

Matthias Klose (doko)
affects: sun-java6 (Ubuntu) → xorg (Ubuntu)
Bryce Harrington (bryce)
tags: added: karmic
Bryce Harrington (bryce)
affects: xorg (Ubuntu) → nvidia-graphics-drivers-180 (Ubuntu)
In , Peter Hutterer (peter-hutterer) wrote :

(In reply to comment #4)
> (In reply to comment #3)
>
> > I couldn't reproduce it with the test program though (that's on 1.8 with the
> > patches from 25400 applied).
>
> That's good news, just to double check: you did try to reproduce the error
> using the java debugger (jdb) with the breakpoint in the correct place, as
> described in the original post?

argh, no, I overlooked this every single time I read the report.

but it makes a lot more sense now :)
I'm pretty sure what you're running into here is a common debugging issue under X. When a popup menu is created, the client (java in your case) requests a grab (either passive or active, what matters is that it _activates_ when the popup is displayed). The reason is simple: if you click outside of the popup, the popup still gets the event and so knows that it must undisplay itself.

if you set a breakpoint between requesting that grab and the grab being released, then nothing will work until you continue and release the grab. this also explains why I didn't see it without the jdb.

in the extreme case, if the app also has a keyboard grab, you won't be able to use the keyboard either since both are grabbed by the now halted client.

IIRC GTK works around this with magic flags for debugging that skip the grabs.

does this explanation make sense? do you see it if the breakpoint is anywhere before or after the popup has been displayed and undisplayed again?
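
The grab behaviour described above can be illustrated with a toy model. This is only a sketch: GrabModel and its methods are hypothetical names invented for illustration, not real Xlib, X server, or AWT code, and the model assumes that a halted client simply never reacts to the events it receives.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an X-style pointer grab; hypothetical, not real X or AWT code. */
public class GrabModel {
    private String grabOwner;                               // client holding the grab, or null
    private final List<String> halted = new ArrayList<>();  // clients stopped at a breakpoint

    public void grabPointer(String client) {
        if (grabOwner == null) grabOwner = client;          // only one client can hold the grab
    }

    public void ungrabPointer(String client) {
        if (client.equals(grabOwner)) grabOwner = null;
    }

    public void halt(String client) { halted.add(client); }

    public void resume(String client) { halted.remove(client); }

    /** Returns the client that reacts to a click aimed at target, or null if nobody does. */
    public String click(String target) {
        // While a grab is active, every click is routed to the grab owner.
        String receiver = (grabOwner != null) ? grabOwner : target;
        return halted.contains(receiver) ? null : receiver;
    }

    public static void main(String[] args) {
        GrabModel server = new GrabModel();
        server.grabPointer("java-app");                // popup opens, grab activates
        server.halt("java-app");                       // breakpoint hit before the ungrab
        System.out.println(server.click("terminal"));  // prints null: nothing responds
        server.resume("java-app");                     // "cont" in the debugger
        server.ungrabPointer("java-app");              // popup closes, grab released
        System.out.println(server.click("terminal"));  // prints terminal
    }
}
```

In this model the pointer can still "move" (nothing prevents calling click with any target), but every click ends up with the halted grab owner, which matches the symptom reported here.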

In , Mike Fairbank (michael-fairbank) wrote :

>argh, no, I overlooked this every single time I read the report.

Can you confirm whether you managed to reproduce the bug then on the latest version of xorg?

>does this explanation make sense?

Sounds like a nice explanation, but I know nothing of xorg or its internal workings, so you could tell me anything and I'd agree. Thanks for looking into it and thinking about it - I hope this leads to a fix.

> do you see it if the breakpoint is anywhere before or after the popup has been displayed and undisplayed again?

I'll get back to you on that.

Thanks again!

In , Mike Fairbank (michael-fairbank) wrote :

Peter, here are some answers to your questions and a further question:

> do you see it if the breakpoint is anywhere before or after the popup has been displayed and undisplayed again?

The breakpoint must be in the "actionPerformed" event handler for that dropdown. It seems it can also be on any of the lines of the Tabfolder class:

public void actionPerformed(ActionEvent event) {
    final Object object = event.getSource();
    if (object == comboBox_descentMode) {
        final Modes descentMode = (Modes) comboBox_descentMode.getSelectedItem();
        mainClass.update(descentMode);
    }
}

> I'm pretty sure what you're running into here is a common debugging issue under X.
>...
> IIRC GTK works around this with magic flags for debugging that skip the grabs.

Does that mean a fix has already been created (is there a bug duplicate)? Do you know which version this will be fixed in?

Thanks!

Mike.

In , Peter Hutterer (peter-hutterer) wrote :

(In reply to comment #7)
> Peter, here are some answers to your questions and a further question:
>
> > do you see it if the breakpoint is anywhere before or after the popup has been displayed and undisplayed again?
>
> The breakpoint must be in the "actionperformed" event for that dropdown. It
> seems it can also be on any of the lines of the Tabfolder class:
>
> public void actionPerformed(ActionEvent event) {
> final Object object = event.getSource();
> if (object == comboBox_descentMode) {
> final Modes descentMode = (Modes)
> comboBox_descentMode.getSelectedItem();
> mainClass.update(descentMode);
> }
> }

urgh. I have no idea what the JVM does in that part though, but a protocol snoop should show us. if you install xscope, you get a localhost:4 to DISPLAY=:0 forwarding with the following command:
  xscope -i4 -o0

if you then start your test app with DISPLAY=localhost:4 ./mycommand all the protocol data should be routed through xscope. set the breakpoint and check the last requests that went to the server. If any of them is a GrabPointer or GrabKeyboard request without a paired UngrabPointer/Keyboard request, that's the issue.

> > I'm pretty sure what you're running into here is a common debugging issue under X.
> >...
> > IIRC GTK works around this with magic flags for debugging that skip the grabs.
>
> Does that mean a fix has already been created (is there a bug duplicate)? Do
> you know which version this will be fixed in?

this is a gtk-internal solution. IIRC (and that's a while ago) if the debug options are set, gtk simply doesn't request the grab.

In , Mike Fairbank (michael-fairbank) wrote :

Hi Peter,
Sorry about the delayed reply.

>urgh. I have no idea what the JVM does in that part though, but a protocol
>snoop should show us. if you install xscope, you get a localhost:4 to
>DISPLAY=:0 forwarding with the following command:
> xscope -i4 -o0

I can't get this to work easily. I'm not an experienced xorg person.

Here is my attempt to do this:

mike@mike-desktop:~$ sudo apt-get install xserver-xorg-core
Reading package lists... Done
Building dependency tree
Reading state information... Done
xserver-xorg-core is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 14 not upgraded.
mike@mike-desktop:~$ xscope -i4 -o0
No command 'xscope' found, did you mean:
 Command 'cscope' from package 'cscope' (universe)
 Command 'xoscope' from package 'xoscope' (universe)
xscope: command not found
mike@mike-desktop:~$

Hmmm. Even if I got this to work, I'd also need help on your next instruction:

>... and check the last requests that went to the server.
>If any of them is a GrabPointer or GrabKeyboard request without
> a paired UngrabPointer/Keyboard request, that's the issue.

So can anyone fill me in with fuller instructions for this, or even better, do this for me? I hope the attachment and instructions in the original bug report make the error easily reproducible. It sounds to me like Peter's diagnosis is very much on the right track...

Thanks.

Mike.

In , Bugzi09-fdo-tormod (bugzi09-fdo-tormod) wrote :

Mike, xscope is not packaged for Debian/Ubuntu. There used to be "xmon" but it got dropped with reference to wireshark as an alternative. OTOH, xscope is easy to build: Get the source tarball from http://cgit.freedesktop.org/xorg/app/xscope/ unpack it and run ./autogen.sh && make

I would recommend using the 1.2 tarball, since the latest git commit broke building on Ubuntu (if you are familiar with git you can instead revert that commit of course). If your build fails, it is probably due to missing dependencies. Pulling in build dependencies for some other Xorg drivers should probably take care of it, i.e. sudo apt-get build-dep xserver-xorg-input-evdev

In , Mike Fairbank (michael-fairbank) wrote :

Created an attachment (id=35928)
Output from xscope during hang

Thanks Tormod, your instructions worked.

Attached is the output from xscope.

Peter does this confirm your diagnosis?

Thanks.

Mike.

PS: Note to myself: as a Gnome user, to recover from the X-windows hang I used "ctrl+alt+f1" followed by login and "sudo /etc/init.d/gdm restart"

In , Peter Hutterer (peter-hutterer) wrote :

run an egrep "Ungrab|Grab" -A 1 query against the file. You'll see output like this (I cut out the false-positive lines and the bit at the start that refers to other things):

............REQUEST: GrabPointer
        owner-events: True
--
  ..............REPLY: GrabPointer
               status: Success
--
 ............REQUEST: GrabKeyboard
        owner-events: True
--
  ..............REPLY: GrabKeyboard
               status: Success

This simply means that the client requests a pointer grab and succeeds, then
requests a keyboard grab and succeeds. There is no Ungrab following this so by
the time the log ends the client still holds the grab. And while a client
holds a pointer + keyboard grab, you cannot interact with other clients.

So yes, this confirms the hypothesis in comment #5, and there's no easy way to work around this. Sorry.
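
The pairing check described above can also be scripted. The following is a minimal sketch, assuming the xscope log marks requests with "REQUEST: " followed by the request name, as in the excerpt quoted above. GrabScan and outstandingGrabs are hypothetical names, and the sketch treats every grab request as successful rather than reading the matching reply status.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: lists the devices still grabbed at the end of an xscope log. */
public class GrabScan {
    private static final String MARKER = "REQUEST: ";

    /** Returns "Pointer" and/or "Keyboard" for each grab that has no later ungrab. */
    public static Set<String> outstandingGrabs(List<String> lines) {
        Set<String> held = new LinkedHashSet<>();
        for (String line : lines) {
            int at = line.indexOf(MARKER);
            if (at < 0) continue;                      // replies and detail lines are skipped
            String request = line.substring(at + MARKER.length()).trim();
            if (request.equals("GrabPointer") || request.equals("GrabKeyboard")) {
                held.add(request.substring("Grab".length()));      // assumes the grab succeeded
            } else if (request.equals("UngrabPointer") || request.equals("UngrabKeyboard")) {
                held.remove(request.substring("Ungrab".length()));
            }
        }
        return held;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to a character, so stray binary data cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        System.out.println("Still grabbed at end of log: " + outstandingGrabs(lines));
    }
}
```

Run it with the path of the saved xscope output as the only argument. A non-empty result at the point where the client is halted would point to the same unreleased grab.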

In , Keith Packard (keithp) wrote :

Note that most X toolkits have a 'debugger' mode which disables grabs to make debugging possible.

In , Mike Fairbank (michael-fairbank) wrote :

Concerning the new status (RESOLVED & NOTOURBUG), is there some other more appropriate project to raise this bug? I don't want to see this bug report, that I've put a lot of effort into, not yielding any eventual benefit.

In , Mike Fairbank (michael-fairbank) wrote :

I've reopened this (just one more time) because there are no clues on what to do with this bug report next.

It was a lot of effort for me to isolate it in a reproducible way, and if it just dies here that effort was wasted, and someone else might go through the whole process of raising it again. Alternatively you could classify this bug as "wontfix"?

Can you tell me where to raise this bug if it's not xorg's bug?

If the next step is to raise this with Java developers, then I'm not optimistic about getting a fix since this bug is not reproducible on other windowing systems (non-X systems, e.g. Windows), so it really does seem to me that the problem is with xorg.

Thanks.

In , Peter Hutterer (peter-hutterer) wrote :

(In reply to comment #15)
> I've reopened (just one more time) this because there are no clues on what do
> do with this bug-report next.

This is not something we can fix in X.org, at least not yet. There are future plans to work around the complete block but for now we can't. The debugged client obtains a grab and, once halted by the debugger, never releases it. From an X server POV this is a misbehaving client; the context of the client (i.e. that the client is halted during debugging) is invisible to the X server.

As Keith said, what is needed here is a debugging mode in the toolkit that prevents the client from issuing grabs while it is being debugged.

> Can you tell me where to raise this bug if it's not xorg's bug?

> If the next step is to raise this with Java developers, then I'm not optimistic
> about getting a fix since this bug is not reproducible on other windowing
> systems (non X systems,e.g. Windows), so it really does seem to me that the
> problem is with xorg.

This needs to be fixed in Java, it cannot be fixed in the X server. Sorry.
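
A toolkit-side debugging mode of the kind described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: GrabPolicy is a hypothetical class, "example.toolkit.nograb" is an invented property name and not a real JVM or AWT flag, and the debugger check assumes the JVM was started with JDWP options (such as -agentlib:jdwp or -Xrunjdwp) visible in its arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical policy a toolkit could consult before requesting a pointer/keyboard grab. */
public class GrabPolicy {
    /** Invented opt-out property for this sketch; NOT a real JVM or AWT flag. */
    public static final String NO_GRAB_PROPERTY = "example.toolkit.nograb";

    /** True if any JVM argument mentions the JDWP debug agent. */
    public static boolean looksDebugged(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.contains("jdwp")) return true;
        }
        return false;
    }

    /** The toolkit would skip the grab when debugging is requested or detected. */
    public static boolean shouldGrab(List<String> jvmArgs) {
        return !Boolean.getBoolean(NO_GRAB_PROPERTY) && !looksDebugged(jvmArgs);
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with (excluding the main class arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(shouldGrab(jvmArgs) ? "would grab" : "would skip the grab");
    }
}
```

Skipping the grab means a popup may not close when the user clicks outside it, which is presumably an acceptable trade-off only while debugging.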

In , Mike Fairbank (michael-fairbank) wrote :

I've raised it in Java's bug reporting system (detailed below), and I cross referenced a link to this bug report so they can use your advice.

Thanks everyone for your work on this bug.

Mike

----------------

Here's the reply I got from Java:

Dear Java Developer,

Thank you for reporting this issue.

We have determined that this report is a new bug and entered the bug into our internal bug tracking system under Bug Id: 6964615.

You can monitor this bug on the Java Bug Database at
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6964615.

It may take a day or two before your bug shows up in this external database. If you are a member of the Sun Developer Network (SDN), there are two additional options once the bug is visible.

1. Voting for the bug
  Click http://bugs.sun.com/bugdatabase/addVote.do?bug_id=6964615.

2. Adding the report to your Bug Watch list.
  You will receive an email notification when this bug is updated.
  Click http://bugs.sun.com/bugdatabase/addBugWatch.do?bug_id=6964615.

The Sun Developer Network (http://developers.sun.com) is a free service that Sun offers. To join, visit https://softwarereg.sun.com/registration/developer/en_US/new_user.

Regards,
Java Developer Support

Changed in xorg-server:
importance: Unknown → High
status: Unknown → Won't Fix
Changed in xorg-server:
importance: High → Unknown
Changed in xorg-server:
importance: Unknown → High
Mike Fairbank (michael-fairbank) wrote :

I don't know why this bug has been re-opened. The Xorg-server developers analysed this bug carefully and diagnosed it as Java misbehaving. I raised this bug and as far as I'm concerned it's been passed on to the responsible software owner, which in this case is Oracle's Java.

Timo Aaltonen (tjaalton) wrote :

the robot changed the upstream bug link, that's what happened. The launchpad bug has been open all the time, but I'll reassign it to the sun-java6 package where it belongs.

affects: nvidia-graphics-drivers-180 (Ubuntu) → sun-java6 (Ubuntu)