Whole system stopped responding because of single program's fault - no way out, had to force power off

Bug #1853707 reported by teo1978
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
LibreCAD           Fix Released  Unknown
librecad (Ubuntu)  Won't Fix     High        Unassigned

Bug Description

What I did was:

- install LibreCAD (by adding their ppa and then using apt-get)
- open it
- try to open the attached .dwf file

LibreCAD hung and the whole system stopped responding. The mouse pointer stopped moving, the system wouldn't respond to keystrokes, AND not even Ctrl+Alt+F1 worked to open a virtual terminal where I could have tried killing some process.

Obviously there's a bug in LibreCAD, but no matter how badly a buggy (or even evil) process behaves, if a single misbehaving program can cause the whole OS to stall in a way that you cannot recover from, then by definition there's a bug in the OS.

You must ALWAYS be able to kill the offending program and resume normal operation without having to stop one single application other than the offending one.

I have no idea whether the bug is actually in Xorg or somewhere else; the reason I'm reporting it against xorg is that Ctrl+Alt+F1 wouldn't work. I might very well be wrong, but I believe Xorg is responsible for providing the virtual terminal via Ctrl+Alt+F1.

Either way, that MUST ALWAYS be available no matter how badly things go.

That a single program hanging causes me to have to force-power down the computer is outrageous.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: xorg 1:7.7+13ubuntu3.1
ProcVersionSignature: Ubuntu 4.4.0-169.198-generic 4.4.197
Uname: Linux 4.4.0-169-generic x86_64
NonfreeKernelModules: nvidia_uvm nvidia
.proc.driver.nvidia.gpus.0000.01.00.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0000:01:00.0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 340.107 Thu May 24 21:54:01 PDT 2018
 GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
.tmp.unity_support_test.0:

ApportVersion: 2.20.1-0ubuntu2.21
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
CompositorUnredirectDriverBlacklist: '(nouveau|Intel).*Mesa 8.0'
CompositorUnredirectFSW: true
CurrentDesktop: Unity
Date: Sat Nov 23 17:27:22 2019
DistUpgraded: Fresh install
DistroCodename: xenial
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, if not too technical
GraphicsCard:
 Intel Corporation 3rd Gen Core processor Graphics Controller [8086:0166] (rev 09) (prog-if 00 [VGA controller])
   Subsystem: Acer Incorporated [ALI] 3rd Gen Core processor Graphics Controller [1025:0647]
 NVIDIA Corporation GF117M [GeForce 610M/710M/810M/820M / GT 620M/625M/630M/720M] [10de:1140] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: Acer Incorporated [ALI] GeForce 710M [1025:0691]
InstallationDate: Installed on 2013-10-11 (2233 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Release amd64 (20130424)
MachineType: Acer Aspire V3-571G
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-169-generic root=UUID=5830b30e-69e8-4bb4-8a2b-bc2b43c7414a ro quiet splash vt.handoff=7
SourcePackage: xorg
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/15/2012
dmi.bios.vendor: Acer
dmi.bios.version: V2.07
dmi.board.asset.tag: Type2 - Board Asset Tag
dmi.board.name: VA50_HC_CR
dmi.board.vendor: Acer
dmi.board.version: Type2 - Board Version
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: V2.07
dmi.modalias: dmi:bvnAcer:bvrV2.07:bd10/15/2012:svnAcer:pnAspireV3-571G:pvrV2.07:rvnAcer:rnVA50_HC_CR:rvrType2-BoardVersion:cvnAcer:ct10:cvrV2.07:
dmi.product.name: Aspire V3-571G
dmi.product.version: V2.07
dmi.sys.vendor: Acer
version.compiz: compiz 1:0.9.12.3+16.04.20180221-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.91-2~16.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 18.0.5-0ubuntu0~16.04.1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 18.0.5-0ubuntu0~16.04.1
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.18.4-0ubuntu0.8
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.10.1-1ubuntu2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.7.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1.2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.12-1build2
xserver.bootTime: Sat Nov 23 17:25:58 2019
xserver.configfile: /etc/X11/xorg.conf
xserver.errors:
 NVIDIA(0): Failed to initiate mode change.
 NVIDIA(0): Failed to complete mode change
xserver.logfile: /var/log/Xorg.0.log
xserver.version: 2:1.18.4-0ubuntu0.8

teo1978 (teo8976) wrote :
Daniel van Vugt (vanvugt) wrote :

It's possible something related to the screen crashed. Next time the problem occurs please look in /var/crash and tell us if you find anything.

Please also (after any future hang) reboot and then run:

  journalctl -b-1 > prevboot.txt

and attach the file 'prevboot.txt' to this bug.
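
The two checks requested above could be scripted roughly as follows. This is only a sketch: the helper names list_crash_reports and save_previous_boot_log are invented for illustration, and the second helper assumes journalctl is installed and that the journal from the previous boot was kept.

```python
import subprocess
from pathlib import Path


def list_crash_reports(crash_dir="/var/crash"):
    """Return the names of regular files in crash_dir, or [] if it does not exist."""
    path = Path(crash_dir)
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir() if entry.is_file())


def save_previous_boot_log(out_path="prevboot.txt"):
    """Write the previous boot's journal to out_path, like `journalctl -b-1 > prevboot.txt`."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if journalctl fails,
        # for example when no journal from a previous boot is available.
        subprocess.run(["journalctl", "-b", "-1"], stdout=out, check=True)
    return out_path
```

After a reboot, calling list_crash_reports() shows whether anything was left in /var/crash, and save_previous_boot_log() produces the file to attach.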

affects: xorg (Ubuntu) → xorg-server (Ubuntu)
Changed in compiz (Ubuntu):
status: New → Incomplete
Changed in unity (Ubuntu):
status: New → Incomplete
Changed in xorg-server (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for unity (Ubuntu) because there has been no activity for 60 days.]

Changed in unity (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for xorg-server (Ubuntu) because there has been no activity for 60 days.]

Changed in xorg-server (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for compiz (Ubuntu) because there has been no activity for 60 days.]

Changed in compiz (Ubuntu):
status: Incomplete → Expired
teo1978 (teo8976)
summary: - Whose system stopped responding because of single program's fault - no
+ Whole system stopped responding because of single program's fault - no
way out, had to force power off
Changed in compiz (Ubuntu):
status: Expired → New
Changed in xorg-server (Ubuntu):
status: Expired → New
Changed in unity (Ubuntu):
status: Expired → New
teo1978 (teo8976) wrote :

This is not incomplete.

Just because the reporter has been asked to provide more information the next time they observe the bug doesn't mean the report is incomplete.
There is enough information for somebody to look at it. It may be that nobody wants to, or that everyone has better things to do, but a bug shouldn't expire because of that.
(Actually, bugs shouldn't expire for any reason as long as they understandably describe an issue that exists.)

Timo Aaltonen (tjaalton) wrote :

You're using the NVIDIA driver, and since this seems to be a bug in the driver, there's not much we can do about it.

Also, the 340 series was just announced to be EOL.

affects: xorg-server (Ubuntu) → nvidia-graphics-drivers-340 (Ubuntu)
Daniel van Vugt (vanvugt) wrote :

We do rely on bug reporters to help with triaging the bugs they open. This is mostly because we do not have infinite resources to try and reproduce everyone's bugs ourselves. And doing so would be unreliable anyway, as we might hit slightly different issues and mistake them for the original issue. We have hundreds of bug reports to deal with each week -- there is not enough time to replicate them all.

Please follow the instructions in comment #2 as that will get this closer to a resolution.

Changed in compiz (Ubuntu):
status: New → Incomplete
Changed in nvidia-graphics-drivers-340 (Ubuntu):
status: New → Incomplete
Changed in unity (Ubuntu):
status: New → Incomplete
teo1978 (teo8976) wrote :

Looked in /var/crash after this happened again (did I mention it's systematically reproducible?) and there's nothing there.

I'm not sure what makes you think this has to do with NVidia drivers. This seems to suggest it doesn't:
https://github.com/LibreCAD/LibreCAD/issues/1161#issuecomment-557833504

but I don't know.

teo1978 (teo8976) wrote :

What is suggested there is that something in the offending program (LibreCAD) eats up all the RAM.

What I am reporting here as a bug in the OS is the fact that the WHOLE SYSTEM hangs irrecoverably because of that.

Quoting a comment from the above link:

> Linux is very good managing the RAM memory, except when it ends without memory.
> Then the full system is frozen (Ubuntu, Debian, Arch and all the distros).
> When this happens, it's impossible to open a terminal to kill the process.
> You will have the same experience opening hundreds of browser tabs with Flash
> Player, for example.

If that is true it is a critical bug in Linux, which should be regarded as a huge vulnerability rather than something we just have to live with.

Unless of course that is all incorrect and this is actually caused by a bug in the NVidia driver (again, no idea where you get that from): in that case of course, I guess the OS cannot be expected to be able to recover from some extremely bad behavior of a driver. It IS expected to be able to recover from the worst imaginable behavior of the most buggy and malicious program, but probably not of a driver which is in fact part of the OS itself.

Daniel van Vugt (vanvugt) wrote :

> I'm not sure what makes you think this has to do with NVidia drivers.

That wasn't me. See comment #7.

no longer affects: nvidia-graphics-drivers-340 (Ubuntu)
Daniel van Vugt (vanvugt) wrote :

OK, we have two separate bugs here then:

* LibreCAD uses all the memory

* The desktop stops responding when memory is exhausted

These really should be two different bug IDs.

The latter is practically impossible to fix as it would require re-engineering from the ground up to make it more robust. Plus that would be re-engineering a desktop environment that we no longer support (Compiz/Unity), in 16.04 which will reach end of standard support in just over a year from now. So if you like you can open a separate bug for that, but it's unlikely to ever be resolved.

I will make this bug about the first issue only. That can realistically be resolved.

affects: compiz (Ubuntu) → librecad (Ubuntu)
Changed in librecad (Ubuntu):
status: Incomplete → New
no longer affects: unity (Ubuntu)
Armin Stebich (lordofbikes) wrote :

This ticket is a cross post. It is already posted in LibreCAD issues:
https://github.com/LibreCAD/LibreCAD/issues/1161.

This bug is indeed caused by LibreCAD or more exactly by libdxfrw, the DXF/DWG library it uses.
DWG support in libdxfrw is very rudimentary and therefore a warning message appears when opening a DWG file in LibreCAD. This warning states, that this could happen.

Daniel van Vugt (vanvugt) wrote :

FYI, librecad appears to have no Ubuntu maintainer. We seem to get it directly from Debian and I don't know if Debian is in much of a better position. Please discuss the issue with upstream in https://github.com/LibreCAD/LibreCAD/issues/1161 as the first point of contact.

Changed in librecad (Ubuntu):
importance: Undecided → High
Changed in librecad-dev:
status: Unknown → New
teo1978 (teo8976) wrote :

> This warning states, that this could happen.

I have already uninstalled LibreCAD and can't check the exact warning now, but I'm pretty sure it doesn't state that it could halt your entire system.
Even if it did, that's not enough. No matter how rudimentary support for a given file format is, you just don't let it crash the entire program, let alone the entire OS.

Changed in librecad (Ubuntu):
status: New → Confirmed
Daniel van Vugt (vanvugt) wrote :

I suggest the simplest way to protect the OS right now is to have the kernel enforce a hard limit on individual process memory usage (e.g. half your RAM size), by editing:

  /etc/security/limits.conf

or adding a new config file in:

  /etc/security/limits.d/
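
As a rough illustration of the per-process limit suggested above, the following Python sketch starts a program with a hard cap on its address space. It uses RLIMIT_AS, which is the resource that the "as" item in limits.conf is documented to control. The helper name run_with_memory_cap is invented for this example, and capping address space is only one possible way to apply the suggestion, not necessarily what was intended here.

```python
import resource
import subprocess


def run_with_memory_cap(argv, max_bytes):
    """Run argv as a child process whose address space is capped at max_bytes."""

    def apply_limit():
        # Runs in the child between fork and exec, so only the child
        # (and anything it starts) is restricted. Soft and hard limits
        # are both set to max_bytes, making this a hard cap.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    # preexec_fn is POSIX-only; this sketch assumes a Linux system.
    return subprocess.run(argv, preexec_fn=apply_limit)
```

For example, run_with_memory_cap(["librecad"], 4 * 1024**3) would start LibreCAD so that allocations beyond roughly 4 GiB fail inside that process instead of exhausting system memory. Whether the program then exits cleanly depends on how it handles the failed allocation.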

Changed in librecad-dev:
status: New → Fix Released
Daniel van Vugt (vanvugt) wrote :

It appears the original reporter no longer exists, so it doesn't make sense keeping this bug open.

Changed in librecad (Ubuntu):
status: Confirmed → Won't Fix