trap invalid opcode in libopenblas

Bug #1122030 reported by Thomas U.
This bug affects 1 person
Affects: openblas (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Hi!

I'm running Ubuntu 12.10 x86-64, and I recently tried to replace ATLAS with OpenBLAS on my system. I was hoping to get a speedup on my computations (which I'm performing using python-numpy). I did:

sudo apt-get remove libatlas3gf-base libatlas-dev
sudo apt-get install libopenblas-dev

However, it would appear that something went wrong:

$ python -c"import numpy; x = numpy.random.random((1000,1000)); numpy.dot(x, x)"
Illegal instruction
$ tail /var/log/syslog -n 1
Feb 11 14:00:28 xua kernel: [ 456.729279] python[12206] trap invalid opcode ip:7f3b48b57220 sp:7f3b4506ecf8 error:0 in libopenblas.so.0[7f3b47d18000+127e000]

According to `apt-cache show libopenblas-base`, the kernel to run is chosen at runtime. It looks like in my case (AMD Bulldozer, FX-8150) it picks the wrong kernel. Any help would be appreciated.
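
For readers decoding a trap line like the one above: subtracting the mapping base (the first number inside the square brackets) from `ip` gives the offset of the faulting instruction inside the library. Below is a minimal Python sketch, assuming the kernel message keeps the `ip:... sp:... error:... in lib[base+size]` layout shown in this report. `trap_offset` is a hypothetical helper name, not part of any existing tool.

```python
import re

# Matches the tail of a kernel "trap invalid opcode" line as seen in this report.
# The layout is an assumption based on that single example.
TRAP_RE = re.compile(
    r"ip:(?P<ip>[0-9a-f]+)\s+sp:[0-9a-f]+\s+error:[0-9a-f]+\s+in\s+"
    r"(?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def trap_offset(line):
    """Return (library name, offset of the faulting instruction in its mapping)."""
    match = TRAP_RE.search(line)
    if match is None:
        raise ValueError("not a recognised trap line")
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    size = int(match.group("size"), 16)
    offset = ip - base
    # Sanity check: the instruction pointer must lie inside the reported mapping.
    if not 0 <= offset < size:
        raise ValueError("instruction pointer is outside the reported mapping")
    return match.group("lib"), offset


line = (
    "python[12206] trap invalid opcode ip:7f3b48b57220 sp:7f3b4506ecf8 "
    "error:0 in libopenblas.so.0[7f3b47d18000+127e000]"
)
lib, offset = trap_offset(line)
print(lib, hex(offset))
```

For the syslog line in this report, this yields `libopenblas.so.0` and offset `0xe3f220`. Whether that offset corresponds directly to a position in the library file depends on how the library was mapped, so treat it as a starting point for a disassembler rather than a definitive address.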

Dave Gilbert (ubuntu-treblig) wrote :

Hi Thomas,
  Can you include the output of cat /proc/cpuinfo so we can see what instructions your CPU is claiming it can do?

Also, I think a backtrace would help. You should find some crash files in /var/crash; I'm guessing that from this test they'll appear to be filed against python, but I may be wrong. If you can find the one generated by this simple test case, then I think:

   apport-cli -u 1122030 /var/crash/whatever-the-crash-file-is-called

should attach the backtrace to this bug; then we can figure out which instruction it's trying to use and in which function.
(I think!).

Dave

Changed in openblas (Ubuntu):
status: New → Incomplete
Thomas U. (thomas-unterthiner) wrote :

Attached to this message you will find the output of `cat /proc/cpuinfo`.

Thomas U. (thomas-unterthiner) wrote :

Sorry for posting this in a separate message, but I couldn't figure out how to attach more than one file to a message. Unfortunately apport-cli doesn't work on my end; it gives me the following:

~$ apport-cli -u 1122030 /var/crash/_usr_bin_python2.7.1000.crash
ERROR: The launchpadlib Python module is not installed. This functionality is not available.

(both python-launchpadlib-toolkit and python-launchpadlib are installed, so no idea what's missing here)

So I'm uploading the crash file in question here instead. I hope that works as well.

Dave Gilbert (ubuntu-treblig) wrote :

Hi Thomas,
  Thanks - that's looking the same as bug 1117335 (which is libblas rather than openblas), but I think you're in the same function.
Can you confirm exactly which version of libopenblas you have? Please run:

dpkg -l \*blas\* | cat

Thanks,

Dave

Dave Gilbert (ubuntu-treblig) wrote :

Possibly relevant: http://devgurus.amd.com/thread/159993. AMD removed 3DNow! from some newer chips. My disassembly of your core dump shows that the faulting instruction is femms, which is what that thread discusses; it looks like blas assumes too much about the CPU.
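
One way to check whether a CPU advertises 3DNow! is to look for the `3dnow` flag on the `flags` line of /proc/cpuinfo. The Python sketch below assumes the usual Linux x86 `key : value` layout of that file; the function names are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_3dnow(cpuinfo_text):
    """True if the CPU claims 3DNow! support (femms is a 3DNow! instruction)."""
    return "3dnow" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; the default path only exists on Linux."""
    with open(path) as handle:
        return handle.read()
```

On a Linux machine, `has_3dnow(read_cpuinfo())` returning False would suggest that a kernel containing femms is likely to fault there, as it did in this report.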

Changed in openblas (Ubuntu):
importance: Undecided → Medium
Thomas U. (thomas-unterthiner) wrote :

~$ dpkg -l \*blas\* | cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=========================================-==========================================-============-==============================================================================
un libblas.so <none> (no description available)
un libblas.so.3 <none> (no description available)
un libblas.so.3gf <none> (no description available)
ii libblas3 1.2.20110419-5 amd64 Basic Linear Algebra Reference implementations, shared library
un libblas3gf <none> (no description available)
ii libopenblas-base 0.1.1-6 amd64 Optimized BLAS (linear algebra) library based on GotoBLAS2
ii libopenblas-dev 0.1.1-6 amd64 Optimized BLAS (linear algebra) library based on GotoBLAS2

Additionally, I can confirm that this is the same issue as in bug 1117335. Attaching gdb to my Python test command gives the following output:

(gdb) c
Continuing.

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7fb6e2817700 (LWP 3105)]
0x00007fb6e3657220 in dgemm_otcopy_BARCELONA () from /usr/lib/libblas.so.3gf
(gdb) x/i $pc
=> 0x7fb6e3657220 <dgemm_otcopy_BARCELONA+32>: femms

(gdb) disas
Dump of assembler code for function dgemm_otcopy_BARCELONA:
   0x00007fb6e3657200 <+0>: push %r15
   0x00007fb6e3657202 <+2>: push %r14
   0x00007fb6e3657204 <+4>: push %r13
   0x00007fb6e3657206 <+6>: push %r12
   0x00007fb6e3657208 <+8>: push %rbp
   0x00007fb6e3657209 <+9>: push %rbx
   0x00007fb6e365720a <+10>: mov %rsi,%rax
   0x00007fb6e365720d <+13>: mov %rsi,%rbx
   0x00007fb6e3657210 <+16>: and $0xfffffffffffffffc,%rax
   0x00007fb6e3657214 <+20>: and $0xfffffffffffffffe,%rbx
   0x00007fb6e3657218 <+24>: imul %rdi,%rax
   0x00007fb6e365721c <+28>: imul %rdi,%rbx
=> 0x00007fb6e3657220 <+32>: femms
   0x00007fb6e3657222 <+34>: lea (%r8,%rax,8),%r13
   0x00007fb6e3657226 <+38>: lea (%r8,%rbx,8),%r12
   0x00007fb6e365722a <+42>: lea 0x0(,%rcx,8),%rcx
   0x00007fb6e3657232 <+50>: lea 0x0(,%rdi,8),%rbx
  [...]

Julian Taylor (jtaylor) wrote :

This might be fixed in 0.2.6, as it has a Bulldozer dgemm kernel which does not use the femms instruction.
Can you install the package from raring and try it?

Thomas U. (thomas-unterthiner) wrote :

Hi!

I can try, but as far as I can see, raring currently only has 2.5.1: http://packages.ubuntu.com/raring/libopenblas-base

But for what it's worth, I pulled the OpenBLAS source from GitHub on 15 February and that runs without problems.

Julian Taylor (jtaylor) wrote :

Sorry, the syncing took longer than I expected. 0.2.6 is now available in raring-proposed:
https://launchpad.net/ubuntu/+source/openblas/0.2.6-1~exp1
It would be good if you could give the packages a try too (click on amd64 or i386 under Builds for the packages).
Thanks

Thomas U. (thomas-unterthiner) wrote :

I've tried the amd64 version. Everything looks fine, no more SIGILL.
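
To make this kind of check repeatable, the one-liner from the bug description can be run in a child process and its exit status inspected. Here is a rough Python sketch, assuming a POSIX system, where `subprocess` reports death by signal as a negative return code. `died_of_sigill` and `check_blas` are hypothetical helper names.

```python
import signal
import subprocess
import sys

# The reproducer from the bug description (needs numpy installed to be meaningful).
REPRODUCER = "import numpy; x = numpy.random.random((1000, 1000)); numpy.dot(x, x)"


def died_of_sigill(argv):
    """Run argv as a child process and report whether SIGILL killed it."""
    result = subprocess.run(argv)
    # On POSIX, a child killed by signal N has returncode == -N.
    return result.returncode == -signal.SIGILL


def check_blas():
    """True if the numpy.dot reproducer still dies with an illegal instruction."""
    return died_of_sigill([sys.executable, "-c", REPRODUCER])
```

With a fixed package installed, `check_blas()` would be expected to return False. A False result only means the child did not die of SIGILL; it does not distinguish a clean run from some other failure, such as numpy being missing.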

Julian Taylor (jtaylor)
Changed in openblas (Ubuntu):
status: Incomplete → Fix Released