VT6307 IEEE1394 card causes reboot loop

Bug #2043905 reported by Beyil
This bug affects 2 people
Affects: Linux; Status: Fix Released; Importance: High
Affects: linux (Ubuntu); Status: Fix Committed; Importance: Undecided; Assigned to: Unassigned

Bug Description

I used the live USB for Kubuntu and Ubuntu Studio 23.10, and I also upgraded to Mantic (6.5.0-10-generic) from Lunar (6.2.0-36), which works. All of the 6.5.0 64-bit kernel versions go into a reboot loop before the splash screen is displayed.
The processor is an AMD Ryzen 5600X on an MSI MPG B550 GAMING PLUS motherboard, with 64 GB of DDR4-3200 RAM and AMD Radeon RX 6600 graphics.
The latest BIOS firmware is installed; running "fwupdmgr refresh --force && fwupdmgr update" in a root terminal reports no available updates.

From the GRUB advanced options, I can boot Mantic with the Lunar 6.2.0-36 kernel. A cell phone video capture of the bad boot is included.

Further testing found that a July 3rd Mantic beta build (6.3 kernel) boots.

Booting the current live Kubuntu USB with the 6.5.0-9 kernel and removing hardware one piece at a time (putting each piece back if the loop still occurred) showed that as soon as a VT6307 PCIe card is removed, the 6.5.0-9 and 6.5.0-10 kernels boot properly. As soon as the card is reinstalled, the panic and reboot loop start up again. If a log was generated during the loop, I have been unable to find it in the /var/log directory.
I have been unable to get kdump to work.

Tags: mantic
In the upstream bug, matthias.schrumpf (matthias.schrumpf-linux-kernel-bugs) wrote:

Kernel 6.5 crashes immediately after being selected in GRUB or booted by other means. The crash always leads to a boot loop unless another kernel is selected.

A successful boot with kernel 6.5 is impossible, so no log data could be collected.

This error can be reproduced on many distros including Arch, Endeavour OS, Manjaro, Fedora and OpenSuse.

The computer is working perfectly fine with older kernels up to and including 6.4. CPU, RAM and hard drives have all been checked thoroughly and no errors could be found.

Current OS:
Operating System: Fedora Linux 38
KDE Plasma Version: 5.27.8
KDE Frameworks Version: 5.110.0
Qt Version: 5.15.10
Kernel Version: 6.4.15-200.fc38.x86_64 (64-bit)
Graphics Platform: X11

Hardware:
Processors: 16 × AMD Ryzen 7 5800X 8-Core Processor
Memory: 62.7 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 XT
Manufacturer: Micro-Star International Co., Ltd.
Product Name: MS-7C91
System Version: 1.0

In the upstream bug, aros (aros-linux-kernel-bugs) wrote:

Could you please bisect?

https://docs.kernel.org/admin-guide/bug-bisect.html

Otherwise this bug report has very slim chances of being fixed.
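Bisection is a binary search over the commits between a known-good and a known-bad kernel. The Python sketch below only illustrates that search on a hypothetical, linear list of commits, with is_bad() standing in for "build, install and boot this kernel"; real bisection is done with git bisect as described in the linked guide, which also copes with merge history.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Binary-search an ordered list of commits for the first bad one.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the culprit is bad too (the premise of bisection).
    Returns the first bad commit and how many commits had to be tested.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1  # in practice: build, install and boot this kernel
        if is_bad(commits[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return commits[bad], tested


# Hypothetical history of 16 commits where the regression arrives in c11.
history = [f"c{i}" for i in range(16)]
print(first_bad_commit(history, lambda c: int(c[1:]) >= 11))  # ('c11', 4)
```

Each test halves the remaining range, so N candidate commits need only about log2(N) test boots.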

In the upstream bug, bagasdotme (bagasdotme-linux-kernel-bugs) wrote:

Do you have any out-of-tree modules that may cause this regression?

In the upstream bug, matthias.schrumpf (matthias.schrumpf-linux-kernel-bugs) wrote:

(In reply to Artem S. Tashkinov from comment #1)
> Could you please bisect?
>
> https://docs.kernel.org/admin-guide/bug-bisect.html
>
> Otherwise this bug report has very slim chances of being fixed.
I'm sorry, I don't have the basic knowledge necessary for performing a bisection.

(In reply to Bagas Sanjaya from comment #2)
> Do you have any out-of-tree modules that may cause this regression?
No, not to my knowledge. I tried this with live images or fresh installs of Arch, Endeavour OS, Manjaro, Fedora and OpenSuse and I didn't change anything beyond the settings and customizations that these distros do by default.

In the upstream bug, bagasdotme (bagasdotme-linux-kernel-bugs) wrote:

On 17/10/2023 04:01, <email address hidden> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217993
>
> --- Comment #3 from <email address hidden> ---
> (In reply to Artem S. Tashkinov from comment #1)
>> Could you please bisect?
>>
>> https://docs.kernel.org/admin-guide/bug-bisect.html
>>
>> Otherwise this bug report has very slim chances of being fixed.
> I'm sorry, I don't have the basic knowledge necessary for performing a
> bisection.
>

Then refer to Documentation/admin-guide/bug-bisect.rst in the kernel
sources for instructions.

> (In reply to Bagas Sanjaya from comment #2)
>> Do you have any out-of-tree modules that may cause this regression?
> No, not to my knowledge. I tried this with live images or fresh installs of
> Arch, Endeavour OS, Manjaro, Fedora and OpenSuse and I didn't change anything
> beyond the settings and customizations that these distros do by default.
>

So you have this regression there?

In the upstream bug, matthias.schrumpf (matthias.schrumpf-linux-kernel-bugs) wrote:

(In reply to Bagas Sanjaya from comment #4)
> So you have this regression there?

What do you mean?

In the upstream bug, aros (aros-linux-kernel-bugs) wrote:

Given that you seem to be the only Linux user who has this issue, your only chance of getting it fixed is to perform regression testing.

The URL provided above has enough information to do so.

If that's not enough, you may simply Google the appropriate questions, e.g.

1) How to compile and install the Linux kernel
2) How to install GCC in Distro_X

It's all quite easy once you get down to it. Unfortunately, no one but you can do it.

In the upstream bug, a.mark.broadworth (a.mark.broadworth-linux-kernel-bugs) wrote:

This is occurring for me as well on the generic 6.5 kernel that Ubuntu 23.10 installed.

Seems to happen early in the bootup process (at least before amdgpu is loaded).

Basic Hardware:
AMD Ryzen 7 5800X 8-Core Processor
Asus TUF GAMING X570-PLUS (BIOS 4802 06/15/2023)
64 GB RAM
Radeon 6800 XT

Demonstrated regression:
v6.4.14 - good
v6.5 - bad

I'm attempting a bisect.

In the upstream bug, a.mark.broadworth (a.mark.broadworth-linux-kernel-bugs) wrote:

This appears to have been introduced with:

commit dcadfd7f7c74ef9ee415e072a19bdf6c085159eb (HEAD -> dcadfd7f7c7)
Author: Takashi Sakamoto <email address hidden>
Date: Tue May 30 08:12:40 2023 +0900

    firewire: core: use union for callback of transaction completion

    In 1394 OHCI, the OUTPUT_LAST descriptor of Asynchronous Transmit (AT)
    request context has timeStamp field, in which 1394 OHCI controller
    record the isochronous cycle when the packet was sent for the request
    subaction. Additionally, for the case of split transaction in IEEE 1394,
    Asynchronous Receive (AT) request context is used for response subaction
    to finish the transaction. The trailer quadlet of descriptor in the
    context has timeStamp field, in which 1394 OHCI controller records the
    isochronous cycle when the packet arrived.

    Current implementation of 1394 OHCI controller driver stores values of
    both fields to internal structure as time stamp, while Linux FireWire
    subsystem provides no way to access to it. When using asynchronous
    transaction service provided by the subsystem, callback function is passed
    to kernel API. The prototype of callback function has the lack of argument
    for the values.

    This commit adds a new callback function for the purpose. It has an
    additional argument to point to the constant array with two elements. For
    backward compatibility to kernel space, a new union is also adds to wrap
    two different prototype of callback function. The fw_transaction structure
    has the union as a member and a boolean flag to express which function
    callback is available.

    The core function is changed to handle the two cases; with or without
    time stamp. For the error path to process transaction, the isochronous
    cycle is computed by current value of CYCLE_TIMER register in 1394 OHCI
    controller. Especially for the case of timeout of split transaction, the
    expected isochronous cycle is computed.

    Link: https://<email address hidden>
    Signed-off-by: Takashi Sakamoto <email address hidden>
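To make the commit message above more concrete, here is a small Python sketch of the two ideas it describes: one callback slot plus a flag saying which prototype it follows, and deriving an isochronous cycle from a CYCLE_TIMER value (for example, the expected cycle when a split transaction times out). It is a conceptual model, not the kernel's C code: the Transaction class and all helper names are hypothetical, and the register and timeStamp layouts are assumptions based on the 1394 OHCI specification.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed layout of the 32-bit 1394 OHCI CYCLE_TIMER register:
#   bits 31..25 cycleSeconds (0-127)
#   bits 24..12 cycleCount   (0-7999, one cycle = 125 us)
#   bits 11..0  cycleOffset  (ticks of the 24.576 MHz clock within a cycle)
CYCLES_PER_SECOND = 8000
SECONDS_WRAP = 128


def decode_cycle_timer(value: int) -> tuple[int, int, int]:
    """Split a CYCLE_TIMER value into (seconds, count, offset)."""
    return (value >> 25) & 0x7F, (value >> 12) & 0x1FFF, value & 0xFFF


def add_cycles(seconds: int, count: int, cycles: int) -> tuple[int, int]:
    """Advance an isochronous cycle, wrapping count at 8000 and seconds at 128."""
    total = (seconds * CYCLES_PER_SECOND + count + cycles) % (SECONDS_WRAP * CYCLES_PER_SECOND)
    return divmod(total, CYCLES_PER_SECOND)


def to_tstamp(seconds: int, count: int) -> int:
    """Pack into a 16-bit timeStamp: low 3 bits of seconds, then the 13-bit count."""
    return ((seconds & 0x7) << 13) | count


@dataclass
class Transaction:
    """Hypothetical stand-in for a transaction with two callback prototypes.

    Python has no unions, so one field holds whichever callback was
    registered and with_tstamp records which prototype it follows.
    """
    callback: Callable[..., None]
    with_tstamp: bool

    def complete(self, rcode: int, payload: bytes,
                 tstamps: Optional[tuple[int, int]]) -> None:
        if self.with_tstamp:
            # New-style callback: also receives two time stamps
            # (request sent, response received).
            self.callback(rcode, payload, tstamps)
        else:
            # Old-style callback: the time stamps are dropped.
            self.callback(rcode, payload)
```

For example, a CYCLE_TIMER value of (5 << 25) | (7990 << 12) | 100 decodes to (5, 7990, 100), and adding 20 cycles wraps into the next second: add_cycles(5, 7990, 20) returns (6, 10).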

In the upstream bug, a.mark.broadworth (a.mark.broadworth-linux-kernel-bugs) wrote:

I have a firewire card in my system. Affected kernels boot fine with the firewire card removed.

06:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev 80)

What are the next steps?

In the upstream bug, a.mark.broadworth (a.mark.broadworth-linux-kernel-bugs) wrote:

(The print on the chip itself reads VIA VT6307)

In the upstream bug, matthias.schrumpf (matthias.schrumpf-linux-kernel-bugs) wrote:

(In reply to Mark Broadworth from comment #9)
> I have a firewire card in my system. Affected kernels boot fine with the
> firewire card removed.
>
> 06:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)]
> IEEE 1394 OHCI Controller (rev 80)
>
> What are the next steps?

Oh my god. It was the Firewire card the whole time?

I also had a VIA VT6307 in my computer. I just removed it and now I can boot with the 6.5+ kernels without any issues.

I wasn't using it anyway; I just never would have expected it to cause such a problem.
Hope you find a solution that works for you.

In the upstream bug, mario.limonciello (mario.limonciello-linux-kernel-bugs) wrote:

*** Bug 217994 has been marked as a duplicate of this bug. ***

summary: - Mantic reboot loop before splash screen is displayed
+ VT6307 IEEE1394 card causes reboot loop
In the upstream bug, rjbgolding (rjbgolding-linux-kernel-bugs) wrote:

I am also experiencing this issue and reported it to Ubuntu at
 https://bugs.launchpad.net/bugs/2043905
What I didn't report there is that I also tried the Ubuntu-built 6.6-rc4 generic 64-bit beta kernel, and it had the same problem, so this may affect 6.6 as well as the 6.5 kernels.

My system is
CPU: AMD Ryzen 5600X
MB: MSI MPG Gaming Plus (BIOS 7C56v1F dated 12 Oct 2023)
RAM: 64 GB DDR4-3200
GPU: AMD Radeon RX6600

Changed in linux:
importance: Unknown → High
status: Unknown → In Progress
Takashi Sakamoto (mocchi) wrote:

Hi Beyil,

I'm a maintainer of the Linux FireWire subsystem, and I apologize for the trouble. Indeed, an unexpected system reboot can occur with the following combination:

* Linux kernel version 6.5 or later
* Any kind of AMD Ryzen machine
* An IEEE 1394 host controller that consists of:
** Asmedia ASM1083 (PCI/PCIe bridge)
** VIA VT6306/6307/6308 (1394 OHCI host controller)

You can find my short description of the issue in my pull request to Linus for the v6.7 kernel:
* https://<email address hidden>/
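As a rough illustration of how one might check whether a machine has the combination listed above, here is a Python sketch that reads Linux sysfs and /proc/cpuinfo. It is not how firewire-ohci detects the condition. The PCI IDs (VIA 1106:3044, ASMedia 1b21:1080) are assumptions taken from the public PCI ID database, and matching "Ryzen" in the CPU model name is a simplification.

```python
from pathlib import Path
from typing import Optional

# Assumed PCI IDs; confirm them on your own machine with "lspci -nn".
VIA_VT630X = (0x1106, 0x3044)       # VIA VT6306/6307/6308 1394 OHCI controller
ASMEDIA_ASM108X = (0x1B21, 0x1080)  # ASMedia ASM1083/1085 PCIe-to-PCI bridge
OHCI_1394_CLASS = 0x0C0010          # serial bus controller / FireWire / OHCI


def read_hex(path: Path) -> Optional[int]:
    """Read a sysfs attribute such as "0x1106" and return it as an int."""
    try:
        return int(path.read_text().strip(), 16)
    except (OSError, ValueError):
        return None


def parse_cpuinfo(text: str) -> tuple[str, str]:
    """Return (vendor_id, model name) of the first CPU listed in /proc/cpuinfo."""
    vendor = model = ""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "vendor_id" and not vendor:
            vendor = value.strip()
        elif key == "model name" and not model:
            model = value.strip()
    return vendor, model


def is_reported_combination(cpu_vendor: str, cpu_model: str,
                            controller: tuple, bridge: tuple) -> bool:
    """True for the AMD Ryzen + ASM108x bridge + VT630x controller combination."""
    return (cpu_vendor == "AuthenticAMD" and "Ryzen" in cpu_model
            and controller == VIA_VT630X and bridge == ASMEDIA_ASM108X)


def list_1394_ohci(devices: Path = Path("/sys/bus/pci/devices")) -> list:
    """Return (address, controller IDs, upstream bridge IDs) per 1394 OHCI device."""
    found = []
    for dev in sorted(devices.iterdir()):
        if read_hex(dev / "class") != OHCI_1394_CLASS:
            continue
        controller = (read_hex(dev / "vendor"), read_hex(dev / "device"))
        # Each entry is a symlink into the device tree; the parent directory
        # of the resolved path is the bridge the controller sits behind.
        parent = dev.resolve().parent
        bridge = (read_hex(parent / "vendor"), read_hex(parent / "device"))
        found.append((dev.name, controller, bridge))
    return found
```

On a live system, one would pass parse_cpuinfo(Path("/proc/cpuinfo").read_text()) together with each controller and bridge pair returned by list_1394_ohci() to is_reported_combination(). Following the parent directory is what ties the VT630x to the bridge chip in front of it; a card without the bridge chip would show a different parent.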

As a result of a change to the 1394 OHCI driver (firewire-ohci), the host controller card causes some kind of hardware trouble on AMD Ryzen machines. I have not yet figured out the mechanism, since the system reboots without printing any information to the console. I am continuing to investigate in every way I can think of.

At present, no workaround has been found. I have confirmed that breaking any part of the above combination lets the system work again; e.g. using an IEEE 1394 host controller with a VIA VT6315 or VT6305 connected to the PCI bus without the bridge chip.

Regards

Beyil (rjbgolding) wrote:

I do have an old system (Phenom 1090T CPU) whose motherboard has a built-in 1394a controller, and the 6.5.0-10 kernel works properly on it. Would some sort of system report from that machine help in tracking down the issue? Or even a report from the affected system running an earlier kernel (6.2.0-36) with the card installed? I know the Phenom is pre-Ryzen, so it may lack some CPU instructions that are possibly affected, and the Lunar kernel, which isn't affected, may also be missing something, even though the rest of the system is on the up-to-date 23.10 build. It would be interesting if I could somehow run a 6.5 kernel under the 6.2 kernel to get a log of some sort. I have VirtualBox installed, but that provides its own "box" and doesn't really use the actual hardware.

Also, my motherboard has two PCIe slots, one connected directly to the CPU and another connected through the B550 chipset; the reboot loop happens regardless of which slot is used. I mainly use the one behind the B550 chipset so as not to restrict airflow to the GPU.

Takashi Sakamoto (mocchi) wrote:

Hi Beyil,

> I do have an old system (Phenom 1090T CPU) on a motherboard that has a
> 1394a controller built in that the 6.5.0-10 kernel works properly on...

Indeed. I also own an AMD 880G chipset machine with an AMD Sempron 145
CPU. It works well even with the affected kernel version. Furthermore, a
VIA VT6307 card without the ASM1083 also works well in that machine.

> would some sort of system report from that machine help in tracking down
> the issue? or even a report from the system with the issue with an
> earlier kernel (6.2.0-36) with the card installed?? I know the phenom
> is a pre-ryzen so it may be totally missing a CPU command structure that
> is possibly effected, and the Lunar kernel which isn't effected may also
> be missing something even though the rest of the system is with the up
> to date 23.10 build.. would be interesting if somehow could run a 6.5
> kernel under the 6.2 kernel to get a log of some sort.

Thanks for your suggestion. However, the problem is caused by operations
that are not special; at least, they are not instructions unique to AMD
Ryzen CPUs. The cause is register access performed in the standard PCI
Express way in some situations.

> I have VirtualBox installed but that provides its own "box" and doesn't
> really use the current hardware...

Note that if PCI passthrough is used to bind the affected 1394 OHCI card
to a guest system, the guest can trigger a reboot of the host system when
it runs a v6.5 or later kernel.

> also my motherboard has 2 PCIe slots, one connected directly to the
> CPU and another connected through the B550 chipset the reboot loop
> happens regardless of which slot is used. I mainly use the one through
> the B550 chipset to not restrict airflow into the GPU.

Indeed. In my experience too, changing the PCIe slot does not solve the
issue.

In the upstream bug, regressions (regressions-linux-kernel-bugs) wrote:

FWIW, it's a known issue that most likely still happens with mainline; for details, see this message and the two replies to it: https://<email address hidden>/

Beyil (rjbgolding) wrote:

I had a slightly different outcome with the latest kernel (6.5.0-13). I put the card in to transfer some data from the FireWire device and meant to use the 6.2 kernel, but didn't hit Shift in time to get the advanced GRUB menu to select it. I got a "Kernel panic unable to locate UUID=" message, and the machine went into a hard-lock state (the keyboard was unresponsive). I used the reset button to reboot into kernel 6.2 and found that the UUID it references is for /. I rebooted back into the 6.5 kernel, and it's back to the reboot loop.

I have been unable to reproduce the "unable to locate" message.

As soon as the 1394 card is removed, it works on the 6.5 kernel again.

No log file found.
My VT6307 card has the ASM1083 chip on it.

Takashi Sakamoto (mocchi) wrote:

Hi,

The change to the 1394 OHCI driver, aimed at suppressing the unexpected
system reboot on AMD Ryzen machines[1], has been merged into Linux kernel
v6.7[2]. It has also been applied to the following stable and longterm
kernel releases:

* 6.6.11[3]
* 6.1.72[4]
* 5.15.147[5]
* 5.10.208[6]
* 5.4.267[7]
* 4.19.305[8]
* 4.14.336[9]
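For a quick check of an upstream version string against the releases listed above, a Python sketch follows. It only encodes this list plus v6.7 and later; the function names are hypothetical. Distribution kernels such as Ubuntu's 6.5.0-xx series backport fixes under their own numbering, so for those the result is None and the distribution's changelog is the place to look.

```python
from typing import Optional

# Stable/longterm releases listed above as carrying the fix (v6.7+ has it too).
FIXED_IN = {
    (6, 6): 11, (6, 1): 72, (5, 15): 147, (5, 10): 208,
    (5, 4): 267, (4, 19): 305, (4, 14): 336,
}


def parse_version(release: str) -> tuple[int, int, int]:
    """'6.6.11' -> (6, 6, 11); '6.7' -> (6, 7, 0); a '-generic' style suffix is dropped."""
    base = release.split("-", 1)[0]
    parts = [int(p) for p in base.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def upstream_has_fix(release: str) -> Optional[bool]:
    """True/False when the list above decides it, None when it says nothing."""
    if "-rc" in release:
        return None  # release candidates are not covered by the list
    major, minor, patch = parse_version(release)
    if (major, minor) >= (6, 7):
        return True
    if (major, minor) in FIXED_IN:
        return patch >= FIXED_IN[(major, minor)]
    return None  # e.g. 6.5 is not in the list
```

For example, upstream_has_fix("6.6.10") is False while upstream_has_fix("6.6.11") is True.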

Once the downstream distribution project provides the corresponding kernel
packages, you should no longer encounter the unexpected system reboot.

Note that the following combination of hardware is not necessarily suitable,
depending on your use case:

* Any type of AMD Ryzen machine
* 1394 OHCI hardware consists of:
    * Asmedia ASM1083/1085
    * VIA VT6306/6307/6308

When working with a time-aware protocol, such as audio sample processing,
it is advisable to avoid this combination. The change comes with a
functional limitation: in this case, the software stack does not provide
precise hardware time.

If you choose to continue using an AMD Ryzen machine, the recommendation
is to replace the 1394 OHCI hardware with another one. Conversely, if you
choose to continue using the 1394 OHCI hardware, the recommendation is to
use a machine from a vendor other than AMD.

Thanks for your report and long patience.

[1] https://git.kernel.org/torvalds/linux/c/ac9184fbb847
[2] https://lore.kernel<email address hidden>/
[3] https://lore.kernel.org/lkml/2024011058-sheep-thrower-d2f8@gregkh/
[4] https://lore.kernel.org/lkml/2024011052-unsightly-bronze-e628@gregkh/
[5] https://lore.kernel.org/lkml/2024011541-defective-scuff-c55e@gregkh/
[6] https://lore.kernel.org/lkml/2024011532-lustiness-hybrid-fc72@gregkh/
[7] https://lore.kernel.org/lkml/2024011519-mating-tag-1f62@gregkh/
[8] https://lore.kernel.org/lkml/2024011508-shakiness-resonant-f15e@gregkh/
[9] https://lore.kernel.org/lkml/2024011046-ecology-tiptoeing-ce50@gregkh/

Thanks

Takashi Sakamoto

Changed in linux (Ubuntu):
status: New → Fix Committed
In the upstream bug, mario.limonciello (mario.limonciello-linux-kernel-bugs) wrote:
Changed in linux:
status: In Progress → Fix Released