lockf returns false-positive EDEADLK in multiprocess multithreaded environment

Bug #1926481 reported by Ivan Zuboff
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

As a developer, I found a counter-intuitive behavior in lockf function provided by glibc and Linux kernel that is likely a bug.

In glibc, lockf function is implemented on top of fcntl system call: https://github.com/lattera/glibc/blob/master/io/lockf.c
man page says that lockf can sometimes detect deadlock: http://manpages.ubuntu.com/manpages/xenial/man3/lockf.3.html
Same with fcntl(F_SETLKW), on top of which lockf is implemented: http://manpages.ubuntu.com/manpages/hirsute/en/man3/fcntl.3posix.html

Deadlock detection algorithm (https://github.com/torvalds/linux/blob/master/fs/locks.c) seems buggy because it can easily give false positives. Suppose we have two processes A and B, process A has threads 1 and 2, process B has threads 3 and 4. When this processes execute concurrently, following sequence of actions is possible:
1. processA thread1 gets lockI
2. processB thread2 gets lockII
3. processA thread3 tries to get lockII, starts to wait
4. processB thread4 tries to get lockI, kernel detects deadlock, EDEADLK is returned from lockf function

Steps to reproduce this scenario (see attached file):
1. gcc -o edeadlk ./edeadlk.c -lpthread
2. Launch "./edeadlk a b" in first terminal window.
3. Launch "./edeadlk a b" in second terminal window.

What I expected to happen: two instances of the program is steadily working.

What happened instead:
Assertion failed: (lockf(fd, 1, 1)) != -1 file: ./edeadlk.c, line:25, errno:35 . Error:: Resource deadlock avoided
Aborted (core dumped)

Surely, this behavior is kind of "right". lockf file locks belongs to process, so on the process level it seems that deadlock is just about to happen: process A holds lockI and waits for lockII, process B holds lockII and is going to wait for lockI. However, algorithm in kernel doesn't take threads into account. In fact, deadlock is not gonna happen here if thread scheduler will give control to some thread holding a lock.

I think there's a problem with deadlock detection algorithm because it's overly pessimistic, which in turn creates problems -- lockf errors in applications.

# lsb_release -rd
Description: Ubuntu 20.04.1 LTS
Release: 20.04

# uname -a
Linux test-x64-ub20 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 ноя 12 09:53 seq
 crw-rw---- 1 root audio 116, 33 ноя 12 09:53 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.4
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: pass
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2020-07-22 (280 days ago)
InstallationMedia: Ubuntu-Server 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 004: ID 0a89:0030 Aktiv Rutoken ECP
 Bus 002 Device 003: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
 Bus 002 Device 002: ID 0e0f:0003 VMware, Inc. Virtual Mouse
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Lsusb-t:
 /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
     |__ Port 1: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 12M
     |__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/7p, 12M
         |__ Port 1: Dev 4, If 0, Class=Chip/SmartCard, Driver=, 12M
 /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M
MachineType: VMware, Inc. VMware Virtual Platform
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=ru_RU.UTF-8
 SHELL=/bin/bash
ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-42-generic root=UUID=1d0fe562-8128-4922-836e-b0a384fc5054 ro maybe-ubiquity
ProcVersionSignature: Ubuntu 5.4.0-42.46-generic 5.4.44
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-42-generic N/A
 linux-backports-modules-5.4.0-42-generic N/A
 linux-firmware 1.187.2
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal uec-images
Uname: Linux 5.4.0-42-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 09/21/2015
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd09/21/2015:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1926481

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: focal
Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : CRDA.txt

apport information

tags: added: apport-collected uec-images
description: updated
Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : Lspci.txt

apport information

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : Lspci-vt.txt

apport information

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : Lsusb-v.txt

apport information

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : ProcModules.txt

apport information

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : UdevDb.txt

apport information

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote :

Posted info to LKML with topic "lockf returns false-positive EDEADLK in multiprocess multithreaded environment": https://lkml.org/lkml/2021/5/15/287

Revision history for this message
Ivan Zuboff (anotherdiskmag) wrote :

Posted to more specific mailing list: linux-fsdevel.vger.kernel.org
https://lore.kernel.org/linux<email address hidden>/T/#u

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.