rsync hangs on select(5, [], [4], [], {60, 0}

Bug #1528921 reported by David Lawrence
42
This bug affects 7 people
Affects Status Importance Assigned to Milestone
rsync
Unknown
Unknown
rsync (Ubuntu)
Fix Released
Medium
Unassigned
Bionic
Fix Released
Medium
Miriam España Acebal
Focal
Fix Released
Medium
Miriam España Acebal

Bug Description

[Impact]

What the user suffering from this bug experiences is that the big amount of informative messages related to the copy process with the three spawned processes(sender, receiver and generator) exhaust the I/O buffer and the sync gets stuck, either because there are too many files to synchronise and/or because too many detail messages (levels of verbose mode) have been requested in the output.

The fix, that comes from upstream and is applied there since version 3.2.0., increments the size of the receiver's I/O buffer.

[Test Plan]
This test plan is for Focal, but it's the same for Bionic.

0.Preparing the test environment:

#Preparing the container
lxc launch images:ubuntu/focal rsync-iobuffer-focal
lxc shell rsync-iobuffer-focal
apt update -y
apt upgrade -y

#Installing necessary tools
apt install rsync

#Get test cases from comments #16 and #19 on this LP bug: As test case #16 covers both aspects (a lot of files and upper verbosity) and test #19 uses a huge tarball (120 Mb), I'm removing from this SRU the #19 scenario (but, please, feel to reach me it if you consider it necessary and I'll provide the steps and bad/good scenarios).

cd /tmp/

#16
Paste the contents of https://pastebin.com/raw/ctzJJGwt:

#!/bin/bash
mkdir source_dir
pushd source_dir
dd if=/dev/zero of=source bs=600K count=1

for i in `seq 1 11500`;
do
    cp -v source file_$i;
done

rm source

for i in `seq 1 10`;
do
     dd if=/dev/zero of=file_large_$i bs=200M count=1
done

popd

echo "Created 11500 files with size 600K and 10 files with size 200M, try the following command:"
echo "rsync -avvvz --delete source_dir target_dir"

in a new file script_comment16.sh

chmod +x script_comment16.sh
./script_comment16.sh

1. Bad cases (without and with using strace):

# Scenario from comment 16
$ rsync -avvvz --delete source_dir target_dir
sending incremental file list
[sender] make_file(source_dir,*,0)
send_file_list done
[sender] pushing local filters for /root/source_dir/
[sender] make_file(source_dir/file_3048,*,2)
[sender] make_file(source_dir/file_11358,*,2)
[sender] make_file(source_dir/file_5914,*,2)
[sender] make_file(source_dir/file_5880,*,2)
[sender] make_file(source_dir/file_9318,*,2)
[sender] make_file(source_dir/file_5539,*,2)
[...]
sending file_sum
false_alarms=0 hash_hits=0 matches=0
sender finished source_dir/file_10807
send_files(903, source_dir/file_10808)
send_files mapped source_dir/file_10808 of size 614400
calling match_sums source_dir/file_10808
source_dir/file_10808

It hangs here, where using strace we can see:

$ strace rsync -avvvz --delete source_dir target_dir
source_dir/file_11280
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144) = 262144
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144) = 262144
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 90112) = 90112
select(6, [5], [4], [5], {tv_sec=60, tv_usec=0}) = 1 (in [5], left {tv_sec=59, tv_usec=999996})
read(5, "\0\0\0\0\0\0\0\1\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\240\0\0\0"..., 1900) = 1900
select(5, [], [4], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
select(5, [], [4], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
select(5, [], [4], [], {tv_sec=60, tv_usec=0}

1. Good cases:

# Scenario from comment 16

$ rsync -avvvz --delete source_dir target_dir
sending incremental file list
[sender] make_file(source_dir,*,0)
send_file_list done
[sender] pushing local filters for /tmp/source_dir/
[sender] make_file(source_dir/file_3052,*,2)
[sender] make_file(source_dir/file_1766,*,2)
[sender] make_file(source_dir/file_10466,*,2)
[sender] make_file(source_dir/file_9375,*,2)
[sender] make_file(source_dir/file_7260,*,2)
[sender] make_file(source_dir/file_5554,*,2)
[sender] make_file(source_dir/file_5523,*,2)
[sender] make_file(source_dir/file_1685,*,2)
[sender] make_file(source_dir/file_7217,*,2)
[sender] make_file(source_dir/file_10411,*,2)
[...]
generate_files finished

sent 9,555,678 bytes received 3,599,560 bytes 124,694.20 bytes/sec
total size is 9,162,752,000 speedup is 696.51

[Where problems could occur]

Perhaps the buffer size may not be sufficient for an operation involving a very huge amount of data, as reported upstream here ( https://bugzilla.samba.org/show_bug.cgi?id=11166#c9 ): the reporter asked for a parameter to manually change the size of the receiver's buffer I/O for messaging for those cases, but it may lend itself to a memory overflow more easily even than this solution which also has this risk.

[Other Info]

The fix has been picked from upstream here https://github.com/WayneD/rsync/commit/01b9bbb0f94e4ddd4b1e3ad678b220ca09bc0ad0 . I've checked on bugzilla that there aren't bugs related to "out of memory" issues after 3.2.0 release.

[Original Report]
-------------------------------------------------------------------
In the last few months my home directory backup stopped completing. I've been able to reproduce the problem on a single subdirectory although I had to add the --debug=all flag to reproduce it on that smaller directory. Specifically, this command never completes:

rsync --debug=all -avz /tmp/html2 /tmp/rsynctest/

The html2 directory is a copy of gnuradio-3.7.8.1/build/docs/doxygen/html .

When I strace the command, I see this:
write(1, "sender finished /tmp/html2/atsc_"..., 58sender finished /tmp/html2/atsc__interleaver_8h__incl.md5
) = 58
write(1, "send_files(338, /tmp/html2/atsc_"..., 59send_files(338, /tmp/html2/atsc__interleaver_8h__incl.png)
) = 59
open("html2/atsc__interleaver_8h__incl.png", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0664, st_size=264657, ...}) = 0
write(1, "html2/atsc__interleaver_8h__incl"..., 37html2/atsc__interleaver_8h__incl.png
) = 37
read(3, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\n\253\0\0\2\233\10\6\0\0\0h\242\""..., 262144) = 262144
select(6, [5], [4], [5], {60, 0}) = 2 (in [5], out [4], left {59, 999996})
read(5, "\0\0\0\0\0\0\0\1\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\240\0\0\0"..., 95) = 95
write(4, "r\311\0\7\177\377\232\237\264\272e\300\300\240\316\264&\314\301\252*\37\256y\225g\373^\315j\370\350"..., 51574) = 51574
select(5, [], [4], [], {60, 0}) = 1 (out [4], left {59, 999997})
write(4, "\7\320\0\7\177\377\234|\7X\223Y\273\255c\27\25f\306\212\202\214#E\272\212t\1\225A\fU"..., 53259) = 53259
select(5, [], [4], [], {60, 0}

The select command times out over and over. I get the same behavior when trying to back up my entire home directory but I don't need the --debug=all flag in that case.

lsb_release -rd
Description: Ubuntu 14.04.3 LTS
Release: 14.04

apt-cache policy rsync
rsync:
  Installed: 3.1.0-2ubuntu0.1
  Candidate: 3.1.0-2ubuntu0.1
  Version table:
 *** 3.1.0-2ubuntu0.1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main i386 Packages
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main i386 Packages
        100 /var/lib/dpkg/status
     3.1.0-2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main i386 Packages

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: rsync 3.1.0-2ubuntu0.1
ProcVersionSignature: Ubuntu 3.13.0-74.118-generic 3.13.11-ckt30
Uname: Linux 3.13.0-74-generic i686
NonfreeKernelModules: nvidia
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: i386
CurrentDesktop: KDE
Date: Wed Dec 23 09:44:17 2015
EcryptfsInUse: Yes
InstallationDate: Installed on 2010-09-18 (1922 days ago)
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Beta i386 (20100901.1)
SourcePackage: rsync
UpgradeStatus: Upgraded to trusty on 2014-12-27 (361 days ago)

Related branches

Revision history for this message
David Lawrence (david-gary-lawrence) wrote :
Revision history for this message
David Lawrence (david-gary-lawrence) wrote :

The same problem happens on a 15.04 system:

lsb_release -rd
Description: Ubuntu 15.04
Release: 15.04
stl@stl6:/tmp$ apt-cache policy rsync
rsync:
  Installed: 3.1.1-3
  Candidate: 3.1.1-3
  Version table:
 *** 3.1.1-3 0
        500 http://us.archive.ubuntu.com/ubuntu/ vivid/main amd64 Packages
        100 /var/lib/dpkg/status

Revision history for this message
David Lawrence (david-gary-lawrence) wrote :

The same problem also happens with rsync from today's git HEAD and bc58313bf7f42f0993.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in rsync (Ubuntu):
status: New → Confirmed
Revision history for this message
Rolf Leggewie (r0lf) wrote :

I believe I am seeing the same thing in a Xenial live CD. I am rsyncing some files and it seems to hang at very long filenames (see also bug 853243).

Here is my strace

[pid 6718] open("@home/.ecryptfs/leggewie/.Private/ECRYPTFS_FNEK_ENCRYPTED.FWaB.wglBSBqL-S0.GolcOiWlN9QsFFhUYxIOE8BunOtEkZy9GYiELzwV---/ECRYPTFS_FNEK_ENCRYPTED.FWaB.wglBSBqL-S0.GolcOiWlN9QsFFhUYxISb08KdmZleYyQQ1I95Yia---/ECRYPTFS_FNEK_ENCRYPTED.FXaB.wglBSBqL-S0.GolcOiWlN9QsFFhUYxIYNK1A5XdKl-lyemTrjUWJJGGQlgIy34eRr2x1j.TO0A-/ECRYPTFS_FNEK_ENCRYPTED.FWaB.wglBSBqL-S0.GolcOiWlN9QsFFhUYxI3NSHFt8OM6gX7ukm8DHAOE--/ECRYPTFS_FNEK_ENCRYPTED.FWaB.wglBSBqL-S0.GolcOiWlN9QsFFhUYxIPBMMS43iVqSCxqg7UMcDek--/ECRYPTFS_FNEK_ENCRYPTED.FYaB.wglBSBqL-S0.GolcOiWlN9QsFFhUYxIwyvVprx9USwuR1ZKdjg9COGVtAWJlxBGiAyA2e7BSu1yVD-TA1R4VNVEoD.nsWwR/ECRYPTFS_FNEK_ENCRYPTED.FWaB.wglBSBqL-S0.GolcOiWlN9QsFFhUYxIK6Ea9AYOxoE5onGKlG8ti---/ECRYPTFS_FNEK_ENCRYPTED.FWaB.wglBSBqL-S0.GolcOiWlN9QsFFhUYxIZxxEdH0i.fdcVkoJ7fTSo---", O_RDONLY) = 3
[pid 6718] fstat(3, {st_mode=S_IFREG|0644, st_size=12288, ...}) = 0
[pid 6718] write(1, "@home/.ecryptfs/leggewie/.Privat"..., 774) = 774
[pid 6718] read(3, 0x5613ff3fb9f0, 12288) = -1 EIO (Input/output error)
[pid 6718] select(6, [5], [4], [5], {60, 0}) = 1 (in [5], left {59, 999996})
[pid 6718] read(5, "eYyQQ1I95Yia---/ECRYPTFS_FNEK_EN"..., 76) = 76
[pid 6718] select(5, [], [4], [], {60, 0} <unfinished ...>
[pid 6719] <... select resumed> ) = 0 (Timeout)
[pid 6719] select(2, [], [1], [], {60, 0} <unfinished ...>
[pid 6720] <... select resumed> ) = 0 (Timeout)
[pid 6720] select(5, [], [4], [], {60, 0} <unfinished ...>
[pid 6718] <... select resumed> ) = 0 (Timeout)

Changed in rsync (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Rolf Leggewie (r0lf) wrote :
Revision history for this message
Rolf Leggewie (r0lf) wrote :

David, are you sure that the file html2/atsc__interleaver_8h__incl.png is readable? It seems that in my case, rsync chokes on corrupted files on a btrfs partition. "md5sum $file" gives a read error in my case, for example. It seems that throws off rsync.

Revision history for this message
David Lawrence (david-gary-lawrence) wrote : Re: [Bug 1528921] Re: rsync hangs on select(5, [], [4], [], {60, 0}

On Thu, Aug 25, 2016 at 9:45 AM, Rolf Leggewie <email address hidden>
wrote:

> David, are you sure that the file html2/atsc__interleaver_8h__incl.png
> is readable? It seems that in my case, rsync chokes on corrupted files
> on a btrfs partition. "md5sum $file" gives a read error in my case, for
> example. It seems that throws off rsync.
>

Yea, it's readable:

md5sum /tmp/html2/atsc__interleaver_8h__incl.png
8b1df160420124ed28fb440b112a212a /tmp/html2/atsc__interleaver_8h__incl.png

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
I was parsing through issues that had no update for too much time.
I think the issue this gets no further activity is that there is no reproducer in a sense of "this is the way to reproduce" - it is confirmed which reflects multiple people hitting it and likely to exist which is just right.
But at the current state it is not really actionable for ~ubuntu-server, so for now I'm unsubscribing it for now. If there is any new insight it will show up on triage of the team anyway.

Revision history for this message
Sir Pamelton (spzm) wrote :
Download full text (4.8 KiB)

A few weeks ago one of my five rsync backup processes that runs from a daily cron 03:00 am has started failing with this exact issue.

I can only manually reproduce it when executing my rsync command with tripe verbosity:

rsync -avvv --delete /source_dir /target_dir

Manually calling rsync with less than triple verbosity runs the whole process as expected. However the hanging command used in cron is

$ rsync -aq --delete /source_dir /target_dir > /dev/null

$ strace -p 16857
16857 16:38:37 lstat("Saved/SavedArks/retracted.profilebak", {st_mode=S_IFREG|0600, st_size=22986, ...}) = 0
16857 16:38:37 write(1, "[sender] make_file(Saved/SavedArks/retracted.profilebak,*,2)\n", 69) = 69
16857 16:38:37 lstat("Saved/SavedArks/retracted.arkprofile", {st_mode=S_IFREG|0600, st_size=1810, ...}) = 0
16857 16:38:37 write(1, "[sender] make_file(Saved/SavedArks/retracted.arkprofile,*,2)\n", 69) = 69
16857 16:38:37 select(5, [], [4], [], {60, 0} <unfinished ...>
16858 16:39:36 <... select resumed> ) = 0 (Timeout)
16858 16:39:36 select(2, [], [1], [], {60, 0} <unfinished ...>
16859 16:39:37 <... select resumed> ) = 0 (Timeout)
16859 16:39:37 select(5, [], [4], [], {60, 0} <unfinished ...>
16857 16:39:37 <... select resumed> ) = 0 (Timeout)
16857 16:39:37 select(5, [], [4], [], {60, 0}

$ lsof -p 15678 -ad 4,5 # note: I used multiple processes over multiple days, the PID will not be constant
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rsync 15678 pamel 4u unix 0xffff8f88f9fec400 0t0 44294 type=STREAM
rsync 15678 pamel 5u unix 0xffff8f88f9614800 0t0 44295 type=STREAM

$ ls -l /proc/15678/fd/4
lrwx------ 1 pamel pamel 64 Mar 6 13:35 /proc/15678/fd/4 -> socket:[44294]
$ ls -l /proc/15678/fd/5
lrwx------ 1 pamel pamel 64 Mar 6 13:35 /proc/15678/fd/5 -> socket:[44295]

$ ss -nxp | grep -E '44294|44295'
u_str ESTAB 0 59136 * 44296 * 44295 users:(("rsync",pid=15679,fd=1))
u_str ESTAB 0 97024 * 44294 * 44293 users:(("rsync",pid=15678,fd=4))
u_str ESTAB 82063 0 * 44293 * 44294 users:(("rsync",pid=15680,fd=0))
u_str ESTAB 53885 0 * 44295 * 44296 users:(("rsync",pid=15678,fd=5))

I have this issue on Ubuntu 17.04 with rsync 3.1.2, Debian 9.3 with rsync 3.1.2 and 3.1.3 (all as VM on ESXi 6.5). The only constants are the files that are being transferred: 6.6GB over 11'437 files. I am also certain that it does not hang on a specific file as it's sometimes a log file and sometimes a profile. I would say it happens 0.5s into the process after transferring about 44 files.

$ rsync -avvv --delete /srv/ark/redacted/ShooterGame/Saved /srv/ark/backup/redacted/
[...]
[sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2)
[sender] make_file(Saved/SavedArks/redacted.arktribe,*,2)
[sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2)
send_files(29, /srv/ark/redacted/ShooterGame/Saved/Logs)
Saved/Logs/
send_files(30, /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log)
send_files mapped /srv/ark/redacted/ShooterGam...

Read more...

Revision history for this message
David Britton (dpb) wrote :

Thanks for the nice update on the bug. This is clearly something you can reproduce and are able to trigger. However, I agree with your analysis that it could be a network configuration issue, but one that requires some deep support in order to get to the bottom of.

I suggest that you carry on debugging, and even use the bug for more debugging information. As a suggestion, you might get more visibility for deep support on rsync issues at:

https://askubuntu.com/

or at an rsync specific forum. I will mark this as Incomplete for now, if you feel there is an actionable bug to fix in rsync, you can always re-open or ask on this bug.

Changed in rsync (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Rolf Leggewie (r0lf) wrote :

What's the purpose of setting to incomplete when there is no question to be completed by the OP? To hopefully expire it and close another ticket without fixing the bug? What a silly thing.

Changed in rsync (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Rolf Leggewie (r0lf) wrote (last edit ):

Furthermore, in my case at least, no network is involved, so this is not a network issue (could be a timeout issue, though),

Revision history for this message
David Britton (dpb) wrote :

As @spzm seems to have a reproduction in a very specific case and was asking for next steps, I suggest engagement of rsync upstream:

https://pserver.samba.org/rsync/lists.html

The level of deep rsync expertise necessary to turn this problem into something actionable is likely watching over there. If a thread is started, updating this bug with a link to the thread would be helpful.

Revision history for this message
Sir Pamelton (spzm) wrote :

I have here a script with which I was able to reproduce this issue on Debian 7, Debian 9, Ubuntu 17.04 for rsync 3.1.2 (Ubuntu 17.04 apt repo) and 3.1.3 (built from source).

https://pastebin.com/ctzJJGwt

It essentially creates a folder with 11.5k files of size 600KB and 10 files of size 200MB, however I think those 10 files are not necessary to trigger the fault.

Using this command

rsync -avvvz --delete source_dir target_dir

I've also built rsync 3.0.9 which runs this command fine.

I suppose at this point it's a rsync issue, I will submit this to the rsync bug tracker. For now using rsync 3.0.9 as a workaround is fine for me. I appreciate any feedback if anyone else is able to reproduce this issue using the supplied script.

Revision history for this message
opi (opi-gmx) wrote :

From the WSL Users, they have luck with the -w option. But it doesn't help on my Ubuntu 18.04 system.
https://github.com/Microsoft/WSL/issues/2138

Revision history for this message
David Lawrence (david-gary-lawrence) wrote :

I am still able to reproduce this problem with rsync 3.1.3 with the exact same test case about 3 years after I originally reported it. I'm running Ubuntu 18.04 which comes with 3.1.2, but I compiled 3.1.3 from source and get the same behavior. I'm also able to reproduce the problem (although it hangs on a different file), on an independent 18.04.1 system from the one that I originally observed the problem. On the other system, I see this strace output:
strace rsync --debug=all -avz /tmp/html2 /tmp/rsynctest/
<snip>
read(3, "db1567566d32ccb07257a842a3bdc8a5", 32) = 32
close(3) = 0
write(1, "sender finished /tmp/html2/atsc_"..., 50sender finished /tmp/html2/atsc__pad_8h__incl.md5
) = 50
write(1, "send_files(343, /tmp/html2/atsc_"..., 51send_files(343, /tmp/html2/atsc__pad_8h__incl.png)
) = 51
open("html2/atsc__pad_8h__incl.png", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=280274, ...}) = 0
write(1, "html2/atsc__pad_8h__incl.png\n", 29html2/atsc__pad_8h__incl.png
) = 29
read(3, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\n\333\0\0\2\345\10\6\0\0\0\257\303\313"..., 262144) = 262144
select(6, [5], [4], [5], {60, 0}) = 2 (in [5], out [4], left {59, 999998})
read(5, "\0\0\0\0\0\0\0\1\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\240\0\0\0"..., 95) = 95
write(4, "z\375\0\7\177\377\217\246\253\200\252\252\351\3324H\203t\267H\203t7\10(\35\322)!Hw7"..., 64894) = 64894
select(5, [], [4], [], {60, 0}) = 1 (out [4], left {59, 999998})
write(4, "\202\312\0\7\177\377|\275\5XU[\3245|\24\5i\224\20I\25A:\244[\244\21\tI\351\356"..., 51846) = 51846
select(5, [], [4], [], {60, 0}) = 0 (Timeout)
select(5, [], [4], [], {60, 0}) = 0 (Timeout)
select(5, [], [4], [], {60, 0}

Revision history for this message
David Lawrence (david-gary-lawrence) wrote :

This is a tarball of the directory I have used to reproduce the problem. For me, reproducing the bug is as simple as:

cd /tmp/
tar -xvzf html2.tgz
mkdir /tmp/rsynctest/
strace rsync --debug=all -avz /tmp/html2 /tmp/rsynctest/

Revision history for this message
Stoo Davies (stooneb) wrote :

We recently had the same problem, and it appears it could be due to the debug output.
Calling rsync with -vvv hangs at the select, but with -vv (or less than 2) it works.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

In what rsync version and ubuntu release did you see it, @stooneb?

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Has anybody engaged with upstream, as suggested by david in comment #15? Filing a bug with Debian would also help, as we are carrying the same package with no ubuntu modifications.

Revision history for this message
Adam Purkrt (adam-purkrt-deactivatedaccount) wrote :

I found these two probably related bug reports:
https://bugzilla.samba.org/show_bug.cgi?id=13913
https://bugzilla.samba.org/show_bug.cgi?id=13109

I am hitting this kind of bug when rsyncing a maildir (36000 files).

Revision history for this message
Adam Purkrt (adam-purkrt-deactivatedaccount) wrote :

# rsync --version
rsync version 3.1.3 protocol version 31

Revision history for this message
Paride Legovini (paride) wrote :

Thank you for your feedback. @Stoo: if `rsync -vvv` is the culprit then [1] is the right upstream bug, however there are so many (mostly stale) "rsync hangs" upstream bugs [2] that I'm not sure which one is the correct one to follow-up to here. The command in the description of this report only has a single -v after all.

I think it won't be easy to make progress here without a reliable way to reproduce the issue and without the involvement of the upstream developers. What I suggest is to try to narrow the steps needed to reproduce the hang, check if it still affects the latest upstream release, and try to make the most appropriate bug move forward.

[1] https://bugzilla.samba.org/show_bug.cgi?id=11166
[2] https://bugzilla.samba.org/show_bug.cgi?id=11166#c2

Revision history for this message
ipkpjersi (ipkpjersi) wrote :

I can replicate this issue on Ubuntu 18.04 with kernel 4.15.0-101-generic x86_64 and rsync 3.1.2 protocol version 31.

As a workaround for rsync verbose logging with the -vvv flag, you can instead pass rsync to strace for tracing the system calls and signals made by rsnc by running "sudo strace rsync". You can still see the commands and params that are being used by rsync much like -vvv except strace does not cause it to get stuck syncing plus if you have some other sort of timeout, strace will also show that.

I hope this is helpful to at least one person reading this looking for a workaround.

Revision history for this message
Miriam España Acebal (mirespace) wrote :

Hi,

I checked this bug is still present in Bionic and Focal (versions here are 3.1.?) but neither in Jammy nor in Impish (versions here are > 3.2.?). I used as steps to reproduce instructions in comments #16 and #19.

As per comment 7 in upstream's bug 11166 [1], it seems this is fixed in 3.2.0 version, and in the changelog of that version we can see "Avoid a hang when an overabundance of messages clogs up all the I/O buffers." [2] and the fix seems to be here [3].

[1] https://bugzilla.samba.org/show_bug.cgi?id=11166#c7
[2] https://download.samba.org/pub/rsync/NEWS#3.2.0
[3] https://github.com/WayneD/rsync/commit/01b9bbb0f94e4ddd4b1e3ad678b220ca09bc0ad0

tags: added: server-todo
Revision history for this message
Rolf Leggewie (r0lf) wrote :

Thanks, Miriam. Your bug work is appreciated!

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Miriam - thanks for the check of this case (FYI we are trying to recheck many old forgotten bugs). Seems this one is a great candidate for a fix now since we have a testcase, a fix and interested community.

Given this is open for quite some time there is no need to rush it, but as time permits it would be great to come up with something to get this case forward.

I agree that it thereby shall go onto our todo list and we will look for someone to work on it for a backport-try and a PPA for this.

Changed in rsync (Ubuntu Bionic):
assignee: nobody → Miriam España Acebal (mirespace)
Changed in rsync (Ubuntu Focal):
assignee: nobody → Miriam España Acebal (mirespace)
Revision history for this message
Miriam España Acebal (mirespace) wrote :

Thanks @r0lf! Thanks @Christian! I will take care of this bug.

description: updated
Changed in rsync (Ubuntu Focal):
status: New → In Progress
description: updated
description: updated
description: updated
Changed in rsync (Ubuntu Bionic):
status: New → In Progress
Revision history for this message
Robie Basak (racb) wrote :

Upstream 3.2.0 includes the identified upstream fix and Jammy has 3.2.3-8ubuntu2 so this is Fix Released for the development release. Impish is also > 3.2.0 so only Focal and older are affected.

Changed in rsync (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello David, or anyone else affected,

Accepted rsync into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/rsync/3.1.3-8ubuntu0.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in rsync (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Changed in rsync (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
Revision history for this message
Robie Basak (racb) wrote :

Hello David, or anyone else affected,

Accepted rsync into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/rsync/3.1.2-2.1ubuntu1.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (rsync/3.1.2-2.1ubuntu1.3)

All autopkgtests for the newly accepted rsync (3.1.2-2.1ubuntu1.3) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

lxc/3.0.3-0ubuntu1~18.04.1 (ppc64el)
s3ql/2.26+dfsg-4 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#rsync

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (rsync/3.1.3-8ubuntu0.2)

All autopkgtests for the newly accepted rsync (3.1.3-8ubuntu0.2) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

dgit/9.10 (arm64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#rsync

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Miriam España Acebal (mirespace) wrote :

All tests passed already.

Revision history for this message
Miriam España Acebal (mirespace) wrote :

Verification for Focal... OK:

-> Availability from proposed

root@focal:~# apt list --upgradable -a | grep rsync

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

rsync/focal-proposed 3.1.3-8ubuntu0.2 amd64 [upgradable from: 3.1.3-8ubuntu0.1]
rsync/focal-updates,now 3.1.3-8ubuntu0.1 amd64 [installed,upgradable to: 3.1.3-8ubuntu0.2]
rsync/focal 3.1.3-8 amd64

-> After installing, executing the test for scenario 16:

root@focal:~# rsync -avvvz --delete source_dir target_dir
.
.
send files finished
total: matches=0 hash_hits=0 false_alarms=0 data=9162752000
recv_files phase=2
recv_files finished
generate_files phase=3
generate_files finished

sent 9,750,166 bytes received 3,805,745 bytes 186,978.08 bytes/sec
total size is 9,162,752,000 speedup is 675.92
[sender] _exit_cleanup(code=0, file=main.c, line=1207): about to call exit(0)
root@nagios-focal:~#

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Miriam España Acebal (mirespace) wrote :

Verification for Focal... OK:

-> Availability from proposed:
root@nagios:~# apt list --upgradable -a | grep rsync

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

rsync/focal-proposed 3.1.3-8ubuntu0.2 amd64 [upgradable from: 3.1.3-8ubuntu0.1]
rsync/focal-updates,now 3.1.3-8ubuntu0.1 amd64 [installed,upgradable to: 3.1.3-8ubuntu0.2]
rsync/focal 3.1.3-8 amd64

-> After installing, executing test for scenario 16:
root@nagios:~# rsync -avvvz --delete source_dir target_dir
.
.
.
send files finished
total: matches=0 hash_hits=0 false_alarms=0 data=9162752000
recv_files phase=2
recv_files finished
generate_files phase=3
generate_files finished

sent 9,750,166 bytes received 3,805,745 bytes 186,978.08 bytes/sec
total size is 9,162,752,000 speedup is 675.92
[sender] _exit_cleanup(code=0, file=main.c, line=1207): about to call exit(0)
root@nagios:~#

Revision history for this message
Miriam España Acebal (mirespace) wrote :

Verification for Bionic... OK:

-> Availability from proposed:

root@bionic:/tmp# apt-cache policy rsync
rsync:
  Installed: 3.1.2-2.1ubuntu1.3
  Candidate: 3.1.2-2.1ubuntu1.3
  Version table:
 *** 3.1.2-2.1ubuntu1.3 500
        500 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     3.1.2-2.1ubuntu1.2 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
     3.1.2-2.1ubuntu1.1 500
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     3.1.2-2.1ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages

-> After installing, executing test for scenario 16:
root@bionic:/tmp#
.
.
.
send_files phase=2
send files finished
total: matches=0 hash_hits=0 false_alarms=0 data=9162752000
recv_files phase=2
recv_files finished
generate_files phase=3
generate_files finished

sent 9,750,855 bytes received 3,629,561 bytes 201,209.26 bytes/sec
total size is 9,162,752,000 speedup is 684.79
[sender] _exit_cleanup(code=0, file=main.c, line=1196): about to call exit(0)
root@bionic:/tmp#

tags: added: verification-done-bionic verification-needed-focal
removed: verification-done-focal verification-needed-bionic
tags: added: verification-done
removed: verification-needed
Revision history for this message
Robie Basak (racb) wrote :

Focal is in freeze for the point release, but I don't see there being any practical issue with releasing to Bionic now. It seems unlikely that a user would be surprised by a regression on this issue upgrading from Bionic to Focal in the next week.

Revision history for this message
Robie Basak (racb) wrote : Update Released

The verification of the Stable Release Update for rsync has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rsync - 3.1.2-2.1ubuntu1.3

---------------
rsync (3.1.2-2.1ubuntu1.3) bionic; urgency=medium

  * d/p/avoid-deadlock-huge-amounts-verbose-messages.patch:
    Allow the receiver to increase their iobuf.msg xbuf if it fills
    up. This ensures that the receiver will never block trying to
    output a message, and thus it will always drain the data from
    the sender and keep the whole thing from clogging up. Thanks to
    Wayne Davison <email address hidden>. (LP: #1528921)

 -- Miriam España Acebal <email address hidden> Tue, 08 Feb 2022 13:26:20 +0100

Changed in rsync (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

There was a bad tag update before, this is verified on Focal as well.
Fixed now ...

tags: added: verification-done-focal
removed: verification-needed-focal
Changed in rsync (Ubuntu Focal):
importance: Undecided → Medium
Changed in rsync (Ubuntu Bionic):
importance: Undecided → Medium
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rsync - 3.1.3-8ubuntu0.2

---------------
rsync (3.1.3-8ubuntu0.2) focal; urgency=medium

  * d/p/avoid-deadlock-huge-amounts-verbose-messages.patch:
    Allow the receiver to increase their iobuf.msg xbuf if it fills
    up. This ensures that the receiver will never block trying to
    output a message, and thus it will always drain the data from
    the sender and keep the whole thing from clogging up. Thanks to
    Wayne Davison <email address hidden>. (LP: #1528921)

 -- Miriam España Acebal <email address hidden> Mon, 07 Feb 2022 22:46:19 +0100

Changed in rsync (Ubuntu Focal):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.