volume backup process block

Bug #1906784 reported by buliangjun
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
Incomplete
Low
Unassigned

Bug Description

When using a volume to create a backup, the whole process can't end due to child process is blocking.
Reproduce the problem through the following steps.
1)[root]# cinder create 7 --image b3b811d5-e438-4411-a4f7-8637659fb3bc --name test_backup

2)[root]# cinder backup-create --name test_backup 61061b93-986b-4802-9900-e7339f28d2a8
+-----------+--------------------------------------+
| Property | Value |
+-----------+--------------------------------------+
| id | c33140e9-9623-451d-8076-476cd5231cfd |
| name | test_backup |
| volume_id | 61061b93-986b-4802-9900-e7339f28d2a8 |
+-----------+--------------------------------------+

3)[root]# cinder list --all | grep test_backup
| 61061b93-986b-4802-9900-e7339f28d2a8 | e23429ab9610402f942aaeb2a6618791 | backing-up | test_backup | 7 | ssd | true | |
4)[root]# cinder backup-show c33140e9-9623-451d-8076-476cd5231cfd
+-----------------------+--------------------------------------+
| Property | Value |
+-----------------------+--------------------------------------+
| availability_zone | nova |
| container | None |
| created_at | 2020-11-20T05:46:30.000000 |
| data_timestamp | 2020-11-20T05:46:30.000000 |
| description | None |
| fail_reason | None |
| has_dependent_backups | False |
| id | c33140e9-9623-451d-8076-476cd5231cfd |
| is_incremental | False |
| name | test_backup |
| object_count | 0 |
| size | 7 |
| snapshot_id | None |
| status | creating |
| updated_at | 2020-11-20T05:46:30.000000 |
| volume_id | 61061b93-986b-4802-9900-e7339f28d2a8 |

The backup status will always in creating. here is Child process:

[root@node-8 ~]# ps -ef | grep rbd | grep diff
42407 25499 53947 0 Nov20 ? 00:03:17 rbd export-diff --id cinder --conf /tmp/tmpCg1B5n --pool volumes <email address hidden> -
42407 25515 53947 0 Nov20 ? 00:02:20 rbd import-diff --id cinderbackup --conf /etc/ceph/ceph.conf --pool backups - backups/volume-61061b93-986b-4802-9900-e7339f28d2a8.backup.base

we check the child process , here is info:
[root@node-8 ~]# strace -p 25499
strace: Process 25499 attached
write(2, "Exporting image", 15
[root@node-8 ~]# ll /proc/25499/fd/2
l-wx------ 1 42407 64 Dec 4 14:18 /proc/25499/fd/2 -> pipe:[3241294247]
[root@node-8 ~]#
[root@node-8 ~]# cat /proc/25499/fd/2
2020-11-20 13:46:33.181939 7f42d250bd40 -1 Errors while parsing config file!
2020-11-20 13:46:33.181945 7f42d250bd40 -1 read_conf: ignoring line 3 because it doesn't end with a newline! Please end the config file witha newline.
2020-11-20 13:46:33.183935 7f42d250bd40 -1 Errors while parsing config file!
2020-11-20 13:46:33.183940 7f42d250bd40 -1 read_conf: ignoring line 3 because it doesn't end with a newline! Please end the config file witha newline.
Exporting image: 100% complete...done.
[root@node-8 ~]#
when we read from "/proc/25499/fd/2", exporting image will go on ,and child process will close in the end.
#############
The main code:
#############

    def _piped_execute(self, cmd1, cmd2):
        """Pipe output of cmd1 into cmd2."""
        LOG.debug("Piping cmd1='%s' into...", ' '.join(cmd1))
        LOG.debug("cmd2='%s'", ' '.join(cmd2))

        try:
            p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE)
        except OSError as e:
            LOG.error(_LE("Pipe1 failed - %s "), e)
            raise

        # NOTE(dosaboy): ensure that the pipe is blocking. This is to work
        # around the case where evenlet.green.subprocess is used which seems to
        # use a non-blocking pipe.
        flags = fcntl.fcntl(p1.stdout, fcntl.F_GETFL) & (~os.O_NONBLOCK)
        fcntl.fcntl(p1.stdout, fcntl.F_SETFL, flags)

        try:
            p2 = subprocess.Popen(cmd2, stdin=p1.stdout,
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE)
        except OSError as e:
            LOG.error(_LE("Pipe2 failed - %s "), e)
            raise

        p1.stdout.close()
        stdout, stderr = p2.communicate()
        return p2.returncode, stderr

we do this test:
#######
test1:
######

import subprocess
import fcntl
import os

import eventlet
eventlet.monkey_patch()

def piped_execute(cmd1, cmd2):
    """Pipe output of cmd1 into cmd2."""
    try:
        p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
    except OSError as e:
        raise

    # NOTE(dosaboy): ensure that the pipe is blocking. This is to work
    # around the case where evenlet.green.subprocess is used which seems to
    # use a non-blocking pipe.
    flags = fcntl.fcntl(p1.stdout, fcntl.F_GETFL) & (~os.O_NONBLOCK)
    fcntl.fcntl(p1.stdout, fcntl.F_SETFL, flags)

    try:
        p2 = subprocess.Popen(cmd2, stdin=p1.stdout,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
    except OSError as e:
        raise

    p1.stdout.close()
    p1.stderr.close()

    stdout, stderr = p2.communicate()
    return p2.returncode, stderr

if __name__ == '__main__':
    cmd1 = ['rbd', 'export-diff', '--id', 'cinder', '--conf', '/tmp/tmpjLsT63', '--pool', 'volumes', '<email address hidden>', '-']
    cmd2 = ['rbd', 'import-diff', '--id', 'cinderbackup', '--conf', '/etc/ceph/ceph.conf', '--pool', 'backups', '-', 'backups/volume-61061b93-986b-4802-9900-e7339f28d2a8.backup.base']
    piped_execute(cmd1, cmd2)

########
test2:
########
import subprocess
import fcntl
import os

import eventlet
eventlet.monkey_patch()

def piped_execute(cmd1, cmd2):
    """Pipe output of cmd1 into cmd2."""
    try:
        p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE,
                              stderr=open('/tmp/stderr', 'w'))
    except OSError as e:
        raise

    # NOTE(dosaboy): ensure that the pipe is blocking. This is to work
    # around the case where evenlet.green.subprocess is used which seems to
    # use a non-blocking pipe.
    flags = fcntl.fcntl(p1.stdout, fcntl.F_GETFL) & (~os.O_NONBLOCK)
    fcntl.fcntl(p1.stdout, fcntl.F_SETFL, flags)

    try:
        p2 = subprocess.Popen(cmd2, stdin=p1.stdout,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
    except OSError as e:
        raise

    p1.stdout.close()

    stdout, stderr = p2.communicate()
    return p2.returncode, stderr

if __name__ == '__main__':
    cmd1 = ['rbd', 'export-diff', '--id', 'cinder', '--conf', '/tmp/tmpjLsT63', '--pool', 'volumes', '<email address hidden>', '-']
    cmd2 = ['rbd', 'import-diff', '--id', 'cinderbackup', '--conf', '/etc/ceph/ceph.conf', '--pool', 'backups', '-', 'backups/volume-61061b93-986b-4802-9900-e7339f28d2a8.backup.base']
    piped_execute(cmd1, cmd2)

in test1 we add p1.stderr.close() after p1.stdout.close()
in test2 we redirect p1.stderr to a file
The test results show that there is no problem.
so when use pipe like this:
p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p2 = subprocess.Popen(cmd2, stdin=p1.stdout,stdout=subprocess.PIPE, stderr=subprocess.PIPE)
data flow is:
p1 stdout ---> parent process ---> p2 stdin
but p1 stderr no place to read, If the data size exceeds the pipe size,child process p1 will block.
#######
test3:
######
import subprocess
import eventlet
eventlet.monkey_patch()

def piped_execute(size):
    cmd = 'dd if=/dev/urandom bs=1 count=%d 2>/dev/null' % size
    try:
        p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)
    except OSError as e:
        print("Pipe failed - %s " % e)
        raise
    p.wait()

if __name__ == '__main__':
    piped_execute(64 * 1024 + 1)
the result is: data size exceeds 64k process is block.
so in cinder backup, use subprocess.Popen make a child process to handle “rbd export-diff”, Should stderr processing be optimized?

Tags: backup ceph
Changed in cinder:
status: New → Incomplete
Revision history for this message
michael-mcaleer (mmcaleer) wrote :

Hi,

Can you please provide some additional information:
- Cinder version in use
- Cinder management driver in-use
- Does this behaviour persist when all commands are run outside of custom tests and in standalone mode with just CLI commands?
- Have you tried adding a new line to /etc/ceph/ceph.conf as indicated by the error message in the first section of your bug report?

2020-11-20 13:46:33.183935 7f42d250bd40 -1 Errors while parsing config file!
2020-11-20 13:46:33.183940 7f42d250bd40 -1 read_conf: ignoring line 3 because it doesn't end with a newline! Please end the config file witha newline.

tags: added: backup ceph
Revision history for this message
buliangjun (buliangjun) wrote :

- cinder version is N.
- Cinder management driver is ceph(rbd).
- Only sometimes,no distinction between command line and page.
- Under normal circumstances there will also be the following output
”“”
2020-11-20 13:46:33.181939 7f42d250bd40 -1 Errors while parsing config file!
2020-11-20 13:46:33.181945 7f42d250bd40 -1 read_conf: ignoring line 3 because it doesn't end with a newline! Please end the config file witha newline.
“”“

The reason why the size exceeds 64k is rbd image enabled striping feature, stderr will print repeat rate of progress when execu rbd export-diff.
so we report a bug :
https://github.com/ceph/ceph/pull/38399
https://tracker.ceph.com/issues/48431
Now the bug has been fixed, but for code robustness, should cinder also consider stderr overflow scenarios?

Revision history for this message
buliangjun (buliangjun) wrote :

And I have a patch:

Index: cinder/backup/drivers/ceph.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- cinder/backup/drivers/ceph.py (revision ab7d3c7b09198c1aa02cbb59d26f056b4c5e809d)
+++ cinder/backup/drivers/ceph.py (date 1607515011752)
@@ -518,6 +518,7 @@
             raise

         p1.stdout.close()
+ p1.stderr.close()
         stdout, stderr = p2.communicate()
         return p2.returncode, stderr

Changed in cinder:
importance: Undecided → Low
assignee: nobody → Sofia Enriquez (lsofia-enriquez)
Changed in cinder:
assignee: Sofia Enriquez (lsofia-enriquez) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.