These are the steps for a synthetic reproducer with GDB and QEMU.
It's sufficient to validate the change on QEMU and move forward with SRUs.
(We can add libvirt for reaching the error message too, but it's a plus.)
I'll check/coordinate on the QEMU uploads next week.
cheers,
Mauricio
...
Original code.
The race condition is, one thread can read 's->state' as SETUP (line 1078/1083/1086),
and another thread changes it to ACTIVE _before_ this thread assigns (the different)
's->state' to 'info->status' (line 1123), which will NOT have RAM info expected for
the ACTIVE status (line 1087/1096).
GDB breaks on change from migration status NONE to SETUP.
Thread 1 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=0, new_state=1) at ../../migration/migration.c:1746
(gdb) p (MigrationStatus) 0
$1 = MIGRATION_STATUS_NONE
(gdb) p (MigrationStatus) 1
$2 = MIGRATION_STATUS_SETUP
Just continue.
(gdb) c
GDB breaks on change from migration status SETUP to ACTIVE.
Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=1, new_state=4) at ../../migration/migration.c:1746
(gdb) p (MigrationStatus) 1
$2 = MIGRATION_STATUS_SETUP
(gdb) p (MigrationStatus) 4
$3 = MIGRATION_STATUS_ACTIVE
Let's explore the race condition.
Terminal 2)
Check the migration information:
(qemu) info migrate
Terminal 3)
GDB breaks on the migration information function.
The status is now observed as SETUP (not yet ACTIVE),
and is not yet propagated to the migration information
to be returned to the monitor.
Thread 1 "qemu-system-x86" hit Breakpoint 2, fill_source_migration_info (info=0x555556d65c70) at ../../migration/migration.c:1086
(gdb) p (MigrationStatus) s.state
$4 = MIGRATION_STATUS_SETUP
(gdb) p info.status
$5 = MIGRATION_STATUS_NONE
Now, allow the other thread to continue and change status to ACTIVE.
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff6c21e80 (LWP 995) "qemu-system-x86" fill_source_migration_info (info=0x555556d65c70) at ../../migration/migration.c:1123
2 Thread 0x7ffff65ff640 (LWP 998) "qemu-system-x86" (running)
3 Thread 0x7ffff5d7c640 (LWP 999) "qemu-system-x86" (running)
5 Thread 0x7ffff49ff640 (LWP 1002) "qemu-system-x86" migrate_set_state (state=0x5555566f8628, old_state=1, new_state=4) at ../../migration/migration.c:1746
(gdb) thread 5
[Switching to thread 5 (Thread 0x7ffff49ff640 (LWP 1002))]
#0 migrate_set_state (state=0x5555566f8628, old_state=1, new_state=4) at ../../migration/migration.c:1746
1746 in ../../migration/migration.c
(gdb) continue &
Continuing.
The first thread now observes the ACTIVE status.
(gdb) info threads
Id Target Id Frame
1 Thread 0x7ffff6c21e80 (LWP 995) "qemu-system-x86" fill_source_migration_info (info=0x555556d65c70) at ../../migration/migration.c:1123
2 Thread 0x7ffff65ff640 (LWP 998) "qemu-system-x86" (running)
3 Thread 0x7ffff5d7c640 (LWP 999) "qemu-system-x86" (running)
* 5 Thread 0x7ffff49ff640 (LWP 1002) "qemu-system-x86" (running)
(gdb)
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff6c21e80 (LWP 995))]
#0 fill_source_migration_info (info=0x555556d65c70) at ../../migration/migration.c:1123
1123 in ../../migration/migration.c
(gdb) p (MigrationStatus) s.state
$7 = MIGRATION_STATUS_ACTIVE
Done, the next statement in the code assigns the ACTIVE
status to the migration information, but it did NOT add
any RAM statistics (as it executed for the SETUP status).
Just continue, and check the resulting migration info:
(gdb) c
Terminal 2)
... info migrate
...
Migration status: active
total time: 0 ms
(qemu)
Migration status is active, without any RAM statistics.
...
...
...
Note that, normally, without exploring the race condition,
the status is 'active' and there _is_ memory statistics:
[just a summary from gdb/monitor steps]
...
(qemu) migrate -d tcp:127.0.0.1:4444
...
Thread 1 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=0, new_state=1) at ../../migration/migration.c:1746
(gdb) c
Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=1, new_state=4) at ../../migration/migration.c:1746
(gdb) thread 5
(gdb) c
Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=4, new_state=8) at ../../migration/migration.c:1746
(qemu) info migrate
...
Migration status: active
total time: 41387 ms
expected downtime: 300 ms
setup: 22590 ms
transferred ram: 485 kbytes
throughput: 0.18 mbps
remaining ram: 0 kbytes
total ram: 131592 kbytes
duplicate: 32849 pages
skipped: 0 pages
normal: 49 pages
normal bytes: 196 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 1453
(qemu)
These are the steps for a synthetic reproducer with GDB and QEMU.
It's sufficient to validate the change on QEMU and move forward with SRUs.
(We can add libvirt for reaching the error message too, but it's a plus.)
I'll check/coordinate on the QEMU uploads next week.
cheers,
Mauricio
...
Original code.
The race condition is, one thread can read 's->state' as SETUP (line 1078/1083/1086),
and another thread changes it to ACTIVE _before_ this thread assigns (the different)
's->state' to 'info->status' (line 1123), which will NOT have RAM info expected for
the ACTIVE status (line 1087/1096).
1056 static void fill_source_ migration_ info(MigrationI nfo *info) get_current( ); STATUS_ SETUP: STATUS_ ACTIVE: ram_info( info, s);
1057 {
1058 MigrationState *s = migrate_
...
1078 switch (s->state) {
...
1083 case MIGRATION_
...
1086 break;
1087 case MIGRATION_
...
1096 populate_
...
1123 info->status = s->state;
1124 }
We'll break on this function that changes migration states.
void migrate_ set_state( int *state, int old_state, int new_state)
Terminal 1)
QEMU process for incoming migration (ie, destination)
qemu-system-x86_64 -nodefaults -nographic -S -incoming tcp:0:4444
Terminal 2)
QEMU process for outgoing migration (ie, source)
with QEMU monitor on port 3333,
with GDB in non-stop threads mode
(and debuginfod for debug symbols),
gdb \ /debuginfod. ubuntu. com"' \ system- x86_64
-ex 'set non-stop on' -ex 'set pagination off' -ex 'set confirm off' \
-iex 'set debuginfod enabled on' -iex 'set debuginfod urls "https:/
qemu-
Set breakpoints on the functions above STATUS_ SETUP') :
(and line number of the 'break' statement under 'case MIGRATION_
b migrate_set_state migration. c:1086
b migration/
run -nodefaults -nographic -S -monitor tcp:0:3333, server, wait=off
...
Terminal 3)
Connect to QEMU monitor with netcat,
and start the the migration:
nc 127.0.0.1 3333
...
(qemu) migrate -d tcp:127.0.0.1:4444
Terminal 2)
GDB breaks on change from migration status NONE to SETUP.
Thread 1 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state= 0x5555566f8628, old_state=0, new_state=1) at ../../migration /migration. c:1746
(gdb) p (MigrationStatus) 0 STATUS_ NONE
$1 = MIGRATION_
(gdb) p (MigrationStatus) 1 STATUS_ SETUP
$2 = MIGRATION_
Just continue.
(gdb) c
GDB breaks on change from migration status SETUP to ACTIVE.
Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state= 0x5555566f8628, old_state=1, new_state=4) at ../../migration /migration. c:1746
(gdb) p (MigrationStatus) 1 STATUS_ SETUP
$2 = MIGRATION_
(gdb) p (MigrationStatus) 4 STATUS_ ACTIVE
$3 = MIGRATION_
Let's explore the race condition.
Terminal 2)
Check the migration information:
(qemu) info migrate
Terminal 3)
GDB breaks on the migration information function.
The status is now observed as SETUP (not yet ACTIVE),
and is not yet propagated to the migration information
to be returned to the monitor.
Thread 1 "qemu-system-x86" hit Breakpoint 2, fill_source_ migration_ info (info=0x555556d 65c70) at ../../migration /migration. c:1086
(gdb) p (MigrationStatus) s.state STATUS_ SETUP
$4 = MIGRATION_
(gdb) p info.status STATUS_ NONE
$5 = MIGRATION_
Now, allow the other thread to continue and change status to ACTIVE.
(gdb) info threads migration_ info (info=0x555556d 65c70) at ../../migration /migration. c:1123 0x5555566f8628, old_state=1, new_state=4) at ../../migration /migration. c:1746
Id Target Id Frame
* 1 Thread 0x7ffff6c21e80 (LWP 995) "qemu-system-x86" fill_source_
2 Thread 0x7ffff65ff640 (LWP 998) "qemu-system-x86" (running)
3 Thread 0x7ffff5d7c640 (LWP 999) "qemu-system-x86" (running)
5 Thread 0x7ffff49ff640 (LWP 1002) "qemu-system-x86" migrate_set_state (state=
(gdb) thread 5 0x5555566f8628, old_state=1, new_state=4) at ../../migration /migration. c:1746 /migration. c
[Switching to thread 5 (Thread 0x7ffff49ff640 (LWP 1002))]
#0 migrate_set_state (state=
1746 in ../../migration
(gdb) continue &
Continuing.
The first thread now observes the ACTIVE status.
(gdb) info threads migration_ info (info=0x555556d 65c70) at ../../migration /migration. c:1123
Id Target Id Frame
1 Thread 0x7ffff6c21e80 (LWP 995) "qemu-system-x86" fill_source_
2 Thread 0x7ffff65ff640 (LWP 998) "qemu-system-x86" (running)
3 Thread 0x7ffff5d7c640 (LWP 999) "qemu-system-x86" (running)
* 5 Thread 0x7ffff49ff640 (LWP 1002) "qemu-system-x86" (running)
(gdb)
(gdb) thread 1 migration_ info (info=0x555556d 65c70) at ../../migration /migration. c:1123 /migration. c
[Switching to thread 1 (Thread 0x7ffff6c21e80 (LWP 995))]
#0 fill_source_
1123 in ../../migration
(gdb) p (MigrationStatus) s.state STATUS_ ACTIVE
$7 = MIGRATION_
Done, the next statement in the code assigns the ACTIVE
status to the migration information, but it did NOT add
any RAM statistics (as it executed for the SETUP status).
Just continue, and check the resulting migration info:
(gdb) c
Terminal 2)
... info migrate
...
Migration status: active
total time: 0 ms
(qemu)
Migration status is active, without any RAM statistics.
...
...
...
Note that, normally, without exploring the race condition,
the status is 'active' and there _is_ memory statistics:
[just a summary from gdb/monitor steps]
...
(qemu) migrate -d tcp:127.0.0.1:4444
... 0x5555566f8628, old_state=0, new_state=1) at ../../migration /migration. c:1746 0x5555566f8628, old_state=1, new_state=4) at ../../migration /migration. c:1746 0x5555566f8628, old_state=4, new_state=8) at ../../migration /migration. c:1746
Thread 1 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=
(gdb) c
Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=
(gdb) thread 5
(gdb) c
Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=
(qemu) info migrate
...
Migration status: active
total time: 41387 ms
expected downtime: 300 ms
setup: 22590 ms
transferred ram: 485 kbytes
throughput: 0.18 mbps
remaining ram: 0 kbytes
total ram: 131592 kbytes
duplicate: 32849 pages
skipped: 0 pages
normal: 49 pages
normal bytes: 196 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 1453
(qemu)
(gdb) c
...