instance is using wrong scsi disk after a stop/start

Bug #1936854 reported by Jon Bernard
Affects  Status   Importance  Assigned to  Milestone
Cinder   Invalid  High        Unassigned

Bug Description

An instance is using the wrong SCSI disk after a stop/start.

- We have one instance running, booted from a Cinder volume
  (aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa), since 6-Nov-18.

- We rebooted the instance and saw on the console that it came up as
  Ubuntu (with the same volume ID), although it had been running as
  Windows for 2 months. This is surprising.

- When we create a snapshot of the same volume and launch an instance
  from it, we get a Windows VM!

- Users don't install the OS themselves; instances are launched from an
  image. We don't hand out raw disks for OS installation.

Questions:

1. How is it possible that a new snapshot of the volume produces a
   Windows instance while the same volume boots as an Ubuntu VM?

2. How is it possible that the volume ID is the same, but after a reboot
   the instance changed from Windows to Ubuntu?

=================================

- We found 2 instances using the same backend device, /dev/sdn:
~~~
#grep -ir sdn sos_commands/virsh/virsh_-r_domblklist*
sos_commands/virsh/virsh_-r_domblklist_instance-00000cfb:vda /dev/sdn
sos_commands/virsh/virsh_-r_domblklist_instance-00000dbb:vda /dev/sdn
~~~

instance : bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb : instance-00000dbb <--> volume : aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa (250GiB)

instance : cccccccc-cccc-cccc-ccccccccccccccccc (deleted) : instance-00000cfb <--> volume : dddddddd-dddd-dddd-dddd-dddddddddddd (100GiB)

- Instance cccccccc-cccc-cccc-ccccccccccccccccc was deleted, but we now suspect that instance bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb is using its disk.
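
The duplicate-backend check above can be repeated across a whole sosreport with a small script. The following is only a sketch: it assumes the input is the grep output in the `file:target source` form shown above, and the function names are made up for illustration.

```python
from collections import defaultdict

# Lines as printed by the grep over the sosreport's `virsh domblklist` files.
GREP_OUTPUT = """\
sos_commands/virsh/virsh_-r_domblklist_instance-00000cfb:vda /dev/sdn
sos_commands/virsh/virsh_-r_domblklist_instance-00000dbb:vda /dev/sdn
"""


def domains_by_source(grep_output):
    """Map each source device to the (domain, target) pairs that reference it."""
    users = defaultdict(list)
    for line in grep_output.splitlines():
        if not line.strip():
            continue
        # grep prints "<file>:<matching line>"; the file name ends in the domain name.
        path, _, entry = line.partition(":")
        domain = path.rsplit("domblklist_", 1)[-1]
        target, source = entry.split()
        users[source].append((domain, target))
    return dict(users)


def shared_sources(grep_output):
    """Keep only the source devices referenced by more than one domain."""
    by_source = domains_by_source(grep_output)
    return {src: doms for src, doms in by_source.items() if len(doms) > 1}


if __name__ == "__main__":
    for source, doms in shared_sources(GREP_OUTPUT).items():
        print(source, "is referenced by", ", ".join(d for d, _ in doms))
```

Any device reported here is attached to more than one libvirt domain definition on the host and is worth cross-checking against the Nova block device mappings.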

~~~
lrwxrwxrwx. 1 0 0 9 Jan 25 14:22 ip-X.X.X.3:3260-iscsi-iqn.2002-03.com.foobar:5000d3100316861f-lun-10 -> ../../sdn
~~~

This device was supposed to be the 250 GiB volume, but it is 100 GiB.
~~~
$ egrep -i "sdn|iscsi" sos_commands/logs/journalctl_--no-pager_--boot|egrep -iv "failed|Reloaded"

localhost iscsid[4235]: Connection-1:0 to [target: iqn.2002-03.com.foobar:5000d3100316861e, portal: X.X.X.6,3260] through [iface: default] is shutdown.
localhost kernel: sd 12:0:0:6: [sdn] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
localhost kernel: sd 12:0:0:6: [sdn] 4096-byte physical blocks
localhost kernel: sd 12:0:0:6: [sdn] Write Protect is off
localhost kernel: sd 12:0:0:6: [sdn] Mode Sense: 8f 00 00 08
localhost kernel: sd 12:0:0:6: [sdn] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
localhost kernel: sdn: sdn1 sdn2
localhost kernel: sd 12:0:0:6: [sdn] Attached SCSI disk
localhost lvm[38901]: WARNING: PV Y4RnUu-kIZ1-dmGx-AYnu-LJak-fSRW-fGlwM8 on /dev/sdn2 was already found on /dev/sdf2.
localhost kernel: iscsi: registered transport (bnx2i)
localhost iscsid[4235]: Connection-1:0 to [target: iqn.2002-03.com.foobar:5000d3100316861e, portal: X.X.X.6,3260] through [iface: default] is shutdown.
localhost iscsid[4235]: Connection-1:0 to [target: iqn.2002-03.com.foobar:5000d3100316861e, portal: X.X.X.6,3260] through [iface: default] is shutdown.
localhost kernel: sd 14:0:0:10: [sdn] 209715200 512-byte logical blocks: (107 GB/100 GiB)
localhost kernel: sd 14:0:0:10: [sdn] 4096-byte physical blocks
localhost kernel: sd 14:0:0:10: [sdn] Write Protect is off
localhost kernel: sd 14:0:0:10: [sdn] Mode Sense: 8f 00 00 08
localhost kernel: sd 14:0:0:10: [sdn] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
localhost kernel: sdn: sdn1
localhost kernel: sd 14:0:0:10: [sdn] Attached SCSI disk
localhost kernel: sd 14:0:0:10: [sdn] 209715200 512-byte logical blocks: (107 GB/100 GiB)
localhost kernel: sd 14:0:0:10: [sdn] 4096-byte physical blocks
localhost kernel: sd 14:0:0:10: [sdn] Write Protect is off
localhost kernel: sd 14:0:0:10: [sdn] Mode Sense: 8f 00 00 08
localhost kernel: sd 14:0:0:10: [sdn] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
localhost kernel: sdn: sdn1
localhost kernel: sd 14:0:0:10: [sdn] Attached SCSI disk
localhost kernel: sd 14:0:0:10: [sdn] 209715200 512-byte logical blocks: (107 GB/100 GiB)
localhost kernel: sd 14:0:0:10: [sdn] 4096-byte physical blocks
localhost kernel: sd 14:0:0:10: [sdn] Write Protect is off
localhost kernel: sd 14:0:0:10: [sdn] Mode Sense: 8f 00 00 08
localhost kernel: sd 14:0:0:10: [sdn] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
localhost kernel: sdn: sdn1
localhost kernel: sd 14:0:0:10: [sdn] Attached SCSI disk
localhost kernel: sdn: sdn1
~~~

Note: I found one disk at the SCSI layer that is 250 GiB and not used by any instance; this is most likely the disk the instance should be using:
~~~
localhost kernel: sd 14:0:0:1: [sdr] 524288000 512-byte logical blocks: (268 GB/250 GiB)
localhost kernel: sd 14:0:0:1: [sdr] 524288000 512-byte logical blocks: (268 GB/250 GiB)
localhost kernel: sd 14:0:0:1: [sdr] 524288000 512-byte logical blocks: (268 GB/250 GiB)
localhost kernel: sd 14:0:0:1: [sdr] 524288000 512-byte logical blocks: (268 GB/250 GiB)
~~~
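
To make the size comparison explicit, the kernel messages can be reduced to one record per device name and SCSI address. This is a rough sketch that assumes the `sd H:C:T:L: [sdX] N 512-byte logical blocks` message format seen in the journal above; only three representative lines are copied in, and the helper name is hypothetical.

```python
import re

# Matches the kernel's capacity message for a SCSI disk.
SECTOR_LINE = re.compile(
    r"sd (?P<hctl>\d+:\d+:\d+:\d+): \[(?P<dev>sd[a-z]+)\] "
    r"(?P<blocks>\d+) (?P<bsize>\d+)-byte logical blocks"
)

# Representative lines copied from the journal excerpts in this report.
KERNEL_LOG = """\
localhost kernel: sd 12:0:0:6: [sdn] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
localhost kernel: sd 14:0:0:10: [sdn] 209715200 512-byte logical blocks: (107 GB/100 GiB)
localhost kernel: sd 14:0:0:1: [sdr] 524288000 512-byte logical blocks: (268 GB/250 GiB)
"""


def device_history(log_text):
    """Return ordered, de-duplicated (device, H:C:T:L, size in GiB) records."""
    seen = []
    for m in SECTOR_LINE.finditer(log_text):
        size_gib = int(m["blocks"]) * int(m["bsize"]) / 2**30
        record = (m["dev"], m["hctl"], size_gib)
        if record not in seen:
            seen.append(record)
    return seen


if __name__ == "__main__":
    for dev, hctl, size_gib in device_history(KERNEL_LOG):
        print(f"{dev} at {hctl}: {size_gib:g} GiB")
```

Read this way, the name sdn was first bound to a 40 GiB LUN at 12:0:0:6 and later to a 100 GiB LUN at 14:0:0:10, while the only 250 GiB LUN in the log is sdr at 14:0:0:1.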

Interestingly, for instance bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb, this
matches what the customer reported: after a stop/start, the instance
changed from Windows to Ubuntu.

$ nova instance-action-list bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb

+----------------+------------------------------------------+---------+----------------------------+
| Action         | Request_ID                               | Message | Start_Time                 |
+----------------+------------------------------------------+---------+----------------------------+
| create         | req-753c2b04-c1c4-4122-a6dd-a3ce9a412e47 | -       | 2018-11-06T08:30:01.000000 |
| live-migration | req-2300d864-9db7-4bba-9c05-1ab4bd5b3dce | -       | 2018-11-26T15:32:04.000000 |
| live-migration | req-af552fc7-7596-4830-b653-50236f69b79c | -       | 2018-11-26T15:44:22.000000 |
| live-migration | req-d0db117e-52be-427b-bbdb-9bd0d33ea2f2 | -       | 2018-11-26T18:07:09.000000 |
| stop           | req-99bb7212-e875-4136-8f6b-a498fca34706 | -       | 2019-01-24T19:00:23.000000 |
| start          | req-489aa37e-e760-4d09-8f8b-5de5e8be6b3b | -       | 2019-01-24T19:06:51.000000 |
| stop           | req-7e93ee3e-a902-423e-9342-481b6e770bfa | -       | 2019-01-25T05:08:39.000000 |
| start          | req-410fe17d-e957-4b6a-ac2d-c61dfb03f40e | -       | 2019-01-25T05:11:21.000000 |
| stop           | req-b57acf82-b875-4433-87e9-e5745a96d972 | -       | 2019-01-25T08:42:24.000000 |
| start          | req-89d78197-1d18-4393-9b44-9489e588aae4 | -       | 2019-01-25T08:52:03.000000 |
+----------------+------------------------------------------+---------+----------------------------+

So, from what I understand, the instance is using the wrong SCSI disk.

DB query output

MariaDB [(none)]> use nova ;

MariaDB [nova]> select * from instances where uuid='cccccccc-cccc-cccc-cccccccccccccccc' \G ;
*************************** 1. row ***************************
              created_at: 2018-10-30 09:26:02
              updated_at: 2018-12-17 05:15:08
              deleted_at: 2018-12-17 05:15:08
                      id: 3323
             internal_id: NULL
                 user_id: eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
              project_id: ffffffffffffffffffffffffffffffff
               image_ref:
               kernel_id:
              ramdisk_id:
            launch_index: 0
                key_name:
                key_data:
             power_state: 0
                vm_state: deleted
               memory_mb: 16384
                   vcpus: 8
                hostname: fizzbuzz
                    host: localhost
               user_data:
          reservation_id: r-vllo00tf
            scheduled_at: NULL
             launched_at: 2018-10-30 09:32:31
           terminated_at: 2018-12-17 05:15:07
            display_name: fizzbuzz
     display_description: fizzbuzz
       availability_zone: nova
                  locked: 0
                 os_type: NULL
             launched_on: localhost
        instance_type_id: 353
                 vm_mode: NULL
                    uuid: cccccccc-cccc-cccc-cccccccccccccccc
            architecture: NULL
        root_device_name: /dev/vda
            access_ip_v4: NULL
            access_ip_v6: NULL
            config_drive:
              task_state: NULL
default_ephemeral_device: NULL
     default_swap_device: NULL
                progress: 0
        auto_disk_config: 1
      shutdown_terminate: 0
       disable_terminate: 0
                 root_gb: 0
            ephemeral_gb: 0
               cell_name: NULL
                    node: redacted
                 deleted: 3323
               locked_by: NULL
                 cleaned: 1
      ephemeral_key_uuid: NULL
1 row in set (0.00 sec)

ERROR: No query specified

MariaDB [nova]> select * from block_device_mapping where instance_uuid='cccccccc-cccc-cccc-ccccccccccccccc'\G;
*************************** 1. row ***************************
           created_at: 2018-10-30 09:26:02
           updated_at: 2018-12-14 11:44:38
           deleted_at: 2018-12-17 05:15:08
                   id: 4019
          device_name: /dev/vda
delete_on_termination: 0
          snapshot_id: NULL
            volume_id: dddddddd-dddd-dddd-ddddddddddddddd
          volume_size: 100
            no_device: 0
      connection_info: {"driver_volume_type": "iscsi", "connector": {"initiator": "iqn.1994-05.com.redhat:93e382b64a0", "ip": "X.X.X.19", "platform": "x86_64", "host": "localhost", "do_local_attach": false, "os_type": "linux2", "multipath": false}, "serial": "dddddddd-dddd-dddd-ddddddddddddddd", "data": {"target_luns": [13, 13], "target_iqns": ["iqn.2002-03.com.:5000d3100316861e", "iqn.2002-03.com.foobar:5000d3100316861f"], "device_path": "/dev/sdn", "target_discovered": false, "encrypted": false, "qos_specs": null, "target_iqn": "iqn.2002-03.com.foobar:5000d3100316861e", "target_portals": ["X.X.X.6:3260", "X.X.X.3:3260"], "target_lun": 13, "discard": true, "access_mode": "rw", "target_portal": "X.X.X.6:3260"}}
        instance_uuid: cccccccc-cccc-cccc-ccccccccccccccc
              deleted: 4019
          source_type: image
     destination_type: volume
         guest_format: NULL
          device_type: disk
             disk_bus: virtio
           boot_index: 0
             image_id: redacted
                  tag: NULL
1 row in set (0.00 sec)

ERROR: No query specified

MariaDB [nova]> select * from instances where uuid='bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbb' \G ;
*************************** 1. row ***************************
              created_at: 2018-11-06 08:30:03
              updated_at: 2019-01-25 08:52:12
              deleted_at: NULL
                      id: 3515
             internal_id: NULL
                 user_id: eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
              project_id: gggggggggggggggggggggggggggggggg
               image_ref:
               kernel_id:
              ramdisk_id:
            launch_index: 0
                key_name: NULL
                key_data: NULL
             power_state: 1
                vm_state: active
               memory_mb: 16384
                   vcpus: 8
                hostname: redacted
                    host: localhost
               user_data:
          reservation_id: r-4d0cbg8r
            scheduled_at: NULL
             launched_at: 2018-11-06 08:30:20
           terminated_at: NULL
            display_name: redacted
     display_description: redacted
       availability_zone: nova
                  locked: 0
                 os_type: NULL
             launched_on: redacted
        instance_type_id: 315
                 vm_mode: NULL
                    uuid: bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbb
            architecture: NULL
        root_device_name: /dev/vda
            access_ip_v4: NULL
            access_ip_v6: NULL
            config_drive:
              task_state: NULL
default_ephemeral_device: NULL
     default_swap_device: NULL
                progress: 0
        auto_disk_config: 1
      shutdown_terminate: 0
       disable_terminate: 0
                 root_gb: 0
            ephemeral_gb: 0
               cell_name: NULL
                    node: localhost
                 deleted: 0
               locked_by: NULL
                 cleaned: 1
      ephemeral_key_uuid: NULL
1 row in set (0.00 sec)

ERROR: No query specified

MariaDB [nova]> select * from block_device_mapping where instance_uuid='bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbb'\G;
*************************** 1. row ***************************
           created_at: 2018-11-06 08:30:03
           updated_at: 2019-01-25 08:52:12
           deleted_at: NULL
                   id: 4232
          device_name: /dev/vda
delete_on_termination: 0
          snapshot_id: hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
            volume_id: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
          volume_size: 250
            no_device: 0
      connection_info: {"driver_volume_type": "iscsi", "connector": {"initiator": "iqn.1994-05.com.redhat:ba902d55f20", "ip": "X.X.X.18", "platform": "x86_64", "host": "overcloud-comp-3.cloud.vssi.com", "do_local_attach": false, "os_type": "linux2", "multipath": false}, "serial": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa", "data": {"target_luns": [10, 10], "target_iqns": ["iqn.2002-03.com.foobar:5000d3100316861f", "iqn.2002-03.com.foobar:5000d3100316861e"], "device_path": "/dev/sdn", "target_discovered": false, "encrypted": false, "qos_specs": null, "target_iqn": "iqn.2002-03.com.foobar:5000d3100316861f", "target_portals": ["X.X.X.3:3260", "X.X.X.6:3260"], "target_lun": 10, "discard": true, "access_mode": "rw", "target_portal": "X.X.X.3:3260"}}
        instance_uuid: bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb
              deleted: 0
          source_type: snapshot
     destination_type: volume
         guest_format: NULL
          device_type: disk
             disk_bus: virtio
           boot_index: 0
             image_id: NULL
                  tag: NULL
1 row in set (0.00 sec)
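
The stored connection_info can be checked against what the host actually sees. The sketch below rests on two assumptions: the by-path naming pattern is inferred from the symlink shown earlier in this report, and the block count is the one the kernel logged for sd 14:0:0:10. The helper names are hypothetical and only the fields needed for the check are copied from the row above.

```python
import json

# Trimmed copy of the connection_info stored for the 250 GiB volume.
CONNECTION_INFO = json.loads("""
{"serial": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
 "data": {"device_path": "/dev/sdn",
          "target_iqn": "iqn.2002-03.com.foobar:5000d3100316861f",
          "target_portal": "X.X.X.3:3260",
          "target_lun": 10}}
""")


def by_path_name(data):
    """Build the /dev/disk/by-path entry name expected for this iSCSI target."""
    return "ip-{target_portal}-iscsi-{target_iqn}-lun-{target_lun}".format(**data)


def size_mismatch(volume_size_gib, blocks, block_size=512):
    """Return (expected, actual) in GiB if the sizes differ, else None."""
    actual = blocks * block_size / 2**30
    return None if actual == volume_size_gib else (volume_size_gib, actual)


if __name__ == "__main__":
    print(by_path_name(CONNECTION_INFO["data"]))
    # volume_size from block_device_mapping vs. blocks logged for sd 14:0:0:10
    print(size_mismatch(250, 209715200))
```

The generated name is identical to the by-path symlink that points at sdn, yet the device behind it reports 100 GiB rather than the 250 GiB recorded in volume_size. In other words, the host-side path matches the connection_info, but the LUN behind that path does not appear to be the expected volume.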

Sofia Enriquez (lsofia-enriquez) wrote :

Hi Jon
Are you using master or another release?
Cheers,
Sofia

Changed in cinder:
importance: Undecided → High
tags: added: attach instance scsi volume
Eric Harney (eharney)
Changed in cinder:
status: New → Incomplete
Sean McGinnis (sean-mcginnis) wrote :

This sounds dangerous, and a potential security issue. But since there haven't been any responses in a few years, and no other reports of similar things happening to others, I'm going to close this issue.

If it is seen again, feel free to reopen or file a new issue with current details.

Changed in cinder:
status: Incomplete → Invalid