Comment 2 for bug 1621818

Joris S'heeren (jsheeren) wrote :

There is no useful information in dmesg or syslog unfortunately.

The failures are intermittent. We see the error when launching many instances at once from the same _base image.

The image cache base directory exists and nova can write to it:

root@compute:/var/lib/nova/instances# ls -l
total 256
drwxr-xr-x 2 nova nova 4096 Aug 17 14:09 02db8511-2f20-41da-bcc2-797a9bbbe63b
... snip ...
drwxr-xr-x 2 nova nova 4096 Aug 29 17:24 bab8ddbf-c483-4462-9273-755812d84903
drwxr-xr-x 2 nova nova 4096 Sep 7 13:33 _base
drwxr-xr-x 2 nova nova 4096 Sep 9 17:10 c3251e4f-4c0e-42d8-a039-78ed9263b46c
... snip

root@compute:/var/lib/nova/instances/_base# ls -la
total 34802256
drwxr-xr-x 2 nova nova 4096 Sep 7 13:33 .
drwxr-xr-x 65 nova nova 8192 Sep 9 17:10 ..
-rw-r--r-- 1 libvirt-qemu kvm 8589934592 Sep 8 12:50 21171f1738d671d6801abab7196e4a5460c57af9
-rw-r--r-- 1 libvirt-qemu kvm 16105807872 Sep 9 09:13 3e58771f795c5e889445b424cbce395a69bbfb08
... snip

The NFS mount is:
1.2.3.4:/data on /var/lib/nova/instances type nfs4 (rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=5.6.7.8,local_lock=none,addr=1.2.3.4)

We can reproduce it outside of nova by creating a file of a certain size inside the NFS export, then running two loops at the same time: one repeatedly touches the file, the other repeatedly copies it to some other location.
Now and then we see the input/output error.
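
For anyone who wants to try the same thing, here is a rough Python sketch of the two loops described above. It is not the exact commands we ran. The function names, the file names (eio-test.img, eio-test.copy), the file size and the iteration count are all made up for illustration, and os.utime / shutil.copyfile stand in for touch and cp.

```python
import errno
import os
import shutil
import threading


def make_file(path, size):
    """Write `size` zero bytes to `path` in chunks of at most 1 MiB."""
    chunk = b"\0" * (1024 * 1024)
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n


def touch_loop(path, stop):
    """Keep updating the file's timestamps (like `touch`) until told to stop."""
    while not stop.is_set():
        os.utime(path, None)


def copy_loop(src, dst, iterations):
    """Copy src to dst `iterations` times; return how many copies hit EIO."""
    eio_count = 0
    for _ in range(iterations):
        try:
            shutil.copyfile(src, dst)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise  # only input/output errors are counted
            eio_count += 1
    return eio_count


def reproduce(nfs_dir, dst_dir, size, iterations):
    """Run the touch loop and the copy loop concurrently on one test file."""
    src = os.path.join(nfs_dir, "eio-test.img")    # hypothetical file name
    dst = os.path.join(dst_dir, "eio-test.copy")   # hypothetical file name
    make_file(src, size)
    stop = threading.Event()
    toucher = threading.Thread(target=touch_loop, args=(src, stop))
    toucher.start()
    try:
        return copy_loop(src, dst, iterations)
    finally:
        stop.set()
        toucher.join()
        # Clean up the test files so nothing is left on the export.
        for path in (src, dst):
            if os.path.exists(path):
                os.remove(path)
```

Something like reproduce("/var/lib/nova/instances", "/tmp", 1024**3, 200) would run it against the mount shown above; the return value is the number of copies that failed with an input/output error. The size and count here are guesses, and since the failure is intermittent a run that returns 0 does not prove much.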