experiencing issues deploying on hp-cloud

Bug #1401266 reported by Charles Butler
This bug affects 1 person
Affects: MapRCharm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When deploying MapR on hp-cloud, I'm getting consistent failures during clustering and creation of the maprfs, with the following log output:

2014-12-10 17:09:49 INFO cluster-relation-changed Error 3, No such process. Unable to reach mfs. Check for errors in mfs.log.

I did some additional digging and found the following relevant bits in the MapR logs:

2014-12-10 17:09:48,677 18399 RunCmd:197 INFO Disk list :
/opt/mapr/server/mrconfig -h 127.0.0.1 -p 5660 disk list
2014-12-10 17:09:48,684 18399 RunCmd:200 ERROR rc=1
2014-12-10 17:09:48,684 18399 RunCmd:201 ERROR 2014-12-10 17:09:48,6836 ERROR Global mrconfig.cc:333 ListDisks rpc failed Connection reset by peer.(104).
2014-12-10 17:09:48,6837 ERROR Global mrconfig.cc:3694 ListDisk failed Connection reset by peer.(104).
2014-12-10 17:09:48,684 18399 RunCmd:205 INFO Disk list tried. err 1
2014-12-10 17:09:48,684 18399 LogCurrentInfo:962 INFO 2014-12-10 17:09:48,6836 ERROR Global mrconfig.cc:333 ListDisks rpc failed Connection reset by peer.(104).
2014-12-10 17:09:48,6837 ERROR Global mrconfig.cc:3694 ListDisk failed Connection reset by peer.(104).
2014-12-10 17:09:48,685 18399 RunCmd:197 INFO DiskGroup list :
/opt/mapr/server/mrconfig -h 127.0.0.1 -p 5660 dg list
2014-12-10 17:09:48,691 18399 RunCmd:200 ERROR rc=1
2014-12-10 17:09:48,691 18399 RunCmd:201 ERROR 2014-12-10 17:09:48,6906 ERROR Global mrconfig.cc:478 ListDGs rpc failed Connection reset by peer.(104).
2014-12-10 17:09:48,6907 ERROR Global mrconfig.cc:3710 ListDiskGroups failed Connection reset by peer.(104).
2014-12-10 17:09:48,691 18399 RunCmd:205 INFO DiskGroup list tried. err 1
2014-12-10 17:09:48,691 18399 LogCurrentInfo:967 INFO 2014-12-10 17:09:48,6906 ERROR Global mrconfig.cc:478 ListDGs rpc failed Connection reset by peer.(104).
2014-12-10 17:09:48,6907 ERROR Global mrconfig.cc:3710 ListDiskGroups failed Connection reset by peer.(104).
2014-12-10 17:09:48,691 18399 RunCmd:197 INFO sp list :
/opt/mapr/server/mrconfig -h 127.0.0.1 -p 5660 sp list
2014-12-10 17:09:48,698 18399 RunCmd:200 ERROR rc=1
2014-12-10 17:09:48,698 18399 RunCmd:201 ERROR 2014-12-10 17:09:48,6974 ERROR Global mrconfig.cc:536 ListSPs rpc failed Connection reset by peer.(104).
2014-12-10 17:09:48,6975 ERROR Global mrconfig.cc:3744 ProcessSPList failed Connection reset by peer.(104).
2014-12-10 17:09:48,698 18399 RunCmd:205 INFO sp list tried. err 1
2014-12-10 17:09:48,698 18399 LogCurrentInfo:972 INFO 2014-12-10 17:09:48,6974 ERROR Global mrconfig.cc:536 ListSPs rpc failed Connection reset by peer.(104).
2014-12-10 17:09:48,6975 ERROR Global mrconfig.cc:3744 ProcessSPList failed Connection reset by peer.(104).
2014-12-10 17:09:48,698 18399 LogCurrentInfo:974 INFO
Done capturing debug info

2014-12-10 17:09:48,698 18399 RestoreUdevRules:694 INFO Udev restoration not required.
2014-12-10 17:09:48,698 18399 RunCmd:197 INFO Fileserver stop:
/etc/init.d/mapr-mfs stop
2014-12-10 17:09:49,732 18399 ExitDiskSetup:160 ERROR Error 3, No such process. Unable to reach mfs. Check for errors in mfs.log.[' File "/opt/mapr/server/disksetup", line 1253, in <module>\n RunDiskSetup();\n', ' File "/opt/mapr/server/disksetup", line 1133, in RunDiskSetup\n GetMfsUp();\n', ' File "/opt/mapr/server/disksetup", line 1027, in GetMfsUp\n AbortWithError(errno.ESRCH, msg);\n', ' File "/opt/mapr/server/disksetup", line 190, in AbortWithError\n stack_trace = traceback.format_stack(frame)\n']

David Tucker (dbtucker) wrote :

Can you confirm that the instances you configured had storage allocated beyond the boot disk? There should be at least one UNMOUNTED and UNFORMATTED storage spindle on each instance that will support the mapr-fileserver service (which for this simple charm is all nodes).

If there is a way to enforce this in the same manner that cpu_cores or memory is enforced, I'm happy to add those details to the documentation.
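
To check the storage requirement above on an instance, a rough sketch along the following lines may help. It is not part of the MapR charm or of disksetup. It assumes the `lsblk` tool from util-linux is installed and supports the `-P` output mode and the `NAME`, `TYPE`, `FSTYPE`, `MOUNTPOINT` and `PKNAME` columns. The function names are hypothetical, chosen for illustration only.

```python
import shlex
import subprocess


def parse_lsblk_pairs(text):
    """Parse `lsblk -P` output (KEY="value" pairs per line) into dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # shlex strips the quotes, leaving tokens such as NAME=vdb
        rows.append(dict(tok.split("=", 1) for tok in shlex.split(line)))
    return rows


def raw_disks(rows):
    """Return names of whole disks with no filesystem, mount or partitions."""
    # Any device named as a parent (PKNAME) already has partitions on it.
    parents = {r.get("PKNAME", "") for r in rows}
    return [
        r["NAME"]
        for r in rows
        if r.get("TYPE") == "disk"
        and not r.get("FSTYPE")
        and not r.get("MOUNTPOINT")
        and r["NAME"] not in parents
    ]


def find_raw_disks():
    """Run lsblk on this machine and list candidate raw disks."""
    out = subprocess.check_output(
        ["lsblk", "-P", "-o", "NAME,TYPE,FSTYPE,MOUNTPOINT,PKNAME"],
        universal_newlines=True,
    )
    return raw_disks(parse_lsblk_pairs(out))
```

Calling `find_raw_disks()` on a node returns a list of device names. An empty list would suggest the instance has no spare unformatted disk for mapr-fileserver, although this check is only a heuristic and does not cover every storage layout (for example, LVM or device-mapper volumes).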
