[luminous] ceph-mon crashes when it is elected leader (s390x)

Bug #1713032 reported by Andrew McLeod
This bug affects 1 person
Affects                  Status     Importance  Assigned to
Ubuntu Cloud Archive     Invalid    High        Unassigned
Ubuntu on IBM z Systems  Won't Fix  High        bugproxy
ceph (Ubuntu)            Won't Fix  High        Unassigned

Bug Description

ceph-mon - s390x - 1 of my 3 nodes decides it is the leader, then crashes:

Summary:

2017-08-25 10:30:49.764717 3ff9a7ff910 1 mon.juju-a9ec9d-1@0(electing).elector(105) init, last seen epoch 105
2017-08-25 10:30:55.288336 3ff9a7ff910 0 log_channel(cluster) log [INF] : mon.juju-a9ec9d-1@0 won leader election with quorum 0,1
2017-08-25 10:30:55.487872 3ff9a7ff910 0 log_channel(cluster) log [INF] : HEALTH_ERR; no osds; 1 mons down, quorum 0,1 juju-a9ec9d-1,juju-a9ec9d-0
2017-08-25 10:30:56.047020 3ff8bfff910 0 log_channel(cluster) log [INF] : monmap e1: 3 mons at {juju-a9ec9d-0=10.0.8.105:6789/0,juju-a9ec9d-1=10.0.8.84:6789/0,noname-b=10.0.8.179:6789/0}
2017-08-25 10:30:56.047050 3ff8bfff910 0 log_channel(cluster) log [INF] : pgmap 0 pgs: ; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2017-08-25 10:30:56.047073 3ff8bfff910 0 log_channel(cluster) log [DBG] : fsmap
2017-08-25 10:30:56.047122 3ff8bfff910 1 mon.juju-a9ec9d-1@0(leader).osd e0 create_pending setting backfillfull_ratio = 0.9
2017-08-25 10:30:56.047135 3ff8bfff910 1 mon.juju-a9ec9d-1@0(leader).osd e0 create_pending setting full_ratio = 0.95
2017-08-25 10:30:56.047137 3ff8bfff910 1 mon.juju-a9ec9d-1@0(leader).osd e0 create_pending setting nearfull_ratio = 0.85
2017-08-25 10:30:56.047288 3ff8bfff910 1 mon.juju-a9ec9d-1@0(leader).osd e0 encode_pending skipping prime_pg_temp; mapping job did not start
2017-08-25 10:30:56.051808 3ff8bfff910 -1 *** Caught signal (Aborted) **
 in thread 3ff8bfff910 thread_name:ms_dispatch

 ceph version 12.1.2 (b661348f156f148d764b998b65b90451f096cb27) luminous (rc)
 1: (()+0x9334b4) [0x2aa0f9334b4]
 2: [0x3ff8bff9b66]
 3: (gsignal()+0x30) [0x3ffa16381b8]
 4: (abort()+0x14e) [0x3ffa1639726]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x19c) [0x3ffa1a28e2c]
 6: (()+0xa6776) [0x3ffa1a26776]
 7: (()+0xa67d8) [0x3ffa1a267d8]
 8: (__cxa_rethrow()+0x64) [0x3ffa1a26adc]
 9: (CrushWrapper::decode(ceph::buffer::list::iterator&)+0xdc2) [0x2aa0f8b4d92]
 10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x5c4) [0x2aa0f739d44]
 11: (OSDMap::decode(ceph::buffer::list&)+0x44) [0x2aa0f73c434]
 12: (OSDMap::apply_incremental(OSDMap::Incremental const&)+0x1782) [0x2aa0f73dbe2]
 13: (OSDMonitor::encode_pending(std::shared_ptr<MonitorDBStore::Transaction>)+0x212) [0x2aa0f55cf3a]
 14: (PaxosService::propose_pending()+0x2be) [0x2aa0f5214d6]
 15: (PaxosService::_active()+0x2be) [0x2aa0f521bbe]
 16: (Context::complete(int)+0x1e) [0x2aa0f3f1d86]
 17: (void finish_contexts<Context>(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x212) [0x2aa0f3fb0ea]
 18: (Paxos::finish_round()+0x194) [0x2aa0f51937c]
 19: (Paxos::handle_last(boost::intrusive_ptr<MonOpRequest>)+0xfb2) [0x2aa0f51a97a]
 20: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x2d4) [0x2aa0f51b2c4]
 21: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xf20) [0x2aa0f3e86b8]
 22: (Monitor::_ms_dispatch(Message*)+0x64e) [0x2aa0f3e91ae]
 23: (Monitor::ms_dispatch(Message*)+0x34) [0x2aa0f41919c]
 24: (DispatchQueue::entry()+0xf0c) [0x2aa0f8df744]
 25: (DispatchQueue::DispatchThread::entry()+0x18) [0x2aa0f6ed828]
 26: (()+0x7934) [0x3ffa1e87934]
 27: (()+0xedd1a) [0x3ffa16edd1a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Full log:

https://pastebin.canonical.com/196718/

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: ceph 12.1.2-0ubuntu2~cloud0 [origin: Canonical]
ProcVersionSignature: Ubuntu 4.4.0-87.110-generic 4.4.73
Uname: Linux 4.4.0-87-generic s390x
NonfreeKernelModules: ebtable_broute vport_gre ip_gre gre ip_tunnel xt_CT xt_mac xt_physdev br_netfilter xt_set ip_set_hash_net ip_set nfnetlink xt_REDIRECT nf_nat_redirect nf_conntrack_ipv6 ip6table_mangle xt_nat xt_mark xt_connmark ip6table_raw iptable_raw xt_conntrack ipt_REJECT nf_reject_ipv4 ebtable_filter nbd openvswitch nf_defrag_ipv6 ebt_arp ebt_dnat ebt_ip scsi_transport_iscsi binfmt_misc veth ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack zfs zunicode zcommon znvpair spl zavl zlib_deflate iptable_filter ip_tables ebt_snat ebtable_nat ebtables x_tables bridge 8021q garp mrp stp llc xfs libcrc32c dm_snapshot dm_bufio ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 qeth_l2 sha_common chsc_sch eadm_sch qeth ctcm ccwgroup fsm zfcp qdio scsi_transport_fc dasd_eckd_mod dasd_mod
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: s390x
CrashDB:
 {
                "impl": "launchpad",
                "project": "cloud-archive",
                "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml",
             }
Date: Fri Aug 25 11:17:36 2017
SourcePackage: ceph
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Andrew McLeod (admcleod) wrote :
Revision history for this message
Corey Bryant (corey.bryant) wrote :

This may be specific to use with Ceph BlueStore.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Just talked with Andrew and this is not a BlueStore issue.

Ryan Beisner (1chb1n)
tags: added: uosci
Revision history for this message
Ryan Beisner (1chb1n) wrote :

Can you please add the deploy/config info, such as bundle and juju status? Is this openstack-on-lxd, or multi-lpar?

summary: - ceph-mon crashes when it is elected leader (s390x)
+ [luminous] ceph-mon crashes when it is elected leader (s390x)
Revision history for this message
Andrew McLeod (admcleod) wrote :

This was using the openstack-on-lxd instructions, but was not using the zfs backend - it is using the directory backend.

Revision history for this message
James Page (james-page) wrote :

I got a failed bootstrap with a quick test - journalctl -xe -u ceph-mon revealed:

Oct 20 13:36:07 juju-e856f0-0 ceph-mon[13887]: error: monitor data filesystem reached concerning levels of available storage space (avail
Oct 20 13:36:07 juju-e856f0-0 ceph-mon[13887]: you may adjust 'mon data avail crit' to a lower value to make this go away (default: 5%)
Oct 20 13:36:07 juju-e856f0-0 systemd[1]: ceph-mon.service: Main process exited, code=exited, status=28/n/a
Oct 20 13:36:07 juju-e856f0-0 systemd[1]: ceph-mon.service: Unit entered failed state.
Oct 20 13:36:07 juju-e856f0-0 systemd[1]: ceph-mon.service: Failed with result 'exit-code'.
Oct 20 13:36:37 juju-e856f0-0 systemd[1]: ceph-mon.service: Service hold-off time over, scheduling restart.
Oct 20 13:36:37 juju-e856f0-0 systemd[1]: Stopped Ceph cluster monitor daemon.

Might be the same issue I guess?

Revision history for this message
James Page (james-page) wrote :

(trying again with more storage this time)

Revision history for this message
James Page (james-page) wrote :

Better:

$ juju status
Model Controller Cloud/Region Version SLA
default localhost-localhost localhost/localhost 2.2.4 unsupported

App Version Status Scale Charm Store Rev OS Notes
ceph-mon 10.2.7 active 3 ceph-mon jujucharms 13 ubuntu

Unit Workload Agent Machine Public address Ports Message
ceph-mon/0 active idle 0 10.0.171.8 Unit is ready and clustered
ceph-mon/1* active idle 1 10.0.171.158 Unit is ready and clustered
ceph-mon/2 active idle 2 10.0.171.93 Unit is ready and clustered

Machine State DNS Inst id Series AZ Message
0 started 10.0.171.8 juju-2b4fd0-0 xenial Running
1 started 10.0.171.158 juju-2b4fd0-1 xenial Running
2 started 10.0.171.93 juju-2b4fd0-2 xenial Running

Relation provider Requirer Interface Type
ceph-mon:mon ceph-mon:mon ceph peer

Revision history for this message
James Page (james-page) wrote :

Note that the original bug report was on an older ceph version (rc)

Revision history for this message
James Page (james-page) wrote :

no - it's on a newer ceph version - retrying.

Revision history for this message
James Page (james-page) wrote :
Changed in ceph (Ubuntu):
status: New → Confirmed
Changed in cloud-archive:
status: New → Confirmed
importance: Undecided → High
Changed in ceph (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
Changed in cloud-archive:
status: Confirmed → Triaged
Revision history for this message
James Page (james-page) wrote :

Confirmed on artful and xenial/pike; attached backtrace is from an artful install.

Revision history for this message
James Page (james-page) wrote :

To reproduce on an Ubuntu install on s390x:

sudo apt install juju
juju bootstrap localhost
juju deploy -n 3 --series artful --force ceph-mon

One of the three units will fail with this problem - the other two remain running, but won't form quorum.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → bugproxy (bugproxy)
Revision history for this message
Frank Heimes (fheimes) wrote :

Requested to reverse-proxy this ticket, because it looks like an s390x code issue.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-160996 severity-high targetmilestone-inin16043
Revision history for this message
Frank Heimes (fheimes) wrote :

re-attaching log from bug description for external visibility

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-11-27 10:33 EDT-------
Does this problem exist on previous levels (i.e. is it a regression due to a package upgrade), or is this a new use case?

Revision history for this message
Frank Heimes (fheimes) wrote :

This is not a regression. Furthermore, the following link suggests that s390x is not yet supported upstream (nor ppc64el?):
https://download.ceph.com/debian-luminous/dists/xenial/
It would be good to see parity between all architectures.

An attempt to install ceph luminous on Ubuntu on s390x as per the description above reproduces the described situation.

bugproxy (bugproxy)
tags: added: severity-medium
removed: severity-high
Frank Heimes (fheimes)
tags: added: reverse-proxy-bugzilla
Changed in ubuntu-z-systems:
importance: Undecided → High
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-13 10:54 EDT-------
My team doesn't have experience with this. I recall some folks in Germany have filesystem-type skills; asking the broader team might help.

Revision history for this message
Andrew McLeod (admcleod) wrote :

This same problem is still present in rocky, using ceph mimic.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-01-09 11:25 EDT-------
From the stack backtrace ...

8: (__cxa_rethrow()+0x64) [0x3ffa1a26adc]
9: (CrushWrapper::decode(ceph::buffer::list::iterator&)+0xdc2) [0x2aa0f8b4d92]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x5c4) [0x2aa0f739d44]

... it seems that in CrushWrapper::decode() an exception is thrown.

Looking at the (v12.1.2) code in ceph/src/crush/CrushWrapper.cc (https://github.com/ceph/ceph/blob/v12.1.2/src/crush/CrushWrapper.cc#L2097-L2105) ...

void CrushWrapper::decode(bufferlist::iterator& blp)
{
create();

__u32 magic;
::decode(magic, blp);
if (magic != CRUSH_MAGIC)
throw buffer::malformed_input("bad magic number");

... it looks like the exception is thrown in the MAGIC check.

So perhaps this could be a little/big endian problem when decoding the magic number?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-01-10 08:20 EDT-------
I investigated the encode() / decode() of the magic number a bit more:

Ceph seems to use a little-endian data format internally. So the decoding of the magic number (https://github.com/ceph/ceph/blob/v12.1.2/src/crush/CrushWrapper.cc#L2097-L2105) converts from little to big endian on s390x.

For __u32 this is done in encoding.h: https://github.com/ceph/ceph/blob/v12.1.2/src/include/encoding.h#L109

The conversion is done in the cast operator (https://github.com/ceph/ceph/blob/v12.1.2/src/include/encoding.h#L104) of the "struct ceph_le" template class: https://github.com/ceph/ceph/blob/v12.1.2/src/include/byteorder.h#L70-L72

For the byte-swap operation from little to big endian, the mswab() function is used: https://github.com/ceph/ceph/blob/v12.1.2/src/include/byteorder.h#L54-L58

The magic number seems to be properly encoded: https://github.com/ceph/ceph/blob/v12.1.2/src/crush/CrushWrapper.cc#L1922-L1927
And decoded: https://github.com/ceph/ceph/blob/v12.1.2/src/crush/CrushWrapper.cc#L2097-L2105

So at least no obvious / trivial problem here.

Frank Heimes (fheimes)
Changed in ceph (Ubuntu):
status: Triaged → In Progress
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Revision history for this message
Frank Heimes (fheimes) wrote :

We got asked about the package with the debug symbol for ceph-mon - ceph-mon-dbgsym.
Since I couldn't find the packages with the debug symbols in the 'normal' UCA:
deb http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/rocky main
and since there doesn't seem to be a ddeb version of the UCA
I tried downloading the ceph-mon-dbgsym package manually from here:
https://launchpad.net/ubuntu/cosmic/+package/ceph-mon-dbgsym
But when I try to install I get:
sudo apt install ./ceph-mon-dbgsym_13.2.1+dfsg1-0ubuntu2.18.10.1_s390x.ddeb
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'ceph-mon-dbgsym' instead of './ceph-mon-dbgsym_13.2.1+dfsg1-0ubuntu2.18.10.1_s390x.ddeb'
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 ceph-mon-dbgsym : Depends: ceph-mon (= 13.2.1+dfsg1-0ubuntu2.18.10.1) but 13.2.1+dfsg1-0ubuntu2.18.10.1~cloud0 is to be installed
E: Unable to correct problems, you have held broken packages.

So the package with the debug symbols "ceph-mon-dbgsym_13.2.1+dfsg1-0ubuntu2.18.10.1_s390x.ddeb" requires a package "ceph-mon_13.2.1+dfsg1-0ubuntu2.18.10.1_s390x.deb".
But the UCA only provides a package "ceph-mon_13.2.1+dfsg1-0ubuntu2.18.10.1~cloud0_s390x.deb" - notice the suffix "~cloud0".

Any idea to get ceph-mon including debug-symbols installed?

Revision history for this message
Frank Heimes (fheimes) wrote :

Just noticed that all the packages here: https://launchpad.net/ubuntu/+source/ceph/13.2.0-0ubuntu1
don't have the suffix, hence one can pick all the needed ones from there.
But downloading and installing them manually is cumbersome and probably not the recommended way ...

Revision history for this message
Andrew McLeod (admcleod) wrote :

If you add the debug symbols repo as follows here:

https://wiki.ubuntu.com/Debug%20Symbol%20Packages#Manual_install_of_debug_packages

then the debug symbols should be available for install. It was mentioned that they may be available for luminous but not mimic - in this case, luminous can be used, as the bug is exactly the same in both versions.

Revision history for this message
bugproxy (bugproxy) wrote : repro-juju-18-04.txt

------- Comment on attachment From <email address hidden> 2019-01-24 15:18 EDT-------

With the help of Frank Heimes I finally managed to build a setup for debugging with gdb.
The attached file shows the steps for establishing a gdb session to the failing ceph-mon process.

Revision history for this message
bugproxy (bugproxy) wrote : 2018-01-24-gdb-session.txt

------- Comment on attachment From <email address hidden> 2019-01-24 15:35 EDT-------

This is the result of the first gdb session with the new setup:

The ceph-mon process (v12.2.8) does not crash in the magic check but instead in "decode_32_or_64_string_map()": https://github.com/ceph/ceph/blob/v12.2.8/src/crush/CrushWrapper.cc#L2515
More precisely somewhere in "::decode_nohead()": https://github.com/ceph/ceph/blob/v12.2.8/src/crush/CrushWrapper.cc#L2465

It is called with a suspicious value for "strlen=50331648" (48 MiB).

Revision history for this message
Frank Heimes (fheimes) wrote :

Waiting for an update from IBM - setting to Incomplete for now.

Changed in ubuntu-z-systems:
status: In Progress → Incomplete
Revision history for this message
Frank Heimes (fheimes) wrote :

I got an update that work is in progress.
Latest status can be seen here: https://github.com/ceph/ceph/pull/30079

Changed in ubuntu-z-systems:
status: Incomplete → In Progress
Revision history for this message
Frank Heimes (fheimes) wrote :

The s390x support has been fixed upstream and in Focal/Ussuri,
and a backport for nautilus is also available upstream.
There is currently an SRU in progress to get the backport into nautilus/Eoan.
A separate LP bug was opened for the work on nautilus, see LP 1851290.
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1851290

Since the fix applies to nautilus and higher only, I'll close this bug as Won't Fix.

Changed in ceph (Ubuntu):
status: In Progress → Won't Fix
Changed in ubuntu-z-systems:
status: In Progress → Won't Fix
Changed in cloud-archive:
status: Triaged → Invalid
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-11-26 09:51 EDT-------
IBM bugzilla status -> Closed; will not be fixed by Canonical. Further processing will be tracked via a different LP bug.
