ceph-mgr dashboard incompatible with cython >= 0.29 (disco)

Bug #1832105 reported by Harry Coin on 2019-06-08
This bug affects 3 people
Affects        Importance  Assigned to
ceph (Ubuntu)  High        James Page
  Disco        High        James Page
  Eoan         High        James Page

Bug Description

[Impact]
The ceph-mgr daemon is unable to load additional modules due to a new check in cython >= 0.29. This limits the functionality of the manager.

[Test Case]
Deploy ceph
Check /var/log/ceph/ceph-mgr.`hostname`.log
Errors about loading rados module in subprocesses will be seen.
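The log check in the test case can be scripted. The following is a minimal sketch, assuming the log path pattern given in the test case and assuming the error text quoted in the original report appears verbatim in the log; the function name is hypothetical.

```python
import socket
from pathlib import Path

# Error text quoted in the original report; assumed to appear verbatim in the log.
MARKER = "Interpreter change detected"


def find_interpreter_errors(lines):
    """Return the log lines that contain the cython interpreter-check error."""
    return [line.rstrip("\n") for line in lines if MARKER in line]


if __name__ == "__main__":
    # Path pattern from the test case: /var/log/ceph/ceph-mgr.`hostname`.log
    log_path = Path("/var/log/ceph") / f"ceph-mgr.{socket.gethostname()}.log"
    if log_path.exists():
        with log_path.open(errors="replace") as fh:
            for hit in find_interpreter_errors(fh):
                print(hit)
    else:
        print(f"{log_path} not found")
```

An empty result after installing a fixed package would suggest the modules now load, although it is not proof on its own.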

[Regression Potential]
The fix from upstream actually just works around this issue by overriding the check that cython does; the code works in a subprocess when loaded multiple times. Regression potential low; cython may produce a longer term fix which means we can drop this patch.

[Original Bug Report]
If Ubuntu is really committed to ceph, as I think I've been reading: notice that the ceph dashboard broke entirely in a major regression with the disco upgrade. It won't load at all in 13.2.4+dfsg1-0ubuntu2.

The detail is that ceph-mgr (and much of ceph) relied on a non-feature of cython, to do with sub-interpreters, that went away in cython 0.29. The ceph folks responded with a hack/workaround to avoid the bug being noticed, and with a package requirement for an earlier version of cython. This was done some weeks or months ago. Actually fixing the problem is a major project that the ceph maintainers are struggling to take on, perhaps waiting for later versions of cython to provide a different way forward.

However, as of today, on disco this error message remains:

Module 'dashboard' has failed dependency: Interpreter change detected - this module can only be loaded into one interpreter per process.

The ceph primary development platform is Debian, on which the workaround has been available for some time.

However, in our Ubuntu case, a major feature of a core package (the web health/monitoring/config interface of a distributed file system) was allowed both to ship broken and to remain so for a long time, even through today.

I urge quick attention to the necessary backports.
https://github.com/ceph/ceph/pull/25585
http://tracker.ceph.com/issues/38788
http://tracker.ceph.com/issues/37472

James Page (james-page) wrote :

Eoan has Nautilus which has the required fix to the build process for the newer cython.

Changed in ceph (Ubuntu):
status: New → Triaged
importance: Undecided → High
assignee: nobody → James Page (james-page)
Changed in ceph (Ubuntu Eoan):
status: Triaged → Fix Released
Changed in ceph (Ubuntu Disco):
status: New → Triaged
importance: Undecided → High
assignee: nobody → James Page (james-page)
James Page (james-page) wrote :

Cosmic and earlier are not impacted, as this only occurs with cython >= 0.29.

Neither are the UCA backports to bionic, which also has an older cython.

James Page (james-page) on 2019-07-03
Changed in ceph (Ubuntu Disco):
status: Triaged → In Progress
James Page (james-page) on 2019-07-03
summary: - ceph-mgr Dashboard entirely broken in Disco
+ ceph-mgr dashboard incompatible with cython >= 0.29 (disco)
James Page (james-page) wrote :

Test packages in:

  https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3535/+packages

The initial fix revealed some Python 3 syntax issues, which will be fixed at the same time.

Harry Coin (hcoin) wrote :

Thanks for the effort. I see effort for ceph v12. Notice that for disco:
ceph -v
ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)

with /etc/apt/sources.list.d empty and
/etc/apt/sources.list:

deb http://us.archive.ubuntu.com/ubuntu/ disco main restricted
deb http://us.archive.ubuntu.com/ubuntu/ disco-updates main restricted
deb http://us.archive.ubuntu.com/ubuntu/ disco universe
deb http://us.archive.ubuntu.com/ubuntu/ disco-updates universe
deb http://us.archive.ubuntu.com/ubuntu/ disco multiverse
deb http://us.archive.ubuntu.com/ubuntu/ disco-updates multiverse
deb http://security.ubuntu.com/ubuntu disco-security main restricted
deb http://security.ubuntu.com/ubuntu disco-security universe
deb http://security.ubuntu.com/ubuntu disco-security multiverse
deb http://us.archive.ubuntu.com/ubuntu/ disco-backports main restricted universe multiverse

Harry Coin (hcoin) wrote :

Notice that we need the binaries to work in disco, which has ceph v13; your PPA is only v12. Also, it's not enough for users to be forced to build from source in Eoan (and what does Nautilus have to do with the manager web server?).

Here we are about a month after the initial report of this regression and it's still totally non-functional.

Harry Coin (hcoin) wrote :

Looking at the ppa I see nothing that fixes this bug in Eoan, nor disco.
I think you should change 'fix released' in Eoan to 'still broken'.

FYI, I'm attempting as suggested to use Nautilus via Eoan.  I've learned
that if you have IPv6 enabled in disco's ceph.conf, none of the OSDs will
load in eoan / nautilus until you add ms_bind_ipv4 = false to ceph.conf.
Also the dashboard remains broken in eoan / ceph nautilus at least as
far as the simple 'do-release-upgrade --devel' provides.  I wonder if
the dashboard really was tested before the announced 'fix released' was
posted for eoan.

   I don't know all of the causes for the dashboard being broken, but one
of them is that systemd appears to create manager services both for the
hostname and for hostname.domainname.com (or whatever), so even "ceph mgr
module enable dashboard --force" fails to create a manager with a
working dashboard instance.

Here we see a little example of why our linux world faces problems in
acceptance.  It's one thing for a release to offer a new feature that's
somewhat broken.  It's a whole other thing for a major user-facing
feature (dashboard) of an enterprise/core system (fault-tolerant storage)
to obviously never have been tested before the next release and to ship
broken.   You want to trust that doesn't happen and not be nervous when
doing release upgrades.

You can understand how that could happen in an entirely community
supported distro but I've seen it in both RHEL (viz: freeipa) and
Ubuntu/ceph.

I appreciate the suggested 'solution' of moving to the development
version set to be released in 4 months.  But that not only fails to
restore the desired module, it also brings the whole cluster offline
until an undocumented flag gets set (ms_bind_ipv4 isn't documented that
I could find; ms_bind_ipv6 is).

I'm sharing this experience not to complain as such but for
information.  Ubuntu ships with so many notifications about available
upgrades, security and otherwise, at every login that one feels they must be
ready for prime time or Canonical wouldn't have pushed them out.  Then a
big stopper like this happens.

On 7/12/19 8:33 AM, James Page wrote:
> Sorry wrong PPA:
>
> https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3534/+packages

James Page (james-page) wrote :

@hcoin

I don't think anyone is recommending the solution to this bug is to use an unreleased development version of Ubuntu which by its nature has not been through full testing. That's just a part of the process - to be able to update a released version of Ubuntu we have to evidence that the same software bug has been fixed in the development release; otherwise when users upgrade to the new release, they regress the fix to this issue.

You'll note activity on this bug - it's moving forward, and we will provide stable release updates with the fixes for 19.04 (Disco), which *is* the solution to this bug.

If you encounter separate issues please feel free to raise new bugs against the ceph package.

Harry Coin (hcoin) wrote :

Here's some help for others facing this:

If the ceph dashboard was working before upgrading to disco (which
killed it in a regression), then your hope of getting it working again
(owing to the 'fix released' advertising in the bug report) was to move
to ceph v14/nautilus, available in ubuntu-eoan.

After 'do-release-upgrade --devel' to eoan / ceph nautilus on every
system running ceph do:

systemctl status ceph<esc>  and make sure there is only one entry there
for every osd/mon/mgr/mds.  On my system there were entries there with
the hostname and with the hostname.domainname as well.
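The duplicate check described above can be sketched in code. This is only an illustration, assuming the units follow the `ceph-<daemon>@<id>.service` naming pattern; the unit names in the sample and the function name are hypothetical, and the real list would have to come from systemctl on the affected host.

```python
from collections import defaultdict


def duplicate_instances(unit_names):
    """Group ceph unit instances whose ids differ only by a domain suffix."""
    groups = defaultdict(set)
    for unit in unit_names:
        if "@" not in unit or not unit.endswith(".service"):
            continue
        daemon, instance = unit[: -len(".service")].split("@", 1)
        # OSD ids are numeric, so only host-named daemons are compared here.
        if daemon in ("ceph-mon", "ceph-mgr", "ceph-mds"):
            short = instance.split(".", 1)[0]
            groups[(daemon, short)].add(instance)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


# Hypothetical unit names, for illustration only.
units = [
    "ceph-mgr@node1.service",
    "ceph-mgr@node1.example.com.service",
    "ceph-mon@node1.service",
    "ceph-osd@0.service",
]
print(duplicate_instances(units))
```

Any daemon reported here has both a short-hostname and a fully qualified instance, which matches the situation described above.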

There are a number of other instructions involved in getting nautilus
running, see them here:

http://docs.ceph.com/docs/nautilus/releases/nautilus/

Also, one of the OSDs that was not managed by LVM was ignored and not
started.  I 'replaced it' with itself and it started backfilling normally.

On the systems meant to run the dashboard, this is now necessary:

apt install ceph-mgr-dashboard

    The following will tell you 'the module is already enabled'.

And when you think you're done and ready to log in ... the screen
accepts your password then does nothing further other than redisplay the
login screen.  If you put in the wrong password, it tells you. The
correct password does nothing.  So, on every instance of ceph mgr,
even the ones you are not using, you have to

ceph mgr module disable dashboard

then edit

/usr/share/ceph/mgr/dashboard/services/access_control.py

and change line 186 from self.lastUpdate = int(time.mktime(time.gmtime()))

to

self.lastUpdate = int(time.time())

Be sure to use spaces and not tabs.

Then

ceph mgr module enable dashboard

ceph dashboard ac-user-set-password admin <new password>

And then, you get to where you were before the update to disco with a
working dashboard.  Hopefully this saved you a day or two.
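A note on the access_control.py edit in the steps above: time.mktime() interprets its argument as local time, so feeding it the UTC struct from time.gmtime() shifts the result by the host's UTC offset, whereas time.time() returns the epoch directly. Whether this shift is what caused the login loop is an assumption, not something confirmed in this report. A minimal sketch of the difference, using an illustrative fixed-offset zone and an arbitrary instant:

```python
import calendar
import os
import time


def naive_epoch(t):
    """Original expression: mktime() reads the UTC struct as *local* time."""
    return int(time.mktime(time.gmtime(t)))


def correct_epoch(t):
    """Replacement expression, int(time.time()), evaluated at instant t."""
    return int(t)


if hasattr(time, "tzset"):
    # tzset() is Unix-only. EST5 is an illustrative zone: 5 hours west of UTC, no DST.
    os.environ["TZ"] = "EST5"
    time.tzset()

t = 1562932985  # arbitrary fixed instant, so the result is repeatable
skew = naive_epoch(t) - correct_epoch(t)
print(skew)  # 18000 seconds on Unix with TZ=EST5

# calendar.timegm() is the UTC-aware inverse of gmtime().
assert calendar.timegm(time.gmtime(t)) == correct_epoch(t)
```

On a host whose local zone is UTC the two expressions agree, which may explain why the problem does not show up everywhere.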

I'm no longer able/interested to test whether mimic's dashboard works in
disco, sorry.   If I'd somehow known an official release would break a
major user facing function on something as central to operations as ceph
I would have skipped disco entirely and waited for eoan.

Hello Harry, or anyone else affected,

Accepted ceph into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/13.2.6-0ubuntu0.19.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
Rgpublic (rgpublic) wrote :

Okay, I added the proposed repo and pinned the packages as described and then did this:

apt install ceph-mgr/disco-proposed ceph-mon/disco-proposed ceph-osd/disco-proposed ceph-base/disco-proposed ceph-common/disco-proposed libradosstriper1/disco-proposed ceph-mds/disco-proposed ceph/disco-proposed librgw2/disco-proposed

Afterwards, the error message disappeared: "HEALTH_OK". I only did this on the server where the mgr is currently active (ceph -s displayed which server is the active mgr). If I stop the service so that the active mgr changes to some other server where I didn't yet install the proposed packages, the error message appears again.

Summary: The proposed packages seem to be working as intended. The error message disappears. I can still access the Ceph filesystem and I can now access the Ceph dashboard again - everything seems to be working normally. Big thank you to everyone working on this.

One question: If the packages appear on the final non-proposed repository... What would I need to do to switch over to them so everything is back to normal?

James Page (james-page) wrote :

ubuntu@juju-05ad98-disco-proposed-3:~$ sudo ceph -s
  cluster:
    id: 57558fde-a86a-11e9-acd6-fa163e0779e9
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum juju-05ad98-disco-proposed-5,juju-05ad98-disco-proposed-3,juju-05ad98-disco-proposed-4
    mgr: juju-05ad98-disco-proposed-3(active), standbys: juju-05ad98-disco-proposed-4, juju-05ad98-disco-proposed-5
    osd: 3 osds: 3 up, 3 in

  data:
    pools: 3 pools, 44 pgs
    objects: 1 objects, 14 B
    usage: 3.0 GiB used, 27 GiB / 30 GiB avail
    pgs: 44 active+clean

ubuntu@juju-05ad98-disco-proposed-3:~$ apt-cache policy ceph-mon
ceph-mon:
  Installed: 13.2.6-0ubuntu0.19.04.2
  Candidate: 13.2.6-0ubuntu0.19.04.2
  Version table:
 *** 13.2.6-0ubuntu0.19.04.2 500
        500 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     13.2.6-0ubuntu0.19.04.1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu disco-updates/main amd64 Packages
     13.2.4+dfsg1-0ubuntu2.1 500
        500 http://security.ubuntu.com/ubuntu disco-security/main amd64 Packages
     13.2.4+dfsg1-0ubuntu2 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu disco/main amd64 Packages
ubuntu@juju-05ad98-disco-proposed-3:~$ apt-cache policy ceph-mgr
ceph-mgr:
  Installed: 13.2.6-0ubuntu0.19.04.2
  Candidate: 13.2.6-0ubuntu0.19.04.2
  Version table:
 *** 13.2.6-0ubuntu0.19.04.2 500
        500 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     13.2.6-0ubuntu0.19.04.1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu disco-updates/main amd64 Packages
     13.2.4+dfsg1-0ubuntu2.1 500
        500 http://security.ubuntu.com/ubuntu disco-security/main amd64 Packages
     13.2.4+dfsg1-0ubuntu2 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu disco/main amd64 Packages
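For anyone repeating this verification on several hosts, the installed version can be pulled out of the `apt-cache policy` output shown above. This is a minimal sketch that only parses text in the format pasted here; the function name is hypothetical and the sample is copied from the output above.

```python
import re


def installed_and_candidate(policy_text):
    """Extract the Installed and Candidate versions from `apt-cache policy <pkg>` output."""
    found = {}
    for key in ("Installed", "Candidate"):
        match = re.search(rf"^\s*{key}:\s*(\S+)", policy_text, re.MULTILINE)
        found[key] = match.group(1) if match else None
    return found


# Sample copied from the ceph-mgr output above.
sample = """ceph-mgr:
  Installed: 13.2.6-0ubuntu0.19.04.2
  Candidate: 13.2.6-0ubuntu0.19.04.2
"""

FIXED = "13.2.6-0ubuntu0.19.04.2"  # version accepted into disco-proposed
versions = installed_and_candidate(sample)
print(versions["Installed"] == FIXED)
```

The comparison is a plain string match against the proposed version, not a Debian version ordering.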

tags: added: verification-done verification-done-disco
removed: verification-needed verification-needed-disco
James Page (james-page) wrote :

@rgpublic

Once the update is released, you'll just need to upgrade your installed packages to pick up the new version; if you've already installed from disco-proposed, you won't get an update, as the binary is identical to the released update version.

Rgpublic (rgpublic) wrote :

@james-page: Thanks a lot for the clarification! I already assumed that, but it's very good to know for sure what will happen.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 13.2.6-0ubuntu0.19.04.2

---------------
ceph (13.2.6-0ubuntu0.19.04.2) disco; urgency=medium

  * d/p/bug1832105.patch: Cherry pick fix to avoid cython interpreter
    check raising import error when loading ceph mgr modules
    (LP: #1832105).
  * d/p/mgr-*.patch: Misc fixes to resolve Python 3 syntax issues
    (LP: #1835354).

 -- James Page <email address hidden> Fri, 12 Jul 2019 12:03:05 +0100

Changed in ceph (Ubuntu Disco):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
