[FFe] ceph firefly stable release

Bug #1278466 reported by James Page on 2014-02-10
This bug affects 4 people
Affects                    Importance   Assigned to
Release Notes for Ubuntu   Undecided    James Page
ceph (Ubuntu)              High         James Page
Trusty                     High         James Page

Bug Description

Raising this now as upstream have alerted me to the fact that the firefly release date has slipped until the start of March.

Firefly is the next stable release of Ceph, and has always been the intended release target for 14.04.

As well as the usual bug fixes and stability improvements, firefly introduces:

1) Tiered storage pools
2) Erasure coding of data
3) Embedded web container for RADOS Gateway.

The full change log since the Emperor stable release can be found here:

http://ceph.com/docs/master/release-notes/

James Page (james-page) on 2014-02-10
Changed in ceph (Ubuntu):
importance: Undecided → High
milestone: none → ubuntu-14.04-beta-1
James Page (james-page) on 2014-03-04
summary: - [FFe] ceph firefly stable update
+ [FFe] ceph firefly 0.78.1 stable update

Rather than uploading to trusty, I've pushed the 0.77 interim release to:

  https://launchpad.net/~ceph-ubuntu/+archive/edgers

I've tested this in a multi-node configuration with OpenStack and it smoke tests OK.

Changed in ceph (Ubuntu Trusty):
assignee: nobody → James Page (james-page)
milestone: ubuntu-14.04-beta-1 → ubuntu-14.04-beta-2
description: updated
James Page (james-page) on 2014-03-05
summary: - [FFe] ceph firefly 0.78.1 stable update
+ [FFe] ceph firefly 0.78.1 stable release
Anders (eddiedog988) on 2014-03-13
Changed in ceph (Ubuntu Trusty):
status: New → Confirmed
James Page (james-page) on 2014-03-13
Changed in ceph (Ubuntu Trusty):
status: Confirmed → New

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ceph (Ubuntu):
status: New → Confirmed
James Page (james-page) wrote :

Update from upstream:

<<<

Hi everyone,

It's taken longer than expected, but the tests for v0.78 are calming down
and it looks like we'll be able to get the release out this week.

However, we've decided NOT to make this release firefly. It will be a
normal development release. This will be the first release that includes
some key new functionality (erasure coding and cache tiering) and although
it is passing our tests we'd like to have some operational experience with
it in more users' hands before we commit to supporting it long term.

The tentative plan is to freeze and then release v0.79 after a normal two
week cycle. This will serve as a 'release candidate' that shaves off a
few rough edges from the pending release (including some improvements with
the API for setting up erasure coded pools). It is possible that 0.79
will turn into firefly, but more likely that we will opt for another two
weeks of hardening and make 0.80 the release we name firefly and maintain
for the long term.

Long story short: 0.78 will be out soon, and you should test it! It
will vary from the final firefly in a few subtle ways, but any feedback or
usability and bug reports at this point will be very helpful in shaping
things.

Thanks!
sage

>>>

This probably means that the firefly release is still +1 month away.

Changed in ceph (Ubuntu Trusty):
status: Confirmed → New
summary: - [FFe] ceph firefly 0.78.1 stable release
+ [FFe] ceph firefly stable release
Dave Walker (davewalker) wrote :

James, thanks for the update.

Based on this, do you think Trusty should:
 - Stick with the current version

Or.

 - Target 0.78 for release, but then...
    - If 0.79 has minimal changes, and improves stability - get that in prior to release, if the dates make it suitable.
    - Review changes between 0.79 and Firefly (potentially 0.79 or, more likely, 0.80); if it is a suitable candidate, pursue an MRE. If this is likely, we should both release-note it and be confident that the update testing has good levels of rigour, with no config changes.

James Page (james-page) wrote :

1) stick with current version

I don't think this is a good idea; the emperor release we have in archive right now is a stable release but is not scheduled for long term focus from upstream.

2) get to firefly in a sane way

I think this is a better approach; interim releases are generally pretty good from Ceph, so moving forward during the dev cycle feels like the right thing to do so that the step to firefly, either as a zero-day SRU or as a normal SRU, won't be so great.

We would need to detail this in the release notes so that the ceph position is clear for 14.04 early adopters.

Note that we already have a MRE for minor releases - this is outside the scope of that.

James Page (james-page) wrote :

Adding ubuntu-sru as they will need to agree to option 2) as laid out in #5

Dave Walker (davewalker) wrote :

James, based on the history of prior ceph upstream releases and continuing discussions, can we be confident that 0.79->Firefly will be bug fix only and not featureful?

Sage Weil (sage-newdream) wrote :

Dave: correct. There are a few very minor changes going into 0.79 to make the final CLI/REST API experience good. We expect nothing but performance and bug fixes for 0.80.

James Page (james-page) wrote :

OK - so the proposed plan is as follows:

1) Update to 0.78 release this week; this will be tested in PPA first.

2) Update to 0.79 on the assumption that it appears in +2 weeks; again this will be tested in PPA first.

3) SRU the firefly 0.80 release into 14.04 post 17th April.

Dave Walker (davewalker) wrote :

Sage, thanks for your input.

James, if we are committing to Firefly for Trusty, I would suggest getting 0.78 in ASAP to maximise potential exposure. I agree with testing 0.79 in PPA first, but again, we probably want to try and get it in as soon as possible. That said, if 0.79 is looking late, or there are concerns with testing, it might be prudent to consider THAT for post release in addition.

As Ceph isn't shipped on any iso media, we are safe from image spin. However, please can you consider raw apt-get, juju charms and the upstream ceph-deploy as part of your 0.79 onwards testing.

Providing final Firefly is bug fixes and transparent optimisation, I would be happy to consider it for SRU/MRE post release. However, please make sure this is well documented on the release notes.

James, I am quite confident you will take personal responsibility to ensure that this process is as smooth as is required.

Thanks.

FFe Granted for 0.78, with condition of clear release notes. Please update this bug as information appears about status of 0.79->Firefly.

James Page (james-page) wrote :

Thanks Daviey

Adding task for release notes so we don't forget; I have 0.77 in local testing - just waiting for 0.78 and will then upload.

James Page (james-page) on 2014-03-23
Changed in ceph (Ubuntu Trusty):
milestone: ubuntu-14.04-beta-2 → ubuntu-14.04
James Page (james-page) wrote :

0.78 uploaded to trusty:

ceph (0.78-0ubuntu1) trusty; urgency=medium

  * New upstream release:
    - d/control: Add xfslibs-dev to BD's.
    - d/*: Sync relevant packaging changes from upstream.
    - d/p/*: Drop upstreamed patches.
    - d/p/modules.patch: Mark libcls_user.so and libec_jerasure.so as modules.
    - d/ceph.install: Only install libec_jerasure.so.
  * d/ceph-test.install: Install test binaries to /usr/lib/ceph/bin; they
    really don't need to be installed on the default path.
  * d/{ceph|radosgw|ceph-mds}.lintian-overrides: Add overrides for intentional
    difference in naming and structure between upstart configurations and
    init.d scripts.

 -- James Page <email address hidden> Sat, 22 Mar 2014 18:27:40 +0000

I tested upgrade from emperor and use with cinder and glance over the weekend as well as running a continual IO test on a three node cluster for around 16 hours with no problems.

I've not closed out this bug so we can continue to track 0.79/0.80 between now and release.

Steve Langasek (vorlon) wrote :

Is further action needed here from the release team? There seems to be an agreed plan, but the bug is marked 'new'.

Dave Walker (davewalker) wrote :

Triaged. Keeping bug open pending upstream bugfix release cut.

Changed in ceph (Ubuntu Trusty):
status: New → In Progress
James Page (james-page) on 2014-04-03
Changed in ubuntu-release-notes:
assignee: nobody → James Page (james-page)
Brian Candler (b-candler) wrote :

Just a note for early adopters to beware of:
https://ceph.com/releases/v0-78-released/

"Please note that while it is possible to create and test erasure coded pools in this release, the pools will not be usable when you upgrade to v0.79 as the OSDMap encoding will subtlely change. Please do not populate your test pools with important data that can’t be reloaded."

James Page (james-page) wrote :

I've prepared and tested an update to 0.79; you can see the package builds in:

  https://launchpad.net/~ceph-ubuntu/+archive/edgers

Using these packages, I deployed Ceph 0.79 as part of an OpenStack deployment using Cinder for volume management and ran the entire Tempest volume API test suite against the installation.

I also ran the smallio generator from ceph-test against the install whilst doing this.

In addition, I tested an erasure coded pool using documentation from upstream and ran the rados bench tool against it to confirm it was functional.

Requesting an ack from the release team to upload to trusty.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 0.79-0ubuntu1

---------------
ceph (0.79-0ubuntu1) trusty; urgency=medium

  * New upstream release (LP: #1278466):
    - d/p/modules.patch: Refreshed.
    - d/ceph.install: Install all jerasure modules.
 -- James Page <email address hidden> Wed, 09 Apr 2014 11:14:03 +0100

Changed in ceph (Ubuntu Trusty):
status: In Progress → Fix Released
James Page (james-page) on 2014-04-09
Changed in ceph (Ubuntu Trusty):
status: Fix Released → In Progress
James Page (james-page) on 2014-04-14
Changed in ubuntu-release-notes:
status: New → Fix Released
James Page (james-page) wrote :

For anyone following this bug waiting for firefly, I've pushed the 0.80~rc1 to:

  https://launchpad.net/~ceph-ubuntu/+archive/edgers

Tests OK on a 25 node setup.
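
A note on the version string: the tilde in 0.80~rc1 matters for packaging, because dpkg sorts a tilde before everything, including the end of the string, so a release-candidate package is superseded by the final 0.80 upload. The sketch below is a simplified Python port of how dpkg compares a single upstream-version fragment. It deliberately ignores epochs and the Debian revision (the -0ubuntu1 part), and the function names are illustrative, not taken from any library.

```python
from functools import cmp_to_key


def _is_digit(ch):
    # True only for one ASCII digit; "" (end of input) is not a digit.
    return "0" <= ch <= "9"


def _order(ch):
    # Sort weight of one character inside a non-digit run.
    if ch == "" or _is_digit(ch):
        return 0
    if ch == "~":
        return -1  # tilde sorts before everything, even end of string
    if ch.isascii() and ch.isalpha():
        return ord(ch)  # letters sort before other punctuation
    return ord(ch) + 256


def _at(s, i):
    # Character at index i, or "" once we run past the end.
    return s[i] if i < len(s) else ""


def compare_upstream(a, b):
    """Return -1, 0 or 1 for two upstream version fragments."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Compare the non-digit runs character by character.
        while (_at(a, i) and not _is_digit(_at(a, i))) or (
            _at(b, j) and not _is_digit(_at(b, j))
        ):
            ac, bc = _order(_at(a, i)), _order(_at(b, j))
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Skip leading zeros, then compare the digit runs numerically.
        while _at(a, i) == "0":
            i += 1
        while _at(b, j) == "0":
            j += 1
        while _is_digit(_at(a, i)) and _is_digit(_at(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _is_digit(_at(a, i)):
            return 1  # a has the longer (larger) number
        if _is_digit(_at(b, j)):
            return -1
        if first_diff:
            return -1 if first_diff < 0 else 1
    return 0


versions = ["0.80", "0.79", "0.80~rc1", "0.80.1", "0.78"]
print(sorted(versions, key=cmp_to_key(compare_upstream)))
# ['0.78', '0.79', '0.80~rc1', '0.80', '0.80.1']
```

On a real system, `dpkg --compare-versions` is the authoritative check; the port above is only meant to show why the rc package sorts between 0.79 and 0.80.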

James Page (james-page) wrote :

0.80 was released today; I've uploaded to the PPA and will do testing tomorrow.

James Page (james-page) wrote :

Synced from Debian for utopic

Changed in ceph (Ubuntu):
status: In Progress → Fix Committed
James Page (james-page) wrote :

I've smoked the proposed 0.80 binaries in a 26 node ceph cluster for the last 12 hours using the smalliorbd test; read/write was sustained at a consistent rate for the duration of the test.

Uploading to -proposed for SRU team review.

Hello James, or anyone else affected,

Accepted ceph into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/ceph/0.80-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ceph (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
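
The tag handling described in the message above can be sketched as a small state model. This is only an illustration of the workflow as it is worded in this report, not Launchpad's API; the function name and the set-based representation are assumptions made for the example.

```python
# The three SRU verification tags named in the message above.
VERIFICATION_TAGS = {
    "verification-needed",
    "verification-done",
    "verification-failed",
}


def set_verification(tags, new_tag):
    """Return a new tag set with exactly one verification tag, new_tag.

    Hypothetical helper: any other verification tag is dropped, and
    unrelated tags are left untouched.
    """
    if new_tag not in VERIFICATION_TAGS:
        raise ValueError(f"not a verification tag: {new_tag}")
    return (set(tags) - VERIFICATION_TAGS) | {new_tag}


# Replay the sequence seen in this report: the 0.80 upload is accepted,
# testing fails, 0.80.1 is accepted, and testing then succeeds.
tags = set()
history = []
for step in (
    "verification-needed",
    "verification-failed",
    "verification-needed",
    "verification-done",
):
    tags = set_verification(tags, step)
    history.append(sorted(tags))

print(history[-1])  # ['verification-done']
```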
Henrik Korkuc (mpam) wrote :

hi,

I tested the proposed packages on my cluster and hit 2 bugs:
- some PGs stuck in deep-scrub
- high CPU usage on OSDs

I upgraded to 0.80.1 from the PPA; the CPU bug has not occurred there yet, and I will keep an eye on deep-scrubbing. These bugs are known upstream and were addressed in 0.80.1. I suggest moving 0.80.1 from the PPA to proposed.

tags: added: verification-failed
removed: verification-needed
David Medberry (med) wrote :

Henrik, can you please reference the bugs fixed (mentioned above)? I'm looking through the git log and not seeing anything specifically addressing this.

Henrik Korkuc (mpam) wrote :

http://ceph.com/releases/v0-80-1-firefly-released/

scrubbing:
"osd: revert incomplete scrub fix (Samuel Just)"

About CPU usage, I think it is this one, but I am not sure. I use tiering, and the disks were idle when CPU usage was high. Restarting the OSDs didn't help (tried twice), but upgrading to 0.80.1 did. All or almost all OSDs were affected:
"osd: prevent busy loop when tiering agent can do no work (David Zafman)"

Brian Murray (brian-murray) wrote :

Hello James, or anyone else affected,

Accepted ceph into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/ceph/0.80.1-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: removed: verification-failed
tags: added: verification-needed
James Page (james-page) on 2014-05-15
Changed in ceph (Ubuntu):
status: Fix Committed → Fix Released
Shang Wu (shangwu) wrote :

Will 0.80 be supported for 5 years?

David Medberry (med) wrote :

Shang, as I understand it, yes, since 0.80 is going into trusty proper, it should be supported as long as Trusty is. Per https://wiki.ubuntu.com/Releases that is April 2019. (That's not to say that Trusty won't update to an even newer version as part of that support process--it certainly could.) I don't speak for Canonical, just using the information Canonical and Ubuntu have publicly provided.

James Page (james-page) wrote :

I've deployed the proposed package for 0.80.1 on a 25 node cluster and soaked it with IO's for 24 hours - no problems.

I've also deployed a smaller cluster with OpenStack Icehouse and run the Cinder tempest API test suite against it - again no problems found.

@Henrik - it would be good to get confirmation that 0.80.1 fixes your issue, and then I think we can consider verification as done!

James Page (james-page) wrote :

@Shang - the 0.80.x series will be supported for 5 years; however I expect most users will want something newer by +2 years, at which point the Ceph version from 16.04 will be available via the Ubuntu Cloud Archive. Interim stable releases will also be made available, but only with 18 months of support each.

Henrik Korkuc (mpam) wrote :

I just recently upgraded to proposed 0.80.1 instead of ppa.

There are no scrubbing problems in new version.

But it looks like my CPU problem is not solved. I suspect the cache or erasure coded pools are responsible. However, I am currently also doing backfilling and data migration, so I am not completely sure.

@James, can you create an EC pool with a cache pool on top of it? Try filling it with a few TB of data and check whether there is CPU load on an idle cluster.

James Page (james-page) wrote :

@Henrik

I can try - do you have some details on how you set up the EC + cache pools so I can reproduce as closely as possible?

Also, is the cluster actually idle, or is it staging data to/from the EC pool, which would consume CPU resources?

Henrik Korkuc (mpam) wrote :

The EC pool is 3+1, and the cache pool has 2 replicas, with default crush rules.

I decreased the cache to a few gigs and [semi]idle CPU usage decreased dramatically. So I think it is not a bug, but related to how the cache works.

I think we can call it "verification-done", unless someone else has objections?
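
For readers comparing the two pool types mentioned above, the raw-space cost can be worked out with simple arithmetic. The sketch below assumes that "3+1" means 3 data chunks plus 1 coding chunk (k=3, m=1), which is the usual reading but is not spelled out in the comment; the function names are illustrative only.

```python
def erasure_coded_overhead(k, m):
    """Raw bytes stored per byte of user data with k data and m coding chunks."""
    return (k + m) / k


def replicated_overhead(size):
    """Raw bytes stored per byte of user data with `size` full copies."""
    return float(size)


# Assumed layout from the comment above: EC pool 3+1, cache pool with 2 replicas.
ec = erasure_coded_overhead(k=3, m=1)
rep = replicated_overhead(size=2)

# An EC pool survives the loss of m chunks, a replicated pool size - 1 copies,
# so both layouts here tolerate a single failure.
print(f"EC 3+1:     {ec:.2f}x raw space, tolerates 1 lost chunk")
print(f"2 replicas: {rep:.2f}x raw space, tolerates 1 lost copy")
# EC 3+1:     1.33x raw space, tolerates 1 lost chunk
# 2 replicas: 2.00x raw space, tolerates 1 lost copy
```

Under that assumption, the erasure coded pool gives the same single-failure tolerance for roughly two thirds of the raw space, at the price of extra CPU work for encoding and for moving objects between the cache and the EC pool.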

James Page (james-page) on 2014-05-28
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 0.80.1-0ubuntu1

---------------
ceph (0.80.1-0ubuntu1) trusty; urgency=medium

  * New upstream release stable point release (LP: #1278466).

ceph (0.80-0ubuntu1) trusty; urgency=medium

  * New upstream release stable release (LP: #1278466).
 -- James Page <email address hidden> Wed, 14 May 2014 11:09:37 -0400

Changed in ceph (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
