scheduler falsely reports share service down

Bug #1804208 reported by Maurice Escher on 2018-11-20
This bug affects 1 person
Assigned to: Lucio Seki

Bug Description


With a low or default service_down_time in the config and a large number of manila-share services (I've seen it with 5), a service can be wrongly reported as down during scheduling.

I believe this is because the scheduler collects the heartbeat data first and then loops over the services.
For example, the last service in the loop may be reached after service_down_time has passed. By then the service would normally have sent a new heartbeat, but the loop operates on the old data and does not know about it.

I propose to let service_is_up do a live check against the database each time, or at least make it configurable for the caller.
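
A minimal sketch of the race described above, not manila's actual code. It assumes a service counts as up when its last heartbeat is at most service_down_time seconds old. `FakeDB`, `check_services`, `run`, the 60-second threshold and the 30-second per-service delay are hypothetical names and values chosen for illustration only.

```python
from datetime import datetime, timedelta

SERVICE_DOWN_TIME = 60  # seconds; stand-in for the service_down_time option


def service_is_up(service, now):
    """Treat a service as up if its last heartbeat is recent enough."""
    age = (now - service["updated_at"]).total_seconds()
    return age <= SERVICE_DOWN_TIME


class FakeDB:
    """Hypothetical in-memory stand-in for the services table."""

    def __init__(self, rows):
        self.rows = {row["id"]: dict(row) for row in rows}

    def service_get_all(self):
        # Returns copies, i.e. a snapshot of the heartbeat data.
        return [dict(row) for row in self.rows.values()]

    def service_get(self, service_id):
        # Returns a copy of the current row, i.e. a live read.
        return dict(self.rows[service_id])

    def heartbeat(self, service_id, now):
        self.rows[service_id]["updated_at"] = now


def check_services(db, clock, live, work_per_service=timedelta(seconds=30)):
    """Loop over services while simulated time advances on each iteration.

    clock is a one-element list holding the current simulated time.
    """
    reported_down = []
    for service in db.service_get_all():
        if live:
            # Re-read the row so the check sees the latest heartbeat.
            service = db.service_get(service["id"])
        if not service_is_up(service, clock[0]):
            reported_down.append(service["host"])
        # Simulate slow per-service processing; meanwhile every healthy
        # service keeps sending heartbeats.
        clock[0] += work_per_service
        for service_id in db.rows:
            db.heartbeat(service_id, clock[0])
    return reported_down


def run(live):
    start = datetime(2019, 4, 12, 10, 20, 0)
    db = FakeDB(
        {"id": i, "host": f"host{i}", "updated_at": start} for i in range(5)
    )
    return check_services(db, [start], live)


print(run(live=False))  # snapshot loop: late services look stale
print(run(live=True))   # live check: no false reports
```

With these made-up numbers, the snapshot loop flags the last two of the five services even though all of them kept heartbeating, while the live re-read flags none. The trade-off is one extra database query per service.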

I hope my explanation is understandable.


Tom Barron (tpb) on 2018-11-21
tags: added: edge scale
Changed in manila:
importance: Undecided → High
tags: added: backport-potential
Jason Grosso (jgrosso) on 2019-03-22
Changed in manila:
status: New → Triaged
Jason Grosso (jgrosso) on 2019-03-27
Changed in manila:
status: Triaged → New
Jason Grosso (jgrosso) wrote :

This bug will be discussed under the Edge PTG topic

Changed in manila:
status: New → Opinion
status: Opinion → Triaged
Lucio Seki (lseki) wrote :

Hi Maurice,

I started investigating this bug. I tried to reproduce the issue, but without success.

I deployed a DevStack, then configured 8 LVMShareDriver backends and 7 NetAppDriver backends.
Running `manila service-list` a couple seconds after restarting m-shr, I can see all the services up.
The service_down_time is not set, so it's using the default value.

Could you add some details about the environment and the steps you used to find this bug?


Maurice Escher (maurice-escher) wrote :

Hi Lucio,

thanks for investigating.

Maybe it needs an actual payload to become visible: I've seen it with 2000 shares, and reporting those takes a while.

Now that I think about it again, I especially remember the server_pools_mapping being too large. I disabled it as an additional workaround. I don't use the PoolWeigher anyhow, which is the only consumer of these share statistics afaik.


Lucio Seki (lseki) wrote :

Thanks for the details, Maurice.

I managed to reproduce the issue with 500 shares using Dummy driver.

Now I'll verify whether your approach of doing a live check would fix the issue.


Lucio Seki (lseki) on 2019-04-04
Changed in manila:
assignee: nobody → Lucio Seki (lseki)
Lucio Seki (lseki) wrote :

Seems that `_update_host_state_map` is not the only place to fix.

While creating 500 shares, the manila-scheduler log starts printing "Share service is down." several times. However, running `manila service-list` still shows the manila-share service as `up`.

But when I restart manila-share service, running `manila service-list` shows manila-share service as `down` while it's exporting its 500 shares.

Lucio Seki (lseki) wrote :

Actually, even after modifying `_update_host_state_map` as Maurice suggested, it still shows "Share service is down." while creating the 500 shares:

    def _update_host_state_map(self, context):
        # Get resource usage across the available share nodes:
        topic = CONF.share_topic
        share_services = db.service_get_all_by_topic(context, topic)

        active_hosts = set()
        for service in share_services:
            # Get an updated state of the service
            updated_service = db.service_get(context, service['id'])
            host = updated_service['host']

            # Warn about down services and remove them from host_state_map
            if (not utils.service_is_up(updated_service) or
                    updated_service['disabled']):
                LOG.warning("Share service is down. (host: %s).", host)
                continue

But despite the warning message, the shares are created successfully, and `manila service-list` shows the manila-share service as `up`.
It is only shown as `down` while it is exporting the shares after a manila-share service restart.

Lucio Seki (lseki) wrote :

Sorry, please ignore the comments #4-#6.
It's normal for the manila-share service to be shown as `down` for a while, until it has re-exported all the shares.
If it still remains `down` long after restarting, that would be a different issue, to be addressed in a new bug report.

So I haven't managed to reproduce the issue yet.

Hello Team,

I believe I was able to reproduce this issue in my env.
- I added 9 NetApp share backends to Manila (backend details are shown in the Session 5 output).
- I am creating manila shares continuously in 3 different sessions.
- In another parallel session I am grepping the logs to see whether any backend is reported as down because of the above bug. -- 4th session
- In another session I am running "manila service-list | grep ontap | grep down" continuously. -- 5th session
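
The 5th-session check could also be scripted, which makes it easier to log exactly when a backend flips to `down`. The sketch below only parses text shaped like `manila service-list` output; the function names are hypothetical, and the sample rows are copied from the session output in this comment. Unlike `grep down`, it looks at the State column only.

```python
def parse_service_list(output):
    """Parse pipe-delimited rows like those printed by `manila service-list`."""
    services = []
    header = None
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip table borders, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            # The first pipe-delimited row holds the column names.
            header = [name.lower() for name in cells]
            continue
        services.append(dict(zip(header, cells)))
    return services


def down_share_services(services, host_substring="ontap"):
    """Return hosts of manila-share services whose State column is 'down'."""
    return [
        s["host"]
        for s in services
        if s["binary"] == "manila-share"
        and host_substring in s["host"]
        and s["state"] == "down"
    ]


SAMPLE = """\
| Id | Binary | Host | Zone | Status | State | Updated_at |
| 1 | manila-share | 25-nareshtwo@london | manila-zone-0 | enabled | down | 2019-04-12T09:58:12.000000 |
| 2 | manila-share | 25-nareshtwo@paris | manila-zone-1 | enabled | down | 2019-04-12T09:58:12.000000 |
| 3 | manila-scheduler | 25-nareshtwo | nova | enabled | up | 2019-04-12T10:20:36.000000 |
| 5 | manila-share | 25-nareshtwo@ontap2 | nova | enabled | up | 2019-04-12T10:20:41.000000 |
"""

print(down_share_services(parse_service_list(SAMPLE)))
```

In a real run the text would come from the CLI (for example via `subprocess.run(["manila", "service-list"], capture_output=True, text=True).stdout`), polled in a loop with a timestamp printed next to each result.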

Note : The services were not restarted during this time frame.
       The services were restarted at least 20 mins before running this activity.

I have captured the data for a 10-second time frame (Fri Apr 12 06:20:39 EDT 2019 to Fri Apr 12 06:20:48 EDT 2019).
Here is the session output from the 5th session.

################################# Session 5 output #################################
root@25-nareshtwo:/home/stack# date
Fri Apr 12 06:20:39 EDT 2019
root@25-nareshtwo:/home/stack# manila service-list
| Id | Binary | Host | Zone | Status | State | Updated_at |
| 1 | manila-share | 25-nareshtwo@london | manila-zone-0 | enabled | down | 2019-04-12T09:58:12.000000 |
| 2 | manila-share | 25-nareshtwo@paris | manila-zone-1 | enabled | down | 2019-04-12T09:58:12.000000 |
| 3 | manila-scheduler | 25-nareshtwo | nova | enabled | up | 2019-04-12T10:20:36.000000 |
| 4 | manila-data | 25-nareshtwo | nova | enabled | up | 2019-04-12T10:20:43.000000 |
| 5 | manila-share | 25-nareshtwo@ontap2 | nova | enabled | up | 2019-04-12T10:20:41.000000 |
| 6 | manila-share | 25-nareshtwo@ontap6 | nova | enabled | up | 2019-04-12T10:20:41.000000 |
| 7 | manila-share | 25-nareshtwo@ontapreplica6 | nova | enabled | up | 2019-04-12T10:20:41.000000 |
| 8 | manila-share | 25-nareshtwo@ontapreplica2 | nova | enabled | up | 2019-04-12T10:20:41.000000 |
| 9 | manila-share | 25-nareshtwo@ontap33 | nova | enabled | up | 2019-04-12T10:20:41.000000 |
| 10 | manila-share | 25-nareshtwo@ontap3 | nova | enabled | up | 2019-04-12T10:20:42.000000 |
| 11 | manila-share | 25-nareshtwo@ontapreplica3 | nova | enabled | up | 2019-04-12T10:20:42.000000 |
| 12 | manila-share | 25-nareshtwo@ontap4 | nova | enabled | up | 2019-04-12T10:20:41.000000 |
| 13 | manila-share | 25-nareshtwo@ontapreplica4 | nova | enabled | up | 2019-04-12T10:20:41.000000 |
root@25-nareshtwo:/home/stack# manila service-list | grep ontap | grep down
root@25-nareshtwo:/home/stack# manila service-list | grep ontap | grep down
root@25-nareshtwo:/home/stack# manila...
