Services not running that should be: haproxy

Bug #1904411 reported by Aurelien Lourot
This bug affects 1 person
Affects                    Status        Importance  Assigned to  Milestone
Ceph RADOS Gateway Charm   Fix Released  High        Unassigned

Bug Description

This always happens on fresh groovy-victoria deployments:

ceph-radosgw/0* blocked idle 3 172.17.110.8 80/tcp Services not running that should be: haproxy

https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_func_full/openstack/charm-ceph-radosgw/761548/1/7405/index.html

Aurelien Lourot (aurelien-lourot) wrote :
summary: - [groovy-victoria] Services not running that should be: haproxy
+ [victoria] Services not running that should be: haproxy
Peter Matulis (petermatulis) wrote : Re: [victoria] Services not running that should be: haproxy

FWIW, I was able to successfully deploy ceph-radosgw on focal-victoria during a manual cloud install. These are the pertinent commands:

1. juju deploy -n 4 ceph-osd
2. juju deploy --to lxd:0 ceph-radosgw
3. juju deploy -n 3 --to lxd:0,lxd:1,lxd:2 ceph-mon
4. juju add-relation ceph-mon:osd ceph-osd:mon
5. juju add-relation ceph-mon:radosgw ceph-radosgw:mon

All of the above commands were run well apart from one another in time, except commands #3 and #4, which were run back to back.

Changed in charm-ceph-radosgw:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Aurelien Lourot (aurelien-lourot)
Aurelien Lourot (aurelien-lourot) wrote :

I'm also able to reproduce it with focal-ussuri locally.

summary: - [victoria] Services not running that should be: haproxy
+ Services not running that should be: haproxy
Aurelien Lourot (aurelien-lourot) wrote :

haproxy fails for an unknown reason during deployment, and the service is then restarted too quickly (without a pause between attempts). The new process can't bind to port 80 that soon, probably because the port is still held by the previous process. Eventually the maximum number of restart attempts is exhausted and the haproxy service remains in the `failed` state.
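haproxy's `cannot bind socket` alerts are what any listener gets when another socket already owns the same address and port. This minimal Python sketch (an illustration, not charm code) reproduces that failure on a kernel-chosen loopback port instead of port 80, so it runs without root: the second `bind()` fails with `EADDRINUSE`. It only demonstrates the symptom; it does not tell you which process owns the port.

```python
import errno
import socket


def try_listen(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind a TCP socket to (host, port) and listen; raise OSError on failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()  # don't leak the descriptor when bind() fails
        raise
    return sock


def second_bind_errno() -> int:
    """Occupy a port with one listener, then try to bind it a second time.

    Returns the errno of the failed second bind, or 0 if it unexpectedly
    succeeded.
    """
    # Port 0 lets the kernel pick a free ephemeral port, so no root is needed
    # (binding port 80 itself would normally require root).
    first = try_listen(0)
    port = first.getsockname()[1]
    try:
        second = try_listen(port)
    except OSError as exc:
        return exc.errno
    else:
        second.close()
        return 0
    finally:
        first.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print(errno.errorcode.get(code, code))
```

Neither socket sets `SO_REUSEADDR` or `SO_REUSEPORT`, so the kernel refuses the second listener just as it refuses a new haproxy process while something else still listens on port 80.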

haproxy.log:

Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[5970]: [WARNING] 322/103900 (5970) : Exiting Master process...
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[5970]: [ALERT] 322/103900 (5970) : Current worker #1 (5971) exited with code 143 (Terminated)
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[5970]: [WARNING] 322/103900 (5970) : All workers exited. Exiting... (0)
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[7971]: Proxy stats started.
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[7971]: [ALERT] 322/103900 (7971) : Starting frontend tcp-in_cephradosgw-server: cannot bind socket [0.0.0.0:80]
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[7971]: [ALERT] 322/103900 (7971) : Starting frontend tcp-in_cephradosgw-server: cannot bind socket [:::80]
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[7971]: Proxy stats started.
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[7971]: Proxy cephradosgw-server_10.5.0.8 started.
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[7971]: Proxy cephradosgw-server_10.5.0.8 started.
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[8031]: Proxy stats started.
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[8031]: [ALERT] 322/103900 (8031) : Starting frontend tcp-in_cephradosgw-server: cannot bind socket [0.0.0.0:80]
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[8031]: [ALERT] 322/103900 (8031) : Starting frontend tcp-in_cephradosgw-server: cannot bind socket [:::80]
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[8031]: Proxy stats started.
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[8031]: Proxy cephradosgw-server_10.5.0.8 started.
Nov 18 10:39:00 juju-4d3c7a-zaza-8607d9720117-3 haproxy[8031]: Proxy cephradosgw-server_10.5.0.8 started.
Nov 18 10:39:01 juju-4d3c7a-zaza-8607d9720117-3 haproxy[8081]: Proxy stats started.
...

Manually running `sudo service haproxy restart` later works fine.
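To make the restart loop in the haproxy.log excerpt easier to see, a small hypothetical Python helper (not part of the charm) can group the lines by haproxy PID: each new PID is a fresh start attempt, and the bind failures show which attempts could not get port 80. It assumes the syslog line format shown in the excerpt; the sample is a subset of those lines.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# "<Mon> <day> <hh:mm:ss> <host> haproxy[<pid>]: <message>", as in the excerpt.
LINE_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) \S+ haproxy\[(\d+)\]: (.*)$")

HOST = "juju-4d3c7a-zaza-8607d9720117-3"
SAMPLE = [
    f"Nov 18 10:39:00 {HOST} haproxy[5970]: [WARNING] 322/103900 (5970) : All workers exited. Exiting... (0)",
    f"Nov 18 10:39:00 {HOST} haproxy[7971]: Proxy stats started.",
    f"Nov 18 10:39:00 {HOST} haproxy[7971]: [ALERT] 322/103900 (7971) : Starting frontend tcp-in_cephradosgw-server: cannot bind socket [0.0.0.0:80]",
    f"Nov 18 10:39:00 {HOST} haproxy[7971]: [ALERT] 322/103900 (7971) : Starting frontend tcp-in_cephradosgw-server: cannot bind socket [:::80]",
    f"Nov 18 10:39:00 {HOST} haproxy[8031]: Proxy stats started.",
    f"Nov 18 10:39:00 {HOST} haproxy[8031]: [ALERT] 322/103900 (8031) : Starting frontend tcp-in_cephradosgw-server: cannot bind socket [0.0.0.0:80]",
    f"Nov 18 10:39:00 {HOST} haproxy[8031]: [ALERT] 322/103900 (8031) : Starting frontend tcp-in_cephradosgw-server: cannot bind socket [:::80]",
    f"Nov 18 10:39:01 {HOST} haproxy[8081]: Proxy stats started.",
    "...",
]


def summarize(lines: Iterable[str]) -> Tuple[List[Tuple[str, int]], Counter]:
    """Return (timestamp, pid) pairs in first-seen order and bind failures per pid."""
    first_seen: List[Tuple[str, int]] = []
    seen = set()
    failures: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip "..." and anything that is not a haproxy syslog line
        ts, pid, msg = m.group(1), int(m.group(2)), m.group(3)
        if pid not in seen:
            seen.add(pid)
            first_seen.append((ts, pid))
        if "cannot bind socket" in msg:
            failures[pid] += 1
    return first_seen, failures


if __name__ == "__main__":
    first_seen, failures = summarize(SAMPLE)
    for ts, pid in first_seen:
        print(ts, pid, "bind failures:", failures[pid])
```

On these lines the old master (PID 5970) exits, then PIDs 7971, 8031 and 8081 start within about one second, and 7971 and 8031 each fail to bind both `0.0.0.0:80` and `:::80`.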

Aurelien Lourot (aurelien-lourot) wrote :
Trent Lloyd (lathiat) wrote :

The real reason this fails is that apache2 is still being installed despite not being used. The code for apache2 was removed but got re-added at some point (possibly for the SSL use case? I didn't finish checking into that).

If you stop apache2 from being installed instead, this issue would likely go away.

Trent Lloyd (lathiat) wrote :

apache2 package installation was removed but got re-added, for some reason, in commit 0f3203b18cb62af15ea394fe7f80ef68b1b578da; I didn't finish looking into why.

Aurelien Lourot (aurelien-lourot) wrote :

Thanks for the useful pointer, Trent! It seems apache2 was reintroduced to implement/fix SSL as part of https://bugs.launchpad.net/charms/+source/ceph-radosgw/+bug/1690826. That piece of work (https://review.opendev.org/#/c/498122) does call disable_unused_apache_sites(), but probably too late, leaving a long time window during which apache2 holds port 80. Removing apache2 doesn't look like an option, but calling disable_unused_apache_sites() earlier is. This new solution has now landed.

Changed in charm-ceph-radosgw:
status: In Progress → Fix Committed
milestone: none → 20.10
Aurelien Lourot (aurelien-lourot) wrote :
Changed in charm-ceph-radosgw:
status: Fix Committed → Fix Released
assignee: Aurelien Lourot (aurelien-lourot) → nobody