Comment 3 for bug 1955783

Bhagyashri Shewale (bhagyashri-shewale) wrote:

<guilhermesp_> hi team. we are looking at it now
<guilhermesp_> ykarel: dpawlik can you check the systems now?
<ykarel> guilhermesp_, looking
<ykarel> i see bridge.soft is up
<ykarel> others seem to be still down
<ykarel> like https://images.rdoproject.org/
<guilhermesp_> ok do you have a list of all servers in the same situation now?
<ykarel> i can try to find them but may miss some
<guilhermesp_> ok im doing some more checks on my side meanwhile
<ykarel> guilhermesp_, aren't all our servers on same tenant?
<guilhermesp_> yeah, but initially we noticed a kernel bug on one of the controllers which was preventing the active l3 agent from being released and/or from forwarding packets
<ykarel> ahhk
<ykarel> so now i can see those infra servers in infra-rdo tenant
<ykarel> and seems they are up
<ykarel> but not reachable
<guilhermesp_> ack
<ykarel> ex trunk-centos8.rdoproject.org
<ykarel> trunk-centos7.rdoproject.org
<ykarel> logserver.rdoproject.org
<ykarel> mirror.regionone.vexxhost.rdoproject.org
<ykarel> so almost all of those
<guilhermesp_> can you check those on infra-rdo now ykarel ?
<ykarel> seems you rebooted bridge.softwarefactory-project.io, right?
<ykarel> ^ is in infra-sf tenant
<guilhermesp_> hum we haven't rebooted any servers so far, we just did a fallback of some l3 agents
<ykarel> ahhk, maybe someone else tried, as i see a reboot time of Dec. 28, 2021, 8:06 a.m.
<guilhermesp_> yep i remember i saw something in the ticket description about that
<guilhermesp_> i believe infra-rdo servers should be back
<ykarel> guilhermesp_, me checks
<ykarel> yes seems they are coming up, i see gerrit/zuul are up at least
<ykarel> https://review.rdoproject.org/zuul/status https://review.rdoproject.org/r
<ykarel> https://images.rdoproject.org/, https://logserver.rdoproject.org/ too
<guilhermesp_> yeah we have recovered most of the routers by now
<guilhermesp_> and we are remediating one of the controllers now
<ykarel> but can't ssh to those servers for some reason, need to check why
<guilhermesp_> huuum weird are they responding to 22?
<ykarel> yes responding
<ykarel> seems some local issue related to socket
<guilhermesp_> what do you see when trying to login to those servers?
<guilhermesp_> ah ok
<guilhermesp_> yeah i think we have fixed the access by now
<guilhermesp_> and we took this bad controller out of the pool to fix it
<ykarel> guilhermesp_, from bridge node it said
<ykarel> unix_listener: cannot bind to path /<email address hidden>:22.fZjvgwJHmYEKYvVd: No such file or directory
<ykarel> so i created /run/user/1000 and added owner, and now i can ssh
<guilhermesp_> ok good
<ykarel> will note it down and will check with the team once they are back, maybe it's some known issue
<ykarel> for the router specific issue you can update the ticket as they would be needing all these details
<guilhermesp_> sure i will now
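The workaround ykarel describes (creating /run/user/1000 and setting its owner so that ssh could bind its control socket) can be captured in a small script. This is a minimal sketch, not the team's actual fix. It assumes that UID 1000 is the login user on the bridge node and that the ssh ControlPath lives under /run/user/<uid> (the real path is redacted in the log above). On systemd hosts this directory is normally created at login by systemd-logind, so its absence is a likely cause of the unix_listener error rather than a confirmed one. `ensure_runtime_dir` is a hypothetical helper name.

```python
import os
from pathlib import Path


def ensure_runtime_dir(uid: int, base: str = "/run/user", gid: int = -1) -> Path:
    """Create <base>/<uid> if missing, owned by `uid` and private (mode 0700).

    Hypothetical helper mirroring the manual fix from the log; assumes the
    ssh ControlPath lives under /run/user/<uid>.
    """
    path = Path(base) / str(uid)
    # parents/exist_ok make the call idempotent and tolerate a missing base dir.
    path.mkdir(parents=True, exist_ok=True)
    # Hand the directory to the target user; gid=-1 leaves the group unchanged.
    # Changing the owner to a different user requires root.
    os.chown(path, uid, gid)
    # XDG runtime directories are expected to be private to their owner.
    os.chmod(path, 0o700)
    return path
```

Run as root, `ensure_runtime_dir(1000)` reproduces the manual step from the log. Because `mkdir` uses `exist_ok=True`, calling it on a host where the directory already exists only re-applies the owner and mode.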