novajoin-server Timeout

Bug #1715893 reported by Cédric Jeanneret deactivated
This bug affects 1 person
Affects   | Status  | Importance | Assigned to | Milestone
novajoin  | Triaged | Medium     | Unassigned  | none

Bug Description

Hello,

Apparently there's an issue with novajoin-server when we provision more than three nodes (at least, the issue occurs when we provision 6 nodes).

We can see in nova-compute.log the following entries:

ConnectTimeout: Request to http://10.27.100.1:9090/v1/ timed out

We don't have much more information, but if you guide us, we can provide more logs if needed.

Thank you!

Cheers,

C.

Changed in tripleo:
milestone: none → queens-1
importance: Undecided → Medium
status: New → Triaged
affects: tripleo → novajoin
Changed in novajoin:
milestone: queens-1 → none
Rob Crittenden (rcritten) wrote:

What is the value of vendordata_dynamic_connect_timeout in nova.conf?

Is novajoin-server still running? Can you connect to it manually?
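One way to answer the second question is to time a raw TCP connection to the endpoint from the compute node, because nova reported a connect timeout rather than a read timeout. The following is a minimal Python sketch, not a nova or novajoin tool. The function name measure_connect is hypothetical, and the 5-second default only mirrors the value discussed in this thread.

```python
import socket
import time
from urllib.parse import urlsplit


def measure_connect(url: str, timeout: float = 5.0) -> float:
    """Return the seconds needed to open a TCP connection to the URL's host/port.

    Raises a timeout error (socket.timeout / TimeoutError, an OSError subclass)
    if the handshake does not complete within `timeout` seconds, which is
    roughly the situation nova logs as ConnectTimeout.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when the URL has none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    start = time.monotonic()
    with socket.create_connection((parts.hostname, port), timeout=timeout):
        pass  # connecting is all we measure; no HTTP request is sent
    return time.monotonic() - start
```

For example, measure_connect("http://10.27.100.1:9090/v1/") returns the elapsed seconds, or raises a timeout error if the connection is not established within 5 seconds. It only checks that the port accepts connections; it does not exercise novajoin's /v1/ handler.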

Cédric Jeanneret deactivated (cjeanneret-c2c-deactivated) wrote:

Hello Rob,

The vendordata_dynamic_connect_timeout entry is commented out, so I guess it uses the default value, probably 5 (the value in the comment).
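To confirm which value is in effect, a short Python sketch can parse nova.conf and fall back to the default when the option is commented out. It assumes the option lives in the [api] section and that its default is 5 seconds, as in the commented-out line. The function name is hypothetical, and Python's configparser only approximates how oslo.config reads the file.

```python
import configparser

# Assumed default of nova's vendordata_dynamic_connect_timeout, in seconds,
# taken from the commented-out line in nova.conf.
DEFAULT_CONNECT_TIMEOUT = 5


def effective_connect_timeout(conf_text: str) -> int:
    # strict=False tolerates duplicate sections/options; interpolation=None
    # avoids errors on '%' characters that may appear in config values.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    # Assumption: the option is read from the [api] section. Lines starting
    # with '#' are comments, so a commented-out option yields the fallback,
    # as does a missing [api] section.
    return parser.getint(
        "api",
        "vendordata_dynamic_connect_timeout",
        fallback=DEFAULT_CONNECT_TIMEOUT,
    )
```

For instance, passing the text of /etc/nova/nova.conf (path assumed) returns 5 when the line is still commented out, and the configured value otherwise.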

Novajoin-server was running the whole time, and I could connect to it.
As I told Juan on IRC earlier today, the timer at the end of each novajoin-server.log entry might explain the timeouts. Your question about vendordata_dynamic_connect_timeout confirms my suspicion, especially since novajoin-server.log didn't show any errors.

Juan also pointed out that there is no way to set the number of workers (join_workers) in the novajoin-server configuration (/etc/novajoin/join.conf), and that this setting isn't documented. Combining a timeout change on the nova side with more workers on the novajoin side might fix the issue we hit here.

Unfortunately, we won't redeploy the whole stack, since we're going live in less than two weeks and are finishing the setup. We might have a lab environment later, but we don't have an ETA for it :/.
So I won't be able to test or validate anything; we can only speculate that one change or the other might help.

My thoughts: the number of workers could be raised (meaning the novajoin-install process would need to be modified so that it can set the join_workers option). And since the undercloud is slow as hell, raising the timeout in nova.conf could also help a bit, keeping in mind that we'll need to ensure there's always a free worker for new requests.
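The idea of always keeping a free worker can be checked with back-of-envelope arithmetic. The Python sketch below uses a deliberately simplified model, which is an assumption and not a description of how novajoin-server actually schedules requests: W workers each serve one request at a time, every request takes T seconds, and N nodes ask for vendordata at the same moment. The last request then waits (ceil(N/W) - 1) x T seconds for a free worker. The function names and the numbers in the example are hypothetical, not measurements from this deployment.

```python
import math


def queue_delay(requests: int, workers: int, seconds_per_request: float) -> float:
    """Seconds the last of `requests` simultaneous requests waits for a worker.

    Simplified model (assumption): each worker handles one request at a time,
    all requests take the same time, and they are served in arrival order.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Full rounds of work that complete before the last request is picked up.
    rounds_before_last = max(0, math.ceil(requests / workers) - 1)
    return rounds_before_last * seconds_per_request


def fits(requests: int, workers: int, seconds_per_request: float,
         timeout: float) -> bool:
    """True if even the last request is picked up before the client's timeout."""
    return queue_delay(requests, workers, seconds_per_request) < timeout
```

With a hypothetical 2 seconds per request and 6 nodes, queue_delay(6, 1, 2.0) gives 10.0 seconds, well past a 5-second timeout, while queue_delay(6, 6, 2.0) gives 0.0. Under this model, adding workers removes the wait, whereas raising the timeout only helps as long as it stays above the queue delay.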

Cheers,

C.
