object-server can't bind any port if no current IP record in object ring when server-per-port enabled

Bug #1636228 reported by Charles Hsu
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: In Progress
Importance: Undecided
Assigned to: Cheng Li

Bug Description

When server-per-port is enabled, object-server tries to bind the ports listed in the object ring.
If the object ring has no entry for this node (its IP and port), object-server gets an empty list of ports. As a result it binds no port at all, yet it keeps running without any warning message in the swift logs.

If you disable server-per-port, you won't see this issue, because object-server binds the port from object-server/1.conf directly.

> >>> from swift.common.storage_policy import BindPortsCache as cache
> >>> bind_ports = cache('/etc/swift', '192.168.200.223')
> >>> bind_ports.all_bind_ports_for_node()
> set([])
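
To make the empty result above concrete, here is a minimal, self-contained sketch of how a per-node port set could be derived from ring device entries. It does not use Swift's code: `ports_for_node` and the sample device list are hypothetical, and the only assumption carried over from the ring is that each device entry has an `ip` and a `port`.

```python
def ports_for_node(devices, node_ip):
    """Return the set of ports that ring devices assign to node_ip.

    Hypothetical stand-in for illustration; not Swift's implementation.
    """
    # Skip empty slots (None) and keep only devices on this node's IP.
    return {dev['port'] for dev in devices if dev and dev['ip'] == node_ip}


# Made-up ring contents: no device lives on 192.168.200.223.
devices = [
    {'ip': '192.168.200.221', 'port': 6010},
    {'ip': '192.168.200.221', 'port': 6011},
    None,
    {'ip': '192.168.200.222', 'port': 6010},
]

print(ports_for_node(devices, '192.168.200.221'))
print(ports_for_node(devices, '192.168.200.223'))  # empty set
```

With an empty set there is nothing to bind, which matches the behaviour reported here.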

And here is the strace of the object-server

select(0, NULL, NULL, NULL, {0, 9964}) = 0 (Timeout)
stat("/etc/swift/object.ring.gz", {st_mode=S_IFREG|0644, st_size=90435, ...}) = 0
stat("/etc/swift/object-1.ring.gz", {st_mode=S_IFREG|0644, st_size=8196, ...}) = 0
sendto(3, "<131>object-server: ## _reload_b"..., 62, 0, NULL, 0) = 62
wait4(0, 0x7ffc4b8ea1a0, WNOHANG, NULL) = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {0, 9930}) = 0 (Timeout)
stat("/etc/swift/object.ring.gz", {st_mode=S_IFREG|0644, st_size=90435, ...}) = 0
stat("/etc/swift/object-1.ring.gz", {st_mode=S_IFREG|0644, st_size=8196, ...}) = 0
sendto(3, "<131>object-server: ## _reload_b"..., 62, 0, NULL, 0) = 62
wait4(0, 0x7ffc4b8ea1a0, WNOHANG, NULL) = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {0, 9978}) = 0 (Timeout)
stat("/etc/swift/object.ring.gz", {st_mode=S_IFREG|0644, st_size=90435, ...}) = 0
stat("/etc/swift/object-1.ring.gz", {st_mode=S_IFREG|0644, st_size=8196, ...}) = 0
sendto(3, "<131>object-server: ## _reload_b"..., 62, 0, NULL, 0) = 62
wait4(0, 0x7ffc4b8ea1a0, WNOHANG, NULL) = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {0, 9967}) = 0 (Timeout)
stat("/etc/swift/object.ring.gz", {st_mode=S_IFREG|0644, st_size=90435, ...}) = 0
stat("/etc/swift/object-1.ring.gz", {st_mode=S_IFREG|0644, st_size=8196, ...}) = 0
sendto(3, "<131>object-server: ## _reload_b"..., 62, 0, NULL, 0) = 62
wait4(0, 0x7ffc4b8ea1a0, WNOHANG, NULL) = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {0, 9952}) = 0 (Timeout)
stat("/etc/swift/object.ring.gz", {st_mode=S_IFREG|0644, st_size=90435, ...}) = 0
stat("/etc/swift/object-1.ring.gz", {st_mode=S_IFREG|0644, st_size=8196, ...}) = 0
sendto(3, "<131>object-server: ## _reload_b"..., 62, 0, NULL, 0) = 62
wait4(0, 0x7ffc4b8ea1a0, WNOHANG, NULL) = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {0, 9963}) = 0 (Timeout)
stat("/etc/swift/object.ring.gz", {st_mode=S_IFREG|0644, st_size=90435, ...}) = 0
stat("/etc/swift/object-1.ring.gz", {st_mode=S_IFREG|0644, st_size=8196, ...}) = 0
sendto(3, "<131>object-server: ## _reload_b"..., 62, 0, NULL, 0) = 62
wait4(0, 0x7ffc4b8ea1a0, WNOHANG, NULL) = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {0, 9974}^CProcess 25178 detached
 <detached ...>

And you only see one object-server process

> root 25178 1 4 14:09 ? 00:02:59 /usr/bin/python /opt/ss/bin/swift-object-server /etc/swift/object-server/1.conf

Here are the logs from object-server.

 Oct 24 15:14:17 dev21 object-server: SIGTERM received
 Oct 24 15:14:17 dev21 object-server: SIGTERM received
 Oct 24 15:14:17 dev21 object-server: Exited
 Oct 24 15:14:17 dev21 object-server: Removing dead child 25192
 Oct 24 15:14:17 dev21 object-server: Exited
 Oct 24 15:14:18 dev21 object-server: Started child 30796
 Oct 24 15:14:18 dev21 object-server: Started child 30799
 Oct 24 15:14:18 dev21 object-server: Started child 30801
 Oct 24 15:14:18 dev21 object-server: Started child 30805
 Oct 24 15:14:18 dev21 object-server: Started child 30808
 Oct 24 15:14:19 dev21 object-server: Started child 30809
 Oct 24 15:14:19 dev21 object-server: Started child 30823
 Oct 24 15:14:19 dev21 object-server: Started child 30832

clayg (clay-gerrard) wrote :

I guess ideally the process would ...

1) log a warning (and start listening when a better ring shows up?)

2) fail to start (which will hopefully be noticed soon so you can fix it and try again?)
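
The two options above could be sketched roughly as follows. This is not the proposed fix and not Swift code: `check_bind_ports`, `NoBindPortsError`, and the `fail_fast` flag are hypothetical names, and the logger name is only an assumption for illustration.

```python
import logging

logger = logging.getLogger('object-server')  # assumed logger name


class NoBindPortsError(Exception):
    """Hypothetical error: the ring yields no ports for this node."""


def check_bind_ports(ports, node_ip, fail_fast=True):
    """Return True if there are ports to bind.

    On an empty port set, either raise (option 2, fail to start) or
    log a warning and return False (option 1, keep running).
    """
    if ports:
        return True
    msg = 'No bind ports found in object ring for %s' % node_ip
    if fail_fast:
        raise NoBindPortsError(msg)
    logger.warning(msg)
    return False
```

With `fail_fast=False` the caller would keep polling the ring and call the check again after each reload; with `fail_fast=True` the process exits early, so the misconfiguration is noticed at startup.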

Charles Hsu (charles0126) wrote :

I voted for 2

Cheng Li (shcli)
Changed in swift:
assignee: nobody → Cheng Li (shcli)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/395911

Changed in swift:
status: New → In Progress