OpenStack Object Storage (swift)

Bug #1900845
Comment #7

Anton (a.porabkovich) wrote on 2020-11-02: Re: [Bug 1900845] Re: ssync replication failure

Hello,
> BTW, if I set "replication_server = True" all PUT requests from
> replication will fail with "405 Method not allowed"
Have you put the replication servers and the client-traffic servers on separate hardware?
If not, you don't need to set anything. Judging by the code:
    # replication_server = True: only public methods marked as replication
    if self.replication_server is True:
        for name, m in all_methods:
            if (getattr(m, 'publicly_accessible', False) and
                    getattr(m, 'replication', False)):
                self._allowed_methods.append(name)
    # replication_server = False: only public methods NOT marked as replication
    elif self.replication_server is False:
        for name, m in all_methods:
            if (getattr(m, 'publicly_accessible', False) and not
                    getattr(m, 'replication', False)):
                self._allowed_methods.append(name)
    # replication_server not set (None): every public method
    elif self.replication_server is None:
        for name, m in all_methods:
            if getattr(m, 'publicly_accessible', False):
                self._allowed_methods.append(name)
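
To make the three branches concrete, here is a minimal Python sketch that reimplements only the logic quoted above against a toy server class. The public and replication decorators and ToyObjectServer are hypothetical stand-ins that set the publicly_accessible and replication attributes the snippet checks. They are not taken from Swift, and which verbs are marked as replication verbs is an assumption made for illustration.

```python
import inspect


# Hypothetical stand-ins: these only set the attributes the quoted code inspects.
def public(func):
    func.publicly_accessible = True
    return func


def replication(func):
    func.replication = True
    return func


class ToyObjectServer:
    """Hypothetical server; which verbs count as replication is assumed."""

    @public
    def GET(self, req):
        return "200 OK"

    @public
    def PUT(self, req):
        return "201 Created"

    @public
    def DELETE(self, req):
        return "204 No Content"

    @public
    @replication
    def REPLICATE(self, req):
        return "200 OK"

    @public
    @replication
    def SSYNC(self, req):
        return "200 OK"


def allowed_methods(server, replication_server):
    """Mirror the True / False / None branches of the quoted snippet."""
    allowed = []
    # getmembers returns (name, value) pairs sorted by name
    for name, m in inspect.getmembers(server, predicate=callable):
        is_public = getattr(m, 'publicly_accessible', False)
        is_repl = getattr(m, 'replication', False)
        keep = False
        if replication_server is True:
            keep = is_public and is_repl
        elif replication_server is False:
            keep = is_public and not is_repl
        elif replication_server is None:
            keep = is_public
        if keep:
            allowed.append(name)
    return allowed


server = ToyObjectServer()
for setting in (True, False, None):
    print(setting, allowed_methods(server, setting))
```

With replication_server = True only the verbs marked as replication remain, so a PUT sent to such a server would be rejected. That is consistent with the "405 Method not allowed" reported above. With the option unset (None), every public verb is served.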

My ring is configured per device:
Ring file /etc/swift/object-1.ring.gz is up-to-date
Devices:   id region zone ip address:port replication ip:port  name weight partitions balance flags meta
            0      1    1  10.0.1.11:6061      10.0.1.11:6061 cold0 100.00       4096    0.00
            1      1    1  10.0.1.11:6062      10.0.1.11:6062 cold1 100.00       4096    0.00
            2      1    1  10.0.1.11:6063      10.0.1.11:6063 cold2 100.00       4096    0.00
i.e. each disk is served on its own port,
and, in addition:
servers_per_port = 2
If I set servers_per_port to more than 2, the system load jumps to 80+,
especially when deleting many objects with 100+ threads.
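
The Swift documentation describes servers_per_port as the number of worker processes run for each port (disk) in the ring. This gives a rough way to see how the process count scales with that setting. The Python sketch below assumes only that rule and hard-codes the three device rows shown above, then counts the expected object-server workers on 10.0.1.11. The function names are hypothetical, and only the "ip address:port" column is considered.

```python
# Device rows copied from the ring output above (header omitted).
RING_ROWS = """\
0 1 1 10.0.1.11:6061 10.0.1.11:6061 cold0 100.00 4096 0.00
1 1 1 10.0.1.11:6062 10.0.1.11:6062 cold1 100.00 4096 0.00
2 1 1 10.0.1.11:6063 10.0.1.11:6063 cold2 100.00 4096 0.00
"""


def unique_ports(rows, local_ip):
    """Hypothetical helper: distinct ports for local_ip in the ip:port column."""
    ports = set()
    for row in rows.splitlines():
        fields = row.split()
        if not fields:
            continue
        ip, port = fields[3].rsplit(":", 1)  # 4th column is "ip address:port"
        if ip == local_ip:
            ports.add(int(port))
    return ports


def expected_workers(rows, local_ip, servers_per_port):
    """Assumed rule: one set of servers_per_port workers per distinct port."""
    return len(unique_ports(rows, local_ip)) * servers_per_port


for spp in (2, 3, 4):
    print("servers_per_port =", spp, "->", expected_workers(RING_ROWS, "10.0.1.11", spp), "workers")
```

With three ports, going from servers_per_port = 2 to 4 doubles the worker count from 6 to 12. The sketch shows only the process count, not the resulting load.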

On Mon, 2 Nov 2020 at 11:45, Dmitry <email address hidden> wrote:

> I have configured separate replication server processes and reduced
> concurrency, but there are still a lot of errors from ssync.receiver and
> ssync.sender. However, all requests from proxy servers are successful and
> rsync replication works well. Disks are not very busy on the sender and
> receiver.
> Here is my config for the replication server and replicator:
>
> [DEFAULT]
> bind_ip = 172.168.28.135
> bind_port = 6010
> workers = auto
> devices = /import
> client_timeout = 600
> conn_timeout = 5
>
> [pipeline:main]
> pipeline = object-server
>
> [app:object-server]
> use = egg:swift#object
> #replication_server = True
> log_name = replication-server
> log_facility = LOG_LOCAL4
> log_level = DEBUG
> log_address = /dev/log
> replication_concurrency = 0
> replication_concurrency_per_device = 0
> replication_lock_timeout = 45
>
> [object-replicator]
> log_name = object-replicator
> log_facility = LOG_LOCAL1
> log_level = DEBUG
> log_address = /dev/log
> concurrency = 2
> sync_method = ssync
> rsync_timeout = 86400
> rsync_bwlimit = 25m
> node_timeout = 600
> http_timeout = 31600
> lockup_timeout = 87000
> handoffs_first = True
> handoff_delete = 2
>
> BTW, if I set "replication_server = True" all PUT requests from
> replication will fail with "405 Method not allowed"
>
> --
> You received this bug notification because you are subscribed to a
> duplicate bug report (1890575).
> https://bugs.launchpad.net/bugs/1900845
>
> Title:
> ssync replication failure
>
> Status in OpenStack Object Storage (swift):
> New
>
> Bug description:
> OpenStack Train, CentOS 7.
>
> I am trying to replicate a partition with ssync, but it fails with this
> error on the receiver:
>
> Oct 21 12:26:53 obj21 object-server[13189]: 172.168.28.135/objects/32
> EXCEPTION in ssync.Receiver:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/swift/obj/ssync_receiver.py", line 165, in __call__
>     for data in self.missing_check():
>   File "/usr/lib/python2.7/site-packages/swift/obj/ssync_receiver.py", line 350, in missing_check
>     want = self._check_missing(line)
>   File "/usr/lib/python2.7/site-packages/swift/obj/ssync_receiver.py", line 293, in _check_missing
>     remote = decode_missing(line)
>   File "/usr/lib/python2.7/site-packages/swift/obj/ssync_receiver.py", line 41, in decode_missing
>     t_data = urllib.parse.unquote(parts[1])
> IndexError: list index out of range
>
> On the sender side there is this message:
> object-replicator: 172.168.28.162:6000/objects/32 120.0 seconds: missing_check send line
>
> Number of files in partition: 214945
>
> object-server.conf:
>
> [DEFAULT]
> bind_ip = 172.168.28.135
> bind_port = 6000
> user = swift
> swift_dir = /etc/swift
> devices = /import
> mount_check = true
> log_level = DEBUG
> node_timeout = 120
> [pipeline:main]
> pipeline = healthcheck recon object-server
> [app:object-server]
> use = egg:swift#object
> [filter:healthcheck]
> use = egg:swift#healthcheck
> [filter:recon]
> use = egg:swift#recon
> recon_cache_path = /var/cache/swift
> [object-replicator]
> log_name = object-replicator
> log_facility = LOG_LOCAL1
> log_level = DEBUG
> log_address = /dev/log
> concurrency = 4
> sync_method = ssync
> rsync_timeout = 86400
> rsync_bwlimit = 15m
> http_timeout = 21600
> lockup_timeout = 87000
> handoffs_first = True
> handoff_delete = 2
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/swift/+bug/1900845/+subscriptions
>
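
The receiver traceback fails at parts[1] in decode_missing. That means the missing_check line it received split into fewer than two whitespace-separated fields. The Python sketch below reproduces only that failing step, using a hypothetical, simplified toy_decode_missing (not Swift's function), and the sample hash and timestamp are made up. One assumed way to end up with such a line is a line cut short, for example by a timeout while the sender is still writing missing_check lines. That is not a confirmed cause.

```python
from urllib.parse import unquote


def toy_decode_missing(line):
    """Hypothetical, simplified parse of one missing_check line.

    Mirrors only the step in the traceback: split on whitespace, then read
    field 0 as the object hash and field 1 as the timestamp.
    """
    parts = line.split()
    object_hash = unquote(parts[0])
    t_data = unquote(parts[1])  # IndexError if the line has fewer than 2 fields
    return object_hash, t_data


# Made-up sample lines: one complete, one truncated after the hash, one empty.
samples = [
    "d41d8cd98f00b204e9800998ecf8427e 1603272413.00000",
    "d41d8cd98f00b204e9800998ecf8427e",
    "",
]

for line in samples:
    try:
        print("ok:", toy_decode_missing(line))
    except IndexError as exc:
        print("IndexError:", exc, "for line", repr(line))
```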
