Bug #1446873 “ssync doesn't work with replication_server = true” : Bugs : OpenStack Object Storage (swift)

John Dickinson (notmyname) on 2015-04-21

Changed in swift:
milestone:	none → 2.3.0-rc2

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-04-21: Fix proposed to swift (stable/kilo)

#1

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/176110

John Dickinson (notmyname) on 2015-04-22

Changed in swift:
milestone:	2.3.0-rc2 → next-liberty

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-04-22: Change abandoned on swift (stable/kilo)

#2

Change abandoned by John Dickinson (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/176110
Reason: nope

Revision history for this message

clayg (clay-gerrard) wrote on 2015-05-07: Re: Object Reconstructor doesn't always use replication_ip

#3

Download full text (3.5 KiB)

Hey, so I'm going to hijack this bug since ssync just doesn't work with replication_server = true

vagrant-swift-all-in-one has a branch to setup separate replication servers on different ports [1].

If you use that and try to ssync with some objects that need to move you'll find that the sub-requests inside of sync get 405'd

  May 7 04:18:45 saio object-7040: 127.0.0.1 - - [07/May/2015:04:18:45 +0000] "SSYNC /sdb4/219" 200 - "-" "-" "-" 0.0001 "-" 23181 0
  May 7 04:18:45 saio object-7040: None - - [07/May/2015:04:18:45 +0000] "PUT /sdb4/219/AUTH_test/new-test/115677-B-obj" 405 - "-" "-" "-" 0.0001 "-" 23181 0
  May 7 04:18:46 saio object-7020: 127.0.0.1:7040/sdb4/219 Early disconnect
  May 7 04:18:46 saio object-7040: STDERR: Traceback (most recent call last):#012 File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 483, in handle_one_response#012
   write(b''.join(towrite))#012 File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 426, in write#012 _writelines(towrite)#012 File "/usr/lib/python2.7/socket.py", l
  ine 334, in writelines#012 self.flush()#012 File "/usr/lib/python2.7/socket.py", line 303, in flush#012 self._sock.sendall(view[write_offset:write_offset+buffer_size])#012 File
  "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 376, in sendall#012 tail = self.send(data, flags)#012 File "/usr/local/lib/python2.7/dist-packages/eventlet/g
  reenio/base.py", line 359, in send#012 total_sent += fd.send(data[total_sent:], flags)#012error: [Errno 32] Broken pipe
  May 7 04:18:46 saio object-7040: STDERR: Traceback (most recent call last):
  May 7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/greenpool.py", line 82, in _spawn_n_impl
  May 7 04:18:46 saio object-7040: STDERR: func(*args, **kwargs)
  May 7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 686, in process_request
  May 7 04:18:46 saio object-7040: STDERR: proto.__init__(sock, address, self)
  May 7 04:18:46 saio object-7040: STDERR: File "/usr/lib/python2.7/SocketServer.py", line 649, in __init__
  May 7 04:18:46 saio object-7040: STDERR: self.handle()
  May 7 04:18:46 saio object-7040: STDERR: File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
  May 7 04:18:46 saio object-7040: STDERR: self.handle_one_request()
  May 7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 358, in handle_one_request
  May 7 04:18:46 saio object-7040: STDERR: self.handle_one_response()
  May 7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 507, in handle_one_response
  May 7 04:18:46 saio object-7040: STDERR: while self.environ['eventlet.input'].read(MINIMUM_CHUNK_SIZE):
  May 7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 189, in read
  May 7 04:18:46 saio object-7040: STDERR: return self._chunked_read(self.rfile, length)
  May 7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi...

Hey, so I'm going to hijack this bug since ssync just doesn't work with replication_server = true

vagrant-swift-all-in-one has a branch to setup separate replication servers on different ports [1].

If you use that and try to ssync with some objects that need to move you'll find that the sub-requests inside of sync get 405'd

May  7 04:18:45 saio object-7040: 127.0.0.1 - - [07/May/2015:04:18:45 +0000] "SSYNC /sdb4/219" 200 - "-" "-" "-" 0.0001 "-" 23181 0
  May  7 04:18:45 saio object-7040: None - - [07/May/2015:04:18:45 +0000] "PUT /sdb4/219/AUTH_test/new-test/115677-B-obj" 405 - "-" "-" "-" 0.0001 "-" 23181 0
  May  7 04:18:46 saio object-7020: 127.0.0.1:7040/sdb4/219 Early disconnect
  May  7 04:18:46 saio object-7040: STDERR: Traceback (most recent call last):#012  File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 483, in handle_one_response#012   
   write(b''.join(towrite))#012  File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 426, in write#012    _writelines(towrite)#012  File "/usr/lib/python2.7/socket.py", l
  ine 334, in writelines#012    self.flush()#012  File "/usr/lib/python2.7/socket.py", line 303, in flush#012    self._sock.sendall(view[write_offset:write_offset+buffer_size])#012  File 
  "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 376, in sendall#012    tail = self.send(data, flags)#012  File "/usr/local/lib/python2.7/dist-packages/eventlet/g
  reenio/base.py", line 359, in send#012    total_sent += fd.send(data[total_sent:], flags)#012error: [Errno 32] Broken pipe
  May  7 04:18:46 saio object-7040: STDERR: Traceback (most recent call last):
  May  7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/greenpool.py", line 82, in _spawn_n_impl
  May  7 04:18:46 saio object-7040: STDERR: func(*args, **kwargs)
  May  7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 686, in process_request
  May  7 04:18:46 saio object-7040: STDERR: proto.__init__(sock, address, self)
  May  7 04:18:46 saio object-7040: STDERR: File "/usr/lib/python2.7/SocketServer.py", line 649, in __init__
  May  7 04:18:46 saio object-7040: STDERR: self.handle()
  May  7 04:18:46 saio object-7040: STDERR: File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
  May  7 04:18:46 saio object-7040: STDERR: self.handle_one_request()
  May  7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 358, in handle_one_request
  May  7 04:18:46 saio object-7040: STDERR: self.handle_one_response()
  May  7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 507, in handle_one_response
  May  7 04:18:46 saio object-7040: STDERR: while self.environ['eventlet.input'].read(MINIMUM_CHUNK_SIZE):
  May  7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 189, in read
  May  7 04:18:46 saio object-7040: STDERR: return self._chunked_read(self.rfile, length)
  May  7 04:18:46 saio object-7040: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 179, in _chunked_read
  May  7 04:18:46 saio object-7040: STDERR: self.chunk_length = int(rfile.readline().split(b";", 1)[0], 16)
  May  7 04:18:46 saio object-7040: STDERR: ValueError: invalid literal for int() with base 16: ''

So... we should fix that.  And if that works maybe we can make the GET request pipeline through a SSYNC call?

1. https://github.com/swiftstack/vagrant-swift-all-in-one/tree/replication-servers

summary:

- Object Reconstructor doesn't always use replication_ip
+ ssync doesn't work with replication_server = true

Revision history for this message

clayg (clay-gerrard) wrote on 2015-05-07:

#4

oh, i should say as a work around you can comment out the replication_server line in your replication server's config

clayg (clay-gerrard) on 2015-05-11

tags:

added: ec

Revision history for this message

clayg (clay-gerrard) wrote on 2015-06-25:

#5

@acoles @pluse @swifterdarrell should all be able to confirm this bug

As far as priority, for people currently running with with replication networks for rsync this is a big blocker to using EC (or ssync).

The work around to comment out replication_server = True in your replication server's config file has no negative impact [1] - and begs the question why do we even have the config option to make replication servers not handle the real request methods (as is needed for ssync and the reconstructor).

replication_server = False seems like a sane option for the object servers processes in the client data path - meaning don't answer replication/ssync requests.

But rather than having a three state switch - maybe we only need two - either you can serve all requests (i.e. the *default*), or you can serve all requests except replication/ssync (i.e. replication_server = False).

1. Can any one think of any harm to get rid of the "only serve replication/ssync requests" option that only actually *works* in the one replication scenario where rsync, not the object server, is the primary process for moving data?

Revision history for this message

Samuel Merritt (torgomatic) wrote on 2015-06-25:

#6

I think Clay's right. Replication needs pretty much all the verbs that normal traffic does if we're using ssync, so having the three-state switch is pretty useless. The three states wind up being "client traffic, no replication", "client or replication traffic", and "horribly broken".

I mean, I can sort of see wanting an object server that only does replication *stuff*, so if client traffic winds up getting there then it's rejected. What we have now doesn't do that. Either we need to fix up that third state so it's not horribly broken, or we should just get rid of it.

I usually lean towards getting rid of things, FWIW.

clayg (clay-gerrard) on 2015-06-26

Changed in swift:
importance:	Undecided → Low
importance:	Low → Medium

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-06-26: Change abandoned on swift (master)

#7

Change abandoned by Clay Gerrard (<email address hidden>) on branch: master
Review: https://review.openstack.org/176090
Reason: better plan in https://bugs.launchpad.net/swift/+bug/1446873

John Dickinson (notmyname) on 2015-07-06

Changed in swift:
status:	New → Confirmed

Revision history for this message

paul luse (paul-e-luse) wrote on 2015-07-27:

#8

so the last abandoned change update confuses me - what are we doing with this and is there a patch proposed to implement it or not?

Thierry Carrez (ttx) on 2015-09-01

Changed in swift:
milestone:	2.4.0 → next-liberty

Revision history for this message

Matthew Oliver (matt-0) wrote on 2015-09-03:

#9

So to paraphrase, replication_server = True fails because it's stopping Ssync and reconstructure from using verbs other than REPLICATE.

Now that these other replication methods are required replication_server isn't really useful anymore, so maybe it should be removed, anyone who wants to use a second object server for replication only should make sure it's IP and port is specified as the replication IP/ replication port in the ring so user traffic uses it. Alternatively, if we want to filter out user traffic, we'd need to filter by swift_source or a different tx_id or something.

Thierry Carrez (ttx) on 2015-10-05

Changed in swift:
milestone:	2.5.0 → none

Revision history for this message

clayg (clay-gerrard) wrote on 2017-08-24:

#10

Related: https://review.openstack.org/#/c/493774/

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-07-23: Fix merged to swift (master)

#11

Reviewed: https://review.opendev.org/740867
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=9eb81f6e698ef8ac8855c413b0b1bf4e030cb687
Submitter: Zuul
Branch: master

commit 9eb81f6e698ef8ac8855c413b0b1bf4e030cb687
Author: Tim Burke <email address hidden>
Date: Tue Jul 7 21:28:36 2020 -0700

Allow replication servers to handle all request methods

Previously, the replication_server setting could take one of three
states:

     * If unspecified, the server would handle all available methods.
     * If "true", "yes", "on", etc. it would only handle replication
       methods (REPLICATE, SSYNC).
     * If any other value (including blank), it would only handle
       non-replication methods.

    However, because SSYNC tunnels PUTs, POSTs, and DELETEs through
    the same object-server app that's responding to SSYNC, setting
    `replication_server = true` would break the protocol. This has
    been the case ever since ssync was introduced.

    Now, get rid of that second state -- operators can still set
    `replication_server = false` as a principle-of-least-privilege guard
    to ensure proxy-servers can't make replication requests, but replication
    servers will be able to serve all traffic. This will allow replication
    servers to be used as general internal-to-the-cluster endpoints, leaving
    non-replication servers to handle client-driven traffic.

Closes-Bug: #1446873
Change-Id: Ica2b41a52d11cb10c94fa8ad780a201318c4fc87

Changed in swift:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-08-04: Related fix merged to swift (master)

#12

Reviewed: https://review.opendev.org/735751
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=2a6dfae2f3d50750de2bdb5f31ea52070588e895
Submitter: Zuul
Branch: master

commit 2a6dfae2f3d50750de2bdb5f31ea52070588e895
Author: Tim Burke <email address hidden>
Date: Mon Jun 15 17:09:15 2020 -0700

Allow direct and internal clients to use the replication network

A new header `X-Backend-Use-Replication-Network` is added; if true, use
the replication network instead of the client-data-path network.

Several background daemons are updated to use the replication network:

      * account-reaper
      * container-reconciler
      * container-sharder
      * container-sync
      * object-expirer

    Note that if container-sync is being used to sync data within the same
    cluster, the replication network will only be used when communicating
    with the "source" container; the "destination" traffic will continue to
    use the configured realm endpoint.

    The direct and internal client APIs still default to using the
    client-data-path network; this maintains backwards compatibility for
    external tools written against them.

UpgradeImpact
=============

Until recently, servers configured with

replication_server = true

    would only handle REPLICATE (and, in the case of object servers, SSYNC)
    requests, and would respond 405 Method Not Allowed to other requests.
    When upgrading from Swift 2.25.0 or earlier, remove the config option
    and restart services prior to upgrade to avoid a flood of background
    daemon errors in logs.

    Note that some background daemons find work by querying Swift rather
    than walking local drives that should be available on the replication
    network:

* container-reconciler
* object-expirer

Previosuly these may have been configured without access to the
replication network; ensure they have access before upgrading.

    Closes-Bug: #1883302
    Related-Bug: #1446873
    Related-Change: Ica2b41a52d11cb10c94fa8ad780a201318c4fc87
    Change-Id: Ieef534bf5d5fb53602e875b51c15ef565882fbff

Reviewed:  https://review.opendev.org/735751
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=2a6dfae2f3d50750de2bdb5f31ea52070588e895
Submitter: Zuul
Branch:    master

commit 2a6dfae2f3d50750de2bdb5f31ea52070588e895
Author: Tim Burke <tim.burke@gmail.com>
Date:   Mon Jun 15 17:09:15 2020 -0700

Allow direct and internal clients to use the replication network
    
    A new header `X-Backend-Use-Replication-Network` is added; if true, use
    the replication network instead of the client-data-path network.
    
    Several background daemons are updated to use the replication network:
    
      * account-reaper
      * container-reconciler
      * container-sharder
      * container-sync
      * object-expirer
    
    Note that if container-sync is being used to sync data within the same
    cluster, the replication network will only be used when communicating
    with the "source" container; the "destination" traffic will continue to
    use the configured realm endpoint.
    
    The direct and internal client APIs still default to using the
    client-data-path network; this maintains backwards compatibility for
    external tools written against them.
    
    UpgradeImpact
    =============
    
    Until recently, servers configured with
    
      replication_server = true
    
    would only handle REPLICATE (and, in the case of object servers, SSYNC)
    requests, and would respond 405 Method Not Allowed to other requests.
    When upgrading from Swift 2.25.0 or earlier, remove the config option
    and restart services prior to upgrade to avoid a flood of background
    daemon errors in logs.
    
    Note that some background daemons find work by querying Swift rather
    than walking local drives that should be available on the replication
    network:
    
      * container-reconciler
      * object-expirer
    
    Previosuly these may have been configured without access to the
    replication network; ensure they have access before upgrading.
    
    Closes-Bug: #1883302
    Related-Bug: #1446873
    Related-Change: Ica2b41a52d11cb10c94fa8ad780a201318c4fc87
    Change-Id: Ieef534bf5d5fb53602e875b51c15ef565882fbff

OpenStack Object Storage (swift)

ssync doesn't work with replication_server = true

Bug Description

Other bug subscribers

Remote bug watches