Migrate from Ejabberd to Redis for OpenSRF Messaging

Bug #2017941 reported by Bill Erickson
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Evergreen | Fix Released | Wishlist | Unassigned |
OpenSRF | Confirmed | Wishlist | Unassigned |
Bill Erickson (berick)
Changed in evergreen:
importance: Undecided → Wishlist
Bill Erickson (berick) wrote :
Changed in evergreen:
status: New → Confirmed
Changed in opensrf:
status: New → Confirmed
Jason Stephenson (jstephenson) wrote :

During the 2023 Evergreen International Conference, the consensus was that switching from Ejabberd to Redis warrants bumping the OpenSRF version number to 4.0. When the OpenSRF milestone is added to this bug, it should be 4.0.

I don't know if updating the Evergreen code warrants such a drastic version bump, but it might not hurt.

Galen Charlton (gmc) wrote :

I took this out for a spin, and was able to get it working on Debian Buster (which yes, is Debian oldstable). Some notes on the experience:

- The minimum Redis version needed is 6.0, because of the ACL support. That version happened to be available in buster-backports.
- At least on Buster, Redis binds only to 127.0.0.1 and ::1, so I needed to edit redis.conf to also add public.localhost and private.localhost.
- It was not clear from the OpenSRF README that the Rust router is _required_. Using MONITOR via redis-cli helped point out the problem.
- Once I figured out that the Rust router is necessary, I found that the versions of Rust and Cargo packaged for Buster are too old, so I had to resort to rustup. Once I had a new enough toolchain, 'make build-opensrf-release && sudo make install-opensrf-release' from a clone of https://github.com/kcls/evergreen-universe-rs did the trick.
- I noted that 167 (!) crates were pulled in.
- There was a complaint about one of the dependencies:

warning: the following packages contain code that will be rejected by a future version of Rust: traitobject v0.1.0
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1`

Going up the dependency chain, I found a note by the maintainers of the websocket crate recommending that it _not_ be used for new projects in favor of either tungstenite or tokio-tungstenite.

- Without actually measuring just yet, staff client and OPAC actions felt noticeably zippier on my test VM
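Galen's first point can be checked before installing anything else. The sketch below assumes the `key:value` text that Redis returns for `INFO server` (for example via `redis-cli INFO server`), parses `redis_version`, and compares it with the 6.0 floor needed for ACL support. The function names are hypothetical; nothing like this ships with OpenSRF.

```python
from __future__ import annotations

import re

# Redis ACLs (ACL SETUSER and per-user passwords) arrived in Redis 6.0.
MIN_REDIS_VERSION = (6, 0)


def parse_redis_version(info_text: str) -> tuple[int, ...]:
    """Extract redis_version from the text returned by `INFO server`."""
    # INFO lines end in \r\n; \s* absorbs the \r before the multiline $ anchor.
    match = re.search(r"^redis_version:(\d+(?:\.\d+)*)\s*$", info_text, re.MULTILINE)
    if match is None:
        raise ValueError("no redis_version field in INFO output")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_acl(info_text: str) -> bool:
    """True if the reported version is at least MIN_REDIS_VERSION."""
    return parse_redis_version(info_text) >= MIN_REDIS_VERSION


if __name__ == "__main__":
    sample = "# Server\r\nredis_version:6.0.16\r\nredis_mode:standalone\r\n"
    print(parse_redis_version(sample), supports_acl(sample))
```

Comparing tuples of integers rather than raw strings avoids the trap where "10.0" would sort before "6.0".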

Galen Charlton (gmc) wrote :

As for Bullseye, it packages Rust 1.48, which I don't think is quite new enough. Also, Debian won't have the rust-all package until Bookworm.

Bill Erickson (berick) wrote :

Thanks Galen!

As a note of appreciation, I've ported the WebSockets code from 'websocket' to 'tungstenite'.

https://github.com/kcls/evergreen-universe-rs/commit/805bfca298990c67591f1e3ec6613cbc49e7adc8

I have been testing the code so far with Ubuntu 22.04 on the assumption that it would be some time before we're ready to seriously move in this direction. I'm not too surprised the Rust bits are somewhat bleeding (lightly wounded?) edge.

I certainly have work to do on the INSTALL docs. I've been in a holding pattern waiting to see how the Rust discussion goes: http://list.evergreen-ils.org/pipermail/evergreen-dev/2023-May/000550.html. Will follow up here with any changes based on that convo.

Bill Erickson (berick) wrote :

Patches pushed to the OpenSRF working branch to modify the C Router to work with Redis. This required replacing/refactoring most of the existing C Router code. I've tested a variety of multi-domain scenarios and watched for memory leaks, etc., but this much change would certainly benefit from a lot of tire kicking.

With this and the previous mods to the Websockets code, we can run the Redis branch now without the Rust additions.

I've updated my auto installer to reflect these changes:

https://github.com/berick/evergreen-ansible-installer/tree/working/ubuntu-22.04-redis

==

https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lp2017941-opensrf-on-redis-v1

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp2017941-opensrf-on-redis-v1

Bill Erickson (berick) wrote :
tags: added: pullrequest
Jason Stephenson (jstephenson) wrote :

I had a look at the branches, and kicked the tires by running the Perl and PgTap tests. All tests passed, so that's good. I plan to exercise a few features in the coming days and possibly hook this up to a database with a production-sized data set.

I have a few suggestions based on the installation process:

1. The first time that I ran `osrf_control -l --reset-message-bus`, I got the following error: "Could not connect to Redis at private.localhost:6379: Connection refused." The cause was that Redis was not configured to listen on the IPs that the hosts file assigns to public.localhost and private.localhost. I recommend either adding an instruction to the OpenSRF README to add those IPs to the "bind" line in /etc/redis/redis.conf (my solution) or changing the /etc/hosts instruction to use 127.0.0.1 for both the public.localhost and private.localhost domains.

2. More help on finding the passwords in the redis-accounts.txt file would be useful. I had to guess which lines they were on, and though it seemed kind of obvious, it may not be obvious to others.

3. Renaming 'redis-accounts.example.txt' to 'redis-accounts.txt.example' to be congruent with the naming of the other example files. (I tried using a for loop in bash to rename the examples, but this one slipped through.)

4. The opensrf_core.xml.example file still says "Jabber" and should be changed to "Redis."

5. A couple of junk files are left behind in the SYSCONFDIR. One is named "password" and the other looks like a password UUID. Both files are empty. They should be cleaned up by the process that sets the passwords.
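The bind problem in item 1 (and in Galen's earlier note) comes down to whether the addresses that /etc/hosts assigns to public.localhost and private.localhost appear in Redis's `bind` directive. Below is a minimal sketch of that check, assuming the file contents are passed in as text. The function names are hypothetical, the sample addresses are illustrative rather than taken from the OpenSRF docs, and wildcard binds such as `*` or `0.0.0.0` are deliberately not handled.

```python
from __future__ import annotations

# Hostnames used by a stock OpenSRF install.
OPENSRF_HOSTS = ("public.localhost", "private.localhost")


def parse_hosts(hosts_text: str) -> dict[str, str]:
    """Map each hostname in /etc/hosts-style text to its address."""
    mapping: dict[str, str] = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)  # first entry for a name wins
    return mapping


def parse_bind(redis_conf_text: str) -> set[str]:
    """Collect the addresses from every `bind` directive in redis.conf text."""
    addresses: set[str] = set()
    for raw in redis_conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if fields and fields[0].lower() == "bind":
            # A leading "-" marks an address as optional; strip it for comparison.
            addresses.update(addr.lstrip("-") for addr in fields[1:])
    return addresses


def unreachable_hosts(hosts_text: str, redis_conf_text: str) -> list[str]:
    """OpenSRF hostnames missing from hosts, or whose address Redis does not bind."""
    hosts = parse_hosts(hosts_text)
    bound = parse_bind(redis_conf_text)
    return [name for name in OPENSRF_HOSTS if hosts.get(name) not in bound]


if __name__ == "__main__":
    hosts = (
        "127.0.0.1 localhost\n"
        "127.0.1.2 public.localhost public\n"
        "127.0.1.3 private.localhost private\n"
    )
    print(unreachable_hosts(hosts, "bind 127.0.0.1 -::1\n"))
    # ['public.localhost', 'private.localhost']
```

Adding both addresses to the `bind` line, or pointing both names at 127.0.0.1 as the alternative in item 1 suggests, makes the returned list empty.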

I noticed one difference on the Evergreen side of things. When running 'make check' from ~/Evergreen/, make attempts another compile of the code. That compile fails in the Redis branch but succeeds in main. This is minor because 'make check' is supposed to be run from ~/Evergreen/Open-ILS/src/perlmods/, but it might be worth looking into why the compilation fails under 'make check' but not under plain 'make'.

This looks like a promising start. Thanks, Bill!

Bill Erickson (berick) wrote :

Thanks, Jason!

Patch pushed to the EG branch to fix the 'make check' issues.

Will look at other items shortly.

Bill Erickson (berick) wrote :

1. Going forward, sharing 127.0.0.1 would be simplest. We'll need an upgrade note regardless. I've not made any changes on this one yet.

2. I added some more verbiage to INSTALL.

3. Done.

4. Done.

5. I'm not seeing these junk files in /openils/conf/ or in either of the repositories after building.
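Item 3 of Jason's list explains why one file slipped through the bash loop: assuming the loop matched on a trailing `.example` suffix, `redis-accounts.example.txt` never matches, while `redis-accounts.txt.example` does. The sketch below reproduces that matching rule in Python; the helper names are hypothetical, and the loop Jason actually used is not shown in the thread.

```python
from __future__ import annotations

SUFFIX = ".example"


def example_target(filename: str) -> str | None:
    """Name to copy an example file to, or None if it lacks the .example suffix."""
    if filename.endswith(SUFFIX):
        return filename[: -len(SUFFIX)]
    return None


def plan_copies(filenames: list[str]) -> dict[str, str]:
    """Map each *.example file to its target; other files are skipped."""
    plan: dict[str, str] = {}
    for name in filenames:
        target = example_target(name)
        if target is not None:
            plan[name] = target
    return plan


if __name__ == "__main__":
    before = ["opensrf_core.xml.example", "redis-accounts.example.txt"]
    after = ["opensrf_core.xml.example", "redis-accounts.txt.example"]
    print(plan_copies(before))  # the Redis accounts file is silently skipped
    print(plan_copies(after))   # both files are picked up
```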

Bill Erickson (berick) wrote :

Rebased branches pushed:

https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lp2017941-opensrf-on-redis-v3

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp2017941-opensrf-on-redis-v3

---

I also stepped through an upgrade from Evergreen/OpenSRF main to the Redis branches and documented the process in a new release notes doc (linked from GitHub for auto-formatting):

https://github.com/berick/Evergreen/blob/user/berick/lp2017941-opensrf-on-redis-v3/docs/RELEASE_NOTES_NEXT/Administration/redis-upgrade.adoc

Upgrade notes will likely need a little love and proofreading, but cover the basics.

Bill Erickson (berick) wrote :

Opened bug #2041431 to track EG 3.12-specific changes.

Revision history for this message
Bill Erickson (berick) wrote :

Staged a patch here to add usernames to bus addresses for additional domain-level differentiation (i.e., to honor router_name):

https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/collab/berick/lp2017941-opensrf-on-redis-v3-bus-addr-gets-username

Will merge to main collab branch once reviewed.

Bill Erickson (berick) wrote :

After further testing, I have merged the commit from lp2017941-opensrf-on-redis-v3-bus-addr-gets-username into the main opensrf branch (collab/berick/lp2017941-opensrf-on-redis-v3) for ease of review and testing.

Jane Sandberg (sandbergja) wrote :

Thanks, Bill, Galen, and Jason. This is a great step toward a more maintainable/understandable stack. I pushed the Evergreen parts to main for inclusion in 3.12.

The OpenSRF parts still need to be pushed.

Changed in evergreen:
status: Confirmed → Fix Committed
milestone: none → 3.12-beta
tags: added: signedoff
Changed in evergreen:
status: Fix Committed → Fix Released
Changed in opensrf:
milestone: none → 3.3-beta
Galen Charlton (gmc)
Changed in opensrf:
milestone: 3.3-beta → 4.0-beta
Jane Sandberg (sandbergja) wrote :

Just adding a note that I ran the OpenSRF branch in a qa server that was exposed to the big scary internet for a few weeks. With the top two commits 9e89bb and 2d5d55, it worked very well (no crashes due to bad input from bots, nice performance, no unexpected issues). Thanks for your continued work on this, Bill!

Jason Stephenson (jstephenson) wrote :

Using the latest collab/berick/lp2017941-opensrf-on-redis-v3, I got the following on a VM where services were not running.

osrf_control -l --diagnostic
[auth] WRONGPASS invalid username-password pair, at /usr/share/perl5/Redis.pm line 311.

After starting services and trying again, things worked as normal. We may want to consider resetting the passwords any time that osrf_control runs, not just when starting services.

I was trying to check if services were running on a system that had been restarted a few days ago, and I couldn't remember if I had started services or not.

Bill Erickson (berick) wrote :

Makes sense, Jason. Should be a simple osrf_control script change. Will circle back to this soon unless someone else grabs it.

Jason Stephenson (jstephenson) wrote :

Bill, I'll take a stab at it.

Jason Stephenson (jstephenson) wrote :

That wasn't too hard. I pushed a commit to the collab/berick/lp2017941-opensrf-on-redis-v3 branch that works for me.

osrf_control (opensrf-perl.pl) now does a forced message bus reset when the --diagnostic option is used.
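The control flow described here can be sketched as follows. `reset_message_bus` and `action` are hypothetical stand-ins for the `--reset-message-bus` step and the diagnostic check, and `WrongPassError` stands in for the WRONGPASS failure the Perl Redis client raised above. The real change lives in opensrf-perl.pl and is not reproduced here. The `force_reset=False` branch is a hypothetical alternative that was not proposed in the thread.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


class WrongPassError(Exception):
    """Stand-in for the WRONGPASS reply Redis sends when AUTH fails."""


def with_fresh_credentials(
    action: Callable[[], T],
    reset_message_bus: Callable[[], None],
    force_reset: bool = True,
) -> T:
    """Run `action` (e.g. a diagnostic) so stale bus passwords cannot break it.

    force_reset=True always resets first, mirroring the forced reset
    described above for --diagnostic. force_reset=False only resets after a
    WRONGPASS failure and then retries once.
    """
    if force_reset:
        reset_message_bus()
        return action()
    try:
        return action()
    except WrongPassError:
        reset_message_bus()
        return action()
```

Resetting unconditionally is the simpler choice and matches the suggestion of resetting whenever osrf_control runs. The retry variant avoids touching Redis when the credentials are already valid, at the cost of one failed AUTH when they are not.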

Bill Erickson (berick) wrote :

Thumbs-up to patch.

Bill Erickson (berick) wrote :

For reference:

https://www.linuxfoundation.org/press/linux-foundation-launches-open-source-valkey-community

https://github.com/valkey-io/valkey

Building Valkey is currently a manual process, but work is under way for packages, etc. Confirmed it acts as a drop-in replacement for Redis.

Blake GH (bmagic) wrote :

All,

Just noting here that we upgraded a production machine to use this code as a proof of concept, ready to roll back to Ejabberd if need be. We're now over a week into production with no issues. Granted, it's a small production machine with little usage (which is why we chose it), but it's running perfectly.

Incidentally, it's also running PG15.

FWIW
