razor2 had unknown error during get_server_info

Bug #1819977 reported by Michael Heuberger on 2019-03-13
This bug affects 23 people
Affects: spamassassin (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Happens on Ubuntu 18.10

```
razor2: razor2 check failed: razor2: razor2 had unknown error during get_server_info at /usr/share/perl5/Mail/SpamAssassin/Plugin/Razor2.pm line 186, <GEN2> line 1. at /usr/share/perl5/Mail/SpamAssassin/Plugin/Razor2.pm line 329.
```

Any clues?

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in spamassassin (Ubuntu):
status: New → Confirmed

Test as root: echo "test" | spamassassin -D razor2 2>&1 | less

Result with error:
Mar 14 12:53:56.923 [27516] dbg: razor2: razor2 is available, version 2.84
Mar 14 12:53:58.439 [27516] warn: razor2: razor2 check failed: razor2: razor2 had unknown error during get_server_info at /usr/share/perl5/Mail/SpamAssassin/Plugin/Razor2.pm line 186, <GEN3> line 1. at /usr/share/perl5/Mail/SpamAssassin/Plugin/Razor2.pm line 329.
 Razor-Log: read_file: 2 items read from /etc/razor/razor-agent.conf
...

There are two ways to fix the error:

1) If you use Amavis with razor_config /etc/razor/razor-agent.conf (the default if you install razor):

# su - amavis -s /bin/bash
# razor-admin -create
# razor-admin -register
Register successful. Identity stored in /etc/mail/spamassassin/razor/identity-kis3udk1FDHj
# razor-admin -discover
# exit

Restart SpamAssassin with "service spamassassin restart" and test.

2) or create a new razor config:

# mkdir /etc/spamassassin/razor
# razor-admin -home=/etc/spamassassin/razor -register
Register successful. Identity stored in /etc/mail/spamassassin/razor/identity-rudk1FD2xs
# razor-admin -home=/etc/spamassassin/razor -create
# razor-admin -home=/etc/spamassassin/razor -discover

Add or change the following line at the end of the /etc/spamassassin/local.cf file:

razor_config /etc/spamassassin/razor/razor-agent.conf

Restart SpamAssassin with "service spamassassin restart" and test.

Result with no error:

echo "test" | spamassassin -D razor2 2>&1 | less
Mar 14 13:09:45.871 [27832] dbg: razor2: razor2 is available, version 2.84
Mar 14 13:09:46.716 [27834] info: util: setuid: ruid=0 euid=0 rgid=0 0 egid=0 0
 Razor-Log: read_file: 2 items read from /etc/razor/razor-agent.conf
 Razor-Log: Computed razorhome from env: /root/.razor
Mar 14 13:09:46.849663 check[27832]: [ 2] [bootup] Logging initiated LogDebugLevel=9 to stdout
...

greetings
frank from administrator.de

Bram Matthys (syzop) wrote :

I had the same issue on a mail server with Ubuntu 16.04.5 LTS. After reading the previous comment I ran the following command (note that the username, 'mail' in this case, may be different on your system):
root@mail:~# su -s /bin/bash - mail
mail@mail:~$ razor-admin --discover

That made the "razor2 had unknown error during get_server_info" error go away.

Bram Matthys (syzop) wrote :

Actually, ignore my message. I still have the error.

Strange. Could it be that it does not happen 100% of the time? After all, I also see RAZOR2_CHECK and RAZOR2_CF_RANGE_51_100 scores in spamd results a couple of times a day, which implies that it works, at least some of the time.

Thanks, but it still fails. I tried to recreate the configuration following step 1) and it says:

$ razor-admin -create
nextserver: Bootstrap discovery failed. Giving up.

Still nothing works :(

Bram Matthys (syzop) wrote :

It's worth mentioning that on the mailing list at https://sourceforge.net/p/razor/mailman/razor-users/ people were already asking back in 2018 whether the project is dead (or at least development of the client software).

Also, when I compare two mail servers under my control:
* On Ubuntu 16.04 LTS where I see the errors the pyzor package is version 0.5.0-4fakesync1.
* On Debian 9.7, where I do not see errors, the pyzor package is 1.0.0-2.


Hi,
not sure if this is really needed, as Bram and Googolplex are all over this bug and know what they are doing.

A plain install of the components from the archive works, so some part of the custom post-install configuration must eventually make it fail. That seems clear to all of you (which is fine), but to let others participate it would help to list the changes needed after install to reproduce the issue.

$ lxc exec c bash
root@c:~# apt install razor spamassassin
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libauthen-sasl-perl libcrypt-openssl-bignum-perl libcrypt-openssl-random-perl libcrypt-openssl-rsa-perl libdigest-bubblebabble-perl libdigest-hmac-perl libencode-locale-perl libhtml-parser-perl libhtml-tagset-perl libhttp-date-perl libhttp-message-perl
  libio-html-perl libio-socket-inet6-perl libio-socket-ssl-perl liblwp-mediatypes-perl libmail-dkim-perl libmail-spf-perl libmailtools-perl libnet-dns-perl libnet-dns-sec-perl libnet-ip-perl libnet-libidn-perl libnet-smtp-ssl-perl libnet-ssleay-perl libnetaddr-ip-perl
  libsocket6-perl libsys-hostname-long-perl libtimedate-perl liburi-perl perl-openssl-defaults re2c sa-compile spamc
Suggested packages:
  libgssapi-perl libdata-dump-perl libwww-perl libdbi-perl pyzor libencode-detect-perl libgeo-ip-perl libnet-patricia-perl
The following NEW packages will be installed:
  libauthen-sasl-perl libcrypt-openssl-bignum-perl libcrypt-openssl-random-perl libcrypt-openssl-rsa-perl libdigest-bubblebabble-perl libdigest-hmac-perl libencode-locale-perl libhtml-parser-perl libhtml-tagset-perl libhttp-date-perl libhttp-message-perl
  libio-html-perl libio-socket-inet6-perl libio-socket-ssl-perl liblwp-mediatypes-perl libmail-dkim-perl libmail-spf-perl libmailtools-perl libnet-dns-perl libnet-dns-sec-perl libnet-ip-perl libnet-libidn-perl libnet-smtp-ssl-perl libnet-ssleay-perl libnetaddr-ip-perl
  libsocket6-perl libsys-hostname-long-perl libtimedate-perl liburi-perl perl-openssl-defaults razor re2c sa-compile spamassassin spamc
0 upgraded, 35 newly installed, 0 to remove and 53 not upgraded.
Need to get 3331 kB of archives.
After this operation, 11.4 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://archive.ubuntu.com/ubuntu cosmic/main amd64 perl-openssl-defaults amd64 3build1 [7012 B]
Get:2 http://archive.ubuntu.com/ubuntu cosmic/main amd64 libcrypt-openssl-bignum-perl amd64 0.09-1build1 [24.7 kB]
Get:3 http://archive.ubuntu.com/ubuntu cosmic/main amd64 libcrypt-openssl-random-perl amd64 0.15-1 [9884 B]
Get:4 http://archive.ubuntu.com/ubuntu cosmic/main amd64 libcrypt-openssl-rsa-perl amd64 0.30-1 [22.6 kB]
Get:5 http://archive.ubuntu.com/ubuntu cosmic/main amd64 libdigest-bubblebabble-perl all 0.02-2 [7908 B]
Get:6 http://archive.ubuntu.com/ubuntu cosmic/main amd64 libdigest-hmac-perl all 1.03+dfsg-2 [10.3 kB]
Get:7 http://archive.ubuntu.com/ubuntu cosmic/main amd64 libencode-locale-perl all 1.05-1 [12.3 kB]
Get:8 http://archive.ubuntu.com/ubuntu cosmic/main amd64 libhtml-tagset-perl all 3.20-3 [12.1 kB]
Get:9 http://archi...

Bram Matthys (syzop) wrote :

Whoops.. pyzor.. razor.. keep confusing the two. So, a correction to the above:

The razor packages on Ubuntu 16.04 LTS and Debian 9.7 are very similar: 2.85-4.2build1 and 2.85-4.2+b2 respectively. So I guess that is not it.
Similarly, spamassassin is 3.4.2-0ubuntu0.16.04.1 on Ubuntu and 3.4.2-1~deb9u1 on Debian.

So... I don't know why the machine with Ubuntu 16.04 has the problem and the one with Debian 9.7 does not. It may not even be related to the OS... or it may.

In any case, I rm -rf'd the razor files and tried to re-register and well, this kinda confirms the "randomness" of the issue, which was mentioned before:

$ razor-admin -create
$ razor-admin -register
Register successful. Identity stored in ......
$ razor-admin -discover
nextserver: Bootstrap discovery failed. Giving up.
$ razor-admin -discover
nextserver: Bootstrap discovery failed. Giving up.
$ razor-admin -discover
(no error!)

I'm done with it. I don't know if it's an infrastructure problem or a problem with the package. I'm just going to ignore the errors for now.

Martin Thomas (mtlaunchpad) wrote :

Maybe it is a kind of race condition (concurrency issue)?

Seems to be a server error from the remote site.

I tested several times with: echo "test" | spamassassin -D razor2 2>&1 | less

Sometimes the error occurred, other times not. The only difference was that - when it worked - it showed:
Mar 15 11:12:48.739649 check[21141]: [ 8] Discovery Server discovery.razor.cloudmark.com replying with nsl=n002.cloudmark.com

whereas on the other hand, when it didn't work, it showed:
Mar 15 11:13:06.607983 check[21146]: [ 5] Razor Discovery Server discovery.razor.cloudmark.com had no valid nsl servers

So to me it seems that the remote site is having trouble, not the client installations.
I hope they fix it soon.

A DNS lookup of discovery.razor.cloudmark.com occurs. Here, that gives three A records: 208.83.137.118, 208.83.139.205, and 208.83.137.117. When ..205 is the address connected to, the initial read(2) of 'sn=D...' followed by the write(2) of 'a=g...' results in a read() of 'err=240\r\n'. With a connect() to ..117, that error doesn't occur. Thus something is wrong with at least one of the remote servers, and it's pot luck which DNS entry is used.
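
To check each address separately rather than relying on which A record the resolver happens to return, something like the following Python sketch could be used. It is only an illustration: the port 2703 and the `a=g&pm=nsl` request are taken from the telnet sessions and debug logs in this report, the function names are made up here, and a single `recv()` may not capture a full reply.

```python
import socket

DISCOVERY_HOST = "discovery.razor.cloudmark.com"
DISCOVERY_PORT = 2703  # port used in the telnet sessions in this report


def resolve_a_records(host=DISCOVERY_HOST):
    """Return the sorted IPv4 addresses DNS currently gives for host."""
    infos = socket.getaddrinfo(host, DISCOVERY_PORT,
                               socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def reply_is_error(reply):
    """True if any line of a server reply is an 'err=NNN' line."""
    return any(line.strip().startswith("err=")
               for line in reply.splitlines())


def probe(ip, timeout=10.0):
    """Connect to one address, ask for the nsl list, return the raw reply."""
    with socket.create_connection((ip, DISCOVERY_PORT), timeout=timeout) as sock:
        sock.recv(4096)                  # server greeting, 'sn=D...'
        sock.sendall(b"a=g&pm=nsl\r\n")  # request seen in the razor debug log
        return sock.recv(4096).decode("ascii", "replace")
```

Calling `probe(ip)` for every entry of `resolve_a_records()` and passing each reply to `reply_is_error()` would show which of the addresses, if any, answers with `err=240`.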

Jonathan Nichols (jrnichols) wrote :

Mar 17 11:34:00.034823 check[8834]: [ 5] Razor Discovery Server discovery.razor.cloudmark.com had no valid nsl servers
Mar 17 11:34:00.034872 check[8834]: [ 5] Couldn't talk to discovery servers. Will force a bootstrap...
Mar 17 11:34:00.034919 check[8834]: [ 6] no discovery listfile: servers.discovery.lst
Mar 17 11:34:00.034998 check[8834]: [ 5] no listfile: servers.catalogue.lst
Mar 17 11:34:00.035024 check[8834]: [ 6] no discovery listfile: servers.discovery.lst
Mar 17 11:34:00.035055 check[8834]: [ 8] Checking with Razor Discovery Server discovery.razor.cloudmark.com
Mar 17 11:34:00.035135 check[8834]: [ 4] discovery.razor.cloudmark.com << 12
Mar 17 11:34:00.035158 check[8834]: [ 6] a=g&pm=csl
Mar 17 11:34:00.058530 check[8834]: [ 4] discovery.razor.cloudmark.com >> 9
Mar 17 11:34:00.058596 check[8834]: [ 6] response to sent.3
err=240
Mar 17 11:34:00.058691 check[8834]: [ 5] Razor Discovery Server discovery.razor.cloudmark.com had no valid csl servers
Mar 17 11:34:00.058749 check[8834]: [ 4] discovery.razor.cloudmark.com << 12
Mar 17 11:34:00.058768 check[8834]: [ 6] a=g&pm=nsl
Mar 17 11:34:00.082217 check[8834]: [ 4] discovery.razor.cloudmark.com >> 9
Mar 17 11:34:00.082280 check[8834]: [ 6] response to sent.4
err=240
Mar 17 11:34:00.082372 check[8834]: [ 5] Razor Discovery Server discovery.razor.cloudmark.com had no valid nsl servers

15.04 Vivid on this particular machine, and this issue just started happening here. No recent configuration changes.

I too wonder if the project has faded away and isn't being developed anymore.

nyet (nyetwurk) wrote :

Razor servers are dead. Dev is MIA, with absolutely no indication who to contact, or if there are backups of the server, or if there is a way to replicate the server.

Hi nyet, you state all this as fact but give no references to the source of the information, nor say why you'd know, e.g. as an employee of Cloudmark. And what you say is incorrect: the 'Razor servers' are not all dead, some are still alive.

/etc/cron.d/amavisd-new has

    18 */3 * * * amavis ... sa-sync
    24 1 * * * amavis ... sa-clean

The razor errors occur here every time sa-clean runs,

    2019-03-15 01:24 +0000
    2019-03-16 01:24 +0000
    2019-03-17 01:24 +0000
    2019-03-18 01:24 +0000

But only some of the times sa-sync runs

    2019-02-22 09:18 +0000
    2019-03-13 21:18 +0000
    2019-03-14 09:18 +0000
    2019-03-14 12:18 +0000
    2019-03-14 18:18 +0000
    2019-03-14 21:18 +0000
    2019-03-15 00:18 +0000
    2019-03-15 06:18 +0000
    2019-03-15 09:18 +0000
    2019-03-15 12:18 +0000
    2019-03-16 00:18 +0000
    2019-03-16 09:18 +0000
    2019-03-16 15:18 +0000
    2019-03-16 18:18 +0000
    2019-03-17 03:18 +0000
    2019-03-17 09:18 +0000
    2019-03-17 12:18 +0000
    2019-03-17 18:18 +0000
    2019-03-17 21:18 +0000
    2019-03-18 03:18 +0000

2019-03-17 06:18 +0000 worked fine, for example, and that's because of my explanation about DNS in comment #11.

Christian Betz (crhbetz) wrote :

So you could forward connections intended for the faulty .205 server to one of the others to mitigate the issue client-side?

https://superuser.com/a/681707 - forwarding through iptables like this seems to work for me. I have no idea about further implications of this workaround, though.

STrRedWolf (strredwolf) wrote :

Chiming in here. I've worked around this via a DNS host entry in /etc/hosts:

208.83.137.117 discovery.razor.cloudmark.com

Great workaround tip, thanks @strredwolf.

You can use either 208.83.137.117 or 208.83.137.118; the current problem is with 208.83.139.205.

To test, set the IP you want to test in /etc/hosts. Example:
208.83.139.205 discovery.razor.cloudmark.com

Then:
# echo "test" | spamassassin -D razor2 2>&1 | grep " csl"
...Razor Discovery Server discovery.razor.cloudmark.com had no valid csl servers...
OR
...Discovery Server discovery.razor.cloudmark.com replying with csl=c302.cloudmark.com...
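
The same check can be scripted instead of read by eye. The sketch below is an assumption-laden illustration in Python: it only recognises the two message shapes quoted in this report ("replying with csl=/nsl=" and "had no valid csl/nsl servers"), and the function name is invented for the example.

```python
import re

# Patterns copied from the two kinds of debug lines quoted in this report.
OK_RE = re.compile(r"Discovery Server (\S+) replying with (csl|nsl)=(\S+)")
FAIL_RE = re.compile(r"Discovery Server (\S+) had no valid (csl|nsl) servers")


def classify_discovery_lines(text):
    """Return a list of (status, list_name, server) tuples found in text.

    status is "ok" or "failed"; server is None for failed lookups.
    """
    results = []
    for line in text.splitlines():
        match = OK_RE.search(line)
        if match:
            results.append(("ok", match.group(2), match.group(3)))
            continue
        match = FAIL_RE.search(line)
        if match:
            results.append(("failed", match.group(2), None))
    return results
```

Feeding it the captured output of `echo "test" | spamassassin -D razor2 2>&1` gives one tuple per discovery attempt, which makes repeated runs against different /etc/hosts entries easy to compare.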

Is hacking hosts file really a good idea? Shouldn't this be fixed within Razor's package in the next update? If so, where to report directly?

schamane (schamane) wrote :

On my end it seems like all 3 IPs that discovery.razor.cloudmark.com is pointing to now show the same error :(

For reference:

$ host discovery.razor.cloudmark.com.
discovery.razor.cloudmark.com has address 208.83.139.205
discovery.razor.cloudmark.com has address 208.83.137.118
discovery.razor.cloudmark.com has address 208.83.137.117

nyet (nyetwurk) wrote :

I tried to contact cloudmark/proofpoint a week ago. Still no response.

I don't think they care.

nyet (nyetwurk) wrote :

Note that proofpoint has no interest in helping anyone that isn't paying them:

https://proofpointcommunities.force.com/community/PPSupCommunityLogin

> If you do not have access to this system, contact your Proofpoint Sales Representative.
>
> If you are having trouble logging in, please contact <email address hidden>. (All other issues will be disregarded.) All other issues should be addressed by logging a support case or calling your Hotline number in your support agreement.

nyet (nyetwurk) wrote :

Deb bug is here
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=924583

$ telnet 208.83.137.117 2703
Trying 208.83.137.117...
Connected to 208.83.137.117.
Escape character is '^]'.
sn=C&srl=43940&a=l&a=cg
a=g&pm=csl
err=240
a=g&pm=nsl
err=240

$ telnet 208.83.137.118 2703
Trying 208.83.137.118...
Connected to 208.83.137.118.
Escape character is '^]'.
sn=C&srl=43940&a=l&a=cg
a=g&pm=csl
err=240
a=g&pm=nsl
err=240

$ telnet 208.83.137.205 2703
Trying 208.83.137.205...
Connected to 208.83.137.205.
Escape character is '^]'.
sn=C&srl=43940&a=l&a=cg
a=g&pm=csl
err=240
a=g&pm=nsl
err=240

nyet (nyetwurk) wrote :

I have received a response from cloudmark and they appear to have fixed the issue.
$ telnet 208.83.139.205 2703
Trying 208.83.139.205...
Connected to 208.83.139.205.
Escape character is '^]'.
sn=D&srl=670&a=l&a=cg
a=g&pm=csl
-csl=?
c302.cloudmark.com
c303.cloudmark.com
c301.cloudmark.com
.
a=g&pm=nsl
-nsl=?
n004.cloudmark.com
n002.cloudmark.com
n001.cloudmark.com
n003.cloudmark.com

~$ telnet 208.83.137.117 2703
Trying 208.83.137.117...
Connected to 208.83.137.117.
Escape character is '^]'.
sn=D&srl=670&a=l&a=cg
a=g&pm=csl
-csl=?
c302.cloudmark.com
c303.cloudmark.com
c301.cloudmark.com
.
a=g&pm=nsl
-nsl=?
n002.cloudmark.com
n003.cloudmark.com
n004.cloudmark.com
n001.cloudmark.com
.

$ telnet 208.83.137.118 2703
Trying 208.83.137.118...
Connected to 208.83.137.118.
Escape character is '^]'.
sn=D&srl=670&a=l&a=cg
a=g&pm=csl
-csl=?
c301.cloudmark.com
c303.cloudmark.com
c302.cloudmark.com
.
a=g&pm=nsl
-nsl=?
n001.cloudmark.com
n003.cloudmark.com
n002.cloudmark.com
n004.cloudmark.com
.
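
For anyone who wants to compare the replies from the three addresses programmatically, here is a rough Python sketch that pulls the server lists out of a transcript like the ones above. The framing it assumes (a `-name=?` header line, one hostname per line, a lone `.` as terminator) is inferred only from these telnet sessions, not from any protocol documentation, and the function name is hypothetical.

```python
def parse_server_lists(transcript):
    """Extract {"csl": [...], "nsl": [...]} from a discovery transcript.

    Lines outside a '-name=?' ... '.' section (greeting, requests,
    'err=240') are ignored, so an all-error transcript yields {}.
    """
    lists = {}
    current = None
    for raw in transcript.splitlines():
        line = raw.strip()
        if line.startswith("-") and line.endswith("=?"):
            current = line[1:-2]      # "-csl=?" -> "csl"
            lists[current] = []
        elif line == ".":
            current = None            # end of the current list
        elif current is not None and line:
            lists[current].append(line)
    return lists
```

An empty result for a given address corresponds to the broken state shown earlier, where every request was answered with `err=240`.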

Hi nyet, thanks for letting us know. What mechanism did you use to contact CloudMark that worked?

In case of future similar problems, I think the error logged should include the IP address used.

Hey guys, I highly appreciate that you all got this sorted out.
I tried to follow the updates, but there was a lot going on.

My naive question would be: was that really all resolved on the backend, so that no changes to the Ubuntu packages are needed?

Hi Christian, DNS resolved the domain name to one of three IP addresses, and the server listening on one of those developed a fault that led to the client reporting the error, but in an unclear manner. The server owners were alerted, thanks to nyet, and fixed their server's problem. Thus the software does not need to change. However, when it occurs again we'll be milling around for a while not understanding the cause because of the poor diagnostic, so that could be improved, e.g. by including the IP address contacted so we realise it's always the same one. The software could also improve by trying each of the IP addresses in a random order.
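
As a rough illustration of those two suggestions (this is not razor's actual code, and every name in it is hypothetical), a client could shuffle the resolved addresses, try each in turn, and name the failing IP in the error it finally reports. The sketch is in Python and takes the network call as a parameter, `probe(ip)`, which is assumed to return the reply text or raise `OSError`.

```python
import random


def discover(addresses, probe, rng=random):
    """Try each address in random order; return (ip, reply) on first success.

    A reply starting with 'err=' counts as a failure for that address.
    If every address fails, the raised error lists each IP with its cause.
    """
    order = list(addresses)
    rng.shuffle(order)
    failures = []
    for ip in order:
        try:
            reply = probe(ip)
        except OSError as exc:
            failures.append(f"{ip}: {exc}")
            continue
        if reply.startswith("err="):
            failures.append(f"{ip}: {reply.strip()}")
            continue
        return ip, reply
    raise RuntimeError("discovery failed on every address: " + "; ".join(failures))
```

With one faulty server out of three, a client behaving like this would still succeed, and if all of them failed the log would show which address returned which error.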

Paride Legovini (paride) on 2019-03-25
tags: added: server-triage-discuss
Paride Legovini (paride) on 2019-03-28
tags: removed: server-triage-discuss
Paride Legovini (paride) wrote :

Hey Ralph and others, thanks for following up and getting to the bottom of this.

It is quite clear that razor2's diagnostic messages could and should be improved: the effort you put into this shows that pretty well. However, we believe that this request belongs to the upstream project, and not to Ubuntu, as the problem is not specific to Ubuntu. For this reason I'm setting this bug as Invalid, which in this case means "not to be fixed in Ubuntu": the report itself was perfectly valid! If you still think there are Ubuntu-specific aspects of this issue that should be discussed, please do not hesitate to set the bug status back to Confirmed and reopen the discussion.

Changed in spamassassin (Ubuntu):
status: Confirmed → Invalid