faxgetty segfault

Bug #600219 reported by Valentijn Sessink
This bug affects 10 people
Affects           Status        Importance  Assigned to     Milestone
HylaFAX           Fix Released  Medium
hylafax (Debian)  Fix Released  Unknown
hylafax (Ubuntu)  Confirmed     Undecided   Giuseppe Sacco

Bug Description

Until recently, my HylaFAX installation worked flawlessly. Since upgrading from Dapper, through Hardy, to Lucid, the faxgetty process segfaults. The message is always the same:

 Jun 30 15:13:18 machinename kernel: [4917851.100056] faxgetty[26509]: segfault at a3c ip 0805c083 sp bf8304b0 error 4 in faxgetty[8048000+72000]
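
The kernel line above can be taken apart mechanically. The following is a minimal sketch, not part of the original report, that assumes the 32-bit x86 message format shown in this thread; the function name decode_segfault and the dictionary keys are made up for illustration. The meaning of the error bits (bit 0: page was present, bit 1: write access, bit 2: user mode) is the standard x86 page-fault error code.

```python
import re

# Pattern for the kernel's segfault line as it appears in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "object": m.group("obj"),
        # Distance of the faulting instruction from the start of the mapping.
        "offset_in_object": ip - base,
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }
```

For the line above this gives a user-mode read of an unmapped page at address 0xa3c, from an instruction 0x14083 bytes into the faxgetty mapping. Such a low fault address is consistent with, though not proof of, a field access through a null pointer.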

Revision history for this message
Peter Childs (pchilds-bcs) wrote :

This happens while receiving a fax. I've managed to get this exact bug to happen on more than one computer.

If you need any more detail let me know.

Jun 29 16:20:49.30: [ 9678]: SESSION BEGIN 000000090 442074031930
Jun 29 16:20:49.30: [ 9678]: HylaFAX (tm) Version 6.0.3
Jun 29 16:20:49.30: [ 9678]: <-- [4:ATA\r]
Jun 29 16:20:56.26: [ 9678]: --> [7:CONNECT]
Jun 29 16:20:56.26: [ 9678]: ANSWER: FAX CONNECTION DEVICE '/dev/ttyS0'
Jun 29 16:20:56.26: [ 9678]: RECV FAX: begin
Jun 29 16:20:56.26: [ 9678]: <-- data [32]
Jun 29 16:20:56.26: [ 9678]: <-- data [2]
Jun 29 16:20:58.24: [ 9678]: --> [7:CONNECT]
Jun 29 16:20:58.24: [ 9678]: <-- data [23]
Jun 29 16:20:58.24: [ 9678]: <-- data [2]
Jun 29 16:20:58.99: [ 9678]: --> [7:CONNECT]
Jun 29 16:20:58.99: [ 9678]: <-- data [13]
Jun 29 16:20:58.99: [ 9678]: <-- data [2]
Jun 29 16:20:59.48: [ 9678]: --> [2:OK]
Jun 29 16:20:59.48: [ 9678]: <-- [9:AT+FRH=3\r]
Jun 29 16:20:59.94: [ 9678]: --> [7:CONNECT]
Jun 29 16:21:01.69: [ 9678]: --> [2:OK]
Jun 29 16:21:01.69: [ 9678]: RECV recv TSI (sender id)
Jun 29 16:21:01.69: [ 9678]: REMOTE TSI ""
Jun 29 16:21:01.69: [ 9678]: <-- [9:AT+FRH=3\r]
Jun 29 16:21:01.70: [ 9678]: --> [7:CONNECT]
Jun 29 16:21:01.96: [ 9678]: --> [2:OK]
Jun 29 16:21:01.96: [ 9678]: RECV recv DCS (command signal)
Jun 29 16:21:01.96: [ 9678]: REMOTE wants 14400 bit/s
Jun 29 16:21:01.96: [ 9678]: REMOTE wants A4 page width (215 mm)
Jun 29 16:21:01.96: [ 9678]: REMOTE wants unlimited page length
Jun 29 16:21:01.96: [ 9678]: REMOTE wants 7.7 line/mm
Jun 29 16:21:01.96: [ 9678]: REMOTE wants 1-D MH
Jun 29 16:21:01.96: [ 9678]: RECV training at v.17 14400 bit/s
Jun 29 16:21:01.96: [ 9678]: <-- [11:AT+FRM=145\r]
Jun 29 16:21:03.60: [ 9678]: --> [7:CONNECT]
Jun 29 16:21:05.17: [ 9678]: RECV: TCF 2774 bytes, 1% non-zero, 2699 zero-run
Jun 29 16:21:05.17: [ 9678]: --> [10:NO CARRIER]
Jun 29 16:21:05.17: [ 9678]: DELAY 70 ms
Jun 29 16:21:05.24: [ 9678]: TRAINING succeeded
Jun 29 16:21:05.24: [ 9678]: <-- [9:AT+FTH=3\r]
Jun 29 16:21:05.26: [ 9678]: --> [7:CONNECT]
Jun 29 16:21:05.26: [ 9678]: <-- data [3]
Jun 29 16:21:05.26: [ 9678]: <-- data [2]
Jun 29 16:21:06.47: [ 9678]: --> [2:OK]
Jun 29 16:21:06.47: [ 9678]: <-- [11:AT+FRM=146\r]
Jun 29 16:21:07.36: [ 9678]: --> [7:CONNECT]
Jun 29 16:21:07.36: [ 9678]: RECV: begin page

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Could you provide a complete session log created with SessionTracing set to 0x08FFF?

Thanks,
Giuseppe

Changed in hylafax (Ubuntu):
assignee: nobody → Giuseppe Sacco (giuseppe-eppesuigoccas)
Revision history for this message
yaztromo (tromo) wrote :

Just to add: I am also experiencing this since upgrading from Hardy to Lucid server. faxgetty segfaults just after "RECV: begin page".

Sep 01 12:32:20.60: [23473]: SESSION BEGIN 000022359 441709369264
Sep 01 12:32:20.60: [23473]: HylaFAX (tm) Version 6.0.3
Sep 01 12:32:20.60: [23473]: <-- [4:ATA\r]
Sep 01 12:32:25.81: [23473]: --> [7:CONNECT]
Sep 01 12:32:25.81: [23473]: ANSWER: FAX CONNECTION DEVICE '/dev/ttyS0'
Sep 01 12:32:25.81: [23473]: RECV FAX: begin
Sep 01 12:32:25.83: [23473]: <-- data [32]
Sep 01 12:32:25.83: [23473]: <-- data [2]
Sep 01 12:32:26.80: [23473]: --> [7:CONNECT]
Sep 01 12:32:26.80: [23473]: <-- data [23]
Sep 01 12:32:26.80: [23473]: <-- data [2]
Sep 01 12:32:26.83: [23473]: --> [7:CONNECT]
Sep 01 12:32:26.83: [23473]: <-- data [13]
Sep 01 12:32:26.83: [23473]: <-- data [2]
Sep 01 12:32:28.96: [23473]: --> [2:OK]
Sep 01 12:32:28.96: [23473]: <-- [9:AT+FRH=3\r]
Sep 01 12:32:29.50: [23473]: --> [7:CONNECT]
Sep 01 12:32:30.62: [23473]: --> [2:OK]
Sep 01 12:32:30.62: [23473]: RECV recv DCS (command signal)
Sep 01 12:32:30.62: [23473]: REMOTE wants 14400 bit/s
Sep 01 12:32:30.62: [23473]: REMOTE wants A4 page width (215 mm)
Sep 01 12:32:30.62: [23473]: REMOTE wants unlimited page length
Sep 01 12:32:30.62: [23473]: REMOTE wants 7.7 line/mm
Sep 01 12:32:30.62: [23473]: REMOTE wants 1-D MH
Sep 01 12:32:30.62: [23473]: RECV training at v.17 14400 bit/s
Sep 01 12:32:30.62: [23473]: <-- [11:AT+FRM=145\r]
Sep 01 12:32:32.13: [23473]: --> [7:CONNECT]
Sep 01 12:32:33.95: [23473]: RECV: TCF 2934 bytes, 6% non-zero, 2732 zero-run
Sep 01 12:32:34.16: [23473]: --> [10:NO CARRIER]
Sep 01 12:32:34.16: [23473]: <-- [9:AT+FRS=7\r]
Sep 01 12:32:34.17: [23473]: --> [2:OK]
Sep 01 12:32:34.17: [23473]: TRAINING succeeded
Sep 01 12:32:34.17: [23473]: <-- [9:AT+FTH=3\r]
Sep 01 12:32:34.22: [23473]: --> [7:CONNECT]
Sep 01 12:32:34.22: [23473]: <-- data [3]
Sep 01 12:32:34.22: [23473]: <-- data [2]
Sep 01 12:32:35.46: [23473]: --> [2:OK]
Sep 01 12:32:35.46: [23473]: <-- [11:AT+FRM=146\r]
Sep 01 12:32:37.86: [23473]: --> [7:CONNECT]
Sep 01 12:32:37.86: [23473]: RECV: begin page

And from dmesg:
[316471.104213] faxgetty[23473]: segfault at a3c ip 0805c083 sp bf88e390 error 4 in faxgetty[8048000+72000]

I have changed sessiontracing to 0x08FFF in config.tty0. There are also config and config.sav; do I need to change it there too? I can give you a couple of days' output, then I will have to revert to Hardy from a dd image, since I risk losing the company's faxed purchase orders.

Revision history for this message
Leonardo (rnalrd) wrote :

Disabling/stopping apparmor "fixes" the issue for me

Revision history for this message
yaztromo (tromo) wrote :

Here are logs with session tracing set to 0x08FFF and server tracing set to 0x0FFFF. Leonardo, I will try your solution tomorrow; I have no use for apparmor anyway.

From the faxlog:
Sep 02 12:22:29.69: [ 4541]: SESSION BEGIN <number hidden>
Sep 02 12:22:29.69: [ 4541]: HylaFAX (tm) Version 6.0.3
Sep 02 12:22:29.69: [ 4541]: <-- [4:ATA\r]
Sep 02 12:22:34.94: [ 4541]: --> [7:CONNECT]
Sep 02 12:22:34.94: [ 4541]: ANSWER: FAX CONNECTION DEVICE '/dev/ttyS0'
Sep 02 12:22:34.94: [ 4541]: STATE CHANGE: ANSWERING -> RECEIVING
Sep 02 12:22:34.94: [ 4541]: RECV FAX: begin
Sep 02 12:22:34.95: [ 4541]: <-- HDLC<32:FF C0 04 B5 00 AA 12 9E 36 86 62 82 1A 04 14 2E B6 94 04 6A A6 4E CE 96 F6 76 04 6C 74 0C 74 CC>
Sep 02 12:22:34.95: [ 4541]: <-- data [32]
Sep 02 12:22:34.95: [ 4541]: <-- data [2]
Sep 02 12:22:35.94: [ 4541]: --> [7:CONNECT]
Sep 02 12:22:35.94: [ 4541]: <-- HDLC<23:FF C0 02 CE 4E A6 76 2E 4E 86 0A 64 2E 66 F6 4E C6 A6 A6 42 04 04 04>
Sep 02 12:22:35.94: [ 4541]: <-- data [23]
Sep 02 12:22:35.94: [ 4541]: <-- data [2]
Sep 02 12:22:35.96: [ 4541]: --> [7:CONNECT]
Sep 02 12:22:35.96: [ 4541]: <-- HDLC<13:FF C8 01 00 77 5F 23 01 FB C1 01 01 18>
Sep 02 12:22:35.96: [ 4541]: <-- data [13]
Sep 02 12:22:35.96: [ 4541]: <-- data [2]
Sep 02 12:22:38.08: [ 4541]: --> [2:OK]
Sep 02 12:22:38.08: [ 4541]: <-- [9:AT+FRH=3\r]
Sep 02 12:22:38.57: [ 4541]: --> [7:CONNECT]
Sep 02 12:22:39.74: [ 4541]: --> HDLC<9:FF C8 C1 00 46 1F 00 3D 9B>
Sep 02 12:22:39.75: [ 4541]: --> [2:OK]
Sep 02 12:22:39.75: [ 4541]: RECV recv DCS (command signal)
Sep 02 12:22:39.75: [ 4541]: REMOTE wants 14400 bit/s
Sep 02 12:22:39.75: [ 4541]: REMOTE wants A4 page width (215 mm)
Sep 02 12:22:39.75: [ 4541]: REMOTE wants unlimited page length
Sep 02 12:22:39.75: [ 4541]: REMOTE wants 7.7 line/mm
Sep 02 12:22:39.75: [ 4541]: REMOTE wants 1-D MH
Sep 02 12:22:39.75: [ 4541]: RECV training at v.17 14400 bit/s
Sep 02 12:22:39.75: [ 4541]: <-- [11:AT+FRM=145\r]
Sep 02 12:22:41.24: [ 4541]: --> [7:CONNECT]
Sep 02 12:22:43.06: [ 4541]: RECV: TCF 2934 bytes, 6% non-zero, 2732 zero-run
Sep 02 12:22:43.27: [ 4541]: --> [10:NO CARRIER]
Sep 02 12:22:43.27: [ 4541]: <-- [9:AT+FRS=7\r]
Sep 02 12:22:43.28: [ 4541]: --> [2:OK]
Sep 02 12:22:43.28: [ 4541]: TRAINING succeeded
Sep 02 12:22:43.28: [ 4541]: <-- [9:AT+FTH=3\r]
Sep 02 12:22:43.33: [ 4541]: --> [7:CONNECT]
Sep 02 12:22:43.33: [ 4541]: <-- HDLC<3:FF C8 21>
Sep 02 12:22:43.33: [ 4541]: <-- data [3]
Sep 02 12:22:43.33: [ 4541]: <-- data [2]
Sep 02 12:22:44.56: [ 4541]: --> [2:OK]
Sep 02 12:22:44.56: [ 4541]: <-- [11:AT+FRM=146\r]
Sep 02 12:22:46.96: [ 4541]: --> [7:CONNECT]
Sep 02 12:22:46.96: [ 4541]: MODEM input buffering enabled
Sep 02 12:22:46.96: [ 4541]: RECV: begin page
Sep 02 12:22:47.09: [ 4541]: RECV/CQ: Bad 1D pixel count, row 0, got 1843, expected 1728
Sep 02 12:22:47.38: [ 4541]: RECV/CQ: Bad 1D pixel count, row 1, got 1830, expected 1728
Sep 02 12:22:47.52: [ 4541]: RECV/CQ: Bad 1D pixel count, row 2, got 1745, expected 1728
Sep 02 12:22:47.52: [ 4541]: RECV/CQ: Bad 1D pixel count, row 3, got 3722, expected 1728
Sep 02 12:22:47.66: [ 4541]: RECV/CQ: Bad 1D pixel count, row 4, got 3187, expected 1728
Sep 02 12:22:48.01: [ 4541]: RECV/...

Revision history for this message
yaztromo (tromo) wrote :

Disabling apparmor does not work for me.

I'm reverting the server back to hardy, before I get in trouble for lost faxes :)

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Could you try 6.0.4-10 as published in Debian? It should work out of the box on Lucid.

Thanks,
Giuseppe

Revision history for this message
yaztromo (tromo) wrote :

Tested. Seems to work perfectly.

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Let's wait a couple of days, then I will close this bug report.

Changed in hylafax (Ubuntu):
status: New → Fix Committed
Revision history for this message
yaztromo (tromo) wrote :

Will this fix be released as an updated set of packages for Lucid?

I will keep the Hardy hylafax installed until then since I trust it.

Revision history for this message
Francesco (francesco-colista) wrote :

Installing the backport from Maverick on Lucid does not work
(HylaFAX server and client 6.0.4-10).

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

In order to collect more information, could you change ServerTracing and SessionTracing to 8FFFF, restart hylafax, produce the error, and then send the relevant log from /var/log/daemon and/or /var/log/messages ?
Thanks

Revision history for this message
Francesco (francesco-colista) wrote :

That's my /var/log/messages.
__________________________________________________________________________________________

Sep 16 11:01:51 itras01 FaxGetty[32411]: ANSWER: FAX CONNECTION DEVICE '/dev/ttyS11'
Sep 16 11:02:03 itras01 ntpd[798]: kernel time sync status change 6001
Sep 16 11:02:19 itras01 kernel: [854106.814226] faxgetty[32411]: segfault at a3c ip 0805b6a3 sp bf911c60 error 4 in faxgetty[8048000+72000]
Sep 16 11:02:25 itras01 HylaFAX[7364]: checkHostIdentity("itmon01")
Sep 16 11:03:25 itras01 HylaFAX[7365]: checkHostIdentity("itmon01")

The error is recorded in dmesg too:

[854106.814226] faxgetty[32411]: segfault at a3c ip 0805b6a3 sp bf911c60 error 4 in faxgetty[8048000+72000]

I cannot reproduce the error. Anyway, it seems that the segfaults happen only with ttyS11.
I have a multi-modem card with an ST16654 chipset.
Everything works perfectly with 8.04.

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

[I am going to ask for something somewhat unusual; I hope it is not too difficult to do.]
Could you check whether faxgetty leaves a core dump? A segfault usually creates such a file if ulimit does not forbid it.
The limit is specified in /etc/security/limits.conf; you should add two lines to that file:

uucp soft core 100000
uucp hard core 100000

Since I think that file is read at login time, you should log out, log in again, and restart the hylafax server.
Once your faxgetty segfaults, you should find its core somewhere, probably in /var/spool/hylafax; then run gdb against the core file. Once in gdb, please type the command "bt" to get a backtrace, and send me its output.
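
Before waiting for the next crash, it may be worth confirming that the core limit is really in effect in the session that starts faxgetty. The sketch below is only an illustration: the function names are invented here, the binary path /usr/sbin/faxgetty in the usage note is an assumption, and the gdb options used (-batch and -ex) are standard gdb command-line options for running "bt" non-interactively.

```python
import resource


def core_dump_limits():
    """Return (soft, hard) RLIMIT_CORE of the current process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_dumps_enabled(soft_limit):
    # A soft limit of 0 suppresses core files entirely; any other value,
    # including resource.RLIM_INFINITY, allows them.
    return soft_limit != 0


def gdb_backtrace_command(binary, core):
    # -batch makes gdb exit after running the commands given with -ex.
    return ["gdb", "-batch", "-ex", "bt", binary, core]
```

Run as the user that owns the faxgetty process, core_dumps_enabled(core_dump_limits()[0]) should be true after the limits.conf change. gdb_backtrace_command("/usr/sbin/faxgetty", "core") builds the argument list that could be passed to subprocess.run to print the backtrace.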

Thank you very much,
Giuseppe

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Francesco,
another (simpler) way to check it using gdb is to follow the instructions at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=553027#15

Bye,
Giuseppe

Revision history for this message
Francesco (francesco-colista) wrote :

Well, with the upgrade to kernel 2.6.32-24-generic-pae it seems that the segfault no longer occurs.
Anyway, I cannot reproduce the bug.

Revision history for this message
Mike Nielsen (mike-getbent) wrote :

Updating to kernel 2.6.32-24-generic-pae seems to have helped but has not fixed my issue. Instead of segfaulting every hour or so, I can now get nearly a day out of HylaFAX before it crashes.

Apparmor is disabled.

The system in question has 3 modems attached, all different makes and models. Regardless of the modem or PID, the error is always the same:

Sep 25 14:15:11 ITMFAX-XX kernel: [51933.292051] faxgetty[2432]: segfault at a3c ip 0805c083 sp bffdd890 error 4 in faxgetty[8048000+72000]
Sep 25 14:17:29 ITMFAX-XX kernel: [52071.504050] faxgetty[2433]: segfault at a3c ip 0805c083 sp bfb335e0 error 4 in faxgetty[8048000+72000]

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Mike, could you please check it using gdb, as explained on previous messages on this bug report?

Thanks,
Giuseppe

Revision history for this message
Mike Nielsen (mike-getbent) wrote :

Just got a different error, Hylafax was in the middle of receiving at the time.

Sep 27 09:06:17 ITMFAX-XX kernel: [206199.152049] faxgetty[9480]: segfault at 1850 ip b7499785 sp bff8c038 error 4 in libc-2.11.1.so[b7426000+153000]

Revision history for this message
Mike Nielsen (mike-getbent) wrote :

I've set the limits.conf lines above. Any guess on where I might find the core?

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

If you follow the instructions at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=553027#15 then you should find them in /tmp/faxsend-*

Revision history for this message
Mike Nielsen (mike-getbent) wrote :

I have the faxsend wrappers in place but still get no core dumps on subsequent segfaults. I'm unsure if it matters but I have never segfaulted on a send, only on reception of a fax.

I also upgraded to kernel 2.6.32-25-generic-pae this morning with no change in the problem.

What other information can I get you? Is there another way to wrap the process for debugging?

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Mike, you are right! I didn't mean for you to use the faxsend script for faxsend, but for faxgetty.
You should «killall faxgetty», download the general script gdb-wrapper.sh from http://people.ifax.com/~aidan/hylafax/gdb/ and run it with args "ttyS0" (or the device name under which your modem is available).

Revision history for this message
yaztromo (tromo) wrote :

What is the status of this bug? It seems to be ongoing, yet the status is marked as Fix Committed.

I would like to be able to use the Lucid hylafax package instead of having Hardy packages installed on a Lucid server.

Changed in hylafax (Debian):
status: Unknown → Incomplete
Revision history for this message
Antonio J. de Oliveira (ajoliveira) wrote :

Hello

Fresh Lucid server 64-bit install, 2.6.32-25-generic-pae; the system hangs once per day after receiving a fax.

Best regards
Antonio

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi all,
I just uploaded a new hylafax package, version 6.0.5-4, with two new binary packages containing debugging symbols for client and server. They are currently sitting in the NEW queue for Debian, since those new binary packages require manual approval. Once approved, the package should migrate to Debian unstable and start being rebuilt for all architectures.

You may wait a few days and get it for your architecture, or you may download its source from the NEW queue and rebuild it yourself. Have a look at http://ftp-master.debian.org/new.html for information on the NEW queue.

Please install all binary packages, including the new ones with debugging information, and try again to use gdb in order to produce a backtrace. The new backtrace should hopefully show all debugging information, which in turn should make it simpler to find this problem.

Revision history for this message
Antonio J. de Oliveira (ajoliveira) wrote :

Good morning

Regarding #25, the apparent cause for the described hang does not appear to be in hylafax, the machine in question exhibited an intermittent hardware problem which was corrected. Further investigation is being performed, will have more data available by the beginning of next week.

All the best

Antonio

Changed in hylafax (Ubuntu):
status: Fix Committed → Confirmed
Changed in hylafax:
status: Unknown → Confirmed
Revision history for this message
Andreas Oster (aoster) wrote :

Hello all,

what is the current status? Is someone working on the problem?
I have recently upgraded our hylafax installation from an old 4.X
version to 6.0.4 (Ubuntu 10.10). I am using t38modem 1.2.0 and
am now facing the exact same problem with faxgetty segfaulting.
I have noticed that, most of the time this happens, the peer
machine is a software fax. With hardware-based fax machines
this does not seem to happen very often.

I have tested several hylafax versions 6.04, 6.05 and 6.1git all
with the same issues.

Has someone built a working hylafax 4.4.7 deb package for Ubuntu 10.10
that I could try ?

Thank you for your kind help

Andreas

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Andreas,
I received good news a few hours ago. An Ubuntu user told me that the Natty packages are working fine on Ubuntu 10.10. Could you try them (and also install the -dbg package in order to get gdb traces if needed)?

Thanks,
Giuseppe

Revision history for this message
Andreas Oster (aoster) wrote :

Hello Giuseppe,

thank you for the fast reply.

I will try the new Natty packages and will report my findings.

Unfortunately I have no experience with debugging. How do
I get the traces? Is there a howto I can read?

regards

Andreas

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Andreas, check comment #23 for a few instructions about debugging.

Revision history for this message
Andreas Oster (aoster) wrote :

Hello Giuseppe,

I've tested the new packages yesterday evening and the segfault seems
to have gone :-)

Unfortunately I was still not able to receive my twelve-page test facsimile
from a software fax solution. I have tried several times but always got the
following error message in the logs:

"V.21 signal reception timeout; expected page possibly not received in full"

Any idea what could cause this ?

Thank you for your kind help

Andreas

Revision history for this message
Andreas Oster (aoster) wrote :

Hello Giuseppe,

seems like I have found the problem. While testing, I had
disabled ECM on the Cisco gateway (dial-peer). After re-enabling
ECM everything works as expected.

Thanks

Andreas

Changed in hylafax:
importance: Unknown → Medium
Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Can I please ask for the situation at this moment? I am running a fully updated server Ubuntu 10.04 Linux 2.6.32-29 and still have the problem.
Thank you

Simon

Revision history for this message
Francesco (francesco-colista) wrote : Re: [Bug 600219] Re: faxgetty segfault

On Tue, 15 Mar 2011 12:33:30 -0000, "Simon G. Stikkelorum"
<email address hidden> wrote:
> Can I please ask for the situation at this moment? I am running a
> fully updated server Ubuntu 10.04 Linux 2.6.32-29 and still have the
> problem.
> Thank you
>
> Simon

Because of this bug (opened several months ago) I've removed 10.04 and
rolled back to 8.04.

8.04 is not affected.


Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi,
this bug report covers many different problems. The only way to fix each of them is to supply all the information using a debugger, as explained in comments #15, #21, and #23. If you can produce that data, then your bug will probably be found and fixed.

You may add your data to this report or you may send it directly to me.

Bye,
Giuseppe

Revision history for this message
Antonio J. de Oliveira (ajoliveira) wrote :

Hi

I am running 10.04 server on a 32-bit PC, and everything is running flawlessly; my previous problem was an intermittent hardware one.

Cheers

Antonio

Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Gentlemen,

Thank you for your prompt response. I did not subscribe to this topic, so I am a bit late reacting, sorry. I am happy to see commitment here.

@Giuseppe: I will run faxgetty in the wrapper and get back here with
           results. Although I should be able to dig down into the
           code as well. The problem occurs about once every 2
           weeks with a station that does not send ID. Other faxes
           we receive without ID are all spam-advertisements.

@Francesco: It sure is a good thing to know the old version works.
           I have been running hylafax (with this modem) for over
           10 years, so I know it can be as solid as a rock. I can
           allow for this situation to continue a bit longer. So
           I can contribute to solving this. I'd rather go this
           way than to revert back to an old version. I understand
           why you would choose for that, though.

@Antonio: I will check, I have been moving the server. But would
           you not agree that software should not segfault as a
           result of hardware problems?

More information is that the failure seems to occur at the end of the page, not at the beginning as you'd think from the logging. In the recvq I find a tif file of a reasonable length for one page.
But there are two things wrong with the file:
It has the wrong permissions (rwx------ for user uucp), and when I do chmod 777 the file still produces no picture. This leads me to think that something goes wrong at the end of the page.

That's all for now. I'll start faxgetty in the wrapper and I'll be back with the results. Maybe I should take a look at the source; I should have some two weeks before the next failure.

Regards,

Simon.

Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Giuseppe,

I have tried the wrapper but it does not seem to work. So I tried to just run the gdb command found in the wrapper from the command line and found that gdb complains that it does not have debugging symbols. I am sorry, but I have little experience with debugging with gdb.
I think I have to obtain the source, compile it with specific options and then run it in this gdb command. It is no problem for me to run this directly from the command line, if needed.

Would you please have some pointers on how to go forward?

Regards,

Simon.

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Simon,
you should get the packages from Debian and rebuild them. I am unsure whether you can install them as they are. They might work. If you get any problem while installing them via dpkg, then you'll have to rebuild them.

In Debian you will find two more packages with names ending in -dbg, that include all debug symbols.

http://packages.debian.org/source/testing/hylafax

Bye
Giuseppe

Revision history for this message
yaztromo (tromo) wrote :

Like Francesco said, I just downgraded to the 8.04 version. You can then hold the package from upgrading using dpkg or aptitude. This has been the situation for me since the release of lucid.

Alternatively you can install the Debian version which also seems unaffected. Which leads me to think that something from Ubuntu patching is causing this maybe?

I would love to bug test this but it's a bit difficult on a live server that receives a lot of faxes :(

Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Gents,

In the meantime I tried to download and compile the package. No problem with that.
Since I do not want to install it, I ran into the problem that some shared lib is missing. Apparently I downloaded the source of a different version than the one I am running now. I'll re-check and have another try.

@yaztromo: I am aware that this is a way out. Similarly, making faxgetty respawn may make the situation acceptable. The reason I truly want to corner this is that the problem may affect other applications too. My feeling is that the problem lies less in hylafax and more in the system (libraries).
 I am in a position where I can experiment a bit more, and once I have set everything up, just one failure will help a lot. We can accept that (for a better world).

One more question, yaztromo: my server does not receive that many faxes and fails once every two weeks. Would you say the failure is related to the time faxgetty has been running, to the number of faxes received, to the fax sender, or none of the above?

Regards, Simon.

Revision history for this message
yaztromo (tromo) wrote :

@Simon

We receive about 30 faxes per day. During the time I was running the Lucid version of Hylafax I would see faxgetty segfault at least twice or more each day. So in my opinion it would appear to be related to how often you receive a fax.

You can find some logs I made at the time further up. The segfault could happen at anytime but it was usually as it began to receive the page. Several times it indicated that faxgetty segfaulted in libtiff, though looking at libtiff is possibly a red herring.

Revision history for this message
Nancy Saia (nwsaia) wrote :

Did we ever find a solution to this? Running 10.04.2 LTS. Would greatly appreciate a patch.

In the meantime, could anyone provide a script that watches the logs for faxgetty to crash and restarts hylafax or faxgetty when this happens? I am not much of a wizard.

Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Hi Nancy (et Al)

I have been investing some time in an attempt to corner this problem.
What I did:
= Downloading the latest version of Hylafax only to find out my box is
  running an older version.
= Downloading the right version and trying to compile it. Then you find
  out that the shared library is newer than the configuration allows.
= After adapting the configuration file I was able to make a faxgetty
  with debug options.
= That faxgetty has been running since March 24th and has not failed yet.

So, if it fails I should have some more insight. At the same time I
would like to mention that, on my box, the fax causing the problem
produces a broken/damaged/unreadable .tiff file. So, although the
logging suggests that things go wrong at the beginning of
reception, I'd suspect the problem lies in some special action
(Checksum?) needed when closing the TIFF file.
This fits with the suggestion of Yaztromo, who suggested that
libtiff may be to blame. Also my faxgetty is not failing yet, and I
noted that libtiff has been updated recently! Maybe 1 + 1 = 2?

Your suggestion to restart is called "respawn" and you used to
write one line in /etc/inittab for that. But this is organized
differently in recent versions of Ubuntu. Sorry I can't help you
there.

Regards,

Simon.

Revision history for this message
Hugo (hugohg34) wrote :

I did this in PHP and run it as root from /etc/crontab.

PHP:
<?php
/**
* Check whether faxgetty is running and, if not, restart the service
*/

$proceso='faxgetty'; // process name expected in the ps output
$busqueda='faxgett'; // grep pattern
$comando="ps -Al -cmd| grep $busqueda"; //command !!!
$fecha=date("d-m-Y h:i");

$resultado=exec($comando); // execute command; exec() returns the last output line

// Look for $proceso in $resultado. stripos() returns 0 for a match at
// position 0, so compare against false explicitly.
if(stripos($resultado,$proceso) !== false)
{
  echo "Process found - no action taken"; //OK
}
else
{
  echo "Process not found, restarting HylaFAX server";
  exec("/etc/init.d/hylafax restart");
}
?>

Edit /etc/crontab and add:

# runfaxgetty.php runs every hour
01 * * * * root php -f /runfaxgetty.php
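
The request earlier in this thread was for something that watches the logs rather than the process table. As a rough sketch of that variant (not something any poster here actually ran), the helper below scans log lines for the kernel's faxgetty segfault message. The function names and the idea of remembering already handled PIDs are assumptions made for illustration; the log path and restart command mentioned afterwards are the ones that appear in this thread.

```python
import re

# Matches the kernel message format quoted throughout this report.
SEGFAULT_LINE = re.compile(r"\bfaxgetty\[(\d+)\]: segfault at ")


def crashed_pids(log_lines):
    """Return the PIDs of faxgetty processes the kernel reported as segfaulted."""
    pids = []
    for line in log_lines:
        m = SEGFAULT_LINE.search(line)
        if m:
            pids.append(int(m.group(1)))
    return pids


def new_crashes(log_lines, already_seen):
    """PIDs that segfaulted and have not been handled by an earlier run."""
    return [pid for pid in crashed_pids(log_lines) if pid not in already_seen]
```

A cron job could feed this the lines of /var/log/messages and run /etc/init.d/hylafax restart whenever new_crashes returns a non-empty list. Keeping the set of handled PIDs between runs is left open here, and PIDs can be reused over time, so this is a starting point rather than a finished watchdog.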

Revision history for this message
Deeptht69 (deeptht69) wrote :

I am trying to run a hylafax server in my office, and am getting tripped up by this same bug. My computer is running Desktop Ubuntu 10.04.4 LTS. I have tried 2 different serial modems with no difference noted. I compiled hylafax 6.05 from source, but this did not help.

The segfault seems to occur when receiving a multi-page fax from one particular number. I have not confirmed this, yet. The resulting partial fax is in recvq, with abnormal permissions (600, instead of 644). The pages that are present are all correct; the error seems to be when switching to a new page.

yaztromo (tromo)/Francesco (francesco-colista): You say that you "downgraded to the 8.04 version". Does that mean that you are using the ".deb" versions for 8.04 on a system running Ubuntu 10.04, or did you downgrade to Ubuntu 8.04?

For now, I am setting up a cron file to check that faxgetty is running, and restarting it as needed, similar to the script that Hugo provided above.

Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Hi deep,

I was surprised by your remark that you were able to see something of the fax. I have tried that many times, in an attempt to see "who did it", but I was never able to see a fax. The most recent failure I had here was from March 1st, and indeed, after changing the rights from 0600 to 0644 I could see the first page. The most recent failure before that was December 22nd 2011, and I could not read that, just as I remembered. In total I have a fax history of 3 months, containing 200 faxes and 7 failures. Only the most recent is readable, and only its first page. This is a cover sheet saying it is a two-page fax.

All failing faxes are something like 55 to 75k in size. The size suggests that even page two is contained in the file, but cannot be shown. Single page computer faxes are often something like 30k. And also from a timing point of view, looking at the logging, it seems that things go wrong at the end of reception. This also explains the 0600 file rights. The file is still written in 0600 mode and then the process is failing.
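
If an interrupted reception really does leave the file readable only by its owner, the affected faxes can be listed without going through the logs. The sketch below is an illustration under that assumption; the function name is invented, and the directory used in the usage note (/var/spool/hylafax/recvq) is pieced together from the spool directory and the recvq folder named in this thread.

```python
import os
import stat


def owner_only_files(recvq_dir):
    """List regular files that carry no group or other permission bits.

    Returns (name, mode, size) tuples, sorted by file name.
    """
    found = []
    for name in sorted(os.listdir(recvq_dir)):
        st = os.stat(os.path.join(recvq_dir, name))
        if not stat.S_ISREG(st.st_mode):
            continue
        mode = stat.S_IMODE(st.st_mode)
        # 0600 and 0700 both match; 0644 does not.
        if mode & 0o077 == 0:
            found.append((name, mode, st.st_size))
    return found
```

Calling owner_only_files("/var/spool/hylafax/recvq") as a user allowed to read that directory would give the candidates together with their sizes, which makes it easy to compare them against the 55 to 75k range mentioned above.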

You are looking back to old versions of Ubuntu to avoid the problem. But it seems that between Dec 22nd 2011 and Mar 1st 2012 someone has been very close to the problem. Maybe, with the help of others, we can even narrow the time frame down further and look up in the update history what has changed. It has been suggested before that the true problem is not in Hylafax but in some library it uses to handle TIFs.
And things seem to go wrong at the end of reception, before the file mode is changed. That is not 1000 lines of code, rather like 10. Maybe I should take another look.

Hope this helps a little bit.

Regards,

Simon

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi deeptht69 and Simon,
is anyone among you able to use the instructions at:
https://bugs.launchpad.net/ubuntu/+source/hylafax/+bug/600219/comments/15
https://bugs.launchpad.net/ubuntu/+source/hylafax/+bug/600219/comments/21
? This would help a lot in finding the solution for this problem.

I would also ask you to try the latest packages from Debian testing. You may find them starting at http://packages.qa.debian.org/h/hylafax.html . If you need them rebuilt for your specific Ubuntu version, just send me an email with version and architecture.

Thanks,
Giuseppe

Revision history for this message
Deeptht69 (deeptht69) wrote :

Giuseppe: I will be happy to set up my system to generate debugging traces for you. I will also be paying close attention to when the error occurs. As I said, it seems that one particular sender seems to generate the error, but that could just be confirmation bias. After getting a segfault after 3 pages from the sender in question, I restarted faxgetty, and received several 12 page faxes in a row from another sender.

As far as trying the latest packages, I can't do that right now. As I said, I am trying to get this working in a live environment, and every time the system crashes when I'm not around to fix it, my office staff just plugs in our old dedicated fax machine. Not good for data collection.

Thanks for your interest.
Mark

Revision history for this message
Deeptht69 (deeptht69) wrote :

Giuseppe: Some more info that may be helpful: I have reviewed my "/var/log/syslog" files, and have discovered that the faxgetty segfault only occurs when receiving a fax from one particular sender. Of the last 9 faxes received from that sender, 4 came through OK, and 5 produced segfaults before being completed. When they re-send the fax that produced a segfault, the re-send can segfault on a different page.

I spoke to the sender, and she told me that they have a lot of problems with that machine. It is a Konica/Minolta model 2900.

The only other visible pattern concerns what I presume is the page size on the log entry. Every other sender produces a line in the log file of the form:

<stuff>..RECV FAX (....): from <number>, page 1 in <time>, INF, # line/mm, 2-D MMR, 14400 bit/s

The offending sender has "A4" where the others have "INF". (I am in the USA, where A4 is not a commonly used page size). The offending sender also has "1-D MH" where *most* of the others have "2-D MMR". The 2 other incoming faxes that use "1-D MH" were single page. The logs cover a span of 5 days, during which 41 faxes were received, 6 of which ended prematurely with a segfault. (One segfault occurred while the first page was being received, so the log does not report what number it was originating from).
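
The page-size and encoding pattern described here could be tallied straight from the log rather than by eye. The following is only a minimal sketch: it assumes the comma-separated "RECV FAX" layout quoted above, and the function name summarize_recv and the example path are illustrative, not part of HylaFAX.

```shell
#!/bin/sh
# summarize_recv FILE: count "RECV FAX" lines by page size and encoding.
# Assumed layout (from the log line quoted above), split on ", ":
#   $1 ...RECV FAX (...): from <number>   $2 page N in <time>
#   $3 page size (INF, A4)                $4 resolution
#   $5 encoding (1-D MH, 2-D MMR)         $6 speed
summarize_recv() {
    grep 'RECV FAX' "$1" |
        awk -F', ' 'NF >= 6 { n[$3 " / " $5]++ }
                    END { for (k in n) print n[k], k }' |
        sort -rn
}

# Example (the path is an assumption): summarize_recv /var/log/syslog
```

Each output line is a count followed by the "size / encoding" pair, most frequent first, so a sender that always shows up as "A4 / 1-D MH" stands out quickly.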

Regarding the use of gdb: I have not used it before, but I'm willing to give it a shot. The messages you referred to, #15 and #21, are both links to a thread about segfaults occurring when a fax is being sent. So far, I have never had a segfault during a send, even to the machine that causes segfaults when we are receiving. Message #23 appears to have instructions for using gdb with faxgetty, but they are not very clear. Am I to understand that I should run: killall faxgetty; gdb-wrapper.sh faxgetty ttyS0?

Thanks,
Mark

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Mark,
you have understood perfectly. gdb-wrapper.sh is a command that runs your faxgetty under the control of gdb. While your program runs, gdb collects a lot of information in a file named /tmp/gdb-$PID.log, where $PID is the process id of the gdb-wrapper.sh process. Besides logging information, it also produces a core file, /tmp/gdb-$PID.core. If they are not too large, you may attach these files to this report; otherwise please send them to me. I'll forward all information to the HylaFAX authors.

The way to run it is the one you already wrote.

Thanks,
Giuseppe

Revision history for this message
Deeptht69 (deeptht69) wrote :

Giuseppe:

Just to make sure I have this right: When I try to kill faxgetty, it immediately respawns. There is an entry in "/etc/init/" named "ttyS0.conf", which contains:

start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]

respawn
exec /usr/local/sbin/faxgetty ttyS0

I will change the last line to:

exec gdb-wrapper.sh /usr/local/sbin/faxgetty ttyS0

That is correct?

Mark

Revision history for this message
Deeptht69 (deeptht69) wrote :

Giuseppe:

Problem: I renamed "/etc/init/ttyS0.conf" to prevent faxgetty from respawning, then re-booted.
I ran gdb-wrapper.sh faxgetty ttyS0, but no faxgetty or gdb appeared in the list of processes. The logs in /tmp/gdb-1565.log say exactly this:

/tmp/gdb-1565.batch:6: Error in sourced command file:
Unrecognized or ambiguous flag word: "passhandle".

I cannot find the string "passhandle" in the gdb-wrapper.sh script, /usr/bin/gdb, or in /usr/local/sbin/faxgetty.

Ideas?

Mark

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi,
I just double-checked the gdb-wrapper.sh script and spotted two errors. All lines starting with "printf" should end with "\n". It happens that both lines that print a "handle" command are missing that "\n". Please add it as shown here:

printf '# Automatically generated batch\n' > $PREFIX.batch
printf 'set pagination off\n' >> $PREFIX.batch
printf 'set logging file %s\n' $PREFIX.log >> $PREFIX.batch
printf 'set logging redirect on\n' >> $PREFIX.batch
printf 'set logging on\n' >> $PREFIX.batch
printf 'handle all nostop print pass\n' >> $PREFIX.batch
printf 'handle SIGSEGV stop print pass\n' >> $PREFIX.batch
printf 'run\n' >> $PREFIX.batch
printf 'thread apply all bt\n' >> $PREFIX.batch
printf 'backtrace\n' >> $PREFIX.batch
printf 'print this\n' >> $PREFIX.batch
printf 'generate-core-file %s\n' $PREFIX.core >> $PREFIX.batch
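
The "passhandle" error reported earlier follows directly from the missing "\n": printf adds no newline by itself, so consecutive writes land on the same line of the batch file. A small sketch that reproduces the effect, assuming both "handle" lines lacked the newline as described (the temporary file stands in for $PREFIX.batch):

```shell
#!/bin/sh
# Reproduce the merged line: printf emits exactly what it is given,
# so without \n the next command is glued onto the previous one.
batch=$(mktemp)
printf 'handle all nostop print pass' > "$batch"      # missing \n
printf 'handle SIGSEGV stop print pass' >> "$batch"   # missing \n
printf 'run\n' >> "$batch"
cat "$batch"
# prints: handle all nostop print passhandle SIGSEGV stop print passrun
```

gdb then reads "passhandle" as a single unknown flag word, which would explain why the string cannot be found anywhere in the script itself.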

Bye,
Giuseppe

Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Mark, Giuseppe,

In the past I have been putting some effort in, so I returned to find out where I stalled. Amongst the trouble I ran into was that I am running hylafax version 6.0.3 but I downloaded the source of 6.0.5 for debugging. And it did not work. So later I loaded the 6.0.3 version also. But the faxgetty I compiled did not run. I suppose that is where it stopped last time.

Now I found it cannot find the shared library libhylafax-6.0.so.3. So I made a symbolic link like:

cd /usr/lib/
ln -s libhylafax-6.0.so.3 ./hylafax/libhylafax-6.0.so.3

Now the self-built faxgetty does run. So I tried running it inside the wrapper from the directory the faxgetty is built:

./gdb-warpper.sh ./faxgetty -D ttyS0

And, yes, it runs. But there is no sign of gdb: running ps ax | grep gdb reveals no gdb process. Is this OK?
So, I think I made a step today, but I do not feel certain that it is going to deliver. Your comments please.

Regards,

Simon.

Revision history for this message
yaztromo (tromo) wrote :

Deep,

To answer your question earlier. I am running the 8.04 packages in 10.04.

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

@Simon,
does "ps auxfw | grep -C3 faxgetty" show gdb?

Bye,
Giuseppe

Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Hi Giuseppe,

No, it does not show gdb:

simon@Helios:/usr/lib$ ps auxfw | grep -C3 faxgetty
root 2602 0.0 0.1 3164 1780 pts/2 Ss 20:58 0:00 \_ login -h antarctica.local -p
simon 2666 0.0 0.3 6068 3324 pts/2 S 20:58 0:00 \_ -bash
simon 3926 0.0 0.1 2860 1108 pts/2 R+ 23:57 0:00 \_ ps auxfw
simon 3927 0.0 0.0 3340 832 pts/2 S+ 23:57 0:00 \_ grep --color=auto -C3 faxgetty
root 1293 0.0 0.1 5816 1792 ? Ss 20:16 0:00 /usr/lib/postfix/master
postfix 1302 0.0 0.1 5876 1744 ? S 20:16 0:00 \_ qmgr -l -t fifo -u
postfix 3783 0.0 0.1 5832 1704 ? S 23:36 0:00 \_ pickup -l -t fifo -u -c
--
root 2474 0.0 0.1 4488 1644 pts/1 S 20:48 0:00 \_ su
root 2482 0.0 0.1 4668 1964 pts/1 S+ 20:48 0:00 \_ bash
simon 1937 0.2 1.8 32852 18668 ? Sl 20:17 0:26 /usr/bin/python /usr/lib/ubuntuone-client/ubuntuone-syncdaemon
uucp 2468 0.0 0.1 5260 1500 ? Ss 20:46 0:00 /home/simon/Projecten/hylafax-6.0.3/faxd/faxgetty -D ttyS0
simon@Helios:/usr/lib$

If you look at how I started the wrapper, ( ./gdb-warpper.sh ./faxgetty -D ttyS0 ) would you agree that is the right way?
Or has it to do with the -D option of faxgetty?

Hear from you! Simon.

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Well, I am not a gdb expert, but it could be that gdb changes its process name to that of the program being traced.
In order to check this, use the command "fuser" to see whether the log file in /tmp is used by the faxgetty process. And, with the command "lsof -p2468", check whether this process uses only the faxgetty binary or also the gdb binary.

Thanks,
Giuseppe

Revision history for this message
Deeptht69 (deeptht69) wrote :

Giuseppe/Simon:

I was not in my office yesterday, so I haven't yet tried the repaired "gdb-wrapper.sh" script.

Looking at the log files, there were 2 segfaults: one was from a number different from the one that has been causing all of the segfaults to date, and the other occurred on the first page, before the number was logged.

I did write a script to watch the process tree, and re-start faxgetty after it segfaults. Here it is:

#!/bin/bash
# monfaxgetty.sh - restart faxgetty daemon after segfault

TGT=" faxgetty$"

while true; do
   sleep 60
   res=$(ps -A | grep "$TGT")
   if [ "$res" ]; then continue
   fi
   echo $(date) "-- faxgetty daemon not found. Restarting."
   /usr/local/sbin/faxgetty -D ttyS0
done

This script is invoked by typing:

nohup /var/spool/hylafax/bin/monfaxgetty.sh >> /var/log/monfaxgetty 2>&1 <&- &

The redirection shenanigans at the end of the line redirect stderr to stdout (2>&1) and close stdin (<&-) (you can also use 0<&-). Any time the daemon has to restart faxgetty, it logs the date and time to "/var/log/monfaxgetty". The script runs as a daemon, checking for faxgetty once a minute.
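
For readers unfamiliar with these redirections, here is a small self-contained sketch of the same idiom. The function noisy is a hypothetical stand-in for monfaxgetty.sh, and a temporary file stands in for the real log:

```shell
#!/bin/sh
# noisy stands in for the monitoring script: it writes to both streams.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

log=$(mktemp)
# >> appends stdout to the log, 2>&1 then points stderr at the same place,
# and <&- closes stdin. The 2>&1 must come after the >> redirection.
noisy >> "$log" 2>&1 <&-
cat "$log"
```

Both lines end up in the log file. If 2>&1 were written before the >>, stderr would still go to the terminal.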

The fact that my receptionist didn't plug the old fax machine back in yesterday after 2 segfaults indicates that it works as planned.

If anyone reading this wants to use the script, pay very close attention to the spaces in the "res=" command. Bash is very fussy about spaces, and it took me several tries to get it right. Also, the definition of "TGT" is important: there is a single space, then "faxgetty$".
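
To see why the leading space and the trailing "$" in TGT matter, the pattern can be tried against a few sample "ps -A" lines. This is only a sketch: the sample lines and the helper name is_faxgetty_line are made up for illustration.

```shell
#!/bin/sh
# Same pattern as in monfaxgetty.sh: a space, the name, then end of line.
TGT=" faxgetty$"

# is_faxgetty_line LINE: succeed if LINE ends in the exact name faxgetty.
is_faxgetty_line() {
    printf '%s\n' "$1" | grep -q "$TGT"
}

is_faxgetty_line " 4308 ?        00:00:00 faxgetty"     && echo "match"
is_faxgetty_line " 4400 pts/1    00:00:00 notfaxgetty"  || echo "no match"
is_faxgetty_line " 4401 pts/1    00:00:00 faxgetty.sh"  || echo "no match"
```

The space rejects names that merely end in "faxgetty", and the "$" rejects names that merely start with it.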

I'll see if I can get the debugger running, and send the dumps from any further segfaults.

Mark

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Mark,
why don't you just make faxgetty start (and restart automatically) via /etc/inittab? Information about this configuration is in /usr/share/doc/hylafax-server/README.Debian.gz

Bye,
Giuseppe

Revision history for this message
Deeptht69 (deeptht69) wrote :

Giuseppe:

I don't know if this is a problem or not.

I killed "faxgetty", then executed "gdb-wrapper.sh /usr/local/sbin/faxgetty -D ttyS0". Faxgetty once again appears in the process tree, but the log at /tmp/gdb-24080.log reads:

<blank line>
Program exited normally.
No stack.
/tmp/gdb-24080.batch:11: Error in sourced command file:
No symbol table is loaded. Use the "file" command.

The result of "file /usr/local/sbin/faxgetty" is:

/usr/local/sbin/faxgetty: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, stripped

As I said earlier, I had compiled hylafax 6.0.5 from source, but did not use any switches for including debugging symbols.

gdb does not appear in the process tree, nor does any process with a pid of 24080. (faxgetty's pid is 24086)

The result of "fuser /tmp/gdb-24080.log" is:
/tmp/gdb-24080.log: 24086

The result of "lsof -p24086" is:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
faxgetty 24086 uucp cwd DIR 8,3 4096 397885 /var/spool/hylafax
faxgetty 24086 uucp rtd DIR 8,3 4096 2 /
faxgetty 24086 uucp txt REG 8,3 477484 136322 /usr/local/sbin/faxgetty
faxgetty 24086 uucp mem REG 8,3 113964 410036 /lib/ld-2.11.1.so
faxgetty 24086 uucp mem REG 8,3 9748 410063 /lib/tls/i686/cmov/libutil-2.11.1.so
faxgetty 24086 uucp mem REG 8,3 477612 136841 /usr/local/lib/libhylafax-6.0.so.5
faxgetty 24086 uucp mem REG 8,3 366580 145310 /usr/lib/libtiff.so.4.3.2
faxgetty 24086 uucp mem REG 8,3 79512 391113 /lib/libz.so.1.2.3.3
faxgetty 24086 uucp mem REG 8,3 975088 136780 /usr/lib/libstdc++.so.6.0.13
faxgetty 24086 uucp mem REG 8,3 149392 391032 /lib/tls/i686/cmov/libm-2.11.1.so
faxgetty 24086 uucp mem REG 8,3 120368 390998 /lib/libgcc_s.so.1
faxgetty 24086 uucp mem REG 8,3 1405508 391024 /lib/tls/i686/cmov/libc-2.11.1.so
faxgetty 24086 uucp mem REG 8,3 128616 136484 /usr/lib/libjpeg.so.62.0.0
faxgetty 24086 uucp mem REG 8,3 30496 391054 /lib/tls/i686/cmov/libnss_compat-2.11.1.so
faxgetty 24086 uucp mem REG 8,3 79676 391042 /lib/tls/i686/cmov/libnsl-2.11.1.so
faxgetty 24086 uucp mem REG 8,3 34408 391089 /lib/tls/i686/cmov/libnss_nis-2.11.1.so
faxgetty 24086 uucp mem REG 8,3 42572 391076 /lib/tls/i686/cmov/libnss_files-2.11.1.so
faxgetty 24086 uucp mem REG 8,3 256324 140902 /usr/lib/locale/en_US.utf8/LC_CTYPE
faxgetty 24086 uucp mem REG 8,3 2454 207218 /usr/lib/locale/en_US.utf8/LC_TIME
faxgetty 24086 uucp mem REG 8,3 26048 153218 /usr/lib/gconv/gconv-modules.cache
faxgetty 24086 uucp 0u CHR 1,3 0t0 791 /dev/null
faxgetty 24086 uucp 1u CHR 1,3 0t0 791 /dev/null
faxgetty 24086 uucp 2u CHR 1,3 0t0 791 /dev/null
faxgetty 24086 uucp 3r FIFO 0,8 0t0 524621 pipe
faxgetty 24086 uucp 4w FIFO 0,8 0t0 524621 pipe
faxgetty 24086 uucp 5r REG 8,3 270 269268 /tmp/gdb-24080.batch (deleted)
fa...


Revision history for this message
Deeptht69 (deeptht69) wrote :

Giuseppe:

> why don't you just make faxgetty start (and restart automatically) via /etc/inittab? Information about this
> configuration is in /usr/share/doc/hylafax-server/README.Debian.gz

Recent versions of Ubuntu no longer use the "/etc/inittab" system. The current method is to have a file named "ttyS0.conf" in the directory "/etc/init/". The contents of "ttyS0.conf" are listed in message #53. With that file in place, if I perform a "killall faxgetty", faxgetty immediately re-spawns with a new PID. However, if faxgetty dies because of a segfault, it does not automatically respawn. Hence the need for a daemon to monitor the process tree.

I couldn't think of a way to trigger a re-spawn whenever a segfault message was issued to "/var/log/syslog" , so I went with the polling method instead.

Mark

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Mark,
thanks for the clarification about how to respawn faxgetty. I will check why upstart does not restart it in case of errors.

About the other messages you sent, I think there are two problems.

The first one is that you are missing all debug symbols. In order to fix this, I recompiled the 6.0.5 version as deb packages for Lucid Lynx i386. This is with debugging symbols as separate packages. You may find all of them here http://eppesuigoccas.homedns.org/~giuseppe/debian/hylafax/lucid-i386/

The second one is about gdb. I think you cannot correctly dump all symbols from faxgetty since this program respawns itself with the correct permissions. Basically, when you run faxgetty as root at the command prompt, it creates a second process owned by the uucp user and then exits. I am unsure whether gdb keeps logging information for child processes even when the parent process terminates. Please give it a try now, with all debug symbols available.

Bye,
Giuseppe

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Mark,
after "set logging on" on gdb-wrapper.sh, add the following line:

printf 'set follow-fork-mode child\n' >> $PREFIX.batch

in order to let gdb follow the child process when there is a fork. This should keep gdb attached to the faxgetty process that continues running after the fork.
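
Putting the pieces together, the batch-writing part of the wrapper might look like the sketch below. It only uses the gdb commands already listed in this thread; the function name write_batch and the temporary directory are illustrative assumptions, and the rest of gdb-wrapper.sh is not reproduced here.

```shell
#!/bin/sh
# write_batch PREFIX: write PREFIX.batch with the commands from this thread,
# with follow-fork-mode placed right after "set logging on".
write_batch() {
    PREFIX=$1
    {
        printf '# Automatically generated batch\n'
        printf 'set pagination off\n'
        printf 'set logging file %s\n' "$PREFIX.log"
        printf 'set logging redirect on\n'
        printf 'set logging on\n'
        printf 'set follow-fork-mode child\n'
        printf 'handle all nostop print pass\n'
        printf 'handle SIGSEGV stop print pass\n'
        printf 'run\n'
        printf 'thread apply all bt\n'
        printf 'backtrace\n'
        printf 'print this\n'
        printf 'generate-core-file %s\n' "$PREFIX.core"
    } > "$PREFIX.batch"
}

dir=$(mktemp -d)                 # hypothetical location for the example
write_batch "$dir/gdb-example"
sed -n '6p' "$dir/gdb-example.batch"
```

Grouping the printf calls in one redirected block avoids repeating ">> $PREFIX.batch" on every line, which is where the earlier newline mistakes crept in.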

Bye,
Giuseppe

Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Hi Giuseppe and Mark,

Following the instructions from Giuseppe (who said you were not an expert on gdb?!), I changed the gdb-wrapper. Now all seems to work! gdb starts up, blocking the console, and in the process list I can now see both gdb and faxgetty running. Like this:

simon@Helios:~$ ps auxfw | grep -C3 faxgetty
root 16138 0.2 0.1 3164 1780 pts/2 Ss 00:13 0:00 \_ login -h antarctica.local -p
simon 16210 1.3 0.3 6068 3308 pts/2 S 00:13 0:00 \_ -bash
simon 16259 0.0 0.1 2860 1108 pts/2 R+ 00:14 0:00 \_ ps auxfw
simon 16260 0.0 0.0 3340 828 pts/2 S+ 00:14 0:00 \_ grep --color=auto -C3 faxgetty
root 1293 0.0 0.1 5816 1792 ? Ss Mar07 0:00 /usr/lib/postfix/master
postfix 1302 0.0 0.1 5968 1884 ? S Mar07 0:00 \_ qmgr -l -t fifo -u
postfix 15555 0.0 0.1 5832 1700 ? S Mar08 0:00 \_ pickup -l -t fifo -u -c
--
simon 1885 0.0 0.3 6052 3288 pts/0 Ss Mar07 0:00 \_ bash
root 1960 0.0 0.1 4488 1644 pts/0 S Mar07 0:00 | \_ su
root 1969 0.0 0.2 4792 2128 pts/0 S Mar07 0:00 | \_ bash
root 16071 0.0 0.0 1832 548 pts/0 S+ 00:05 0:00 | \_ /bin/sh ./gdb_wrap.sh ./faxgetty -D ttyS0
root 16072 0.0 1.0 16432 10524 pts/0 S+ 00:05 0:00 | \_ gdb -return-child-result -batch -x /tmp/gdb-16071.batch --args ./faxgetty -D ttyS0
simon 2063 0.0 0.3 6052 3320 pts/1 Ss Mar07 0:00 \_ bash
root 2474 0.0 0.1 4488 1644 pts/1 S Mar07 0:00 \_ su
root 2482 0.0 0.2 4820 2080 pts/1 S+ Mar07 0:00 \_ bash
simon 1937 0.1 1.8 32852 18668 ? Sl Mar07 2:41 /usr/bin/python /usr/lib/ubuntuone-client/ubuntuone-syncdaemon
uucp 16077 0.0 0.1 5260 1500 ? Ss 00:05 0:00 /home/simon/Projecten/hylafax-6.0.3/faxd/faxgetty -D ttyS0
simon@Helios:~$

Also, a log file is created, but it is still empty.
I know that the file contains symbols, because:

simon@Helios:~/Projecten/hylafax-6.0.3/faxd$ file faxgetty
faxgetty: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped

So I think it now is a matter of waiting?

Regards, good night (I am in the Netherlands)

Simon

Revision history for this message
MarkA (deeptht69-4) wrote :

OK, I just installed the hylafax 6.0.5 deb packages on my Ubuntu 10.04 system, and ran "gdb-wrapper.sh faxgetty -D ttyS0" Now, my process tree shows:

root@isabel:/tmp# ps auxw|grep fax
uucp 3839 0.0 0.2 4820 1568 ? S 17:01 0:00 /usr/sbin/hfaxd -d -i 4559
uucp 4251 0.1 0.2 4944 1956 ? S 17:07 0:00 /usr/sbin/hfaxd -d -i 4559
root 4302 0.0 0.0 1832 524 pts/4 S 17:12 0:00 /bin/sh ./gdb-wrapper.sh faxgetty -D ttyS0
root 4303 0.4 1.2 15192 9404 pts/4 S 17:12 0:00 gdb -return-child-result -batch -x /tmp/gdb-4302.batch --args faxgetty -D ttyS0
uucp 4308 0.0 0.1 5240 1252 ? Ss 17:12 0:00 /usr/local/sbin/faxgetty -D ttyS0
root 4312 0.0 0.1 3320 792 pts/3 S+ 17:13 0:00 grep --color=auto fax
root@isabel:/tmp#

Now, we wait for the next segfault to occur.

Revision history for this message
Deeptht69 (deeptht69) wrote :

Perhaps it is not so easy....

gdb seems to exit as soon as a fax is received, even when it does not cause a segfault. Here is the log after it exits:

toor@isabel:~$ cat /tmp/gdb-4945.log
[New process 4951]
[New process 5045]

Program exited normally.
No stack.
/tmp/gdb-4945.batch:12: Error in sourced command file:
No symbol table is loaded. Use the "file" command.
toor@isabel:~$

Here is "dpkg -l", showing the packages properly installed:

toor@isabel:~$ dpkg -l|grep hylafax|cut -c -80
ii hylafax-client 2:6.0.5-5
ii hylafax-server 2:6.0.5-5
ii hylafax-server-dbg 2:6.0.5-5
toor@isabel:~$

I will modify my "ps watcher" daemon to re-start gdb if needed after each fax comes in.

Mark

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi, this is probably a side effect of "set follow-fork-mode child". gdb only follows the child process and then, when the fax is received, this child process ends and a new child is forked from the parent.
Can you confirm that when gdb exits, there is still a faxgetty (the parent process) running?

About the second problem, I think that when forking, gdb loses the link with the executable name, so please add the line
printf 'file /usr/sbin/faxgetty\n' >> $PREFIX.batch
replacing "/usr/sbin/faxgetty" with your correct path, if you recompiled the package.

Thanks,
Giuseppe

Revision history for this message
Deeptht69 (deeptht69) wrote :

I have received a few more faxes with gdb running, but no segfaults so far.

As before, gdb exits after each fax is received, and must be manually re-started. The log file for a normally received fax reads:

root@isabel:/tmp# cat gdb-9245.log
[New process 9251]
[New process 9304]

Program exited normally.
No stack.
/tmp/gdb-9245.batch:12: Error in sourced command file:
No symbol "this" in current context.
root@isabel:/tmp#

"/usr/sbin/faxgetty" is still showing up as "stripped", despite installation of the "hylafax-server-dbg_6.0.5-5_i386.deb" package:

root@isabel:/tmp# file /usr/sbin/faxgetty
/usr/sbin/faxgetty: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, stripped
root@isabel:/tmp#

After gdb exits, there is a "/usr/sbin/faxgetty -D ttyS0" present in the process tree.

Also, I fear that my system is hopelessly confused. I had previously installed hylafax 6.0.5 by compiling the source code. Before installing the "deb" packages you supplied, I ran a "make distclean", believing that it would delete all the previously installed files. That turned out not to be the case. Now, I have hylafax files in "/usr/local/sbin", from the previous install, as well as files in "/usr/sbin" from the more recent "deb" install. Also, my "/var/spool/hylafax/etc/config" file, which has the timestamp of the "deb" installation, still has multiple references to the old installation "/usr/local/sbin" ,"/usr/local/bin", and "/usr/local/lib" directories.

In particular, "/usr/local/sbin/faxgetty" and "/usr/sbin/faxgetty" have different file sizes. I made sure that I specified "/usr/sbin/faxgetty" to be loaded by gdb-wrapper.sh.

I suppose I could just clear the "/usr/local/sbin" directory, as the only files in it are from hylafax, then re-generate the config file so it specifies the "/usr/sbin" directory, and do the same for "/usr/local/bin" and "/usr/local/lib".

Will it do any good to keep running "gdb" with a stripped faxgetty? Should I re-compile the source code with debugging info turned on? Can you supply a "deb" file with an unstripped faxgetty for Ubuntu 10.04? Should I wait for Ubuntu 12.04 LTS to become available in a few weeks, and see if that makes any difference?

Thanks,
Mark

Revision history for this message
Deeptht69 (deeptht69) wrote :

Late breaking news:

I just had a series of segfaults, which has generated 2 ".core" files, each 878000 bytes long, each with a log file about 4500 bytes long. Here is one of the log files:

root@isabel:/tmp# cat gdb-9574.log
[New process 9580]

Program received signal SIGSEGV, Segmentation fault.
[Switching to process 9580]
ModemServer::vtraceStatus (this=0x0, kind=1024, fmt=0xbfff50f0 "RECV/CQ: Adjusting for RTC found at row %u", ap=0xbfff5188 "z") at ModemServer.c++:950
950 ModemServer.c++: No such file or directory.
 in ModemServer.c++

Thread 2 (process 9580):
#0 ModemServer::vtraceStatus (this=0x0, kind=1024, fmt=0xbfff50f0 "RECV/CQ: Adjusting for RTC found at row %u", ap=0xbfff5188 "z") at ModemServer.c++:950
#1 0x0807dca0 in FaxModem::copyQualityTrace (this=0x80ce268, fmt=0x80a6f24 "Adjusting for RTC found at row %u") at CopyQuality.c++:1198
#2 0x08080a24 in FaxModem::recvPageDLEData (this=0x80cde08, tif=0x80cf488, checkQuality=true, params=..., eresult=...) at CopyQuality.c++:309
#3 0x08070395 in Class1Modem::recvPageData (this=0x80cde08, tif=0x80cf488, eresult=...) at Class1Recv.c++:1859
#4 0x08070b55 in Class1Modem::recvPage (this=0x80cde08, tif=0x80cf488, ppm=@0xbfffe98c, eresult=..., id=...) at Class1Recv.c++:619
#5 0x080579e9 in FaxServer::recvFaxPhaseD (this=0x80c1d68, tif=0x80cf488, info=..., ppm=@0xbfffe98c, result=...) at FaxRecv.c++:247
#6 0x08058446 in FaxServer::recvDocuments (this=0x80c1d68, tif=0x80cf488, info=..., docs=..., result=...) at FaxRecv.c++:201
#7 0x08058bed in FaxServer::recvFax (this=0x80c1d68, callid=..., result=...) at FaxRecv.c++:69
#8 0x0805285c in faxGettyApp::processCall (this=0x80c1d68, ctype=2, eresult=..., callid=...) at faxGettyApp.c++:579
#9 0x080529cb in faxGettyApp::answerCall (this=0x80c1d68, atype=0, ctype=@0xbffff288, eresult=..., callid=..., dialnumber=0x0) at faxGettyApp.c++:542
#10 0x08054822 in faxGettyApp::answerPhone (this=0x80c1d68, atype=0, ctype=2, callid=..., dialnumber=0x0) at faxGettyApp.c++:429
#11 0x0805569d in faxGettyApp::listenForRing (this=0x80c1d68) at faxGettyApp.c++:248
#12 0x080559e3 in faxGettyApp::listenBegin (this=0x80c1d68) at faxGettyApp.c++:188
#13 0x08055a31 in faxGettyApp::inputReady (this=0xbfff50f0, fd=12) at faxGettyApp.c++:905
#14 0x0015ce51 in Dispatcher::notify(int, fd_set&, fd_set&, fd_set&) () from /usr/lib/hylafax/libhylafax-6.0.so.5
#15 0x0015ba8f in Dispatcher::dispatch(timeval*) () from /usr/lib/hylafax/libhylafax-6.0.so.5
#16 0x0015b9c9 in Dispatcher::dispatch() () from /usr/lib/hylafax/libhylafax-6.0.so.5
#17 0x08053c3d in main (argc=3, argv=0xbffff7e4) at faxGettyApp.c++:1150
#0 ModemServer::vtraceStatus (this=0x0, kind=1024, fmt=0xbfff50f0 "RECV/CQ: Adjusting for RTC found at row %u", ap=0xbfff5188 "z") at ModemServer.c++:950
#1 0x0807dca0 in FaxModem::copyQualityTrace (this=0x80ce268, fmt=0x80a6f24 "Adjusting for RTC found at row %u") at CopyQuality.c++:1198
#2 0x08080a24 in FaxModem::recvPageDLEData (this=0x80cde08, tif=0x80cf488, checkQuality=true, params=..., eresult=...) at CopyQuality.c++:309
#3 0x08070395 in Class1Modem::recvPageData (this=0x80cde08, tif=0x80cf488, eresult=...) at Class1Recv.c++:1859
#4 0x08070b55 in Clas...
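
The interesting detail in this backtrace is that frame #0 is entered with this=0x0. A quick way to pick such frames out of a gdb log is sketched below; the helper name null_this_frames is made up for illustration and the pattern assumes the frame format shown above.

```shell
#!/bin/sh
# null_this_frames LOG: list backtrace frames that were called on a null
# object, i.e. frame lines containing "(this=0x0," or "(this=0x0)".
# sort -u collapses the repeats caused by the log holding the trace twice.
null_this_frames() {
    grep -E '^#[0-9]+ .*\(this=0x0[,)]' "$1" |
        awk '{ print $1, $2 }' |
        sort -u
}

# Example (file name taken from the log above): null_this_frames /tmp/gdb-9574.log
```

The pattern requires a comma or closing parenthesis right after 0x0, so ordinary pointers such as this=0x80ce268 and other null arguments such as dialnumber=0x0 are not reported.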


Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Hi All,
I had faxgetty running in gdb, as I posted previously. In my case, faxgetty keeps receiving faxes, but "frees" itself of gdb. Maybe it is the same as Deeptht69 describes in #69.
I now started faxgetty without the -D option. I do not mind if it sticks to my terminal. We'll see if faxgetty now remains running in gdb too.
I'll keep you posted,

Regards,
Simon

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Deeptht69,
please email me all cores and log files at "eppesuig @ debian.org", thanks.

Then, about your questions:
1. /usr/sbin/faxgetty is correctly stripped. All debugging symbols are in files shipped with the *-dbg package. When both packages are installed, you are OK: gdb knows where to find the debugging symbols. If you are curious about those files, just list the contents of the *-dbg package with the command "dpkg -L hylafax-server-dbg".

2. When gdb exits and you still have a faxgetty process, this is because gdb follows the child process and does not control the parent process. faxgetty probably forks a new child process to wait for a fax, and starts a new child every time a fax arrives. In order to debug this properly, you must manually kill the remaining faxgetty before proceeding.

Bye,
Giuseppe

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Simon,
if you run gdb with faxgetty without -D, does it leave a faxgetty process running after gdb return?

Thanks,
Giuseppe

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Deeptht69,
I had a look at those gdb backtraces. It is quite possible that the problem is "this" variable being null. I looked at the sources but I cannot figure out why this is happening. Nevertheless I did a small change in source file and built new packages. Could you please install these and replace the current ones? Packages are again in http://eppesuigoccas.homedns.org/~giuseppe/debian/hylafax/lucid-i386/

Thanks,
Giuseppe

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

So, as the HylaFAX author says in http://bugs.hylafax.org/show_bug.cgi?id=941#c9 it seems this is a compiler bug. I rebuilt the lucid package using the -O flag instead of -O2, as suggested, and I'll forward this bug to the g++ package for lucid.

Packages are available at http://eppesuigoccas.homedns.org/~giuseppe/debian/hylafax/lucid-i386/bug%23600219/

Could you all please test them?

Thanks,
Giuseppe

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :
Revision history for this message
MarkA (deeptht69-4) wrote :

Giuseppe:

I'm not in my office today, but if I have time tomorrow I'll install the new packages and let you know if anything has changed. It may take a week or more to determine if the segfaults are gone. Thanks for all your effort!

Mark

Revision history for this message
Deeptht69 (deeptht69) wrote :

I installed the new packages yesterday. So far, no segfaults. Will keep watching and waiting....

Revision history for this message
yaztromo (tromo) wrote :

Packages installed, Giuseppe.

Will report back midweek, it would have definitely segfaulted by then.

Revision history for this message
yaztromo (tromo) wrote :

Sorry, but this one is segfaulting already. Falling back to hardy packages again.

Mar 19 13:21:17 Shula kernel: [249718.200222] faxgetty[27670]: segfault at a3c ip 0805baf0 sp bfc3c180 error 4 in faxgetty[8048000+6f000]
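
The fields of such a kernel line can be decoded with a few lines of shell. This is only a sketch with made-up helper names; the bit meanings are those of the x86 page-fault error code (bit 0: protection fault versus page not present, bit 1: write versus read, bit 2: user versus kernel mode).

```shell
#!/bin/sh
# decode_segv_error CODE: explain the "error N" value of a segfault line.
decode_segv_error() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then cause="protection fault"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    echo "$cause, $access, $mode"
}

# ip_offset IP BASE: offset of the faulting instruction inside the mapping,
# both given as hex digits without the 0x prefix.
ip_offset() {
    printf '%x\n' $(( 0x$1 - 0x$2 ))
}

decode_segv_error 4            # prints: page not present, read, user mode
ip_offset 0805baf0 8048000     # prints: 13af0
```

A user-mode read at the very low address a3c is what one would expect from reading a member through a null object pointer, which fits the this=0x0 seen in the gdb backtrace earlier in this report.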

Sorry I can't be of more help, but it's hard to debug while at work and on a live box.

Hope the rest of you have more luck!

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :
Revision history for this message
yaztromo (tromo) wrote :

Thanks Giuseppe. The debs I used are still sat on the server. Here are the md5's. They appear to check out against the ones in your bug directory, but I would be grateful if you could check I'm not going crazy.

MD5(hylafax-client_6.0.5-5_i386.deb)= 6c03cb2126ef70ea5953f22f768fc39a

MD5(hylafax-server_6.0.5-5_i386.deb)= 972a5acc059441351f700a4377f36a8c
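
A check like this can also be scripted. The sketch below assumes md5sum from GNU coreutils is available; check_md5 is an illustrative helper name.

```shell
#!/bin/sh
# check_md5 FILE EXPECTED: succeed if FILE's MD5 digest equals EXPECTED.
check_md5() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Example with a digest posted above (run in the download directory):
# check_md5 hylafax-server_6.0.5-5_i386.deb 972a5acc059441351f700a4377f36a8c && echo OK
```

The function returns a plain exit status, so it can be chained with && or used in an if statement.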

I have the rest of the week working at base, so I could reinstall them and be able to catch any segfault without too much disruption. Should I install the packages marked "dbg" instead?

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Yaztromo,
the md5 signatures look correct. I added a README.md5 file in that directory for future checks.
The "dbg" files only contain debugging symbols. They are handy for using gdb on hylafax executables.

I wonder if you have the same problem I am trying to solve, or if this is a different problem. Could you use the gdb-wrapper script and send here the backtrace?

Thanks,
Giuseppe

Revision history for this message
Deeptht69 (deeptht69) wrote :

I am half way through our "busy day" at the office, using the "-O" hylafax-*_6.0.5-5_i386.deb packages (same md5sums as listed in message #84). So far today, we have received over 25 faxes, and no segfaults since the packages were installed last week. This is a longer time/more faxes without a segfault than I ever got with the original packages. W00t!

Revision history for this message
yaztromo (tromo) wrote :

Giuseppe, I have attempted to run faxgetty via gdb with these steps:

- Install all packages from bug directory including dbg packages.

- Download gdb wrapper and add your fixes + suggested lines.

- From a local tty run "./gdb-warpper.sh /usr/sbin/faxgetty ttyS0"

However after reception of first fax gdb disappears from process list and I return to command prompt. I still have a faxgetty running according to "ps ax" though:
17622 tty1 S 0:00 /usr/sbin/faxgetty ttyS0

Is all this behaviour correct? Also if faxgetty segfaults where will I find the dumps/.core files?

Thanks for all your time on this Giuseppe.

Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Yes, it should be correct. You do have to manually kill the remaining faxgetty and run gdb-wrapper again (every time it completes and you want to collect more information with a new run).
You should also find a log file in /tmp with all required information.

Bye,
Giuseppe

Revision history for this message
yaztromo (tromo) wrote :

Okay, got you. I wrote a small script to respawn gdb every time a fax is received; this can also mitigate an unexpected segfault.

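#!/bin/bash
# Restart faxgetty under gdb whenever no gdb process is running.
cd /root || exit 1
while true; do
    # The [g] keeps grep from matching its own command line; the quotes
    # keep the shell from expanding the pattern as a file name.
    if ! ps ax | grep '[g]db' > /dev/null; then
        echo "Could not detect gdb, starting a new process"
        killall faxgetty > /dev/null 2>&1
        ./gdb-wrapper.sh /usr/sbin/faxgetty /dev/ttyS0
    fi
    sleep 10
done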

Now just to wait.

Revision history for this message
Simon G. Stikkelorum (iah) wrote :

Now we wait...

http://www.youtube.com/watch?v=lfa8fC93Pds

(I am Dutch)

Revision history for this message
yaztromo (tromo) wrote :

Picked up a segfault!

Revision history for this message
yaztromo (tromo) wrote :
Revision history for this message
Giuseppe Sacco (eppesuig) wrote :

Hi Yaztromo,
the trace shows exactly the same problem I am referring to, so I think this is not solved at all. I still think the problem is in g++, so I rebuilt all packages without any optimization. They are available at:
http://eppesuigoccas.homedns.org/~giuseppe/debian/hylafax/lucid-i386/bug%23600219-no-optimization/

Could you please use them and test it again?

Thanks,
Giuseppe

yaztromo (tromo) wrote :

Installed and running :) Let's see!

Just for reference my CPU specs and kernel in case they are the problem.

Linux Shula 2.6.32-39-generic-pae #86-Ubuntu SMP Mon Feb 13 23:05:11 UTC 2012 i686 GNU/Linux

processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 3
model name : AMD Duron(tm) Processor
stepping : 1
cpu MHz : 796.431
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow up
bogomips : 1592.86
clflush size : 32
cache_alignment : 32
address sizes : 36 bits physical, 32 bits virtual

MarkA (deeptht69-4) wrote :

Giuseppe:

Just to confirm, I am coming up on one week since installing the "-O" compiled packages with no segfaults so far. Could this bug be depending on the CPU type? My system is also an AMD processor, but a more recent one. (Athlon, maybe?)

Anyway, I'm hoping I've seen my last segfault.

Mark

Simon G. Stikkelorum (iah) wrote :

Giuseppe,

I have been digging into the 6.0.5 code using the input Yaztromo provided with the segfault he logged. I would like to make two statements and hear your view on them:

1. I found that the actual code on that famous line 950 is just "if (log) {". This "log" could be a handle to a log file, and yes, it may be NULL; in line 700, for example, it is set to NULL. Could it simply be that log is set to NULL and that this makes line 950 do nasty things?

2. If it were a compiler error, you would expect it to happen every time with the same fax partner, since that partner will use the same protocol. To my mind the problem occurs more randomly. It seems more of a "once per fifty faxes" thing, not a structural compiler blunder.

Looking forward to your angle on this.

Simon.

Giuseppe Sacco (eppesuig) wrote :

Hi Simon,
I will try to answer your questions.

1. The problem with log. The error is due to "this" being NULL. Basically, in C++, a member variable does not live at a fixed address: you need a base pointer for your instance (called "this"), and the variable is located at an offset from that "this" address. Think of "this" as a pointer to a complex data structure, with "log" as one field inside it; once you have "this", you can address "log". Since "this" is NULL, the computed address is wrong and you get a segmentation violation. So the value of "log" is never even read, and it may or may not be NULL.

2. About the compiler error. "this" is not an explicit variable that you change in your code; you cannot change it once the instance is created. You may create instances of an object and you may delete them, and that is all. This is why user code cannot change "this", and this is why it may be a compiler error. I have no explanation for why it happens randomly, even when communicating with the same fax partner.

Of course I might be wrong, but this is my opinion so far.

Bye,
Giuseppe

yaztromo (tromo) wrote :

I was thinking that maybe the crash is only triggered once in every x faxes because the badly compiled code is rarely reached. Possibly in this case it is "Adjusting for RTC (real time clock?) found at row" in "recvPageDLEData". Maybe this does not happen often, and when it does, "this" is corrupted and we get a segfault.

It would look like a compiler bug, since other distros do not appear to be suffering from it. Although I do not know how much hylafax is used in the wild these days. Many people now use hylafax+ too.

Of course I am speaking as a total layman, but just looking at the gdb log against the source code, it is hard to see why the source would corrupt "this" pointer in the run up to the segfault.

I am suspicious that it is only one sender that is causing the segfault for me. The sender is using a USR modem with a very old version of VSI-FAX on SCO UNIX. Both USR modems and VSI fax are known to be problematic. That sender has always been troublesome, one in every 5 or so transmissions has always failed without segfault.

MarkA (deeptht69-4) wrote :

Adding my two cents.....

As yaztromo says, segfaults seem to be more likely when the fax is originating from a particular sender. However, for me, when the "problem sender" would try to re-send the same fax, it could segfault in a different place each time. This would suggest that it is nothing in the data stream itself that triggers the error, but something in the control signals being exchanged perhaps? The people in the sending office told me that they have "a lot of problems with that machine (a Konica/Minolta 2900)", but I don't know exactly what that means. What office worker doesn't like to complain about their equipment? I looked up the brochure for the 2900, and it looks like a quality device, and not that old. Perhaps the control/status signals are causing an unexpected condition, or triggering the execution of badly-optimized code that clobbers the "this" pointer?

Simon G. Stikkelorum (iah) wrote :

Hi All,

Thank you all for your input. I'll add a bit more detail: RTC means Return To Control. What I know about faxes is that they start up and negotiate how they are going to do "it", so the exact protocol, page size, and transfer speed are agreed. This happens over a bi-directional communication at 1200 or 2400 Baud. Then the actual transfer is done, say at 9600 Baud, which was blistering fast in those days, but at that moment the transfer is one-way only. So the sender pumps the data and the receiver simply has to chew it. Back then telephone connections were slightly more troublesome. That is why bad lines are counted.
At the end of a page the fax goes back to command mode and negotiates further with the sender. Eventually another page is sent, and after the last page the whole thing is concluded. Again in command mode.
The switchback at the end of the page is triggered by a particular sequence of data sent in high speed transfer. Once this RTC code is seen, the "offending code" is executed.

So, that leads me to the insight that this code is not seldom executed. At least, it is used for each fax that is sent in a particular mode. And that is why I have trouble believing it is a compiler problem. I certainly agree with Giuseppe that the NULL value for "this" is wrong. But on the other hand we also get the remark "No such file or directory." And I cannot see how "if (log) {" can result in this additional "No such file or directory." I'll do some more digging in the code tonight.

Bye for now,
Simon.

Deeptht69 (deeptht69) wrote :

I have reached one week since installing the "-O" packages. 77 faxes have been received during that time, with no segfaults. The OTHER problem I was having (the computer occasionally freezing until the mouse is moved) also seems to be fixed. I hope everything continues to work when I install Ubuntu 12.04 LTS next month!

Deeptht69 (deeptht69) wrote :

Simon:

I'm not a compiler expert, but it seems to me that "the offending code" being executed is only a problem when some other process has previously clobbered the "this" pointer. The offending code itself is probably fine, but somewhere, perhaps in some data buffer management process elsewhere, the address of the current "ModemServer" instance is getting replaced by "0". Once that happens, any attempt to access a member of ModemServer will cause a segfault. The problem isn't at line 950 (if (log) { ), it's somewhere before that. Line 950 is just the first time the program has a chance to fail because of the error.

Mark

Giuseppe Sacco (eppesuig) wrote :

Hi Mark,
this is exactly what the HylaFAX author wrote in http://bugs.hylafax.org/show_bug.cgi?id=941#c9

Bye,
Giuseppe

Deeptht69 (deeptht69) wrote :

I was just browsing the source code for ModemServer.c++, and I notice that ModemServer::vtraceStatus is called by ModemServer::traceStatus, which has, as one of its arguments, a variable length format string. It makes me wonder if the format string trailing NULL is what's clobbering the "this" pointer when it makes the call to vtraceStatus? Just a thought.

Mark

Simon G. Stikkelorum (iah) wrote :

Hi Mark, hi Giuseppe,

Yes, sure, we are all on the same page here. And Mark, in my mind there is no doubt that "if (log) {" is not causing the problem. And yes, the "this" pointer is damaged in some other way. BUT if this were due to a compiler issue, I have trouble accepting that it happens only once every 50 faxes. So, what I am trying to say is: could it not be a combination of a programming error (buffer overrun), malicious input (damaged or ill-formatted input data) and the optimisation used by the compiler (a different -O option produces different code that may or may not fail)?

So, I think that with more analysis we may learn more. We may be able to corner the problem further if we could describe more exactly when this happens. Is it, for example, only in multi-page faxes? Or is it only after the last page?
That was also why I noted that this code runs often. Every fax received performs an RTC at least once. So, I tried to ask: what is the extraordinary (or additional) circumstance, on top of doing this RTC logging, that causes the failure? Sorry for not having been clearer before.

At the same time I still do not understand where the "no such file or directory" comes from. What is that trying to tell us?

Regards, Simon.

Giuseppe Sacco (eppesuig) wrote :

Hi Simon,
I think the "No such file" message comes from gdb, alerting the user that gdb cannot find the source file.

Bye,
Giuseppe

Deeptht69 (deeptht69) wrote :

Simon:

Good points. As I said earlier, back when I was having segfaults, these patterns emerged: 1) segfaults were very common when receiving from one particular sender, 2) when a segfault occurred from the problem sender, the sender would automatically try to send the fax again. It would often segfault again, but on a different page, 3) I have had segfaults occur during receipt of the first page, or during receipt of later pages in a multi-page fax.

Note that I'm looking at a fairly small sample size: most of the segfaults I analyzed were on a single day. Being in a live office setting, if too many segfaults were occurring, my office staff would hook up the dedicated fax machine, and that was the end of data collection (as well as hylafax fax reception).

If a dedicated fax machine, like the Minolta 2900, tries to re-send a fax that failed because the receiver (hylafax) crashed during the transfer, would it not send *exactly* the same data as the first time? That would make me think that it is not related to the actual data being sent, but definitely has something to do with the control signals.

Is there any way to monitor the actual data transfer process, to see exactly what is going on just before a segfault?

Mark

yaztromo (tromo) wrote :

I'm going to tentatively say that the no optimisations package is bug free.

Two days running and no segfaults. Normally I would have one or two by now.

Deeptht69 (deeptht69) wrote :

yaztromo:

Glad to hear your setup is working. I am now at 8 days of continuous running without a segfault, using the "-O" compiled packages. I wonder why the "-O" compiled packages didn't work for you? Hardware difference? (My fax server uses an AMD Athlon CPU). Anyway, I hope this is the end of this bug, and that it doesn't re-surface when Ubuntu 12.04 LTS comes out next month!

Mark

yaztromo (tromo) wrote :

Hi Deep,

I've been wondering the same too, and the only difference I can find is that Athlon supports SSE and Duron doesn't. Does -O enable SSE support? I don't know anything about gcc but would've thought enabling SSE would need a separate flag.

A mystery!

Deeptht69 (deeptht69) wrote :

BAD NEWS!!!

After going for 8 days without a problem, I just had 3 segfaults back-to-back! All occurred before the originating phone number was logged, so I can't tell if they were all from the same sender. The last fax received before the segfaults was from the sender that was causing problems before, but the most recent fax completed OK.

I will, over the weekend, install the completely un-optimized packages, and see if they can run fault free.

Mark

Giuseppe Sacco (eppesuig) wrote :

Hi all,
I changed the source code to create a new logfile in /tmp. This file records the value of "this" at many points in the function "recvPageDLEData". Could you please test it and check the new logfile?

Packages are available at http://eppesuigoccas.homedns.org/~giuseppe/debian/hylafax/lucid-i386/bug%23600219-extralogging/ .

Please note that the package version number changes from 6.0.5-5 to 6.0.5-5.1.

Thanks,
Giuseppe

yaztromo (tromo) wrote :

Ok Giuseppe, will do.

Are these packages unoptimised? No -O?

Do I need to run with gdb to get the log in tmp?

Welcome aboard the unoptimised train Mark! So far so good here.

Giuseppe Sacco (eppesuig) wrote :

Hi Yaztromo,
the latest package has standard optimization options (should be -O2). The source code has been changed in order to create a logfile for each recvPageDLEData() call. There is no need to run gdb, and I do not have any estimate on how many log files will be written.

Bye,
Giuseppe

Giuseppe Sacco (eppesuig) wrote : g++ bug on Lucid that prevents hylafax from working correctly

Hi all,
this is a request for help with fixing a bug in the g++ compiler shipped
with Lucid.

The problem is shown in bug https://bugs.launchpad.net/bugs/600219 and
it has also been briefly reported against g++ as launchpad bug #955013. A
quick summary of the g++ relevant part is also on hylafax bugzilla:
http://bugs.hylafax.org/show_bug.cgi?id=941#c9 .

What happens is that a pointer to an object instance, "this", is trashed
during a method execution.

Is there anyone who can help fix this?

Thank you very much,
Giuseppe

yaztromo (tromo) wrote :

Well, I already got a segfault with the new packages. However, it doesn't look like the log recorded the final value of "this" before the crash.

Simon G. Stikkelorum (iah) wrote :

Giuseppe,

Did you flush the log (fflush) every time you log something? Writing to a file is heavily asynchronous.

Simon

Giuseppe Sacco (eppesuig) wrote :

Hi Simon,
I hope it is line-buffered output. I would like to be sure that the logs hit the disk before the process crashes. Is it creating too much disk I/O? I thought about putting it in a tmpfs, but that would have meant you setting up a file system for it.

Bye,
Giuseppe

Deeptht69 (deeptht69) wrote :

Any new developments with this bug? I have not (yet) installed the "enhanced logging" packages, as it seemed from Yaztromo's message that they may not be providing the required information. As before, I am running the non-optimized packages, and have not yet had another segfault. If it will help the effort, I could install the enhanced logging packages, which will presumably be more likely to segfault, and send along whatever logs they generate.

Mark

yaztromo (tromo) wrote :

I left the logging going for a few more days. But even though I had several segfaults, the value of 'this' never showed anything different. So I then reverted to the completely unoptimised package, and have not had any segfaults since.

Can we cautiously say the fix has been found then? If the answer is 'yes', what happens next?

Also, if you discover why the logging doesn't catch the moment 'this' gets trashed, Giuseppe, I will install any new attempt at logging what's happening.

Deeptht69 (deeptht69) wrote :

Yaztromo:

It looks as though nobody has seen any segfaults with the completely un-optimized packages? I am coming up on a week of running un-optimized packages without any segfaults, though it took 8 days to catch one when I was running the "-O" packages. I will have to run unoptimized right up until I upgrade to 12.04 LTS in another month, before considering it error-free. Then, we can start all over again!

Mark

Giuseppe Sacco (eppesuig) wrote :

Hi,
I am still waiting for a reply on the Ubuntu developer list about the g++ bug. Today I sent a second message to the list. If anyone replies, I'll update this bug report.

Anyway, I do not expect the problem to be present in any other ubuntu version since the compiler has been fixed in later versions.

Bye,
Giuseppe

MarkA (deeptht69-4) wrote :

Giuseppe:

Do we know that this bug is no longer present in the g++ compiler that will ship with 12.04? If that's the case, I'll just wait it out, running the un-optimized packages, until 12.04 is released. There's no point in chasing a bug that's already been squashed!

Mark

Giuseppe Sacco (eppesuig) wrote :

Hi Mark,
this is a guess: there are no later Ubuntu releases reporting this error. And no Debian user reported such a problem.

But 10.04 is LTS, so it should be supported for many years. That's why I really do not understand why the Ubuntu developers ignore this problem. I know HylaFAX is not included in this support, but g++ is.

Bye,
Giuseppe

Deeptht69 (deeptht69) wrote :

Giuseppe:

I see your point. My hylafax server is running on a computer dedicated to that function. I am looking forward to upgrading to 12.04 LTS specifically because of this bug.

Mark

Matthias Klose (doko) wrote :

I would suggest fixing this by building the problematic file with -O0. Can you provide such a patch for hylafax?

Matthias Klose (doko) wrote :

see as well bug 955013

Changed in hylafax:
status: Confirmed → Fix Released
Giuseppe Sacco (eppesuig) wrote :

Hi Matthias,
I can certainly provide a patch. Would you sponsor the upload?

Thanks,
Giuseppe

yaztromo (tromo) wrote :

Could someone explain what -O0 does? From the man pages:

"Reduce compilation time and make debugging produce the expected results. This is the default. "

So is that same as not using -O at all? Or is it the same as just -O, or something entirely different?

Giuseppe Sacco (eppesuig) wrote :

Hi,
I believe that -O0 means "no optimization at all".

As you may see at the very beginning of http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html, "Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results". So "-O0" is equivalent to "without optimization option".

Bye,
Giuseppe

Changed in hylafax (Debian):
status: Incomplete → Fix Released