Group chat sometimes misses messages with Unknown Number Message not found

Bug #1532899 reported by Pat McGowan
30
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Canonical System Image
Confirmed
High
Pat McGowan
messaging-app (Ubuntu)
Invalid
Undecided
Unassigned
nuntium (Ubuntu)
Confirmed
Critical
Alfonso Sanchez-Beato
ubuntu-download-manager (Ubuntu)
Confirmed
Critical
Michael Sheldon

Bug Description

This happened the last two times I received group chat messages. Several messages were not listed in the chat but rather under an Unknown Number contact with content Message no found.

I see 4 such entries for the time covered in the log files.

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Logs from the day it last occured

description: updated
Changed in canonical-devices-system-image:
assignee: nobody → Bill Filler (bfiller)
importance: Undecided → Critical
milestone: none → backlog
status: New → Confirmed
Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

nuntium logs

Revision history for this message
Tiago Salem Herrmann (tiagosh) wrote :

Nuntium is signaling MessageAdded without "Sender", and the attachment file contains "message not found".
In the nuntium log we can find the following message:

2016/01/11 14:16:18 Cannot upload m-notifyresp.ind encoded file /home/phablet/.cache/nuntium/store/a53fcb01bc22bbfaa2812221d3596d0f.m-notifyresp.ind to message center: NETWORK ERROR

Bill Filler (bfiller)
Changed in messaging-app (Ubuntu):
status: New → Invalid
Changed in nuntium (Ubuntu):
assignee: nobody → Alfonso Sanchez-Beato (alfonsosanchezbeato)
importance: Undecided → High
Revision history for this message
Bill Filler (bfiller) wrote :
Download full text (3.8 KiB)

This is very easy to reproduce for me, on At+t network. Not sure if it's an operator issue, nuntium issue, or udm issue at this point. Here are the steps to easily reproduce it for me:

1) from another phone (phone A), create a Group Chat message to your ubuntu phone and one other phone
2) send 4-5 messages in rapid succession (i.e. "one", "two", "three", "four", "five") from phone A
3) on ubuntu phone, verify that all messages are receivied, otherwise you'll see a message from "Unknown Number" with body "message not found"
Repeat steps 1-2 until you receive the "Unknown Number/message not found" message in messaging app.
For me this happens usually after 2 or 3 cycles

Based on my logs, it looks like download mgr downloads the messages, and nuntium tries to upload a reply (assuming it's an ack reply). This fails according to the nuntium log:

2016/01/11 16:59:59 Starting download of http://166.216.166.67:8006/1113152245600010000030000 with proxy 172.26.39.1:80
2016/01/11 17:00:00 Progress:60 60
2016/01/11 17:00:00 Progress:60 60
2016/01/11 17:00:00 File downloaded to /home/phablet/.local/share/ubuntu-download-manager/Downloads/1113152245600010000030000
2016/01/11 17:00:00 Created /home/phablet/.cache/nuntium/store/9d454e8dff12878113d91725fa5fb3f6.m-notifyresp.ind to handle m-notifyresp.ind for 9d454e8dff12878113d91725fa5fb3f6
2016/01/11 17:00:00 Starting upload of /home/phablet/.cache/nuntium/store/9d454e8dff12878113d91725fa5fb3f6.m-notifyresp.ind to http://mmsc.mobile.att.net with proxy 172.26.39.1:80
2016/01/11 17:00:00 Progress:8 8
2016/01/11 17:00:02 Cannot upload m-notifyresp.ind encoded file /home/phablet/.cache/nuntium/store/9d454e8dff12878113d91725fa5fb3f6.m-notifyresp.ind to message center: NETWORK ERROR

In the corresonding download manager log (using tail -f .cache/ubuntu-download-manager/*), it appears the initial download was correct and I don't see any failures:

//FAILED MMS Message
I0111 16:59:59.504356 1808 manager.cpp:224] Create MMS download == {url: http://166.216.166.67:8006/1113152245600010000030000 hostname: 172.26.39.1 port:80}
I0111 16:59:59.505923 1808 manager.cpp:131] Owner is: :1.383
I0111 16:59:59.506072 1808 factory.cpp:70] Factory assigns the Download Uuid.
I0111 16:59:59.507314 1808 apparmor.cpp:116] UNCONFINED APP
I0111 16:59:59.507707 1808 apparmor.cpp:185] Local path is /home/phablet/.local/share/ubuntu-download-manager/Downloads
I0111 16:59:59.510921 1808 manager.cpp:131] Owner is: :1.383
I0111 16:59:59.513485 1808 downloads_db.cpp:366] Insert download
I0111 16:59:59.531574 1808 manager.cpp:119] void Ubuntu::DownloadManager::Daemon::DownloadManager::onDownloadsChanged(QString) /com/canonical/applications/download/ddc0ffa352844cecba4ece5133af86fb
I0111 16:59:59.542951 1808 downloads_db.cpp:363] Update download
I0111 16:59:59.592730 1808 file_download.cpp:355] Download ID{ ddc0ffa352844cecba4ece5133af86fb } Performing a network download.
I0111 16:59:59.593823 1808 file_download.cpp:362] Download ID{ ddc0ffa352844cecba4ece5133af86fb } EMIT started(true)
I0111 17:00:00.499265 1808 file_download.cpp:1089] Download ID{ ddc0ffa352844cecba4ece5133af86fb } Rename ' /home/phablet/.local/share/ubunt...

Read more...

Revision history for this message
Bill Filler (bfiller) wrote :
Revision history for this message
Bill Filler (bfiller) wrote :

Actually see this error in .cache/ubuntu-upload-manager/ubuntu-upload-manager.INFO

I0111 17:00:00.558549 20468 manager.cpp:134] Create MMS upload == {url: http://mmsc.mobile.att.
net filePath: /home/phablet/.cache/nuntium/store/9d454e8dff12878113d91725fa5fb3f6.m-notifyresp
.ind hostname: 172.26.39.1 port:80}
I0111 17:00:00.560053 20468 manager.cpp:116] Owner is: :1.384
I0111 17:00:00.560163 20468 factory.cpp:61] Factory assigns the Download Uuid.
I0111 17:00:00.560889 20468 apparmor.cpp:116] UNCONFINED APP
I0111 17:00:00.561141 20468 apparmor.cpp:185] Local path is /home/phablet/.local/share/ubuntu-u
pload-manager/Downloads
I0111 17:00:00.561915 20468 manager.cpp:213] Registering upload to path /com/canonical/applica
tions/upload/2a6705fefb8144fa910b1ffe5b70de71
I0111 17:00:00.561980 20468 manager.cpp:207] void Ubuntu::UploadManager::Daemon::UploadManager:
:onUploadsChanged(QString) /com/canonical/applications/upload/2a6705fefb8144fa910b1ffe5b70de71
I0111 17:00:00.562012 20468 base_daemon.cpp:162] Timer must be stopped because we have 1downloa
ds.
I0111 17:00:00.574678 20468 file_upload.cpp:249] Upload ID{ 2a6705fefb8144fa910b1ffe5b70de71 }
EMIT started(true)
E0111 17:00:02.321601 20468 file_upload.cpp:345] Upload ID{ 2a6705fefb8144fa910b1ffe5b70de71 }
 http://mmsc.mobile.att.net ERROR::Network error ProtocolInvalidOperationError: the requested o
peration is invalid for this protocol

And this in .cache/ubuntu-upload-manager/ubuntu-upload-manager.ERROR

Log file created at: 2016/01/11 17:00:02
Running on machine: ubuntu-phablet
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0111 17:00:02.321601 20468 file_upload.cpp:345] Upload ID{ 2a6705fefb8144fa910b1ffe5b70de71 }
 http://mmsc.mobile.att.net ERROR::Network error ProtocolInvalidOperationError: the requested o
peration is invalid for this protocol

Changed in ubuntu-download-manager (Ubuntu):
assignee: nobody → Michael Sheldon (michael-sheldon)
importance: Undecided → High
Revision history for this message
Bill Filler (bfiller) wrote :

So somehow our response is getting rejected, possibly because it's malformed?

If you look at nuntium log for successful uploads, you always see the message "Using deprecated Start header" before it sends the response. But in the case where it fails, we never see this message.

Revision history for this message
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

nuntium has not changed in latest two OTAs. Maybe udm has changed the code returned when performing a valid upload?

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

There has been 2 uploads of u-d-m within last month with the following changes:
 ubuntu-download-manager (1.2+15.04.20151223.1-0ubuntu1) vivid; urgency=medium
 .
   * Recalculate the total download size after a redirection (LP:
     #1450144)
 .
 ubuntu-download-manager (1.2+16.04.20151216.2-0ubuntu1) xenial; urgency=medium
 .
   [ Michael Sheldon ]
   * Provide a shared download history for the downloads model that presents
     all uncollected downloads from previous app launches.
   * Update symbol files
 .

Revision history for this message
Bill Filler (bfiller) wrote : Re: [Bug 1532899] Re: Group chat sometimes misses messages with Unknown Number Message not found

This bug has likely been around for a long time. We noticed it some months ago even though it wasn't filed. Very much doubt it has anything to do with recent udm changes

> On Jan 12, 2016, at 2:56 AM, Alfonso Sanchez-Beato <email address hidden> wrote:
>
> nuntium has not changed in latest two OTAs. Maybe udm has changed the
> code returned when performing a valid upload?
>
> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/1532899
>
> Title:
> Group chat sometimes misses messages with Unknown Number Message not
> found
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/canonical-devices-system-image/+bug/1532899/+subscriptions

Revision history for this message
Bill Filler (bfiller) wrote :

Wondering if it could be a problem trying to upload while another download
is in progress.
I do see this in the upload log:
base_daemon.cpp:162] Timer must be stopped because we have 1downloa
ds.

On Tue, Jan 12, 2016 at 8:03 AM, Bill Filler <email address hidden>
wrote:

> This bug has likely been around for a long time. We noticed it some months
> ago even though it wasn't filed. Very much doubt it has anything to do with
> recent udm changes
>
> > On Jan 12, 2016, at 2:56 AM, Alfonso Sanchez-Beato <
> <email address hidden>> wrote:
> >
> > nuntium has not changed in latest two OTAs. Maybe udm has changed the
> > code returned when performing a valid upload?
> >
> > --
> > You received this bug notification because you are a bug assignee.
> > https://bugs.launchpad.net/bugs/1532899
> >
> > Title:
> > Group chat sometimes misses messages with Unknown Number Message not
> > found
> >
> > To manage notifications about this bug go to:
> >
> https://bugs.launchpad.net/canonical-devices-system-image/+bug/1532899/+subscriptions
>

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

I see errors dating back to 09/19/2015 on 5 different group chats.
I have logs back to June, interestingly no earlier errors but that could be coincidence.
I believe a recent change has made this more obvious now that we report it as Unknown, whereas before it was silently missed.

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

If the download succeeds and its just the upload of an ack that fails, why not ignore the upload error and deliver the message properly, at least for a near term workaround?

Bill Filler (bfiller)
Changed in nuntium (Ubuntu):
importance: High → Critical
Changed in ubuntu-download-manager (Ubuntu):
importance: High → Critical
Changed in canonical-devices-system-image:
milestone: backlog → ww08-2016
Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Looking at the nuntium logs the failures always have the same progress count, be interesting to see what was in those files
Successful downloads have larger files (> x00) with an upload response of size 37

2016/01/08 22:13:35 Starting download of http://166.216.166.67:8010/Y/0811214218600023000030000 with proxy 66.209.11.32:80
2016/01/08 22:13:39 Progress:60 60
2016/01/08 22:13:40 Progress:60 60
2016/01/08 22:13:40 File downloaded to /home/phablet/.local/share/ubuntu-download-manager/Downloads/0811214218600023000030000
2016/01/08 22:13:40 Created /home/phablet/.cache/nuntium/store/f917fd2205cc69ad699ddff0b25f4a15.m-notifyresp.ind to handle m-notifyresp.ind for f917fd2205cc69ad699ddff0b25f4a15
2016/01/08 22:13:40 Starting upload of /home/phablet/.cache/nuntium/store/f917fd2205cc69ad699ddff0b25f4a15.m-notifyresp.ind to http://mmsc.cingular.com/ with proxy 66.209.11.32:80
2016/01/08 22:13:40 Progress:8 8
2016/01/08 22:13:41 Cannot upload m-notifyresp.ind encoded file /home/phablet/.cache/nuntium/store/f917fd2205cc69ad699ddff0b25f4a15.m-notifyresp.ind to message center: NETWORK ERROR

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Missed 4 of 17 messages yesterday
missed 1 of 2 today

The nuntium log looks about the same.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nuntium (Ubuntu):
status: New → Confirmed
Changed in ubuntu-download-manager (Ubuntu):
status: New → Confirmed
Revision history for this message
Bill Filler (bfiller) wrote :

still debugging without success, most likely will not make ota10

Changed in canonical-devices-system-image:
milestone: ww08-2016 → 11
Bill Filler (bfiller)
Changed in canonical-devices-system-image:
assignee: Bill Filler (bfiller) → John McAleely (john.mcaleely)
Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Other symptoms

No message is received at all
A message shows up days later seemingly triggered by new messages being sent to a group and not necessarily the same group

Revision history for this message
James Dearing (dearingj) wrote :

I'm experiencing this bug too. My phone is a BQ Aquaris E5 HD Ubuntu Edition. My phone service is Consumer Cellular, which I think uses the AT&T network. I'd gladly post any log files if I knew how to access them.

Changed in canonical-devices-system-image:
milestone: 11 → 12
Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Testing from today, 1 out of 10 failed on the download with unknown server error, when retrying the particular download from the command line it reports Bad gateway. Querying the other messages on all other gateways simply reports the message is not found, which is correct as the ack was delivered for them or that message was not assigned to that server address.

The only message using that server address was the failed one.

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Another symptom. The network interface on the device was busy at the time two MMS were sent to it. Nuntium did not make any attempt to retrieve them. After the network interface was not busy, a third message was sent and all 3 messages were then received. (about 15 mins later)

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Another symptom. If cellular data is off at the time an MMS is sent, nuntium will not be notified and attempt to get the message. Receiving a subsequent SMS message seems enough to trigger the pull however.

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Changing priority as not used in markets served

Changed in canonical-devices-system-image:
assignee: John McAleely (john.mcaleely) → Pat McGowan (pat-mcgowan)
importance: Critical → High
milestone: 12 → backlog
Revision history for this message
Jason (gerlowskija) wrote :

I'm seeing this frequently.

I'm using the Aquarius E5, and am on AT&T's 2G network in the USA (if that matters).

I'm happy to pull in logs from my phone if that would be helpful, though it seems that the bugs reproducible enough that logs are easy to get. Let me know.

If there's anything I can do to help investigate this or get it fixed, please let me know, as it's making it very hard to use messaging on Ubuntu Phone.

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

I built a new nuntium using golang v1.6 and the issue seems to have disappeared, so perhaps a bug in Go not nuntium. We build with 1.3 in the overlay which is 2 years and 1000s of fixes old.

Changed in canonical-devices-system-image:
status: Confirmed → In Progress
milestone: backlog → 13
Revision history for this message
Jason (gerlowskija) wrote :

Hey Pat,

I'm happy to try this out on my own phone as an additional data point if you think getting someone else to try out the fix might be helpful. Though I'll have to dig around a bit to figure out how to build and swap in a modified nuntium, as I'll admit I'm not familiar with that component.

Let me know if you'd like me to try reproducing the fix. It'd be nice to get a fix prior to end of August, so I'm def willing to give it a shot.

Revision history for this message
Jason (gerlowskija) wrote :

I took a stab at reproducing the fix Pat mentioned above on my Aquarius E5. Long story short, it didn't work for me. Listing the steps below. If anyone sees anything obviously amiss in my fix-attempt, please let me know.

Steps I Took:
1.) Enable developer mode on my phone.
2.) Download nuntium from the github repo located here onto my laptop: https://github.com/ubuntu-phonedations/nuntium I noticed there was also a bzr repo located here (bzr branch lp:nuntium), but the code there hadn't been touched since 2014. Not sure if I chose the right one.
3.) Install "go version go1.6.2 linux/amd64" on my desktop.
4.) Use uname -a to check the architecture on my phone (it's reported as armv7l)
4.) cd nuntium/cmd/nuntium.
5.) Cross-compile nuntium on my desktop, for my phone: GOOS=linux GOARCH=arm go build. The architecture I specified with go build here isn't exactly the same as what was specified by uname above (arm vs armv7l). Not sure if that matters.
6.) Use adb shell, and sudo mount -o remount,rw / so that I can manipulate files on the phone filesystem.
7.) On the phone, move the existing nuntium binary in /usr/bin to /usr/bin/nuntium.bak
8.) Copy in the newly compiled nuntium (nuntium/cmd/nuntium/nuntium) from my desktop to my phone (/usr/bin/nuntium).
9.) Check that permissions and ownership matches up between the new and the old nuntium.
10.) Exit adb, turn my phone off and on again.
11.) Get some friends to start a group message. Still see missing/dropped messages.

Are there any other binaries I should be copying over? Did I grab the right project source? Are there any config files I needed to have tweaked before building with golang 1.6.2?

Going to try looking into the logs, but would love a sanity check on my fix-attempt in the meantime if anyone can set me straight.

Changed in canonical-devices-system-image:
status: In Progress → Confirmed
Changed in canonical-devices-system-image:
milestone: 13 → backlog
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.