Containers stuck at started status

Bug #2032172 reported by Ashish sawarkar
This bug affects 2 people
Affects: Anbox Cloud
Status: Fix Released
Importance: High
Assigned to: Gary.Wang
Milestone: 1.19.1

Bug Description

Hello Team,

We have run into a sudden issue where Anbox containers are not functioning as expected. Our setup deploys Anbox Cloud on AWS using the marketplace AMI ami-0b8ae058a54209687. Are there any known problems associated with this particular setup?

Attached is the log file.

Simon Fels (morphis) wrote :

Hey Ashish,

Can you describe a bit further what you mean by "are not functioning as expected"?

Can you provide us with the output of

$ sudo anbox-cloud-appliance.buginfo

Thanks!

Changed in anbox-cloud:
status: New → Incomplete
assignee: nobody → Simon Fels (morphis)
Ashish sawarkar (ash-anbox) wrote :

Hi Simon,

Please find the attached bug report file.

Gary.Wang (gary-wzl77) wrote :

Hey Ashish
  Thanks for the attached log.
  - Regarding the issue `Containers stuck at started status` you mentioned above: do you mean that whenever you launch a new container, it gets stuck at the *started* status and never progresses to the *running* status?

  - If possible:
    1. Please enable debug logging for ams:
      $ /snap/anbox-cloud-appliance/current/bin/juju config -m appliance:anbox-cloud ams log_level=debug

    2. Then launch a new container as you did previously.
    3. Once the container has been stuck at the started status for a while, run the following command and share the output with us:
      $ /snap/anbox-cloud-appliance/current/bin/juju ssh -m appliance:anbox-cloud ams/0 "sudo snap logs -n=all ams | grep <new_container_id>"

       Please also dump the details of the lxd0 node:
      $ /snap/anbox-cloud-appliance/current/bin/juju ssh -m appliance:anbox-cloud ams/0 "sudo ETCDCTL_API=3 /snap/etcd/current/bin/etcdctl --debug --cert=/var/snap/ams/common/etcd/client-cert.pem --key=/var/snap/ams/common/etcd/client-key.pem --insecure-transport=true --endpoints=240.12.250.85:2379 get /ams/1.0/nodes/lxd0"
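       (For reference, once the logs are collected, the debug level can presumably be reverted with the same config command; `info` as the target level is an assumption and is not confirmed in this thread.)
      $ /snap/anbox-cloud-appliance/current/bin/juju config -m appliance:anbox-cloud ams log_level=info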

Thanks
Gary

Ashish sawarkar (ash-anbox) wrote :

Hi Gary,

When we start launching containers, they initially start and run properly with the running status.
After some time, we observed that containers get stuck at the started status, then change to the running status, and then go to error.

I have attached screenshots for reference. Below are the details you asked for.

  1. Enable debug logging for ams = done.
  2. Output after a container got stuck at the started status for a long time:

ubuntu@ip-172-31-15-176:~$ amc ls
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| ID                   | APPLICATION | TYPE    | STATUS  | TAGS | NODE | ADDRESS       | ENDPOINTS |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl2u41jpke7jvo4tug | app         | regular | stopped |      | lxd0 | 192.168.96.1  |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl32s1jpke7jvo4tvg | app         | regular | stopped |      | lxd0 | 192.168.96.2  |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl33s1jpke7jvo4u0g | app         | regular | stopped |      | lxd0 | 192.168.96.3  |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl34s1jpke7jvo4u1g | app         | regular | stopped |      | lxd0 | 192.168.96.4  |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl35s1jpke7jvo4u2g | app         | regular | stopped |      | lxd0 | 192.168.96.5  |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl37s1jpke7jvo4u3g | app         | regular | stopped |      | lxd0 | 192.168.96.6  |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl3941jpke7jvo4u4g | app         | regular | stopped |      | lxd0 | 192.168.96.7  |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl39s1jpke7jvo4u5g | app         | regular | stopped |      | lxd0 | 192.168.96.8  |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl3as1jpke7jvo4u6g | app         | regular | stopped |      | lxd0 | 192.168.96.9  |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl3c41jpke7jvo4u7g | app         | regular | stopped |      | lxd0 | 192.168.96.10 |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl3dc1jpke7jvo4u8g | app         | regular | error   |      | lxd0 | 192.168.96.11 |           |
+----------------------+-------------+---------+---------+------+------+---------------+-----------+
| cjhl3ec1jpke7jvo4u9g | app         | regular | error   |      | lxd0 | 192.168.96.12 | ...

Ashish sawarkar (ash-anbox) wrote :

In the attached screenshot you can see that initially the containers were running fine, and then suddenly we started getting these errors.

Ashish sawarkar (ash-anbox) wrote :

Hi Gary,

Updated logs.

ubuntu@ip-172-31-15-176:~$ /snap/anbox-cloud-appliance/current/bin/juju ssh -m appliance:anbox-cloud ams/0 "sudo snap logs -n=all ams | grep cjhlobc1jpk5c0216ea0"
2023-08-21T12:44:29Z ams.ams[14142]: I0821 12:44:29.348722 14142 orchestrator.go:245] Orchestrator: Received update for container cjhlobc1jpk5c0216ea0 (status created desired running)
2023-08-21T12:44:29Z ams.ams[14142]: I0821 12:44:29.350827 14142 worker.go:451] Worker: Found new regular container cjhlobc1jpk5c0216ea0 to launch
2023-08-21T12:44:29Z ams.ams[14142]: I0821 12:44:29.354174 14142 worker.go:324] Worker: Scheduled container cjhlobc1jpk5c0216ea0 onto node lxd0
2023-08-21T12:44:29Z ams.ams[14142]: I0821 12:44:29.355536 14142 orchestrator.go:245] Orchestrator: Received update for container cjhlobc1jpk5c0216ea0 (status prepared desired running)
2023-08-21T12:44:29Z ams.ams[14142]: I0821 12:44:29.371411 14142 container.go:218] Launcher: Processing task cjhlobc1jpk5c0216eag for container cjhlobc1jpk5c0216ea0
2023-08-21T12:44:29Z ams.ams[14142]: I0821 12:44:29.408677 14142 orchestrator.go:245] Orchestrator: Received update for container cjhlobc1jpk5c0216ea0 (status prepared desired running)
2023-08-21T12:44:30Z ams.ams[14142]: I0821 12:44:30.297790 14142 container.go:478] Launcher: Container cjhlobc1jpk5c0216ea0 is now fully initialized
2023-08-21T12:44:30Z ams.ams[14142]: I0821 12:44:30.297823 14142 container.go:494] Launcher: Doing actual start of container cjhlobc1jpk5c0216ea0
2023-08-21T12:44:30Z ams.ams[14142]: I0821 12:44:30.297910 14142 container.go:186] Launcher: Waiting for container cjhlobc1jpk5c0216ea0 to switch to running status
2023-08-21T12:44:30Z ams.ams[14142]: I0821 12:44:30.297935 14142 orchestrator.go:245] Orchestrator: Received update for container cjhlobc1jpk5c0216ea0 (status started desired running)
2023-08-21T12:44:33Z ams.ams[14142]: I0821 12:44:33.156648 14142 container.go:502] Launcher: Successfully started container cjhlobc1jpk5c0216ea0
2023-08-21T12:54:41Z ams.ams[14142]: I0821 12:54:41.624321 14142 container.go:656] Backend: Got status error from container cjhlobc1jpk5c0216ea0
2023-08-21T12:54:41Z ams.ams[14142]: E0821 12:54:41.625582 14142 container.go:718] Backend: Container cjhlobc1jpk5c0216ea0 reported an error: service exited with status 0
2023-08-21T12:54:43Z ams.ams[14142]: I0821 12:54:43.790562 14142 housekeeper.go:170] Housekeeper: Fetching anbox logs directory from container cjhlobc1jpk5c0216ea0
2023-08-21T12:54:44Z ams.ams[14142]: I0821 12:54:44.887032 14142 housekeeper.go:381] Housekeeper: Updated task cjhlobc1jpk5c0216eag for object cjhlobc1jpk5c0216ea0
2023-08-21T12:54:44Z ams.ams[14142]: I0821 12:54:44.887186 14142 container.go:201] Launcher: Status of container cjhlobc1jpk5c0216ea0 was updated to error
2023-08-21T12:54:44Z ams.ams[14142]: W0821 12:54:44.887261 14142 trace.go:83] Trace[2116291573]: "Launching container cjhlobc1jpk5c0216ea0" (started: 2023-08-21 12:44:29.371415267 +0000 UTC m=+13.176393515) (total time: 10m15.515796499s):
2023-08-21T12:54:44Z ams.ams[14142]: W0821 12:54:44.887273 14142 trace.go:83] Trace[2116291573]: [1.866466ms] [1.866466ms] Found applicatio...


Simon Fels (morphis) wrote :

Can you also provide us with the system.log and android.log* files from the failed container?

You can extract them via

$ amc show-log <container id> system.log
$ amc show-log <container id> android.log
$ amc show-log <container id> android.log.1

Ashish sawarkar (ash-anbox) wrote :

Hi Simon,

I have attached all three files as requested.

Simon Fels (morphis) wrote :

Thanks Ashish! We will have a look and let you know what we find.

Gary.Wang (gary-wzl77) wrote :

Hey Ashish
  I can confirm that the problem occurs on ami-0b8ae058a54209687 (Anbox Cloud 1.18.2, arm64) after the Ubuntu kernel rolled from 5.15.0-1031-aws to 6.2.0-1009-aws in your case. The issue is that after the kernel upgrade, userfaultfd syscalls are disallowed for unprivileged users.
  It is also specific to the Android 13 image. We are going to fix it in the next patch release (1.19.1).

  As an immediate step, there are two options:
  a) If you still want to use applications built on top of the Android 13 image, you have to downgrade the kernel to 5.13.
     To downgrade the kernel, please refer to the following post [1].
     After downgrading the kernel and rebooting the VM, rebuild the anbox dkms modules:
      $ sudo dpkg-reconfigure anbox-modules-dkms-118
      $ sudo modprobe virt_wifi
      $ sudo modprobe anbox_sync
  b) Rebuild the application on top of the Android 12 image. This lets you run applications on the latest rolling kernel (6.2.0-1009-aws) without downgrading the kernel version.

  With this, containers should work as normal.
  Could you please give it a try?

Thanks.
Gary

[1] https://discourse.ubuntu.com/t/how-to-downgrade-the-kernel-on-ubuntu-20-04-to-the-5-4-lts-version/26459
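
For reference, one way to check whether the running kernel restricts userfaultfd for unprivileged users is to read the vm.unprivileged_userfaultfd sysctl. This is only a diagnostic sketch: the exact meaning of the value differs between kernel versions, and changing it is not the fix recommended above.

$ sysctl vm.unprivileged_userfaultfd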

Simon Fels (morphis)
Changed in anbox-cloud:
assignee: Simon Fels (morphis) → Gary.Wang (gary-wzl77)
milestone: none → 1.19.1
importance: Undecided → High
status: Incomplete → Triaged
Gary.Wang (gary-wzl77)
Changed in anbox-cloud:
status: Triaged → In Progress
Simon Fels (morphis)
Changed in anbox-cloud:
status: In Progress → Fix Committed
Ashish sawarkar (ash-anbox) wrote :

Hello Team,

Is this bug fixed? Can we use the latest AMI now?

Simon Fels (morphis) wrote :

Hey Ashish,

The fix hasn't been released yet; it has just been committed to our internal repositories. It will roll out with the 1.19.1 release in mid-September. See https://anbox-cloud.io/docs/ref/roadmap for more details.

Also, which AMI you use doesn't matter much, as the snaps are upgraded independently and are not pinned per AMI. You can run `snap refresh anbox-cloud-appliance` on any AMI to get the latest version. See also https://snapcraft.io/docs/keeping-snaps-up-to-date
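
For completeness, a quick way to check which appliance revision and channel are currently installed before and after refreshing (a small sketch using standard snap commands; the grep pattern is just a convenience):

$ snap info anbox-cloud-appliance | grep -E 'tracking|installed'
$ sudo snap refresh anbox-cloud-appliance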

Gary.Wang (gary-wzl77) wrote :

Just a heads up.
This issue has been fixed in the Anbox Cloud 1.19.1 release.
Please check the release announcement[1] for details.

Thanks
Gary

[1] https://discourse.ubuntu.com/t/anbox-cloud-1-19-1-has-been-released/38595

Changed in anbox-cloud:
status: Fix Committed → Fix Released