Temporal Fails to Start on MAAS 3.5 (Fresh Install on Rocky Linux 8)

Bug #2083710 reported by Thiago Martins
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MAAS | Invalid | Undecided | Unassigned | |

Bug Description

**Description**:

I encountered an issue with MAAS 3.5 where the **Temporal** process fails to start after a fresh installation on Oracle Linux 9.4, rendering MAAS unusable as the images do not sync.

This bug does not occur with MAAS 3.4, where the installation and setup work smoothly.

**Steps to reproduce**:

1. Remove MAAS:

```bash
snap remove --purge maas
```

2. Recreate PostgreSQL database using this script:

```bash
#!/bin/bash
set -euo pipefail  # stop at the first failing step

# Variables - Update these with your actual values
MAAS_DBUSER="maasdbuser" # Replace with your MAAS DB user
MAAS_DBPASS="maasdbpass" # Replace with your MAAS DB password
MAAS_DBNAME="maasdbname" # Replace with your MAAS DB name
HOSTNAME="localhost" # Assuming localhost, change if needed

# Step 1: Drop the MAAS database if it exists
echo "Dropping the database if it exists..."
sudo -i -u postgres psql -c "DROP DATABASE IF EXISTS \"$MAAS_DBNAME\";"

# Step 2: Drop the MAAS database user if it exists
echo "Dropping the database user if it exists..."
sudo -i -u postgres psql -c "DROP USER IF EXISTS \"$MAAS_DBUSER\";"

# Step 3: Create PostgreSQL user
echo "Creating the database user..."
sudo -i -u postgres psql -c "CREATE USER \"$MAAS_DBUSER\" WITH ENCRYPTED PASSWORD '$MAAS_DBPASS';"

# Step 4: Create the MAAS database
echo "Creating the database..."
sudo -i -u postgres createdb -O "$MAAS_DBUSER" "$MAAS_DBNAME"

# Step 5: Change the owner of the MAAS database to the MAAS_DBUSER (redundant after createdb -O, kept as a safeguard)
echo "Changing the database owner to '$MAAS_DBUSER'..."
sudo -i -u postgres psql -c "ALTER DATABASE \"$MAAS_DBNAME\" OWNER TO \"$MAAS_DBUSER\";"

echo "PostgreSQL setup for MAAS is complete."
```
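The script pastes `$MAAS_DBPASS` directly into a single-quoted SQL string literal, so a real password containing a `'` would break the `CREATE USER` statement. A minimal sketch of escaping the value first by doubling single quotes (standard SQL string-literal escaping); the helper name `sql_quote` and the example password are hypothetical:

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: escape a value for use inside a single-quoted SQL
# string literal by doubling every single quote.
sql_quote() {
  local q="'"
  printf '%s' "${1//$q/$q$q}"
}

MAAS_DBPASS="maas'pass"   # example password containing a single quote
SAFE_PASS="$(sql_quote "$MAAS_DBPASS")"
echo "$SAFE_PASS"          # prints: maas''pass

# Step 3 of the setup script would then use the escaped value, e.g.:
# sudo -i -u postgres psql -c "CREATE USER \"$MAAS_DBUSER\" WITH ENCRYPTED PASSWORD '$SAFE_PASS';"
```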

3. Install MAAS 3.5:

```bash
snap install --channel=3.5/stable maas
```

4. Initialize MAAS:

```bash
maas init region --database-uri "${database_uri}" --maas-url "http://${maasfqdn}:5240/MAAS"
```
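If the `maas init` command is typed into a shell (rather than rendered by a template engine that substitutes `${...}` first), single quotes pass the literal text `${database_uri}` to MAAS instead of the variable's value. A minimal sketch of building both values with double quotes, assuming the `postgres://user:password@host/dbname` URI form and reusing the placeholder credentials from the database script; the FQDN `maas.example.com` is hypothetical, and the last line only echoes the command (a dry run):

```bash
#!/bin/bash
set -euo pipefail

# Same placeholder values as the database script; replace with your own.
MAAS_DBUSER="maasdbuser"
MAAS_DBPASS="maasdbpass"
MAAS_DBNAME="maasdbname"
HOSTNAME="localhost"
maasfqdn="maas.example.com"   # hypothetical FQDN for illustration

# Double quotes let the shell expand the variables.
database_uri="postgres://${MAAS_DBUSER}:${MAAS_DBPASS}@${HOSTNAME}/${MAAS_DBNAME}"
maas_url="http://${maasfqdn}:5240/MAAS"

# Single quotes keep the literal text; double quotes expand it.
printf 'single-quoted: %s\n' '${database_uri}'
printf 'double-quoted: %s\n' "${database_uri}"

# Dry run: print the command instead of executing it (drop "echo" to run it).
echo maas init region --database-uri "$database_uri" --maas-url "$maas_url"
```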

Upon accessing the UI, MAAS loads, but the image syncing process never begins, and the Temporal process fails to start. The following logs were observed:

```
# journalctl -n 500 -f -u snap.maas.pebble -t maas-temporal
Oct 04 15:36:43 maas-regiond-1 maas-temporal[15144]: Unable to start server. Error: failed to start service worker: context deadline exceeded
Oct 04 15:36:44 maas-regiond-1 maas-temporal[15234]: 2024/10/04 15:36:44 Loading config; env=production,zone=,configDir=/var/snap/maas/36889/temporal
Oct 04 15:36:44 maas-regiond-1 maas-temporal[15234]: 2024/10/04 15:36:44 Loading config files=[/var/snap/maas/36889/temporal/production.yaml]
Oct 04 15:36:49 maas-regiond-1 maas-temporal[15234]: {"level":"warn","ts":"2024-10-04T15:36:49.575+0200","msg":"error creating sdk client","service":"worker","error":"failed reaching server: context deadline exceeded","logging-call-at":"factory.go:114"}
Oct 04 15:36:54 maas-regiond-1 maas-temporal[15234]: {"level":"warn","ts":"2024-10-04T15:36:54.767+0200","msg":"error creating sdk client","service":"worker","error":"failed reaching server: context deadline exceeded","logging-call-at":"factory.go:114"}
Oct 04 15:37:00 maas-regiond-1 maas-temporal[15234]: {"level":"warn","ts":"2024-10-04T15:37:00.114+0200","msg":"error creating sdk client","service":"worker","error":"failed reaching server: context deadline exceeded","logging-call-at":"factory.go:114"}
Oct 04 15:37:05 maas-regiond-1 maas-temporal[15234]: {"level":"warn","ts":"2024-10-04T15:37:05.869+0200","msg":"error creating sdk client","service":"worker","error":"failed reaching server: context deadline exceeded","logging-call-at":"factory.go:114"}
Oct 04 15:37:12 maas-regiond-1 maas-temporal[15234]: {"level":"warn","ts":"2024-10-04T15:37:12.409+0200","msg":"error creating sdk client","service":"worker","error":"failed reaching server: context deadline exceeded","logging-call-at":"factory.go:114"}
Oct 04 15:37:20 maas-regiond-1 maas-temporal[15234]: {"level":"warn","ts":"2024-10-04T15:37:20.277+0200","msg":"error creating sdk client","service":"worker","error":"failed reaching server: context deadline exceeded","logging-call-at":"factory.go:114"}
Oct 04 15:37:30 maas-regiond-1 maas-temporal[15234]: {"level":"warn","ts":"2024-10-04T15:37:30.186+0200","msg":"error creating sdk client","service":"worker","error":"failed reaching server: context deadline exceeded","logging-call-at":"factory.go:114"}
Oct 04 15:37:39 maas-regiond-1 maas-temporal[15234]: {"level":"warn","ts":"2024-10-04T15:37:39.205+0200","msg":"error creating sdk client","service":"worker","error":"failed reaching server: context deadline exceeded","logging-call-at":"factory.go:114"}
Oct 04 15:37:44 maas-regiond-1 maas-temporal[15234]: {"level":"error","ts":"2024-10-04T15:37:44.522+0200","msg":"start failed","component":"fx","error":"context deadline exceeded","logging-call-at":"fx.go:1163","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/common/log/zap_logger.go:156\ngo.temporal.io/server/temporal.(*fxLogAdapter).LogEvent\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/temporal/fx.go:1163\ngo.uber.org/fx.(*App).Start.func1\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:639\ngo.uber.org/fx.(*App).Start\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:647\ngo.temporal.io/server/temporal.(*ServerImpl).startServices.func1\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/temporal/server_impl.go:142"}
Oct 04 15:37:44 maas-regiond-1 maas-temporal[15234]: {"level":"error","ts":"2024-10-04T15:37:44.523+0200","msg":"OnStart hook failed","component":"fx","callee":"go.temporal.io/server/temporal.(*ServerImpl).Start-fm()","caller":"go.temporal.io/server/temporal.ServerLifetimeHooks","error":"failed to start service worker: context deadline exceeded","logging-call-at":"fx.go:1055","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/common/log/zap_logger.go:156\ngo.temporal.io/server/temporal.(*fxLogAdapter).LogEvent\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/temporal/fx.go:1055\ngo.uber.org/fx.appLogger.LogEvent\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:810\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).runStartHook.func1\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/internal/lifecycle/lifecycle.go:247\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).runStartHook\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/internal/lifecycle/lifecycle.go:257\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).Start\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/internal/lifecycle/lifecycle.go:216\ngo.uber.org/fx.(*App).start.func1\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:679\ngo.uber.org/fx.(*App).withRollback\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:661\ngo.uber.org/fx.(*App).start\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:678\ngo.uber.org/fx.withTimeout.func1\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:782"}
Oct 04 15:37:44 maas-regiond-1 maas-temporal[15234]: {"level":"error","ts":"2024-10-04T15:37:44.523+0200","msg":"start failed, rolling back","component":"fx","error":"failed to start service worker: context deadline exceeded","logging-call-at":"fx.go:1156","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/common/log/zap_logger.go:156\ngo.temporal.io/server/temporal.(*fxLogAdapter).LogEvent\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/temporal/fx.go:1156\ngo.uber.org/fx.(*App).withRollback\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:662\ngo.uber.org/fx.(*App).start\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:678\ngo.uber.org/fx.withTimeout.func1\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:782"}
Oct 04 15:37:44 maas-regiond-1 maas-temporal[15234]: {"level":"error","ts":"2024-10-04T15:37:44.523+0200","msg":"start failed","component":"fx","error":"failed to start service worker: context deadline exceeded","logging-call-at":"fx.go:1163","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/common/log/zap_logger.go:156\ngo.temporal.io/server/temporal.(*fxLogAdapter).LogEvent\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/temporal/fx.go:1163\ngo.uber.org/fx.(*App).Start.func1\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:639\ngo.uber.org/fx.(*App).Start\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/go.uber.org/fx/app.go:647\ngo.temporal.io/server/temporal.(*ServerFx).Start\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/temporal/fx.go:299\nmain.buildCLI.func2\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/cmd/server/main.go:204\ngithub.com/urfave/cli/v2.(*Command).Run\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/github.com/urfave/cli/v2/command.go:163\ngithub.com/urfave/cli/v2.(*App).RunContext\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/github.com/urfave/cli/v2/app.go:313\ngithub.com/urfave/cli/v2.(*App).Run\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/vendor/github.com/urfave/cli/v2/app.go:224\nmain.main\n\t/build/temporal-8dYnj9/temporal-1.22.5/src/cmd/server/main.go:55\nruntime.main\n\t/usr/lib/go-1.20/src/runtime/proc.go:250"}

```

**Workaround attempts**:
I tried applying the workaround mentioned in bug #2067117, but I could not find the `maasserver_secret` table.

This issue occurs consistently with a fresh installation of MAAS 3.5, not as an upgrade from MAAS 3.4.

Thiago Martins (martinx)
description: updated
Jacopo Rota (r00ta) wrote :

Does it work on an Ubuntu host? Oracle Linux is not officially supported.

Changed in maas:
status: New → Invalid
status: Invalid → Incomplete
Thiago Martins (martinx)
summary: - Temporal Fails to Start on MAAS 3.5 (Fresh Install on Oracle Linux 9.4)
+ Temporal Fails to Start on MAAS 3.5 (Fresh Install on Rocky Linux 8)
Thiago Martins (martinx) wrote :

Hello r00ta!

I've spent several hours trying to resolve this issue. I'm not well-versed in the RHEL/Snap ecosystem, and I mistakenly assumed that Oracle Linux would be supported. Following your feedback, I downgraded to Rocky Linux 8, since it is listed as supported: [Snapcraft Documentation](https://snapcraft.io/docs/installing-snapd).

I expected the MAAS 3.5 snap to work smoothly on Rocky 8 ([Snap on Rocky](https://snapcraft.io/docs/installing-snap-on-rocky)), but it still fails to start, just like on Oracle Linux.

Initially, I thought the root cause might be related to this error:

```
2024-10-06T17:07:39.564416+02:00 maas-regiond-1 maas.pebble[1214]: 2024-10-06T15:07:39.564Z [pebble] Service "http" starting: sh -c "exec systemd-cat -t maas-http $SNAP/bin/run-nginx"
2024-10-06T17:07:39.613135+02:00 maas-regiond-1 maas-http[1940]: nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (2: No such file or directory)
2024-10-06T17:07:39.857461+02:00 maas-regiond-1 maas-http[1947]: 2024/10/06 17:07:39 [crit] 1947#1947: *1 connect() to unix:/var/snap/maas/36889/maas-regiond-webapp.sock.0 failed (2: No such file or directory) while connecting to upstream, client: 10.50.36.71, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/36889/maas-regiond-webapp.sock.0:/MAAS/rpc/", host: "10.50.36.71"
```

After a fresh start (removing everything and creating `/var/log/nginx` manually before running `snap install maas`), I solved the initial `nginx` issue. However, the Temporal services and image sync continue to fail.

Tested versions:

- The MAAS 3.4 snap works flawlessly on Ubuntu 24.04 and Rocky 8.
- The MAAS 3.5 snap works fine on Ubuntu but fails on Rocky 8.
- MAAS 3.5 fails on the same machine that runs MAAS 3.4 without problems, so it's not a resource issue.

Here’s a detailed log of the failure after systemd startup for MAAS 3.5 on Rocky 8: [Full Log](https://pastebin.ubuntu.com/p/Sj9M2NwS6K/)
And the specific errors: [Errors](https://pastebin.ubuntu.com/p/DkZGnYsKz9/)

The most notable errors are the repeated "no live upstreams" messages from `nginx` and the `DatabaseError: DatabaseWrapper objects created in a thread can only be used in that same thread` seen in the logs. Temporal also consistently reports "context deadline exceeded" errors, indicating either a service communication issue or a timeout.

Do you have any suggestions or workarounds? I'm running out of ideas here, and I would love to make MAAS work on an OS where snaps are also supported, such as Rocky 8.

Thanks again for your help!

Changed in maas:
status: Incomplete → New
Jacopo Rota (r00ta) wrote (last edit ):

Thanks for the check! The MAAS snap is supported (and tested/certified) only on Ubuntu, hence I have to mark this bug as invalid.

Changed in maas:
status: New → Invalid
Thiago Martins (martinx) wrote :

Hi r00ta,

I hope this message finds you well. I’m not looking to re-open this bug report, but I wanted to provide a follow-up regarding my experience.

I managed to get MAAS 3.5 (via Snap) working on Oracle 9, but only by using LXD Ubuntu containers. Here's a summary of what I found:

What doesn’t work:

Host: Rocky 8 or Oracle 9 → snapd → MAAS 3.5 = fail

What does work:

Host: Rocky 8 or Oracle 9 → snapd → LXD → container (Ubuntu 22.04) → snapd → MAAS 3.5 = works

However, this setup feels unnecessarily convoluted just to run MAAS. Having two Snap instances in this configuration adds complexity, and while I appreciate both Snap and LXD for many use cases, this approach isn’t ideal for MAAS.
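The working layering described above (host snapd, then LXD, then an Ubuntu 22.04 container with its own snapd, then MAAS 3.5) can be sketched as a dry-run script. This is a minimal sketch, assuming it is run as root (as the `#` prompts in the report suggest) with `/snap/bin` on `PATH`; the container name `maas-ubuntu`, the `run` wrapper, and the `DRY_RUN`/`PLAN` variables are hypothetical, and container networking and storage are left out:

```bash
#!/bin/bash
set -euo pipefail

# DRY_RUN=1 (the default) only prints each step; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"
CONTAINER="maas-ubuntu"   # hypothetical container name
PLAN=()                   # records every step, for review or testing

run() {
  PLAN+=("$*")
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Outer layer: snapd on the Rocky 8 / Oracle 9 host installs LXD.
run snap install lxd
run lxd init --auto

# Inner layer: an Ubuntu 22.04 container with its own snapd runs MAAS 3.5.
run lxc launch ubuntu:22.04 "$CONTAINER"
run lxc exec "$CONTAINER" -- snap wait system seed.loaded
run lxc exec "$CONTAINER" -- snap install --channel=3.5/stable maas
```

`maas init` would then be run inside the container (for example through `lxc exec`), as in step 4 of the original report.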

Currently, I’m experimenting with simpler Ubuntu containers on Oracle 9 (and CentOS 9) using the MAAS Debian packages, and so far, it’s been a far smoother experience. I plan to share more details in a post on the MAAS Discourse forum soon.

Here’s the crux of my feedback: Snap has not lived up to its promise of enabling packages to run anywhere Snapd is “supported.” In practice, its stability seems confined to the "Ubuntu bubble," as you've pointed out.

This experience highlights a need for greater transparency from Canonical and the Snap team. I strongly recommend creating and maintaining a comprehensive compatibility matrix for Snap packages. This matrix should clearly indicate which Snap packages work on which hosts, making limitations explicit. For example, the fact that MAAS Snap only works reliably on Ubuntu hosts must be prominently documented and crystal clear to everyone.

At the moment, this critical information is absent, which is misleading to users and wastes valuable time and resources. I’ve noticed others expressing similar frustrations across various forums, which underscores the need for clear communication and documentation.

I sincerely hope Canonical will NEVER discontinue the MAAS Debian packages! Without them, I wouldn’t be able to recommend or deploy MAAS in the environment I’m currently working in—an infrastructure managing around 40,000 servers and continuing to grow.

Believe me, I’ve invested considerable effort over several days trying to make Snap work, but the instability outside of Ubuntu environments is a serious limitation. For larger-scale adoption, it’s crucial to have a robust, reliable alternative to Snap.

Thank you for taking the time to read this. I hope my experience adds value to the ongoing discussions about MAAS’s future and Snap’s role in it.

Cheers!

Jacopo Rota (r00ta) wrote :

Thiago, thank you very much for taking the time to provide your valuable feedback!

I agree with you that if the MAAS snap has some limitations, we should document them. As I said, we certify the MAAS snap only on Ubuntu, and we put effort into making it work well there.

If you or anybody from the community is willing to investigate the problems with the MAAS snap on other distros and contribute fixes to make it work there, I'd be super happy to review and help with the patches (provided the problem is actually in MAAS, since some issues might be in snapd).

Jacopo Rota (r00ta)
tags: added: bug-council
Jacopo Rota (r00ta)
tags: removed: bug-council