snap builds randomly failing via proxy

Bug #2057771 reported by Alan Pope 🍺🐧🐱 🦄
This bug affects 1 person
Affects: Launchpad itself
Status: Confirmed
Importance: High
Assigned to: Simone Pelosi

Bug Description

I'm trying to build amd64, armhf, and arm64 snaps of forgejo with snapcraft remote-build, using the yaml at https://codeberg.org/forgejo-contrib/snap.

It kept failing with errors connecting to the npm registry.

So I added the following to the override-build stanza in the yaml:

+
+ # Setup proxy access if required.
+ if [[ -n "${http_proxy:-}" ]]; then
+   export HTTP_PROXY="${http_proxy}"
+   export HTTPS_PROXY="${https_proxy}"
+   npm config set proxy "${http_proxy}"
+   npm config set https-proxy "${https_proxy}"
+ fi
+

However, it's now failing inconsistently. Sometimes it will work on one architecture and fail on another or fail on all three. It's seemingly random.

Here's the log of it failing most recently.

https://launchpadlibrarian.net/719122167/buildlog_snap_ubuntu_jammy_armhf_snapcraft-forgejo-1ad726ec7fd27ef347bb79e64066f66c_BUILDING.txt.gz

Here's a snippet:

```
[13/Mar/2024:11:25:02 +0000] "CONNECT registry.npmjs.org:443 HTTP/1.1" 200 10142 "-" "npm/10.2.4 node/v18.19.1 linux arm workspaces/false"
[13/Mar/2024:11:25:02 +0000] "CONNECT registry.npmjs.org:443 HTTP/1.1" 200 6375 "-" "npm/10.2.4 node/v18.19.1 linux arm workspaces/false"
[13/Mar/2024:11:25:02 +0000] "CONNECT registry.npmjs.org:443 HTTP/1.1" 200 6164 "-" "npm/10.2.4 node/v18.19.1 linux arm workspaces/false"
[13/Mar/2024:11:25:02 +0000] "CONNECT registry.npmjs.org:443 HTTP/1.1" 200 45089 "-" "npm/10.2.4 node/v18.19.1 linux arm workspaces/false"
:: npm ERR! Proxy connection ended before receiving CONNECT response
::
:: npm ERR! A complete log of this run can be found in: /root/.npm/_logs/2024-03-13T11_24_42_677Z-debug-0.log
:: make[1]: *** [Makefile:949: node_modules] Error 1
:: make: *** [Makefile:989: public/assets/js/index.js] Error 2
[13/Mar/2024:11:25:02 +0000] "CONNECT registry.npmjs.org:443 HTTP/1.1" 200 71514 "-" "npm/10.2.4 node/v18.19.1 linux arm workspaces/false"
[13/Mar/2024:11:25:02 +0000] "CONNECT registry.npmjs.org:443 HTTP/1.1" 200 51562 "-" "npm/10.2.4 node/v18.19.1 linux arm workspaces/false"
[13/Mar/2024:11:25:02 +0000] "CONNECT registry.npmjs.org:443 HTTP/1.1" 200 34537 "-" "npm/10.2.4 node/v18.19.1 linux arm workspaces/false"
'override-build' in part 'forgejo' failed with code 2.
Review the scriptlet and make sure it's correct.
```

description: updated
Jürgen Gmach (jugmac00)
affects: rutabaga → launchpad
Jürgen Gmach (jugmac00) wrote :

Hi Alan,

Thanks for reaching out. We have had a couple of similar reports lately, but those involved builds that ran for more than 6 hours, after which the proxy token expired. That does not seem to be the case here.

Do you see the same issues when building the Snap locally?

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

Hi. Thanks for the quick response.

Works fine every time locally. It only fails when doing a remote build in Launchpad. The fact that it's inconsistent is weird too. I ran the same remote-build three times in a row and got different results each time. One time amd64 worked while armhf and arm64 failed, another time I got an arm64 build, and another run got me the armhf build. I don't seem to be able to get all three to work together.

Clinton Fung (clinton-fung) wrote :

(Only mildly) epic necro here, but I was just investigating another instance of this error occurring, and I think I've found a lead. It seems that the buildd process has a `ulimit -n` of 1024. In the snap build I investigated, I noticed that npm can easily open > 1000 files in a short space of time, and minor variations in load of the proxy, load of the upstream server, etc, can easily keep files open longer than usual, leading to this issue.
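To make the descriptor limit mentioned above easier to inspect from inside a build script, here is a minimal sketch using the shell's `ulimit` builtin. The function names `report_fd_limits` and `raise_fd_soft_limit` are hypothetical helpers, not part of snapcraft or Launchpad, and raising the soft limit only helps if the hard limit in the build environment is higher than 1024, which is an assumption that has not been verified here.

```shell
# Print the soft and hard limits on open file descriptors for this shell.
report_fd_limits() {
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

# Raise the soft limit to the hard limit. This needs no extra privileges,
# but it cannot go beyond whatever hard limit the environment imposes.
raise_fd_soft_limit() {
    local hard
    hard="$(ulimit -Hn)"
    if [ "$hard" = "unlimited" ]; then
        # Setting an unlimited descriptor count is typically rejected by
        # the kernel, so leave the soft limit unchanged in that case.
        return 0
    fi
    ulimit -Sn "$hard"
}
```

Calling `report_fd_limits` before and after `raise_fd_soft_limit` at the top of an override-build scriptlet would show whether the 1024 limit is in effect and whether it can be lifted from within the build.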

When this problem occurs, npm seems to recover in certain circumstances but not others. I haven't figured that out yet, but I suspect that to npm, some requests are considered un-retryable. When this happens, you see the

```
:: npm ERR! Proxy connection ended before receiving CONNECT response
```

error in the npm logs.
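As a possible stopgap while the root cause is investigated, a build script could retry the failing command a few times. The sketch below is a generic retry wrapper. The name `retry` and its argument order are hypothetical, and whether a retry actually gets past this particular proxy error is an assumption, not something confirmed in this report.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on success, otherwise the exit status of the last attempt.
retry() {
    local max_attempts="$1" delay="$2" attempt=1 status=0
    shift 2
    while :; do
        # Using "&&" keeps a failing attempt from aborting a script
        # that runs under "set -e".
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after ${attempt} attempts: $*" >&2
            return "$status"
        fi
        echo "retry: attempt ${attempt} failed (exit ${status}); retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry 3 10 npm install` would run `npm install` up to three times with a 10 second pause between attempts. This only masks the intermittent failure; it does not address the descriptor limit itself.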

This problem is fairly easily (though not very reliably) reproducible in our pre-production environment, so I hope to identify and implement a fix soon.

Jürgen Gmach (jugmac00) wrote :

@Alan Could you please retry the build? There has been some work to increase the limit.

Changed in launchpad:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Simone Pelosi (pelpsi)