Juju is unreliable on Joyent because of undeleted firewalls
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Canonical Juju | Won't Fix | High | Unassigned | |
juju-core | Won't Fix | Critical | Unassigned | |
1.25 | Won't Fix | Critical | Unassigned | |
Bug Description
CI has observed several cases where Juju on Joyent is reliable, but reliability and repeatability decline after a week. Manually deleting firewalls after we tear down the env appears to restore reliability.
Joyent firewalls were introduced after we created the joyent-provider. Since the last upgrade to the Joyent cloud, the feature is always on. We saw Joyent become the most reliable cloud to deploy on. We also noted that we were getting successes when we had mixed networks. Then, over a week, we saw that service machines could not contact the state server to download agents. We could also see that all machines were on the same 72.* or 165.* network. Our own personal experience with Joyent showed it was still very reliable.
We discovered that in just a week, CI had added 1000 firewall rules. We deleted all the rules, and Joyent was better for a time. We then added a rule to several CI jobs to delete firewalls after destroy-
If Joyent continues to be happy without an accumulation of firewall rules, I think we need to update juju to always delete firewalls when destroying an env to ensure repeatability.
Changed in juju-core: | |
milestone: | 1.25-alpha1 → 1.25-beta1 |
Changed in juju-core: | |
milestone: | 1.25-beta1 → 1.25-beta2 |
Changed in juju-core: | |
milestone: | 1.25-beta2 → 1.26-alpha1 |
no longer affects: | juju-core/1.24 |
no longer affects: | juju-core/1.22 |
Changed in juju-core: | |
milestone: | 1.26-alpha1 → 1.26-alpha2 |
Changed in juju-core: | |
milestone: | 1.26-alpha2 → 1.26-beta1 |
description: | updated |
Changed in juju-core: | |
milestone: | 1.26-beta1 → 2.0-alpha2 |
Changed in juju-core: | |
milestone: | 2.0-alpha2 → 2.0-alpha3 |
Changed in juju-core: | |
milestone: | 2.0-alpha3 → 2.0-beta4 |
tags: | added: ci |
tags: | added: jujuqa |
Changed in juju-core: | |
milestone: | 2.0-beta4 → 2.0.1 |
summary: |
- Juju is unreliable on Joyent
+ Juju is unreliable on Joyent because of undeleted firewalls |
affects: | juju-core → juju |
Changed in juju: | |
milestone: | 2.0.1 → none |
milestone: | none → 2.0.1 |
Changed in juju-core: | |
importance: | Undecided → Critical |
status: | New → Won't Fix |
Changed in juju: | |
milestone: | 2.0.1 → none |
We have enough evidence to conclude that deleting firewalls makes juju reliable:
This Week:
95% success rate for series and bundles
precise is 100% reliable
The 10% bundle failures were caused by charms
No machine agents failed to download
No actions were taken to prevent 72.* addresses.
Last week:
40% success rate for all series and bundles.
precise was 25% reliable
bundles failed 30% of the time because agents failed to download
80% of failures were because agents could not be downloaded from the state-server
We kept 72.* addresses tied to running machines to keep them out of tests
Joyent's firewall rules left beta earlier this year. A major update to the regions happened in the last 4 weeks. We saw an immediate improvement after the updates, but reliability declined until we started manually deleting firewall rules tagged with the Juju env.
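For illustration, the cleanup of rules tagged with the Juju env could be sketched roughly as below. This is a minimal sketch, not what CI or juju actually runs. It assumes each firewall rule is a dict with an "id" and a "rule" text field, and that the rule text references the env as `tag <env-name>`. The function names and the `delete_rule` callback (which would wrap whatever call actually removes a rule from Joyent) are hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, List


def rules_for_env(rules: Iterable[Dict[str, str]], env_name: str) -> List[Dict[str, str]]:
    """Return the rules whose text references `tag <env_name>`.

    Assumption: the env name appears in the rule text as a tag, optionally
    quoted. The lookahead keeps "ci-env" from matching "ci-env-2".
    """
    pattern = re.compile(r'\btag\s+"?' + re.escape(env_name) + r'"?(?=\s|$)')
    return [r for r in rules if pattern.search(r.get("rule", ""))]


def delete_env_rules(
    rules: Iterable[Dict[str, str]],
    env_name: str,
    delete_rule: Callable[[str], None],
) -> List[str]:
    """Delete every rule tagged with env_name and return the deleted ids.

    `delete_rule` is a hypothetical callback that removes one rule by id.
    """
    deleted = []
    for rule in rules_for_env(rules, env_name):
        delete_rule(rule["id"])
        deleted.append(rule["id"])
    return deleted
```

Run after the env is destroyed, a step like this would keep rules from accumulating between CI runs while leaving rules that belong to other envs alone.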