Juju is unreliable on Joyent because of undeleted firewalls

Bug #1485781 reported by Curtis Hovey on 2015-08-17
This bug affects 1 person
Affects      Importance   Assigned to
juju         High         Unassigned
juju-core    Critical     Unassigned
1.25         Critical     Unassigned

Bug Description

CI has observed several cases where Juju on Joyent starts out reliable, but reliability and repeatability decline after about a week. Manually deleting firewalls after we tear down the env appears to restore reliability.

Joyent firewalls were introduced after we created the joyent-provider. After the last upgrade to the Joyent cloud, the feature is now always on. Right after that upgrade, we saw Joyent become the most reliable cloud to deploy on. We also noted that we were getting successes when we had mixed networks. Then, over a week, we saw that service machines could not contact the state server to download agents. We could also see that all machines were on the same 72.* or 165.* network. Our own personal experience with Joyent showed it was still very reliable.

We discovered that in just a week, CI had added 1000 firewall rules. We deleted all the rules, and Joyent was better for a time. We then added a step to several CI jobs to delete firewalls after destroy-environment. CI appears to be very happy.
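A minimal sketch of what such a post-destroy cleanup step could look like, not the actual CI script. It assumes each firewall rule is a dict with "id" and "rule" fields, that rules belonging to an env mention a tag of the form juju-<env name> in their rule text, and that the caller supplies a delete_rule callable that talks to the cloud. The tag format and both function names are hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, List


def rules_for_env(rules: Iterable[Dict[str, str]], env_name: str) -> List[Dict[str, str]]:
    """Return the rules whose text mentions the env's tag.

    Assumption: the tag looks like "juju-<env_name>". The lookarounds stop
    "juju-ci" from also matching a different env such as "juju-ci-2".
    """
    tag = re.escape("juju-" + env_name)
    pattern = re.compile(r"(?<![\w-])" + tag + r"(?![\w-])")
    return [r for r in rules if pattern.search(r.get("rule", ""))]


def delete_env_rules(
    rules: Iterable[Dict[str, str]],
    env_name: str,
    delete_rule: Callable[[str], None],
) -> List[str]:
    """Delete every rule tagged for env_name and return the deleted ids.

    delete_rule is supplied by the caller (hypothetical), for example a
    thin wrapper around whatever cloud client the CI job already uses.
    """
    deleted = []
    for rule in rules_for_env(rules, env_name):
        delete_rule(rule["id"])
        deleted.append(rule["id"])
    return deleted
```

Run after destroy-environment, a step like this only touches rules tagged for the env that was just torn down, so rules belonging to other running envs in the same account are left alone.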

If Joyent continues to be happy without an accumulation of firewall rules, I think we need to update juju to always delete firewalls when destroying an env to ensure repeatability.

Curtis Hovey (sinzui) wrote :

We have enough evidence to conclude that deleting firewalls makes juju reliable:
This Week:
    95% success rate for series and bundles
    precise is 100% reliable
    The 10% bundle failures were caused by charms
    No machine agents failed to download
    No actions were taken to prevent 72.* addresses.

Last week:
    40% success rate for all series and bundles.
    precise was 25% reliable
    bundles failed 30% of the time because agents failed to download
    80% of failures were because agents could not be downloaded from the state-server
    We kept 72.* addresses tied to running machines to keep them out of tests

Joyent's firewall rules left beta earlier this year. A major update to the regions happened in the last 4 weeks. We saw an immediate improvement after the updates, but reliability declined until we started manually deleting firewall rules tagged with the Juju env.

description: updated
Curtis Hovey (sinzui) on 2015-08-27
Changed in juju-core:
milestone: 1.25-alpha1 → 1.25-beta1
Changed in juju-core:
milestone: 1.25-beta1 → 1.25-beta2
Ian Booth (wallyworld) on 2015-09-17
Changed in juju-core:
milestone: 1.25-beta2 → 1.26-alpha1
no longer affects: juju-core/1.24
no longer affects: juju-core/1.22
Curtis Hovey (sinzui) on 2015-11-03
Changed in juju-core:
milestone: 1.26-alpha1 → 1.26-alpha2
Changed in juju-core:
milestone: 1.26-alpha2 → 1.26-beta1
Curtis Hovey (sinzui) on 2015-11-25
description: updated
Changed in juju-core:
milestone: 1.26-beta1 → 2.0-alpha2
Changed in juju-core:
milestone: 2.0-alpha2 → 2.0-alpha3
Changed in juju-core:
milestone: 2.0-alpha3 → 2.0-beta4
Curtis Hovey (sinzui) on 2016-03-29
tags: added: ci
Curtis Hovey (sinzui) on 2016-03-31
tags: added: jujuqa
Changed in juju-core:
milestone: 2.0-beta4 → 2.0.1
Mark Ramm (mark-ramm) wrote :

These kinds of "failure to tear down" issues have been discussed as something which should have CI tests to check that everything is torn down properly, and would then curse the build.
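One way such a teardown check could be sketched: snapshot the cloud resources before bootstrap and again after destroy, then fail the run if anything new is left behind. The resource kinds and function names here are hypothetical, and how the snapshots are collected is left to the caller.

```python
from typing import Dict, Iterable, List


def leaked_resources(
    before: Dict[str, Iterable[str]],
    after: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return ids present after teardown that were not there before.

    Both arguments map a resource kind (for example "firewall_rules")
    to the ids of that kind visible in the account.
    """
    leaks = {}
    for kind, ids_after in after.items():
        new_ids = sorted(set(ids_after) - set(before.get(kind, ())))
        if new_ids:
            leaks[kind] = new_ids
    return leaks


def assert_clean_teardown(
    before: Dict[str, Iterable[str]],
    after: Dict[str, Iterable[str]],
) -> None:
    """Raise AssertionError if the teardown left anything behind."""
    leaks = leaked_resources(before, after)
    if leaks:
        raise AssertionError("resources left after teardown: %r" % leaks)
```

Comparing against a pre-bootstrap snapshot, rather than requiring an empty account, lets the check run in an account shared with other jobs.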

Curtis Hovey (sinzui) on 2016-06-27
summary: - Juju is unreliable on Joyent
+ Juju is unreliable on Joyent because of undeleted firewalls
affects: juju-core → juju
Changed in juju:
milestone: 2.0.1 → none
milestone: none → 2.0.1
Changed in juju-core:
importance: Undecided → Critical
status: New → Won't Fix
Curtis Hovey (sinzui) on 2016-10-28
Changed in juju:
milestone: 2.0.1 → none