HP cloud requires 4G to do what AWS does in 2G mem

Bug #1255242 reported by Curtis Hovey
This bug affects 1 person
Affects                          Status    Importance  Assigned to  Milestone
juju-core                        Invalid   High        Unassigned
mysql (Juju Charms Collection)   New       Undecided   Unassigned

Bug Description

I am sorry for this late bug report. CI consistently shows that upgrade-juju fails on HP cloud. Revno 2072 appears to be the culprit. Revno 2071 was the last pass, and we verified 2071 was good in two repeat runs of CI. Revno 2072 always fails, and always fails the same way: the mysql charm has a hook error.

The specific test that fails does this: a 1.16.3 env is bootstrapped on HP cloud. Wordpress and mysql are deployed and verified to be up. The test workspace installed the juju-core 1.17.0 package created from revno 2072. The new tools and metadata were published to all the clouds under the testing/ prefix. Next, upgrade-juju was called on HP cloud, and we know 1.17.0 was found.

We see this error after the second effort to get the tools:
    ERROR juju runner.go:220 worker: exited "environ-provisioner": no state server machines with addresses found

Much later we see a hook failure:
    juju-test-release-hp-machine-2:2013-11-26 15:10:05 INFO juju.worker.uniter context.go:263
    HOOK subprocess.CalledProcessError: Command '['service', 'mysql', 'start']' returned non-zero exit status 1

The mysql charm did not fail on AWS, canonistack, lxc, or azure. Just HP.

I am attaching the juju and CI logs.

UPDATE

CI now deploys mysql with mem=4G to run the same tests that lxc, aws, azure, and canonistack do with 2G. I have verified that Juju can be deployed in 512M on canonistack, which it could not in August. I believe this is an HP issue, not a Juju issue.
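For reference, the CI change amounts to asking the provider for a bigger machine via deploy constraints (juju 1.x constraint syntax; the 4G figure is the value mentioned above):

```shell
# Request a 4G machine for mysql on HP cloud (juju 1.x constraints)
juju deploy mysql --constraints "mem=4G"
```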

Revision history for this message
Curtis Hovey (sinzui) wrote :

This is the test runner log. It captures the all-machines log when it believes there is an error.

tags: added: ci
Revision history for this message
John A Meinel (jameinel) wrote :

You mention that "1.16.3 deploys wordpress and mysql and verifies that they are up".
However, looking at the log I see:
369:juju-test-release-hp-machine-1:2013-11-26 15:09:23 INFO juju.worker.upgrader upgrader.go:134 desired tool version: 1.17.0
370:juju-test-release-hp-machine-1:2013-11-26 15:09:23 INFO juju.worker.machiner machiner.go:52 "machine-1" started
371:juju-test-release-hp-machine-1:2013-11-26 15:09:23 INFO juju runner.go:262 worker: start "lxc-watcher"
372:juju-test-release-hp-machine-1:2013-11-26 15:09:23 INFO juju runner.go:262 worker: start "deployer"
373:juju-test-release-hp-machine-1:2013-11-26 15:09:23 INFO juju.worker.deployer deployer.go:106 checking unit "wordpress/0"
374:juju-test-release-hp-machine-1:2013-11-26 15:09:23 INFO juju.worker.deployer deployer.go:146 deploying unit "wordpress/0"

That would indicate that the agent already thinks it wants to run Juju 1.17.0 *before* it has gotten to the point of deploying anything. I don't actually see any steps where the tool is clearly 1.16.3 (we should add a log message on startup of what version the agent is running, though we do have that in the RPC log on machine-0):

388:juju-test-release-hp-machine-0:2013-11-26 15:09:23 DEBUG juju.state.apiserver apiserver.go:102 <- [4] machine-1 {"RequestId":7,"Type":"Upgrader","Request":"SetTools","Params":{"AgentTools":[{"Tag":"machine-1","Tools":{"Version":"1.17.0-precise-amd64"}}]}}

Where machine-1 is reporting to the API server that it is running 1.17.0.
And I'll note that it isn't until:

1107:juju-test-release-hp-machine-2:2013-11-26 15:09:38 INFO juju charm.go:61 worker/uniter/charm: downloading cs:precise/mysql-29 from https://region-a.geo-1.objects.hpcloudsvc.com/v1/17031369947864/juju-pavlova-snags--chiko-roll-meat-pie-pasty-sauce-2013-10-10/cs_3a_precise_2f_mysql-29

a full 800 messages and 16 seconds later that machine-2 actually starts downloading the Mysql charm.

So this doesn't seem to be "upgrading mysql fails" but rather "installing mysql using 1.17.0 fails".

Can you confirm?

Revision history for this message
Curtis Hovey (sinzui) wrote :

I see 2013-10-10 in the private bucket. We have had problems with HP/openstack remembering cruft from previous uses. We need a clean bucket to be sure the same juju and mysql are used as in the other tests.

Revision history for this message
Curtis Hovey (sinzui) wrote :

The note at http://162.213.35.54:8080/job/hp-upgrade-and-deploy/ shows that HP successfully upgraded for the first time since November 4. It appears that HP got fixed in r2116 at the cost of aws and canonistack, or that there is an undocumented behavioural difference that testing is not aware of. See Bug #1257371.

Revision history for this message
Curtis Hovey (sinzui) wrote : Re: juju 1.17.0 bootstrap on HP fails

Sorry, wrong bug; ignore. This bug is not being seen at the moment because of the bootstrap bug.

summary: - upgrade-juju on HP cloud broken in devel
+ juju 1.17.0 bootstrap on HP fails
summary: - juju 1.17.0 bootstrap on HP fails
+ upgrade-juju on HP cloud broken in devel
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.17.1 → 1.17.0
Curtis Hovey (sinzui)
summary: - upgrade-juju on HP cloud broken in devel
+ upgrade-juju and bootstrap on HP cloud broken in devel
Revision history for this message
Curtis Hovey (sinzui) wrote : Re: upgrade-juju and bootstrap on HP cloud broken in devel

Now that the bootstrap private-ip fix is in trunk, CI sees this bug when deploying wordpress and mysql using the 1.17.0 candidate on both the client and server. CI sees a hook failure; 1.16.5 does not encounter this problem. This issue could relate to HP's speed or internals, or the mysql charm itself might be brittle. I have played out this scenario 3 times using my own command line to allow for human slowness:
    http://bazaar.launchpad.net/~juju-qa/juju-core/ci-cd-scripts2/view/head:/deploy_stack.py

We can see from the logs that the right juju version is always found, so the script does not attempt to force the version.
I see config-changed hook failures, or the mysql agent is stuck in pending for 40 minutes.

I am attaching a log from CI. I have confirmed using debug-log that my own tests see this too, but I think it is orthogonal:

machine-1: 2013-12-18 17:40:59 ERROR juju.worker.authenticationworker worker.go:68 adding current Juju keys to ssh authorised keys: generating key fingerprint: exit status 1
machine-1: 2013-12-18 17:40:59 ERROR juju runner.go:220 worker: exited "authenticationworker": adding current Juju keys to ssh authorised keys: generating key fingerprint: exit status 1
machine-1: 2013-12-18 17:40:59 INFO juju runner.go:254 worker: restarting "authenticationworker" in 3s
machine-1: 2013-12-18 17:41:02 INFO juju runner.go:262 worker: start "authenticationworker"
machine-1: 2013-12-18 17:41:02 WARNING juju.ssh authorisedkeys.go:244 invalid Juju ssh key : generating key fingerprint: exit status 1

The hook failures seen in 2 of my 3 manual tests report:
ubuntu0.12.04.1) ...
2013-12-18 17:55:51 INFO juju.worker.uniter context.go:323 HOOK 131218 17:55:51 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
2013-12-18 17:56:00 INFO juju.worker.uniter context.go:323 HOOK mysql start/running, process 20579
2013-12-18 17:56:00 INFO juju.worker.uniter context.go:323 HOOK Setting up libhtml-template-perl (2.10-1) ...
2013-12-18 17:56:00 INFO juju.worker.uniter context.go:323 HOOK Setting up mysql-server (5.5.34-0ubuntu0.12.04.1) ...
2013-12-18 17:56:01 INFO worker.uniter.jujuc server.go:102 running hook tool "juju-log" ["-l" "INFO" "dataset size in bytes: 822083584"]
2013-12-18 17:56:01 INFO juju juju-log.go:66 mysql/0: dataset size in bytes: 822083584
2013-12-18 17:56:03 INFO juju.worker.uniter context.go:323 HOOK mysql stop/waiting
2013-12-18 17:56:05 INFO juju.worker.uniter context.go:323 HOOK start: Job failed to start
2013-12-18 17:56:05 INFO juju.worker.uniter context.go:323 HOOK Traceback (most recent call last):
2013-12-18 17:56:05 INFO juju.worker.uniter context.go:323 HOOK File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/config-changed", line 356, in <module>
2013-12-18 17:56:05 INFO juju.worker.uniter context.go:323 HOOK check_call(['service','mysql','start'])
2013-12-18 17:56:05 INFO juju.worker.uniter context.go:323 HOOK File "/usr/lib/python2.7/subprocess.py", line 511, in check_call
2013-12-18 17:56:05 INFO juju.worker.uniter context.go:323 HOOK raise CalledProcessError(retcode, cmd)
2013-12-18 ...
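The traceback above comes from the charm's config-changed hook calling Python's subprocess.check_call, which raises on any non-zero exit; the uniter then records it as a hook error. A minimal sketch of that failure mode (using `false` to stand in for a `service mysql start` that exits with status 1):

```python
import subprocess

def run_checked(cmd):
    # check_call raises CalledProcessError when the command exits
    # non-zero -- exactly how the mysql charm's hook fails here.
    subprocess.check_call(cmd)

# `false` stands in for `service mysql start` exiting non-zero:
try:
    run_checked(['false'])
except subprocess.CalledProcessError as err:
    print('hook would fail, exit status', err.returncode)
```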


Revision history for this message
Dave Cheney (dave-cheney) wrote : Re: [Bug 1255242] Re: upgrade-juju and bootstrap on HP cloud broken in devel

I'm taking a look now.


Revision history for this message
Dave Cheney (dave-cheney) wrote :

Oh dear, things are quite broken in the cloud images today.

Preparing to replace landscape-client 12.05-0ubuntu1.12.04 (using
.../landscape-client_13.07.3-0ubuntu0.12.04_amd64.deb) ...
 * Stopping landscape-client daemon
   ...fail!


Revision history for this message
Curtis Hovey (sinzui) wrote : Re: upgrade-juju and bootstrap on HP cloud broken in devel

CI passed lp:juju-core r2162. I hope this bug was fixed by the other recent revisions. CI is starting its second run against r2162; I will close this bug if HP passes again.

Revision history for this message
Curtis Hovey (sinzui) wrote :

CI still sees this bug in some runs. Since it is intermittent and may be charm specific, I am downgrading this to High and will move this issue to a later milestone.

tags: added: intermittent-failure
Changed in juju-core:
importance: Critical → High
milestone: 1.17.0 → 1.17.1
Revision history for this message
Curtis Hovey (sinzui) wrote :

I think I have sussed the HP mysql errors that have plagued the 1.17.0 development cycle.

CI and I both use machines with 2G of memory. That is about as small a machine as anyone will run mysql on. Juju's memory usage has changed quite a bit, and we have seen the mysql state errors go from rare to very common.

For now, I have changed CI to use more RAM. If the next few CI runs don't see the error, I will consider this issue a mysql problem. However, we might have a user problem, because "juju deploy mysql" is less likely to "just work" on HP cloud.

Martin Packman (gz)
Changed in juju-core:
milestone: 1.17.1 → 1.18.0
Curtis Hovey (sinzui)
summary: - upgrade-juju and bootstrap on HP cloud broken in devel
+ HP cloud requires 4G to do what AWS does in 2G mem
Curtis Hovey (sinzui)
description: updated
Changed in juju-core:
milestone: 1.18.0 → none
Revision history for this message
Curtis Hovey (sinzui) wrote :

I removed the regression tag because it is not clear that juju is at fault. There is still a problem, but HP Cloud or their OS images may be to blame.

tags: removed: regression
Revision history for this message
John A Meinel (jameinel) wrote :

options:
    dataset-size:
        default: '80%'
        description: How much data do you want to keep in memory in the DB. This will be used to tune settings in the database server appropriately. Any more specific settings will override these defaults though. This currently sets innodb_buffer_pool_size or key_cache_size depending on the setting in preferred-storage-engine. If query-cache-type is set to 'ON' or 'DEMAND' 20% of this is given to query-cache-size. Suffix this value with 'K','M','G', or 'T' to get the relevant kilo/mega/etc. bytes. If suffixed with %, one will get that percentage of RAM devoted to dataset and (if enabled) query cache.

So the MySQL charm defaults to dedicating 80% of the target machine's RAM to mysql caches. My feeling is that the remaining 20% of RAM is enough for everything else on some machines, but not on all of them.

An "easy" fix would be to do:
cat >mysql.conf <<EOF
mysql:
  dataset-size: 20M
EOF

juju deploy mysql --config mysql.conf ...

I tested it on the local provider, and it worked well.

That would give you a fixed MySQL overhead for testing.
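For illustration, here is how a percentage dataset-size works out against machine RAM. The `dataset_bytes` helper and its parsing rules are assumptions for this sketch, not the mysql charm's actual implementation:

```python
def dataset_bytes(setting, total_ram_bytes):
    """Resolve a dataset-size setting to bytes (hypothetical helper).

    '80%' means a fraction of machine RAM; a 'K'/'M'/'G'/'T' suffix
    means an absolute size.
    """
    units = {'K': 1024, 'M': 1024 ** 2, 'G': 1024 ** 3, 'T': 1024 ** 4}
    if setting.endswith('%'):
        return int(total_ram_bytes * float(setting[:-1]) / 100)
    return int(setting[:-1]) * units[setting[-1]]

two_gib = 2 * 1024 ** 3
# On a 2G machine the default '80%' hands ~1.6G to mysql caches,
# leaving only ~400M for the OS, jujud, and everything else:
print(dataset_bytes('80%', two_gib))
# Pinning it to '20M' gives a fixed, small overhead for testing:
print(dataset_bytes('20M', two_gib))
```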

Revision history for this message
Curtis Hovey (sinzui) wrote :

Thank you John. I am going to move this bug to charms now that you have explained the issue from the charm's perspective.

Changed in juju-core:
status: Triaged → Invalid