HPCC Charm initial check-in for review

Bug #1272083 reported by Xiaoming Wang on 2014-01-23
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Juju Charms Collection
Undecided
Xiaoming Wang

Bug Description

A new Juju charm, hpcc, is submitted for review.
The README.md has the information on how to use the charm.

For the charm code:
  config.yaml: there are parameters to control which HPCCSystems version users can install, and other parameters for how to configure the cluster.
  bin/ : there are some helper scripts we put here, mainly for re-configuring the HPCC cluster, since the default configuration may not meet users' needs. We understand this is probably not the recommended approach for a Juju charm, but we want to give users a convenient way to access the tools. We are open to discussing these during the review process.
  icon.svg: we haven't yet decided which icon we should use; it is in our internal process.

  Currently the hpcc charm does not have relations to any other charms.

Marco Ceppi (marcoceppi) on 2014-01-27
Changed in charms:
assignee: nobody → Xiaoming Wang (xwang2713)
Charles Butler (lazypower) wrote :

Greetings Xiaoming,

First, I want to thank you for the excellent work on this charm. I'm really excited to be reviewing a charm that has integrated support for multiple host operating systems, and installation paths. There were some rough corners that I've identified in my cursory review of the charm, the notes are as follows:

`Charm Proof`

When executing 'charm proof' it returns that hpcc has no hooks. I assume this interface is exposed for peer relationship purposes to get the public-ip of its peer?

Readme:

Your work in the Readme is excellent. I went through the example deployment and found everything was very straight forward. I've tested this with both the Local Provider and Amazon EC2 environments, and both completed without an issue.

Further down the readme, I was looking for further instruction on the configuration options. These are well documented in the interface thanks to your verbose config.yaml; however, it's good practice to expand on the configuration detail in the README. It would be a good idea to itemize the list of configuration options, go over in detail what the defaults provide, and cover the specifics of each configuration option.

Another item that I missed in the Readme was a callout to the necessity of running a third-party script from the CLI to complete the scaling process among peers. I see it's documented in config.yaml, but again, providing a thorough overview of this process in the Readme would be preferable, along with a documented example run so users know what to expect during the execution of this additional script.

I pulled a copy of the .deb from your community edition download portal, hosted it from an S3 bucket, and installed from there to test the custom-location installation; it went flawlessly.

Nitpicks:

Departed relationship hook uses a file in /tmp, which may not exist if the host machine reboots. It may be better practice to cache this information in $CHARM_ROOT, or another location that is not subject to frequent deletions on reboot.
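A rough sketch of that suggestion (the paths and the hard-coded address below are illustrative only; a real hook would obtain the address via `relation-get`):

```shell
# Cache peer data in a reboot-safe location instead of /tmp.
# CHARM_DIR falls back to "." here so the sketch runs standalone.
CACHE_DIR="${CHARM_DIR:-.}/peer-cache"
mkdir -p "$CACHE_DIR"
PEER_ADDR="10.0.0.5"   # stand-in for: $(relation-get private-address)
echo "$PEER_ADDR" >> "$CACHE_DIR/peers"
cat "$CACHE_DIR/peers"
```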

The provided SSH keys are a potential attack vector, though this has been noted in the installation script. When preparing for Charm Store review, the keys will need to be generated on the host, or configured by the user.

Resolution:

Barring some simple modifications and documentation updates, I feel this charm will be ready for prime time in no time. Thank you again for the work on this charm. I'm going to move the status of this review to 'incomplete' barring the modifications. When you are ready for another review simply place the status of the charm to 'new' or 'needs review' and we will be more than happy to give it another look.

Changed in charms:
status: New → Incomplete

Hi Charles & Marco,

For the ssh key I will create a variable for the ssh key directory in config.yaml.
If it is defined, our charm will use it instead of the default one shipped with our
charm.

I hope this will work with the charm store as well. If not, let me know.

Thanks


Xiaoming Wang (xwang2713) wrote :

Sorry, ignore my previous note.
I don't think it will work, since the install hook can't see the Juju server's ssh-keys
directory.
I am not sure if there is a way around this.
Basically, we allow users to generate ssh keys, put them in the hpcc/ssh-keys directory, and then
deploy.
That is OK for a local repository; I don't know how it would work from the charm store.


Xiaoming Wang (xwang2713) wrote :

Thanks for the review.
We submitted a modified version of the hpcc charm. Please help us review it again.
Here is what changed:

Based on the initial review the following modifications are made:

1) replaced /tmp with /var/lib/juju/hpcc, which is created during the install hook.
   We do not use $CHARM_DIR since we want this directory to have the same name on all nodes.
   We could not find the variable $CHARM_ROOT. If there is a variable pointing to /var/lib/juju we will
   be happy to use it instead of the hard-coded /var/lib/juju.

2) hpcc relation hook. For the first release there are no other charms related to hpcc, but
   in the near future we will add related charms, for example ganglia-monitor and hadoop
   support. We added an empty hpcc-relation-changed hook to satisfy 'juju charm proof'.

3) There is a potential security risk in using the supplied ssh keys. For local deployment
   we added a sentence recommending that users generate their own keys and replace the ones in the
   hpcc/ssh-keys directory. For deploying directly from the charm store we don't know how to use
   user-generated keys.

4) Regarding the README section on help scripts under hpcc/bin:
   Users would need to download the charm to use the scripts.
   There are two main shell scripts. These are now explained in the readme (see below). The other two
   python scripts are support scripts and users do not call them directly.

   config_hpcc.sh : When you use the `juju add-unit` command to add nodes, scripts are called automatically
   to provide a default configuration. If you want to configure manually, set auto-gen to 0, wait for all
   nodes to be in a "started" state, then call the config_hpcc.sh script using the following parameters:

     ./config_hpcc.sh -thornodes <# of thor nodes> -roxienodes <# of roxie nodes> -supportnodes
            <# of support nodes> -slavespernode <#of thor slaves per node>

   Another useful script reports the URL for the ECL Watch node. Call the get-url.sh script to display the
   cluster configuration and the URL for the ECL Watch service.

5) We updated the icon.svg file with our company logo

Changed in charms:
status: Incomplete → New
Charles Butler (lazypower) wrote :

Greetings Xiaoming,

Thank you for the speedy response in providing the fixes for the HPCC charm. This is really coming along nicely. And I personally apologize for the confusion surrounding $CHARM_ROOT; what I had intended to write was $CHARM_DIR.

> We do not use $CHARM_DIR since we want this directory to have the same name for all nodes. - After re-reviewing the code submission, it appears to me that you need this data available outside of the execution run-time of Juju, and you have no guarantee that you can access the data in a consistent manner when using $CHARM_DIR. I don't want to nitpick your use of /var/lib/juju, as you have satisfied the requirement of not caching in ephemeral storage, but I feel the data would be better served in its own location (/var/lib/hpcc, /etc/hpcc, or /usr/share/hpcc), as it is not related to Juju.

> hpcc relation hook. - That's good to hear that you are investigating further integration with other services like ganglia. I'm excited to see those as they land. Thank you for the clarification.

> ssh keys - I'm not really familiar with the use of the SSH keys in this charm, however there are a few charms that use SSL key generation. For an example take a look at the Postfix charm written by a fellow community member: http://bazaar.launchpad.net/~jose/charms/precise/postfix/trunk/files/head:

They provide a set of configuration fields for users to insert their .ca and .crt file contents, and if not present they generate self signed certificates, or to generate a certificate based on provided .key and .crt files.

My suggested implementation:

Add a configuration option for a public/private key pair, so the SSH keys themselves are stored in the Juju runtime environment. When the hook execution runs, if no user-configured ssh key is present, you can generate one on the first host that gets deployed. As subsequent HPCC clients join the cluster, in the hpcc-relation-joined hook, read the contents of $USER/.ssh/id_rsa.pub and $USER/.ssh/id_rsa, and set those as configuration values that are part of the relationship. Example code follows:

file: Install

PUB_KEY=`config-get public_key`
PRIV_KEY=`config-get private_key`

if [ -z "$PUB_KEY" ] && [ -z "$PRIV_KEY" ]; then
  # No user-supplied keys: generate a fresh 2048-bit RSA pair on this host
  ssh-keygen -b 2048 -t rsa -N "" -f $PATH_HOME/.ssh/id_rsa
else
  juju-log "Using User Defined Keys"
  echo "$PRIV_KEY" > $PATH_HOME/.ssh/id_rsa
  echo "$PUB_KEY" >> $PATH_HOME/.ssh/id_rsa.pub
fi

file: hpcc-relation-changed
 *note* this won't work if you have a dependency on using these keys prior to the relation-joined/changed hook execution.
public_key=`relation-get public_key`
private_key=`relation-get private_key`

if [ -z "$public_key" ] || [ -z "$private_key" ]; then
  if [ -f $PATH_HOME/.ssh/id_rsa ]; then
    juju-log "Sending my keys for consideration"
    relation-set private_key=`cat $PATH_HOME/.ssh/id_rsa`
    relation-set public_key=`cat $PATH_HOME/.ssh/id_rsa.pub`
  fi
else
  echo "$private_key" > $PATH_HOME/.ssh/id_rsa
  echo "$public_key" >> $PATH_HOME/.ssh/id_rsa.pub
  echo "$public_key" >> $PATH_HOME/.ssh/authorized_keys
fi

Hopefully this will help clarify some of the SSH key distribution concerns.


Charles Butler (lazypower) wrote :

Quick addendum I did not catch before posting: the writes to id_rsa.pub should use an overwrite redirect, >, not an append, >>.

Changed in charms:
status: New → Incomplete
Xiaoming Wang (xwang2713) wrote :

Hi Charles,

Thanks for the suggestion. Actually, we originally thought to allow users to put
ssh keys in config.yaml, but were afraid it would not be good to expose them there.
On rethinking it, given your recommendation and others' implementations, we will do it.
When setting/getting ssh keys with relation-set/relation-get, how can we guarantee
that only one unit sets them (we don't want every unit to generate its own keys, since in
practice users may have hundreds or thousands of units per cluster)?

If it is OK, we will only implement customized keys through config.yaml. If the user
doesn't set them, we will use the default keys shipped with the hpcc charm. In the future
we can enhance this by dynamically generating keys.

Thanks


Xiaoming Wang (xwang2713) wrote :

Thanks again for the review.

Based on the second review the following modifications are made:

1) replaced /var/lib/juju/hpcc_data with /var/lib/HPCCSystems/charm.
   All hpcc-charm-related user data is saved there, for example ip files, the url, etc.

2) Added two string variables: ssh-key-public and ssh-key-private. Users can generate an
   ssh key pair and copy/paste the public and private keys into these variables.
   The install hook script will use those values if defined; otherwise the default ssh key pair
   will be used.

   It would be nice if we could generate key pairs when the user doesn't supply them, but we only
   want one key pair per cluster and are not sure how to guarantee that only
   one unit generates the keys when the relation changes. We will leave it as a future enhancement.

3) The README file is updated to reflect the change in 2)

Changed in charms:
status: Incomplete → New
Xiaoming Wang (xwang2713) wrote :

Disabled shell debug output when copying user ssh key pair values.

Charles Butler (lazypower) wrote :

Greetings Xiaoming, I'm pulling the latest copy of your revisions and will have a follow up review prepared for you shortly. Thank you for your continued effort on this charm. We really appreciate it.

Charles Butler (lazypower) wrote :

Greetings Xiaoming, I've had a deeper look at the charm hooks and the progress you've made over the last review cycle. Things have progressed really nicely. The fact that your charm is CentOS aware has been an interesting feature set to review.

I have some additional notes while reviewing the charm, and they are as follows:

# Charm Proof
No output, fully ready to progress to the next phase of reviewing! Excellent work.

# Hook Review
The start hook halts in failure if the upstart job is already running. This is a situation where unexpected behavior from the hook was not handled gracefully. Exit 1 status calls should be reserved for halting charm execution if something has gone seriously wrong. One option would be to short circuit your start hook to restart if starting the service fails:
`service hpcc-init start || service hpcc-init restart`
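Sketched out, with fake_start/fake_restart standing in for the real `service hpcc-init start/restart` calls, the fallback behaves like this:

```shell
# If starting fails (e.g. the upstart job is already running),
# fall back to restart instead of letting the hook exit 1.
fake_start()   { echo "start failed: already running"; return 1; }
fake_restart() { echo "restarted"; return 0; }
fake_start || fake_restart
```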

The conditional around the port-checking path in the installation hook violates charm store policy. The official bullet point is as follows:

 - Must call Juju API tools (relation-*, unit-*, config-*, etc) without a hard coded path.

Downloads from upstream are not SHA1 verified, which is a requirement for inclusion in the charm store.
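A minimal sketch of that verification; in a real install hook EXPECTED would come from a config option or an upstream .sha1 file, but here it is computed up front so the sketch runs standalone:

```shell
# Verify a downloaded package against an expected SHA1 before installing.
set -e
PKG=$(mktemp)
echo "pretend this is the downloaded .deb" > "$PKG"
EXPECTED=$(sha1sum "$PKG" | awk '{print $1}')
ACTUAL=$(sha1sum "$PKG" | awk '{print $1}')
if [ "$ACTUAL" = "$EXPECTED" ]; then
  echo "checksum ok"
else
  echo "checksum mismatch; aborting install" >&2
  exit 1
fi
rm -f "$PKG"
```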

## Pushing SSH Keys across Units
> If it is OK we only implement customized keys through config.yaml. If user doesn't set it we will use default keys shipped with hpcc charm

I'm still against shipping any SSH keys with the charm. The preferred method here would be to have the key pair as a required configuration option for the charm. It's perfectly OK to have a charm halt what it's doing if required configuration options are not present. As you progress through exploring dynamically generated keys, this functionality can be extended and enhanced in future releases; it is not a requirement at this phase of development.
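As a sketch, with the `ssh-key-public` option name taken from the thread and an empty value stubbed in for `config-get`, the required-configuration pattern looks like:

```shell
# Stop the hook cleanly when required configuration is missing;
# config-changed will re-run this logic once the admin sets the option.
SSH_KEY_PUBLIC=""   # stand-in for: $(config-get ssh-key-public)
if [ -z "$SSH_KEY_PUBLIC" ]; then
  echo "ssh-key-public not set; waiting for required configuration"
else
  echo "installing supplied public key"
fi
```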

# Summary
Prior to your next audit, take a quick run through the charm store policy bullet points to ensure you're adhering to the regulations surrounding having your charm ready for inclusion to the store.

https://juju.ubuntu.com/docs/authors-charm-policy.html

Overall the charm is progressing nicely, and you are really close to being ready for a promulgation review. The new additions have helped shape it into a great new addition, and with a little more work it will be ready for inclusion in the charm store.

Thank you for the continued submissions, we appreciate the hard work that has gone into this charm.

Changed in charms:
status: New → Incomplete
Xiaoming Wang (xwang2713) wrote :

Again, thanks for taking the time to review the HPCC charm. Based on the 3rd review we made the following changes:
1) Added a restart fallback in the start hook.

2) Added checksum validation. We added a string package-checksum in config.yaml. If it is not empty, the install script will use it to validate the downloaded package before installing. In the future our download site will provide an md5sum file so validation can be conducted automatically without this setting.

3) Fixed the open-port call, which was broken before.

4) Added an implementation to automatically generate an ssh key pair if the user doesn't provide one in config.yaml. Basically, if no keys are supplied in config.yaml, every newly joined node will try to use existing ssh keys through relation-get. If it cannot find them, it will call ssh-keygen to create a new pair. In the relation-changed hook every node will use the keys from the lowest-IP node if they differ from its own pair. There shouldn't be any overhead once every node has the same key pair. We tested various scenarios, for example deploying multiple nodes, adding new nodes, etc. There is a restriction: users cannot change the key settings in config.yaml after the charm is deployed. That will be addressed in a future enhancement.

5) We added a CentOS prerequisite but never tested it. Actually, we are not sure whether Juju currently supports CentOS or Fedora.
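The lowest-IP rule described in 4) can be sketched as follows (addresses are hard-coded for illustration; a real hook would collect them via relation-get):

```shell
# Each unit sorts all peer addresses numerically; only the unit holding
# the lowest address keeps its generated key pair, the rest adopt it.
MY_IP="10.0.0.3"
PEER_IPS="10.0.0.7 10.0.0.3 10.0.0.9"
LOWEST=$(printf '%s\n' $PEER_IPS | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | head -n1)
if [ "$MY_IP" = "$LOWEST" ]; then
  echo "this unit generates the cluster key pair"
else
  echo "this unit adopts the keys from $LOWEST"
fi
```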

Changed in charms:
status: Incomplete → New
Matt Bruzek (mbruzek) wrote :

Hello Xiaoming,

Thanks for taking the time to continually improve this charm! I took a look at the hpcc charm this time and here is my review.

You do a really good job of handling the keys in the install hook. I particularly like how you turned off BASH echo (set +x) when you were handling the keys! That is a very nice touch!

I also like the way you added the checksum verification in the configuration file. That is a very configurable solution. There are a few concerns with how it was implemented:

If the package-checksum is not set or of zero size, the install will not do cryptographic verification. The install hook should fail if no checksum is set; otherwise someone could unset the value and no cryptographic verification would be done on the downloaded file.

Also, I would suggest using sha1sum rather than md5sum. The underlying MD5 algorithm is no longer deemed secure, so I would highly suggest sha1sum.
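A sketch of that enforcement, with the `package-checksum` option name from the thread and an empty value stubbed in for `config-get`:

```shell
# An empty package-checksum should block the install path rather than
# silently skip verification.
PACKAGE_CHECKSUM=""   # stand-in for: $(config-get package-checksum)
if [ -z "$PACKAGE_CHECKSUM" ]; then
  echo "package-checksum is not set; refusing to install unverified package"
else
  echo "verifying package against $PACKAGE_CHECKSUM"
fi
```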

The hpcc charm did not deploy correctly. The start hook failed with an exit status of 1. I looked in the logs and found that some services FAILED.
Here are the logs from my system that show the error:
unit-hpcc-0: 2014-02-19 18:32:08 INFO install + set +x
unit-hpcc-0: 2014-02-19 18:32:09 INFO install + which open-port
unit-hpcc-0: 2014-02-19 18:32:09 INFO install /var/lib/juju/tools/unit-hpcc-0/open-port
unit-hpcc-0: 2014-02-19 18:32:09 INFO install + '[' 0 -eq 0 ']'
unit-hpcc-0: 2014-02-19 18:32:09 INFO install + open-port 8010/TCP
unit-hpcc-0: 2014-02-19 18:32:09 INFO install + open-port 8002/TCP
unit-hpcc-0: 2014-02-19 18:32:10 INFO install + open-port 8015/TCP
unit-hpcc-0: 2014-02-19 18:32:10 INFO install + open-port 9876/TCP
unit-hpcc-0: 2014-02-19 18:32:10 INFO install + JUJU_HPCC_DIR=/var/lib/HPCCSystems/charm
unit-hpcc-0: 2014-02-19 18:32:10 INFO install + '[' '!' -e /var/lib/HPCCSystems/charm ']'
unit-hpcc-0: 2014-02-19 18:32:10 INFO install + mkdir -p /var/lib/HPCCSystems/charm
unit-hpcc-0: 2014-02-19 18:32:10 INFO install + chmod -R 777 /var/lib/HPCCSystems/charm
unit-hpcc-0: 2014-02-19 18:32:15 INFO juju-log envgen_signature:
unit-hpcc-0: 2014-02-19 18:32:17 INFO start Starting mydafilesrv.... [ OK ]
unit-hpcc-0: 2014-02-19 18:32:18 INFO start Starting mydali.... [ OK ]
unit-hpcc-0: 2014-02-19 18:32:19 INFO start Starting mydfuserver.... [ OK ]
unit-hpcc-0: 2014-02-19 18:32:20 INFO start Starting myeclagent.... [ OK ]
unit-hpcc-0: 2014-02-19 18:32:22 INFO start Starting myeclccserver.... [ OK ]
unit-hpcc-0: 2014-02-19 18:32:23 INFO start Starting myeclscheduler.... [ OK ]
unit-hpcc-0: 2014-02-19 18:32:24 INFO start Starting myesp.... [ OK ]
unit-hpcc-0: 2014-02-19 18:32:30 INFO start Starting myroxie.... [FAILED]
unit-hpcc-0: 2014-02-19 18:32:31 INFO start Starting mysasha.... [ OK ]
unit-hpcc-0: 2014-02-19 18:32:46 INFO start Starting mythor.... [ OK ]
unit-hpcc-0: 2014-02-19 18:32:46 INFO start **************************************...


Changed in charms:
status: New → Incomplete
Xiaoming Wang (xwang2713) wrote :

Thanks for the review.

For the charm repository directory structure, I will fix it.

For the error when installing/starting HPCC: it could be system-resource related.
We tested the hpcc charm on the local, Amazon, and MAAS/VirtualBox providers. We only see a similar
error when deploying multiple hpcc instances on the local provider on some systems.
If this happens when deploying the first instance, please provide
/var/log/HPCCSystems/myroxie/roxie.log; the last few lines showing the error
are fine. When deploying/adding multiple hpcc instances on the local provider, the roxie
process start error is most likely due to the issue reported at
http://askubuntu.com/questions/404969/error-net-core-wmem-default-is-an-unknown-key-on-juju-local-provider

We have two systems (Ubuntu 12.04 and 13.10, both with VirtualBox installed)
that have this problem.
Another problem we have is deploying the hpcc charm from Windows to Amazon: hook
scripts are deployed outside the hooks directory.

For the checksum: we have a team that maintains our HPCC Systems product
download portal: http://hpccsystems.com. Currently only md5sums are provided.
Our products are built for each supported Ubuntu and CentOS distribution,
for example Ubuntu 12.04 amd64, Ubuntu 13.10 amd64, etc. I did copy the
md5sum for Ubuntu 12.04 into config.yaml, but for Ubuntu 13.10 users need to copy
the md5sum from our download site to replace the value in config.yaml. I
can create a sha1sum for the current Ubuntu 12.04 HPCC package (4.2.0-4), but we
update the product frequently (every one or two months), so to let users use the
latest HPCC it is better to keep the current md5sum method. I will request our
portal team to use sha1sum in future releases and provide sha1sum files so
we don't need to hard-code any checksum string in config.yaml.

I do allow an empty checksum string to skip the validation; I can enforce the
check.

Thanks


Xiaoming Wang (xwang2713) wrote :

Hi Matt,

Thanks for the review and suggestions.

We are trying to figure out the problem with starting HPCC.
It looks like the roxie process fails to start. We only see this on the local provider
with multiple hpcc instances in the cluster.
It is due to some network resource settings being unavailable in some local
provider environments.
For a single instance it shouldn't happen.

Since we cannot reproduce it so far (we will keep trying), could you let us know
your environment, particularly the constraints?
Also, it would be great if you could give us the
/var/log/HPCCSystems/myroxie/roxie.log from the instance with the error.
Thanks

On Wed, Feb 19, 2014 at 2:26 PM, Matt Bruzek
<email address hidden>wrote:

> Hello Xiaoming,
>
> Thanks for taking the time to continually improve this charm! I took a
> look at the hpcc charm this time and here is my review.
>
> You do a really good job at handling the keys in the install hook I
> particularly like how you turned off BASH echo (start +x) when you were
> handling the keys! That is a very nice touch!
>

Xiaoming Wang (xwang2713) wrote :

HPCC charm is re-submitted after the 4th review with the following updates:

1) Removed the extra precise/hpcc directories in the repository.

2) Enforce checksum validation before installing the HPCC package. If no
   checksum is supplied, a message is logged with juju-log (at INFO level)
   and the hook exits 1.
   We will defer the sha1sum implementation due to the time limit.

3) Added a status check after starting HPCC. This is necessary even when start
   runs successfully (return code 0), because some processes may exit
   unexpectedly. A variable "start-check-wait" (in seconds) is introduced in
   config.yaml to delay the check.
   If an error is found, a message is logged telling the user where to find the
   details: /var/log/HPCCSystems/<component>/

4) Updated the README. It adds a link to the HPCC hardware requirements. HPCC
   requires 4GB of memory; with less, some processes may not start
   successfully, or may start but not function correctly.

Thanks

Changed in charms:
status: Incomplete → New
Matt Bruzek (mbruzek) wrote :

I sent Xiaoming an email about the problem I was seeing and included the log files. I wanted to include the content of the email here in case it is useful.

Thu, Feb 20, 2014 at 9:29 AM

Hi Xiaoming,

I found the hook error when running the hpcc charm in a HP public cloud environment (which is OpenStack). I included all the log files from the roxie directory and the juju log for the hpcc unit in the included file. I deployed one of the hpcc charms (from the bzr branch) and I did not set any configuration values, so the default values were used.

I have now deployed the hpcc charm on my local LXC environment (in the same way) and the deploy was successful. I attached the same log files so you can compare the log files and hopefully see what is going wrong on the HP cloud environment.

Let me know if you need any more information.

- Matt

Matt Bruzek (mbruzek) wrote :

Here are the lxc log files that were included in the email.

Xiaoming Wang (xwang2713) wrote :

Hi Matt,

Thanks for the feedback.
The problem is that the instance does not have enough memory. It is the same
issue as last time, reported in the roxie log:

00000003 2014-02-20 15:10:14.216 24371 24371 "RoxieMemMgr: posix_memalign
(alignment=1048576, size=1073741824) failed - ret=12 (ENOMEM There was
insufficient memory to fulfill the allocation request.)"

We added a hardware requirements link in the README, and I also mentioned it
in the bug submission's comment.

Our HPCC system requirements recommend 4GB. In my test environment I sometimes
use a 2GB VirtualBox instance for some minimal checks.

So in your HP-Cloud environment, more memory needs to be assigned to the
instance. I have an OpenStack cloud and always run HPCC on instances with
4+GB of memory.

Let me know if there is still a problem after adding memory.

On Mon, Feb 24, 2014 at 4:17 PM, Matt Bruzek
<email address hidden>wrote:

>
> ** Attachment added: "hp-cloud log files with the Roxie error"
>
> https://bugs.launchpad.net/charms/+bug/1272083/+attachment/3994956/+files/hp-cloud_log_files.tar.gz
>

Xiaoming Wang (xwang2713) wrote :

Submitted a fix for missing ssh keys with single-instance deployment.
Updated file: hooks/start

Matt Bruzek (mbruzek) wrote :

Xiaoming,

Thank you for addressing the previous comments. The new file structure looks correct. The checksum validation code looks correct but the config.yaml still indicates that users can set an empty string to skip validation.

The charm continues to pass charm proof; there are no errors on structure! The README.md converts correctly to HTML format and is easy to read.

A single hpcc charm deployed correctly in the LXC environment, but when trying to add units, the hpcc-cluster-relation-changed hook failed. There is more information about hook errors here:
https://juju.ubuntu.com/docs/authors-hook-errors.html

This error actually made the Ecl Watch page stop responding on port 8010.

Use set -e for all bash scripts:
It is highly recommended that BASH hooks use the “set -e” option, that means if any command returns false (non-zero) the script will stop and raise an error. This is important so Juju can work out if the script is running properly.
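A minimal illustration of the behavior "set -e" changes (the function names here are mine, just for the demo):

```shell
#!/bin/bash
# Without -e, a failing command is ignored and the script keeps going,
# so the hook appears to succeed. With -e, the script aborts at the first
# failure and Juju can see the hook failed.
without_e() { bash -c 'false; echo "kept going"'; }  # prints "kept going", exits 0
with_e()    { bash -c 'set -e; false; echo "kept going"'; }  # prints nothing, exits 1
```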

Fault tolerance:
The hpcc-cluster-relation-departed and hpcc-cluster-relation-changed hooks use LOCAL_IP_FILE and TMP_FILE without checking for existence first. Since the hook does not use “set -e” the hook will not stop on the error of the head command if the file does not exist. For instance if the save_local_ip function somehow failed that would lead to an error on the head command. Hooks can be run multiple times, so being fault tolerant and idempotent is crucial.
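A hedged sketch of the kind of guard described above. The default path is hypothetical; the charm defines LOCAL_IP_FILE itself:

```shell
#!/bin/bash
# Check that the state file exists before reading it with head, so a
# missing file becomes a clear hook error instead of a silent cascade.
LOCAL_IP_FILE=${LOCAL_IP_FILE:-/var/lib/hpcc/local_ip}  # illustrative default

read_local_ip() {
  if [ ! -f "$LOCAL_IP_FILE" ]; then
    echo "missing $LOCAL_IP_FILE; save_local_ip may have failed" >&2
    return 1
  fi
  head -n 1 "$LOCAL_IP_FILE"
}
```

Because the function returns non-zero on a missing file, a hook running under "set -e" stops right there, which is exactly the idempotent, fail-fast behavior the review asks for.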

Remove unused hooks:
If the hpcc charm does not support the upgrade-charm or hpcc-relation-changed hook you can safely remove the empty files. Juju attempts to call hooks and if a hook does not exist, all is well and nothing is called. If the features improve, those hooks can be added in the future.

Hook failure:
Since the hook failed in error, the hook could be debugged according to this page: https://juju.ubuntu.com/docs/authors-hook-debug.html

juju debug-hooks hpcc/0
Then I added "set -x" at the top of the hooks/hpcc-cluster-relation-changed file to print out every command.
In a different terminal:
juju resolved --retry hpcc/0

This re-ran the hook and the output is attached to this bug report. Take a look at the commands and see if you can fix the error.

Thanks again for the continued work on this charm. Please contact us if you have additional questions. We are on freenode.net in the #juju channel, or email us at <email address hidden>.

The bug has been moved to Incomplete at this time. When the code is ready for another review move the bug status to New or “Fix Committed” to have it added back in the queue for review.

Changed in charms:
status: New → Incomplete
Xiaoming Wang (xwang2713) wrote :

Hi Matt,

Could you send one of the juju unit logs from the /var/log/juju/ directory,
and the file listing of /proc/sys/net/core on any failed hpcc instance?

I will add set -e to the hook scripts along with the other suggestions.

Thanks

On Wed, Mar 5, 2014 at 11:52 AM, Matt Bruzek
<email address hidden>wrote:

>
> ** Attachment added: "The output of the failed hook when it was re-run."
>
> https://bugs.launchpad.net/charms/+bug/1272083/+attac...


Matt Bruzek (mbruzek) wrote :

I have attached the juju status log where I could see cluster1/0, cluster1/1 and cluster 1/3 with a hook in error state. Please note cluster1/2 did not appear to have a failed hook, but I included the core files for comparison.


Xiaoming Wang (xwang2713) wrote :

Hi Matt,

Thanks for the log. Could you give the output of /proc/sys/net/core/ on
unit 1 or 3, where the roxie process failed? Also the logs under
/var/log/HPCCSystems/myroxie.

I get a similar error in two of my LXC environments, due to a problem with
juju-local or some kind of system configuration in the juju-local packages.
None of my other team members have this problem.
I am still investigating the reason; I suspect it may be related to my
VirtualBox settings.

On Wed, Mar 5, 2014 at 12:45 PM, Matt Bruzek
<email address hidden>wrote:

>
> ** Attachment added: "Log files from the hpcc units in error."
>
> https://bugs.launchpad.net/charms/+bug/1272083/+attachment/4008636/+files/hpcc_logs.tar.gz
>

Xiaoming Wang (xwang2713) wrote :

Hi Matt,

I thought about "set -e". It will cause the shell script to terminate if any
statement returns a non-zero value.
I understand the code is cleaner that way. But given the time limit (I heard
March 7 is the deadline for this charm to complete) I'd rather defer adding
"set -e". We do have some code that parses return codes (grep, etc.), and the
change would require more testing.

We prefer a minimal code change at this stage of the game unless it is a
must-fix issue, for example the hpcc start failure (roxie fails to start in
your log).

Let me know. I have included my manager, Ort Stuart, in our discussion.

Thanks


Xiaoming Wang (xwang2713) wrote :

Hi Matt,

Here are two typical cases in which roxie fails to start:
1) Not enough system memory. We recommend 4GB. With 2GB it may be OK to
   start it and do some basic functions.
2) Network resources with juju-local. In some environments juju-local does
   not start the Linux instance with the correct network resources; many
   things are missing under /proc/sys/net/core/. In particular,
   /proc/sys/net/core/rmem_max is missing, which causes the roxie process
   to fail to start:
   "/proc/sys/net/core/rmem_max value 0 is less than 131071"
   "EXCEPTION: (1455): System socket max read buffer is less than 131071"

I will check your case once I have roxie.log and the file listing under
/proc/sys/net/core.
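The missing-key case can be detected before roxie is even started. A hedged sketch, where the 131071 minimum comes from the error message above and the helper name is mine:

```shell
#!/bin/bash
# Pre-start sanity check: roxie needs /proc/sys/net/core/rmem_max to exist
# and be at least 131071; some LXC containers do not expose the key at all.
check_rmem_max() {
  local key="${1:-/proc/sys/net/core/rmem_max}" min=131071
  if [ ! -e "$key" ]; then
    echo "$key is missing; container networking looks broken" >&2
    return 1
  fi
  local val
  val=$(cat "$key")
  if [ "$val" -lt "$min" ]; then
    echo "$key value $val is less than $min" >&2
    return 1
  fi
}
```

Failing the start hook with this message would point users at the juju-local networking problem directly instead of a roxie exception later.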

Thanks


Matt Bruzek (mbruzek) wrote :

Xiaoming,

The last tar file included the file listing from /proc/sys/net/core/ for each system including cluster1/2 which was not in error state.

Here is cluster1/0 output:
$ ls /proc/sys/net/core/
somaxconn xfrm_acq_expires xfrm_aevent_etime xfrm_aevent_rseqth xfrm_larval_drop

I have attached a new tar file that has the logs from the /var/log/HPCCSystems/myroxie on each system.

It is important that the scripts end on the first error (set -e) so that the script does not continue in an error state. The error of one command could cause problems for following commands. For instance if the script was unable to write the LOCAL_IP_FILE and did not halt execution, a different hook could fail to read that file later and cause different problems.

Xiaoming Wang (xwang2713) wrote :

The /proc/sys/net/core listing shows the same problem as on my system: it is
missing the network resource settings.
Maybe you can help me open a bug against juju-local.
Here is the discussion thread on Ask Ubuntu (you actually helped me correct
the spelling and formatting, thanks):

http://askubuntu.com/questions/404969/error-net-core-wmem-default-is-an-unknown-key-on-juju-local-provider

I will look at how much effort is needed to add "set -e" and make the
necessary hook code changes.

Thanks

On Wed, Mar 5, 2014 at 2:34 PM, Matt Bruzek <email address hidden>wrote:

>
> ** Attachment added: "Log files from the hpcc units myroxie directories."
>
> https://bugs.launchpad.net/charms/+bug/1272083/+attachment/4008744/+files/hpcc_logs.tar.gz
>

Xiaoming Wang (xwang2713) wrote :

Hi Matt,

I added "-e" at the top of each bash script under hooks ("#!/bin/bash -e").
I will test it and talk with my manager before I check in today.

For the unused hook hpcc-relation-changed: if I remove it, 'juju charm proof'
reports an info message, "I: relation hpcc has no hooks". And I must keep
'hpcc' in the "provides" field in metadata.yaml, otherwise I get a warning.

Thanks


Matt Bruzek (mbruzek) wrote :

There is no juju-local project. As a launchpad user you can open a bug against juju-core on this issue.

https://bugs.launchpad.net/juju-core

Xiaoming Wang (xwang2713) wrote :

I will do. Thanks


Xiaoming Wang (xwang2713) wrote :

Again, thanks for taking the time to review the HPCC charm. Based on the 5th
review we made the following changes:
1) Added the '-e' bash option to all hook scripts, so if any statement returns
    a non-zero value the hook script stops execution.
2) Removed the unused upgrade-charm and hpcc-relation-changed hooks. The latter
    results in an "INFO" level message from 'juju charm proof', as expected.
3) The reported failure of the HPCC roxie process to start is due to a missing
    network resource (/proc/sys/net/core/rmem_max). It only happens in some
    Juju local environments. We will open a bug against juju-core.

Changed in charms:
status: Incomplete → New
Xiaoming Wang (xwang2713) wrote :

During testing we found that the string returned by config-get has its
newlines replaced with spaces. This prevents us from using an ssh private key
defined through config.yaml. I opened a bug report on the launchpad juju-core
project: bug #1288960.

Here is the workaround on the HPCC side: re-add the newlines after retrieving
the ssh private key value from config-get.
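One way such a workaround can be sketched in bash (the function name is mine, not the charm's, and this assumes a traditional "RSA PRIVATE KEY" PEM header). The header and footer contain real spaces, so they are split off before the base64 body's spaces are turned back into newlines:

```shell
#!/bin/bash
# Rebuild the newlines config-get flattened to spaces (juju-core bug #1288960).
restore_key_newlines() {
  local flat="$1"
  local body="${flat#*-----BEGIN RSA PRIVATE KEY----- }"   # drop header
  body="${body% -----END RSA PRIVATE KEY-----}"            # drop footer
  printf '%s\n' '-----BEGIN RSA PRIVATE KEY-----'
  printf '%s\n' "$body" | tr ' ' '\n'                      # one line per chunk
  printf '%s\n' '-----END RSA PRIVATE KEY-----'
}
```

For example, a flattened key with three base64 chunks comes back out as a five-line PEM file.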

Marco Ceppi (marcoceppi) on 2014-03-25
Changed in charms:
status: New → Incomplete
Xiaoming Wang (xwang2713) wrote :

Hi Marco,

I received a note that the hpcc charm (launchpad bug 1272083
<https://bugs.launchpad.net/bugs/1272083>) status changed from "new" to
"incomplete", but I didn't see any review feedback or comments.

Let me know if there is anything I need to do.

Thanks


Xiaoming Wang (xwang2713) wrote :

Update to the hpcc charm:
1) Add support for HPCC 5.0.0
2) Enable HPCC reconfiguration after an hpcc node departs
3) Some minor fixes, for example the dependencies directory name
4) Add support for Ubuntu trusty; repo: lp:~xwang2713/charms/trusty/hpcc/trunk. It has the same code base except for the HPCC checksum for Ubuntu trusty.

Jorge Castro (jorge) on 2014-08-20
Changed in charms:
status: Incomplete → Fix Committed
Matt Bruzek (mbruzek) wrote :

Hello Xiaoming,

Thanks again for the submission of the HPCC charm. I am sorry for the delay since the last review. Our review queue is quite large and we are getting to each one as fast as we can.

I was able to get the HPCC charm to deploy on a 4GB machine on HP-cloud (OpenStack). I increased the memory for the machine, by calling: “juju set-constraints mem=4GB” after a bootstrap was complete. The HPCC charm deployed and looked to be working properly!

Please note that the hpcc charm is currently available in the charm store under your personal name space at: http://manage.jujucharms.com/~xwang2713/precise/hpcc

Anyone can deploy this charm from the Charm Store using the command:
juju deploy cs:~xwang2713/precise/hpcc

Charms must pass review to get in the official Charm Store, and your submission is very close to passing. My review turned up just a few more things that need addressing.

#Proof

There was one informational message when I ran the “charm proof” command:
I: relation hpcc has no hooks

Informational messages are OK, and I see in the past review comments that you wish to add hpcc relations in the future.

#Review

Now that the hpcc charm deploys I tried to change some configuration options and they did not change on the deployed charm.

The hpcc charm has 14 configuration values, some of which are immutable (can not be changed after the hpcc charm is installed/created). This breaks the Juju user experience. The Juju user expects that when they set a configuration value the charm handles it accordingly. Juju calls the “config-changed” hook when the user changes a value, and some of the configuration options are not processed by the config-changed hook.

Say, for example, the user wants to change hpcc-version to a new version of the code after hpcc has already been installed: only the “config-changed” hook is called and the hpcc version is not changed, but the user would not know this and would expect to be using the new version. Another example is the public and private keys: say the user wants to use new keys with the hpcc cluster and calls “juju set hpcc ssh-key-private=<private_key>” after deployment. The keys are only set in the install hook or the hpcc-cluster-relation-joined hook, and would not get updated.

Immutable configuration should only be used to prevent data loss, or in other important cases. I would highly recommend refactoring the code to process all the configuration options in the config-changed hook or in the hpcc-common file that is called from config-changed.
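That recommendation can be sketched as a config-changed hook that re-applies every option on each run. This is an illustrative example, not the charm's code: the `cfg` function stubs Juju's real `config-get` hook tool so the snippet runs outside a hook environment, and the state file, values, and upgrade step are hypothetical.

```shell
#!/bin/sh -e
# Stand-in for Juju's `config-get <key>` hook tool, so this sketch runs
# anywhere; a real hook would call config-get directly.
cfg() {
    case "$1" in
        hpcc-version)   echo "5.0.0-3" ;;         # placeholder value
        ssh-key-public) echo "fake-public-key" ;; # placeholder value
    esac
}

state=./installed-version    # hypothetical record of the installed version
wanted=$(cfg hpcc-version)
installed=$(cat "$state" 2>/dev/null || echo none)

# Re-check the version on every run instead of only in the install hook.
if [ "$wanted" != "$installed" ]; then
    echo "upgrading hpcc to $wanted"  # a real hook would install the package here
    echo "$wanted" > "$state"
fi

# Re-writing an unchanged key is harmless, so no immutability guard is needed.
cfg ssh-key-public > ./authorized_keys.demo
```

The point of the shape is that running the hook twice with unchanged settings is a no-op, while any changed setting takes effect on the next config-changed invocation.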

#Minor Issues.

The README.md file references a different version 4.3.0-4 of HPCC than what is in the config.yaml (version 5.0.0-3).

Some of the README formatting looks to be off when I converted markdown to HTML. In the General Usage section “juju status” is not indented properly because there needs to be an additional new line after “run”. The backquotes (`) are not needed when you properly indent the block 4 spaces.

I was not able to access the GUI after running the first two commands in General Usage. Add the command “juju expose” to the first group of commands before giving instruction to use the GUI.

Thanks again...

Changed in charms:
status: Fix Committed → Incomplete
Xiaoming Wang (xwang2713) wrote :

New fixes are submitted to lp:~xwang2713/charms/precise/hpcc/trunk and lp:~xwang2713/charms/trusty/hpcc/trunk.
The only differences between the two are config.yaml and README.md.

Here is the commit message:

Re-implement to make configuration not immutable

1. Simplify config.yaml to have 3 sets of configuration:
   a. HPCC package version and checksum
   b. SSH keys
   c. HPCC component configuration (thor/roxie)

2. Re-implement the config-changed hook
   a. Update the HPCC package. HPCC has a relatively short release cycle;
      usually every 2-3 months there is a new or updated release.
      Users can update the HPCC package with:
         juju set <hpcc service> hpcc-version=<new version> package-checksum=<checksum>

   b. Update the SSH keys:
         juju set <hpcc service> ssh-key-public=<new key> ssh-key-private=<new key>

   c. Update the HPCC cluster topology:
         juju set <hpcc service> thor-ratio=<new ratio> roxie-ratio=<new ratio>
      The ratios are based on the compute nodes, which is <number of unit nodes>
      minus <support nodes>.

3. Update README.md; in particular, add the 4GB memory requirement.

4. Make the scripts under hpcc/bin optional. There is a readme that explains the
   purpose of these scripts.

5. We still support the Juju local provider, but due to the problem reported in
   Launchpad bug 1288969 it cannot run multiple roxie nodes in an lxc environment.
   If users want to launch multiple nodes they need to set roxie-ratio to 0 in
   config.yaml.
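One plausible reading of the ratio arithmetic in item 2c, shown as a sketch. The interpretation (a proportional split of the compute nodes between Thor and Roxie) is an assumption for illustration, not taken from the charm code, and all numbers are made up.

```shell
#!/bin/sh
# All values are illustrative placeholders.
total_units=8        # units deployed via `juju add-unit`
support_nodes=1      # nodes reserved for support processes
thor_ratio=2
roxie_ratio=1

# Compute nodes = unit nodes minus support nodes, as described in 2c.
compute_nodes=$((total_units - support_nodes))

# Assumed interpretation: split the compute nodes proportionally.
thor_nodes=$(( compute_nodes * thor_ratio / (thor_ratio + roxie_ratio) ))
roxie_nodes=$(( compute_nodes - thor_nodes ))

echo "compute=$compute_nodes thor=$thor_nodes roxie=$roxie_nodes"
```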

Changed in charms:
status: Incomplete → New
Xiaoming Wang (xwang2713) wrote :

I read the automated test log; the bootstrap failed. I don't think it has anything to do with this charm (hpcc). If there is anything I need to fix or do, let me know.

Thanks

Review Queue (review-queue) wrote :

The results (PASS) are in and available here: http://reports.vapour.ws/charm-tests/charm-bundle-test-1209-results

Xiaoming,

Thanks for the update!

I attempted to deploy hpcc and got another failure. This one looked to be
related to the sha1sum not matching. I tried to set the right checksum value
from juju and redeploy the charm, and it did not work for me. Can you please
try a deploy on your end? Update the sha1sum and anything else that is needed.

Thank you,

   - Matt Bruzek <email address hidden>

On Wed, Oct 8, 2014 at 9:03 PM, Xiaoming Wang <email address hidden> wrote:

> I read automated test log. it is bootstrap failed. Don't think it is
> anything to do with this charm (hpcc). If anything I need to fix or do
> let me know.
>
> Thanks
>

Xiaoming Wang (xwang2713) wrote :

I just checked out both the precise and trusty branches from Launchpad:
   bzr checkout lp:~xwang2713/charms/precise/hpcc/trunk
   bzr checkout lp:~xwang2713/charms/trusty/hpcc/trunk

and deployed them on Amazon. Both install and start correctly.

Two things could be related to the error you see:
1) Our hpcc image only supports amd64, so make sure the bootstrap arch is amd64. Maybe we should add that to README.md.
2) The download was incomplete, blocked, or failed.

Also, we use md5sum instead of sha1sum.

Let me know.

Thanks

Xiaoming Wang (xwang2713) wrote :

If the install still fails, attach the relevant portion of the log and I will take a look.
Thanks

Matt Bruzek (mbruzek) wrote :

Xiaoming,

Thanks for sticking with the review process! The hpcc charm now deploys and scales up by adding an additional 3 nodes. I am very excited to include the HPCC charm in the Juju Charm Store!

The hpcc charm has been pushed to the charm store and after the (30 min) ingestion process should be available to deploy by typing:

juju deploy cs:precise/hpcc

The official bug tracker for the hpcc charm can be found here:

https://bugs.launchpad.net/charms/+source/hpcc

If you have any further questions/comments/concerns please contact us in #juju on Freenode.net or email the juju list at <email address hidden>

Thanks again for the work on this charm, and sticking with us on multiple review cycles!

Changed in charms:
status: New → Fix Released