repo_container on arm host contains packages for x86 hosts only

Bug #1703618 reported by Martin
This bug affects 2 people
Affects: OpenStack-Ansible
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I am setting up openstack-ansible on a heterogeneous cluster consisting of an x86 control node and arm compute nodes. The x86 host is also used as the deployment host. However, setup of the nova services on the arm hosts fails during TASK [pip_install : Get Modern PIP] with Error 404:
ok: [odroid] => {"changed": false, "dest": "/opt/get-pip.py", "failed": true, "failed_when_result": false, "gid": 0, "group": "root", "invocation": {"module_args": {"backup": false, "checksum": "", "content": null, "delimiter": null, "dest": "/opt/get-pip.py", "directory_mode": null, "follow": false, "force": true, "force_basic_auth": false, "group": null, "headers": null, "http_agent": "ansible-httpget", "mode": null, "owner": null, "regexp": null, "remote_src": null, "selevel": null, "serole": null, "setype": null, "seuser": null, "sha256sum": "", "src": null, "timeout": 10, "tmp_dest": "", "url": "http://172.29.236.1:8181/os-releases/14.2.6/ubuntu-16.04-armv7l/get-pip.py", "url_password": null, "url_username": null, "use_proxy": true, "validate_certs": false}, "module_name": "get_url"}, "mode": "0644", "msg": "Request failed", "owner": "root", "response": "HTTP Error 404: Not Found", "size": 1595408, "state": "file", "status_code": 404, "uid": 0, "url": "http://172.29.236.1:8181/os-releases/14.2.6/ubuntu-16.04-armv7l/get-pip.py"}

The reason for this failure is that the repo_container on the control host only contains the repositories for x86. Consequently, the requested URL, which includes the armv7l architecture in its path, does not exist on the repo server.
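
To illustrate why the request returns 404, here is a minimal Python sketch of how the per-architecture URL seen in the log is composed. The names `repo_url`, `is_served` and `built_targets`, and the `ubuntu-16.04-x86_64` directory name, are assumptions made for illustration; only the URL layout is taken from the failing request.

```python
def repo_url(base, release, distro, distro_version, arch, filename):
    """Compose a repo URL following the layout seen in the failing request."""
    return f"{base}/os-releases/{release}/{distro}-{distro_version}-{arch}/{filename}"


# Assumption: the targets a repo server built on an x86 host would hold.
built_targets = {"ubuntu-16.04-x86_64"}


def is_served(distro, distro_version, arch):
    """Return True if the repo server has a directory for this target."""
    return f"{distro}-{distro_version}-{arch}" in built_targets


url = repo_url("http://172.29.236.1:8181", "14.2.6", "ubuntu", "16.04", "armv7l", "get-pip.py")
print(url)
print(is_served("ubuntu", "16.04", "armv7l"))  # False: no armv7l directory, hence the 404
print(is_served("ubuntu", "16.04", "x86_64"))  # True
```

The arm host asks for a path that embeds its own architecture, so a repo server that only built x86 content has nothing to serve at that location.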

To fix this problem, I set up a second repo_container on an arm host to obtain the repositories for the arm architecture. However, I found that this container also only contains the repositories for the x86 architecture.

Encountered bug:
Repositories seem to be built only for the architecture of the control node or deployment host. Deployment of services on hosts with different architecture subsequently fails.

Expected behavior:
Repositories for each architecture existing in the cluster should be available, especially when repository containers on hosts of each architecture are created.

In case you have any idea how to solve this problem, you are very welcome to share your thoughts.

Tags: arm
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

A fix is on its way: https://review.openstack.org/#/c/484812/

Is this for Newton only?

Jean-Philippe Evrard (jean-philippe-evrard) wrote :

I will backport the change above to Newton. It should fix your issue.

Martin (marniemann) wrote :

I only tested this on Newton.

Thanks :)

Major Hayden (rackerhacker) wrote :

Martin -- Are you able to test that the patch helped?

Changed in openstack-ansible:
assignee: nobody → Jean-Philippe Evrard (jean-philippe-evrard)
Martin (marniemann) wrote :

I am sorry, but things seem to be broken somewhere else...

What I did:
- Get fresh systems on control and compute node.
- Load the current version of openstack-ansible from stable/newton
- bootstrap the control node
- run playbooks
- redefine the variable lxc_architecture_mapping in user_variables as proposed in https://bugs.launchpad.net/openstack-ansible/+bug/1703612/ (this change is required, otherwise lxc-create fails because the arch argument is empty)
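
The role of such a mapping can be sketched in Python as follows. The dictionary contents and the helper `lxc_arch` are illustrative assumptions, not the role's actual defaults; the point is only that a machine name missing from the mapping leaves the arch argument empty.

```python
# Assumed mapping from `uname -m` machine names to LXC arch names.
default_mapping = {"x86_64": "amd64"}

# Hypothetical override, analogous to redefining lxc_architecture_mapping.
override_mapping = {**default_mapping, "armv7l": "armhf"}


def lxc_arch(machine, mapping):
    """Return the arch string for lxc-create, or '' if the machine is unmapped."""
    return mapping.get(machine, "")


print(repr(lxc_arch("armv7l", default_mapping)))   # '' -> lxc-create gets an empty arch
print(repr(lxc_arch("armv7l", override_mapping)))  # 'armhf'
```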

What I get:
- lxc container preparation fails during setup-hosts
- Detailed error message with max verbosity (haec2 is the control node's hostname):
TASK [lxc_hosts : Prepare cached image setup commands] *************************
task path: /etc/ansible/roles/lxc_hosts/tasks/lxc_cache_preparation.yml:71
container_name: "haec2"
physical_host: "haec2"
container_name: "haec2"
physical_host: "haec2"
<192.168.0.100> ESTABLISH SSH CONNECTION FOR USER: root
<192.168.0.100> SSH: EXEC ssh -C -q -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=5 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ServerAliveInterval=64 -o ServerAliveCountMax=1024 -o Compression=no -o TCPKeepAlive=yes -o VerifyHostKeyDNS=no -o ForwardX11=no -o ForwardAgent=yes -T -o ControlPath=/home/stack/.ansible/cp/ansible-ssh-%h-%p-%r 192.168.0.100 '/bin/sh -c '"'"'LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python && sleep 0'"'"''
fatal: [haec2]: FAILED! => {"changed": true, "cmd": ["chroot", "/var/lib/lxc/LXC_NAME/rootfs", "/usr/local/bin/cache-prep-commands.sh"], "delta": "0:00:00.003277", "end": "2017-07-26 14:23:42.061475", "failed": true, "invocation": {"module_args": {"_raw_params": "chroot /var/lib/lxc/LXC_NAME/rootfs /usr/local/bin/cache-prep-commands.sh", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 126, "start": "2017-07-26 14:23:42.058198", "stderr": "chroot: failed to run command ‘/usr/local/bin/cache-prep-commands.sh’: Exec format error", "stdout": "", "stdout_lines": [], "warnings": []}

Martin (marniemann) wrote :

Well it looks like I have to eat my words from comment #5.
I had the wrong openstack_user_config in use. Someone must have copied it from the wrong path...

With the correct configuration and a freshly loaded OSA from stable/newton:
- I do *not* get armhf repos with a repo_container only on the x86 node
- With repo_containers on (each) x86 and arm node I do get arm repos on the arm node.

However, in the latter case the x86 repos are built neither on the arm node nor on the x86 node.
I will have a look into this issue tomorrow and keep you updated.

Martin (marniemann) wrote :

So, I ran both repo setups again to test for which architecture repos are built:

- With a repo_container only on the x86 node, I only get the repos for the x86 architecture
  - During TASK [Prepare group of master repo servers] the only changed item is:
    changed: [localhost] => (item=repo_servers_16.04_x86_64)
- With a repo_container on each node, repos for both architectures are built
  - During TASK [Prepare group of master repo servers] two items are changed as expected:
    changed: [localhost] => (item=repo_servers_16.04_armv7l)
    changed: [localhost] => (item=repo_servers_16.04_x86_64)
  - What confused me yesterday was that this process is executed not in parallel but in sequence for each arch.
  - I was not yet able to test if the repos are accessed correctly
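
The group names in the task output suggest that repo servers are grouped by distribution version and architecture. A minimal Python sketch of that grouping follows; the host names and the `group_repo_servers` helper are hypothetical, and this is not the playbook's actual code.

```python
from collections import defaultdict


def group_repo_servers(hosts):
    """Group repo hosts by (distro version, architecture).

    `hosts` maps a host name to a (distro_version, architecture) tuple.
    """
    groups = defaultdict(list)
    for name, (version, arch) in sorted(hosts.items()):
        groups[f"repo_servers_{version}_{arch}"].append(name)
    return dict(groups)


# Hypothetical inventories: repo container on x86 only vs. on each node.
x86_only = {"repo-x86": ("16.04", "x86_64")}
both = {"repo-x86": ("16.04", "x86_64"), "repo-arm": ("16.04", "armv7l")}

print(sorted(group_repo_servers(x86_only)))  # ['repo_servers_16.04_x86_64']
print(sorted(group_repo_servers(both)))      # ['repo_servers_16.04_armv7l', 'repo_servers_16.04_x86_64']
```

Under this reading, an architecture with no repo host produces no group at all, which would match the observation that nothing is built for arm when the only repo_container sits on the x86 node.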

Jean-Philippe Evrard (jean-philippe-evrard) wrote :

Hello Martin.

Newton being a stable branch, we are more conservative with backports.
However, it looks like you're hitting a bug.

Could you try running this in your branch?
https://github.com/openstack/openstack-ansible/blob/8a3ac94033e7f33ad17299154ed2930f3924b1d6/playbooks/repo-build.yml#L16-L43

That should solve it.
Please tell me if you're OK with a backport.

Changed in openstack-ansible:
assignee: Jean-Philippe Evrard (jean-philippe-evrard) → nobody
Changed in openstack-ansible:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for openstack-ansible because there has been no activity for 60 days.]

Changed in openstack-ansible:
status: Incomplete → Expired