Recently, Rackspace[1] released an OpenStack compute installer called ‘Alamo’. This is effectively a downloadable ISO that, after asking the user a few questions, installs a fully functional OpenStack compute cluster with integrated nova, glance, keystone and horizon components.
The installer is really easy to use. When you pop the Alamo ISO into your machine and boot up, one of the main questions you are asked is what type of node you want to install – controller (everything but nova-compute), or compute (only nova-compute). You need to install a controller node first, before you go ahead and install compute nodes to connect to the cluster.
Depending on what you’re doing, it’s quite likely that you’ll want to reinstall one of your compute nodes at some point, and unless you take a couple of extra steps you’re going to have issues: the reinstalled node will fail to register with the cluster.
So what’s happening here? Before we can fix it, we need to understand what the installer actually does when it installs a compute node. So, I hear you cry, ‘WHAT DOES THE INSTALLER DO??’ Well, this:
- boots using the Ubuntu/Debian installer
- asks you a few questions (most importantly: your IP address and hostname, and the IP address of the previously installed controller node)
- installs Ubuntu Precise (12.04)
- registers the machine with the Chef server (which runs inside a VM on the controller node)
- assigns itself the nova-compute Chef role
- runs chef-client, which then runs all the Chef recipes associated with the nova-compute role and thereby configures the node as a compute node in this cluster
See the step where the machine registers with the Chef server? Yep, that’s our problem right there. When we reinstall a compute node this way (giving it exactly the same hostname as before), it tries to register itself as a client with the Chef server again. But the Chef server won’t have any of it; as far as it’s concerned, it already has a client with that name and you’re just an impostor. You can see this by logging on to the controller node and running the following:
    root@mycontroller:~# knife node list
    compute1.example.com
    compute2.example.com
    mycontroller.example.com
See! I’m trying to reinstall compute1.example.com, but it’s already there!
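If you would rather check this from a script than by eye, a pair of shell functions like the following could do it. This is only a sketch, not something Alamo provides: the names `chef_has_node` and `chef_has_client` are made up for illustration, and it assumes `knife` is already configured on the controller as in the session above. It looks at the client list as well as the node list because the Chef server keeps the two separately, which is why the fix deletes both. It compares whole names rather than using a regex, and ignores leading whitespace in case your knife version indents its list output.

```shell
# Hypothetical helpers (not part of Alamo): succeed (exit status 0) if the Chef
# server already knows about a host. Assumes knife is configured on the
# controller. Note: if knife itself fails, these report "not found".

# Succeeds if $1 appears in `knife node list`.
chef_has_node() {
  # Compare the first whitespace-separated field of each line exactly, so
  # leading indentation is ignored and "." is not treated as a wildcard.
  knife node list | awk -v h="$1" '$1 == h { found = 1 } END { exit !found }'
}

# Succeeds if $1 appears in `knife client list`.
chef_has_client() {
  knife client list | awk -v h="$1" '$1 == h { found = 1 } END { exit !found }'
}

# Example, run on the controller before reinstalling compute1:
# if chef_has_node compute1.example.com || chef_has_client compute1.example.com; then
#   echo "compute1.example.com is still registered with Chef"
# fi
```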
Luckily, the fix is easy. Shut down the compute node that you are trying to install (otherwise it will keep trying to re-register itself every 30 seconds). Then, on the controller node, delete both the node and the client that Chef has on record for it:
    root@mycontroller:~# knife node delete -y compute1.example.com && knife client delete -y compute1.example.com
    Deleted node[compute1.example.com]
    Deleted client[compute1.example.com]
That’s it! Now restart the compute node and re-run the installer, and you will enjoy many minutes of success!
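If you end up doing this regularly (say, while testing the installer), the two knife commands can be wrapped in a small shell function on the controller. This is a hypothetical helper, not something Alamo ships, and `forget_chef_host` is a made-up name. One deliberate difference from the `&&` one-liner: it attempts the client deletion even if the node deletion fails, so a half-finished cleanup (node gone, client still there) can simply be re-run. It returns a non-zero status if either deletion failed. As before, shut the compute node down first.

```shell
# Hypothetical helper (not part of Alamo): remove a host's Chef node object and
# its API client so it can re-register under the same name after a reinstall.
# Run on the controller, with the compute node already shut down.
forget_chef_host() {
  local host="${1:-}"
  local status=0

  if [ -z "$host" ]; then
    echo "usage: forget_chef_host <fqdn>" >&2
    return 2
  fi

  # -y skips knife's confirmation prompt. Each deletion is attempted on its
  # own, so a missing node object does not stop the client from being removed.
  knife node delete -y "$host" || status=1
  knife client delete -y "$host" || status=1

  # 0 if both deletions succeeded, 1 if either failed.
  return "$status"
}
```

For example, run `forget_chef_host compute1.example.com` on the controller, then boot the compute node and run the installer again.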
[1] full disclosure – I work on the team at Rackspace that produced the Alamo installer