Early this year, I had the chance to take on a refactoring task: one of our big downstream features didn't scale. Here I share some of my learning notes, hoping they help you out there :-).

Steps to follow

1. Runnable Minimal Reproduction

For a distributed system that requires a heavy hardware setup, or, even worse, when you are trying to fix an issue that only shows up at large scale, we need a runnable minimal reproduction (RMR) method to decouple the work from the costly large-scale test or production environment. A handy RMR is crucial because it makes every iteration of the remaining steps repeatable.

In my case, when I started looking into this task, I found that the only environment at hand was already quite expensive for the team (15+ servers, each with 100+ GiB RAM and 40+ threads), yet it was still far from the scale I was optimizing for (the 1000+ server case).

How do we get an RMR?

The answer varies case by case:

  • Buy a 1000+ node cluster, or rent one pay-as-you-go on someone else's servers?
    • This may be the best choice if you work in a cool organization ;)
  • Set up a virtualized node cluster?
    • If your environment can be semi-virtualized, you can gain a 10x or 100x scale that comes closer to the real 1000+ node scale: put in some effort to set up, say, 10 VMs on each bare-metal server, so that with 15+ servers you can reproduce issues at the 150+ server scale. In my case, creating a virtualized environment was still very expensive because the workload depended on layers that are not straightforward to virtualize, and 150+ was still not enough for me.
  • Mock it!
    • I ended up isolating the module to be optimized and carefully mocking its interactions with other applications/processes using performance-wise mocks, i.e., mocks that mimic the latency of the real calls.

With the RMR implemented, I put it into the existing unit test (UT) code base, with performance benchmark criteria as one of the assertions on the function I optimized. One reason for putting the RMR code into the codebase as a UT was that some of the surrounding mock utilities were already there; the most important reason, of course, is that by doing so, these performance criteria are regression-tested on every single commit merged into the codebase in the future.

It's worth mentioning that implementing the RMR required some assumptions. For example, the involved database, RESTful API, and RPC calls were mocked with sleeps of certain durations, estimated from some experiments. The actual values may vary across environments, but values under these assumptions still support some level of evaluation and, more importantly, give us a before-and-after comparison of the refactoring (since, of course, we use the same assumptions for both).
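To make the idea concrete, here is a minimal shell sketch of a performance-wise mock plus a benchmark assertion. It is not the code from the project: `mock_rpc`, `sync_nodes`, the 20 ms latency, the 50-node size, and the 2000 ms budget are all hypothetical, and the timing assumes GNU `date` (for `%N`; on macOS, BSD `date` does not support it).

```shell
#!/usr/bin/env bash
# Sketch of an RMR-style performance check. All names and numbers are hypothetical:
#   - mock_rpc replaces a real RPC with a sleep of an estimated latency;
#   - sync_nodes stands in for the function being optimized;
#   - BUDGET_MS is the benchmark criterion the test asserts.
# Requires GNU date (for %N).
set -euo pipefail

MOCK_RPC_LATENCY=0.02   # estimated latency of one RPC call, in seconds
NODES=50                # simulated cluster size
BUDGET_MS=2000          # performance criterion, in milliseconds

mock_rpc() {
  # No network, no server: only the estimated cost of the real call.
  sleep "$MOCK_RPC_LATENCY"
}

sync_nodes() {
  # Hypothetical function under test: one sequential RPC per node.
  local n=$1 i
  for ((i = 0; i < n; i++)); do
    mock_rpc
  done
}

# Current time in milliseconds since the epoch.
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

start_ms=$(now_ms)
sync_nodes "$NODES"
elapsed_ms=$(( $(now_ms) - start_ms ))

echo "elapsed: ${elapsed_ms} ms, budget: ${BUDGET_MS} ms"
if (( elapsed_ms > BUDGET_MS )); then
  echo "FAIL: performance regression" >&2
  exit 1
fi
echo "PASS"
```

Because the mock's cost is explicit, a refactor that reduces the number of calls (for example, by batching them) shows up directly as a lower elapsed time under the same assumptions, which is exactly the before-and-after comparison described above.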

2. Profiling and analysis

I started building a whole-picture view of the function in two ways at almost the same time: profiling and code reading.

Before profiling the function calls, I went through the code of the feature and drew all the call flows, annotated with their time complexity, using http://asciiflow.com/ like this:

Or drew them on a whiteboard like this:


This is a note on setting up a Kolla-Ansible OpenStack development environment on your working machine.

I referred to this document: https://docs.openstack.org/kolla-ansible/latest/contributor/vagrant-dev-env.html

I did this in May 2020 on my old Mac mini with Catalina ;-). With a 20 GiB disk and 6 GiB RAM assigned to a single Vagrant-spawned VM (VirtualBox), the OpenStack cluster runs pretty well. Compared to my previous setups (a manual multi-VM deployment and a VM-based Mirantis Fuel 9.0 deployment), it is noticeably more responsive thanks to the single-node setup.

Here I leave the notes as a reference to help you set up a playground more easily.


Install Vagrant and the needed plugins

Below is an example of doing this on a macOS machine; for other OSes, refer to here

brew install --cask virtualbox vagrant

vagrant plugin install vagrant-hostmanager vagrant-vbguest

# if we need to resize disks after vagrant up
vagrant plugin install vagrant-disksize
# if the Vagrant VM sits behind a proxy
vagrant plugin install vagrant-proxyconf

Clone the Kolla repos

We need to clone the three repositories below.

git clone https://opendev.org/openstack/kolla-cli
git clone https://opendev.org/openstack/kolla-ansible
git clone https://opendev.org/openstack/kolla

Customise Vagrant scripts

I found that the out-of-the-box configuration files had been broken and unmaintained for some time, so the changes below to Vagrantfile, Vagrantfile.custom, and bootstrap.sh are needed.

For Vagrantfile.custom, we need to use Ubuntu 18.04 instead of 16.04, as 18.04 ships Python 3.6+ in its apt repository.


This is a note on running an Ubuntu server on iPadOS/iOS with UTM; I am running UTM 0.8 now.

Download image

wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

Configure a password for the image. Here we use chroot instead of a config drive; the reason is given at the bottom.

Use chroot to set the password for the cloud image as follows:

$ sudo modprobe nbd max_part=1
$ sudo qemu-nbd --connect=/dev/nbd0 bionic-server-cloudimg-amd64.img
$ sudo mkdir -p /mnt/nbd
$ sudo mount /dev/nbd0p1 /mnt/nbd/ -o noatime
$ sudo chroot /mnt/nbd
ubuntu# passwd    # sets the root password inside the image
ubuntu# exit
$ sudo umount /mnt/nbd    # unmount first, then release the nbd device
$ sudo qemu-nbd --disconnect /dev/nbd0
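If you prefer a non-interactive version of these steps (for example, to script image preparation), the sketch below uses `chpasswd`, which reads `user:password` pairs from stdin, instead of the interactive `passwd`. This is a swapped-in technique, not the method used in this note. The function name `set_image_root_password` is hypothetical, and the sketch assumes a Linux host with `qemu-nbd` (from qemu-utils) installed, `/dev/nbd0` not in use, and the root filesystem on partition 1 of the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the root password of an Ubuntu cloud image
# without an interactive prompt. Assumes qemu-nbd is installed, /dev/nbd0
# is free, and the root filesystem is on partition 1 of the image.
set -euo pipefail

set_image_root_password() {
  local image=$1 password=$2
  local dev=/dev/nbd0 mnt=/mnt/nbd

  sudo modprobe nbd max_part=1
  sudo qemu-nbd --connect="$dev" "$image"
  # If a later step fails, still unmount and release the nbd device.
  trap "sudo umount '$mnt' 2>/dev/null || true; sudo qemu-nbd --disconnect '$dev'" EXIT
  sleep 1  # give the kernel a moment to create ${dev}p1

  sudo mkdir -p "$mnt"
  sudo mount -o noatime "${dev}p1" "$mnt"
  # chpasswd reads "user:password" lines from stdin; here it sets root's password.
  echo "root:${password}" | sudo chroot "$mnt" chpasswd

  # Normal path: unmount first, then disconnect, then drop the safety trap.
  sudo umount "$mnt"
  sudo qemu-nbd --disconnect "$dev"
  trap - EXIT
}
```

Usage (the password is a placeholder): `set_image_root_password bionic-server-cloudimg-amd64.img 'change-me'`. As in the manual steps, the filesystem is unmounted before the nbd device is disconnected.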