Nova scheduling study: making it smart with machine learning

Update (June 2019): I managed to implement it, check here ;-) I will write more on how it was done…

This note is my brain dump of ideas for machine-learning-enabled optimization of Nova scheduler weighing.

How does the existing weighing work?

Short version conclusion

By default, it simply applies all existing weighers, each with a weight multiplier of 1.0.

TL;DR

See this reference first: https://www.slideshare.net/guptapeeyush1/presentation1-23249150

The weighing was called by:

  • self.weight_handler.get_weighed_objects(self.weighers)

    • self.weighers comes from CONF.scheduler_weight_classes (defined as host_mgr_sched_wgt_cls_opt)

      • By default it’s all weighers

        default=["nova.scheduler.weights.all_weighers"]

    • get_weighed_objects is doing this:

      for i, weight in enumerate(weights):
          obj = weighed_objs[i]
          obj.weight += weigher.weight_multiplier() * weight

      The subroutines mentioned above are shown below…

nova/scheduler/host_manager.py

class HostManager(object):
    """Base HostManager class."""

    # Can be overridden in a subclass
    def host_state_cls(self, host, node, **kwargs):
        return HostState(host, node)

    def __init__(self):
        self.host_state_map = {}
        self.filter_handler = filters.HostFilterHandler()
        filter_classes = self.filter_handler.get_matching_classes(
            CONF.scheduler_available_filters)
        self.filter_cls_map = {cls.__name__: cls for cls in filter_classes}
        self.filter_obj_map = {}
        self.default_filters = self._choose_host_filters(self._load_filters())
        self.weight_handler = weights.HostWeightHandler()  # <---------
        weigher_classes = self.weight_handler.get_matching_classes(
            CONF.scheduler_weight_classes)
        self.weighers = [cls() for cls in weigher_classes]  # <---------
        # ...

    def get_weighed_hosts(self, hosts, spec_obj):  # <---------
        """Weigh the hosts."""
        return self.weight_handler.get_weighed_objects(self.weighers,
                                                       hosts, spec_obj)

nova/conf/scheduler.py

host_mgr_sched_wgt_cls_opt = cfg.ListOpt("scheduler_weight_classes",
        default=["nova.scheduler.weights.all_weighers"],
        help="""
This is a list of weigher class names. Only hosts which pass the filters are
weighed. The weight for any host starts at 0, and the weighers order these
hosts by adding to or subtracting from the weight assigned by the previous
weigher. Weights may become negative.

An instance will be scheduled to one of the N most-weighted hosts, where N is
'scheduler_host_subset_size'.

By default, this is set to all weighers that are included with Nova. If you
wish to change this, replace this with a list of strings, where each element is
the path to a weigher.

This option is only used by the FilterScheduler and its subclasses; if you use
a different scheduler, this option has no effect.

* Services that use this:

``nova-scheduler``

* Related options:

None
""")

#...

io_ops_weight_mult_opt = cfg.FloatOpt("io_ops_weight_multiplier",
        default=-1.0,
        help="""
This option determines how hosts with differing workloads are weighed. Negative
values, such as the default, will result in the scheduler preferring hosts with
lighter workloads whereas positive values will prefer hosts with heavier
workloads. Another way to look at it is that positive values for this option
will tend to schedule instances onto hosts that are already busy, while
negative values will tend to distribute the workload across more hosts. The
absolute value, whether positive or negative, controls how strong the io_ops
weigher is relative to other weighers.

This option is only used by the FilterScheduler and its subclasses; if you use
a different scheduler, this option has no effect. Also note that this setting
only affects scheduling if the 'io_ops' weigher is enabled.

Valid values are numeric, either integer or float.

* Services that use this:

``nova-scheduler``

* Related options:

None
""")
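A quick toy illustration (not Nova code) of the sign semantics described above: after normalization the busier host gets the higher raw io_ops score, so a negative multiplier ranks it lower (spread the load) while a positive one ranks it higher (pack onto busy hosts).

```python
# Toy illustration of io_ops_weight_multiplier sign semantics (not Nova code).
# Raw score = outstanding I/O operations per host; scores are normalized
# to [0, 1] before the multiplier is applied, as Nova's weigh pipeline does.

def rank(io_ops_by_host, multiplier):
    lo, hi = min(io_ops_by_host.values()), max(io_ops_by_host.values())
    weights = {
        host: multiplier * (ops - lo) / (hi - lo)
        for host, ops in io_ops_by_host.items()
    }
    # Hosts sorted by final weight, best first
    return sorted(weights, key=weights.get, reverse=True)

io_ops = {"busy": 20, "idle": 1}
print(rank(io_ops, -1.0))  # ['idle', 'busy'] -> spread the load (default)
print(rank(io_ops, 1.0))   # ['busy', 'idle'] -> pack onto busy hosts
```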

We can list all the weight multiplier options:

$ grep _multiplier  nova/conf/scheduler.py  -A3 -B3 | grep _opt
ram_weight_mult_opt = cfg.FloatOpt("ram_weight_multiplier",

disk_weight_mult_opt = cfg.FloatOpt("disk_weight_multiplier",

io_ops_weight_mult_opt = cfg.FloatOpt("io_ops_weight_multiplier",

metrics_weight_opts = [
    cfg.FloatOpt("weight_multiplier",
        default=1.0,

And for metrics_weight_opts, the available metric names are:

cpu.user.percent
cpu.user.time
cpu.iowait.percent
cpu.iowait.time
cpu.frequency
cpu.idle.percent
cpu.idle.time
cpu.percent
cpu.kernel.time
cpu.kernel.percent

Example configuration:

[metrics]
weight_multiplier = -0.1
weight_setting = cpu.iowait.percent=-1.0

Existing weighers

~/openstack/nova/nova/scheduler
❯ tree weights
weights
├── __init__.py
├── affinity.py
├── cpu.py
├── disk.py
├── io_ops.py
├── metrics.py
└── ram.py
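Each of those modules follows the same small pattern: override weight_multiplier() to read a config option, and _weigh_object() to return a raw score. As a rough sketch (simplified, not the actual Nova source), ram.py essentially does this:

```python
# Simplified sketch of what a weigher module like ram.py looks like
# (illustrative only, not the actual Nova source).

class BaseWeigher:
    """Stand-in for nova.weights.BaseWeigher (see nova/weights.py below)."""
    minval = None
    maxval = None

    def weight_multiplier(self):
        return 1.0

    def _weigh_object(self, obj, weight_properties):
        raise NotImplementedError

class RAMWeigher(BaseWeigher):
    def weight_multiplier(self):
        # The real code reads CONF.ram_weight_multiplier here.
        return 1.0

    def _weigh_object(self, host_state, weight_properties):
        """More free RAM -> higher raw weight (before normalization)."""
        return host_state.free_ram_mb

class HostState:
    """Minimal stand-in for the scheduler's HostState."""
    def __init__(self, free_ram_mb):
        self.free_ram_mb = free_ram_mb

print(RAMWeigher()._weigh_object(HostState(1024), {}))  # 1024
```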

nova/weights.py

# Copyright (c) 2011-2012 OpenStack Foundation
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

"""
Pluggable Weighing support
"""

import abc
import six

from nova import loadables


def normalize(weight_list, minval=None, maxval=None):
    """Normalize the values in a list between 0 and 1.0.

    The normalization is made regarding the lower and upper values present in
    weight_list. If the minval and/or maxval parameters are set, these values
    will be used instead of the minimum and maximum from the list.

    If all the values are equal, they are normalized to 0.
    """

    if not weight_list:
        return ()

    if maxval is None:
        maxval = max(weight_list)

    if minval is None:
        minval = min(weight_list)

    maxval = float(maxval)
    minval = float(minval)

    if minval == maxval:
        return [0] * len(weight_list)

    range_ = maxval - minval
    return ((i - minval) / range_ for i in weight_list)


class WeighedObject(object):
    """Object with weight information."""
    def __init__(self, obj, weight):
        self.obj = obj
        self.weight = weight

    def __repr__(self):
        return "<WeighedObject '%s': %s>" % (self.obj, self.weight)


@six.add_metaclass(abc.ABCMeta)
class BaseWeigher(object):
    """Base class for pluggable weighers.

    The attributes maxval and minval can be specified to set up the maximum
    and minimum values for the weighed objects. These values will then be
    taken into account in the normalization step, instead of taking the values
    from the calculated weights.
    """

    minval = None
    maxval = None

    def weight_multiplier(self):  # <-------- the weigher's weighting factor
        """How weighted this weigher should be.

        Override this method in a subclass, so that the returned value is
        read from a configuration option to permit operators specify a
        multiplier for the weigher.
        """
        return 1.0

    @abc.abstractmethod
    def _weigh_object(self, obj, weight_properties):
        """Weigh a specific object."""

    def weigh_objects(self, weighed_obj_list, weight_properties):
        """Weigh multiple objects.

        Override in a subclass if you need access to all objects in order
        to calculate weights. Do not modify the weight of an object here,
        just return a list of weights.
        """
        # Calculate the weights
        weights = []
        for obj in weighed_obj_list:
            weight = self._weigh_object(obj.obj, weight_properties)

            # Record the min and max values if they are None. If they are
            # anything but None we assume that the weigher has set them
            if self.minval is None:
                self.minval = weight
            if self.maxval is None:
                self.maxval = weight

            if weight < self.minval:
                self.minval = weight
            elif weight > self.maxval:
                self.maxval = weight

            weights.append(weight)

        return weights


class BaseWeightHandler(loadables.BaseLoader):
    object_class = WeighedObject

    def get_weighed_objects(self, weighers, obj_list, weighing_properties):  # <-------
        """Return a sorted (descending), normalized list of WeighedObjects."""
        weighed_objs = [self.object_class(obj, 0.0) for obj in obj_list]

        if len(weighed_objs) <= 1:
            return weighed_objs

        for weigher in weighers:
            weights = weigher.weigh_objects(weighed_objs, weighing_properties)

            # Normalize the weights
            weights = normalize(weights,
                                minval=weigher.minval,
                                maxval=weigher.maxval)

            for i, weight in enumerate(weights):
                obj = weighed_objs[i]
                obj.weight += weigher.weight_multiplier() * weight  # <--------- calculation

        return sorted(weighed_objs, key=lambda x: x.weight, reverse=True)
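To see the whole pipeline in one place, here is a minimal self-contained sketch (toy code, not Nova's) of normalize plus get_weighed_objects, with two made-up weighers mimicking the ram and io_ops ones:

```python
# Minimal stand-alone sketch of Nova's weighing pipeline (not Nova code).
# Each weigher's raw scores are normalized to [0, 1], then combined via
# per-weigher multipliers, as get_weighed_objects does above.

def normalize(weights, minval=None, maxval=None):
    if not weights:
        return []
    maxval = float(max(weights) if maxval is None else maxval)
    minval = float(min(weights) if minval is None else minval)
    if minval == maxval:
        return [0.0] * len(weights)
    return [(w - minval) / (maxval - minval) for w in weights]

def get_weighed_hosts(hosts, weighers):
    """weighers: list of (multiplier, score_fn) pairs. Returns host names,
    best first."""
    totals = {h["name"]: 0.0 for h in hosts}
    for multiplier, score_fn in weighers:
        scores = normalize([score_fn(h) for h in hosts])
        for host, score in zip(hosts, scores):
            totals[host["name"]] += multiplier * score
    return sorted(totals, key=totals.get, reverse=True)

hosts = [
    {"name": "host1", "free_ram_mb": 512, "num_io_ops": 10},
    {"name": "host2", "free_ram_mb": 2048, "num_io_ops": 2},
]
weighers = [
    (1.0, lambda h: h["free_ram_mb"]),   # prefer more free RAM
    (-1.0, lambda h: h["num_io_ops"]),   # penalize busy hosts
]
print(get_weighed_hosts(hosts, weighers))  # ['host2', 'host1']
```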

nova.scheduler.weights.metrics

ref:

weight_setting in nova/conf/scheduler.py

cfg.ListOpt("weight_setting",
    default=[],
    help="""
This setting specifies the metrics to be weighed and the relative ratios for
each metric. This should be a single string value, consisting of a series of
one or more 'name=ratio' pairs, separated by commas, where 'name' is the name
of the metric to be weighed, and 'ratio' is the relative weight for that
metric.

Note that if the ratio is set to 0, the metric value is ignored, and instead
the weight will be set to the value of the 'weight_of_unavailable' option.

As an example, let's consider the case where this option is set to:

``name1=1.0, name2=-1.3``

The final weight will be:

``(name1.value * 1.0) + (name2.value * -1.3)``

This option is only used by the FilterScheduler and its subclasses; if you use
a different scheduler, this option has no effect.
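The formula above is easy to sketch (toy code; the metric names "name1"/"name2" and their values are just the hypothetical ones from the example): parse the 'name=ratio' pairs and take the dot product with the host's metric values:

```python
# Toy sketch of how weight_setting turns host metrics into a single weight
# (metric names and values here are hypothetical, from the example above).

def parse_weight_setting(setting):
    """'name1=1.0, name2=-1.3' -> {'name1': 1.0, 'name2': -1.3}"""
    pairs = (item.split("=") for item in setting.split(",") if item.strip())
    return {name.strip(): float(ratio) for name, ratio in pairs}

def metrics_weight(host_metrics, ratios):
    """Final weight = sum of metric_value * ratio over all configured metrics."""
    return sum(host_metrics[name] * ratio for name, ratio in ratios.items())

ratios = parse_weight_setting("name1=1.0, name2=-1.3")
print(metrics_weight({"name1": 2.0, "name2": 1.0}, ratios))  # (2.0 * 1.0) + (1.0 * -1.3) ≈ 0.7
```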

Existing monitors in Mitaka are CPU-related only:

openstack/nova/nova/compute/monitors 
❯ tree
.
├── __init__.py
├── base.py
└── cpu
├── __init__.py
└── virt_driver.py

What could we do here?

I came up with two possibilities.

Idea one: tune existing weighers' mult_opt or weight_setting

The factors are the existing weighers; we just tune their mult_opt or weight_setting values via machine learning. Frankly, the default values might never have been properly studied.

To do that, we need to run an edge case of massive concurrent VM/VNF instantiations and define a benchmark that evaluates the “good” result we want. Then we can derive optimized tuning for the existing weighers' mult_opt or weight_setting values.

  • The key is to push the environment to its limit: in that edge case, boot time, retry counts and success rate may not be 100% good
  • The way we collect data should be well considered
    • We could take one typical VNF as input, to ensure the result benefits at least one scenario
    • Or take as many VNFs as possible, which could give less biased results (though that may be useless at a small data-collection scale)
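The tuning loop for idea one can be sketched as below. Everything here is hypothetical: Bayesian optimization (e.g. GPyOpt, mentioned at the end of this note) would be the real optimizer, but plain random search shows the shape of the loop, and benchmark() is a stand-in for actually running the massive-instantiation edge case and scoring boot time, retries and success rate:

```python
import random

# Hypothetical sketch of idea one: search for good weigher multipliers.
# A real setup would replace random search with Bayesian optimization
# (e.g. GPyOpt) and benchmark() with a real edge-case test run.

def benchmark(ram_mult, io_ops_mult):
    """Stand-in for the real benchmark: higher is better.

    Here it is a made-up function whose optimum is at (2.0, -1.0);
    the real score would combine boot time, retry count and success rate.
    """
    return -((ram_mult - 2.0) ** 2 + (io_ops_mult + 1.0) ** 2)

def tune(n_trials=200, seed=42):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        score = benchmark(*params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

params, score = tune()
print(params)  # should land near (2.0, -1.0)
```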

Idea two: introduce a new dynamic weigher based on online learning

This work builds on the same groundwork as idea one, but with two differences:

  1. Introduce more factors as input for a new weigher that co-exists with the legacy ones, then do exactly what idea one does to tune its weight arguments properly in the lab.
  2. It is a dynamic weigher; let's call it the learning-weigher:
    • The learning-weigher keeps its arguments persistent in a database, instead of in CONF as the legacy weighers do.
    • Those arguments are updated online by a machine-learning daemon process.
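A sketch of what the learning-weigher could look like (hypothetical names throughout; an in-memory sqlite table stands in for whatever store the learning daemon would write to). It differs from a legacy weigher only in weight_multiplier(), which reads the current multiplier from the database instead of from CONF:

```python
import sqlite3

# Hypothetical sketch of the "learning-weigher" from idea two.
# A separate machine-learning daemon would keep updating the multiplier
# row; the weigher just reads the latest value on every scheduling pass.

DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE weigher_params (name TEXT PRIMARY KEY, multiplier REAL)")
DB.execute("INSERT INTO weigher_params VALUES ('learning_weigher', 1.0)")

class LearningWeigher:
    """Like a legacy weigher, but the multiplier lives in the DB, not CONF."""

    def weight_multiplier(self):
        row = DB.execute(
            "SELECT multiplier FROM weigher_params WHERE name = ?",
            ("learning_weigher",),
        ).fetchone()
        return row[0]

    def _weigh_object(self, host_state, weight_properties):
        # Whatever learned raw score; free RAM is just a placeholder here.
        return host_state["free_ram_mb"]

w = LearningWeigher()
print(w.weight_multiplier())  # 1.0

# The learning daemon updates the multiplier online; the weigher picks it
# up on the next call, with no restart or config reload needed.
DB.execute("UPDATE weigher_params SET multiplier = -0.5 "
           "WHERE name = 'learning_weigher'")
print(w.weight_multiplier())  # -0.5
```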

Find a working model

  • Time
  • Resource
  • Talents # I would like to do this part. ☺

Define the edge case: massive concurrent VM/VNF instantiations

Two parts, as below.

Defining the edge case

The case should be designed to induce failures (to ensure the benchmark is valid) and, of course, to resemble real-world case(s).

This needs domain knowledge and experience from multiple parties ☺

  • Time
  • Resource
  • Talents # I would like to do this part. ☺

Massive concurrent VM/VNF instantiations

This needs hardware and support, plus tons of executions to collect data once set up.

  • Time
  • Resource
  • Talents # I have to do this part. ☺

Redesign of Nova

For idea two, this is needed, and I can do this part ☺

  • Time
  • Resource
  • Talents # low priority :(

Environment requirements

  • A cloud infrastructure that allows CPU overcommitment (non-dedicated workloads)

Bayesian optimization

GPyOpt

Refs