Nova scheduling study: making it smarter with machine learning

Update, June 2019: I managed to implement it, check here ;-) I will write more on how it was done…

This note is a brain dump of ideas for optimizing nova scheduler weighing with machine learning.

How does the existing weighing work?

Short version conclusion

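By default, the scheduler loads all existing weighers, normalizes each weigher's output, multiplies it by that weigher's multiplier, and sums the results per host. The multiplier is 1.0 in the base class; subclasses read it from configuration (for example, io_ops_weight_multiplier defaults to -1.0).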

TL;DR

See this reference first: https://www.slideshare.net/guptapeeyush1/presentation1-23249150

The weighing is called by:

  • self.weight_handler.get_weighed_objects(self.weighers, hosts, spec_obj)

    • self.weighers comes from CONF.scheduler_weight_classes (defined as host_mgr_sched_wgt_cls_opt in nova/conf/scheduler.py)

      • By default it’s all weighers

        default=["nova.scheduler.weights.all_weighers"]

    • get_weighed_objects is doing this:

      for i, weight in enumerate(weights):
          obj = weighed_objs[i]
          obj.weight += weigher.weight_multiplier() * weight
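The loop above can be tried out in isolation. The following is a minimal, self-contained sketch that mimics the normalize, multiply and sum logic of get_weighed_objects without importing nova. The host names, the raw values and the helper name weigh_hosts are made up for illustration, and each "weigher" is reduced to a metric name plus a multiplier.

```python
def normalize(values, minval=None, maxval=None):
    """Scale values into [0, 1]; all-equal inputs become 0."""
    values = list(values)
    if not values:
        return []
    lo = float(min(values) if minval is None else minval)
    hi = float(max(values) if maxval is None else maxval)
    if lo == hi:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def weigh_hosts(hosts, weighers):
    """hosts: {name: {metric: raw value}}; weighers: [(metric, multiplier)].

    Hypothetical helper, standing in for BaseWeightHandler.get_weighed_objects.
    """
    names = list(hosts)
    totals = {name: 0.0 for name in names}
    for metric, multiplier in weighers:
        raw = [hosts[name][metric] for name in names]
        for name, weight in zip(names, normalize(raw)):
            totals[name] += multiplier * weight
    # Highest total weight first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers: free RAM in MB and number of in-flight I/O operations
hosts = {
    "host-a": {"free_ram_mb": 4096, "num_io_ops": 8},
    "host-b": {"free_ram_mb": 8192, "num_io_ops": 2},
    "host-c": {"free_ram_mb": 2048, "num_io_ops": 0},
}
ranking = weigh_hosts(hosts, [("free_ram_mb", 1.0), ("num_io_ops", -1.0)])
print(ranking)
```

With these numbers host-b ends up first: it has the most free RAM and only a small penalty from its I/O load.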

The subroutines mentioned above are listed below.

nova/scheduler/host_manager.py

class HostManager(object):
    """Base HostManager class."""

    # Can be overridden in a subclass
    def host_state_cls(self, host, node, **kwargs):
        return HostState(host, node)

    def __init__(self):
        self.host_state_map = {}
        self.filter_handler = filters.HostFilterHandler()
        filter_classes = self.filter_handler.get_matching_classes(
                CONF.scheduler_available_filters)
        self.filter_cls_map = {cls.__name__: cls for cls in filter_classes}
        self.filter_obj_map = {}
        self.default_filters = self._choose_host_filters(self._load_filters())
        self.weight_handler = weights.HostWeightHandler()  # <---------
        weigher_classes = self.weight_handler.get_matching_classes(
                CONF.scheduler_weight_classes)
        self.weighers = [cls() for cls in weigher_classes]  # <---------
        #...

    def get_weighed_hosts(self, hosts, spec_obj):  # <---------
        """Weigh the hosts."""
        return self.weight_handler.get_weighed_objects(self.weighers,
                hosts, spec_obj)

nova/conf/scheduler.py

host_mgr_sched_wgt_cls_opt = cfg.ListOpt("scheduler_weight_classes",
        default=["nova.scheduler.weights.all_weighers"],
        help="""
This is a list of weigher class names. Only hosts which pass the filters are
weighed. The weight for any host starts at 0, and the weighers order these
hosts by adding to or subtracting from the weight assigned by the previous
weigher. Weights may become negative.

An instance will be scheduled to one of the N most-weighted hosts, where N is
'scheduler_host_subset_size'.

By default, this is set to all weighers that are included with Nova. If you
wish to change this, replace this with a list of strings, where each element is
the path to a weigher.

This option is only used by the FilterScheduler and its subclasses; if you use
a different scheduler, this option has no effect.

* Services that use this:

``nova-scheduler``

* Related options:

None
""")

#...

io_ops_weight_mult_opt = cfg.FloatOpt("io_ops_weight_multiplier",
        default=-1.0,
        help="""
This option determines how hosts with differing workloads are weighed. Negative
values, such as the default, will result in the scheduler preferring hosts with
lighter workloads whereas positive values will prefer hosts with heavier
workloads. Another way to look at it is that positive values for this option
will tend to schedule instances onto hosts that are already busy, while
negative values will tend to distribute the workload across more hosts. The
absolute value, whether positive or negative, controls how strong the io_ops
weigher is relative to other weighers.

This option is only used by the FilterScheduler and its subclasses; if you use
a different scheduler, this option has no effect. Also note that this setting
only affects scheduling if the 'io_ops' weigher is enabled.

Valid values are numeric, either integer or float.

* Services that use this:

``nova-scheduler``

* Related options:

None
""")
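To make the sign convention concrete, here is a small sketch (not nova code) that ranks three hypothetical hosts by their normalized I/O load under a negative and a positive multiplier. The host names, the load numbers and the rank_by_io_ops helper are assumptions for illustration only.

```python
def rank_by_io_ops(io_ops_by_host, multiplier):
    """Return host names, best first, for a single io_ops-style weigher."""
    lo = min(io_ops_by_host.values())
    hi = max(io_ops_by_host.values())
    span = float(hi - lo)
    weights = {
        host: multiplier * ((ops - lo) / span if span else 0.0)
        for host, ops in io_ops_by_host.items()
    }
    return sorted(weights, key=weights.get, reverse=True)


io_ops = {"busy": 10, "medium": 4, "idle": 0}  # made-up workloads
spread = rank_by_io_ops(io_ops, -1.0)  # negative: prefer lighter hosts
stack = rank_by_io_ops(io_ops, 1.0)    # positive: prefer busier hosts
print(spread, stack)
```

The absolute value of the multiplier only matters once several weighers are summed; with a single weigher, only its sign changes the ordering.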

We can list all the weight_mult_opt options:

$ grep _multiplier  nova/conf/scheduler.py  -A3 -B3 | grep _opts
ram_weight_mult_opt = cfg.FloatOpt("ram_weight_multiplier",

disk_weight_mult_opt = cfg.FloatOpt("disk_weight_multiplier",

io_ops_weight_mult_opt = cfg.FloatOpt("io_ops_weight_multiplier",

metrics_weight_opts = [
    cfg.FloatOpt("weight_multiplier",
            default=1.0,

For metrics_weight_opts, the metric names that can be used in weight_setting are listed below:

cpu.user.percent
cpu.user.time
cpu.iowait.percent
cpu.iowait.time
cpu.frequency
cpu.idle.percent
cpu.idle.time
cpu.percent
cpu.kernel.time
cpu.kernel.percent

Example of a conf:

[metrics]
weight_multiplier = -0.1
weight_setting = cpu.iowait.percent=-1.0

Existing weighers

~/openstack/nova/nova/scheduler
❯ tree weights
weights
├── __init__.py
├── affinity.py
├── cpu.py
├── disk.py
├── io_ops.py
├── metrics.py
└── ram.py

nova/weights.py

# Copyright (c) 2011-2012 OpenStack Foundation
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

"""
Pluggable Weighing support
"""

import abc
import six

from nova import loadables


def normalize(weight_list, minval=None, maxval=None):
    """Normalize the values in a list between 0 and 1.0.

    The normalization is made regarding the lower and upper values present in
    weight_list. If the minval and/or maxval parameters are set, these values
    will be used instead of the minimum and maximum from the list.

    If all the values are equal, they are normalized to 0.
    """

    if not weight_list:
        return ()

    if maxval is None:
        maxval = max(weight_list)

    if minval is None:
        minval = min(weight_list)

    maxval = float(maxval)
    minval = float(minval)

    if minval == maxval:
        return [0] * len(weight_list)

    range_ = maxval - minval
    return ((i - minval) / range_ for i in weight_list)


class WeighedObject(object):
    """Object with weight information."""
    def __init__(self, obj, weight):
        self.obj = obj
        self.weight = weight

    def __repr__(self):
        return "<WeighedObject '%s': %s>" % (self.obj, self.weight)


@six.add_metaclass(abc.ABCMeta)
class BaseWeigher(object):
    """Base class for pluggable weighers.

    The attributes maxval and minval can be specified to set up the maximum
    and minimum values for the weighed objects. These values will then be
    taken into account in the normalization step, instead of taking the values
    from the calculated weights.
    """

    minval = None
    maxval = None

    def weight_multiplier(self):  # <-------- multiplier (weight factor) of this weigher
        """How weighted this weigher should be.

        Override this method in a subclass, so that the returned value is
        read from a configuration option to permit operators specify a
        multiplier for the weigher.
        """
        return 1.0

    @abc.abstractmethod
    def _weigh_object(self, obj, weight_properties):
        """Weigh an specific object."""

    def weigh_objects(self, weighed_obj_list, weight_properties):
        """Weigh multiple objects.

        Override in a subclass if you need access to all objects in order
        to calculate weights. Do not modify the weight of an object here,
        just return a list of weights.
        """
        # Calculate the weights
        weights = []
        for obj in weighed_obj_list:
            weight = self._weigh_object(obj.obj, weight_properties)

            # Record the min and max values if they are None. If they anything
            # but none we assume that the weigher has set them
            if self.minval is None:
                self.minval = weight
            if self.maxval is None:
                self.maxval = weight

            if weight < self.minval:
                self.minval = weight
            elif weight > self.maxval:
                self.maxval = weight

            weights.append(weight)

        return weights


class BaseWeightHandler(loadables.BaseLoader):
    object_class = WeighedObject

    def get_weighed_objects(self, weighers, obj_list, weighing_properties):  # <-------
        """Return a sorted (descending), normalized list of WeighedObjects."""
        weighed_objs = [self.object_class(obj, 0.0) for obj in obj_list]

        if len(weighed_objs) <= 1:
            return weighed_objs

        for weigher in weighers:
            weights = weigher.weigh_objects(weighed_objs, weighing_properties)

            # Normalize the weights
            weights = normalize(weights,
                                minval=weigher.minval,
                                maxval=weigher.maxval)

            for i, weight in enumerate(weights):
                obj = weighed_objs[i]
                obj.weight += weigher.weight_multiplier() * weight  # <--------- calculation

        return sorted(weighed_objs, key=lambda x: x.weight, reverse=True)

nova.scheduler.weights.metrics

ref: weight_setting in nova/conf/scheduler.py

    cfg.ListOpt("weight_setting",
            default=[],
            help="""
This setting specifies the metrics to be weighed and the relative ratios for
each metric. This should be a single string value, consisting of a series of
one or more 'name=ratio' pairs, separated by commas, where 'name' is the name
of the metric to be weighed, and 'ratio' is the relative weight for that
metric.

Note that if the ratio is set to 0, the metric value is ignored, and instead
the weight will be set to the value of the 'weight_of_unavailable' option.

As an example, let's consider the case where this option is set to:

``name1=1.0, name2=-1.3``

The final weight will be:

``(name1.value * 1.0) + (name2.value * -1.3)``

This option is only used by the FilterScheduler and its subclasses; if you use
a different scheduler, this option has no effect.
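
The 'name=ratio' rule in the help text above can be sketched in a few lines. This illustrates the formula only and is not the code in nova.scheduler.weights.metrics: the function names and the sample metric values are invented, and the special handling via 'weight_of_unavailable' is left out.

```python
def parse_weight_setting(pairs):
    """Turn ['name1=1.0', 'name2=-1.3'] into {'name1': 1.0, 'name2': -1.3}."""
    ratios = {}
    for pair in pairs:
        name, _, ratio = pair.partition("=")
        ratios[name.strip()] = float(ratio)
    return ratios


def metrics_weight(metric_values, ratios):
    """Weighted sum: sum(value * ratio) over the configured metrics."""
    return sum(metric_values[name] * ratio for name, ratio in ratios.items())


ratios = parse_weight_setting(["name1=1.0", "name2=-1.3"])
host_metrics = {"name1": 10.0, "name2": 20.0}  # made-up metric values
host_weight = metrics_weight(host_metrics, ratios)
print(host_weight)
```

The result for one host is then treated like any other weigher output: it is normalized across hosts and multiplied by the [metrics] weight_multiplier.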

Existing monitors in Mitaka (CPU-related only)

openstack/nova/nova/compute/monitors
❯ tree
.
├── __init__.py
├── base.py
└── cpu
    ├── __init__.py
    └── virt_driver.py

What could we do here, and how?

I can come up with two possibilities.

Idea one: tune the existing weighers' mult_opt or weight_setting

The factors stay the same as in the existing weighers; we only tune their mult_opt or weight_setting values via machine learning. Frankly, the default values might not have been properly studied.

To do that, we need to run edge cases with massive concurrent VM/VNF instantiations and define a benchmark that evaluates the “good” result we would like to have. Then we can provide optimised tuning of the existing weighers' mult_opt or weight_setting.

  • The key is to push the environment to its limit, so that in the edge case the boot time, retry count and success rate are no longer 100% good
  • The way data is collected should be well considered
    • We could take one typical VNF as input to ensure this benefits at least one scenario
    • Or take as many VNFs as possible, which could give a less biased result (though it may be useless with small-scale data collection)
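
One possible way to frame the tuning loop is sketched below, under heavy assumptions. The scoring formula, its coefficients and the run_benchmark callable are all hypothetical, and a plain random search stands in for whatever machine learning method is eventually chosen. In a real setup, run_benchmark would apply the candidate multipliers, run the concurrent instantiation edge case and return the measured boot time, retry count and success rate; here a toy function plays that role.

```python
import random


def score(mean_boot_seconds, mean_retries, success_rate):
    """Hypothetical benchmark: higher is better. Coefficients are placeholders."""
    return 100.0 * success_rate - 1.0 * mean_boot_seconds - 5.0 * mean_retries


def random_search(run_benchmark, names, trials=50, low=-2.0, high=2.0, seed=0):
    """Try random multiplier sets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = {name: rng.uniform(low, high) for name in names}
        current = score(*run_benchmark(params))
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def fake_benchmark(params):
    """Toy stand-in for a real lab run; returns (boot s, retries, success rate)."""
    # Pretend the lab behaves best near ram=1.0 and io_ops=-1.0.
    distance = (abs(params["ram_weight_multiplier"] - 1.0)
                + abs(params["io_ops_weight_multiplier"] + 1.0))
    return 30.0 + 10.0 * distance, distance, max(0.0, 1.0 - 0.1 * distance)


best, best_score = random_search(
    fake_benchmark, ["ram_weight_multiplier", "io_ops_weight_multiplier"])
print(best, best_score)
```

Because each real benchmark run is expensive, a sample-efficient optimizer would be a natural replacement for random_search; the score function is the part that needs the domain knowledge.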

Idea two: introduce a new dynamic weigher based on online learning

This actually builds on the same work as idea one, but with two differences:

  1. Introduce more factors as input for a new weigher that co-exists with the legacy ones, and do exactly the same thing as in idea one to tune its weight arguments properly in the lab.
  2. It’s a dynamic weigher; let’s call it the learning-weigher:
    • The learning-weigher keeps its arguments persisted in a database instead of in CONF as the legacy ones do.
    • The arguments are updated through online learning by a machine learning daemon process.
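
A rough sketch of the learning-weigher idea follows. It does not import nova: ParamStore is a hypothetical in-memory stand-in for the database table, LearningWeigher only mimics the weight_multiplier() hook of BaseWeigher, and the update rule in daemon_step is a placeholder for whatever online learning method is eventually used.

```python
class ParamStore:
    """Stand-in for a database table holding the weigher arguments."""

    def __init__(self, **params):
        self._params = dict(params)

    def get(self, key):
        return self._params[key]

    def set(self, key, value):
        self._params[key] = value


class LearningWeigher:
    """Reads its multiplier from the store on every call instead of from CONF."""

    def __init__(self, store):
        self.store = store

    def weight_multiplier(self):
        return self.store.get("multiplier")


def daemon_step(store, feedback, learning_rate=0.1):
    """Placeholder online update: nudge the multiplier by the observed feedback.

    `feedback` is assumed to be positive when a larger multiplier would have
    improved the benchmark, and negative otherwise.
    """
    current = store.get("multiplier")
    store.set("multiplier", current + learning_rate * feedback)


store = ParamStore(multiplier=1.0)
weigher = LearningWeigher(store)
before = weigher.weight_multiplier()
daemon_step(store, feedback=-2.0)    # the daemon process updates the stored value
after = weigher.weight_multiplier()  # the weigher picks it up on the next call
print(before, after)
```

The design point is that the scheduler never needs a restart or a config reload: the weigher reads the current arguments each time it is asked for its multiplier.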

Find a working model

  • Time
  • Resource
  • Talents # I would like to do this part. ☺

Define edge cases for massive concurrent VM/VNF instantiations

Two parts, as below:

Defining the edge case

The case should be designed to lead to failures, so that the benchmark is valid, and of course it should aim to resemble real-world cases.

This needs domain knowledge and experience from multiple parties ☺

  • Time
  • Resource
  • Talents # I would like to do this part. ☺

Massive concurrent VM/VNF instantiations

This needs hardware and support, and a ton of executions to collect data once it is set up.

  • Time
  • Resource
  • Talents # I have to do this part. ☺

Redesign on nova

For idea 2, this is needed and I can do this part ☺

  • Time
  • Resource
  • Talents # low priority :(

Environment Requirement

  • A cloud infrastructure that allows CPU overcommitment (non-dedicated) workloads

Bayesian optimization

GPyOpt

Refs