ML 2017 Fall Homework 1: Linear Regression Notes

Gradient descent for a linear regression task

Data preprocessing

  • Load the data from the CSV file into a list
import csv
import datetime
import numpy as np

trainDataList = list()
with open("train.csv", newline="") as csvfile:
    trainData = csv.reader(csvfile, delimiter=",", quotechar="|")
    for line in trainData:
        trainDataList.append(line[3:])
  • Put all the training data into one list so we can iterate over it hour by hour; this gives 24 * days - 10 training examples (5750 here).

    trainDataIteratedPerHour: a list with one row per hour (18 feature values each)

# every 18 lines form one day (18 measured features per day)

def mod_18(n):
    return n % 18

# transpose a matrix (2D list)
def T_list(l):
    return np.array(l).T.tolist()


i = 0
trainDataIteratedPerHour = []
listToAppend = []

for item in trainDataList[1:]:
    if not mod_18(i):
        # mod 18 == 0: start a new day
        listToAppend = [item]

    elif mod_18(i) == 10:
        # rainfall line: treat "NR" (no rain) as 0
        listToAppend.append(["0" if x == 'NR' else x for x in item])

    elif mod_18(i) == 17:
        # all 18 lines collected; transpose so each row is one hour, then extend
        listToAppend.append(item)
        trainDataIteratedPerHour.extend(T_list(listToAppend))
    else:
        # append other feature lines
        listToAppend.append(item)
    i = i + 1

i = 0
  • Build x_data and y_data
"""
build x_data and y_data
"""

x_data = []  # 18 * 9 = 162 dimensions per example
y_data = []  # scalar target: the PM2.5 value paired with each x_data row

for n in range(len(trainDataIteratedPerHour) - 10):
    x_data.extend(np.array(trainDataIteratedPerHour[n : n + 9]).reshape(1, 162).tolist())
    y_data.append(trainDataIteratedPerHour[n + 10][9])

x_data = [[float(j) for j in i] for i in x_data]
y_data = [float(i) for i in y_data]
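
A quick sanity check on the shapes (a small sketch; it only assumes the x_data and y_data lists built above):

# each example should hold 18 features * 9 hours = 162 values,
# with exactly one label per example
assert all(len(x) == 162 for x in x_data)
assert len(x_data) == len(y_data)
print(len(y_data), "training examples")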
  • Draw a plot (to get a feel for the range of the data)
# plot the PM2.5 values to see their range

import matplotlib
import matplotlib.pyplot as plt

x = range(len(y_data))
y = np.array(y_data)
# refer to https://matplotlib.org/2.2.2/gallery/lines_bars_and_markers/simple_plot.html

fig, ax = plt.subplots()
ax.plot(x, y)

# plt.show()

Loss function
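
The loss is the sum of squared errors over the whole training set; in math form, the function below computes

L(w) = \sum_n \left( y_n - w \cdot x_n \right)^2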

def lossFunction(w, x_data, y_data):
    w = np.array(w)
    x_data = np.array(x_data)
    result = 0.

    for i in range(len(y_data)):
        result += (y_data[i] - sum(w * x_data[i]))**2
    return result

# w = [0.01] * 162
# L = lossFunction(w, x_data, y_data)

Iterations for gradient descent
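
Each epoch accumulates the gradient of L over all training examples and takes one step. In math, the loop below computes

\nabla_w L = 2 \sum_n (w \cdot x_n - y_n) \, x_n, \qquad w \leftarrow w - lr \cdot \nabla_w L

Note that w_grad in the code accumulates the negative gradient, which is why the update step adds lr * w_grad.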

# Iterations

def iterationRun(lr, iteration, x_data, y_data):
    # initial data
    w = [0.01] * 162
    L_history = [lossFunction(w, x_data, y_data)]

    w = np.array(w)
    x_data = np.array(x_data)

    for iterator in range(iteration):
        # initialize w_grad
        w_grad = [0.0] * 162
        # sum over the whole training set
        for n in range(len(y_data)):
            # per feature
            for i in range(162):
                w_grad[i] = w_grad[i] - 2.0 * x_data[n][i] * (sum(w * x_data[n]) - y_data[n])

        # update w (w_grad holds the negative gradient, so we add it)
        for i in range(162):
            w[i] = w[i] + lr * w_grad[i]

        # store loss function history for plotting
        L_history.append(lossFunction(w, x_data, y_data))
        print(str(iterator) + " : " + str(datetime.datetime.now().time()) + " L:" + str(L_history[-1]))
    return L_history
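
Side note: the triple loop above runs in pure Python over 5750 examples and 162 weights, which is why each epoch takes about 30 seconds in the logs below. A vectorized NumPy equivalent (a sketch of the same update rule, not the code the runs in these notes actually used; X is the N x 162 design matrix and y the target vector):

import numpy as np

def iterationRunVectorized(lr, iteration, X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.full(X.shape[1], 0.01)  # same initial w as above

    L_history = [float(np.sum((y - X @ w) ** 2))]
    for _ in range(iteration):
        residual = X @ w - y              # per-example prediction error
        w_grad = 2.0 * (X.T @ residual)   # gradient of the squared loss
        w = w - lr * w_grad               # same step as the loop version
        L_history.append(float(np.sum((y - X @ w) ** 2)))
    return L_history, w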

Run it for 10 iterations

lr = 0.0000000000001  # learning rate
iteration = 10
iterationRun(lr, iteration, x_data, y_data)

Tuning

Find a good initial learning rate

Start with lr = 0.0000000000001 (1e-13):

In [19]: lr = 0.0000000000001 # learning rate
...: iteration = 10
...: iterationRun(lr,iteration,x_data,y_data)
...:
0 : 13:21:22.227145 L:5897161.869645979
1 : 13:21:53.988814 L:5890818.193569232
2 : 13:22:25.639150 L:5884483.073933817
3 : 13:22:57.211950 L:5878156.49918669
4 : 13:23:28.873442 L:5871838.457790465
5 : 13:24:01.059230 L:5865528.93822318

Hmmm… try a larger lr.

lr = 0.000000000001 (1e-12)

The loss is now decreasing faster, around 60000 per epoch.

In [21]: lr = 0.000000000001 # learning rate
...: iteration = 10
...: iterationRun(lr,iteration,x_data,y_data)
...:
0 : 13:35:09.910428 L:5840184.58334736
1 : 13:35:41.650901 L:5777706.652466557
2 : 13:36:13.323090 L:5716068.8576254565
3 : 13:36:45.001751 L:5655259.889673614
4 : 13:37:16.168479 L:5595268.591697776
5 : 13:37:47.701241 L:5536083.956972405
6 : 13:38:19.216952 L:5477695.126938047
7 : 13:38:51.013157 L:5420091.389206737

Our training set has 5750 examples:

In [20]: len(y_data)
Out[20]: 5750

If the predicted PM2.5 values (the y data) were each within an error of 10, L would be about 5750 * 10^2 = 575000.

At the current descent speed, we would need about 88 epochs:

In [24]: (5840184-575000)/60000
Out[24]: 87.75306666666667

Let's make the learning rate 100 times larger and see whether the target L can be reached within 10 epochs.

lr = 0.0000000001 (1e-10)

In [27]: lr = 0.0000000001 # learning rate
...: iteration = 10
...: iterationRun(lr,iteration,x_data,y_data)
...:
0 : 13:49:10.609660 L:1692576.375268496
1 : 13:49:42.475991 L:1242849.9101765472
2 : 13:50:14.660518 L:1189717.4416525147
3 : 13:50:46.605178 L:1178550.5080278912
4 : 13:51:18.416010 L:1171963.6339294983
5 : 13:51:50.342550 L:1166008.2433918654
6 : 13:52:22.022203 L:1160260.444018784
7 : 13:52:53.942997 L:1154668.3447938378
8 : 13:53:25.925997 L:1149219.7383900573
9 : 13:53:57.718994 L:1143907.0426408767
Out[27]:
[5903514.1137326835,
1692576.375268496,
1242849.9101765472,
1189717.4416525147,
1178550.5080278912,
1171963.6339294983,
1166008.2433918654,
1160260.444018784,
1154668.3447938378,
1149219.7383900573,
1143907.0426408767]
# plot it
# (L_history here is the list returned by iterationRun, i.e. Out[27] above)
x = range(iteration + 1)
y = np.array(L_history)
# refer to https://matplotlib.org/2.2.2/gallery/lines_bars_and_markers/simple_plot.html

fig, ax = plt.subplots()
ax.plot(x, y)

Compared with the earlier runs, learning is now much faster in the initial steps, so even the first printed L is already in a much smaller range!

We can also see it decelerates after 10 epochs, so in steps this small it won't reach our target of 575000.

I know (for sure) that I will eventually need an adaptive, per-feature learning rate (Adagrad / Adam). But before that, I wanted to see how it goes with an even larger learning rate, and it blows up!

In [46]: lr = 0.000000001 # learning rate
...: iteration = 10
...: iterationRun(lr,iteration,x_data,y_data)
...:
0 : 14:09:17.516243 L:156704618.55348668
1 : 14:09:49.509213 L:5150723049.19715
2 : 14:10:21.436315 L:170469039390.23218
3 : 14:10:53.396840 L:5642995478812.637

lr = 1e-10

Now let's go back to the previous learning rate. To get the final w, we add print(w) to iterationRun(), and we also make the initial w an argument so it can be overridden:

def iterationRun(lr, iteration, x_data, y_data, w=[0.01] * 162):
    # initial data
    L_history = [lossFunction(w, x_data, y_data)]

    w = np.array(w)
    x_data = np.array(x_data)

    for iterator in range(iteration):
        # initialize w_grad
        w_grad = [0.0] * 162
        # sum over the whole training set
        for n in range(len(y_data)):
            # per feature
            for i in range(162):
                w_grad[i] = w_grad[i] - 2.0 * x_data[n][i] * (sum(w * x_data[n]) - y_data[n])

        # update w
        for i in range(162):
            w[i] = w[i] + lr * w_grad[i]

        # store loss function history for plotting
        L_history.append(lossFunction(w, x_data, y_data))
        print(str(iterator) + " : " + str(datetime.datetime.now().time()) + " L:" + str(L_history[-1]))
    print(w)
    return L_history

After re-running, we get this w:

In [49]: lr = 0.0000000001 # learning rate
...: iteration = 10
...: iterationRun(lr,iteration,x_data,y_data)
...:
0 : 14:24:45.664141 L:1692576.375268496
1 : 14:25:17.259352 L:1242849.9101765472
2 : 14:25:49.273630 L:1189717.4416525147
3 : 14:26:20.965149 L:1178550.5080278912
4 : 14:26:52.781849 L:1171963.6339294983
5 : 14:27:24.309887 L:1166008.2433918654
6 : 14:27:56.071350 L:1160260.444018784
7 : 14:28:27.729255 L:1154668.3447938378
8 : 14:28:59.185918 L:1149219.7383900573
9 : 14:29:31.678511 L:1143907.0426408767
[0.00865451 0.00992088 0.00998412 0.00999356 0.00989859 0.00958899
0.00949017 0.00853832 0.00886119 0.00971015 0.00996496 0.00622562
0.00986532 0.00991454 0.00047019 0.0002918 0.00985504 0.009888
0.00866327 0.00992104 0.00998492 0.00999403 0.00990841 0.00960856
0.00951873 0.00854217 0.00888944 0.00973543 0.00996355 0.00618855
0.00987026 0.00991514 0.00064165 0.00051266 0.00985705 0.00988871
0.00867558 0.00992112 0.00998615 0.00999456 0.00991583 0.00962887
0.00954697 0.00857636 0.00896101 0.00978193 0.00996072 0.00613156
0.00987708 0.0099157 0.00089779 0.00075592 0.00985905 0.00988954
0.00869251 0.0099212 0.00998776 0.00999517 0.009925 0.00965916
0.00958656 0.00863933 0.00905468 0.00984584 0.00995951 0.00605779
0.00988265 0.00991644 0.0011024 0.00105049 0.00986111 0.00989001
0.00871203 0.00992131 0.00998925 0.00999581 0.0099313 0.00969443
0.00962812 0.00874158 0.00919581 0.00993813 0.00996109 0.00597563
0.00989311 0.00991724 0.00152071 0.00139776 0.00986338 0.00989169
0.00873482 0.00992151 0.00999107 0.00999641 0.00993875 0.00973362
0.00967406 0.00888077 0.00938529 0.01006057 0.00996181 0.00589041
0.00990578 0.00991814 0.00187297 0.00190307 0.00986629 0.00989357
0.00875685 0.00992174 0.00999275 0.00999693 0.0099442 0.00977445
0.00972014 0.00904638 0.00964319 0.01023079 0.00996201 0.00581055
0.00992017 0.00991885 0.0023883 0.0023404 0.00986947 0.00989559
0.0087768 0.009922 0.00999477 0.00999745 0.00994748 0.00981834
0.0097673 0.00922359 0.00995879 0.01041849 0.00995968 0.00574316
0.00993535 0.00991957 0.00290124 0.00287465 0.00987295 0.0098975
0.00879132 0.0099223 0.00999638 0.00999786 0.00994411 0.00986186
0.0098075 0.00938749 0.0103109 0.0108169 0.00995914 0.00570475
0.00995173 0.00992029 0.0036527 0.00347423 0.00987645 0.00990038]
Out[49]:
[5903514.1137326835,
1692576.375268496,
1242849.9101765472,
1189717.4416525147,
1178550.5080278912,
1171963.6339294983,
1166008.2433918654,
1160260.444018784,
1154668.3447938378,
1149219.7383900573,
1143907.0426408767]

Run 10~20 epoch, lr=1e-10

Another 10 epochs (continuing from the first 10):

# w = the w array printed at the end of the initial 10 epochs

In [59]: lr = 0.0000000001

In [60]: iteration = 10

In [61]: iterationRun(lr,iteration,x_data,y_data,w)
...:
0 : 14:41:55.554137 L:1138723.54416636
1 : 14:42:27.844246 L:1133663.0986562215
2 : 14:42:59.894751 L:1128719.8759725974
3 : 14:43:32.401907 L:1123888.4508929225
4 : 14:44:04.728092 L:1119163.74563917
5 : 14:44:36.823878 L:1114541.0056141382
6 : 14:45:08.958169 L:1110015.776899496
7 : 14:45:41.079041 L:1105583.8853543748
8 : 14:46:13.182146 L:1101241.4171967693
9 : 14:46:45.360379 L:1096984.7009616988
[ 8.39928810e-03 9.92018594e-03 9.98672249e-03 9.99395695e-03
9.90650252e-03 9.66028161e-03 9.57133059e-03 8.59024624e-03
9.70063031e-03 1.04003048e-02 9.94007211e-03 5.82092975e-03
9.86818613e-03 9.91420797e-03 -7.30043313e-04 -9.74984684e-04
9.81897077e-03 9.85914945e-03 8.42038505e-03 9.92049552e-03
9.98824144e-03 9.99488279e-03 9.92479021e-03 9.70246400e-03
9.63010828e-03 8.61918398e-03 9.77660141e-03 1.04616680e-02
9.93740891e-03 5.73728090e-03 9.87953143e-03 9.91540645e-03
-2.36708203e-04 -4.46497772e-04 9.82369317e-03 9.86089703e-03
8.44699272e-03 9.92063613e-03 9.99060846e-03 9.99592360e-03
9.93769384e-03 9.74480356e-03 9.68631482e-03 8.69975569e-03
9.93568798e-03 1.05632875e-02 9.93195597e-03 5.62070493e-03
9.89358966e-03 9.91648735e-03 3.09727865e-04 3.32769386e-05
9.82806229e-03 9.86271172e-03 8.48109217e-03 9.92078753e-03
9.99374187e-03 9.99713322e-03 9.95409819e-03 9.80638186e-03
9.76454663e-03 8.82732248e-03 1.01321537e-02 1.06964648e-02
9.92985319e-03 5.47712005e-03 9.90438972e-03 9.91794789e-03
6.73973650e-04 5.44360035e-04 9.83218649e-03 9.86359313e-03
8.51878100e-03 9.92099916e-03 9.99667211e-03 9.99839142e-03
9.96501744e-03 9.87773770e-03 9.84678838e-03 9.02306858e-03
1.04158946e-02 1.08835903e-02 9.93324622e-03 5.32255768e-03
9.92446485e-03 9.91952209e-03 1.38594142e-03 1.10724366e-03
9.83633504e-03 9.86671984e-03 8.56162962e-03 9.92140609e-03
1.00002676e-02 9.99958471e-03 9.97874755e-03 9.95697261e-03
9.93834450e-03 9.28329913e-03 1.07901635e-02 1.11284264e-02
9.93483699e-03 5.16636435e-03 9.94860271e-03 9.92131553e-03
1.92598395e-03 1.94810223e-03 9.84148301e-03 9.87004892e-03
8.60192082e-03 9.92187479e-03 1.00036215e-02 1.00006191e-02
9.98884103e-03 1.00394245e-02 1.00304865e-02 9.58954906e-03
1.12961891e-02 1.14662940e-02 9.93528449e-03 5.02390839e-03
9.97597866e-03 9.92274238e-03 2.75467811e-03 2.63288432e-03
9.84692043e-03 9.87360657e-03 8.63756441e-03 9.92241075e-03
1.00076795e-02 1.00016598e-02 9.99478393e-03 1.01275708e-02
1.01245597e-02 9.91586834e-03 1.19162061e-02 1.18375866e-02
9.93069923e-03 4.90773511e-03 1.00047837e-02 9.92420129e-03
3.57134262e-03 3.51203380e-03 9.85289824e-03 9.87690267e-03
8.66241404e-03 9.92304041e-03 1.00109440e-02 1.00024858e-02
9.98759482e-03 1.02142033e-02 1.02041105e-02 1.02157481e-02
1.26087041e-02 1.26296733e-02 9.92961546e-03 4.84886429e-03
1.00360505e-02 9.92566063e-03 4.87428673e-03 4.54246885e-03
9.85895437e-03 9.88225200e-03]
Out[61]:
[1143907.0116957787,
1138723.54416636,
1133663.0986562215,
1128719.8759725974,
1123888.4508929225,
1119163.74563917,
1114541.0056141382,
1110015.776899496,
1105583.8853543748,
1101241.4171967693,
1096984.7009616988]

In [62]: L_history = [1143907.0116957787,
...: 1138723.54416636,
...: 1133663.0986562215,
...: 1128719.8759725974,
...: 1123888.4508929225,
...: 1119163.74563917,
...: 1114541.0056141382,
...: 1110015.776899496,
...: 1105583.8853543748,
...: 1101241.4171967693,
...: 1096984.7009616988]
...:

In [63]: x = range(iteration+1)
...: y = np.array(L_history)
...:

In [64]: fig, ax = plt.subplots()
...: ax.plot(x, y)
...:
Out[64]: [<matplotlib.lines.Line2D at 0x1098122e8>]

In [65]: plt.show()

The result is not bad! Let's continue.

Run 20~30 epoch, lr=1e-10

Another 10 epochs:

lr = 1e-10 # learning rate
...

Out[69]:
[1096984.7010237887,
1092810.2907994157,
1088714.9506539872,
1084695.6401611501,
1080749.50095292,
1076873.8442235377,
1073066.1391172102,
1069324.0019368476,
1065645.186115302,
1062027.5728949562,
1058469.162665172]

Let's zoom in on the last 25 epochs.

We still have 1060000 - 575000 = 485000 to go, and the speed is now about 3000 per epoch, which means we could reach the target in roughly 160 epochs if it kept the same speed :-) (it surely won't).

30~200 epoch, lr=1e-10

By epoch 200, L had reached 809150.2256672443.

207~388 epoch, lr=2e-10

This time let's double the lr: the initial speed is faster, though it still ends up very slow. After 180 epochs, L is 703824.3536660115.

In [105]: def iterationRun(lr,iteration,x_data,y_data, w = [0.01] * 162):
     ...:     # initial data
     ...:
     ...:     L_history = [lossFunction(w,x_data,y_data)]
     ...:     W_history = [w]
     ...:     w = np.array(w)
     ...:     x_data = np.array(x_data)
     ...:
     ...:     for iterator in range(iteration):
     ...:         # initialize w_grad
     ...:         w_grad = [0.0] * 162
     ...:         # sum over the whole training set
     ...:         for n in range(len(y_data)):
     ...:             # per feature
     ...:             for i in range(162):
     ...:                 w_grad[i] = w_grad[i] - 2.0 * x_data[n][i] * ( sum(w * x_data[n]) - y_data[n] )
     ...:
     ...:         # update w
     ...:         for i in range(162):
     ...:             w[i] = w[i] + lr * w_grad[i]
     ...:
     ...:         # store loss function history for plotting
     ...:         L_history.append(lossFunction(w,x_data,y_data))
     ...:         W_history.append(w[0:])
     ...:         print (str(iterator) + " : " + str(datetime.datetime.now().time()) + " L:" + str(L_history[-1]))
     ...:     print (w)
     ...:     return L_history, W_history
     ...:

In [106]: lr = 2e-10 # learning rate
...: iteration = 180
...: #iterationRun(lr,iteration,x_data,y_data)
...: #w= <last time>
...: L, W = iterationRun(lr,iteration,x_data,y_data,w)
...: print (str(L))
...:

388~391 epoch, lr=1e-8

By tuning lr as we did at the beginning, we find that 1e-8 decreases L faster, while 1e-7 makes the value blow up:

In [135]: lr = 1e-8 # learning rate
...: iteration = 3
...: #iterationRun(lr,iteration,x_data,y_data)
...: #w= <last time>
...: L, W = iterationRun(lr,iteration,x_data,y_data,w)
...: print (str(L))
...:
0 : 20:36:21.318040 L:688000.4130500836
1 : 20:36:50.809668 L:674508.5537428795
2 : 20:37:20.181931 L:662770.7844944517

In [136]: lr = 1e-7 # learning rate
...: iteration = 3
...: #iterationRun(lr,iteration,x_data,y_data)
...: #w= <last time>
...: L, W = iterationRun(lr,iteration,x_data,y_data,w)
...: print (str(L))
...:
0 : 20:39:52.192365 L:608540.3895276675
1 : 20:40:20.381015 L:784689.7603969924
2 : 20:40:49.856693 L:7013862.489781529

Let's continue with lr=1e-8 for a few more epochs:

In epochs 0~3, L was normal, but from the 4th epoch on, L goes crazy…

In [144]: lr = 1e-8 # learning rate
...: iteration = 7
...: #iterationRun(lr,iteration,x_data,y_data)
...: #w= <last time>
...: L, W = iterationRun(lr,iteration,x_data,y_data,w)
...: print (str(L))
...:
0 : 21:05:33.350228 L:688000.4130783302
1 : 21:06:02.751233 L:674508.5537976791
2 : 21:06:32.191330 L:662770.9045613626
3 : 21:07:01.596806 L:652980.2949830776
4 : 21:07:30.931137 L:2995810.210193815
5 : 21:07:58.316067 L:10415063655.932451
6 : 21:08:27.808302 L:46103934331702.31

Below is a record of w at epoch 391:

In [171]: lr = 1e-8 # learning rate
...: iteration = 3
...: #iterationRun(lr,iteration,x_data,y_data)
...: #w= <last time>
...: L, W = iterationRun(lr,iteration,x_data,y_data,w)
...: print (str(L))
...:
0 : 21:37:21.649233 L:688000.4130783302
1 : 21:37:51.147582 L:674508.5537976791
2 : 21:38:21.861637 L:662770.9045613626
[-0.00138578 0.00996187 0.00992188 0.00993515 0.00986949 0.00805936
0.0080644 0.00106609 0.01072566 0.02052564 0.00912203 0.00516551
0.00908692 0.00989289 -0.00494451 -0.00553225 0.00835639 0.00869886
-0.00098925 0.00997357 0.00992211 0.00995052 0.01018278 0.0086643
0.00890506 0.00047888 0.00890631 0.02018968 0.00904637 0.00401723
0.00933431 0.00991875 0.00064744 -0.0033074 0.00849263 0.00874045
-0.00051749 0.00997629 0.00995735 0.00997091 0.01031358 0.00907041
0.00948262 0.00029924 0.00947152 0.02093619 0.00880021 0.00215371
0.00952237 0.0099359 0.00232706 -0.00298543 0.00857945 0.0087675
0.00012513 0.00998106 0.01003116 0.01000607 0.01077548 0.01044418
0.01133557 0.00046847 0.01040126 0.02224197 0.00872205 -0.00046986
0.00953514 0.0099784 0.00020964 -0.00338643 0.00863159 0.0087242
0.00086619 0.00999047 0.01011715 0.0100483 0.0111914 0.01258666
0.01390036 0.00246372 0.01393081 0.02545522 0.00884365 -0.00351077
0.00998476 0.01003383 0.00191473 -0.00361946 0.0086276 0.0087466
0.00184757 0.01000815 0.01025325 0.01009751 0.01193828 0.0155177
0.01751577 0.00667147 0.02101305 0.03113944 0.00889165 -0.00695875
0.0106579 0.01010881 -0.00077711 -0.00135554 0.00866196 0.00875398
0.00287227 0.01003138 0.01041372 0.01014875 0.01271766 0.01906602
0.02182703 0.01341795 0.03354977 0.0410597 0.00883776 -0.01065909
0.01155844 0.01017867 0.00019663 -0.00158868 0.0087129 0.00877181
0.00398101 0.01006115 0.01063675 0.01021088 0.01347817 0.02330813
0.02683348 0.02287656 0.05497111 0.05429585 0.00857217 -0.01436562
0.01264097 0.01026398 0.00019394 0.00094537 0.00884171 0.00880042
0.00505424 0.01009949 0.01085199 0.01027042 0.01374666 0.02776093
0.03155206 0.03391963 0.08474549 0.08799973 0.0083985 -0.01724419
0.01399544 0.01035839 0.01004393 0.0089306 0.00902356 0.00898983]
[703824.353689585, 688000.4130783302, 674508.5537976791, 662770.9045613626]

392~395 epoch, lr=1e-9

Let's set w to W[2] (intended to be the w after the 3rd epoch) and try a smaller lr of 1e-9:

In [174]: w = W[2]

In [175]: lr = 1e-9 # learning rate
...: iteration = 5
...: #iterationRun(lr,iteration,x_data,y_data)
...: #w= <last time>
...: L, W = iterationRun(lr,iteration,x_data,y_data,w)
...: print (str(L))
...:
0 : 21:41:46.601560 L:661713.7975718635
1 : 21:42:16.607766 L:660793.0750887658
2 : 21:42:46.034783 L:663980.0848216356
3 : 21:43:15.439357 L:802745.540277066
4 : 21:43:45.015877 L:5429166.075156927
[-6.16727255e-04 1.00484493e-02 9.93535744e-03 9.93784427e-03
9.95911130e-03 8.35499858e-03 8.45588760e-03 2.26448667e-03
1.22987564e-02 2.16546844e-02 9.10083843e-03 9.03077135e-03
9.17185566e-03 9.98181751e-03 3.16708342e-03 2.67366966e-03
8.41072713e-03 8.73860126e-03 -2.03925361e-04 1.00607272e-02
9.93460928e-03 9.95365455e-03 1.02829207e-02 8.97031177e-03
9.31315565e-03 1.65178857e-03 1.03485940e-02 2.12659601e-02
9.02198751e-03 7.84152744e-03 9.43025959e-03 1.00085885e-02
8.87651196e-03 4.98782391e-03 8.55433776e-03 8.78302430e-03
2.85369965e-04 1.00635072e-02 9.97046563e-03 9.97461093e-03
1.04114813e-02 9.36603695e-03 9.88070916e-03 1.43617060e-03
1.08629084e-02 2.19977200e-02 8.76197799e-03 5.90977003e-03
9.62326853e-03 1.00259754e-02 1.06493535e-02 5.41048668e-03
8.64521541e-03 8.81179281e-03 9.52186844e-04 1.00684226e-02
1.00466367e-02 1.00111076e-02 1.08903525e-02 1.07802435e-02
1.17922921e-02 1.54851313e-03 1.16898484e-02 2.32874554e-02
8.68029225e-03 3.18757813e-03 9.62848850e-03 1.00700956e-02
8.57625809e-03 5.04569062e-03 8.69887254e-03 8.76585679e-03
1.72083560e-03 1.00782383e-02 1.01355162e-02 1.00550971e-02
1.13226458e-02 1.30078273e-02 1.44589394e-02 3.55086273e-03
1.51943562e-02 2.65530997e-02 8.80986381e-03 3.28409285e-05
1.00950294e-02 1.01279151e-02 1.03012885e-02 4.78865750e-03
8.69208729e-03 8.78851492e-03 2.74314457e-03 1.00968105e-02
1.02776366e-02 1.01066204e-02 1.21105528e-02 1.60772456e-02
1.82501496e-02 7.85988030e-03 2.23833884e-02 3.24013740e-02
8.86125894e-03 -3.55036887e-03 1.07979166e-02 1.02065963e-02
7.53541740e-03 6.99336995e-03 8.72485483e-03 8.79427710e-03
3.81308226e-03 1.01212778e-02 1.04463732e-02 1.01606057e-02
1.29396336e-02 1.98142376e-02 2.27987403e-02 1.48422348e-02
3.52605997e-02 4.27051700e-02 8.80287292e-03 -7.40948634e-03
1.17432739e-02 1.02801699e-02 8.42208402e-03 6.63868882e-03
8.77476129e-03 8.80978944e-03 4.97591884e-03 1.01527778e-02
1.06827453e-02 1.02266089e-02 1.37564839e-02 2.43105168e-02
2.81161272e-02 2.47089034e-02 5.76241338e-02 5.65673784e-02
8.51926553e-03 -1.12891688e-02 1.28863403e-02 1.03706768e-02
8.26939165e-03 9.02047482e-03 8.90751118e-03 8.83629524e-03
6.10712791e-03 1.01935224e-02 1.09119304e-02 1.02902353e-02
1.40567097e-02 2.90486692e-02 3.31514523e-02 3.63020625e-02
8.90002712e-02 9.21841280e-02 8.33254748e-03 -1.43156605e-02
1.43210918e-02 1.04713307e-02 1.79200559e-02 1.67676281e-02
9.09731287e-03 9.03380699e-03]
[662770.9045613626, 661713.7975718635, 660793.0750887658, 663980.0848216356, 802745.540277066, 5429166.075156927]

In [176]: lossFunction(W[3],x_data,y_data)
Out[176]: 5429166.075156927

This is really strange: the recovered W[3] gives L = 5429166.075156927, exactly the final, diverged loss. (The likely culprit: W_history.append(w[0:]) stores a NumPy view rather than a copy, so every entry of W_history aliases the same final w; appending w.copy() would fix it.)
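
A tiny illustration of that pitfall (generic NumPy behavior, not code from these notes):

import numpy as np

a = np.array([1.0, 2.0])
v = a[0:]        # a view: shares memory with a
a[0] = 99.0
print(v[0])      # 99.0 -- the "snapshot" changed too; use a.copy() instead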

In [190]: lossFunction(w391,x_data,y_data)
Out[190]: 662770.9037767164

In [192]: lossFunction(w395,x_data,y_data)
Out[192]: 5429166.071876905

We can verify both w391 and w395 against the public test data.

Verify w391 and w395 with test data

Processing test data

test.csv

# verify with test data test.csv
# preprocessing

testDataList = list()
with open("test.csv", newline="") as csvfile:
    testData = csv.reader(csvfile, delimiter=",", quotechar="|")
    for line in testData:
        testDataList.append(line[2:])


x_data_test = []  # 18 * 9 = 162 dimensions per example

i = 0
listToAppend = list()

for item in testDataList:
    if not mod_18(i):
        # mod 18 == 0: start a new test example
        listToAppend = [item]

    elif mod_18(i) == 10:
        # rainfall line: treat "NR" (no rain) as 0
        listToAppend.append(["0" if x == 'NR' else x for x in item])

    elif mod_18(i) == 17:
        # all 18 lines collected; transpose and append as one test example
        listToAppend.append(item)
        x_data_test.append(T_list(listToAppend))
    else:
        # append other feature lines
        listToAppend.append(item)
    i = i + 1

i = 0

x_data_test = [np.array(X).reshape(1, 162).astype(float).tolist()[0] for X in x_data_test]

# x_data_test = [[float(j) for j in i[:18*9]] for i in x_data_test]

def y(w, x):
    return sum(w * x)

y_data_w391 = [y(w391, x) for x in x_data_test]
y_data_w396 = [y(w396, x) for x in x_data_test]

ans.csv

# ans.csv
answerTestDataList = list()
with open("ans.csv", newline="") as csvfile:
    testData = csv.reader(csvfile, delimiter=",", quotechar="|")
    for line in testData:
        answerTestDataList.append(line[1])

Verify data

It looks like w391 is our best output so far :-) while w396 (from the diverged run) is far worse on the test set!

In [86]: lossFunction(w391,x_data_test,y_answer)
Out[86]: 40678.24250669469

In [87]: lossFunction(w396,x_data_test,y_answer)
Out[87]: 241252.78309255745


In [90]: lossFunction(w388,x_data_test,y_answer)
Out[90]: 45347.799286792215
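
For a rough feel of these numbers: L = 40678 over the 240 test points corresponds to a root-mean-square error of sqrt(40678 / 240) ≈ 13 PM2.5 units per prediction.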

Draw the plot

y_w391 = np.array(y_data_w391)
y_w396 = np.array(y_data_w396)
y_answer = np.array(answerTestDataList[1:]).astype(float)
x = range(len(y_answer))

# refer to https://matplotlib.org/2.2.2/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py

plt.plot(x, y_w391, label="w391")
plt.plot(x, y_w396, label="w396")
plt.plot(x, y_answer, label="answer")

plt.xlabel('240 test set')
plt.ylabel('PM2.5')

plt.title("y_w391, y_w396 and y_answer")

plt.legend()
plt.show()

The plot is shown below.

Start studying other gradient descent algorithms

ref: http://ruder.io/optimizing-gradient-descent

ref: https://www.slideshare.net/SebastianRuder/optimization-for-deep-learning

ref: https://zhuanlan.zhihu.com/p/22252270

Adagrad
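
The comparison below uses L_adagrad and w_adagrad from an Adagrad run, but the Adagrad loop itself is not shown in these notes. Here is a minimal sketch of what it could look like, reusing the same squared loss and gradient as iterationRun (the per-feature accumulator and the eps constant are standard Adagrad ingredients, not taken from the original run):

import numpy as np

def iterationRunAdagrad(lr, iteration, x_data, y_data, eps=1e-8):
    X = np.asarray(x_data, dtype=float)
    y = np.asarray(y_data, dtype=float)
    w = np.full(X.shape[1], 0.01)

    grad_sq_sum = np.zeros_like(w)  # per-feature sum of squared gradients
    L_history = [float(np.sum((y - X @ w) ** 2))]

    for _ in range(iteration):
        w_grad = 2.0 * (X.T @ (X @ w - y))
        grad_sq_sum += w_grad ** 2
        # per-feature step: lr scaled by 1/sqrt(accumulated squared gradients)
        w = w - lr * w_grad / (np.sqrt(grad_sq_sum) + eps)
        L_history.append(float(np.sum((y - X @ w) ** 2)))
    return L_history, w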

Descent speed

import numpy as np
import csv
import datetime
import matplotlib
import matplotlib.pyplot as plt

y_L_adagrad = np.array(L_adagrad[:391])
y_L_bgd = np.array(L_bgd[:391])
x = range(391)

# refer to https://matplotlib.org/2.2.2/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py

plt.plot(x, y_L_adagrad, label="Adagrad")
plt.plot(x, y_L_bgd, label="Batch Gradient Descent")

plt.xlabel('390 epoch')
plt.ylabel('Loss Function Value')

plt.title("Batch GD and ADAGRAD")

plt.legend()
plt.show()

It's basically the same; the only differences are:

  • at the very beginning, ADAGRAD shot off into crazy territory, but it soon self-corrected back onto a normal path
  • ADAGRAD converges more slowly than BGD here …
  • our BGD run was hand-tuned, which means ADAGRAD finds the right path on its own (no human intervention needed)

We can also look at the initial 30 epochs:

In [8]: initialN = 30
...: y_L_adagrad = np.array(L_adagrad[:initialN])
...: y_L_bgd = np.array(L_bgd[:initialN])
...: x = range(initialN)
...:
...:
...: plt.plot(x, y_L_adagrad, label="Adagrade")
...: plt.plot(x, y_L_bgd, label="Batch Gradient Descent")
...:
...: plt.xlabel('390 epoch')
...: plt.ylabel('Loss Function Value')
...:
...: plt.title("Batch GD and ADAGRAD")
...:
...: plt.legend()
...: plt.show()
...:

Verify the result

# loss function compare

In [12]: lossFunction(w391,x_data_test,y_answer)
Out[12]: 40678.24250669469

In [13]: lossFunction(w_adagrad,x_data_test,y_answer)
Out[13]: 50976.42462103875

Check the prediction result figure

y_data_w391 = [y(w391,x) for x in x_data_test ]
y_data_adagrad = [y(w_adagrad,x) for x in x_data_test ]

y_w391 = np.array(y_data_w391)
y_w_adagrad = np.array(y_data_adagrad)
y_answer = np.array(answerTestDataList[1:]).astype(float)
x = range(len(y_answer))


plt.plot(x, y_w391, label="BGD")
plt.plot(x, y_w_adagrad, label="adagrad")
plt.plot(x, y_answer, label="answer")

plt.xlabel('240 test set')
plt.ylabel('PM2.5')

plt.title("BGD, adagrad and Answer")

plt.legend()
plt.show()