table of contents

training on mnist with cnn

2023-08-06

refer to training on mnist, which was my first attempt using my "old" feedforward neural network implementation of simple perceptrons. now i have a (rather slow, currently) convolutional neural network implementation, so it's time to see the results. load the dataset (replace train-data-file and test-data-file if you're not me):

(ql:quickload "cl-csv")
(defun parse-mylist (mylist)
  "Parse one MNIST CSV row (a list of strings) into (pixel-tensor . digit).
The first element is the digit label; the remaining SIZE*SIZE elements are
pixel values, normalized from 0-255 to 0-1 (training stalls otherwise --
all the deltas turn into 0).  Returns a cons of a 1xSIZExSIZE array and
the integer label."
  (let* ((digit (parse-integer (car mylist))) ;; digit is at the beginning of the list
         (pixels (cdr mylist)) ;; the rest of the list contains the pixels
         (size (floor (sqrt (length pixels))))
         (arr (make-array (list 1 size size))))
    ;; Walk the pixel list once; the old version called ELT per pixel,
    ;; which is O(n) on a list and made the whole parse O(n^2).
    (loop for pixel in pixels
          for idx from 0
          do (setf (aref arr 0 (floor idx size) (mod idx size))
                   (/ (parse-integer pixel) 255)))
    (cons arr digit)))
(defun load-mnist ()
  "Load and parse the MNIST train/test CSV files into special variables.
Expects TRAIN-DATA-FILE and TEST-DATA-FILE to be bound to CSV path strings.
Binds *MNIST-TRAIN-DATA* and *MNIST-TEST-DATA* to vectors of
\(pixel-tensor . digit) conses -- vectors so element access is O(1),
unlike lists."
  ;; CDR drops the CSV header row before parsing.
  ;; Bind each special exactly once (the old version DEFPARAMETER'd each
  ;; variable twice: first the raw CSV, then the parsed vector).
  (defparameter *mnist-train-data*
    (map 'vector #'parse-mylist
         (cdr (cl-csv:read-csv (pathname train-data-file)))))
  (defparameter *mnist-test-data*
    (map 'vector #'parse-mylist
         (cdr (cl-csv:read-csv (pathname test-data-file))))))

note that i had to normalize the pixel values from 0-255 to 0-1, otherwise i couldn't train the network -- all the deltas were turning into 0. each image is of size 28x28, so we can use the following architecture:

;; input meant to be of size 1x28x28
(defun construct-mnist-network ()
  "Build the MNIST CNN and bind it to *MNIST-NETWORK*."
  (defparameter *mnist-network*
    (make-network
     :layers (list
              (make-3d-convolutional-layer-from-dims :dims '(32 1 5 5)) ;; 32 filters of 1x5x5: size of image becomes 32x24x24
              (make-pooling-layer :rows 2 :cols 2
                                  :pooling-function #'average-pooling-function
                                  :unpooling-function #'average-unpooling-function) ;; size becomes 32x12x12
              (make-3d-convolutional-layer-from-dims :dims '(16 32 5 5)) ;; size becomes 16x8x8
              (make-pooling-layer :rows 2 :cols 2
                                  :pooling-function #'average-pooling-function
                                  :unpooling-function #'average-unpooling-function)
              ;; NOTE(review): pooling 16x8x8 by 2x2 should give 16x4x4 = 256,
              ;; but the dense layer below expects 96 inputs (= 6x4x4, matching
              ;; the original "6x4x4" comment).  One of the two is wrong --
              ;; confirm against make-3d-convolutional-layer-from-dims semantics.
              (make-flatten-layer) ;; flatten it; dense layer below assumes 96 flattened units
              (make-dense-layer :num-units 30 :prev-layer-num-units 96
                                :activation-function #'relu
                                :activation-function-derivative #'relu-derivative)
              (make-dense-layer :num-units 10 :prev-layer-num-units 30
                                :activation-function #'sigmoid
                                :activation-function-derivative #'sigmoid-derivative))
     :learning-rate 0.02)))

example usage:

(construct-mnist-network)
;; might wanna make weights closer to 0
(divide-network-weights *mnist-network* 5)
*mnist-network*
#<NETWORK 
  #<3D-CONVOLUTIONAL-LAYER weights: 800, dimensions: (32 1 5 5)>
  #<POOLING-LAYER rows: 2, columns: 2>
  #<3D-CONVOLUTIONAL-LAYER weights: 12800, dimensions: (16 32 5 5)>
  #<POOLING-LAYER rows: 2, columns: 2>
  #<FLATTEN-LAYER {10185E1AC3}>
  #<DENSE-LAYER weights: 2880, dimensions: (30 96)>
  #<DENSE-LAYER weights: 300, dimensions: (10 30)>
total network weights: 16780, learning rate: 0.02 {10186DB593}>

after running load-mnist, we can begin training. because at the time (<2023-08-12 Sat 17:52:00>) my training algorithm for cnn's was slow, i wanted to measure the accuracy of the algorithm but didn't want to wait days for training to finish, so i tried training on a single image: the network should overfit and be able to classify the image correctly:

(defun train-on-mnist-single-image (&optional (epochs 10000))
  "Sanity check: overfit *MNIST-NETWORK* on the first training image.
A correct implementation should drive the cost toward zero and end up
classifying this single image correctly.  EPOCHS (default 10000) is the
number of training passes; the cost is printed after each pass."
  (let* ((sample (elt *mnist-train-data* 0))
         (x (car sample))
         ;; One-hot target.  :initial-element 0 matters: MAKE-ARRAY's
         ;; default contents are unspecified by the standard, so the old
         ;; (make-array '(10)) could contain garbage in the other slots.
         (y (make-array '(10) :initial-element 0))
         (nw *mnist-network*))
    (setf (aref y (cdr sample)) 1)
    (format t "~%out layer should be: ~A" y)
    ;; The old message claimed "100 epochs" while looping 10000 times;
    ;; report the actual count.
    (format t "~%running ~A epochs:" epochs)
    (loop for i from 0 below epochs do
      (network-train nw (list x) (list y))
      (format t "~%cost: ~A" (network-test nw (list x) (list y))))
    ;; Use the local NW and X instead of re-fetching the globals.
    (format t "~%out layer: ~A" (car (car (network-feedforward nw x))))))

eventually, after a lot of debugging of my code, the simplified network did converge and overfit, so the next task was to train it on the actual dataset and not just a single image

(defun train-on-mnist ()
  "Train *MNIST-NETWORK* on the full parsed MNIST training set.
Each entry of *MNIST-TRAIN-DATA* is an (input-tensor . digit) cons; the
digit label is expanded into a one-hot vector of length 10 before being
handed to NETWORK-TRAIN-DISTRIBUTED-CPU."
  (network-train-distributed-cpu
   *mnist-network*
   (map 'list
        (lambda (data-entry)
          (let ((in-tensor (car data-entry))
                (digit (cdr data-entry))
                ;; :initial-element 0 is required -- MAKE-ARRAY's default
                ;; contents are unspecified, so the one-hot vector could
                ;; otherwise contain garbage in the non-label slots.
                (out-tensor (make-array 10 :initial-element 0)))
            (setf (aref out-tensor digit) 1)
            (cons in-tensor out-tensor)))
        *mnist-train-data*)))