AlexNet

Paper Implementation: "Imagenet classification with deep convolutional neural networks" (2012).
Code Practice: You can see exactly how the model was implemented through the Colab and Git links below.

Description

1. AlexNet Architecture:

AlexNet consists of five convolutional layers followed by three fully connected layers.
Figure 1. AlexNet Architecture: Krizhevsky et al. (2012)
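To make the layer stack concrete, here is a plain-Python sketch that tracks the feature-map shape through the five convolutional layers and the three max-pooling layers. The filter counts, kernel sizes, strides, and paddings are the commonly used single-GPU AlexNet configuration (the paper splits the filters across two GPUs). They are assumptions for illustration rather than values taken from this post's code, and the 227x227 input follows the center crop in section 8.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, out_channels, kernel, stride, padding); out_channels=None marks max pooling.
# Assumed single-GPU configuration, not read from the author's code.
LAYERS = [
    ("conv1", 96, 11, 4, 0),
    ("pool1", None, 3, 2, 0),
    ("conv2", 256, 5, 1, 2),
    ("pool2", None, 3, 2, 0),
    ("conv3", 384, 3, 1, 1),
    ("conv4", 384, 3, 1, 1),
    ("conv5", 256, 3, 1, 1),
    ("pool3", None, 3, 2, 0),
]

def trace_shapes(size=227, channels=3):
    """Return (name, channels, height, width) after every layer."""
    shapes = []
    for name, out_ch, k, s, p in LAYERS:
        size = conv_out(size, k, s, p)
        if out_ch is not None:  # pooling keeps the channel count
            channels = out_ch
        shapes.append((name, channels, size, size))
    return shapes

if __name__ == "__main__":
    for name, c, h, w in trace_shapes():
        print(f"{name}: {c}x{h}x{w}")
    c, h, w = trace_shapes()[-1][1:]
    # The flattened conv features feed FC1 (4096) -> FC2 (4096) -> FC3 (number of classes)
    print("flattened:", c * h * w)  # flattened: 9216
```

Running `trace_shapes(224)` instead ends at 256x5x5 rather than 256x6x6, which is why implementations typically feed 227x227 crops.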

2. Activation Function : ReLU Function

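LeNet used tanh as its activation function, but since AlexNet most CNN models have adopted ReLU. ReLU made training much faster than tanh: in the paper, a four-layer CNN with ReLUs reached a 25% training error rate on CIFAR-10 six times faster than the same network with tanh. In AlexNet, ReLU is applied to the output of every convolutional and fully connected layer.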
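One common intuition for the speedup is that tanh saturates: its gradient shrinks toward zero for large |x|, while ReLU's gradient stays at 1 for any positive input. The plain-Python sketch below only compares the two derivatives at a few points; it illustrates the saturation argument and is not a reproduction of the paper's experiment.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Derivative of ReLU; 0 for x <= 0 by convention
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, which vanishes as |x| grows
    return 1.0 - math.tanh(x) ** 2

if __name__ == "__main__":
    for x in (0.5, 2.0, 5.0):
        print(f"x={x}: relu'={relu_grad(x):.4f}, tanh'={tanh_grad(x):.6f}")
```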

3. Max-Pooling Layer :

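LeNet used average pooling, whereas AlexNet adopted max pooling. I designed the model with the kernel size, stride, and number of max-pooling layers used in the paper. Because the stride (2) is smaller than the kernel (3), neighboring pooling windows overlap; the paper calls this overlapping pooling.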
kernel size = 3x3
stride = 2
number of Max-Pooling layer = 3
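A minimal single-channel sketch of 3x3, stride-2 max pooling in plain Python (no padding), with a hypothetical helper name. On a 5x5 grid it produces a 2x2 output, and the middle row and column are read by both neighboring windows, which is the overlap described above.

```python
def max_pool2d(x, kernel=3, stride=2):
    """Max pooling over a 2-D list (one channel, no padding)."""
    out_h = (len(x) - kernel) // stride + 1
    out_w = (len(x[0]) - kernel) // stride + 1
    return [
        [
            max(
                x[i * stride + di][j * stride + dj]
                for di in range(kernel)
                for dj in range(kernel)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

if __name__ == "__main__":
    grid = [[r * 5 + c for c in range(5)] for r in range(5)]  # values 0..24
    print(max_pool2d(grid))  # [[12, 14], [22, 24]]
```

With the same formula, the three pooling layers shrink 55 to 27, 27 to 13, and 13 to 6.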

4. Dropout:

Dropout is used to prevent overfitting, and it is applied only to the first and second fully connected (FC) layers.
dropout rate = 0.5
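The paper multiplies the outputs of the dropout layers by 0.5 at test time. PyTorch's `nn.Dropout` instead uses inverted dropout: it scales the surviving units by 1/(1 - p) during training, so evaluation is a no-op. Both keep the expected activation the same. The plain-Python sketch below shows the inverted variant with p = 0.5; the function name and signature are illustrative, not the post's code.

```python
import random

def dropout(values, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training
    and scale the survivors by 1/(1 - p); at inference return the input."""
    if not training or p == 0.0:
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

if __name__ == "__main__":
    out = dropout([1.0] * 10, rng=random.Random(0))
    print(out)  # each entry is either 0.0 or 2.0
```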

5. LRN (Local Response Normalization):

The Hermann grid is a classic example of lateral inhibition, the mechanism that inspired LRN:

Figure 2. Hermann Grid illusion: Hermann (1870)
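The grid consists of black squares separated by white lines. When you look at the black squares, gray dots appear at the intersections of the white lines; when you focus on the white lines, the gray dots disappear. Why? An intersection is surrounded by white on four sides, so it receives more lateral inhibition from its bright surroundings, and as a result the white there looks dimmer.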
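- Why use LRN? ReLU passes large activations through unbounded, so a single very strong response can dominate its neighbors. To damp this effect, LRN normalizes each activation by the activations at the same spatial position in adjacent channels, which lowered the error rate in the paper (top-1 and top-5 error dropped by 1.4% and 1.2%). LRN is applied after the ReLU of the first and second convolutional layers (k = 2, n = 5, α = 10^-4, β = 0.75). The downside is that LRN layers increase memory usage and computational cost.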
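The paper's normalization at one spatial position (x, y) is b^i = a^i / (k + α · Σ_j (a^j)^2)^β, where the sum runs over channels j from max(0, i - n/2) to min(N - 1, i + n/2) and N is the number of channels. The plain-Python sketch below applies that formula to a list of channel activations at a single position, with the hyperparameters above; the helper name is hypothetical. Note that PyTorch's `nn.LocalResponseNorm` documents the sum scaled by α/n, so its `alpha` argument is not numerically the same as the paper's α.

```python
def lrn(acts, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local response normalization across channels at one (x, y) position.
    acts[i] is the ReLU output of channel i at that position."""
    N = len(acts)
    out = []
    for i, a in enumerate(acts):
        lo = max(0, i - n // 2)
        hi = min(N - 1, i + n // 2)
        # Sum of squares over the n neighboring channels (clipped at the edges)
        s = sum(acts[j] ** 2 for j in range(lo, hi + 1))
        out.append(a / (k + alpha * s) ** beta)
    return out

if __name__ == "__main__":
    # A very strong channel suppresses its neighbors more than weak ones would
    print(lrn([100.0, 1.0, 1.0, 1.0, 1.0]))
    print(lrn([1.0, 1.0, 1.0, 1.0, 1.0]))
```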

6. Weight and Bias initialization (convolution layer and Fully connected layer):

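The weights of every layer were initialized from a zero-mean Gaussian distribution with a standard deviation (STD) of 0.01.
Neuron biases: set to the constant 1 in the 2nd, 4th, and 5th convolutional layers and in the fully connected hidden layers, and to 0 in the remaining layers.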
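A plain-Python sketch of this initialization scheme. The layer names (conv1 to conv5, fc1 to fc3) and the helper `init_layer` are hypothetical; in PyTorch the same effect is typically achieved with `torch.nn.init.normal_(weight, mean=0.0, std=0.01)` and `torch.nn.init.constant_(bias, value)`. According to the paper, the bias of 1 gives the ReLUs positive inputs and so speeds up the early stage of learning.

```python
import random
import statistics

LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc1", "fc2", "fc3"]
# Biases set to 1: conv2, conv4, conv5 and the FC hidden layers; all others get 0
BIAS_ONE = {"conv2", "conv4", "conv5", "fc1", "fc2"}

def init_layer(name, num_weights, rng):
    """Return (weights ~ N(0, 0.01^2), constant bias) for a hypothetical layer."""
    weights = [rng.gauss(0.0, 0.01) for _ in range(num_weights)]
    bias = 1.0 if name in BIAS_ONE else 0.0
    return weights, bias

if __name__ == "__main__":
    rng = random.Random(0)
    for name in LAYERS:
        w, b = init_layer(name, 10_000, rng)
        print(f"{name}: mean={statistics.mean(w):+.5f} std={statistics.pstdev(w):.5f} bias={b}")
```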

7. Hyperparameter:

8. Image preprocessing:

Preprocessing covers the input shape, resizing, center cropping, and per-channel RGB mean subtraction.
Input image shape = 224x224x3 (as stated in the paper; the layer arithmetic actually requires 227x227x3)
resize = 256x256
center crop = 227x227
Mean subtraction of RGB per channel
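A dependency-free sketch of the crop and mean-subtraction steps, assuming images are stored as lists of rows of (R, G, B) tuples and have already been resized to 256x256 (resampling itself needs an image library, so it is omitted). The helper names are hypothetical. In PyTorch these steps are usually expressed with `torchvision.transforms` (`Resize`, `CenterCrop`, `ToTensor`, `Normalize`), though this post does not say exactly which calls were used.

```python
def center_crop(img, crop):
    """Crop the central crop x crop region of an H x W image (list of rows)."""
    h, w = len(img), len(img[0])
    top, left = (h - crop) // 2, (w - crop) // 2
    return [row[left:left + crop] for row in img[top:top + crop]]

def channel_means(images):
    """Per-channel (R, G, B) mean over every pixel of every image."""
    totals, count = [0.0, 0.0, 0.0], 0
    for img in images:
        for row in img:
            for px in row:
                for c in range(3):
                    totals[c] += px[c]
                count += 1
    return [t / count for t in totals]

def subtract_mean(img, means):
    """Subtract the per-channel mean from every pixel."""
    return [[tuple(px[c] - means[c] for c in range(3)) for px in row] for row in img]

if __name__ == "__main__":
    # Hypothetical 256x256 image; pixel = (row, col, 0) so the crop offset is visible
    img = [[(r, c, 0) for c in range(256)] for r in range(256)]
    crop = center_crop(img, 227)
    print(len(crop), len(crop[0]), crop[0][0])  # 227 227 (14, 14, 0)
```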

9. Test Results:

Test results are reported as accuracy, loss, a classification report, and a confusion matrix.
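Accuracy and the per-class precision and recall in a classification report can all be read off the confusion matrix. The sketch below computes them without dependencies; the helper names are hypothetical, and in practice `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` compute the same quantities.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def accuracy(cm):
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))

def precision_recall(cm, c):
    """Precision = TP / predicted as c; recall = TP / actually c."""
    tp = cm[c][c]
    predicted = sum(row[c] for row in cm)
    actual = sum(cm[c])
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

if __name__ == "__main__":
    cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
    print(cm)            # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
    print(accuracy(cm))  # 0.8
```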

10. Dataset:

11. System Environment:

Paper: two GTX 580 3GB GPUs running in parallel
Implementation: Google Colab Pro Plus (GPUs: K80 (Kepler), T4 (Turing), and P100 (Pascal)), Jupyter Notebook, Visual Studio Code

Reference:

[1] "[CNN 알고리즘들] AlexNet의 구조" ("[CNN Algorithms] The Structure of AlexNet"), 코딩재개발, modified March 11, 2019, accessed December 28, 2022,
https://bskyvision.com/entry/CNN-%EC%95%8C%EA%B3%A0%EB%A6%AC%EC%A6%98%EB%93%A4-AlexNet%EC%9D%98-%EA%B5%AC%EC%A1%B0

[2] "[논문 구현] AlexNet 파이토치로 구현하기" ("[Paper Implementation] Implementing AlexNet in PyTorch"), For a better world, modified September 7, 2022, accessed December 31, 2022,
https://roytravel.tistory.com/336

[3] "[Pytorch 팁 파이토치(Pytorch)에서 TensorBoard 사용하기]" ("PyTorch Tip: Using TensorBoard in PyTorch"), 물공's의 딥러닝, modified May 10, 2020, accessed January 2, 2023,
https://sensibilityit.tistory.com/512

[4] "Review: AlexNet, CaffeNet — Winner of ILSVRC 2012 (Image Classification)", Sik-Ho Tsang, modified August 9, 2018, accessed January 2, 2023,
https://medium.com/coinmonks/paper-review-of-alexnet-caffenet-winner-in-ilsvrc-2012-image-classification-b93598314160

[5] Hermann, L. (1870). "Eine Erscheinung simultanen Contrastes". Pflügers Archiv für die gesamte Physiologie, 3: 13–15. doi:10.1007/BF01855743

[6] "Hermann Grid", Michael's Visual Phenomena & Optical Illusions, modified January 21, 2021, accessed January 3, 2023,
https://michaelbach.de/ot/lum-herGrid/index.html

[7] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Communications of the ACM 60.6 (2017): 84-90.
https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

[8] Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.
http://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf

[9] "The CIFAR-10 dataset", Alex Krizhevsky's home page, written 2009, accessed December 28, 2022,
https://www.cs.toronto.edu/~kriz/cifar.html


AI with JP