Intro

The plan for this project is to build a dataset from a large number of fashion photos by crawling images with the Python modules BeautifulSoup and Selenium, and then to run detection on that dataset with a YOLO model.

Data

1. Data

https://www.musinsa.com/app/styles/lists

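To check whether crawling is allowed, I looked at the site's /robots.txt.

It says Disallow, which means crawling is not permitted. That is a pity, but since this is a toy project that does not need a large amount of data, I decided to download the images by hand.

I collected about 400 image files, keeping mainly front-facing photos that looked like they could be recognized.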
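The same check can also be scripted. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text in it is a made-up sample, not the site's actual file, and `is_crawl_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content that disallows everything for every crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

print(is_crawl_allowed(SAMPLE_ROBOTS_TXT, "https://www.musinsa.com/app/styles/lists"))  # False
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which download the file before `can_fetch()` is called.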

Image Labeling

I used the labelImg tool for the image labeling work.

https://github.com/tzutalin/labelImg

The labels I defined are as follows.

bucket_hat

cap

beanie

sweatshirt

shirt

T_shirts

Coat

Jeans

Pants

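Because the dataset is small and I am working alone, I took care during labeling to keep human error to a minimum.

After the labeling was done, I used Python code to split the dataset into train, validation, and test sets at a ratio of 8:1:1.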
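For reference, labelImg can save annotations in YOLO format, where each box becomes one text line: the class index followed by the box center, width, and height, all normalized by the image size. The sketch below illustrates that conversion, assuming YOLO format is the one used here. `to_yolo_line` is a hypothetical helper, and the box coordinates and image size are made-up values.

```python
# Class order matters: the index in this list is the class id written to the label file.
CLASS_NAMES = ['bucket_hat', 'cap', 'beanie', 'sweatshirt', 'shirt',
               'T_shirts', 'Coat', 'Jeans', 'Pants']


def to_yolo_line(class_name, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    class_id = CLASS_NAMES.index(class_name)
    # YOLO stores the box center and size as fractions of the image width and height.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A made-up 200x100 px box labeled "cap" in a 400x200 px image.
print(to_yolo_line("cap", (100, 50, 300, 150), 400, 200))
# 1 0.500000 0.500000 0.500000 0.500000
```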

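import os
import random
import shutil

# Assumed folder layout and file extensions (not shown in the original post); adjust to your setup.
TRAIN_IMAGE_DATA_DIR = '../train/images'
VALID_IMAGE_DATA_DIR = '../valid/images'
TRAIN_LABEL_DATA_DIR = '../train/labels'
VALID_LABEL_DATA_DIR = '../valid/labels'
IMG_EXTENSION = '.jpg'
LABEL_EXTENSION = '.txt'

split_ratio = 0.1  # fraction of the training files to move out

# Collect file names without extension (splitext keeps names that contain dots intact).
img_file_name_list = [os.path.splitext(f)[0] for f in os.listdir(TRAIN_IMAGE_DATA_DIR)]

split_num = int(len(img_file_name_list) * split_ratio)
random.shuffle(img_file_name_list)

move_file_name_list = img_file_name_list[:split_num]

# Move each selected image together with its label file so the pairs stay in sync.
for file_name in move_file_name_list:
    shutil.move(os.path.join(TRAIN_IMAGE_DATA_DIR, file_name) + IMG_EXTENSION,
                os.path.join(VALID_IMAGE_DATA_DIR, file_name) + IMG_EXTENSION)

    shutil.move(os.path.join(TRAIN_LABEL_DATA_DIR, file_name) + LABEL_EXTENSION,
                os.path.join(VALID_LABEL_DATA_DIR, file_name) + LABEL_EXTENSION)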
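The snippet above moves a single 10% share of the files. One way to get all three subsets of an 8:1:1 split in one pass is to shuffle the file names once and slice the list, as in the sketch below. This is only an illustration: `split_train_val_test`, the fixed seed, and the `img_000` style names are assumptions, and no files are moved.

```python
import random


def split_train_val_test(names, val_ratio=0.1, test_ratio=0.1, seed=42):
    """Shuffle names once and slice them into train / validation / test lists."""
    shuffled = list(names)
    # A seeded Random instance keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    n_test = int(len(shuffled) * test_ratio)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]  # everything left over is training data
    return train, val, test


# Hypothetical file names standing in for roughly 400 labeled images.
names = [f"img_{i:03d}" for i in range(400)]
train, val, test = split_train_val_test(names)
print(len(train), len(val), len(test))  # 320 40 40
```

Each returned list can then be passed through the same `shutil.move` loop, once with the validation folders and once with the test folders.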

After that, I created a YAML file that sets the directories of the images used for training and the list of labels.

train: ../train/images
val: ../valid/images

nc: 9
names: ['bucket_hat','cap','beanie','sweatshirt','shirt','T_shirts','Coat','Jeans','Pants']
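In a dataset file like this, `nc` has to match the number of entries in `names`, and every name needs both of its quotes. A small sketch that generates the file content from the class list keeps those in sync automatically. `build_data_yaml` is a hypothetical helper, and it uses plain string formatting rather than a YAML library.

```python
CLASS_NAMES = ['bucket_hat', 'cap', 'beanie', 'sweatshirt', 'shirt',
               'T_shirts', 'Coat', 'Jeans', 'Pants']


def build_data_yaml(train_dir, val_dir, names):
    """Return dataset YAML text with nc derived from the class list."""
    quoted = ",".join(f"'{name}'" for name in names)
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n"
        "\n"
        f"nc: {len(names)}\n"  # always equals the number of class names
        f"names: [{quoted}]\n"
    )


yaml_text = build_data_yaml("../train/images", "../valid/images", CLASS_NAMES)
print(yaml_text)
```

Writing `yaml_text` to a file such as `data.yaml` (an assumed file name) gives the dataset file that training points to.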

Train

I trained with the YOLOv5s model.

https://github.com/ultralytics/yolov5
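Training with that repository is normally started through its `train.py` script. The sketch below only assembles such a command. The image size, batch size, epoch count, and the `data.yaml` file name are assumed values, not the settings actually used in this project, and `build_train_command` is a hypothetical helper.

```python
import sys


def build_train_command(data_yaml="data.yaml", weights="yolov5s.pt",
                        img_size=640, batch_size=16, epochs=100):
    """Build the argument list for YOLOv5's train.py (all values are assumptions)."""
    return [
        sys.executable, "train.py",
        "--img", str(img_size),           # input image size in pixels
        "--batch-size", str(batch_size),  # images per training batch
        "--epochs", str(epochs),          # number of passes over the training set
        "--data", data_yaml,              # dataset YAML with paths and class names
        "--weights", weights,             # pretrained YOLOv5s checkpoint to start from
    ]


cmd = build_train_command()
print(" ".join(cmd))
# To actually train, run this from inside a clone of the yolov5 repository, e.g.:
# subprocess.run(cmd, cwd="yolov5", check=True)
```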
