OCR

Ryuzy 2025. 2. 20. 17:17


1. OCR

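OCR (Optical Character Recognition) is a technology that identifies characters in images or documents and converts them into digital text. It is mainly used to automatically recognize text in scanned documents, photos, license plates, and handwriting. A basic OCR pipeline preprocesses the image (grayscale conversion, binarization, noise removal), detects the character regions, and then compares the character patterns against a database to extract the best-matching text. In Python this is easy to implement with OpenCV and Tesseract OCR, and deep-learning-based tools such as EasyOCR, PaddleOCR, and Google Vision OCR can further improve recognition rates for Korean and many other languages. OCR is used in many fields, including document automation, license plate recognition (LPR), CAPTCHA solving, and document digitization.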
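The classic pipeline described above (binarize, then compare character patterns against stored templates) can be sketched in plain Python. This is a toy illustration only: the 3x3 glyph templates, the fixed threshold of 128, and the helper names `binarize`, `hamming`, and `classify` are assumptions made up for this example and are not part of any OCR library.

```python
# Toy template-matching OCR: binarize a tiny grayscale patch, then pick the
# stored template with the fewest differing pixels (Hamming distance).

# Hypothetical 3x3 templates (1 = ink, 0 = background).
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
}


def binarize(gray, threshold=128):
    """Map dark pixels (below threshold) to 1 (ink) and light pixels to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def hamming(a, b):
    """Count positions where two equally sized binary grids differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(gray):
    """Return the template label closest to the binarized patch."""
    binary = binarize(gray)
    return min(TEMPLATES, key=lambda label: hamming(binary, TEMPLATES[label]))


# A noisy, dark-on-light "L": one bottom-row pixel is lighter than it should be.
patch = [[30, 240, 250],
         [20, 235, 245],
         [40, 200, 25]]
print(classify(patch))
```

Real engines replace the fixed threshold and pixel-by-pixel comparison with adaptive binarization and learned models, but the overall flow of preprocessing followed by matching is the same.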

2. Tesseract OCR

Download the image below and run the example.

hello.png (0.01MB)

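Tesseract OCR is an open-source optical character recognition (OCR) engine, originally developed at Hewlett-Packard and later sponsored and maintained by Google. It supports many languages and is generally accurate on clean, printed text. Since version 4, Tesseract ships with an LSTM (Long Short-Term Memory) based deep-learning recognition model, and using the pytesseract library together with the tesseract-ocr engine makes it easy to use from Python. Tesseract is used in many fields, including document digitization, license plate recognition, CAPTCHA solving, and automated document processing.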

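import cv2
import pytesseract

img = cv2.imread('./images/hello.png')
if img is None:
    raise FileNotFoundError('Could not load ./images/hello.png')

# OpenCV loads images as BGR; convert to RGB before passing to Tesseract
dst = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# lang can be 'kor', 'eng', or 'kor+eng' (the language data must be installed)
text = pytesseract.image_to_string(dst, lang='kor+eng')
print(text)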
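The string returned by `image_to_string` usually needs light cleanup before it is used further: it tends to contain blank lines, stray whitespace, and, with recent Tesseract versions, a trailing form-feed page separator. The helper below is a minimal sketch of such cleanup; `clean_ocr_text` is a hypothetical name, and the sample string is hand-written to stand in for real OCR output.

```python
def clean_ocr_text(raw):
    """Return the non-empty, whitespace-trimmed lines of raw OCR output."""
    # Drop form-feed page separators, then trim each line and skip blanks.
    lines = raw.replace("\x0c", "").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Hypothetical raw output standing in for pytesseract.image_to_string(...).
sample = "Hello World \n\n  안녕하세요\n\x0c"
print(clean_ocr_text(sample))
```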

3. EasyOCR

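EasyOCR is a deep-learning-based optical character recognition (OCR) library that supports many languages and can extract text from images quickly and accurately. It runs on PyTorch and uses pretrained models to recognize text in a variety of fonts and sizes. It is especially useful for detecting text on license plates, scanned documents, and signs, and recognition accuracy can be improved further by tuning the preprocessing steps with OpenCV. The core of EasyOCR is the readtext() function, which detects text in an image and returns each result together with its bounding box and confidence score.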
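Each `readtext()` result is a `(bounding box, text, confidence)` tuple, where the box lists four corner points in top-left, top-right, bottom-right, bottom-left order. The sketch below filters results by confidence using hand-written sample data of that shape, so it runs without EasyOCR installed; the sample values, the `filter_by_confidence` name, and the 0.5 threshold are assumptions for illustration.

```python
def filter_by_confidence(results, min_confidence=0.5):
    """Keep (text, confidence) pairs at or above the threshold, best first."""
    kept = [(text, conf) for _bbox, text, conf in results
            if conf >= min_confidence]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical data shaped like the output of ocr_reader.readtext(image).
sample_results = [
    ([[10, 10], [110, 10], [110, 40], [10, 40]], "12가 3456", 0.91),
    ([[5, 60], [50, 60], [50, 75], [5, 75]], "서울", 0.32),
    ([[60, 60], [160, 60], [160, 90], [60, 90]], "STOP", 0.77),
]

for text, conf in filter_by_confidence(sample_results):
    print(f"{text}: {conf:.2f}")
```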

Download the zipped images below and run the example.

images.zip (0.48MB)

import cv2
import easyocr

# Initialize the EasyOCR reader with the Korean model
ocr_reader = easyocr.Reader(['ko'])

# List of image paths to process
image_file_paths = ['./images/test1.jpg', './images/test2.jpg', './images/test3.jpg', './images/test4.jpg', './images/test5.jpg']

for image_path in image_file_paths:
    print(f"Processing {image_path}")
    
    # Read the image; skip it if it cannot be loaded
    input_image = cv2.imread(image_path)
    if input_image is None:
        print(f"Could not load image: {image_path}")
        continue

    # Convert to grayscale
    gray_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)

    # Apply a bilateral filter (reduces noise while preserving edges)
    denoised_image = cv2.bilateralFilter(gray_image, 9, 75, 75)

    # Apply Otsu binarization (the threshold is chosen automatically)
    _, binary_image = cv2.threshold(denoised_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Canny edge detection; the edge map is used as an extra OCR input below
    edges = cv2.Canny(binary_image, 50, 150)

    # Run OCR on the original image and on both preprocessed versions
    ocr_images = [input_image, binary_image, edges]
    # Maps recognized text -> (bounding box, confidence)
    detected_license_plates = {}
    
    for ocr_img in ocr_images:
        detected_texts = ocr_reader.readtext(ocr_img)
        for (bbox, text, prob) in detected_texts:
            # Plate-format filter: at least 7 characters and at least one digit
            if len(text) >= 7 and any(char.isdigit() for char in text):

                # Unpack the bounding box corners (top-left, top-right, bottom-right, bottom-left)
                (tl, tr, br, bl) = bbox
                tl, tr, br, bl = tuple(map(tuple, [tl, tr, br, bl]))
                width = br[0] - tl[0]
                height = br[1] - tl[1]
                aspect_ratio = width / height if height > 0 else 0

                # Skip boxes whose width/height ratio is too small or too large (plates are typically in the 2-5 range)
                if 2.0 <= aspect_ratio <= 5.0:
                    # Keep only the highest-confidence detection for each text
                    if text not in detected_license_plates or detected_license_plates[text][1] < prob:
                        detected_license_plates[text] = (bbox, prob)
    
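    # Visualize only the single highest-confidence plate candidate
    if detected_license_plates:
        best_text = max(detected_license_plates, key=lambda x: detected_license_plates[x][1])
        best_bbox, best_prob = detected_license_plates[best_text]

        (tl, tr, br, bl) = best_bbox
        tl = tuple(map(int, tl))
        br = tuple(map(int, br))

        # Draw a rectangle around the text region
        cv2.rectangle(input_image, tl, br, (0, 255, 0), 2)

        # Draw the recognized text above the box.
        # Note: OpenCV's Hershey fonts cannot render Hangul, so Korean characters appear as '?'.
        cv2.putText(input_image, best_text, (tl[0], tl[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

        print(f"Detected license plate: {best_text} (confidence: {best_prob:.2f})")
    else:
        print("No license plate detected.")

    # Show the result and wait for a key press before moving to the next image
    cv2.imshow('Detected License Plates', input_image)
    cv2.waitKey()
    print()

# Close the display window once all images have been processed
cv2.destroyAllWindows()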
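The text and aspect-ratio checks in the loop above can be pulled into a small function that is easy to test on its own. The sketch below also adds a regular-expression check that is not in the original code: it assumes the common Korean passenger-plate layout of two or three digits, one Hangul syllable, and four digits, so other plate formats would need a different pattern. `PLATE_PATTERN` and `is_plate_candidate` are hypothetical names.

```python
import re

# Assumed layout: 2-3 digits, one Hangul syllable, optional space, 4 digits.
PLATE_PATTERN = re.compile(r"\d{2,3}[가-힣]\s?\d{4}")


def is_plate_candidate(bbox, text, min_ratio=2.0, max_ratio=5.0):
    """Return True if the text looks like a plate and the box is plate-shaped."""
    if not PLATE_PATTERN.fullmatch(text.strip()):
        return False
    # bbox corners: top-left, top-right, bottom-right, bottom-left
    (tl, _tr, br, _bl) = bbox
    width = br[0] - tl[0]
    height = br[1] - tl[1]
    if height <= 0:
        return False
    return min_ratio <= width / height <= max_ratio


wide_box = [[0, 0], [200, 0], [200, 50], [0, 50]]    # width/height = 4.0
square_box = [[0, 0], [50, 0], [50, 50], [0, 50]]    # width/height = 1.0

print(is_plate_candidate(wide_box, "123가 4567"))     # plate-like text and shape
print(is_plate_candidate(wide_box, "주차금지 2024"))    # has digits, wrong layout
print(is_plate_candidate(square_box, "12가3456"))     # right text, wrong shape
```

Compared with the length-and-digit check, a pattern match rejects signs and phone numbers that happen to contain digits, at the cost of missing plates that OCR reads with a wrong or extra character.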
