A New Approach for Hindi Optical Character Recognition Based On Neural Networks

  • Ajay Goel
  • O. P. Sahu

Abstract

OCR is the acronym for Optical Character Recognition. This technology allows a machine to recognize characters automatically through an optical mechanism. Human beings recognize many objects in the same manner, with our eyes serving as the "optical mechanism". Development of OCR systems for Indian scripts is an active area of research today. OCR is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text. In simple words, OCR is a visual recognition process that turns printed or written text into an electronic character-based file. OCR is a field of research in pattern recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus has shifted to the implementation of proven techniques. A lot of work has been carried out on OCR internationally, but in the Indian context a concrete approach to character recognition is still required, as the scripts of Indian languages are among the most complex scripts and are very hard to recognize. Indian scripts present great challenges to an OCR designer due to the large number of letters in the alphabet, the sophisticated ways in which they combine, and the complicated graphemes that result. The problem is compounded by the unstructured manner in which popular fonts are designed. There is a lot of common structure across the different Indian scripts. None of the existing OCR systems developed for the various Indian scripts provides sufficient efficiency, owing to various factors. The objective of this paper is to discuss a more efficient character recognition technique. The paper introduces a new technical approach for recognizing Indian script characters that other OCR systems fail to recognize reliably because of various problems.

Published
2010-12-12