MATHEMATICAL EXPRESSION DETECTION AND RECOGNITION IN SCIENTIFIC DOCUMENT IMAGES

A hybrid method for mathematical expression detection in scientific document images:

Mathematical expressions are widely used in scientific documents, and detecting them automatically is a crucial step in document analysis. The paper presents a unified system for detecting both inline and isolated mathematical expressions in scientific document images, which usually consist of heterogeneous components (e.g., figures, tables, text and expressions). The system uses a hybrid two-stage method. First, layout analysis of the entire document image is performed to improve the accuracy of text line and word segmentation. Then, both isolated and inline expressions are detected. Both hand-crafted and deep learning features are investigated and combined to improve detection accuracy. A generic performance metric is applied to evaluate the system comprehensively. The proposed method has been evaluated on two public benchmark datasets (Marmot and GTDB). The accuracies obtained for isolated and inline expressions are 91.18% and 81.35% on the Marmot dataset and 89.51% and 80.20% on the GTDB dataset, respectively. A performance comparison with conventional methods shows the effectiveness of the proposed system. In addition, extensive experiments examine the effect of document image resolution and post-processing techniques on mathematical expression detection.
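
The summary above does not define the performance metric. As a rough illustration only, detection systems are often scored by matching predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), a greedy one-to-one matching, and a threshold of 0.5; the names `iou` and `score_detections` and the threshold are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score_detections(
    predicted: List[Box], ground_truth: List[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (precision, recall) using greedy one-to-one IoU matching.

    The threshold is an assumption for illustration, not the paper's value.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for p in predicted:
        best_index, best_iou = -1, 0.0
        for j, g in enumerate(ground_truth):
            if j in matched:
                continue
            value = iou(p, g)
            if value > best_iou:
                best_index, best_iou = j, value
        if best_index >= 0 and best_iou >= threshold:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Calling `score_detections(predicted_boxes, ground_truth_boxes)` once per page, separately for isolated and inline expressions, would give per-category scores under these assumptions.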

Publications:
Hai-Phong Bui, Manh-Thang Hoang, Thi-Lan Le, A hybrid method for mathematical expression detection in scientific document images, IEEE Access, 2020

Mathematical variable detection based on Convolutional Neural Network and Support Vector Machine:

- Text lines are segmented from the input document.
- Isolated expressions are detected using the Fourier transform and an SVM [B.Phong et al. 2017].
- The remaining (non-isolated) text lines are segmented into words.
- Variables are distinguished from ordinary words using CNNs and an SVM.
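
The steps above do not say how text lines and words are segmented. A common baseline is projection profiles: sum the ink pixels per row to find lines, then per column within a line to find words. The sketch below is a minimal pure-Python version under that assumption, with a binarized page given as a list of rows (1 = ink, 0 = background). The function names and the `min_gap` values are hypothetical, not the authors' implementation.

```python
from typing import List, Tuple

# Assumed input: binarized page, 1 = ink pixel, 0 = background.
Image = List[List[int]]
Span = Tuple[int, int]  # half-open index range [start, end)


def runs_of_ink(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Find non-zero runs in a profile, merging runs whose gap is < min_gap."""
    runs: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Span] = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: same unit
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Span]:
    """Row ranges of text lines, from the horizontal projection profile."""
    row_profile = [sum(row) for row in img]
    return runs_of_ink(row_profile, min_gap=1)


def segment_words(img: Image, line: Span, min_gap: int = 2) -> List[Span]:
    """Column ranges of words inside one line, from the vertical profile.

    Gaps narrower than min_gap are treated as spacing between characters.
    """
    top, bottom = line
    width = len(img[0]) if img else 0
    col_profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs_of_ink(col_profile, min_gap=min_gap)
```

In this sketch, each word span would then be cropped and passed to the classifier that separates variables from ordinary words. A real page would also need skew correction and a gap threshold tuned to the font size.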

Publications:
Bui Hai Phong, Manh-Thang Hoang and Thi-Lan Le, Mathematical Variable Detection based on Convolutional Neural Network and Support Vector Machine, 2nd Int. Conference on Multimedia Analysis and Pattern Recognition (MAPR), May, 2019.