OCR++: A Robust Framework For Information Extraction From Scholarly Articles

Students of the Indian Institute of Technology, Kharagpur received the Gandhian Young Technological Innovation Award 2017 for developing the website for OCR++: A Robust Framework For Information Extraction From Scholarly Articles.

Mayank Singh and his team members Barnopriyo Barua, Priyank Palod, Manvi Garg, Sidhartha Satapathy, Samuel Bushi, Kumar Ayush, Krishna Sai Rohith and Tulasi Gamidi developed this website under the guidance of Dr. Pawan Goyal and Dr. Animesh Mukherjee.

This project proposes OCR++, an open-source framework designed for a variety of information extraction tasks from scholarly articles, including metadata (title, author names, affiliation and email), structure (section headings and body text, table and figure headings, URLs and footnotes) and bibliography (citation instances and references). The team analysed a diverse set of scientific articles written in English to understand generic writing patterns and to formulate the rules on which this hybrid framework is built. Extensive evaluations show that the proposed framework outperforms the existing state-of-the-art tools by a large margin in structural information extraction, along with improved performance in metadata and bibliography extraction tasks, both in terms of accuracy (50% improvement) and processing time (52% improvement). In a user experience study conducted with 30 researchers, the participants reported finding the framework very helpful.
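To give a feel for what rule-based extraction of this kind can look like, here is a minimal Python sketch that pulls emails, URLs and numeric citation instances out of plain text. It is an illustration only: the patterns, the `extract_fields` function and the sample text are hypothetical and are not taken from OCR++, whose actual rules are not described in this post.

```python
import re

# Hypothetical, simplified patterns for illustration; not the rules used by OCR++.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s)>\]]+")
# Numeric citation instances such as [2] or [1, 3].
CITATION_RE = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def extract_fields(text):
    """Return the emails, URLs and numeric citation markers found in text."""
    citations = []
    for match in CITATION_RE.finditer(text):
        # int() tolerates the whitespace left over after splitting on commas.
        citations.extend(int(n) for n in match.group(1).split(","))
    return {
        "emails": EMAIL_RE.findall(text),
        # Drop sentence punctuation that the URL pattern may have swallowed.
        "urls": [u.rstrip(".,;") for u in URL_RE.findall(text)],
        "citations": citations,
    }


# Made-up sample text, used only to exercise the patterns.
sample = (
    "Contact: jane.doe@example.org. Code is at http://example.org/tool. "
    "Earlier systems [1, 3] rely on learned models only [2]."
)
result = extract_fields(sample)
print(result)
```

Hand-written patterns like these are fast and easy to inspect, but they only cover the writing conventions they were designed for, which is presumably why a framework of this kind starts from a broad analysis of real articles.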

For this project, Prof. Anil K Gupta, Co-ordinator, SRISTI and Founder, Honey Bee Network, honored Mayank Singh and his team members Barnopriyo Barua, Priyank Palod, Manvi Garg, Sidhartha Satapathy, Samuel Bushi, Kumar Ayush, Krishna Sai Rohith and Tulasi Gamidi of the Indian Institute of Technology, Kharagpur with the prestigious Gandhian Young Technological Innovation (GYTI) Award 2017 at Rashtrapati Bhavan.
