Abstract
This paper presents concepts for extracting relevant content from well-known and prominent Quran interpretation (Tafseer) books. The extracted content can be used efficiently in building, creating and distributing digital and multimedia content on the Quran, Tafseer and Islamic issues. Because of the uniqueness of the Quran and of the most renowned Tafseer books, and because of the important and sensitive issues they deal with, extracting relevant information accurately and in a structured manner is a delicate matter. Natural language processing techniques for automatic information retrieval and extraction are neither a reliable nor a desirable approach in this case, due to the level of inaccuracy and the lack of objectivity involved, which cannot be tolerated for books so highly referenced by Muslims. The aim of this paper is to propose a systematic approach to extracting and collecting, in a structured manner, the most relevant information from Tafseer books that is useful for academic purposes as well as for general use. Al Asfahani's Tafseer book, "Mufradat fi Gharib al-Quran", has been chosen as the case study. Building more detailed digital content for the book would allow for better search as well as further development of related authoring and indexing. The overall concepts of the content extraction approach are presented in this paper, along with the different phases involved.