Extending full text search engine for mathematical content. (English) Zbl 1170.68488

Sojka, Petr (ed.), DML 2008. Towards digital mathematics library, Birmingham, UK, July 27th, 2008. Proceedings. Brno: Masaryk University (ISBN 978-80-210-4658-0/pbk). 55-67 (2008).
Summary: The WWW has become the main resource of mathematical knowledge. Currently available full text search engines can be used on these documents, but they are deficient in almost all cases. By applying axioms and equivalent transformations, and by using different notation, each formula can be expressed in numerous ways. Most of these documents do not contain semantic information; therefore, precise mathematical interpretation is impossible. Where semantic information is present, on the other hand, it can help to give more precise results.
In this work we address these issues and present a new technique for searching for mathematical formulae in real-world mathematical documents while still offering an extensible level of mathematical awareness. It exploits the advantages of a full text search engine and stores each formula not just once but in several generalised representations. Because it is designed as an extension, any full text search engine can adopt it.
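The idea of storing each formula in several generalised representations can be illustrated with a minimal sketch. It is not the EgoMath implementation: the tokeniser, the three generalisation levels (exact, constants replaced, identifiers replaced) and the names `tokenize`, `generalise` and `FormulaIndex` are all assumptions made for illustration, and a plain dictionary stands in for the full text search engine.

```python
import re
from collections import defaultdict

# Hypothetical tokeniser: numbers, alphabetic names, or any single symbol.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|\S")


def tokenize(formula):
    """Split a linear formula string into tokens, ignoring whitespace."""
    return TOKEN.findall(formula)


def generalise(tokens):
    """Return text representations ordered from most to least specific.

    The three levels are an assumption of this sketch:
    exact tokens, constants replaced by CONST, then identifiers by ID.
    """
    def level(tok, consts, ids):
        if consts and tok[0].isdigit():
            return "CONST"
        if ids and tok[0].isalpha():
            return "ID"
        return tok

    levels = [(False, False), (True, False), (True, True)]
    return [" ".join(level(t, c, i) for t in tokens) for c, i in levels]


class FormulaIndex:
    """Toy inverted index standing in for a full text search engine."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, formula):
        # Store the formula once per generalised representation.
        for rep in generalise(tokenize(formula)):
            self.postings[rep].add(doc_id)

    def search(self, formula):
        # Try the most specific representation first, then fall back.
        for rep in generalise(tokenize(formula)):
            if rep in self.postings:
                return sorted(self.postings[rep])
        return []


index = FormulaIndex()
index.add("d1", "x^2 + 1")
index.add("d2", "y^2 + 3")
print(index.search("x^2 + 1"))  # exact match: ['d1']
print(index.search("x^3 + 7"))  # matches after replacing constants: ['d1']
print(index.search("z^2 + 1"))  # matches after replacing identifiers too: ['d1', 'd2']
```

Because every representation is an ordinary string of terms, any engine that indexes text could store and query them, which is consistent with the extension design described in the summary.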
Based on the proposed theory we developed EgoMath, a new mathematical search engine. Experiments with EgoMath over two document sets containing semantic information showed that this technique can be used to build a fully-fledged mathematical search engine.
For the entire collection see [Zbl 1158.00018].


68P99 Theory of data