Determining empirical characteristics of mathematical expression use. (English) Zbl 1151.68675

Kohlhase, Michael (ed.), Mathematical knowledge management. 4th international conference, MKM 2005, Bremen, Germany, July 15-17, 2005. Revised selected papers. Berlin: Springer (ISBN 3-540-31430-X/pbk). Lecture Notes in Computer Science 3863. Lecture Notes in Artificial Intelligence, 361-375 (2006).
Summary: Many processes in mathematical computing try to use knowledge of the most desired forms of mathematical expressions. This occurs, for example, in symbolic computation systems, when expressions are simplified, or in mathematical document recognition, when formula layout is analyzed. The decision about which forms are the most desired, however, has typically been left to the guesswork or prejudices of a small number of system designers.
This paper observes that, on a domain-by-domain basis, certain expressions are actually used much more frequently than others. On the hypothesis that actual usage is the best measure of desirability, this paper begins to quantify empirically the use of common expressions in the mathematical literature. We analyze all 20,000 mathematical documents from the mathematical arXiv server from 2000–2004, the period corresponding to the new mathematical subject classification. We report on the process by which these documents are analyzed, through conversion to MathML, and present first empirical results on the most common aspects of mathematical expressions by subject classification. We use the notion of a weighted dictionary to record the relative frequency of subexpressions, and explore how this information may be used for further processes, including deriving common patterns of expressions and probability measures for symbol sequences.
For the entire collection see [Zbl 1096.68004].
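
The weighted dictionary mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `canonical`, `subexpressions`, and `weighted_dictionary`, the choice of a serialized subtree as dictionary key, and the two sample MathML fragments are all assumptions made for illustration. The sketch counts every subtree of each Presentation MathML fragment and normalizes the counts to relative frequencies.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def canonical(node):
    """Serialize a subtree to a whitespace-free string used as a dictionary key.

    Assumption: two subexpressions are "the same" when their tags, leaf text,
    and child order match exactly. Attributes and tail text are ignored.
    """
    text = (node.text or "").strip()
    children = "".join(canonical(child) for child in node)
    return f"<{node.tag}>{text}{children}</{node.tag}>"


def subexpressions(node):
    """Yield the canonical form of the node and of every descendant subtree."""
    yield canonical(node)
    for child in node:
        yield from subexpressions(child)


def weighted_dictionary(mathml_fragments):
    """Map each subexpression to its relative frequency over all fragments."""
    counts = Counter()
    for fragment in mathml_fragments:
        counts.update(subexpressions(ET.fromstring(fragment)))
    total = sum(counts.values())
    return {expr: n / total for expr, n in counts.items()}


# Hypothetical sample input: x^2 and x^2 + x in Presentation MathML.
fragments = [
    "<msup><mi>x</mi><mn>2</mn></msup>",
    "<mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mi>x</mi></mrow>",
]
weights = weighted_dictionary(fragments)

# List subexpressions from most to least frequent.
for expr, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{weight:.3f}  {expr}")
```

Building one such dictionary per subject classification would allow the per-domain comparison the summary describes. A production version would also need to handle MathML namespaces and attributes, which this sketch leaves out.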

MSC:

68T30 Knowledge representation
68U15 Computing methodologies for text processing; mathematical typography
68W30 Symbolic computation and algebraic computation