Computing expectations and marginal likelihoods for permutations. (English) Zbl 1482.62018

Summary: This paper demonstrates how we can re-purpose sophisticated algorithms from a range of fields to help us compute expected permutations and marginal likelihoods. The results are of particular use in the fields of record linkage or identity resolution, where we are interested in finding pairs of records across data sets that refer to the same individual. All calculations discussed can be reproduced with the accompanying R package expperm.


62-08 Computational methods for problems pertaining to statistics
Full Text: DOI


[1] Belin, TR; Rubin, DB, A method for calibrating false-match rates in record linkage, J Am Stat Assoc, 90, 430, 694-707 (1995) · Zbl 0925.62548
[2] Berkelaar M et al (2015) lpSolve: interface to ‘Lpsolve’ v. 5.5 to solve linear/integer programs. R package version 5.6.13. https://CRAN.R-project.org/package=lpSolve
[3] Bertsekas, DP, A new algorithm for the assignment problem, Math Program, 21, 1, 152-171 (1981) · Zbl 0461.90069
[4] Bilenko M, Kamath B, Mooney RJ (2006) Adaptive blocking: learning to scale up record linkage. In: 2006. ICDM’06. 6th international conference on data mining. IEEE, New York, pp 87-96
[5] Brualdi, RA; Gibson, PM, Convex polyhedra of doubly stochastic matrices. I. Applications of the permanent function, J Comb Theory Ser A, 22, 2, 194-230 (1977) · Zbl 0355.15013
[6] Chertkov, M.; Yedidia, AB, Approximating the permanent with fractional belief propagation, J Mach Learn Res, 14, 1, 2029-2066 (2013) · Zbl 1318.65022
[7] Cibella, N.; Fortini, M.; Scannapieco, M.; Tosco, L.; Tuoto, T., Relais: an open source toolkit for record linkage, Riv Stat Ufficiale, 9, 2-3, 55-68 (2007)
[8] Damerau, FJ, A technique for computer detection and correction of spelling errors, Commun ACM, 7, 3, 171-176 (1964)
[9] Diaconis, P.; Graham, R.; Holmes, SP, Statistical problems involving permutations with restricted positions, Lect Notes Monogr Ser, 36, 195-222 (2001) · Zbl 1373.62176
[10] Elmagarmid, AK; Ipeirotis, PG; Verykios, VS, Duplicate record detection: a survey, IEEE Trans Knowl Data Eng, 19, 1, 1-16 (2007)
[11] Fellegi, IP; Sunter, AB, A theory for record linkage, J Am Stat Assoc, 64, 328, 1183-1210 (1969)
[12] Hankin RKS (2017) Permutations: permutations of a finite set. R package version 1.0-2. https://CRAN.R-project.org/package=permutations
[13] Heap, BR, Permutations by interchanges, Comput J, 6, 3, 293-298 (1963) · Zbl 0118.33307
[14] Herzog, T.; Scheuren, F.; Winkler, W., Data quality and record linkage techniques (2007), New York: Springer, New York · Zbl 1262.62004
[15] Kim, G.; Chambers, R., Regression analysis under incomplete linkage, Comput Stat Data Anal, 56, 9, 2756-2770 (2012) · Zbl 1255.62199
[16] Knuth, D., The art of computer programming: generating all tuples and permutations. Addison-Wesley series in computer science and information proceedings (2005), Reading: Addison-Wesley, Reading
[17] Lahiri, P.; Larsen, MD, Regression analysis with linked data, J Am Stat Assoc, 100, 469, 222-230 (2005) · Zbl 1117.62376
[18] McLeod P, Heasman D, Forbes I (2011) Simulated record linkage data. Technical report, Office for National Statistics. https://ec.europa.eu/eurostat/cros/content
[19] Mersmann O (2018) Microbenchmark: accurate timing functions. R package version 1.4-6. https://CRAN.R-project.org/package=microbenchmark
[20] Michelson M, Knoblock CA (2006) Learning blocking schemes for record linkage. In: Association for the advancement of artificial intelligence, pp 440-445
[21] Pasula, H.; Russell, S.; Ostland, M.; Ritov, Y., Tracking many objects with many sensors, Int Joint Conf Artif Intell, 99, 1160-1171 (1999)
[22] Ruskey, F.; Williams, A., The coolest way to generate combinations, Discrete Math, 309, 17, 5305-5320 (2009) · Zbl 1180.68297
[23] Ryser, H., Combinatorial mathematics. Carus mathematical monographs (1963), New York: Mathematical Association of America, New York
[24] Savicky P (2014) Pspearman: Spearman’s rank correlation test. R package version 0.3-0. https://CRAN.R-project.org/package=pspearman
[25] Scheuren, F.; Winkler, WE, Regression analysis of data files that are computer matched, Surv Methodol, 19, 1, 39-58 (1993)
[26] She, Y.; Tang, S., Iterative proportional scaling revisited: a modern optimization perspective, J Comput Graph Stat, 28, 1-13 (2018) · Zbl 06844704
[27] Simpson GL (2016) Permute: functions for generating restricted permutations of data. R package version 0.9-4. https://CRAN.R-project.org/package=permute
[28] Sinkhorn, R., A relationship between arbitrary positive matrices and doubly stochastic matrices, Ann Math Stat, 35, 2, 876-879 (1964) · Zbl 0134.25302
[29] Valiant, LG, The complexity of computing the permanent, Theor Comput Sci, 8, 2, 189-201 (1979) · Zbl 0415.68008
[30] Whang SE, Menestrina D, Koutrika G, Theobald M, Garcia-Molina H (2009) Entity resolution with iterative blocking. In: Proceedings of the 2009 ACM SIGMOD international conference on management of data. ACM, New York, pp 219-232
[31] Yancey WE (2002) Bigmatch: A program for extracting probable matches from a large file for record linkage. Tech. Rep. 1, US Census Bureau, https://www.census.gov/srd/papers/pdf/rrc2002-01.pdf
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.