Outliers detection in the statistical accuracy test of a \(\mathrm{p}K_{\mathrm{a}}\) prediction. (English) Zbl 1222.92075

Summary: The regression diagnostics algorithm REGDIA in S-Plus is introduced to examine the accuracy of \(\mathrm{p}K_{\mathrm{a}}\) values predicted with four programs: PALLAS, MARVIN, PERRIN and SYBYL. Outlier diagnostics are proposed on the basis of a statistical analysis of residuals. Residual analysis in the ADSTAT program examines goodness-of-fit via graphical diagnostics of 15 exploratory data analysis plots: bar plots, box-and-whisker plots, dot plots, midsum plots, symmetry plots, kurtosis plots, differential quantile plots, quantile-box plots, frequency polygons, histograms, quantile plots, quantile-quantile plots, rankit plots, scatter plots, and autocorrelation plots. Outliers in \(\mathrm{p}K_{\mathrm{a}}\) correspond to molecules that are poorly characterized by the \(\mathrm{p}K_{\mathrm{a}}\) program under consideration. Of the seven most efficient diagnostic plots (the Williams graph, the graph of predicted residuals, the Pregibon graph, the Gray L-R graph, the index graph of the Atkinson measure, the index graph of the diagonal elements of the hat matrix, and the rankit Q-Q graph of jackknife residuals), the Williams graph was selected as giving the most reliable detection of outliers. Six statistical characteristics, \(F_{\text{exp}}, R^{2}, R_{\text{P}}^{2}, MEP, AIC\), and \(s\) in \(\mathrm{p}K_{\mathrm{a}}\) units, are successfully used to examine a sample of 25 acids and bases from Perrin's data set and to classify the four \(\mathrm{p}K_{\mathrm{a}}\) prediction algorithms. The highest values of \(F_{\text{exp}}, R^{2}, R_{\text{P}}^{2}\), the lowest values of \(MEP\) and \(s\), and the most negative \(AIC\) were found for the PERRIN algorithm of \(\mathrm{p}K_{\mathrm{a}}\) prediction, so this algorithm achieves the best predictive power and the most accurate results. The proposed accuracy test of the REGDIA program can also be extended to other predicted values, such as \(\log P\), \(\log D\), aqueous solubility or other physicochemical properties.
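The quantities named in the summary can be illustrated with a minimal Python sketch. It is not the REGDIA or ADSTAT implementation. It assumes a straight-line fit of predicted on experimental \(\mathrm{p}K_{\mathrm{a}}\) and uses common textbook definitions of the six characteristics and of the two axes of the Williams graph (hat-matrix diagonal and jackknife residual), which may differ in detail from those in the paper. The function name `regression_diagnostics`, the data values, and the cutoff `t_crit` are hypothetical; in practice the cutoff would be a Student t quantile with n - m - 1 degrees of freedom.

```python
import math

def regression_diagnostics(x, y, t_crit=2.0):
    """Fit y = b0 + b1*x by least squares; return fit statistics and flags.

    t_crit is a placeholder cutoff for |jackknife residual| (assumption).
    """
    n, m = len(x), 2  # m = number of fitted parameters
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    tss = sum((yi - y_mean) ** 2 for yi in y)  # total sum of squares

    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)  # residual sum of squares

    s = math.sqrt(rss / (n - m))                    # residual standard deviation
    r2 = 1 - rss / tss                              # coefficient of determination
    f_exp = (r2 / (m - 1)) / ((1 - r2) / (n - m))   # Fisher-Snedecor F statistic
    aic = n * math.log(rss / n) + 2 * m             # Akaike information criterion

    # Diagonal of the hat matrix for a straight-line model (leverages).
    hat = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    # Predicted (leave-one-out) residuals, MEP and predicted R^2.
    pred_resid = [e / (1 - h) for e, h in zip(resid, hat)]
    mep = sum(e * e for e in pred_resid) / n
    r2_p = 1 - n * mep / tss

    # Standardized and jackknife (externally studentized) residuals.
    std = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, hat)]
    jack = [es * math.sqrt((n - m - 1) / (n - m - es * es)) for es in std]

    # Williams graph: |jackknife residual| against leverage, with two cutoffs.
    outliers = [i for i, ej in enumerate(jack) if abs(ej) > t_crit]
    high_leverage = [i for i, h in enumerate(hat) if h > 2 * m / n]

    return {"F_exp": f_exp, "R2": r2, "R2_P": r2_p, "MEP": mep, "AIC": aic,
            "s": s, "outliers": outliers, "high_leverage": high_leverage}

# Made-up values for illustration only; the fifth prediction is grossly wrong.
exp_pka = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
pred_pka = [2.1, 2.9, 4.05, 4.95, 8.0, 6.9, 8.05, 8.95, 10.1, 10.9]
result = regression_diagnostics(exp_pka, pred_pka)
print(result["outliers"], result["high_leverage"])
```

Running the same function on the output of each prediction program and comparing the six returned statistics mirrors the ranking described in the summary: higher \(F_{\text{exp}}\), \(R^{2}\) and \(R_{\text{P}}^{2}\), and lower \(MEP\), \(s\) and \(AIC\), indicate the better predictor.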

MSC:

92E99 Chemistry
62P99 Applications of statistics
