Moving from data-constrained to data-enabled research: experiences and challenges in collecting, validating and analyzing large-scale e-commerce data. (English) Zbl 1426.62367

Summary: Widespread e-commerce activity on the Internet has led to new opportunities to collect vast amounts of micro-level market and nonmarket data. In this paper we share our experiences in collecting, validating, storing and analyzing large Internet-based data sets in the areas of online auctions, music file sharing and online retailer pricing. We demonstrate how such data can advance knowledge by facilitating sharper and more extensive tests of existing theories and by offering observational underpinnings for the development of new theories. Just as experimental economics pushed the frontiers of economic thought by enabling the testing of numerous theories of economic behavior in the environment of a controlled laboratory, we believe that observing, often over extended periods of time, real-world agents participating in market and nonmarket activity on the Internet can lead us to develop and test a variety of new theories. Internet data gathering is not controlled experimentation: we cannot randomly assign participants to treatments or determine event orderings. It does, however, offer potentially large data sets with repeated observations of individual choices and actions. In addition, automated data collection holds promise for greatly reduced cost per observation. Our methods rely on technological advances in automated data collection agents. Significant challenges remain in developing appropriate sampling techniques, integrating data from heterogeneous sources in a variety of formats, constructing generalizable processes and understanding legal constraints. Despite these challenges, the early evidence from those who have harvested and analyzed large amounts of e-commerce data points toward a significant leap in our ability to understand the functioning of electronic commerce.


62P20 Applications of statistics to economics

