Web Usage Mining: An Overview
Lin Lin
Department of Management
Lehigh University
Jan. 30th

Agenda
- Web Usage Mining: Definition
- Research Issues in Web Usage Mining
- Current Research in Web Usage Mining
- Going Forward

Web Usage Mining: A Definition
- The process of applying data mining techniques to the discovery of usage patterns from Web data, targeted towards various applications
- Different from content mining and structure mining
  (Adamic, L. A., and Adar, E. 2003. Friends and neighbors on the web. Social Networks 25(3):211-230.)

Web Usage Mining: Data Source
Typical data sources for web usage mining are:
- Web structure data (site map, links, etc.)
- Web content data
- User profile (may not be available)
- Web log (web usage data, clickstream data)

Web Usage Mining: Procedure

Preprocessing: Challenges
- WHO are the users? IP vs. real people
- HOW LONG did the users stay? Measuring session time
  (L. Catledge and J. Pitkow. Characterizing browsing behaviors on the world wide web. Computer Networks and ISDN Systems, 27(6), 1995.)
  (B. Berendt, B. Mobasher, M. Nakagawa, and M. Spiliopoulou. The impact of site structure and user environment on session reconstruction in web usage analysis. In Proceedings of the 4th WebKDD 2002 Workshop, at the ACM-SIGKDD Conference on Knowledge Discovery in Databases (KDD'2002), Edmonton, Alberta, Canada, July 2002.)
- WHERE did the users go? Server side vs. client side
- WHAT did the users view? Content processing
  (Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1, 2) 29-40.)

For the best review on preprocessing methods, refer to: R. Cooley, B. Mobasher, J.
Srivastava, Data preparation for mining world wide web browsing patterns, Knowledge and Information Systems 1 (1) (1999) 5-32

Usage Pattern Discovery: Methods
- Statistical methods (including dependency modeling and stochastic modeling)
- Association rule mining
- Clustering (user cluster vs. page cluster)
- Classification

Usage Pattern Discovery: Research Streams
Why am I interested in web usage mining? (a.k.a., what's the big deal?)
- Blattberg, Robert C. and John Deighton (1991), "Interactive Marketing: Exploring the Age of Addressability," Sloan Management Review, 33(1), 5-14
- Ghosh, S. 1998. Making business sense of the Internet. Harvard Business Review 76(2) 126-135
- Bucklin, R. E., Lattin, J. M., Ansari, A., Bell, D., Coupey, E., Gupta, S., Little, J. D. C., Mela, C., Montgomery, A., Steckel, J. Choice and the Internet: From Clickstream to Research Stream. Marketing Letters, 2002, Vol. 13, No. 3, pp. 245-258

Usage Pattern Discovery: Research Streams
Lin's two cents on current research streams:
- Build a better site
  - For everybody: system improvement (caching & web design)
  - For individuals: personalization
  - For search engines: SEO
- Know your visitors better: customer behavior
- Be a better business

Build a Better Site: System Improvement
Server-side caching of web pages (association rules)
Y.-H. Wu, A.L.P. Chen, Prediction of web page accesses by proxy server log, World Wide Web 5 (1) (2002) 67-88
- Preprocessing: No IP discussion; sessions split by time-based heuristics
- Method: Sequential pattern mining
- Data: Usage
- Contribution: Use frequent sequences to predict candidate pages; "personalize" based on user maturity

Build a Better Site: System Improvement
Improvement of general web design (AR, SP, MM)
Fang, X. and O. R. L. Sheng (2004). Link Selector: A web mining approach to hyperlink selection for web portals.
ACM Transactions on Internet Technology 4, 209-237
- Preprocessing: No IP distinguished; sessions split by 25.5 minutes
- Method: Association mining
- Data: Usage & Structure
- Contribution: Combine structure info and usage info to optimize portal page design
Where are we headed: adaptive web design
Y. Fu, M. Creado, C. Ju, Reorganizing web sites based on user access patterns, in: Proceedings of the Tenth International Conference on Information and Knowledge Management, ACM Press, 2001, pp. 583-585 (usage & content)

Build a Better Site: Personalization
- Personalize the web site based on usage patterns (AR, Clustering)
- A key research domain: recommender systems*
- Content clustering vs. user clustering vs. hybrid approach
C. Shahabi and F. Banaei-Kashani. Efficient and anonymous web usage mining for web personalization. INFORMS Journal on Computing, Special Issue on Data Mining, 2002
- Method: Clustering of sessions
- Data: Client-side usage data
Where are we headed: incorporate time and web 2.0
*: For a good review on recommender systems, refer to Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749

Build a Better Site: SEO
Adding usage information into PageRank
Kalyan Beemanapalli, Ramya Rangarajan, Jaideep Srivastava, "Usage-Aware Average Clicks", in Proc. of WebKDD 2006: KDD Workshop on Web Mining and Web Usage Analysis, in conjunction with the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), August 20-23, 2006
- Method: Association rule in spirit

Know your visitors better: Customer behavior
- A favorite research stream of marketers and MIS researchers
- Statistical models are used most of the time
- "Macro-level" behavior is often the focus
- Interesting questions related to firm performance and profitability

Know your visitors better: Customer behavior
Johnson, E.
J., Wendy Moe, Peter S. Fader, Steven Bellman, and Jerry Lohse. "On the Depth and Dynamics of Online Search Behavior," Management Science, Vol. 50, No. 3, March 2004, pp. 299-308
- Models an individual's tendency to search as a logarithmic process
- Hierarchical Bayesian model with depth of search, dynamics of search, and activity of search
- Interested in the number of unique sites searched by each household within a given product category
- Preprocessing: Households identified by client-side programs; session is month-based
- Method: Statistical modeling (log model)
- Data: Usage (search)

Know your visitors better: Customer behavior
Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1, 2) 29-40
WHY do the customers visit?
- Preprocessing: Content processing
- Method: Clustering of sessions by visiting behavior parameters and content parameters
- Data: Usage & Content

Know your visitors better: Customer behavior
Sismeiro, Catarina, Randolph E. Bucklin. 2004. Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach. Journal of Marketing Research. 41 (3), 306-323
HOW do the customers visit?
Predicts online buying by linking the purchase decision to what visitors do and to what they are exposed to while at the site.
- Preprocessing: Content processing
- Method: Statistical modeling
- Data: Usage & Content

Know your visitors better: Customer behavior
Sismeiro, Catarina, Randolph E. Bucklin. 2004. Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach. Journal of Marketing Research.
41 (3), 306-323
- Browsing behavior (i.e., time and page views)
- Repeat visitation to the site (return and total number of sessions)
- Use of interactive decision aids
- Data input effort and information gathering and processing
- A series of page-specific characteristics

Know your visitors better: Customer behavior
My Research: Online Customer Lifetime
- Predict an individual's tendency to stay with an e-tailer
- Hybrid BG/NBD model + neural networks
- Interested in the relationship between online customer lifetime and firm profitability
- Preprocessing: Households identified by client-side programs; session is month-based
- Method: Statistical modeling & classification
- Data: Usage

Know your visitors better: Customer behavior
My Research: Online Customer Lifetime
Given N customers with visiting history (X_i, t_{x_i}, T):
- T: the observed time period
- X_i: number of visits customer i made during T
- t_{x_i}: time of the last visit made by customer i
Find the best fit for the following maximum likelihood equation to estimate the four parameters r, a, b and \alpha (inside the product, x and t_x stand for X_i and t_{x_i}):

\[
L(r, \alpha, a, b) = \prod_{i=1}^{N} \left[ \frac{B(a, b+x)}{B(a, b)} \cdot \frac{\Gamma(r+x)\,\alpha^{r}}{\Gamma(r)\,(\alpha+T)^{r+x}} + \delta_{x>0}\, \frac{B(a+1, b+x-1)}{B(a, b)} \cdot \frac{\Gamma(r+x)\,\alpha^{r}}{\Gamma(r)\,(\alpha+t_x)^{r+x}} \right]
\]

Know your visitors better: Customer behavior
Given r, a, b and \alpha, we can predict:
- Total number of visits during a time period t (starting from time 0):

\[
N \cdot \frac{a+b-1}{a-1} \left[ 1 - \left( \frac{\alpha}{\alpha+t} \right)^{r} F\!\left( r, b;\, a+b-1;\, \frac{t}{\alpha+t} \right) \right]
\]

- Number of visits Y(t) an individual will make in the future t time units (from T+1 to T+t):

\[
Y(t) = \frac{ \dfrac{a+b+x-1}{a-1} \left[ 1 - \left( \dfrac{\alpha+T}{\alpha+T+t} \right)^{r+x} F\!\left( r+x, b+x;\, a+b+x-1;\, \dfrac{t}{\alpha+T+t} \right) \right] }{ 1 + \delta_{x>0}\, \dfrac{a}{b+x-1} \left( \dfrac{\alpha+T}{\alpha+t_x} \right)^{r+x} }
\]

Know your visitors better: Customer behavior
My Research: Online Customer Lifetime
Companies are grouped by product type (search goods vs. experience goods).

Company          Visitors  Calibration  Testing  Mean      Percentage      Acc B.   Acc.
                           period       period   Lifetime  Right censored
Amazon           1267      5            1        125.8     44.91%          75.27%   77.82%
Amazon           1267      4            2        136       42.86%          72.45%   74.90%
BMG              177       5            1        123.93    23.73%          67.23%   77.40%
BMG              177       4            2        128.16    38.98%          79.10%   70.62%
Columbia         304       5            1        131.31    43.75%          88.16%   78.62%
Columbia         304       4            2        126.96    47.70%          80.59%   70.07%
Drugstore        57        5            1        74.16     28.08%          84.21%   78.95%
Drugstore        57        4            2        72.28     21.06%          78.95%   75.43%
Ticketmaster     179       5            1        100.96    30.34%          80.89%   74.72%
Ticketmaster     179       4            2        102.97    18.44%          78.77%   70.95%
landsend         32        5            1        45.6      6.25%           71.88%   81.25%
landsend         32        4            2        51.5      9.38%           87.50%   75.00%
oldNavy          88        5            1        101.41    39.77%          79.31%   80.68%
oldNavy          88        4            2        113.85    39.77%          78.41%   72.73%
victoriassecret  54        5            1        52.56     14.82%          74.07%   77.78%
victoriassecret  54        4            2        63.92     12.96%          72.22%   79.63%

Web Usage Mining: The Future ...
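
As a rough illustration of the maximum likelihood step in the Online Customer Lifetime slides, the sketch below evaluates the log of the BG/NBD likelihood for a set of visiting histories (x, t_x, T). It assumes the standard BG/NBD form with parameters r, alpha, a, b; the function names and the sample histories are hypothetical, and the neural-network half of the hybrid model is not covered.

```python
import math


def log_beta(p, q):
    """Log of the beta function B(p, q), computed via log-gamma."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def bgnbd_log_likelihood(r, alpha, a, b, customers):
    """Sum of per-customer BG/NBD log-likelihoods.

    customers: iterable of (x, t_x, T) where x is the number of visits,
    t_x the time of the last visit, and T the observed time period.
    """
    total = 0.0
    for x, t_x, T in customers:
        # Factor shared by both terms: Gamma(r+x) alpha^r / (Gamma(r) B(a, b))
        common = (math.lgamma(r + x) - math.lgamma(r)
                  + r * math.log(alpha) - log_beta(a, b))
        # Term 1: the customer is still active at the end of the period T
        log_t1 = common + log_beta(a, b + x) - (r + x) * math.log(alpha + T)
        if x > 0:
            # Term 2: the customer dropped out right after the last visit
            log_t2 = (common + log_beta(a + 1, b + x - 1)
                      - (r + x) * math.log(alpha + t_x))
            # log(exp(log_t1) + exp(log_t2)), computed stably
            m = max(log_t1, log_t2)
            total += m + math.log(math.exp(log_t1 - m) + math.exp(log_t2 - m))
        else:
            total += log_t1
    return total


if __name__ == "__main__":
    # Hypothetical histories: (visits, time of last visit, observed period)
    histories = [(0, 0.0, 6.0), (3, 4.0, 6.0), (1, 1.0, 6.0)]
    print(bgnbd_log_likelihood(0.5, 2.0, 1.0, 2.0, histories))
```

To estimate the four parameters, this function would be handed to a numerical optimizer that maximizes it over positive values of r, alpha, a and b; working on the log scale keeps the product over N customers from underflowing.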