Data Mining with Big Data

Xindong Wu 1,2, Xingquan Zhu 3, Gong-Qing Wu 2, Wei Ding 4

1 School of Computer Science and Information Engineering, Hefei University of Technology, China
2 Department of Computer Science, University of Vermont, USA
3 QCIS Center, Faculty of Engineering & Information Technology, University of Technology, Sydney, Australia
4 Department of Computer Science, University of Massachusetts Boston, USA

Abstract: Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including the physical, biological, and biomedical sciences. This article presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.

1. Introduction

Dr. Yan Mo won the 2012 Nobel Prize in Literature. This is probably the most controversial Nobel Prize in this category, as Mo speaks Chinese, lives in a socialist country, and has the Chinese government’s support. Searching on Google with “Yan Mo Nobel Prize”, we get 1,050,000 web pointers on the Internet (as of January 3, 2013). “For all praises as well as criticisms,” said Mo recently, “I am grateful.” What types of praises and criticisms has Mo actually received over his 31-year writing career? As comments keep coming on the Internet and in various news media, can we summarize all types of opinions in different media in a real-time fashion, including updated, cross-referenced discussions by critics?
This type of summarization program is an excellent example of Big Data processing, as the information comes from multiple, heterogeneous, autonomous sources with complex and evolving relationships, and keeps growing. Along with the above example, the era of Big Data has arrived (Nature Editorial 2008; Mervis J. 2012; Labrinidis and Jagadish 2012). Every day, 2.5 quintillion bytes of data are created and 90% of the
data in the world today were produced within the past two years (IBM 2012). Our capability for data generation has never been so powerful and enormous since the invention of information technology in the early 19th century. As another example, on October 4, 2012, the first presidential debate between President Barack Obama and Governor Mitt Romney triggered more than 10 million tweets within two hours (Twitter Blog 2012). Among all these tweets, the specific moments that generated the most discussion revealed the public's interests, such as the discussions about Medicare and vouchers. Such online discussions provide a new means to sense the public interests and