

Sikkim Manipal University
Business Intelligence and Tools
Unit 10 Data Mining and Data Visualization

Structure
10.1 Introduction
     Objectives
10.2 Concept of Data Mining
     10.2.1 Reasons for Growth and Popularity
     10.2.2 Applications of Data Mining
     Self Assessment Question(s) (SAQs)
10.3 The Knowledge Discovery Process
     10.3.1 Determination of Business Objectives
     10.3.2 Selection and Preparation of Data
     10.3.3 Application of appropriate Mining techniques
     10.3.4 Evaluation of Data Mining Results
     10.3.5 Presentation of Data Discoveries
     10.3.6 Incorporation of Usage of Discoveries
     Self Assessment Question(s) (SAQs)
10.4 Data Mining Techniques
     10.4.1 Classification
     10.4.2 Linkage Analysis
     10.4.3 Sequential Discovery
     10.4.4 Cluster Analysis
     10.4.5 Statistical Analysis
     Self Assessment Question(s) (SAQs)
10.5 Challenges to Data Mining
     10.5.1 Identification of Missing Information
     10.5.2 Data Noise and Missing Values
     10.5.3 Large Databases and High Dimensionality
     Self Assessment Question(s) (SAQs)
10.6 Evaluation of a Data Mining Tool
     Self Assessment Question(s) (SAQs)
10.7 OLAP Vs. Data Mining
     Self Assessment Question(s) (SAQs)
10.8 Data Visualization
     Self Assessment Question(s) (SAQs)
10.9 Summary
10.10 Terminal Questions (TQs)
10.11 Multiple Choice Questions (MCQs)
10.12 Answers to SAQs, TQs, and MCQs
     10.12.1 Answers to Self Assessment Questions (SAQs)
     10.12.2 Answers to Terminal Questions (TQs)
     10.12.3 Answers to Multiple Choice Questions (MCQs)

10.1 Introduction
Every organization that has embraced the concept of a data warehouse believes that data mining is a definite part of its future. Data mining is the process of using raw business data to infer important business relationships. Once these business relationships have been discovered and analyzed, they can be used to gain a competitive advantage.
A data warehouse offers several advantages besides data mining, but the fullest use of a data warehouse must include data mining. Data visualization, in turn, is the process by which data is converted into meaningful images for an improved understanding of the business situation. Apart from these, we also discuss the new trends in the area of data warehousing and data mining.

Objectives:
The objectives of this Unit are to make you understand:
- The purpose of data mining
- The techniques involved in data mining
- Challenges to the process of data mining
- The differences between OLAP and data mining
- The significance of data visualization
- Trends in data warehousing and data mining

10.2 Concept of Data Mining
By its simplest definition, data mining (DM) is the set of activities used to find new, hidden, or unexpected patterns in data. It is the process of analyzing data from different perspectives and summarizing it into useful information. Technically, the data mining process finds the correlations and patterns existing among several fields in a large relational database.

In the past, decision support activities were based on the concept of verification. In this sense, a relational database could be queried to provide dynamic answers to well-formed questions. The key issue in verification is that it requires a great deal of prior knowledge on the part of the decision maker in order to verify a suspected relationship through the query.

In the 1990s, data warehouses with query and report tools assisted users in retrieving the types of decision support information they needed. Later, OLAP tools came into existence for more sophisticated analysis. Up to this point, the approach for obtaining the information was mainly driven by the users. But the sheer volume of data makes it impractical for anyone to discern all the useful patterns with query and analysis tools alone.
For instance, in marketing research analysis, it is practically impossible to go through all the possible associations and gain insights by querying and drilling down into the data warehouse. You need a technology that can learn from past associations and results, and predict the future behavior of customers. To withstand cut-throat competition, it helps to have a tool that can accomplish the discovery of knowledge by itself. Thus you require a data-driven approach rather than a user-driven one.

Using the information stored within a data warehouse, data mining techniques can provide answers to the following questions:
- Which type of products needs to be promoted for a specific individual customer?
- Which scrips/securities are going to be more profitable during the next trading session?
- What is the probability that an individual customer will respond to a particular promotion?
- What is the likelihood that an individual customer will default or pay back on schedule?

These questions can be answered readily if the information hidden among the terabytes of data in your databases can be located and utilized fully.

Another important DM technique is knowledge data discovery (KDD). Using a combination of techniques, including statistical analysis, multidimensional analysis, intelligent agents, and data visualization, KDD can discover highly useful and informative patterns within the data that can be used to develop predictive models of behavior.

10.2.1 Reasons for Growth and Popularity
Several reasons can be cited for the growing popularity of DM. The foremost reason is the ever-increasing volume of data that requires processing. It is simply impossible for human effort alone to study, decipher, and interpret the entire data to find the useful information.
The other reasons for the increased popularity of data mining are as follows:
- The inadequacy of the human brain to process data, particularly in situations involving multi-factorial dependencies or correlations.
- The increasing affordability of machine learning, which means that an automated DM system can operate at a much lower cost than a group of highly trained (and paid) professional managers. Although DM has not entirely eliminated human participation in problem solving, it significantly simplifies the tasks and allows humans to manage the process better.
- Organizations have identified the need to discover the existing classifications among their customers so that they can target each group with suitable products and services.

10.2.2 Applications of Data Mining
Data mining tools are applied in business areas across various industries as follows:
- Customer segmentation: Cluster detection algorithms discover clusters of customers sharing the same characteristics. Accordingly, companies can come out with appropriate service offers to retain their customers.
- Fraud detection: Credit card companies can use data mining tools to study the spending patterns of their customers. They can also keep a close watch on abnormal spending, as such patterns can expose fraudulent use of the cards.
- Risk management: Insurance companies can uncover the risks associated with potential customers using data mining tools.
- Market basket analysis: This is a useful application in the retail industry, as link analysis algorithms reveal the affinities between products that are bought together. A supermarket chain can increase its earnings by rearranging the shelves based on the discovery of products that sell together.
- Delinquency tracking: Banks may use data mining technologies to track the customers who are likely to default on their repayments.
- Demand prediction: Various businesses can forecast and determine the demand for specific products. For example, a transportation company can arrange the number of vehicles required to meet the demand.
- Inventory management: Retailing companies can manage their inventory turnover effectively through the use of data mining tools.

Another recent application of data mining is in the area of customer relationship management (CRM). Data mining and CRM software enable the organization to analyze large databases to solve business-decision problems. Here, data mining acts as an extension of statistics, with a few artificial intelligence features. CRM, on the other hand, turns the information in a database into a business decision that drives customized interactions with customers. When a new product is launched, the data mining software uses historical information to build a model of customer behavior that can be used to predict which customers are likely to respond to the new product. Using this information, a marketing manager can select only the customers who are most likely to respond. CRM software can then feed the results of the decision to the appropriate touch point systems (call centers, web servers, e-mail systems, etc.) so that the right customers receive the right offers.

Self Assessment Question(s) (SAQs)
For Section 10.2
1. Explain the concept of data mining.
2. Discuss the significance of data mining in the current business scenario.
3. List the applications of data mining tools in various business areas.

10.3 The Knowledge Discovery Process
Data mining helps you to understand the substance of the data in a special, unsuspected way. It unearths patterns and trends in the raw data that you are hardly aware of.
Thus data mining is seen by the experts as a knowledge discovery process. This process aims at uncovering the hidden knowledge manifested in the data as relationships or patterns.

Data mining techniques discover the relationship between two or more different objects along with the time dimension. For instance, the tools may discover the relationships between bread and chips and between milk and cheese packs, especially during the evening rush hour. Sometimes, you may determine the relationship between the attributes of the same object. Pattern discovery is another important outcome of the data mining tools. For instance, credit card companies use data mining tools to mine the usage patterns of thousands of card-holders, which helps them to design an appropriate marketing campaign.

1. Determination of business objectives
2. Selection and preparation of data
3. Application of appropriate mining techniques
4. Evaluation of data mining results
5. Presentation of data discoveries
6. Incorporation of usage of discoveries
Fig. 10.1: The Phases in the Knowledge Discovery process

The important phases of the knowledge discovery process are shown in Fig. 10.1. Each of these phases is discussed below.

10.3.1 Determination of Business Objectives
This is the first phase, where you attempt to understand the purpose of applying data mining techniques. You can state these objectives as questions:
- How can I detect fraud in credit card usage?
- What is the best marketing campaign to increase my current sales?
- Can I identify the associations between the products that sell together?

Whatever data mining method is employed later, this is an important phase, as the rest of the phases are determined by the objectives set here.
10.3.2 Selection and Preparation of Data
This phase consists of selection of the data, preprocessing of the data, and transformation of the data. Based on the business objectives identified in the previous phase, select the appropriate data and extract it from the data warehouse. Then preprocess the data by enriching the selected data with the appropriate external data. Also, remove the noisy data that is out of range and make sure that you have selected all the data with the characteristics required to address the business objective detailed in the previous phase.

10.3.3 Application of appropriate Mining techniques
This is a crucial phase wherein the knowledge discovery engine applies the selected algorithm to the prepared data. The output of this phase is a set of relationships or patterns. This phase and the next are performed iteratively, as you may have to adjust the data and redo the step after an initial evaluation.

10.3.4 Evaluation of Data Mining Results
The data selected in the second phase contains many potential patterns or relationships. During this phase, you examine all the resulting patterns. You can see interesting patterns and relationships between various parameters such as products, customers, sales, and per-unit cost. You may apply a filtering mechanism and arrive at the appropriate and realistic patterns in the data.

10.3.5 Presentation of Data Discoveries
You may present the data discoveries in the form of visual navigation, charts, graphs, and text. The presentation mainly deals with highlighting the important discoveries you have found and storing them in the knowledge base for future use.

10.3.6 Incorporation of Usage of Discoveries
This is the phase where you turn the results obtained in the earlier phases into business actions.
You may assemble the results of the discovery in the most meaningful manner so that they can be applied for the effective improvement of the business.

Self Assessment Question(s) (SAQs)
For Section 10.3:
1. What are the outcomes of the knowledge discovery process?
2. Discuss the important phases involved in the knowledge discovery process.

10.4 Data Mining Techniques
Several data mining techniques are in practice. With the growing popularity of this area, the range of new and innovative techniques to mine warehouse data is expanding rapidly as well. Many of the new techniques are refinements of previous methods. But the lack of standardization across vendors has become an important concern. Regardless of the vendor-specific technique, data mining methods may be classified by the function they perform or by their class of application. Accordingly, the major categories of processing algorithms and rule approaches are discussed below.

10.4.1 Classification
In this approach, the data mining processes are intended to discover rules that define whether an event belongs to a particular subset or class of data. This category of techniques is applicable to many different types of business problems and involves two sub-processes: predicting classifications and building a model. For instance, if you are trying to understand the buying patterns in a customer base, a classification model can be constructed that maps the various customer attributes (e.g., age, gender, income, etc.) to various product purchases (e.g., luxury cars, clothing, personal fitness products, etc.). Given an appropriate set of predicting attributes, the model can be used against a list of customers to determine their likely purchases for the coming month.
By building and refining a predictive model of the business problem, the data mining classification methods can provide useful and highly accurate answers to questions such as: Which household is likely to respond to the promotion campaign being launched? Which stocks in the portfolio will go during the current market correction?

In general, the classification methods develop a model with IF-THEN rules. As the idea is to gain insight into probable members of a class, the standard approach to determine whether a specific rule is satisfied can be categorized into three possible subclasses:
- Exact rule: As per this rule, each IF object is an exact element of the THEN class. This approach creates the highest probability class of members (100 percent probability).
- Strong rule: As per this rule, an acceptable range of exceptions is prescribed to create a subclass of strong probability members (90 to 100 percent).
- Probabilistic rule: The rule relates the conditional probability P(THEN | IF) to the probability P(THEN). The approach creates a measured probability subclass of members (x percent probability).

10.4.2 Linkage Analysis
The data mining techniques that employ linkage analysis (associations) search all details or transactions from operational systems for patterns with a high probability of repetition. This approach results in the development of an associative algorithm that correlates one set of events with another set of events. Patterns derived from the algorithm are expressed as follows: "Seventy-five percent of all records that contain items A, B, and C also contain items D and E." The specific percentage supplied by the associative algorithm is referred to as the confidence factor of the rule, and these associations can involve any number of items on either side of the rule.
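The confidence factor described above can be illustrated with a minimal Python sketch. It assumes that each record is represented as a set of item names; the function names `support`, `confidence`, and `rule_strength` and the sample baskets are hypothetical and chosen only for illustration. "Support" (the share of all records containing the items) is a standard companion measure that the text does not name, and the labels returned by `rule_strength` simply reuse the exact, strong, and probabilistic bands from Section 10.4.1.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of all records that contain every item in `items`."""
    wanted = frozenset(items)
    if not transactions:
        return 0.0
    hits = sum(1 for record in transactions if wanted <= record)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    if_items: Iterable[str],
    then_items: Iterable[str],
) -> float:
    """Of the records containing the IF items, the fraction that also
    contain the THEN items (the confidence factor of the rule)."""
    if_set = frozenset(if_items)
    both = if_set | frozenset(then_items)
    n_if = sum(1 for record in transactions if if_set <= record)
    if n_if == 0:
        return 0.0
    n_both = sum(1 for record in transactions if both <= record)
    return n_both / n_if


def rule_strength(conf: float) -> str:
    """Label a rule using the bands from Section 10.4.1 (an illustrative mapping)."""
    if conf == 1.0:
        return "exact"
    if conf >= 0.9:
        return "strong"
    return "probabilistic"


# Hypothetical records: four contain A, B and C; three of those also contain D and E.
baskets = [
    frozenset({"A", "B", "C", "D", "E"}),
    frozenset({"A", "B", "C", "D", "E"}),
    frozenset({"A", "B", "C", "D", "E"}),
    frozenset({"A", "B", "C"}),
    frozenset({"A", "D"}),
]

rule_conf = confidence(baskets, {"A", "B", "C"}, {"D", "E"})
print(f"confidence = {rule_conf:.0%}, strength = {rule_strength(rule_conf)}")
```

With these sample records, three of the four records that contain A, B, and C also contain D and E, which reproduces the "seventy-five percent" style of statement quoted above.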
A common example of linkage analysis is market basket analysis, through which a retailer can mine the data generated by a point-of-sale system. By analyzing the products purchased by an individual purchaser, and then using an associative algorithm to compare numerous baskets, specific product affinities are derived. The outcome of a linkage analysis could be "if a customer buys a brand X blender, then there is a probability of twenty-two percent that the customer will also buy a set of kitchen tumblers". This type of information can be used to design an appropriate promotional campaign.

10.4.3 Sequential Discovery
Techniques that use sequencing or time-series analysis relate events in time, based on a series of preceding events; examples include the prediction of interest rate fluctuations and stock performance. This analysis reveals various hidden trends that are often highly predictive of future events. The sequences are analyzed as they relate to a specific customer or group of customers. For example, you might observe a sequential pattern that leads to the determination of a set of purchases that frequently precedes the purchase of a microwave oven. By monitoring the buying patterns of customers, particularly using credit card transactions, effective targeted mailing lists can be generated to focus promotions and marketing campaigns.

Another interesting example is finding unrelated departments with similar sales streams. This information can be used to determine more profitable promotions, customer flow, or store layouts. Similar-sequence techniques can successfully balance investment portfolios by identifying stocks or securities with similar price movements.

10.4.4 Cluster Analysis
Clustering methods can be used to create partitions so that all members of each set are similar according to a set of metrics. A cluster is simply a set of objects grouped together by virtue of their similarity to each other.
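The idea of grouping objects by their similarity can be sketched with a small k-means routine in plain Python. This is only one of many clustering methods and is not named in the text; the function `k_means`, the Euclidean distance metric, the caller-supplied starting centroids, and the sample points (two hypothetical, already scaled customer attributes) are all assumptions made for illustration.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    iterations: int = 20,
) -> Tuple[List[int], List[Point]]:
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its members; repeat until the centroids stop moving."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers described by two scaled attributes.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = k_means(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)
```

Passing the starting centroids explicitly keeps the sketch deterministic; practical tools usually pick them randomly or with a seeding heuristic, and they scale the attributes first so that no single attribute dominates the distance.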
For instance, a clustering approach might be used to mine credit card purchase data to discover that meals charged on a business-issued gold card are typically purchased on weekdays and have a mean value of greater than Rs.175, whereas meals purchased using a personal platinum card occur predominantly on weekends and have a mean value of Rs.125.

Clustering processes can be based on a particular event, such as the cancellation of a credit card by a customer. By analyzing the characteristics of members of this class, clustering might derive certain rules that could assist the credit card issuer in reducing the number of card cancellations in the future.

10.4.5 Statistical Analysis
Statistical analysis is the most mature of all data mining technologies and is the easiest to understand. Traditional statistical modeling techniques such as regression analysis are useful in building linear models that describe predictable data points. However, complex data patterns are often not linear in nature. Data mining therefore requires statistical techniques that can accommodate the nonlinearity and non-numerical data typically found in a DW environment. Some of the important statistical techniques include decision trees, neural networks, machine learning techniques, etc.

Decision trees offer a conceptually simple mathematical method of following the effect of each event, or decision, on successive events. For example, in a simple decision tree involving the performance of an activity indoors or outdoors, if "indoors" is selected from the initial choice set, then the next decision will more likely be "upstairs/downstairs" rather than "sun/shade." By continually breaking datasets into separate, smaller groups, a predictive model can be built.
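A single step of "breaking a dataset into smaller groups" can be sketched as a search for the best split point on one numeric attribute. The sketch below is a minimal illustration under stated assumptions: it scores candidate splits with Gini impurity, which is one common criterion and is not mentioned in the text, and the function names `gini` and `best_split` as well as the income and purchase data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a group: 0.0 when every label in it is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_split(
    values: Sequence[float], labels: Sequence[str]
) -> Tuple[Optional[float], float]:
    """Return the threshold on `values` that yields the purest two groups,
    together with the size-weighted impurity of those groups."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold: Optional[float] = None
    best_score = float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left: List[str] = [lab for _, lab in pairs[:i]]
        right: List[str] = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical records: annual income (in thousands) and whether a product was bought.
incomes = [20, 25, 30, 60, 70, 80]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(incomes, bought)
print(f"IF income > {threshold} THEN bought = yes (weighted impurity {impurity})")
```

A full decision tree repeats this search inside each resulting group, possibly on other attributes, until the groups are pure enough or too small to split further; each path from the root to a leaf can then be read as an IF-THEN rule.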
Decision trees used in data mining applications assist in the classification and prediction of items or events contained within the data warehouse. By constructing a tree, you can decipher the rules and understand why a record is classified in a certain way.

Neural networks attempt to mirror the way the human brain works in recognizing patterns by developing mathematical structures with the ability to learn. By studying combinations of variables and how different combinations affect datasets, these networks develop nonlinear predictive models. These networks are effective when the data is shapeless and lacks any apparent pattern.

Machine learning techniques, such as genetic algorithms and fuzzy logic, can derive meaning from complicated and lengthy data and can extract patterns from and detect trends within the data that are far too complex to be noticed by either the human brain or more conventional automated analysis techniques. Because of this ability, neural computing and machine learning technologies demonstrate broad applicability to the world of data mining and to a wide variety of complex business situations.

Self Assessment Question(s) (SAQs)
For Section 10.4
1. What are the various data mining techniques you are familiar with?

10.5 Challenges to Data Mining
Despite the potential power and value of data mining models to the business world, they are still new and have yet to emerge fully. Some of the challenges that significantly limit the advancement of data mining products are as follows:

10.5.1 Identification of Missing Information
Warehouse designers will continue to grapple with the conversion of data from an ODS into a homogeneous form for the warehouse until such time as data mining becomes commonplace and legacy application databases are replaced by newer "warehouse friendly" databases.
For instance, the data in an ODS cannot contain the entire knowledge about a particular application domain. Generally, the data within an ODS is limited to what is needed by the operational application associated with that database. Even though application-relevant queries can be successfully made on the data, more generalized queries may not be possible. Data mining applications therefore need to include mechanisms for "listing" the datasets so that attribute sufficiency can be determined prior to loading the data into the warehouse. For example, if you want to diagnose potential cases of malaria from a patient database, it is known that every patient dataset loaded into the warehouse must include the patient's red blood cell count. Without these data, no diagnosis can be effectively made. So one has to analyze what information is missing and obtain the missing data in order to derive effective benefits from data mining.

10.5.2 Data Noise and Missing Values
Errors in either the values of the attributes or the classification of data are referred to as data noise. The operational databases may be contaminated to some degree by errors. Data attributes that rely on subjective measurements or judgments can lead to errors so significant that some examples may even be misclassified. Data mining applications must address the problem of data noise in order to minimize its negative effect on the overall accuracy of the rules generated from the data.

10.5.3 Large Databases and High Dimensionality
In general, the operational databases are large and their contents are ever changing as data are added, modified, or removed. Rules derived from a dataset at a specific time may become less accurate as the data changes. This need for current data creates a parallel need for increased dimensions of the data for purposes of discovery.
So the data patterns need to be constantly updated by an expanding set of time-sensitive data values. The problem must be addressed by future discovery applications that can partition the problem space into smaller, more manageable chunks without losing any of the intrinsic attributes of the data contained therein.

Self Assessment Question(s) (SAQs)
For Section 10.5
1. What are the limitations of the data mining techniques?

10.6 Evaluation of a Data Mining Tool
You may consider the following points while evaluating the effectiveness of a data mining tool:
- The tool needs to be quick in accessing the data sources and bringing the required datasets into its environment.
- The tool must be capable of reading other sources and input formats, as it might need data from external sources to augment the data extracted from the warehouse.
- The tool has to be capable of filtering out unwanted data and deriving new data items from existing ones.
- The tool needs to be sensitive to the quality of the data that it mines. It has to recognize missing or incomplete data. It is also good if the tool is able to produce error reports.
- The architecture of the tool must be able to integrate with the data warehouse administration and other functions such as data extraction and metadata management.
- The tool must provide consistent performance irrespective of the amount of data to be mined, the type of algorithm applied, and the number of variables specified.
- The tool must have the capability to connect to external applications.
- The tool must be able to share its output with desktop tools such as graphic displays, spreadsheets, and database utilities. This feature is called openness, which refers to the capacity to integrate with the environment and other types of tools.
- The tool should have data visualization capability to display the results graphically and diagrammatically.

Self Assessment Question(s) (SAQs)
For Section 10.6
1. What criteria should be adopted in selecting the right data mining tool?

10.7 OLAP Vs. Data Mining
As you are already aware, you can obtain results from complex queries and derive interesting patterns with OLAP queries and analyses. Data mining also enables its users to come out with hidden patterns in the data. But there is a difference between these two processes. With OLAP, the user has prior knowledge of what he or she is looking for. With data mining, the user has no knowledge of what results are likely to emerge. OLAP helps the user analyze the past and gain insights, whereas data mining helps the user to predict the future. To conclude, OLAP and data mining are complementary to each other, as data mining picks up from where OLAP has left off. You drive the process when you use OLAP tools, but with data mining you can sit back after preparing the data.

Some of the important features of OLAP and data mining are detailed in Table 10.1.

Subject | OLAP | Data mining
Nature of information request | To understand the happenings in the organization | To predict the future on the basis of what has happened
Current state of technology | Quite mature and widely in use | Emerging, but some areas of the technology are mature
Approach | User-driven and interactive analysis | Data-driven and automatic knowledge discovery
Analysis techniques | Multi-dimensional, drill-down, slice-and-dice, etc. | Prepare the data and select an appropriate technique
Granularity of data | Summary data | Detailed transaction-level data
Number of business dimensions and dimension attributes | Small number of dimensions and limited number of attributes | Large number of dimensions and many dimension attributes
Sizes of the data sets for the dimensions | Not large for every dimension | Normally large for every dimension

Table 10.1: Basic differences between OLAP and Data mining

Self Assessment Question(s) (SAQs)
For Section 10.7
1. Explain the fundamental differences between OLAP and data mining in terms of their approach to providing business intelligence to their users.

10.8 Data Visualization
Data visualization is the process by which numerical data are converted into meaningful images. Here, the data may come from any type of source, such as satellite photos, undersea sonic measurements, surveys, or computer simulations. Typically, these sources create data that is difficult to interpret because of the overwhelming quantity, the complexity of the information, and the embedded patterns. As the human brain is capable of processing a significant amount of visual information and recognizing millions of different physical objects, data visualization techniques are intended to assist in analyzing complex datasets by mapping physical properties to the data.

The National Center for Supercomputing Applications (NCSA), University of Illinois, created a computer animation of smog data superimposed over a three-dimensional map of the Los Angeles basin. NCSA succeeded in pinpointing the locations of major contributors to pollution and in accurately predicting the smog levels and movement over a wide area. Three-dimensional visualization programs developed by Xerox PARC enable users to fly through large datasets, view the data from infinite angles, and examine and rearrange virtual object representations of the data.
Self Assessment Question(s) (SAQs)
For Section 10.8
1. What is data visualization? Discuss its significance in the study of business intelligence tools.

10.9 Summary
Data mining is a process of analyzing data from different perspectives and summarizing it into useful information. Some of the important reasons for the increased growth of data mining methods include the inadequacy of the human brain to process larger amounts of data, the increasing affordability of machine learning, and the need of organizations to discover the existing classifications in the data.

Some of the important applications of data mining tools include the following areas: customer segmentation, fraud detection, risk management, market basket analysis, delinquency tracking, demand prediction, inventory management, and customer relationship management (CRM).

The entire process of data mining can be termed a knowledge discovery process, and the important phases in this process include determination of business objectives, selection and preparation of data, application of appropriate mining techniques, evaluation of results, presentation of discoveries, and incorporation and application of results.

The important data mining techniques include classification, linkage analysis, sequential discovery, cluster analysis, and statistical techniques like decision trees, neural networks, and machine learning techniques such as genetic algorithms and fuzzy logic.

OLAP helps the user analyze the past and gain insights, whereas data mining helps the user to predict the future. To conclude, OLAP and data mining are complementary to each other, as data mining picks up from where OLAP has left off. You drive the process when you use OLAP tools, but with data mining you can sit back after preparing the data.

Data visualization is the process by which numerical data are converted into meaningful images.
Because the human brain can process a significant amount of visual information and recognize millions of different physical objects, data visualization techniques are intended to assist in analyzing complex datasets by mapping physical properties to the data.

10.10 Terminal Questions (TQs)
1. Data mining techniques have succeeded in bringing several benefits to organizations. However, these techniques still have a long way to go before they deliver their best. Comment.
2. Discuss the important data mining techniques and explain how they help managers make strategic business decisions.

10.11 Multiple Choice Questions (MCQs)
1. Which of the following is the result of a data mining effort?
a. Information and data trends
b. Operational efficiency
c. Organizational effectiveness
d. None of the above
2. Data mining is a process used to _____.
a. remove the unnecessary data in a data warehouse
b. analyze and understand the trends in the data
c. add data into the data warehouse from the external data sources
d. connect various data marts in a data warehouse
3. The process of analyzing data from different perspectives and summarizing it into useful information is called _________.
a. Data loading
b. Data analyzing
c. Data prototyping
d. Data mining
4. KDD stands for ________.
a. Knowledge data designing
b. Knowledge data discovery
c. Knowledge designing and development
d. Knowledge derivation from data
5. Which of the following business areas has benefited from the use of data mining tools?
a. Demand prediction
b. Risk management
c. Delinquency tracking
d. All the above
6. Which of the following is an incorrect phase in the knowledge discovery process?
a. Determination of business objectives
b. Selection and preparation of data mining method
c.
Application of appropriate mining techniques
d. Evaluation and application of results
7. What is the fundamental difference between OLAP and data mining techniques?
a. OLAP queries are user-driven, whereas the outcomes of data mining are data-driven
b. OLAP helps the user analyze the past and gain insights, whereas data mining helps the user predict the future
c. Both (a) and (b)
d. None of the above
8. Which of the following is the process by which numerical data are converted into meaningful images?
a. Data visualization
b. Data mining
c. Data discovery
d. None of the above
9. Which of the following is not correct about data mining?
a. Data mining is a data-driven process
b. Data mining tools predict the future on the basis of what has already happened
c. Multi-dimensional analysis, drill-down, and slice-and-dice are some of the data mining tools
d. None of the above
10. Which of the following is not a category of data mining techniques?
a. Linkage analysis
b. Cluster analysis
c. Groups
d. Associations
11. Which of the following mining techniques searches all details from operational systems to discover patterns with a high probability of repetition?
a. Linkage analysis
b. Cluster analysis
c. Classifications
d. Associations
12. Which data mining approach develops a model with IF-THEN rules?
a. Linkage analysis
b. Classifications
c. Associations
d. Cluster analysis

10.12 Answers to SAQs, TQs, and MCQs
10.12.1 Answers to Self Assessment Questions (SAQs)
Section 10.2:
1. Data mining is a process of analyzing data from different perspectives and summarizing it into useful information. You can also discuss the concept as detailed in Section 10.2.
2.
You can discuss the following important reasons for the growth of data mining methods: the inability of the human brain to process large amounts of data, the increasing affordability of machine learning, and the need of organizations to discover the existing classifications in their data.
3. Important application areas of data mining tools include customer segmentation, fraud detection, risk management, market basket analysis, delinquency tracking, demand prediction, inventory management, and customer relationship management (CRM). You may discuss these application areas as detailed in Section 10.2.2.
Section 10.3
1. The important outcomes of the knowledge discovery process are the discovery of relationships within the data and of data patterns. You can discuss these outcomes as provided in Section 10.3.
2. The important phases in this process include determination of business objectives, selection and preparation of data, application of appropriate mining techniques, evaluation of results, presentation of discoveries, and incorporation and application of results. You can discuss these phases as provided in Section 10.3.
Section 10.4
1. The important data mining techniques include classification, linkage analysis, sequential discovery, cluster analysis, and statistical techniques such as decision trees, neural networks, and machine learning techniques such as genetic algorithms and fuzzy logic. You can discuss these techniques as provided in Section 10.4.
Section 10.5
1. The current limitations of data mining techniques include identification of missing information, data noise and missing values, and large databases and high dimensionality. You can discuss these limitations as provided in Section 10.5.
Section 10.6
1.
While selecting an appropriate tool, one has to consider various issues, such as the ability of the tool to access data quickly, its capability to filter out unwanted data, and its consistency in delivering performance. You can list the issues discussed for the evaluation of a data mining tool in Section 10.6.
Section 10.7
1. OLAP helps the user analyze the past and gain insights, whereas data mining helps the user predict the future. OLAP and data mining are complementary: data mining picks up where OLAP leaves off. With OLAP tools you drive the process yourself, whereas with data mining you can sit back once the data have been prepared. The important differences between OLAP and data mining techniques are detailed in Table 10.1.
Section 10.8
1. Data visualization is the process by which numerical data are converted into meaningful images. Because the human brain can process a significant amount of visual information and recognize millions of different physical objects, data visualization techniques are intended to assist in analyzing complex datasets by mapping physical properties to the data.

10.12.2 Answers to Terminal Questions (TQs)
1. Organizations obtain several benefits through the use of data mining tools. Important application areas of data mining tools include customer segmentation, fraud detection, risk management, market basket analysis, delinquency tracking, demand prediction, inventory management, and customer relationship management (CRM). However, data mining techniques are still in an emerging phase. These tools face various challenges, namely identification of missing information, data noise and missing values, and large databases and high dimensionality. The tools therefore need to overcome their current limitations and progress considerably before they can give their best to their users.
2.
The important data mining techniques include classification, linkage analysis, sequential discovery, cluster analysis, and statistical techniques such as decision trees, neural networks, and machine learning techniques such as genetic algorithms and fuzzy logic. As discussed in Section 10.4, you can describe all the data mining techniques and the benefits that organizations obtain through their use.

10.12.3 Answers to Multiple Choice Questions (MCQs)
1. Ans: a
2. Ans: b
3. Ans: d
4. Ans: b
5. Ans: d
6. Ans: b
7. Ans: c
8. Ans: a
9. Ans: c
10. Ans: c
11. Ans: a
12. Ans: b