Research Journal of Computer Systems Engineering - RJCSE, Vol 03, Issue 01; January-April 2012
http://technicaljournals.org
cFACT-2012, Loyola College, Chennai, TN
ISSN: 2230-8563; e-ISSN: 2230-8571

ROLE OF DATA WAREHOUSING AND DATA MINING TECHNOLOGY IN BUSINESS INTELLIGENCE

PALAK GUPTA
ASSISTANT PROFESSOR, JAGANNATH INTERNATIONAL MANAGEMENT SCHOOL, MOR, POCKET-105, KALKAJI, NEW DELHI-19
vaishpalak@rediffmail.com

ABSTRACT

This paper presents a combined study of Business Intelligence (BI), data warehouse and data mining technologies. BI techniques are generally computer-based and provide historical, current and predictive views of the enterprise. BI applications include decision support systems, query and reporting, complex event processing, online analytical processing, process mining, business performance management, and statistical and predictive analysis. A data warehouse is a repository of relational data designed for query and analysis. It separates the analysis workload from the transaction workload and enables an organization to consolidate data from several different sources. The data warehouse is analyzed by data mining, a relatively recent technique that allows enormous data sets to be explored so as to yield hidden and previously unknown predictions that can support efficient decision making. The data used in current business domains is often not precise, accurate or complete. Instead, it is considered uncertain, and this uncertainty propagates to the results produced by Business Intelligence. Companies therefore use data mining techniques, which involve pattern recognition and statistical and mathematical methods, to search data warehouses and help analysts recognize significant trends, facts, relationships, exceptions and anomalies that might otherwise go unnoticed. This paper studies data warehousing and data mining of uncertain and voluminous data in business intelligence.
To provide better analysis of data, the data mining methodology is used: a computer-assisted process of exploring and analyzing volumes of data and extracting their meaning so that business managers can take the most profitable decisions. This paper covers business intelligence, structured and unstructured data, the data warehouse and its architecture, and the data mining process, models and techniques, and then discusses their applications, future and scope in large enterprise databases. These technologies provide quick, predictive and summarized information to decision makers, giving companies sound competitive intelligence and, in turn, helping them lead the market.

Keywords - Business Intelligence, Data mining, Data warehouse, pattern recognition and statistics, structured and unstructured data

1. INTRODUCTION

Information technology is now required in every aspect of our lives. It helps businesses and enterprises use applications such as decision support systems, query and reporting, complex event processing, online analytical processing, statistical and predictive analysis, and business performance management. In today's business environment, where companies face global competition, success or failure depends on efficiency, timeliness, knowledge and the ability to make better decisions. Business Intelligence (BI) refers to technologies and applications for collecting, storing and analyzing business data that ultimately help the enterprise make better decisions [B. de Ville, 2001]. BI techniques are generally computer-based and provide historical, current and predictive views of the enterprise. BI applications include decision support systems, query and reporting, data mining, complex event processing, online analytical processing, process mining, business performance management, text mining, and statistical and predictive analysis.
BI applications in an enterprise are diverse: enterprise reporting to serve strategic management of the business, executive information systems, collaboration platforms that allow sharing of internal and external business data, and electronic data interchange. Business Intelligence supports good decision making and ensures competitive intelligence by analyzing the company's internal data along with information about its competitors [Bergeron and Hiller, 2002]. An important aspect of BI is knowledge management, which helps companies formulate good strategies through proper insight and experience. Analyzing the volumes of data stored in data warehouses and data marts is still a time-consuming and complex task, so data mining is used to provide better analysis of that data. BI helps companies analyze large amounts of data for decision making, but not all data is structured and easy to understand: some data exists in unstructured or semi-structured form, in which searching and interpretation consume a lot of time. Making decisions in such situations therefore becomes complex.
2010-2012 - TECHNICALJOURNALS®, Peer Reviewed International Journals -IJCEA, IJESR, RJCSE, PAPER, ERL, IRJMWC, IRJSP, IJEEAR, IJCEAR, IJMEAR, ICEAR, IJVES, IJGET, IJBEST – TJ-PBPC, India; Indexing in Process - EMBASE, EmCARE, Electronics & Communication Abstracts, SCIRUS, SPARC, GOOGLE Database, EBSCO, NewJour, Worldcat, DOAJ, and other major databases etc.,

1.1 Metadata - handling unstructured and semi-structured data

Businesses store volumes of data in the form of web pages, emails, video and image files, news and reports, which are called semi-structured or unstructured data. In practice, such data wastes time in searching and leads to poor decisions, because large volumes of unstructured data are stored in a variety of formats and accessed through different technologies. To solve this problem, metadata, which is data about data, should be kept with unstructured or semi-structured data. Using information extraction and automatic categorization techniques, metadata can be generated in the form of summaries or topics. This metadata is built from the data warehouse and then used for data mining, so these technologies need to be studied and researched.

2. DATA WAREHOUSE

A data warehouse is a repository of business or enterprise data that gives a picture of the organization's historical and current operations [C. Date, 2003]. It focuses on internal sources such as quality control, sales, inventory and production. A data warehouse is designed to enable efficient management decision making, as it presents a coherent picture of business conditions at a single point in time. Building one involves developing systems that extract data from operational systems and installing a warehouse database system that lets managers access data in flexible ways.

Fig. 1. Data Warehouse (Source: Han & Kamber, 2006)

A data warehouse integrates multiple related data stores and provides managers with volumes of information for analytical and decision support purposes. Its environment includes a relational database, Extraction, Transportation, Transformation and Loading (ETL) solutions, client analysis tools and an Online Analytical Processing (OLAP) engine. It provides information on the what, when, who, where and how of business events. Building a data warehouse is a critical process for knowledge management and business intelligence, because organizations need to make huge amounts of data available and turn them into useful information and knowledge.

2.1 Requirement for data warehouse

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process. Its technology includes data cleansing, online analytical processing and data integration. It provides end users with a single, consistent and complete store of data obtained from a variety of sources, in a format they can understand and use in a business context. It is used to simplify the following processes in an enterprise:
• Manage and control the business
• Run optimized queries rather than updates
• Provide ad-hoc information for loosely defined systems
• Help managers and end users understand the business and make judgements
• Integrate data across the enterprise
• Allow what-if analysis
• Allow quick decisions based on historical data

2.2 Process of Data Warehousing

The Data Warehouse Process is a prescription for analyzing and identifying organizational value streams, strategic initiatives and related business goals through a specific architecture.
This process is conducted in an iterative fashion after the initial business requirements and architectural foundations are developed, with the emphasis on populating the data warehouse with "chunks" of functional subject-area information. The process acts as a guideline for the development team to identify the business requirements, develop the business plan and the warehouse solution to those requirements, and implement the technical, configuration and application architecture for the overall data warehouse. It then specifies the iterative activities for the cyclical planning, design, construction and deployment of each population project. The Data Warehouse Process also includes conventional project management, startup and wrap-up activities, which are detailed in the Plan, Activate, Control and End stages.

2.3 Data Warehouse Architecture

The architecture of a data warehouse is based on the business processes of the enterprise, along with considerations for adequate security, the extent of query requirements, data modelling, metadata management, and full technology and bandwidth utilization. The metadata and raw data of a traditional OLTP system are present, as is an additional type of data: summary data. Summaries are very valuable in data warehouses because they precompute long-running operations in advance. For example, a typical data warehouse query is to retrieve something like February sales.
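To make the idea of summary data concrete, here is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical `sales` table with made-up rows. SQLite has no materialized views, so an ordinary table built once with GROUP BY stands in for one (in Oracle, a materialized view would play this role). The February query then reads a few precomputed rows instead of scanning every sale.

```python
import sqlite3

def build_warehouse():
    """Create a toy warehouse: raw detail rows plus a precomputed monthly summary."""
    con = sqlite3.connect(":memory:")
    # Raw (detail) data: one row per sale. Table and column names are hypothetical.
    con.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("2012-01-15", "North", 120.0),
            ("2012-02-03", "North", 200.0),
            ("2012-02-17", "South", 150.0),
            ("2012-02-28", "North", 50.0),
            ("2012-03-09", "South", 80.0),
        ],
    )
    # Summary data: totals per month and region, computed once at load time.
    # A plain table emulates a materialized view, which SQLite does not have.
    con.execute(
        """CREATE TABLE sales_by_month AS
           SELECT substr(sale_date, 1, 7) AS month, region,
                  SUM(amount) AS total, COUNT(*) AS n_sales
           FROM sales
           GROUP BY substr(sale_date, 1, 7), region"""
    )
    return con

def february_sales(con, year="2012"):
    """Answer 'February sales' from the summary table, not from the raw rows."""
    rows = con.execute(
        "SELECT region, total FROM sales_by_month WHERE month = ? ORDER BY region",
        (f"{year}-02",),
    ).fetchall()
    return dict(rows)

con = build_warehouse()
print(february_sales(con))  # {'North': 250.0, 'South': 150.0}
```

The trade-off of this design is freshness: the summary has to be rebuilt or incrementally refreshed whenever new raw rows are loaded.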
A summary in Oracle is called a materialized view. Fig. 2 illustrates three things:
• Data sources (operational systems and flat files)
• Warehouse (metadata, summary data and raw data)
• Users (analysis, reporting and mining)

Fig. 2. Data Warehouse Architecture (Source: Oracle9i Data Warehousing Guide Release 2 (9.2), 2002)

3. DATA MINING - A BOON IN BI TO GAIN COMPETITIVE ADVANTAGE

Data mining, or knowledge discovery, helps businesses discover new trends or patterns of behaviour that previously went unnoticed. These patterns can in turn be used predictively in a variety of applications. Data mining is the extraction of implicit, previously unknown and potentially useful information from data [Witten and Frank, 1999]. Small changes in the strategy of the data mining process can result in profits or better scientific decisions. Data mining is data-driven, in contrast to statistical, online analytical processing, and query and reporting tools, which are usually user-driven. It is primarily used by companies with a strong consumer focus, such as financial, marketing, retail and communication organizations, and enables them to find relationships among “internal factors” such as price, staff skills and product line, and “external factors” such as competition, customer choice, economic indicators and market segmentation. It also helps enterprises determine the impact on corporate profits, sales, fulfilment of goals and objectives, and ultimately customer satisfaction. Finally, it enables them to “drill down” from summary information into detailed transactional data.

3.1 Data Mining Process

Data mining is an application-dependent stage that extracts useful, valid and understandable patterns from databases, texts and the web. It provides ways to make the best use of data through rapid computerization [Pyle, 2003]. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries.
It uses modelling techniques to build a model, that is, a set of examples or a mathematical relationship, from data on situations where the answer is known, and then applies the same model to other situations where the answers are unknown [Dunham, 2005]. The process of data mining involves the following three stages:

1. Exploration - This stage involves data preparation, data cleaning and data transformation, selecting subsets of records, and then performing feature selection to reduce the number of variables to a manageable range. This reduction and choice of variables depend on the complexity of the analysis, which can range from a simple choice of predictors for a regression model to elaborate exploratory analyses using graphical and statistical methods.

2. Model Building and Validation - This stage involves choosing the best model based on predictive performance through competitive evaluation, in which different models are applied to the same data set and the best is chosen after comparing their performance. Techniques used for combining and comparing models include bagging (voting for classification and averaging for regression-type problems with continuous dependent variables of interest), boosting (generating a sequence of classifiers and deriving weights to combine their predictions into one), stacking (stacked generalization) and meta-learning (combining predictions from multiple different models).

3. Deployment - This is the final stage, in which the model selected as best in the previous stage is applied to new data to generate predictions or estimates of the expected outcome. For example, an online shopping site processing e-commerce transactions by credit card may deploy neural networks and a meta-learner to identify fraudulent transactions.

The data mining process involves the following tasks in seeking relationships:
• Classification - It divides stored data into classes so as to place data into predetermined groups. For example, a company marketing its products could mine stored customer purchase data to determine when customers buy its products in larger numbers, in which areas, and what they purchase. This information could then be used to focus on such potential customers and market segments and increase sales through discounts or free schemes.
• Sequential Pattern Matching - It is based on a sequential rule A->B, which implies that event A tends to be followed by event B. It allows data mining to predict behaviour patterns and trends.
• Association - It is a rule X->Y such that X and Y are sets of data items.
• Clustering - It involves finding groups of items with related or similar traits. Data items are grouped according to consumer preferences or logical relationships to identify market segments or consumer affinities. Clustering may be hierarchical or non-hierarchical.

Fig. 3. Hierarchy of clusters, from the smallest clusters up to a single large cluster (lower-level clusters are merged to give clusters at the next higher level)

• Deviation Detection - It analyzes and finds significant changes in data.
• Data Visualization - It uses graphical methods to show hidden patterns in data.
• Regression - It takes a numerical data set and develops a best-fit mathematical formula, into which new data can then be fed to obtain a prediction. It works well with continuous quantitative data.

4. DATA MINING MODELS TO SUPPORT BI

Data mining uses a number of models to find mathematical relationships from data on situations where the answer is known and then applies them to other situations where the answers are unknown. The following process models are used in data mining [Hill and Lewicki, 2007]:

1. CRISP (Cross-Industry Standard Process for Data Mining) - It describes how to integrate data mining methodology into an organization, how to involve stakeholders, how to convert data into information and how to distribute information to enable decision making by stakeholders. It is mainly used in European companies.

Fig. 4. CRISP Model for Data Mining (Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, Deployment)

2. Six Sigma - It is a well-structured, data-driven model that eliminates waste, defects and quality control problems in activities related to management, manufacturing and service delivery. It is quite popular in American industries.

Fig. 5. Six Sigma Model for Data Mining (Define, Measure, Analyze, Improve, Control)

3. SEMMA - This model focuses more on the technical activities involved in a data mining project.

Fig. 6. SEMMA Model for Data Mining (Sample, Explore, Modify, Model, Assess)

5. NEXT GENERATION DATA MINING TECHNIQUES

Data mining is less concerned with finding relations between the variables involved and more focussed on producing solutions that generate useful predictions. It uses a "black-box" approach to explore data and discover knowledge, combining traditional Exploratory Data Analysis (EDA) techniques with recent neural network technologies that can generate valid estimates. Data mining techniques are a blend of statistics, artificial intelligence and database research [Berson, Smith and Thearling, 1999]. Next-generation techniques include artificial neural networks, decision trees, rule induction and genetic algorithms.

• Artificial Neural Networks - This data mining technique uses non-linear predictive models that learn through training and can be applied to a large number of diverse problems. Here, the computer is trained to think, respond and make decisions in a way similar to humans. However, the system needs a lot of training, and only processed data is fed to it; through gradual learning the system becomes efficient at mining and predicting patterns from a database. Neural network models are complex to use and deploy, even for experts, so they are packaged as complete solutions which, once proven successful, can be used repeatedly without requiring deep understanding.
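The learning-through-training idea above can be sketched with the simplest artificial neuron, a perceptron. This is an illustrative simplification, not the packaged products the text refers to: a single perceptron can only learn linearly separable patterns, and practical networks stack many units with non-linear activations. The two input flags (hypothetical "foreign merchant" and "unusual amount" indicators for a card transaction) and the labels are made-up toy data.

```python
def step(z):
    """Threshold activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def predict(weights, bias, x):
    """Weighted sum of the inputs plus bias, passed through the threshold."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule on (features, label) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            err = target - predict(weights, bias, x)
            if err:
                errors += 1
                # Nudge each weight toward the correct answer for this example.
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
        if errors == 0:  # every training example is now classified correctly
            break
    return weights, bias

# Hypothetical labelled history: (foreign_merchant, unusual_amount) -> suspicious?
transactions = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(transactions)
print([predict(weights, bias, x) for x, _ in transactions])  # [0, 0, 0, 1]
```

Because these toy labels are linearly separable, the learning rule is guaranteed to stop making errors after finitely many updates; on data that is not separable, a single unit would keep cycling, which is one reason real networks add hidden layers.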
Neural networks are also packaged with expert consulting services to help business organizations detect fraudulent use of credit cards. The technique determines relevant predictors for a model, which are used either by themselves or in combination to yield "features" [Michael Gilman, 2004].
• Decision Trees - In this data mining technique, tree-shaped structures represent sets of decisions that generate rules for classifying a dataset. The top node, also called the root, is the starting node, which is partitioned into two or more nodes depending on the results of a test. It is a fast data mining technique that allows results to be presented as rules with little or no pre-processing of business data. Decision trees are used both for exploration and for prediction, using methods such as Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID). Both methods classify a dataset and predict outcomes for new records using a set of rules. CART creates two-way splits when segmenting the dataset and requires less data preparation than CHAID, which creates multi-way splits. However, during tree traversal, CART may leave valuable rules undiscovered because each split is based on limited information. The rules of a decision tree are mutually exclusive and collectively exhaustive, and are found by a top-down "greedy search", that is, by looking for the best split at each next step.
• Rule Induction - It is one of the major data mining techniques, enabling knowledge discovery and unsupervised learning by extracting useful if-then rules from the database based on statistical significance. It extracts possible patterns from the database along with accuracy and significance parameters attached to each. The user can therefore be more confident in selecting a prediction that is sound and comes with better logic and explanation than a neural network provides. However, it can sometimes be confusing to select the best rule from the pool of rules.
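A minimal sketch of if-then rule extraction with an accuracy figure attached to each rule, assuming toy customer records and a hypothetical `bought` attribute as the THEN part. Real rule-induction systems search multi-condition rules and test statistical significance; this version only enumerates one-condition rules and keeps those that pass illustrative coverage and accuracy thresholds.

```python
from collections import Counter, defaultdict

def induce_rules(records, target, min_coverage=2, min_accuracy=0.75):
    """Enumerate one-condition IF-THEN rules and attach coverage and accuracy.

    coverage = number of records matching the IF part
    accuracy = fraction of those records whose target equals the THEN part
    """
    # Count target outcomes for every (attribute, value) condition.
    outcomes = defaultdict(Counter)
    for rec in records:
        for attr, value in rec.items():
            if attr != target:
                outcomes[(attr, value)][rec[target]] += 1

    rules = []
    for (attr, value), counts in outcomes.items():
        coverage = sum(counts.values())
        label, hits = counts.most_common(1)[0]  # majority outcome becomes the THEN part
        accuracy = hits / coverage
        if coverage >= min_coverage and accuracy >= min_accuracy:
            rules.append({"if": (attr, value), "then": label,
                          "coverage": coverage, "accuracy": accuracy})
    # Most accurate rules first, then the most general, then by condition name.
    rules.sort(key=lambda r: (-r["accuracy"], -r["coverage"], r["if"]))
    return rules

# Hypothetical customer records.
customers = [
    {"age": "young", "income": "high", "bought": "yes"},
    {"age": "young", "income": "low", "bought": "no"},
    {"age": "old", "income": "high", "bought": "yes"},
    {"age": "old", "income": "low", "bought": "no"},
    {"age": "young", "income": "high", "bought": "yes"},
    {"age": "old", "income": "low", "bought": "yes"},
]
for rule in induce_rules(customers, "bought"):
    print(rule)
# {'if': ('income', 'high'), 'then': 'yes', 'coverage': 3, 'accuracy': 1.0}

loose = induce_rules(customers, "bought", min_accuracy=0.6)
print([r["if"] for r in loose])
# [('income', 'high'), ('age', 'old'), ('age', 'young'), ('income', 'low')]
```

With the looser threshold, a young customer with high income matches two rules at once, which shows how induced rules, unlike the leaves of a decision tree, can overlap; picking among them is exactly the selection problem mentioned above.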
Rule induction is used on databases with fields of high cardinality or with many columns of binary fields. The rules produced by this technique are not mutually exclusive and may or may not be collectively exhaustive. It uses a bottom-up approach to collect patterns suited for later predictions, and it retains all possible patterns even if they are redundant.
• Genetic Algorithms - This is an optimization-based data mining technique built on the concepts of genetics: combination, natural selection and mutation [Chen, 2002]. It applies fitness functions and crossover operations with random selection, allowing strong analysis of changes in fitness from one population segment to another. However, it may sometimes pose problems in choosing the best possibility among the others [Chuck Kelley, 2002]. Genetic algorithms promote "survival of the fittest" using heuristic functions. There are two approaches to applying genetic algorithms in pattern recognition: one uses them directly as a classifier, and the other uses them as an optimization tool for tuning the parameters of other classifiers. For example, genetic algorithms are used to find an optimal set of feature weights that improves classification accuracy.

6. TECHNOLOGICAL INFRASTRUCTURE FOR DATA WAREHOUSE AND DATA MINING

Data warehouse and data mining applications today vary widely in size and storage capacity, running on platforms that range from mainframes to personal computers. Enterprise-wide applications range from 10 gigabytes upward. There are therefore two critical technological drivers for data mining:
1. Database size - systems need more advanced configurations as larger volumes of data are processed and maintained.
2. Query complexity - more advanced systems are required as queries become more complex and more numerous.
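Indexing is one common way to keep query response acceptable as database size and query complexity grow. As a small illustration, assuming Python's built-in sqlite3 module (a convenience choice, not a warehouse engine) and hypothetical table and index names, the sketch compares SQLite's query plan for the same lookup before and after adding an index. The exact wording of SQLite's plan text differs between versions.

```python
import sqlite3

def query_plan(con, sql, params=()):
    """Return the human-readable detail strings from SQLite's EXPLAIN QUERY PLAN."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the detail text

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (customer_id INTEGER, sale_date TEXT, amount REAL)")
# Synthetic rows: 10,000 sales spread over 1,000 customers and 12 months.
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i % 1000, f"2012-{(i % 12) + 1:02d}-01", float(i % 50)) for i in range(10_000)],
)

sql = "SELECT SUM(amount) FROM sales WHERE customer_id = ?"
before = query_plan(con, sql, (42,))
con.execute("CREATE INDEX idx_sales_customer ON sales(customer_id)")
after = query_plan(con, sql, (42,))
print(before)  # a full-table SCAN of sales
print(after)   # a SEARCH of sales using idx_sales_customer
print(con.execute(sql, (42,)).fetchone()[0])  # 420.0
```

The result of the query is the same either way; only the access path changes, from reading every row to jumping straight to the ten matching rows.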
As the number and size of applications grow, relational databases with extensive indexing capabilities are used to improve query evaluation and response performance. In addition, new hardware architectures such as Massively Parallel Processors (MPP) are used, which can link hundreds of high-speed Pentium processors to achieve performance better than that of supercomputers.

7. APPLICATIONS OF DATA WAREHOUSE AND DATA MINING IN BUSINESS INTELLIGENCE

Data warehousing and data mining have gained popularity in almost all fields because they help to quickly analyze large databases that would otherwise be too complex and time-consuming to analyze. The following list includes some possible applications:
• Market segmentation - to identify the common characteristics and behaviour of customers who purchase the same products from a company [Doug Alexander, 2000].
• Banking - for underwriting, mortgage approval, etc.
• Web marketing - for targeted banner advertisements, personalization and cross-sell/upsell opportunities.
• Customer churn - to predict which customers are likely to leave the company for a competitor.
• Direct marketing - to identify which prospects to include in a mailing list so as to obtain the highest response rate. It includes churn models, response models and next-to-buy analysis.
• Fraud detection - to identify fraudulent transactions, such as in credit card usage.
• Finance - for stock and bond analysis, and for analysis and forecasting of business performance.
• Manufacturing - for quality control, improvement and preventive maintenance.
• Trend analysis - to reveal differences in a customer's behaviour over consecutive months.
• Medicine - for diagnosis, epidemiological studies, drug analysis and quality control.
• Government - for threat assessment and searching terrorist profiles.

8. FUTURE AND SCOPE OF DATA WAREHOUSING AND DATA MINING

Data mining technology has a bright future in business domains because it helps generate new opportunities through automated prediction of behaviours and trends in large databases. For example, targeted marketing with a better Return On Investment (ROI) can be achieved by mining past promotional mailings and identifying population segments that respond similarly to given events. Data mining techniques also help to automatically discover previously unknown patterns, such as anomalous data that could highlight errors made during data entry. Data mining is popular not only with sales and marketing companies but also with financial institutions, as it allows analysts to search quickly through financial records and make the best investment decisions. Even healthcare organizations use data mining techniques to understand past trends and reduce future costs. The future of data mining can be analyzed in three phases:
1. In the short term, data mining is profitable for micro-marketing campaigns that target potential customers.
2. In the medium term, data mining will be as easy to use as the internet and email, for example for finding the best prices.
3. In the long term, the prospects of data mining are very fruitful, enabling new decisions and new insights from the database.
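The targeted-marketing example in this section reduces, in its simplest form, to computing a response rate per population segment from past mailings and mailing only the segments that clear a threshold. The sketch assumes hypothetical segment labels, toy history and an arbitrary 0.5 threshold; in practice the segments themselves would usually come from a clustering or classification step.

```python
from collections import defaultdict

def response_rates(mailings):
    """Response rate per segment from (segment, responded) records of past mailings."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in mailings:
        sent[segment] += 1
        responded[segment] += int(did_respond)
    return {seg: responded[seg] / sent[seg] for seg in sent}

def pick_segments(mailings, min_rate):
    """Segments whose historical response rate meets the threshold, best first."""
    rates = response_rates(mailings)
    chosen = [seg for seg, rate in rates.items() if rate >= min_rate]
    return sorted(chosen, key=lambda seg: rates[seg], reverse=True)

# Hypothetical history of one past promotional mailing.
history = [
    ("urban-young", True), ("urban-young", False), ("urban-young", True), ("urban-young", True),
    ("rural-senior", False), ("rural-senior", False), ("rural-senior", True), ("rural-senior", False),
    ("suburban-family", True), ("suburban-family", False),
]
print(response_rates(history))
# {'urban-young': 0.75, 'rural-senior': 0.25, 'suburban-family': 0.5}
print(pick_segments(history, min_rate=0.5))
# ['urban-young', 'suburban-family']
```

Skipping the low-response segment is where the ROI gain comes from: the same mailing budget is spent on the groups that responded best in the past.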
Although data mining has many benefits in various fields, it also raises privacy concerns: data stored in one database could be accessed by others, either at the same physical location or across the internet, leading to eavesdropping, fraud and security issues.

9. DATA MINING PRODUCTS

Data mining is now welcomed and used aggressively in industry [Mike Chapple, 2011]. The major database vendors already include data mining techniques in their platforms, for example:
• Darwin - an Oracle data mining suite that implements classification and regression trees, k-nearest neighbours, regression analysis, neural networks and clustering algorithms.
• SQL Server - Microsoft's database platform, which provides data mining functionality through clustering algorithms and classification trees.
• SPSS, SAS and S-Plus - advanced statistical packages that allow the implementation of data mining algorithms.

10. CONCLUSION

Data warehousing and data mining are thus essential components of business operations for gaining competitive intelligence, as they enable quick and efficient analysis, from different perspectives, of the volumes of data stored in data warehouses and data marts, and reveal hidden, previously unknown predictions that ultimately improve future decision making. Their techniques allow statistical, analytical and multidimensional analysis of data to evaluate relationships, correlations and trends.
Data mining is a powerful new technology that helps companies focus on the most important information in the data they collect, so as to evaluate and understand the behaviour of potential customers and, in turn, capture the market.

REFERENCES
[1] Alexander, Doug, (2000), "Data Mining", electronic article, http://www.laits.utexas.edu/norman/BUS.FOR/course.mat/Alex/
[2] de Ville, B., (2001), "Microsoft Data Mining: Integrated Business Intelligence for e-Commerce and Knowledge Management", Boston: Digital Press.
[3] Berson, Alex, Smith, Stephen J., Thearling, Kurt, (1999), "Building Data Mining Applications for CRM", McGraw-Hill.
[4] Chapple, Mike, (2011), "Data Mining: An Introduction", http://databases.about.com/od/datamining/a/datamining.htm
[5] Chen, S. H., (2002), "Genetic Algorithms and Genetic Programming in Computational Finance", Boston, MA: Kluwer.
[6] Pyle, D., (2003), "Business Modeling and Data Mining", San Francisco, CA: Morgan Kaufmann.
[7] Frank, E., Paynter, G., Witten, I. H., Gutwin, C., Nevill-Manning, C., (1999), "Domain-specific keyphrase extraction", Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-99), Stockholm, Sweden, pp. 668-673.
[8] Gilman, Michael, (2004), "Nuggets and Data Mining", Data Mining Technologies Inc., Melville, NY 11714, (631) 692-4400.
[9] Hill, T., Lewicki, P., (2007), "STATISTICS: Methods and Applications", Tulsa, OK: StatSoft.
[10] Kelly, Chuck, (2002), "What is the role of Genetic Algorithms in Data Mining", Information Management: How your Business Works, electronic newsletter, http://www.information-management.com/news/5755-1.html
[11] Dunham, M. H., (2005), "Data Mining: Introductory and Advanced Topics", Prentice Hall.
[12] Bergeron, P., Hiller, C. A., (2002), "Competitive intelligence", in B. Cronin (ed.), Annual Review of Information Science and Technology, vol. 36, chapter 8, Medford, N.J.: Information Today.
[13] Han, Jiawei, Kamber, Micheline, (2006), "Data Mining: Concepts and Techniques", 2nd edition, Morgan Kaufmann Publishers, March 2006, ISBN 1-55860-901-6.
[14] Oracle9i Data Warehousing Guide Release 2 (9.2), Part No. A96520-01, March 2002.
[15] Date, C., (2003), "Introduction to Database Systems", 8th ed., Upper Saddle River, N.J.: Pearson Addison Wesley.

BIOGRAPHY
I am an Assistant Professor at Jagannath International Management School, Kalkaji, Delhi. I hold an MCA and have been in academics and teaching for more than 8 years. My specialization is Computer Science and Information Technology, in which I have completed a number of live projects on .Net technology. I have written a number of research papers for national and international journals and conferences.