Exercise 3
Name: Tanya Peddi
Student ID: 800968024
First, we set the working directory to the location where the data is stored.
setwd("C:/Users/Tanya Peddi/Documents/UNCC Academic/6162 Knowledge Discovery in Databases")
studentRatings <- read.table('C:/Users/Tanya P
Chapter 2
Exploratory Data Analysis
2.1
Objectives
Nowadays, most ecological research is done with hypothesis testing and modelling
in mind. However, Exploratory Data Analysis (EDA), which uses visualization
tools and computes synthetic descriptors, is st
#examples of calculating confidence intervals
#1. calculating confidence interval from a normal distribution
#   where population standard deviation is known
#   which would be rare!
#   sample mean is 5, standard dev of pop is 2,
#   sample size is 20, 95% confid
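The known-sigma case described in these comments can be sketched as follows. This is a minimal illustration using the values stated above (sample mean 5, population standard deviation 2, sample size 20). It assumes the truncated last comment refers to a 95% confidence level, and the variable names are illustrative rather than the original author's.

```r
# Values taken from the comments above; conf_level = 0.95 is an assumption
# because the original comment is cut off.
xbar <- 5          # sample mean
sigma <- 2         # known population standard deviation
n <- 20            # sample size
conf_level <- 0.95

# With sigma known, the interval uses the standard normal quantile.
z <- qnorm(1 - (1 - conf_level) / 2)
margin <- z * sigma / sqrt(n)
ci <- c(lower = xbar - margin, upper = xbar + margin)
print(ci)
```

When sigma is not known, the usual replacement is the sample standard deviation together with a t quantile (`qt`) instead of `qnorm`.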
# This R environment comes with all of CRAN preinstalled, as well as many other helpful packages
# The environment is defined by the kaggle/rstats docker image: https://github.com/kaggle/docker-rstats
# For example, here's several helpful packages to load
iris
aggregate(.~Species, data=iris, FUN = mean)
#boxplots are important in finding outliers.
boxplot(Sepal.Length~Species,data=iris)
boxplot(Species~Sepal.Length,data=iris)
barplot(table(iris$Species))
hist(iris$Sepal.Length)
mtcars
mtcars[,"6"]
mtcars["F
#EDA Homework name:
#TASK 1
#1.
#Recall the example that focused on the 'iris' dataset.
#Using this dataset, calculate the three separate correlation
#matrices with the four variables, which correspond to the
#three levels in the Species factor.
#use cor
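One possible way to approach this task is sketched below. It assumes that "the four variables" are the four numeric measurement columns of the built-in `iris` data frame; the name `cor_by_species` is illustrative and not part of the assignment.

```r
# Split the four numeric columns by Species, then apply cor() to each group.
numeric_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
cor_by_species <- lapply(split(iris[, numeric_cols], iris$Species), cor)

# One 4 x 4 correlation matrix per level of the Species factor.
print(lapply(cor_by_species, round, digits = 2))
```

Using `split()` with `lapply()` keeps the three matrices in one named list, so a single species can be inspected with, for example, `cor_by_species$setosa`.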
Summary Plots
Time Series Plots
Geographical Plots
3D Plots
Simulation Plots
UCLA Department of Statistics
R Bootcamp
Graphics for Exploratory Data Analysis in R
Irina Kukuyeva
September 20, 2009
Introduction to R and Exploratory data analysis
Gavin Simpson
November 2006
Summary
In this practical class we will introduce you to working with R. You will complete an
introductory session with R and then use a data set of Spheroidal Carbonaceous Partic
Problem
The Internet is now a household tool. In 2007 it was estimated that around 179 million
people worldwide used the Internet (over 100 million of those were in the USA and
Canada). From the increasing popularity (and usefulness) of the Internet has e
ITIS 6162 Knowledge Discovery in
Databases
Spring 2017
Getting to Know Your Data
Prof. Xi Niu
Assistant Professor, University of North Carolina at Charlotte
Jan 19, 2016
Outline
Data Objects and Attribute Types
Basic Statistical Descriptions of Data
Data
ITIS 6162 Knowledge Discovery in
Databases
Spring 2017
Linear Regression
Prof. Xi Niu
Assistant Professor, University of North Carolina at Charlotte
Feb 16, 2017
Aims
Understand linear regression with one or
several predictors
Understand how we assess th
ITIS 6162 Knowledge Discovery in
Databases
Spring 2017
Correlation
Prof. Xi Niu
Assistant Professor, University of North Carolina at Charlotte
Jan 26, 2017
Special Cases of Minkowski Distance
h = 1: Manhattan (city block, L1 norm) distance
d (i, j)  x
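As a small illustration of the h = 1 special case, the sketch below compares a hand-written Minkowski distance with R's built-in `dist()`. The two objects `i` and `j` and the helper `minkowski()` are hypothetical and only serve the example.

```r
# Two hypothetical objects with three numeric attributes each.
x <- rbind(i = c(1, 2, 3), j = c(4, 0, 3))

# General Minkowski distance of order h between two numeric vectors.
minkowski <- function(a, b, h) sum(abs(a - b)^h)^(1 / h)

# With h = 1 the Minkowski distance reduces to the Manhattan distance.
manhattan_by_hand <- minkowski(x["i", ], x["j", ], h = 1)
manhattan_dist    <- as.numeric(dist(x, method = "manhattan"))
minkowski_dist    <- as.numeric(dist(x, method = "minkowski", p = 1))

print(c(manhattan_by_hand, manhattan_dist, minkowski_dist))
```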
ITIS 6162 Knowledge Discovery in
Databases
Spring 2017
Categorical Data Analysis
Prof. Xi Niu
Assistant Professor, University of North Carolina at Charlotte
Feb 23, 2017
Agenda
Chi-Square Analysis
Log-Linear Analysis
Hair color and eye color
Hair color
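A chi-square test of independence for hair colour and eye colour can be sketched with R's built-in `HairEyeColor` table. This is an assumption: the slide's own table is not shown here, so the built-in data only stands in for it, and the name `hair_eye` is illustrative.

```r
# HairEyeColor is a 3-way table (Hair x Eye x Sex) shipped with base R.
# Collapse over Sex to get the 4 x 4 Hair-by-Eye contingency table.
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))

# Pearson's chi-square test of independence between hair and eye colour.
result <- chisq.test(hair_eye)

print(hair_eye)
print(result)
```

The degrees of freedom are (rows - 1) * (columns - 1); a small p-value indicates that hair colour and eye colour are not independent in this table.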
Getting Started with R
Xi Niu edited this page on Jan 25, 2015 · 28 revisions
Download and Installation
Download R from http://cran.r-project.org/ by choosing the version
corresponding to your operating system. The installation is pretty
straightforward by clicking on th
Exercise 8
Name:Tanya Peddi
UNCC id : 800968024
In this assignment, we were provided with data taken from an experiment conducted by Laura
Nichols and Richard Nicki on Internet Addiction. The experiment involved a 36-item
questionnaire. Out of this they
Assignment 10
Name: Tanya Peddi
UNCC ID: 800968024
1) We load the data from AdultUCI of arules package into adultUCI
2) Data Preparation
   - Remove the columns fnlwgt and education-num
   - Bin all four numerical attributes into categorical variables
3) Applyi
Exercise 7
Name: Tanya Peddi
UNCC id : 800968024
The given data consists of three columns, namely Beckham.Profession, Beckham.Response,
Beckham.Happy. We are supposed to perform a chi-square test on profession of participants
and how happy they are, professio
Name: Rachit Jaldipkumar Choksi
Kaggle Account ID: Rachit29
My Experience:
I have been working on Kaggle for the past few days. As far as I know,
Kaggle is the largest and most diverse data community in the world. Kaggle is
an open source platform which
Apache Hive
About the Tutorial
Hive is a data warehouse infrastructure tool to process structured data in Hadoop.
It resides on top of Hadoop to summarize Big Data, and makes querying and
analyzing easy.
This is a brief tutorial that provide
REDUCTS IN INCOMPLETE INFORMATION SYSTEMS
Zbigniew Ras
Information Systems
S = (X, AT) is an information system, where
X - objects,
AT - attributes (partial functions from X into 2^Va {*}),
Va - set of values of attribute a.
Example 1:
S = ({1,2,3,4,5,6
www.kdd.uncc.edu
ACTION RULES
& META ACTIONS
College of Computing and
Informatics
University of North Carolina,
Charlotte
presented by
Zbigniew W. Ras
University of North Carolina, Charlotte, NC
College of Computing and Informatics
Introduction : Action
Rough Sets
Basic Concepts of Rough Sets
Information/Decision Systems (Tables)
Indiscernibility
Set Approximation
Reducts and Core
Rough Membership
Dependency of Attributes
Information Systems/Tables
      Age     LEMS
x1    16-30   50
x2    16-30   0
x3    31-45   1-25
x4    31-45   1
Problem 1.
Follow the agglomerative strategy to cluster objects {y1, y2, ..., y6} represented by the
information system below.
Y     M     N
y1    1     2
y2    2     4
y3    6     2
y4    10    8
y5    6     6
y6    1     4
Use Manhattan distance, d(yi, yj) = |Mi - Mj| + |Ni - Nj|, for objects yi, yj and the
di
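The pairwise Manhattan distances for the six objects can be checked with a short sketch like the one below. The problem statement is cut off before it names the distance between clusters, so the `"single"` linkage passed to `hclust()` is purely an assumption for illustration and may differ from what the problem asks for.

```r
# Objects y1..y6 with attributes M and N, as listed in the information system.
objects <- matrix(
  c(1, 2,
    2, 4,
    6, 2,
    10, 8,
    6, 6,
    1, 4),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("y", 1:6), c("M", "N"))
)

# Pairwise Manhattan distances d(yi, yj) = |Mi - Mj| + |Ni - Nj|.
d <- dist(objects, method = "manhattan")
print(d)

# Agglomerative clustering; single linkage is an assumed choice.
hc <- hclust(d, method = "single")
print(hc$merge)  # negative entries are original objects, positive are earlier merges
```

Whatever linkage is used, the first merge joins the closest pair of objects, which here is y2 and y6 at distance 1.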
Sample Problems (ITCS 6114)
Problem 1.
Follow Prim's algorithm (Kruskal's algorithm) to find a minimum spanning tree for the
graph represented by the following set of edges: [(a,b),2], [(a,c),4], [(b,c),3], [(b,d),4],
[(b,e),1], [(c,e),5], [(d,e),2], [(e,f)
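Kruskal's algorithm can be sketched as below. The edge list in the problem is cut off, so this example uses only the seven edges that are fully listed and covers vertices a to e; it is therefore a spanning tree of that partial graph, not the answer to the full problem. The function `kruskal_mst()` and its simple parent-pointer union-find are illustrative.

```r
# Only the edges that appear in full in the problem statement.
edges <- data.frame(
  from   = c("a", "a", "b", "b", "b", "c", "d"),
  to     = c("b", "c", "c", "d", "e", "e", "e"),
  weight = c(2, 4, 3, 4, 1, 5, 2),
  stringsAsFactors = FALSE
)

kruskal_mst <- function(edges) {
  vertices <- unique(c(edges$from, edges$to))
  parent <- setNames(vertices, vertices)  # each vertex starts as its own root

  # Follow parent pointers until reaching the representative of the component.
  find_root <- function(v) {
    while (parent[[v]] != v) v <- parent[[v]]
    v
  }

  edges <- edges[order(edges$weight), ]   # consider the cheapest edges first
  keep <- logical(nrow(edges))
  for (i in seq_len(nrow(edges))) {
    root_from <- find_root(edges$from[i])
    root_to   <- find_root(edges$to[i])
    if (root_from != root_to) {           # the edge joins two different components
      parent[[root_from]] <- root_to
      keep[i] <- TRUE
    }
  }
  edges[keep, ]
}

mst <- kruskal_mst(edges)
print(mst)
print(sum(mst$weight))
```

Prim's algorithm would instead grow a single tree from a start vertex, always adding the cheapest edge that leaves the tree; both approaches yield a spanning tree of the same total weight.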
MIDTERM EXAM
Name:
Problem 1
For the information system given below, find the set of rules describing C in terms
of E, F, G by applying RSES algorithm. Find the set of all reducts of C.
Assume that
Dom(E)={e1,e2}, Dom(F)={f1,f2,f3}, Dom(G)={g1,g2},
Association Rules
presented by
Zbigniew W. Ras *), #)
*) University of North Carolina Charlotte
#) Warsaw University of Technology
Market Basket Analysis (MBA)
Customer buying habits by finding associations and
correlations between the different items that
c
Problem 1
For the information system given below, find the set of rules describing C in terms
of E, F, G by applying the CART, RSES, and LERS algorithms.
Find the set of all coverings of C (reducts) using Rosetta (RSES). Assume that
Dom(E)={e1,e2}, Dom(F)={