1-hdfs-notes

What is BIGDATA?

Big Data: the 3 V’s of BIGDATA
- Volume: petabyte scale
- Variety: structured, semi-structured and unstructured data
- Velocity: data arriving continuously and at high speed, e.g. from social media and sensors

What is Hadoop?
New Hardware & Software Approach to Handle BIGDATA
- New hardware approach
- New software approach

HDFS

A self-healing distributed filesystem running on clusters of commodity hardware, intended for storing large files with streaming data access patterns.
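To make the definition concrete, here is a minimal Python sketch of how a large file might be cut into fixed-size blocks, copied to several machines, and copied again when one machine fails (the "self-healing" part). It assumes a 128 MB block size and a replication factor of 3, the usual HDFS defaults, which these notes do not state; the node names (`dn1` to `dn4`) and the round-robin placement in `place_replicas` are hypothetical simplifications, not how HDFS actually chooses nodes.

```python
from __future__ import annotations

# Assumed values, not stated in these notes: 128 MB blocks and 3 replicas are the
# usual HDFS defaults (dfs.blocksize / dfs.replication), but both are configurable.
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB
REPLICATION = 3


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the length in bytes of each block a file of `file_size` bytes occupies."""
    full, rest = divmod(file_size, block_size)
    # The last block only holds the leftover bytes; it is not padded to block_size.
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, set[str]]:
    """Hypothetical round-robin placement: each block gets `replication` distinct nodes.

    Real HDFS uses a rack-aware policy chosen by the NameNode; this is a simplification.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {b: {nodes[(b + i) % len(nodes)] for i in range(replication)}
            for b in range(num_blocks)}


def heal(placement: dict[int, set[str]], dead_node: str, live_nodes: list[str],
         replication: int = REPLICATION) -> dict[int, set[str]]:
    """Self-healing step: forget a failed node and re-replicate its blocks elsewhere."""
    for holders in placement.values():
        holders.discard(dead_node)
        for candidate in live_nodes:
            if len(holders) >= replication:
                break
            if candidate != dead_node:
                holders.add(candidate)  # no-op if this node already holds the block
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([size // MB for size in blocks])  # [128, 128, 44]
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    heal(placement, "dn3", ["dn1", "dn2", "dn4"])
    print(placement)  # every block is back to 3 replicas, none on dn3
```

In a real cluster the NameNode keeps this block-to-DataNode map, detects a dead DataNode through missed heartbeats, and schedules the re-replication; the sketch only mirrors that bookkeeping.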
Principles of HDFS
- Highly fault-tolerant
- Designed to be deployed on low-cost hardware
- Highly scalable
- Provides high-throughput access to application data
- Suitable for applications with large data sets (typically GBs to TBs)
- Portable across heterogeneous hardware and operating system platforms
- No support for random (in-place) updates, but appends are allowed
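The last principle (appends allowed, random updates not) can be illustrated with a small in-memory model. `AppendOnlyFile` and its methods are hypothetical names for this sketch, not part of any Hadoop API; the point is only that bytes can be added at the end and read back sequentially, while any attempt to overwrite an existing offset is refused.

```python
import io
from typing import Optional


class AppendOnlyFile:
    """Hypothetical in-memory model of the HDFS write rules listed above.

    Not the HDFS client API: it only illustrates "append allowed, no random updates".
    """

    def __init__(self) -> None:
        self._data = bytearray()

    def append(self, chunk: bytes) -> int:
        """Add bytes at the end of the file and return the new file length."""
        self._data.extend(chunk)
        return len(self._data)

    def read(self, offset: int = 0, length: Optional[int] = None) -> bytes:
        """Read sequentially from `offset`, to the end of the file by default."""
        end = len(self._data) if length is None else offset + length
        return bytes(self._data[offset:end])

    def write_at(self, offset: int, chunk: bytes) -> None:
        """In-place (random) updates are always rejected."""
        raise io.UnsupportedOperation(
            f"cannot overwrite {len(chunk)} bytes at offset {offset}; use append()")


if __name__ == "__main__":
    f = AppendOnlyFile()
    f.append(b"sensor-1,42\n")
    f.append(b"sensor-2,17\n")
    print(f.read())  # b'sensor-1,42\nsensor-2,17\n'
    try:
        f.write_at(0, b"X")
    except io.UnsupportedOperation as err:
        print("rejected:", err)
```

The HDFS architecture documentation motivates this write-once, append-only model by noting that it simplifies data coherency across replicas and enables high-throughput streaming access.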