Set up passphraseless ssh
• ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  – Generates a public/private key pair
  – id_rsa is the private key; id_rsa.pub is the public key
  – -P specifies the passphrase; here it is an empty string
• cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  – Appends the public key to the list of authorized keys
• chmod 0600 ~/.ssh/authorized_keys
  – Restricts the file to owner read/write only
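
The three commands above can be rerun safely if they are wrapped so that an existing key is not overwritten and the public key is not appended twice. The following is a minimal sketch under those assumptions; the function name setup_passphraseless_ssh and its optional directory argument are hypothetical and not part of OpenSSH or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: idempotent version of the key setup steps.
# setup_passphraseless_ssh is a hypothetical helper name.
setup_passphraseless_ssh() {
  # The directory is a parameter so the function can be tried in a scratch
  # directory first; it defaults to ~/.ssh as on the slide.
  local ssh_dir="${1:-$HOME/.ssh}"
  local key="$ssh_dir/id_rsa"
  local auth="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"

  # Generate the key pair only if no private key exists yet.
  if [ ! -f "$key" ]; then
    ssh-keygen -q -t rsa -P '' -f "$key" || return 1
  fi

  # Append the public key only if this exact line is not already present.
  touch "$auth"
  if ! grep -qxF -- "$(cat "$key.pub")" "$auth"; then
    cat "$key.pub" >> "$auth"
  fi

  # Owner read/write only, as in the chmod step.
  chmod 0600 "$auth"
}
```

Calling setup_passphraseless_ssh with no argument operates on ~/.ssh; calling it a second time should leave authorized_keys unchanged.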

Check that it works
• ssh localhost
  – It should log in to localhost without asking for a password (you may need to answer "yes" to the host-key prompt the first time)
• exit
  – Make sure you exit from the "ssh localhost" session
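
The same check can be made non-interactive, which is convenient before running the Hadoop start scripts. This is a sketch only: BatchMode and ConnectTimeout are standard OpenSSH client options, while the function name check_passphraseless_ssh and the SSH_BIN override are hypothetical. It assumes the host key for localhost has already been accepted once, since batch mode cannot answer the first-time prompt.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if ssh can log in without prompting for a password.
# SSH_BIN is a hypothetical override, useful for trying the function out.
check_passphraseless_ssh() {
  local host="${1:-localhost}"
  # BatchMode=yes makes ssh fail instead of asking for a password.
  # The remote command "true" does nothing and returns immediately,
  # so there is no session left open to exit from.
  if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
    echo "ok: passwordless login to $host works"
  else
    echo "failed: $host still requires a password or is unreachable" >&2
    return 1
  fi
}
```

Run check_passphraseless_ssh with no argument to test localhost.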

Formatting and starting HDFS
• bin/hdfs namenode -format
• sbin/start-dfs.sh
  – Use sbin/stop-dfs.sh to stop it

Verifying that HDFS started properly
• Run jps; you should see three Java processes (in addition to Jps itself):
  – SecondaryNameNode
  – DataNode
  – NameNode
• If NameNode is not started
  – Try stopping HDFS and reformatting the namenode (see the previous slide)
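
The jps check can be scripted so that a missing daemon is reported by name. A minimal sketch, assuming jps prints one "<pid> <name>" pair per line; the function name check_hdfs_daemons and the JPS_BIN override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which of the three HDFS daemons appear in the jps listing.
# JPS_BIN is a hypothetical override, useful for trying the function out.
check_hdfs_daemons() {
  local jps_out daemon missing=0
  jps_out="$("${JPS_BIN:-jps}")" || return 2

  for daemon in NameNode DataNode SecondaryNameNode; do
    # Compare the second field exactly, so that "NameNode" is not
    # satisfied by the "SecondaryNameNode" line.
    if printf '%s\n' "$jps_out" |
       awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "running: $daemon"
    else
      echo "MISSING: $daemon"
      missing=1
    fi
  done

  # Returns 0 only when all three daemons were found.
  return "$missing"
}
```

A non-zero return from check_hdfs_daemons after sbin/start-dfs.sh indicates that one of the daemons did not come up.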

Working with HDFS
• Setting up the home directory in HDFS
  – bin/hdfs dfs -mkdir /user
  – bin/hdfs dfs -mkdir /user/ec2-user (ec2-user is the user name of your EC2 account)
• Create a directory "input" under the home directory
  – bin/hdfs dfs -mkdir /user/ec2-user/input
  – Or simply: bin/hdfs dfs -mkdir input
  – The relative path automatically creates "input" under /user/ec2-user
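
The separate mkdir calls can be collapsed with the -p option of the file system shell, which creates missing parent directories. The sketch below then uses -test -d with the relative path to confirm that "input" resolves under the home directory. It assumes the HDFS user name equals the local OS user name (the default without Kerberos); the function name make_hdfs_input_dir and the HDFS_BIN override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create /user/<name>/input in one step, then verify that the
# relative path "input" refers to the same directory.
# HDFS_BIN is a hypothetical override; it defaults to bin/hdfs as on the slide.
make_hdfs_input_dir() {
  local hdfs="${HDFS_BIN:-bin/hdfs}"
  # Assumption: the HDFS home directory is /user/<local user name>.
  local home="/user/$(id -un)"

  # -p creates /user and /user/<name> as needed and does not fail
  # if the directories already exist.
  "$hdfs" dfs -mkdir -p "$home/input" || return 1

  # -test -d returns 0 when the path exists and is a directory.
  "$hdfs" dfs -test -d input
}
```

Run it from the Hadoop installation directory so that the default bin/hdfs path resolves.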

Working with HDFS
• Copy data from the local file system
  – bin/hdfs dfs -put etc/hadoop/*.xml /user/ec2-user/input
  – Ignore an error like this if you see one: "WARN hdfs.DataStreamer: Caught exception…"
• List the contents of the directory
  – bin/hdfs dfs -ls /user/ec2-user/input

Working with HDFS
• Copy data from HDFS
  – bin/hdfs dfs -get /user/ec2-user/input input1
  – If the local directory input1 does not exist, it is created
  – If it does exist, the copy is placed in a new directory under it
• Examine the contents of a file in HDFS
  – bin/hdfs dfs -cat /user/ec2-user/input/core-site.xml
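
Because -get behaves differently depending on whether the local target already exists, a small wrapper can refuse to run in the second case so the result is always laid out the same way. This is a sketch of one possible guard, not a Hadoop feature; the function name fetch_hdfs_dir and the HDFS_BIN override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy an HDFS path to a local path only when the local path is new,
# so the copy never ends up nested inside an existing directory.
# HDFS_BIN is a hypothetical override; it defaults to bin/hdfs as on the slide.
fetch_hdfs_dir() {
  local src="$1" dest="$2"

  if [ -e "$dest" ]; then
    echo "refusing: local path '$dest' already exists" >&2
    return 1
  fi

  "${HDFS_BIN:-bin/hdfs}" dfs -get "$src" "$dest"
}
```

For example, fetch_hdfs_dir /user/ec2-user/input input1 succeeds the first time and refuses on a second run until input1 is removed or a different local name is given.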

Working with HDFS
• Remove files
  – bin/hdfs dfs -rm /user/ec2-user/input/core-site.xml
  – bin/hdfs dfs -rm /user/ec2-user/input/*
• Remove a directory
  – bin/hdfs dfs -rmdir /user/ec2-user/input
  – The directory "input" must be empty first
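
Emptying the directory and then calling -rmdir can also be done in one step with the -r option of -rm, which removes a directory together with its contents. Since a recursive delete is easy to misdirect, the sketch below adds a simple guard against a few top-level paths; the guard, the function name remove_hdfs_dir, and the HDFS_BIN override are hypothetical additions, not Hadoop behaviour.

```shell
#!/usr/bin/env bash
# Sketch: recursive removal of an HDFS directory with a basic safety check.
# HDFS_BIN is a hypothetical override; it defaults to bin/hdfs as on the slide.
remove_hdfs_dir() {
  local dir="$1"

  # Refuse an empty argument and a few top-level paths.
  case "$dir" in
    ""|/|/user|/user/)
      echo "refusing to remove '$dir'" >&2
      return 1
      ;;
  esac

  # -rm -r deletes the directory and everything under it.
  "${HDFS_BIN:-bin/hdfs}" dfs -rm -r "$dir"
}
```

For example, remove_hdfs_dir /user/ec2-user/input replaces the two-step sequence of removing the files and then the empty directory.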

Where is HDFS located?
• /tmp/hadoop-ec2-user/dfs/
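
Rather than relying on the default path, the configured locations can be read back with hdfs getconf -confKey. The sketch below queries three standard property names: hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir. The function name show_hdfs_storage_dirs and the HDFS_BIN override are hypothetical, and the values printed depend on your configuration files.

```shell
#!/usr/bin/env bash
# Sketch: print the configured local storage locations for HDFS.
# HDFS_BIN is a hypothetical override; it defaults to bin/hdfs as on the slide.
show_hdfs_storage_dirs() {
  local key
  for key in hadoop.tmp.dir dfs.namenode.name.dir dfs.datanode.data.dir; do
    # Print "key = value" on one line; getconf prints the value and a newline.
    printf '%s = ' "$key"
    "${HDFS_BIN:-bin/hdfs}" getconf -confKey "$key"
  done
}
```

Run show_hdfs_storage_dirs from the Hadoop installation directory and compare the output with the path on the slide.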

References
• K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, 2010, pp. 1-10.