Taking a Look at the HDFS File System

Introduction

The Hadoop Distributed File System (HDFS) is a highly scalable file system designed for applications with very large data sets. It supports parallel reading and processing of data and differs significantly from other distributed file systems. HDFS is built for streaming large files, runs on commodity hardware, and can be deployed on low-cost machines. It favors high throughput over low latency and typically follows a write-once, read-many access pattern. It is highly fault tolerant and easy to manage. Its main feature is built-in redundancy: the system keeps multiple replicas of each piece of data. An HDFS cluster manages the addition and removal of nodes automatically, and a single operator can manage up to 3,000 nodes at a time. A few key areas of POSIX semantics have been traded away to increase the data throughput rate.

Working of HDFS Hardware

In HDFS, hardware failure is the norm rather than the exception. A single instance can consist of thousands of server machines, which means the system contains a huge number of components, each with a significant probability of failure. As a result, some component of an HDFS system is effectively always non-functional.

Data in HDFS

Applications running on HDFS need streaming access to their data sets; they are batch-processing jobs rather than interactive applications. HDFS is designed to handle large data sets, and a single instance can support millions of files.

Model of HDFS
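To make the access model described above concrete, here is a minimal sketch in Java using the standard org.apache.hadoop.fs.FileSystem API. The namenode address, file path, and file contents are placeholder values chosen for illustration, not taken from the text: the program writes a file once, streams it back, and prints its replication factor (the built-in redundancy mentioned in the introduction).

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWriteExample {
    public static void main(String[] args) throws Exception {
        // The Configuration normally picks up core-site.xml / hdfs-site.xml;
        // the namenode URI below is a placeholder for this sketch.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/user/example/notes.txt");

        // Write once: an HDFS file is typically written a single time...
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeBytes("HDFS follows a write-once, read-many access pattern.\n");
        }

        // ...and then streamed (read) many times, usually by batch jobs.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(path)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }

        // Each block of the file is replicated across DataNodes; the default
        // replication factor of 3 provides the built-in redundancy.
        System.out.println("Replication factor: "
                + fs.getFileStatus(path).getReplication());

        fs.close();
    }
}

With the Hadoop client libraries on the classpath, this sketch can be compiled and run against any reachable cluster; the file is written a single time, while reads of it can be repeated as often as needed.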
