Abstract
Over the last decade, the growth of social media, unconventional web technologies, and computer and mobile applications has driven the development of a variety of database models. Many recent datasets are costly and impractical to manage with SQL databases because they lack a fixed structure and demand high scalability and elasticity. NoSQL data stores such as MongoDB and Cassandra provide a suitable platform for fast and efficient queries over such data. With the arrival of “Big Data”, the size and structure of data have become highly dynamic and complex. This paper evaluates the Cassandra NoSQL database when used with the Hadoop MapReduce engine, together with the Ceph file system and the Lustre file system, all of which can serve as alternatives to HDFS (Hadoop Distributed File System). We give a brief overview of MapReduce and Hadoop and then show some of the shortcomings of Hadoop + HDFS. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function. We also evaluate the theoretical and measured performance of Lustre and HDFS for a variety of workloads in both traditional and MapReduce-based applications.
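The point about replacing allocation tables with a pseudo-random distribution function can be illustrated with a small sketch. It is an illustration only. Ceph's actual placement algorithm is CRUSH. The code below substitutes a simpler technique, rendezvous (highest-random-weight) hashing. The function names `place` and `_weight`, and the device names, are hypothetical. The idea it shows is the same: any client can compute where an object's replicas live from the object name and the device list alone, with no central table to consult.

```python
import hashlib


def _weight(obj_name: str, device: str) -> int:
    # Hypothetical helper: a deterministic pseudo-random weight for an
    # (object, device) pair, derived from a SHA-256 digest.
    digest = hashlib.sha256(f"{obj_name}:{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj_name: str, devices: list[str], replicas: int = 3) -> list[str]:
    # Rendezvous hashing (NOT Ceph's CRUSH): rank every device by its
    # pseudo-random weight for this object and keep the top `replicas`.
    # No allocation table is stored; every client gets the same answer.
    if replicas > len(devices):
        raise ValueError("not enough devices for the requested replica count")
    ranked = sorted(devices, key=lambda d: _weight(obj_name, d), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    osds = [f"osd{i}" for i in range(8)]
    print(place("file42.chunk0", osds))
```

One design consequence of this choice is that removing a device that does not hold a given object leaves that object's placement unchanged, so only data on the removed device has to move.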