Abstract
The current generation is witnessing a data explosion; most of this data is unstructured and is commonly referred to as Big Data. Such data is characterized by high volume, velocity, variety, and veracity. File systems such as HDFS, GFS, Ceph, Lustre, and PVFS are used to store Big Data. MapReduce processes data in parallel across a cluster to generate output. Compared with MapReduce, the Spark framework can improve performance by up to 10x when datasets are stored on disk and by up to 100x when data is held in memory. This paper proposes optimizing Big Data processing using the Spark framework.