This project works with a retail database that has six tables: Customers, Orders, Order_Items, Products, Categories, and Departments.
I implemented several ETL steps: importing the data from MySQL into HDFS with Sqoop, transforming and analyzing it with Spark and Scala, and writing the results back to HDFS.
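To make the shape of these steps concrete, here is a minimal sketch using plain Scala collections, so it runs without a cluster. It is not the project's actual code: the `OrderItem` fields, the four-column line layout, and the "revenue per order" question are all assumptions chosen for illustration. In a Spark job the same parse, aggregate, and format steps would typically be written as operations on an RDD or DataFrame.

```scala
// Hypothetical record type: field names and column order are assumptions,
// not the real Order_Items schema.
final case class OrderItem(orderId: Int, productId: Int, quantity: Int, subtotal: BigDecimal)

object EtlSketch {
  // Extract: parse one comma-separated line, such as a text import into HDFS
  // might contain. Malformed lines yield None instead of failing the whole run.
  def parseOrderItem(line: String): Option[OrderItem] =
    line.split(",", -1).map(_.trim) match {
      case Array(orderId, productId, quantity, subtotal) =>
        scala.util.Try(
          OrderItem(orderId.toInt, productId.toInt, quantity.toInt, BigDecimal(subtotal))
        ).toOption
      case _ => None
    }

  // Transform and analyze: total revenue per order id.
  def revenuePerOrder(items: Seq[OrderItem]): Map[Int, BigDecimal] =
    items.groupBy(_.orderId).map { case (id, group) => id -> group.map(_.subtotal).sum }

  // Load: format the result as CSV lines, sorted by order id for stable output.
  def toCsvLines(revenue: Map[Int, BigDecimal]): Seq[String] =
    revenue.toSeq.sortBy(_._1).map { case (id, total) => s"$id,$total" }
}
```

For example, `EtlSketch.toCsvLines(EtlSketch.revenuePerOrder(lines.flatMap(EtlSketch.parseOrderItem)))` turns a sequence of raw input lines into output lines ready to be written back to storage.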
I have added a document called 'Project Requirements' that lists the problem statements this project addresses.