This is a big data analytics project that investigates the relationship between companies' input into different internal departments and their global development. Using AWS (EC2/AMI), Jupyter, PySpark, and Spark, the project illustrates a complete on-cloud big data analysis process. Read the full report: Final report

The project demonstrates:
- The practice of the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology.
- The use of Python programming in data mining tasks.
- Experience deploying a series of data analysis scripts on the cloud (advantageous when the data workload is high or when the data pipeline already runs in the cloud).
- The use of the data visualization tool Tableau.
