Below you will find pages that utilize the taxonomy term “apache spark”
Data Analysis using Spark, Pandas, and Matplotlib in Jupyter Notebook for data in S3 (MinIO)
Data analysis aims to understand the problems facing an organization and to explore its data in meaningful ways. Data in itself is merely facts and figures; evaluating it can give the organization an advantage and help inform business decisions.
Brief Overview of the components
Apache Spark is a fast cluster computing engine that extends the MapReduce model popularized by Hadoop to support more types of computation.
Pandas is a software library written in Python for data manipulation and analysis.
Monday, May 13, 2019
By Prashant Shahi