This is a playground for getting familiar with PySpark, the Python API for Apache Spark, an open-source framework for distributed big data processing. The examples largely follow the official documentation.
-
PySpark can be easily installed via pip:
pip install pyspark
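To confirm that the installation worked, a small check like the following can be run. This is only a sketch: the helper name `pyspark_version` is made up for illustration, and it assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library.

```python
import importlib.metadata


def pyspark_version():
    """Return the installed PySpark version string, or None if it is missing."""
    try:
        # Looks up the metadata of the distribution installed by pip.
        return importlib.metadata.version("pyspark")
    except importlib.metadata.PackageNotFoundError:
        # pip has not installed PySpark into this environment.
        return None


version = pyspark_version()
if version is None:
    print("PySpark is not installed in this environment.")
else:
    print(f"PySpark {version} is installed.")
```

Note that PySpark also needs a Java runtime on the machine; this check only covers the Python package.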
-
An interactive shell session can be started with:
pyspark
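Inside the shell, a `SparkSession` is already available under the name `spark`. In a standalone script the session has to be created explicitly. The following is a minimal sketch of that, assuming a local single-threaded run; the function name `squares_demo` and the application name are placeholders chosen for illustration.

```python
def squares_demo(n=3):
    """Build a tiny DataFrame of squares and return its rows as tuples."""
    # Imported lazily so this file can be loaded even where PySpark is absent.
    from pyspark.sql import SparkSession

    # In the interactive shell this object already exists as `spark`.
    spark = (
        SparkSession.builder
        .master("local[1]")  # assumption: run locally with a single thread
        .appName("playground-demo")
        .getOrCreate()
    )
    try:
        # spark.range(n) yields one column named "id" holding 0 .. n-1.
        df = spark.range(n).selectExpr("id", "id * id AS square")
        df.show()
        return [(row["id"], row["square"]) for row in df.collect()]
    finally:
        # Release the local Spark resources when done.
        spark.stop()
```

Calling `squares_demo()` prints the small table and returns its rows. In the shell itself, the body can be tried directly with `spark.range(3).selectExpr("id", "id * id AS square").show()`.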