%md-sandbox
<h2 style="color:red">Instructor Note</h2>
#### The purpose of this notebook:
* Introduce the various Spark entry points, namely `SparkSession`.
* Introduce students to the API documentation.
* Introduce students to the `DataFrameReader` class.
* The payoffs for this notebook include:
  * How to read in data from CSV.
  * Understanding the difference between using **inferSchema** and specifying a schema.
* Regarding `printRecordsPerPartition(..)`, it:
  * converts the specified `DataFrame` to an RDD
  * counts the number of records in each partition
  * prints the results to the console.
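
The three steps listed for `printRecordsPerPartition(..)` can be sketched in PySpark roughly as follows. This is a minimal sketch and not the course's actual helper: the names `count_partition` and `print_records_per_partition` are hypothetical, and it assumes `df` is a Spark `DataFrame`.

```python
def count_partition(index, rows):
    """Yield one (partition index, record count) pair for a single partition."""
    yield index, sum(1 for _ in rows)


def print_records_per_partition(df):
    """Hypothetical stand-in for the course helper printRecordsPerPartition(..)."""
    # Convert the DataFrame to an RDD and count the records in each partition.
    counts = df.rdd.mapPartitionsWithIndex(count_partition).collect()
    # Print the results to the console, one line per partition.
    for index, count in counts:
        print(f"Partition #{index}: {count:,} records")
    return counts
```

Only one small `(index, count)` pair per partition is collected to the driver, so the sketch should stay cheap even on large `DataFrame`s.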
%md
# Reading Data - CSV Files
**Technical Accomplishments:**
- Start working with the API documentation
- Introduce the class `SparkSession` and other entry points
- Introduce the class `DataFrameReader`
- Read data from:
  - CSV without a schema.
  - CSV with a schema.
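
As a preview of the two ways of reading CSV listed above, here is a minimal PySpark sketch. The function names, the file path, and the column names are hypothetical placeholders, not part of the course material. It assumes `spark` is a `SparkSession` (Databricks notebooks provide one) and that the file has a header row.

```python
def read_csv_inferred(spark, path):
    """Read a CSV file and let Spark infer the column types.

    Inference requires an extra pass over the data.
    """
    return (spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv(path))


def read_csv_with_schema(spark, path, schema):
    """Read a CSV file with a user-supplied schema, so no inference pass is needed.

    `schema` may be a StructType or a DDL string such as "id INT, name STRING".
    """
    return (spark.read
            .option("header", "true")
            .schema(schema)
            .csv(path))


def demo():
    """Hypothetical usage; the path and columns below are placeholders."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    inferred = read_csv_inferred(spark, "/path/to/data.csv")
    typed = read_csv_with_schema(spark, "/path/to/data.csv", "id INT, name STRING")
    inferred.printSchema()
    typed.printSchema()
```

Both functions go through `spark.read`, which returns a `DataFrameReader`. The only difference is whether Spark scans the data to guess the types or trusts the schema it is given.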
%md
##  Classroom-Setup<br>
For each lesson to execute correctly, please make sure to run the **`Classroom-Setup`** cell at the start of each lesson (see the next cell) and the **`Classroom-Cleanup`** cell at the end of each lesson.