Class Lectures & Assignments

EMSE 6586 - Database Management for Data Analysts

Course Description (3 credits)

Study and design of database and data management systems for big data and data analytics; design of relational database systems and the SQL query language; NoSQL databases for unstructured data, including key-value, distributed table, graph, and parallel processing databases.

Lecture 1 - Course Overview

Lecture Description

This lecture introduces the courseware and compares databases and database software. It gives a high-level overview of the database landscape, which lets us focus more narrowly on database software for now.

Lecture Note

Lecture 2 - MongoDB

Lecture Description

Work through a set of notebooks that introduce MongoDB. The links below are to the course lectures and labs.

Lecture Note

Lab Assignment - JSON with Python

Assignment 01 Submission

Lecture 3 - MongoDB Part 2

Lecture Description

Continue exploring the use of MongoDB and introduce regular expressions (regex).

Lecture Note

Regex

Lab Assignment - Regex with Python

Lecture 4 - PyMongo

Lecture Description

Introduction to using Python with MongoDB.

Lab Assignment - PyMongo

Lecture 5 - MySQL

Lecture Description

Work through a set of notebooks that introduce MySQL. The links below are to the course lectures and labs.

Lecture Note

MySQL

Lab Assignment

MySQL in Python

MySQL Table Creation

SQL Support Classes

Assignment 02 Submission

Lecture 6 - Arango

Lecture Description

Introduction to ArangoDB and its use. The links below are to the course lectures and labs.

Lecture Note

Lab Assignment - Graph View Class

Lecture 7 - Hadoop

Lecture Description

Introduction to the topic of Hadoop. The links below are to the course lectures and labs.

Lecture Note

Hadoop

Lecture 8 - Spark

Lecture Description

Introduction to Spark and its environment. The links below are to the course lectures and labs.

Lecture Note

Spark

Lab Assignment - PySpark

Lecture 9 - DB Speedrun

Lecture Description

Introduction to the topic of DB Speedrun. The links below are to the course lectures and labs.

Lecture Note

DB Speedrun

Final Project

Project Description

The goal of this project is to restructure a flattened dataset and load it into a SQL database, demonstrating the convenience of storing the dataset in a database and giving end users an easier, more efficient way to search for specific information. The dataset was collected from the Zomato API as .json files (raw data) and stored in the comma-separated values file Zomato.csv. We explored the dataset by visualizing the fetched information to gain a better understanding of it.

We worked with the Zomato dataset hosted on Kaggle (https://www.kaggle.com/shrutimehta/zomato-restaurants-data). Zomato was launched in Delhi 12 years ago and is present in 10000+ cities globally. It is one of the ‘largest food aggregators in the world’, and its mission is to connect people to food (https://www.zomato.com/who-we-are).
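
The restructuring step described above can be illustrated with a small sketch. It is not the project's actual schema or code: the column names (restaurant_id, name, city, cuisines, rating), the sample rows, and the helper functions load_normalized and restaurants_by_cuisine are all hypothetical, and Python's built-in sqlite3 module stands in for the SQL database so the example runs without a server. The idea is that repeated values in the flat file (cities, cuisines) move into their own tables, and a join then answers an end-user search.

```python
import csv
import io
import sqlite3

# Hypothetical sample of a flattened file; the real Zomato.csv has other columns.
FLAT_CSV = """restaurant_id,name,city,cuisines,rating
1,Spice Route,New Delhi,"North Indian, Mughlai",4.3
2,Pasta Corner,New Delhi,"Italian",3.9
3,Green Bowl,Pune,"North Indian, Salad",4.1
"""


def load_normalized(conn, flat_csv):
    """Split flattened rows into city, restaurant, and cuisine tables."""
    conn.executescript("""
        CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE restaurant (
            id INTEGER PRIMARY KEY,
            name TEXT,
            rating REAL,
            city_id INTEGER REFERENCES city(id));
        CREATE TABLE cuisine (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE restaurant_cuisine (
            restaurant_id INTEGER REFERENCES restaurant(id),
            cuisine_id INTEGER REFERENCES cuisine(id),
            PRIMARY KEY (restaurant_id, cuisine_id));
    """)
    for row in csv.DictReader(io.StringIO(flat_csv)):
        # Each distinct city is stored once and referenced by id.
        conn.execute("INSERT OR IGNORE INTO city (name) VALUES (?)", (row["city"],))
        city_id = conn.execute(
            "SELECT id FROM city WHERE name = ?", (row["city"],)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO restaurant (id, name, rating, city_id) VALUES (?, ?, ?, ?)",
            (int(row["restaurant_id"]), row["name"], float(row["rating"]), city_id),
        )
        # The comma-separated cuisines cell becomes rows in a link table.
        for cuisine in (c.strip() for c in row["cuisines"].split(",")):
            if not cuisine:
                continue
            conn.execute("INSERT OR IGNORE INTO cuisine (name) VALUES (?)", (cuisine,))
            cuisine_id = conn.execute(
                "SELECT id FROM cuisine WHERE name = ?", (cuisine,)
            ).fetchone()[0]
            conn.execute(
                "INSERT OR IGNORE INTO restaurant_cuisine VALUES (?, ?)",
                (int(row["restaurant_id"]), cuisine_id),
            )
    conn.commit()


def restaurants_by_cuisine(conn, cuisine):
    """Return restaurant names serving the given cuisine, sorted by name."""
    rows = conn.execute(
        """
        SELECT r.name
        FROM restaurant AS r
        JOIN restaurant_cuisine AS rc ON rc.restaurant_id = r.id
        JOIN cuisine AS c ON c.id = rc.cuisine_id
        WHERE c.name = ?
        ORDER BY r.name
        """,
        (cuisine,),
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
load_normalized(conn, FLAT_CSV)
print(restaurants_by_cuisine(conn, "North Indian"))  # ['Green Bowl', 'Spice Route']
```

With the flat file, the same search would require scanning and splitting the cuisines text in every row; after normalization it is a single indexed join. The same table layout should carry over to MySQL with minor changes to the DDL and the "INSERT OR IGNORE" syntax.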