EMSE 6586 - Database Management for Data Analysts


Course Description (3 credits)

Study and design of database and data management systems for big data and data analytics; design of relational database systems and the SQL query language; NoSQL databases for unstructured data, including key-value stores, distributed tables, graph databases, and parallel processing databases.


Lecture 1 - Course Overview

Lecture Description

This lecture introduces the course and compares database systems and software. It gives a high-level overview of the database landscape, which lets us focus more narrowly on specific database software in the lectures that follow.

Lecture Note

Lecture 2 - MongoDB

Lecture Description

We go through a set of notebooks that introduce MongoDB. The links below are to the course lectures and labs.

Lecture Note
Lab Assignment - JSON with Python

nbviewer

Assignment 01 Submission

nbviewer
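The JSON lab works with the kind of nested documents MongoDB stores (internally as BSON). As a minimal sketch using only Python's standard json module, the snippet below parses a document, reads nested fields, and serializes it back. The restaurant record and its field names are hypothetical illustrations, not data from the lab.

```python
import json

# Hypothetical restaurant document; MongoDB stores documents of this nested shape (as BSON).
raw = '{"name": "Cafe Example", "location": {"city": "Delhi"}, "cuisines": ["Cafe", "Italian"], "rating": 4.2}'

# Parse the JSON text into Python dicts and lists.
doc = json.loads(raw)
city = doc["location"]["city"]      # nested objects become nested dicts
n_cuisines = len(doc["cuisines"])   # JSON arrays become Python lists

# Modify the document and serialize it back to JSON text (keys sorted for stable output).
doc["rating"] = 4.5
out = json.dumps(doc, sort_keys=True)

print(city, n_cuisines)
print(out)
```

Sorting keys on output is optional; it just makes the serialized text reproducible, which helps when comparing documents.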


Lecture 3 - MongoDB Part 2

Lecture Description

We continue exploring MongoDB and introduce regular expressions (regex).

Lecture Note
Lab Assignment - Regex with Python

nbviewer
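Regular expressions appear both in Python, through the standard re module, and in MongoDB queries, through the $regex operator. The sketch below uses hypothetical restaurant names and text to show a case-insensitive prefix match, number extraction, and the equivalent MongoDB filter written as a plain dict (it is not sent to a server here).

```python
import re

# Hypothetical restaurant names used only for illustration.
names = ["Cafe Coffee Day", "cafe delhi heights", "Pizza Hut", "The Cafe"]

# Names that START with "cafe", ignoring case (^ anchors the match at the beginning).
pattern = re.compile(r"^cafe", re.IGNORECASE)
starts_with_cafe = [n for n in names if pattern.search(n)]

# Pull every integer or decimal number out of free text.
text = "Rated 4.3 by 1021 users, average cost 800"
numbers = re.findall(r"\d+(?:\.\d+)?", text)

# The same prefix match expressed as a MongoDB query filter (a plain dict, not executed here).
mongo_filter = {"name": {"$regex": "^cafe", "$options": "i"}}

print(starts_with_cafe)
print(numbers)
```

Anchoring the pattern with ^ matters in MongoDB too: a prefix regex can use an index, while an unanchored one generally has to scan.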


Lecture 4 - PyMongo

Lecture Description

An introduction to using MongoDB from Python with the PyMongo driver.

Lab Assignment - PyMongo

nbviewer


Lecture 5 - MySQL

Lecture Description

We go through a set of notebooks that introduce MySQL. The links below are to the course lectures and labs.

Lecture Note
Lab Assignment
MySQL in Python

nbviewer

MySQL Table Creation

nbviewer

SQL Support Classes

nbviewer

Assignment 02 Submission

nbviewer
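The table-creation and Python labs follow the standard Python DB-API pattern: connect, get a cursor, execute parameterized SQL, commit. To keep the sketch runnable without a MySQL server, it swaps in Python's built-in sqlite3 module. MySQL drivers such as mysql-connector-python follow the same pattern but use %s placeholders instead of ?, and some column types differ. The table and rows are hypothetical.

```python
import sqlite3

# sqlite3 stands in for MySQL so this runs without a server; MySQL drivers follow the same
# DB-API pattern (connect, cursor, execute, commit) but use %s placeholders instead of ?.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table creation: a primary key plus typed columns (hypothetical schema).
cur.execute("""
    CREATE TABLE restaurant (
        restaurant_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        city          TEXT,
        rating        REAL
    )
""")

# Parameterized inserts: values are passed separately, never pasted into the SQL string.
rows = [(1, "Cafe A", "Delhi", 4.1), (2, "Bistro B", "Pune", 3.8), (3, "Grill C", "Delhi", 4.6)]
cur.executemany("INSERT INTO restaurant VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Query: Delhi restaurants ordered by rating, highest first.
cur.execute("SELECT name, rating FROM restaurant WHERE city = ? ORDER BY rating DESC", ("Delhi",))
delhi = cur.fetchall()
print(delhi)
conn.close()
```

Passing values as parameters rather than formatting them into the SQL text is what protects the query from SQL injection, in either database.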


Lecture 6 - Arango

Lecture Description

An introduction to ArangoDB and how to use it. The links below are to the course lectures and labs.

Lecture Note
Lab Assignment - Graph View Class

nbviewer
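ArangoDB stores graph edges as documents whose _from and _to attributes hold the _id values (collection/key) of the connected vertices. The GraphView class below is a hypothetical, in-memory illustration, not the class from the lab: it builds an adjacency list from such edge documents and runs a breadth-first search, a simplified analogue of a depth-limited outbound traversal in AQL. The vertex and edge data are made up.

```python
from collections import defaultdict, deque


class GraphView:
    """Hypothetical helper (not the lab's class): an in-memory view over ArangoDB-style edges."""

    def __init__(self, edges):
        # Each edge is a dict whose "_from" and "_to" hold vertex _id strings ("collection/key").
        self.out = defaultdict(list)
        for e in edges:
            self.out[e["_from"]].append(e["_to"])

    def neighbors(self, vertex_id):
        # Direct outbound neighbors; .get avoids creating empty entries in the defaultdict.
        return list(self.out.get(vertex_id, []))

    def reachable(self, start, max_depth):
        """Breadth-first search: vertices reachable from start in 1..max_depth outbound hops."""
        seen = {start}
        found = []
        queue = deque([(start, 0)])
        while queue:
            v, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand past the depth limit
            for w in self.out.get(v, []):
                if w not in seen:
                    seen.add(w)
                    found.append(w)
                    queue.append((w, depth + 1))
        return found


# Hypothetical edge documents linking users to each other and to restaurants.
edges = [
    {"_from": "users/alice", "_to": "restaurants/cafe_a"},
    {"_from": "users/alice", "_to": "users/bob"},
    {"_from": "users/bob", "_to": "restaurants/grill_c"},
]
g = GraphView(edges)
print(g.neighbors("users/alice"))
print(g.reachable("users/alice", 2))
```

Tracking visited vertices means each vertex is reported once even if several paths reach it; a real traversal can be configured either way.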


Lecture 7 - Hadoop

Lecture Description

An introduction to Hadoop. The link below is to the lecture note.

Lecture Note
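Hadoop's MapReduce model splits a job into a map phase, a shuffle that groups intermediate key-value pairs by key, and a reduce phase. The single-process word-count sketch below only mimics that data flow in plain Python (no HDFS, no cluster, no parallelism); the function names are illustrative, not Hadoop APIs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle/sort: group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


lines = ["big data big ideas", "data beats ideas"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'beats': 1, 'big': 2, 'data': 2, 'ideas': 2}
```

On a real cluster, many mappers run on different blocks of the input in parallel, and the framework routes every pair with the same key to the same reducer.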

Lecture 8 - Spark

Lecture Description

An introduction to the Apache Spark environment. The links below are to the course lectures and labs.

Lecture Note
Lab Assignment - PySpark

nbviewer


Lecture 9 - DB Speedrun

Lecture Description

An introduction to the Apache Spark environment. The link below is to the lecture note.

Lecture Note

Final Project

Project Description

The goal of this project is to restructure a flattened dataset, load it into a SQL database, demonstrate the convenience of having this dataset stored in a database, and give end users an efficient and easier way to search for specific information. The dataset was collected from the Zomato API as .json files (raw data) and stored in the comma-separated values (CSV) file Zomato.csv. We explored the dataset by visualizing the fetched information to gain a better understanding of it.

We worked with the Zomato dataset hosted in the Kaggle community: https://www.kaggle.com/shrutimehta/zomato-restaurants-data. Zomato is a project launched in Delhi 12 years ago and is present in 10000+ cities globally. Zomato is one of the ‘largest food aggregators in the world’, and its mission is to connect people to food (https://www.zomato.com/who-we-are).
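Restructuring a flattened dataset usually means normalizing it: a multi-valued cell such as a comma-separated cuisine list becomes its own table plus a join table, after which end-user searches become plain SQL joins. The sketch below assumes Zomato.csv-style column names, uses three hypothetical rows, and swaps in Python's built-in sqlite3 module for the project's database; it is an illustration, not the project's schema.

```python
import csv
import io
import sqlite3

# Hypothetical flat rows in the style of Zomato.csv; the column names are assumptions.
FLAT_CSV = """Restaurant ID,Restaurant Name,City,Cuisines,Aggregate rating
1,Cafe A,Delhi,"Cafe, Italian",4.1
2,Grill C,Delhi,"North Indian, Cafe",4.6
3,Bistro B,Pune,Italian,3.8
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT, city TEXT, rating REAL);
    CREATE TABLE cuisine (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE restaurant_cuisine (
        restaurant_id INTEGER REFERENCES restaurant(id),
        cuisine_id    INTEGER REFERENCES cuisine(id),
        PRIMARY KEY (restaurant_id, cuisine_id)
    );
""")

for row in csv.DictReader(io.StringIO(FLAT_CSV)):
    rid = int(row["Restaurant ID"])
    conn.execute("INSERT INTO restaurant VALUES (?, ?, ?, ?)",
                 (rid, row["Restaurant Name"], row["City"], float(row["Aggregate rating"])))
    # Split the multi-valued Cuisines cell into one row per cuisine (first normal form).
    for name in (c.strip() for c in row["Cuisines"].split(",")):
        conn.execute("INSERT OR IGNORE INTO cuisine (name) VALUES (?)", (name,))
        cid = conn.execute("SELECT id FROM cuisine WHERE name = ?", (name,)).fetchone()[0]
        conn.execute("INSERT INTO restaurant_cuisine VALUES (?, ?)", (rid, cid))
conn.commit()

# End-user search: every Delhi restaurant serving "Cafe", best rated first.
query = """
    SELECT r.name, r.rating
    FROM restaurant r
    JOIN restaurant_cuisine rc ON rc.restaurant_id = r.id
    JOIN cuisine c ON c.id = rc.cuisine_id
    WHERE c.name = ? AND r.city = ?
    ORDER BY r.rating DESC
"""
result = conn.execute(query, ("Cafe", "Delhi")).fetchall()
print(result)
```

Once cuisines live in their own table, a search like "all Cafe restaurants" is an exact match on an indexed key instead of a substring scan over a comma-separated text column.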

Project Slides

Slides PDF

Exploratory Data Analysis

nbviewer

SQL Database Creation

nbviewer


