Class Lectures & Assignments
EMSE 6586 - Database Management for Data Analysts
Course Description (3 credits)
Study and design of database and data management systems for big data and data analytics; design of relational database systems and the SQL query language; NoSQL databases for unstructured data, including key-value, distributed table, graph databases, parallel processing databases.
Lecture 1 - Course Overview
Lecture Description
This lecture introduces the courseware and compares databases and database software. It gives a high-level explanation of the database environment, which lets us focus more narrowly on database software for now.
Lecture Note
Lecture 2 - MongoDB
Lecture Description
Work through a set of notebooks that introduce MongoDB. The links below are to the course lectures and labs.
Lecture Note
Lab Assignment - JSON with Python
Assignment 01 Submission
Lecture 3 - MongoDB Part 2
Lecture Description
Continue exploring MongoDB and introduce regular expressions (regex).
Lecture Note
Lab Assignment - Regex with Python
Lecture 4 - PyMongo
Lecture Description
This lecture introduces working with MongoDB from Python.
Lab Assignment - PyMongo
Lecture 5 - MySQL
Lecture Description
Work through a set of notebooks that introduce MySQL. The links below are to the course lectures and labs.
Lecture Note
Lab Assignment
MySQL in Python
MySQL Table Creation
SQL Support Classes
Assignment 02 Submission
Lecture 6 - Arango
Lecture Description
Introduction to the ArangoDB database and its use. The links below are to the course lectures and labs.
Lecture Note
Lab Assignment - Graph View Class
Lecture 7 - Hadoop
Lecture Description
Introduction to Hadoop. The links below are to the course lectures and labs.
Lecture Note
Lecture 8 - Spark
Lecture Description
Introduction to the Spark environment. The links below are to the course lectures and labs.
Lecture Note
Lab Assignment - PySpark
Lecture 9 - DB Speedrun
Lecture Description
Introduction to the Spark environment. The links below are to the course lectures and labs.
Lecture Note
Final Project
Project Description
The goal of this project is to restructure a flattened dataset and load it into a SQL database, demonstrate the convenience of storing the dataset in a database, and give end users an easier and more efficient way to search for specific information. The dataset was collected from the Zomato API as .json files (raw data) and stored in the comma-separated values file Zomato.csv. We explored the dataset by visualizing the fetched information to gain a better understanding of it.
We worked with the Zomato dataset hosted in the Kaggle community (https://www.kaggle.com/shrutimehta/zomato-restaurants-data). Zomato was launched in Delhi 12 years ago and is present in 10000+ cities globally. Zomato is one of the ‘largest food aggregators in the world’, and its mission is to connect people to food (https://www.zomato.com/who-we-are).
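The restructuring step described above can be illustrated with a minimal sketch. It is not the project's actual code. The column names (restaurant_id, name, city, country, rating), the sample rows, and the helper functions are hypothetical, and the real Zomato.csv has different and many more columns. Python's built-in sqlite3 module stands in for the SQL database so the example runs without a server. The repeated city and country values in the flat file are moved into their own table, and a join then answers an end-user search.

```python
import csv
import io
import sqlite3

# Hypothetical flattened rows for illustration only; not taken from Zomato.csv.
FLAT_CSV = """restaurant_id,name,city,country,rating
1,Spice Route,New Delhi,India,4.5
2,Noodle Bar,New Delhi,India,3.9
3,Harbour Grill,Sydney,Australia,4.2
"""


def load_normalized(conn, flat_csv_text):
    """Split the flat rows into a city table and a restaurant table."""
    conn.executescript(
        """
        CREATE TABLE city (
            city_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            country TEXT NOT NULL,
            UNIQUE (name, country)
        );
        CREATE TABLE restaurant (
            restaurant_id INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            rating        REAL,
            city_id       INTEGER NOT NULL REFERENCES city(city_id)
        );
        """
    )
    for row in csv.DictReader(io.StringIO(flat_csv_text)):
        # Store each (city, country) pair once; repeats are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO city (name, country) VALUES (?, ?)",
            (row["city"], row["country"]),
        )
        city_id = conn.execute(
            "SELECT city_id FROM city WHERE name = ? AND country = ?",
            (row["city"], row["country"]),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO restaurant (restaurant_id, name, rating, city_id) "
            "VALUES (?, ?, ?, ?)",
            (int(row["restaurant_id"]), row["name"], float(row["rating"]), city_id),
        )
    conn.commit()


def restaurants_in_city(conn, city_name):
    """Return (name, rating) pairs for one city, best rated first."""
    return conn.execute(
        """
        SELECT r.name, r.rating
        FROM restaurant AS r
        JOIN city AS c ON c.city_id = r.city_id
        WHERE c.name = ?
        ORDER BY r.rating DESC
        """,
        (city_name,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_normalized(conn, FLAT_CSV)
top_delhi = restaurants_in_city(conn, "New Delhi")
print(top_delhi)
```

Parameterized queries (the ? placeholders) keep the search safe against malformed input, and the UNIQUE constraint on the city table is what prevents the duplication present in the flat file.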