Class Labs & Assignments
EMSE 6574 - Programming for Analytics
Course Description (3 credits)
Introduction to programming for data analytics using the Python programming language. Prepares students for higher-level courses in data analytics. Recommended background: Some prior experience with programming.
Assignment 1
Assignment Description
- “Palindrome” script: take any string and determine whether it is a palindrome, that is, whether it reads the same backward as forward.
- Create a large dataset of fake (or real) data using a list of dicts as the data structure. Iterate through that list and print each record that matches some condition.
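A minimal sketch of both parts of this assignment, not the submitted solution. It assumes the palindrome check ignores case, spaces, and punctuation. The records, field names, and the `matching` helper are hypothetical, chosen only for illustration.

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forward and backward.

    Assumption: case, spaces, and punctuation are ignored.
    """
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


# Hypothetical fake records stored as a list of dicts.
records = [
    {"name": "Ada", "age": 36, "city": "London"},
    {"name": "Bob", "age": 17, "city": "Boston"},
    {"name": "Cy", "age": 52, "city": "London"},
]


def matching(rows, predicate):
    """Return every record for which predicate(record) is true."""
    return [row for row in rows if predicate(row)]


print(is_palindrome("A man, a plan, a canal: Panama"))
for record in matching(records, lambda r: r["city"] == "London"):
    print(record)
```

Passing the condition as a function keeps the iteration code unchanged when the matching rule changes.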
Assignment 2
Assignment Description
Convert the tree class to a graph class.
- A graph can have as many “child nodes” to a parent node as you want (in a graph we don’t call them “child nodes”; they’re “neighbors”).
- You can have loops: a node can point back to its parents, grandparents, and so on.
- Write a depth-first traversal of a graph starting at any random node. You stop and backtrack when you hit a leaf node or a node you’ve already visited.
- Write a breadth-first traversal of a graph starting at any random node.
- Example: a simple social network. Pick a book or a movie; characters are nodes, and edges mean “friends with”.
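One possible shape for such a graph class, sketched with an adjacency list. The class and method names are hypothetical, and the edges are assumed to be undirected because “friends with” is mutual. The characters A to D are placeholders, not taken from any book or movie.

```python
from collections import deque


class Graph:
    """Undirected graph stored as an adjacency list (node -> neighbors)."""

    def __init__(self):
        self.neighbors = {}

    def add_edge(self, a, b):
        """Connect a and b in both directions, creating nodes as needed."""
        self.neighbors.setdefault(a, [])
        self.neighbors.setdefault(b, [])
        if b not in self.neighbors[a]:
            self.neighbors[a].append(b)
        if a not in self.neighbors[b]:
            self.neighbors[b].append(a)

    def depth_first(self, start):
        """Visit nodes depth first, backtracking at already-visited nodes."""
        visited, order = set(), []

        def visit(node):
            visited.add(node)
            order.append(node)
            for nxt in self.neighbors[node]:
                if nxt not in visited:
                    visit(nxt)

        visit(start)
        return order

    def breadth_first(self, start):
        """Visit nodes level by level using a queue."""
        visited, order = {start}, []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in self.neighbors[node]:
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(nxt)
        return order


network = Graph()
for a, b in [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]:
    network.add_edge(a, b)  # A-B-D-C-A forms a loop

print(network.depth_first("A"))
print(network.breadth_first("A"))
```

The `visited` set is what keeps both traversals from cycling forever once the graph contains loops.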
Assignment 3
Assignment Description
Do analysis on the FEC dataset:
- Go “spelunking” and see what you can find. Can you find some celebrities? Who did they donate to?
- Find some major corporations. Can you learn about their political strategy?
- BONUS POINTS: you wrote a graph data structure in part 1; see if you can fit this data into your data structure.
- Straw donors: that’s when your boss tells you to donate to X and reimburses you.
Assignment 4
Assignment Description
Analyze two different datasets, m_data and w_data, and run analyses on them to find out what the data is telling us. There were two groups: a control group and a test group. The test group was given some additional information about the dataset.
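The description does not say which analysis was run. As one hedged possibility, the sketch below compares a numeric outcome between a control group and a test group using Welch's t-statistic. The function name and the sample values are hypothetical and do not come from m_data or w_data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, test):
    """Welch's t-statistic for the difference in means (test minus control).

    Uses sample variances and does not assume equal variance in both groups.
    """
    standard_error = sqrt(
        variance(control) / len(control) + variance(test) / len(test)
    )
    return (mean(test) - mean(control)) / standard_error


# Hypothetical outcome values for each group.
control_group = [10, 12, 11, 13, 9]
test_group = [14, 15, 13, 16, 17]

print(welch_t(control_group, test_group))
```

A statistic far from zero suggests the two groups differ by more than sampling noise. Turning it into a p-value needs the t distribution, which this sketch leaves out.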
Assignment 5
Assignment Description
We go through the dataset and notebook at https://www.kaggle.com/ash316/ml-from-scratch-with-iris and then try to work through it on our own.
Assignment 6
Assignment Description
Build machine learning regressors that predict diamond prices better than the in-class regressor. The result should have:
- Fewer errors > $2,500
- Fewer or no errors > 10% of price
The dataset to use is the diamond dataset (https://www.kaggle.com/shivam2503/diamonds).
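The two criteria above can be checked with a small helper such as the following sketch. The function name, dictionary keys, and sample prices are hypothetical. It assumes “error” means the absolute difference between the actual and predicted price.

```python
def error_report(actual, predicted, abs_limit=2500.0, pct_limit=0.10):
    """Count predictions that miss by more than each threshold.

    abs_limit: absolute error threshold in dollars.
    pct_limit: error threshold as a fraction of the actual price.
    """
    errors = [(a, abs(a - p)) for a, p in zip(actual, predicted)]
    return {
        "over_abs_limit": sum(1 for _, err in errors if err > abs_limit),
        "over_pct_limit": sum(1 for a, err in errors if err > pct_limit * a),
    }


# Hypothetical prices, not taken from the diamond dataset.
actual_prices = [1000, 5000, 12000]
predicted_prices = [1050, 8000, 12500]

print(error_report(actual_prices, predicted_prices))
```

Running the same report on the in-class regressor and on a candidate model gives a direct comparison on both counts.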
Assignment 7
Assignment Description
Apply random forest classifier/regressor on a dataset we found online.
Assignment 8
Assignment Description
Code a genetic algorithm to optimize a particular problem. My problem of choice is the traveling salesman problem: visit every city in a list over the shortest possible total distance without revisiting any city.
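A compact sketch of a genetic algorithm for this problem, not the submitted solution. It assumes cities are 2-D points, the route is an open path (no return to the start), and fitness is total Euclidean distance. The function names, parameter defaults, and coordinates are hypothetical.

```python
import math
import random


def path_length(path, cities):
    """Total distance of visiting cities in the given index order."""
    return sum(math.dist(cities[a], cities[b]) for a, b in zip(path, path[1:]))


def ordered_crossover(p1, p2, rng):
    """Copy a slice from p1 and fill the rest in the order found in p2."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[i:j + 1]
    rest = [city for city in p2 if city not in middle]
    return rest[:i] + middle + rest[i:]


def mutate(path, rng, rate=0.2):
    """With probability rate, swap two cities in a copy of the path."""
    path = path[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(path)), 2)
        path[i], path[j] = path[j], path[i]
    return path


def genetic_tsp(cities, pop_size=60, generations=200, seed=0):
    """Evolve permutations of city indices toward a short path."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda p: path_length(p, cities))
        parents = population[: pop_size // 2]  # keep the fitter half
        children = [
            mutate(ordered_crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=lambda p: path_length(p, cities))


# Hypothetical city coordinates (tuples of x, y).
cities = [(0, 0), (3, 4), (6, 1), (2, 7), (8, 8), (5, 5)]
best = genetic_tsp(cities)
print(best, path_length(best, cities))
```

Ordered crossover is used because every child must remain a valid permutation, which is what guarantees no city is visited twice.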
Assignment 9
Assignment Description
Pick a stock and download its historical price data (I chose Tesla). Resample the data to weekly and monthly prices, determine whether there is any seasonality, and train a SARIMA model on it to try to predict future prices.
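The sketch below covers only the seasonality check and does not fit a SARIMA model. It computes the sample autocorrelation at a chosen lag, which is one simple way to look for a repeating pattern. The function name is hypothetical and the series is synthetic, not Tesla price data.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of series at the given positive lag."""
    center = mean(series)
    numerator = sum(
        (series[i] - center) * (series[i + lag] - center)
        for i in range(len(series) - lag)
    )
    denominator = sum((x - center) ** 2 for x in series)
    return numerator / denominator


# Synthetic series that repeats every 4 steps.
series = [1, 2, 3, 4] * 6

for lag in (2, 4):
    print(lag, autocorrelation(series, lag))
```

A clear peak at one lag compared with its neighbors hints at a seasonal period worth trying as the seasonal order of a SARIMA model.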
Assignment 10
Assignment Description
Find a source of text and create a bag-of-words representation. Build a simple sentiment analyzer from scratch without using any sentiment packages.
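A minimal sketch of a bag-of-words representation and a lexicon-based sentiment score using only the standard library. The word lists, function names, and example sentences are hypothetical. A real analyzer would need a much larger lexicon or labels learned from data.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "boring"}


def bag_of_words(text):
    """Map each lowercase word in text to the number of times it occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def sentiment_score(text):
    """Positive word count minus negative word count."""
    bag = bag_of_words(text)
    positive = sum(count for word, count in bag.items() if word in POSITIVE)
    negative = sum(count for word, count in bag.items() if word in NEGATIVE)
    return positive - negative


print(bag_of_words("Good, good movie!"))
print(sentiment_score("I love it, great cast"))
print(sentiment_score("Boring and awful"))
```

A score above zero reads as positive and below zero as negative. Word order and negation (“not good”) are ignored, which is the main limitation of a pure bag-of-words approach.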
Assignment 11
Assignment Description
Find a source of text, process it, and use k-means to generate a topic map.
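As a rough sketch of the idea, the code below turns documents into word-count vectors, clusters them with a plain k-means loop, and lists the heaviest terms of each centroid as a crude topic label. All names and sample documents are hypothetical. Real text would also need steps such as stop-word removal, which are omitted here.

```python
import math
import random
from collections import Counter


def vectorize(docs):
    """Return the sorted vocabulary and one word-count vector per document."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster vectors into k groups; return labels and centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(vocab, centroid, n=2):
    """Words with the largest centroid weights, as a rough topic label."""
    ranked = sorted(zip(centroid, vocab), reverse=True)
    return [word for _, word in ranked[:n]]


# Hypothetical documents covering two topics.
docs = ["cat dog cat", "dog cat dog", "stock market stock", "market stock market"]
vocab, vectors = vectorize(docs)
labels, centroids = kmeans(vectors, k=2)
for c, centroid in enumerate(centroids):
    print(c, top_terms(vocab, centroid))
```

The result depends on the random initial centroids, so fixing the seed keeps runs repeatable.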
Assignment 12
Assignment Description
Find a source of text and implement a query/search engine from scratch.
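One way to approach this is an inverted index with TF-IDF-style scoring, sketched below. The class name, scoring formula, and sample documents are assumptions made for illustration, not the approach required by the assignment.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchEngine:
    """Tiny search engine backed by an inverted index."""

    def __init__(self, documents):
        self.documents = documents
        self.index = defaultdict(dict)  # term -> {doc_id: term count}
        for doc_id, text in enumerate(documents):
            for term, count in Counter(tokenize(text)).items():
                self.index[term][doc_id] = count

    def search(self, query, top_n=5):
        """Return up to top_n document ids, ranked by summed TF-IDF score."""
        scores = Counter()
        for term in tokenize(query):
            postings = self.index.get(term, {})
            if not postings:
                continue
            # Rarer terms get a larger weight.
            idf = math.log(len(self.documents) / len(postings)) + 1.0
            for doc_id, count in postings.items():
                scores[doc_id] += count * idf
        return [doc_id for doc_id, _ in scores.most_common(top_n)]


# Hypothetical documents.
engine = SearchEngine([
    "the quick brown fox",
    "the lazy dog",
    "fox and dog play",
])
print(engine.search("lazy dog"))
```

The index maps each term to the documents containing it, so a query only touches documents that share at least one term with it.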
Assignment 13
Assignment Description
Get some data and do something interesting with NetworkX.
Final Project
Assignment Description
For this class project, we create two models that predict the type of cuisine and the number of calories from a list of ingredients. With these two models, we then create a web application so people can play around with them. The app was created using Streamlit and hosted with Streamlit Sharing. To see the app, click the Streamlit badge below.