Class Labs & Assignments

EMSE 6574 - Programming for Analytics

Course Description (3 credits)

Introduction to programming for data analytics using the Python programming language. Prepares students for higher-level courses in data analytics. Recommended background: Some prior experience with programming.

Assignment 1

Assignment Description

“Palindrome” script: take any string and determine whether it is a palindrome, that is, whether it reads the same backward as forward. Create a large dataset of fake (or real) data using a list of dicts as the data structure. Iterate through that list; if a record matches some condition, print it.
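A minimal sketch of both parts, under stated assumptions: the palindrome check ignores case, spaces, and punctuation (the assignment does not say whether it should), and the record fields (id, name, age) and the helper names make_records and matching are hypothetical, chosen only for illustration.

```python
import random


def is_palindrome(text: str) -> bool:
    """Return True if text reads the same backward as forward.

    Assumption: case, spaces, and punctuation are ignored.
    """
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


def make_records(n: int, seed: int = 0) -> list:
    """Build n fake records as a list of dicts (hypothetical fields)."""
    rng = random.Random(seed)
    names = ["Ana", "Bob", "Cleo", "Dev"]
    return [
        {"id": i, "name": rng.choice(names), "age": rng.randint(18, 90)}
        for i in range(n)
    ]


def matching(records, condition):
    """Yield every record for which condition(record) is true."""
    for record in records:
        if condition(record):
            yield record


if __name__ == "__main__":
    print(is_palindrome("A man, a plan, a canal: Panama"))
    # Print every record that matches an example condition.
    for record in matching(make_records(1000), lambda r: r["age"] > 88):
        print(record)
```

Passing the condition as a function keeps the loop reusable for any filter.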

Assignment 2

Assignment Description

Convert the tree class to a graph class. A graph can have as many “child nodes” to a parent node as you want (in a graph we don’t call them “child nodes”; they’re “neighbors”). You can have loops: a node can point back to its parents, grandparents, and so on. Write a depth-first traversal of a graph starting at any random node; you stop and backtrack when you hit a leaf node or a node you’ve already visited. Write a breadth-first traversal of a graph starting at any random node. Example: a simple social network. Pick a book or a movie; characters are nodes, and edges mean “friends with”.
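One possible shape for such a graph class is sketched below. The original tree class is not shown here, so the class layout, the method names, and the choice of undirected edges (on the assumption that “friends with” goes both ways) are all illustrative, and the node labels in the usage note are placeholders.

```python
from collections import deque


class Graph:
    """Undirected graph stored as an adjacency list (a sketch)."""

    def __init__(self):
        self.neighbors = {}  # node -> list of neighboring nodes

    def add_edge(self, a, b):
        """Add an edge in both directions, skipping duplicates."""
        self.neighbors.setdefault(a, [])
        self.neighbors.setdefault(b, [])
        if b not in self.neighbors[a]:
            self.neighbors[a].append(b)
        if a not in self.neighbors[b]:
            self.neighbors[b].append(a)

    def depth_first(self, start):
        """Go as deep as possible, backtracking at visited nodes."""
        visited, order = set(), []

        def visit(node):
            visited.add(node)
            order.append(node)
            for nxt in self.neighbors[node]:
                if nxt not in visited:
                    visit(nxt)

        visit(start)
        return order

    def breadth_first(self, start):
        """Visit nodes level by level using a FIFO queue."""
        visited, order = {start}, []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in self.neighbors[node]:
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(nxt)
        return order
```

With edges A-B, A-C, B-D, and C-D (which form a loop), depth_first("A") returns A, B, D, C, while breadth_first("A") returns A, B, C, D. The visited set is what keeps either traversal from cycling forever. The recursive depth-first version can hit Python's recursion limit on very deep graphs; an explicit stack would avoid that.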

Assignment 3

Assignment Description

Do analysis on the FEC dataset. Go “spelunking” and see what you can find. Can you find some celebrities? Who did they donate to? Find some major corporations; can you learn about their political strategy? Bonus points: you wrote a graph data structure in part 1, so see if you can fit this data into your data structure. Straw donors: that’s when your boss tells you to donate to X and reimburses you.

Assignment 4

Assignment Description

Analyze two different datasets, m_data and w_data, and determine what the data is telling us. There were two groups: a control group and a test group. The test group was told some additional information about the dataset.

Assignment 5

Assignment Description

Go through the dataset and notebook at https://www.kaggle.com/ash316/ml-from-scratch-with-iris and try to work through it on our own.

Assignment 6

Assignment Description

Train machine learning regressors to predict diamond prices better than the in-class regressor. The new model should have:

Fewer errors > $2,500
Fewer or no errors > 10% of price

The dataset to use is the diamond dataset (https://www.kaggle.com/shivam2503/diamonds).
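The two criteria above can be checked with a small helper like the sketch below. The function name error_report and its parameters are hypothetical, and the in-class regressor itself is not reproduced here; the helper only counts errors for whatever predictions it is given, so it can be run on both models and the counts compared.

```python
def error_report(actual, predicted, abs_limit=2500.0, pct_limit=0.10):
    """Count predictions whose absolute error exceeds each limit.

    abs_limit is in dollars; pct_limit is a fraction of the true price.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    over_abs = 0
    over_pct = 0
    for price, guess in zip(actual, predicted):
        error = abs(price - guess)
        if error > abs_limit:
            over_abs += 1
        if error > pct_limit * price:
            over_pct += 1
    return {"over_abs": over_abs, "over_pct": over_pct}


# Made-up prices, for illustration only.
report = error_report([1000, 5000, 20000], [1050, 8000, 21000])
print(report)  # {'over_abs': 1, 'over_pct': 1}
```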

Assignment 7

Assignment Description

Apply random forest classifier/regressor on a dataset we found online.

Assignment 8

Assignment Description

Code a genetic algorithm to optimize a particular problem. My problem of choice is the traveling salesman problem - visit all cities in a list in the shortest distance possible without revisiting already visited cities.
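A compact sketch of one way to set this up, not the submitted solution. The operators used here (elitist selection of the best quarter, ordered crossover, swap mutation), the parameter defaults, and all function names are assumptions made for illustration. A tour is a permutation of city indices, so no city is revisited, and the tour length is measured as a closed loop back to the start.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour (returns to the starting city)."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def ordered_crossover(p1, p2, rng):
    """Keep a slice of p1 in place; fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    kept = p1[i:j + 1]
    kept_set = set(kept)
    rest = [city for city in p2 if city not in kept_set]
    return rest[:i] + kept + rest[i:]


def mutate(tour, rng, rate):
    """With probability rate, swap two cities in place."""
    if rng.random() < rate:
        a, b = rng.sample(range(len(tour)), 2)
        tour[a], tour[b] = tour[b], tour[a]


def solve_tsp(cities, pop_size=60, generations=200, rate=0.2, seed=0):
    """Evolve a population of tours and return the shortest one found."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: tour_length(t, cities))
        elite = population[: pop_size // 4]  # keep the best quarter
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = ordered_crossover(p1, p2, rng)
            mutate(child, rng, rate)
            children.append(child)
        population = elite + children
    return min(population, key=lambda t: tour_length(t, cities))


if __name__ == "__main__":
    rng = random.Random(1)
    cities = [(rng.random(), rng.random()) for _ in range(12)]
    best = solve_tsp(cities)
    print(best, round(tour_length(best, cities), 3))
```

Ordered crossover is used because a naive one-point crossover of two permutations usually duplicates some cities and drops others. Because the elite carries over unchanged, the best tour never gets worse from one generation to the next, but the result is a heuristic and is not guaranteed to be optimal.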

Assignment 9

Assignment Description

Pick a stock and download its historical price data. The stock I chose is Tesla. Resample the data to weekly and monthly prices, determine whether there is any seasonality, and train a SARIMA model on it to try to predict future prices.

Assignment 10

Assignment Description

Find a source of text and create a bag-of-words representation. Build a simple sentiment analyzer from scratch without using any sentiment packages.
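A minimal sketch of the idea, assuming a lexicon-based scorer: the tiny POSITIVE and NEGATIVE word sets below are hypothetical placeholders, not the vocabulary used in the assignment, and a real analyzer would need a much larger list or labels learned from data.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "boring"}


def bag_of_words(text):
    """Map each lowercase word in text to the number of times it occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    bag = bag_of_words(text)
    score = sum(bag[w] for w in POSITIVE) - sum(bag[w] for w in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(bag_of_words("Good, good movie"))
print(sentiment("I love this great film"))  # positive
print(sentiment("Boring and awful"))        # negative
```

A bag of words discards word order, so this scorer cannot handle negation such as “not good”.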

Assignment 11

Assignment Description

Find a source of text, process it, and use k-means to generate a topic map.

Assignment 12

Assignment Description

Find a source of text and implement a query/search engine from scratch.
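One simple way to approach this is an inverted index that maps each term to the documents containing it, with results ranked by how often the query terms occur. The sketch below assumes that design; the function names and the sample documents are made up for illustration, and the scoring is raw term counts, not TF-IDF.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: number of occurrences in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Return doc ids ranked by total count of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]


# Made-up documents, for illustration only.
docs = {
    "a": "the cat sat on the mat",
    "b": "the dog chased the cat and the other cat",
    "c": "birds sing",
}
index = build_index(docs)
print(search(index, "cat"))  # ['b', 'a']
```

Matching is exact, so a query for “bird” does not find “birds”; stemming and stop-word removal would be natural next steps.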

Assignment 13

Assignment Description

Get some data and do something interesting with NetworkX.

Final Project

Assignment Description

For this class project, we create two models that predict the type of cuisine and the number of calories from a list of ingredients. With these two models, we then create a web application so people can play around with them. The app was created using Streamlit and hosted with Streamlit Sharing. To see the app, click the Streamlit badge below.