Although all of my academic work was experimental, it always included a computational component. At first I was tracking and analyzing colloidal particle trajectories across tens of thousands of images; later I was solving a system of differential equations to determine the interfacial tension from images of an oil-water interface. Fully immersing myself in data analysis seemed like the obvious next step, so I took an immersive, project-based course. I then built on my initial projects to make them more comprehensive while learning new techniques along the way. In this section, I describe the thought process and inspiration behind some of my work. Accompanying notebooks are available on my GitHub.
I received a take-home assessment after applying for an analytics position with a baseball team. Anyone who knows me knows this would be something of a dream job, since I have played softball basically my whole life. I dove right into the assignment, which came with the stipulation of spending only 1-2 hours on it. I did try to stay within that limit, but since the subject matter was one of those magical marriages between personal and professional interests, I continued on afterwards to practice 1) scaling and sampling techniques and 2) creating useful visualizations of the results.
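As a minimal sketch of what I mean by scaling and sampling (the features and values here are invented for illustration, not from the actual assessment), standardizing features and drawing a random train/test split might look like:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical player metrics: rows are players, columns are features
# (say, exit velocity and launch angle) -- made up for this sketch.
X = rng.normal(loc=[88.0, 12.0], scale=[5.0, 8.0], size=(100, 2))

# Standard scaling: zero mean, unit variance per feature.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Random sampling into an 80/20 train/test split.
idx = rng.permutation(len(X_scaled))
split = int(0.8 * len(X_scaled))
X_train, X_test = X_scaled[idx[:split]], X_scaled[idx[split:]]
```

Shuffling indices rather than rows keeps the split reproducible and avoids copying the data twice.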
In an attempt to data science my way to solutions to the world’s problems, I tried to predict results from professional men’s tennis. More specifically, I used existing rankings and statistics to determine whether a given player would make the round of 16 (R16) of a Grand Slam tournament. This wasn’t even for gambling purposes - I’m too cheap to gamble - but rather for the motivation of seeing my favorite players in person. Ticket prices increase with each round and go on sale before the tournament starts, so it’s a gamble to spring for pricey later-round seats without knowing who the players will be. If I knew with greater certainty that my favorites would still be playing, perhaps I’d be willing to spend more.
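A naive baseline for this kind of prediction (the data below is invented for illustration, not from my actual dataset) is to predict that a player reaches the R16 whenever their ranking is inside the top 16, then measure how often that rule holds:

```python
# Toy data: (ranking, reached_R16) pairs -- invented for this sketch.
players = [
    (1, True), (3, True), (7, False), (12, True),
    (20, False), (35, True), (50, False), (80, False),
]

# Baseline rule: predict R16 for anyone ranked in the top 16.
predictions = [rank <= 16 for rank, _ in players]
actual = [reached for _, reached in players]

# Fraction of players where the rule matched what actually happened.
accuracy = sum(p == a for p, a in zip(predictions, actual)) / len(players)
```

Any model built from richer statistics has to beat this kind of rankings-only baseline to be worth the ticket money.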
The second project was one of our own choosing. I (finally) settled on trying to model the rental rates of Airbnb listings in Toronto using the details of each rental unit. As someone who frequently uses Airbnb, I am quite familiar with the website and its offerings. Choosing the location was easy - having spent the last six years in Toronto, I know the city well, which in theory would come in handy during the analysis.
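A first pass at modeling rates from listing details might be an ordinary least-squares fit; the features below (bedrooms, distance from downtown) and their values are stand-ins I invented, not the actual Toronto dataset:

```python
import numpy as np

# Hypothetical listings: [bedrooms, km from downtown] -> nightly rate.
# All values invented for illustration.
X = np.array([[1, 2.0], [2, 1.0], [3, 5.0], [1, 8.0], [2, 3.5]])
y = np.array([110.0, 180.0, 200.0, 75.0, 150.0])

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predicted rate for a hypothetical 2-bedroom unit 4 km from downtown.
pred = np.array([1.0, 2.0, 4.0]) @ coef
```

In practice the listing details are mostly categorical (neighbourhood, room type, amenities), so the real work is encoding those into a design matrix before any fit like this one.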
The first project at Metis involved using Python and Pandas to examine the turnstile data that the Metropolitan Transportation Authority makes available online. Through some introductory data cleaning and analysis, we were expected to determine MTA ridership at different stations and over different time periods.
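The core of that analysis was turning cumulative turnstile counters into ridership counts. A sketch with a made-up miniature of the turnstile format (the real files have many more columns and readings every few hours):

```python
import pandas as pd

# Invented stand-in for the MTA turnstile format: each row is a
# cumulative entry count for one turnstile at one point in time.
df = pd.DataFrame({
    "STATION": ["59 ST", "59 ST", "59 ST", "CANAL ST", "CANAL ST", "CANAL ST"],
    "UNIT": ["R051", "R051", "R051", "R062", "R062", "R062"],
    "ENTRIES": [1000, 1450, 2100, 500, 620, 900],
})

# Counters are cumulative, so ridership per interval is the difference
# between consecutive readings of the same turnstile.
df["RIDERS"] = df.groupby(["STATION", "UNIT"])["ENTRIES"].diff()

# Total ridership by station over the period.
totals = df.groupby("STATION")["RIDERS"].sum()
```

Grouping before taking the difference matters: a plain `diff()` across the whole frame would subtract one station's counter from another's at every boundary.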