Hey guys, in this meeting we will start our data analysis on datasets from previous or ongoing competitions. After Datafest, I thought this would be a good opportunity for the club to practice and see real-world datasets. Data competitions are a great way for beginners to benchmark themselves and learn from others, and they are also a great source of reputation if you are able to win. They help you rack up achievements that you can put in your portfolio. If that’s not enough to incentivize you, data competitions can award their participants free swag and prizes ranging anywhere from $500 to $1,000,000. Your chances of winning are sometimes better than you think, but either way this is a good experience for expanding your career and knowledge base.

Wrap up the Movie Dataset: You can continue working on the movie dataset, but if you want to showcase your work, please email us back with your .Rmd file.

Schedule tentative plans: We’re Recruiting Officers! Please talk to one of the current officers or email us if you have any interest in helping out!

Talk about Datafest: Today we’re going to talk about the PROS and CONS of Datafest. First, I would like to say thank you to those of you who stuck around for the entirety of Datafest. Participating in this event is a great honor for the club and for the school we represent. For many of our members, I can imagine how busy and stressful the weekend was. Working on an extremely large dataset can also be difficult. There are multiple ways to get around this issue, but I would suggest that having a good grasp of data manipulation will take you further as a Data Scientist. In any case, we will discuss our experience further in today’s club meeting.


Our main focus this week is to explore the Home Depot Product Search Relevance Dataset:

(1) Home Depot Dataset: https://www.kaggle.com/c/home-depot-product-search-relevance/data

If you need help getting started, try answering these questions in your data analysis:

  • Can you predict which products are going to be relevant to the customer’s searches?
  • Can you perform an A/B testing analysis with the data?
  • Can you write an algorithm that offers relevant product ratings or recommendations?
  • Can you create a business analysis for how Home Depot can improve their site searches?
  • Can you compare keywords or draw outside information about search optimizations?
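
If you are not sure where to begin with the first question, here is a minimal sketch of one possible starting point: scoring how many words of a search term also show up in a product title. It is written in Python with no extra packages, and the same idea carries over to R. The function names (tokenize, overlap_score) and the example rows are made up for illustration and are not taken from the Home Depot files. On the real data you would likely apply something like this to the search term and product title columns (assumed here to be named search_term and product_title) and compare the score against the relevance rating.

```python
# Toy sketch: score how well a product title matches a search term by word overlap.
# The rows below are invented for illustration; they are NOT from the Home Depot data.

def tokenize(text):
    """Lowercase the text and split it into a set of whitespace-separated words."""
    return set(text.lower().split())

def overlap_score(search_term, product_title):
    """Fraction of search-term words that also appear in the product title (0.0 to 1.0)."""
    query_words = tokenize(search_term)
    if not query_words:
        # An empty search term matches nothing, so avoid dividing by zero.
        return 0.0
    return len(query_words & tokenize(product_title)) / len(query_words)

# Hypothetical (search term, product title) pairs.
example_rows = [
    ("angle bracket", "Steel Angle Bracket"),
    ("led light bulb", "LED Bulb 60W"),
    ("deck paint", "Cordless Drill"),
]

for search_term, product_title in example_rows:
    score = overlap_score(search_term, product_title)
    print(search_term, "|", product_title, "|", round(score, 2))
```

A score like this is only a rough baseline. It ignores misspellings, plurals, and units, which is exactly where better data manipulation can improve your predictions.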

Optional Datasets:

(2) [EASY] Blood Donation Dataset: https://www.drivendata.org/competitions/2/warm-up-predict-blood-donations/

(3) [MEDIUM] Classifying Crisis Reports: https://www.datasciencechallenge.org/challenges/2/growing-instability/ *Available for Prize Money

(4) [HARD] Yelp Dataset Challenge: https://www.yelp.com/dataset_challenge *Available for Prize Money