Welcome to our project for the NTU course SC1015 Introduction to Data Science and Artificial Intelligence!

In this project, we explore how a user can become more attractive on a dating app.

Presentation Video

Content

All code is located under the src folder.

Please read through the code in the following sequence:

Problem Formulation

How can a user become more attractive on a dating app?

Data preparation

Data Exploration

3972 responses

43 features

All female

Integer and boolean features were taken as the focus of the primary exploration

Understanding the Data

Integer Data

Central Tendency of frequency
Spread of frequency

Mean Median Q25 Q50 Skew
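The statistics listed above can be computed for one integer column roughly as follows. This is a minimal sketch using only the Python standard library; the function name `summarize` and the sample values are hypothetical and not taken from the dataset. The skew shown is the population (Fisher-Pearson) coefficient, which may differ slightly from the bias-adjusted value that a library such as pandas reports.

```python
import statistics


def summarize(values):
    """Return mean, median, Q25, Q50 and skew for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Quartile cut points; "inclusive" treats the data as the full population.
    q25, q50, _q75 = statistics.quantiles(values, n=4, method="inclusive")
    # Population skewness: third central moment over variance ** 1.5.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "mean": mean,
        "median": statistics.median(values),
        "q25": q25,
        "q50": q50,
        "skew": skew,
    }


# Hypothetical values for one integer feature (not taken from the dataset).
sample = [1, 2, 2, 3, 4, 10]
summary = summarize(sample)
print(summary)
```

A positive skew in this example reflects the long right tail created by the single large value.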

Boolean Data

Feature engineering
Spread of frequency

Mean Median Q25 Q50 Skew

Quantile-based discretization

Helps in capturing the inherent variability within the data

Reduces noise and focuses on broader trends rather than individual data points
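To illustrate quantile-based discretization, the sketch below assigns each value to one of four roughly equal-sized bins. It uses only the standard library; pandas users would typically reach for `pandas.qcut` instead. The function name `quantile_bins` and the sample counts are hypothetical, and the choice of four bins is an assumption.

```python
import bisect
import statistics


def quantile_bins(values, n_bins=4):
    """Assign each value a bin index 0..n_bins-1 using quantile cut points."""
    # n_bins - 1 cut points that split the data into roughly equal-sized groups.
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # bisect_left puts a value equal to a cut point in the lower bin.
    return [bisect.bisect_left(cuts, v) for v in values]


# Hypothetical counts for illustration only.
counts = [0, 1, 2, 3, 5, 8, 13, 40]
bins = quantile_bins(counts)
print(bins)
```

Note how the outlier 40 lands in the same top bin as 13: binning by rank rather than by raw value is what dampens the influence of extreme data points. When many values are tied, the bins can end up unequal in size.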

Feature engineering

Convert individual boolean indicators into a more informative ordinal scale

Simplifies the input for modeling and may reveal patterns more effectively
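One possible way to build such an ordinal scale is to count how many related boolean flags are set for each user. The sketch below assumes that counting is an acceptable encoding; the flag names, the records, and the helper `interest_level` are all hypothetical and only illustrate the idea.

```python
def interest_level(row, flags):
    """Count how many of the given boolean flags are set in one record."""
    return sum(1 for name in flags if row.get(name))


# Hypothetical flag names and records, used only for illustration.
FLAGS = ["wants_chat", "wants_friends", "wants_date"]
users = [
    {"wants_chat": True, "wants_friends": False, "wants_date": False},
    {"wants_chat": True, "wants_friends": True, "wants_date": True},
    {"wants_chat": False, "wants_friends": False, "wants_date": False},
]
levels = [interest_level(u, FLAGS) for u in users]
print(levels)
```

The resulting score ranges from 0 (no flags set) to the number of flags, so a model sees one ordered feature instead of several separate indicators.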

Machine Learning

LinearRegression

Explore which integer variables have an effect on counts_kisses

Explore which integer variables have a stronger correlation with counts_kisses
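The comparison described above can be sketched as a one-variable least-squares fit per feature, ranked by Pearson's r. This is not the notebook code: it uses plain Python rather than a library regression class, and the feature names and numbers are made up for illustration.

```python
import math


def fit_line(x, y):
    """Least-squares fit y ~ slope * x + intercept, plus Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical feature columns and target; the numbers are made up.
kisses = [3, 5, 7, 9, 11]
features = {
    "feature_a": [1, 2, 3, 4, 5],  # perfectly linear in kisses
    "feature_b": [2, 1, 4, 3, 5],  # noisier relationship
}
# Rank features by the absolute strength of their correlation with the target.
ranking = sorted(
    ((name, fit_line(col, kisses)[2]) for name, col in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
print(ranking)
```

Ranking by the absolute value of r treats strong negative relationships as just as informative as strong positive ones.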

Decision Tree

Explore the relationship between boolean variables and counts_kisses

Explore which boolean variables have a stronger relationship with counts_kisses
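A classification tree chooses each split by how much it reduces impurity, so the impurity reduction of a boolean feature gives a rough idea of how useful that feature is. The sketch below computes the Gini gain of a single split. It assumes a binary target derived from counts_kisses (many versus few kisses) and the Gini criterion; both are assumptions, and the data is hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(flags, labels):
    """Impurity reduction from splitting the labels on a boolean flag."""
    left = [y for f, y in zip(flags, labels) if f]
    right = [y for f, y in zip(flags, labels) if not f]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical data: 1 = many kisses, 0 = few kisses.
high_kisses = [1, 1, 0, 0]
informative = [True, True, False, False]
uninformative = [True, False, True, False]
print(gini_gain(informative, high_kisses), gini_gain(uninformative, high_kisses))
```

A flag that separates the two classes perfectly removes all impurity, while a flag unrelated to the target yields a gain of zero.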

Chi-square test

Explore the association between boolean values
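For reference, the Pearson chi-square statistic for two categorical columns can be computed as in the sketch below. It uses only the standard library, applies no continuity correction, and does not compute a p-value; the function name `chi_square` and the sample flags are hypothetical.

```python
from collections import Counter


def chi_square(a, b):
    """Pearson chi-square statistic and degrees of freedom for two categoricals."""
    n = len(a)
    observed = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            # Expected count if the two variables were independent.
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical flags: 60 users, two boolean attributes.
flag_a = [False] * 30 + [True] * 30
flag_b = [False] * 10 + [True] * 20 + [False] * 20 + [True] * 10
stat, dof = chi_square(flag_a, flag_b)
print(round(stat, 3), dof)
```

With 1 degree of freedom, a statistic above roughly 3.84 indicates an association that is significant at the 5% level.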

Conclusion

Insights

Distance seems to play a significant role in user engagement, as suggested by the chi-square test results between distance_category and kisses_category.

The levels of expressed flirtatious interest are strongly associated with the likelihood of receiving more 'kisses', an indicator of attractiveness on the platform.

The logistic regression analysis highlighted the importance of specific categories within flirt interest and distance_category, quantifying their unique impacts on the likelihood of higher kisses_category.

The use of decision trees demonstrated the importance of counts_kisses as a feature, and how different variables interact with it to affect a user's perceived attractiveness.

Sub-problem

Which variable most effectively indicates the attractiveness of a user on dating apps?

counts_kisses

What variables demonstrate a strong correlation with the key indicator of attractiveness?

Profile_visits, Distance category, Flirt_interest

Improvements

Group Members

| Name | Email | Contribution |
| --- | --- | --- |
| Wang Yanjie | WANG2037@e.ntu.edu.sg | Machine Learning, Conclusion, Slides and Script |
| Dai Shiyu | dais0013@e.ntu.edu.sg | Motivation, Problem formulation, Data preparation, Slides and Script |

Reference

Various resources were used to help us gain a better understanding of the project and the various machine learning methods.

  1. DataSet from Kaggle
  2. DataSet from Kaggle
  3. Learning Materials from Nanyang Technological University
    1. Helped us gain a basic understanding of machine learning.
    2. Lab classes guided us to start using Jupyter Notebook.
  4. The Elements of Statistical Learning
    1. Helped us dive deeper into the theory behind support vector machines.
  5. ChatGPT
    1. Helped us understand the code.
    2. Helped us debug code when it was not working properly.