Data Scientist

Data Science & Research Bloomington, Indiana, United States

Summary

Predicted a customer confidence score ranging from 0 to 1, where 1 is the highest probability of a lead converting to a customer, in order to target those leads for follow-up.
Imported market data, collected from the website and various ad agencies and stored in a SQL Server database, into Python.
Cleaned the data and handled missing values with a random forest algorithm and other techniques.
Developed features (feature engineering) from the processed data by creating various user-defined functions. Created calculated feature columns and trained the model with mini-batching.
Used an oversampling technique to upsample low-occurrence target values so that the model learned them better.
Created train and test data using a stratified split so that both sets kept the same proportion of high- and low-occurrence target values. Trained random forest, neural network, and logistic regression models on the train data and compared their performance on the test data.
Used k-fold cross-validation on the train data to reduce variance and predict better results on the test data.
Explored grid search cross-validation to tune hyperparameters and find the best parameters. Tested various performance metrics, including accuracy, F1-score, precision, and recall, and used recall to evaluate performance.
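
The stratified split, oversampling, and recall steps described above can be illustrated with a minimal standard-library sketch. It is not the code used on the project: the function names, the toy labels, and the 20% test fraction are all assumptions chosen for illustration, and a real pipeline would more likely rely on a library such as scikit-learn.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split row indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        # Keep at least one row of every class in the test set.
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def oversample_minority(indices, labels, seed=0):
    """Randomly duplicate rarer-class indices until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        # Sampling with replacement; k is 0 for the largest class.
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy labels: 10 converted leads (1) among 100 rows.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
balanced_train = oversample_minority(train_idx, labels)
```

In this sketch only the training indices are oversampled, so the test set keeps the original class balance and the recall measured on it reflects how rarely leads actually convert.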