This repository contains a portfolio of data science projects that I completed for academic, self-learning, and hobby purposes. The projects are presented as IPython notebooks.
Note: The data used in the projects (available in the data directory) is for demonstration purposes only.
- Predicting Boston Housing Prices: A model to predict the value of a given house in the Boston real estate market using various statistical analysis tools. The model is used to identify the best price at which a client can sell their house.
- Supervised Learning: Finding Donors for CharityML: Testing several supervised learning algorithms to build a model that accurately predicts whether an individual makes more than $50,000, in order to identify likely donors for a fictional non-profit organisation.
- Unsupervised Learning: Creating Customer Segments: Analyzing a dataset of customers' annual spending amounts (reported in monetary units) across diverse product categories to discover internal structure, patterns, and knowledge.
Tools: scikit-learn, Pandas, Seaborn, Matplotlib
- 3-way Sentiment Analysis for Tweets: A 3-way polarity (positive, negative, neutral) classification system for tweets, built without using NLTK's sentiment analysis engine.
- Cross language Information Retrieval: A cross-language information retrieval (CLIR) system which, given a query in German, searches text documents written in English.
- Amazon Fine Food Reviews Analysis: Given a review, determine whether it is positive (rating of 4 or 5) or negative (rating of 1 or 2).
Tools: NLTK, scikit-learn
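As a rough illustration of the labelling rule in the Amazon Fine Food Reviews project, the sketch below maps star ratings to sentiment labels. The function name `label_review` is hypothetical, and treating a rating of 3 as "no label" is an assumption; the notebook may handle neutral reviews differently.

```python
from typing import Optional


def label_review(rating: int) -> Optional[str]:
    """Map a 1-5 star rating to a binary sentiment label.

    Ratings of 4 or 5 are positive; ratings of 1 or 2 are negative.
    A rating of 3 returns None (assumption: neutral reviews are
    left out of the binary classification task).
    """
    if rating in (4, 5):
        return "positive"
    if rating in (1, 2):
        return "negative"
    return None


# Made-up ratings, for illustration only.
ratings = [5, 1, 3, 4, 2]
labels = [label_review(r) for r in ratings]
```

Reviews that come back as `None` can simply be filtered out before training a classifier.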
- Python
- Titanic Dataset - Exploratory Analysis: Exploratory analysis of the passengers on board the RMS Titanic using Pandas and Seaborn visualisations.
- Stock Market Analysis for Tech Stocks: Analysis of technology stocks including change in price over time, daily returns, and stock behaviour prediction.
- 2016 US General Election Poll Data Analysis: Very simple analysis of 2016 US General Election Poll data.
- 911 Calls - Exploratory Analysis: Exploratory Data Analysis of the 911 calls dataset hosted on Kaggle. Demonstrates extraction of useful features from different variables.
Tools: Pandas, Folium, Seaborn and Matplotlib
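For a flavour of the daily-returns step in the stock market analysis, here is a minimal sketch using pandas. The prices and dates are invented for illustration and are not real market data; the notebook itself may compute returns differently.

```python
import pandas as pd

# Hypothetical closing prices, for illustration only (not real market data).
close = pd.Series(
    [100.0, 102.0, 99.96, 104.958],
    index=pd.date_range("2016-01-04", periods=4, freq="D"),
    name="close",
)

# Daily return: fractional change from the previous day's close.
# The first entry is NaN because it has no previous day to compare against.
daily_returns = close.pct_change()
```

Here `daily_returns` holds fractions rather than percentages, so a value of 0.02 means a 2% gain on the previous close.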
- Python
- ML with Logistic Regression: Using Logistic Regression to predict whether an internet user clicked an ad or not.
- ML with K Nearest Neighbours: Using KNN to classify instances from a fake dataset into two target classes, while choosing the best value for K using the elbow method.
- ML with Decision Trees and Random Forests: Using Decision Trees and Random Forests to predict whether a borrower will pay their loan back. Uses publicly available data from LendingClub.com.
- Machine Learning with Support Vector Machines and Parameter Tuning: A micro-project that classifies flowers from the famous Iris data set into different categories.
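The elbow method mentioned in the K Nearest Neighbours project can be sketched roughly as follows. This is a minimal example on synthetic data from scikit-learn's `make_classification`, not the notebook's dataset, and the range of K values is an arbitrary choice. Picking the K with the lowest test error is a simple stand-in for reading the elbow off a plot of error rate against K.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data standing in for the notebook's dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit one classifier per candidate K and record its test error rate.
k_values = list(range(1, 31))
error_rates = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    error_rates.append(1.0 - knn.score(X_test, y_test))

# Simplification: take the K with the lowest error. In practice the curve is
# usually plotted and K is chosen where the error stops dropping noticeably.
best_k = k_values[error_rates.index(min(error_rates))]
```

Plotting `error_rates` against `k_values` with Matplotlib gives the usual elbow curve.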
If you liked what you saw and want to chat with me about the portfolio, work opportunities, or collaboration, send an email to sanidsourav76@gmail.com.