Rahul Sangole
http://rsangole.netlify.com/
Recent content on Rahul Sangole
Thu, 12 Apr 2018 00:00:00 +0000

Performance Benchmarking for Date-Time conversions
http://rsangole.netlify.com/post/performance-benchmarking-for-date-time-conversions/
Thu, 12 Apr 2018 00:00:00 +0000
Contents: Motivation; Performance comparison; Packages compared; Results.
Motivation: Once more, there was an opportunity at work to optimize code and reduce run-time. The last time, it was for dummy-variable creation. When we query large data from our Hive tables, the returned dataframe contains values of class character, so everything has to be type-converted before any processing can be done.
The most time-consuming of these conversions has been character to date-time, for which I traditionally used base::as.…

Yet Another Titanic Solve
http://rsangole.netlify.com/project/yet-another-titanic-solve/
Fri, 30 Mar 2018 00:00:00 +0000
Contents: Objectives; Read in the dataset; Train-Test Split; Missing values analysis; EDA; Target Variable; Predictor Variables; Univariate & Bivariate; Multivariate Analyses; Data Preparation; Missing Values Imputation; Derived Variables; Final Data Review; Modeling; Extreme Gradient Boosting; Elastic Net; k-NN; SVM; C5.0; Averaged Neural Networks; Conditional Inference Random Forests; Compare models; Test Set Evaluation; Create test set; Predict test results; Kaggle Performance.
tl;dr: Another Titanic solve. Uses caret and tidyverse for everything.

Books I Reference
http://rsangole.netlify.com/project/books-i-reference/
Tue, 13 Feb 2018 00:00:00 +0000
The full list of the books on my shelf is on my Goodreads account. The ones I refer to the most are listed here:
Deep Learning
- Deep Learning with R, Francois Chollet
- Handbook of Neural Computing Applications, Alianna J Maren
- Deep Learning, Ian Goodfellow
- LSTM with Python, Jason Brownlee
GLM
- Generalized Additive Models: An Introduction with R, Second Edition, Simon Wood
- Applied Regression Modeling, Iain Pardoe
- Generalized Linear Models, John P.…

First foray into Shiny
http://rsangole.netlify.com/post/first-foray-into-shiny/
Sat, 27 Jan 2018 00:00:00 +0000
Contents: Visualising Distributions; Visualising Linear Discriminant Analysis.
Shiny had interested me for a while because of its power to quickly communicate and visualise data and models, but I hadn't had the time to delve into it until now.
Here are two quick visualizations I created as my first foray into R Shiny. Nothing earth-shattering, but they were a helpful way to learn the tool.
Visualising Distributions is hosted for free on shinyapps.io, and its code is on GitHub.

Performance Benchmarking for Dummy Variable Creation
http://rsangole.netlify.com/post/dummy-variables-one-hot-encoding/
Wed, 27 Sep 2017 00:00:00 +0000
Contents: Motivation; Why do we need dummy variables?; Ways to create dummy variables in R; stats package; dummies package; dummy package; caret package; Performance comparison; Smaller datasets; Large datasets; Conclusion; Qs.
Motivation: Very recently, at work, we got into a discussion about creating dummy variables in R. We were dealing with a fairly large dataset of roughly 500,000 observations and roughly 120 predictor variables. Almost all of them were categorical, many with a fairly large number of factor levels (think 20-100).

Pur(r)ify Your Carets
http://rsangole.netlify.com/post/pur-r-ify-your-carets/
Sun, 17 Sep 2017 00:00:00 +0000
Contents: The motivation; An example using BostonHousing data; Load libs & data; Create a starter dataframe; Select the models; Create data-model combinations; Solve the models; Extract results; In conclusion.
tl;dr: You'll learn how to use purrr, caret and list-cols to quickly create hundreds of dataset + model combinations, store the data and model objects neatly in one tibble, and post-process them programmatically. These tools enable succinct functional programming in which a lot gets done with just a few lines of code.

Finite Mixture Modeling using Flexmix
http://rsangole.netlify.com/post/finite-mixture-modeling-using-flexmix/
Wed, 01 Feb 2017 00:00:00 +0000
Contents: Model Based Clustering; Quick EDA; Model building; Mixtures of Regressions; Quick EDA; Model Building; Results; Further investigation; Notes; References.
This page replicates the code written by Grün & Leisch (2007) in 'FlexMix: An R package for finite mixture modelling', University of Wollongong, Australia. My intent was to learn the flexmix package by replicating the authors' results.
Model Based Clustering: the first section applies model-based clustering to the whiskey dataset.

Factor Analysis of Personality Traits
http://rsangole.netlify.com/project/factor-analysis-of-personality-traits/
Sat, 03 Sep 2016 00:00:00 +0000
Contents: Background; Objective; Duplication of the Survey Results; What does the factor analysis tell us?; Conclusion; How many factors to select?; Where's the R code?
Background: In the course Predict-410: Linear Regression & Multivariate Analyses, taught by the excellent Prof Srinivasan, we learned Factor Analysis (FA). FA is a technique for identifying 'latent' (hidden) factors common to a larger pool of observable, measurable variables. These factors are taken to be what drives the measured variables to behave the way they do.
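For reference, the textbook common factor model behind this idea can be written as follows. The notation is a standard formulation assumed here for illustration. It is not taken from the post or the course.

$$
\mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}
$$

Here x holds the p measured variables and f holds the m < p latent factors, which are assumed to be uncorrelated with unit variance. Λ is the p × m matrix of factor loadings, and ε is the variable-specific error, uncorrelated with f and with diagonal covariance Ψ. The covariance identity follows because Cov(Λf) = Λ I Λᵀ. The loadings in Λ therefore indicate how strongly each latent factor drives each measured variable.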