Showing posts from June, 2018

K Means with ScikitLearn

K Means Clustering K-Means is a way of identifying clusters in a given data set. Conceptually, the process works as follows: to identify K clusters in the data set, we start by picking K separate points in the data space. Treating these K points as the centroids, we identify the K clusters: any point in the data space belongs to the cluster defined by its closest centroid. Next, we compute new centroids from these clusters, and then identify the clusters again. If we repeat this iteratively, we should ideally reach a state where the K centroids barely move from one iteration to the next. The clusters are thus identified using the K-Means algorithm. Of course, there are several important factors here. How do you choose K? If K is very large, we might end up with a cluster for every data point, which defeats the purpose of clustering. On the other hand, if we use K = 1, we have one big cluster for the entire data set, which again defeats the purpose. Ide…
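The assign-then-recompute loop described above can be sketched in plain Python. This is only an illustration of the idea, not the ScikitLearn implementation: the function name `k_means`, the helper names, and the parameters `max_iter`, `tol` and `seed` are all assumptions made for this example, and the starting centroids are simply K points sampled from the data.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, tol=1e-9, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not a library API)."""
    rng = random.Random(seed)
    # Start with K separate points picked from the data itself.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Stop once no centroid moved by more than the tolerance.
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, clusters


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(data, k=2)
print(centroids)
```

The loop stops either when the centroids stop moving or after `max_iter` rounds, whichever comes first. On data less cleanly separated than this toy set, the result can depend on which starting points were picked.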

Decision Tree Optimization

How to Optimize Decision Tree? Two important factors drive the efficiency of a Decision Tree implementation: ensuring the split happens on the right features, and avoiding overfitting. The essential concept of a decision tree is splitting on the appropriate features at appropriate thresholds, so identifying these features and thresholds is very important. Overfitting, of course, is a major problem in almost any machine learning scenario. Researchers have identified several ways around these problems. Below is an introduction to the most important ones. Optimize the Split The first step in preparing a Decision Tree is to identify the right set of features and splits. Below are some of the important techniques to help you with that. Gini Index This works with a categorical target variable and performs only binary splits. The Gini Index is based on the concept of Purity of Population. A Population is pure if the probability of two random samples belonging to the same class is 1. You …
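The purity idea can be made concrete with a small sketch. Since the excerpt is cut off before any formula, the score used here is an assumption: the sum of squared class proportions, which is the probability that two samples drawn with replacement belong to the same class. Many references report one minus this value as the Gini impurity instead. The function names `gini_score` and `split_gini` are hypothetical.

```python
from collections import Counter


def gini_score(labels):
    """Probability that two samples drawn with replacement share a class.

    Returns 1.0 for a pure population; lower values mean more mixing.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return sum((count / n) ** 2 for count in counts.values())


def split_gini(left, right):
    """Size-weighted Gini score of a binary split into two child nodes."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_score(left) + (len(right) / n) * gini_score(right)


# Compare two candidate binary splits of the same four samples.
clean_split = split_gini(["yes", "yes"], ["no", "no"])
mixed_split = split_gini(["yes", "no"], ["yes", "no"])
print(clean_split, mixed_split)
```

Under this convention, a candidate split with a higher weighted score produces purer child nodes, so it would be preferred.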

Core concepts of Azure

Core concepts of Google Cloud

Core concepts of AWS

Apache DB Utils

Introduction The Apache Commons DbUtils library is a utility library designed to simplify JDBC, reduce resource leaks, and produce cleaner code. Because JDBC resource cleanup is tedious and error-prone, the DbUtils classes abstract away the boilerplate code so that developers can focus on database operations only. Advantages of using DbUtils:
Minimal and functional: it does what is normally required, without many frills.
Transparent: the DbUtils library does not do much work behind the scenes. It simply takes a query and executes it.
Fast: the DbUtils classes do not create many background objects and are quite fast at executing database operations.
No resource leakage: the DbUtils classes ensure that no resource leakage happens.
Clean and clear code: the DbUtils classes let you write database operations without any cleanup or resource-leak-prevention code.
Bean mapping: the DbUtils classes support automatically populating jav…