Showing posts from May, 2018

R - File IO

Introduction

R is built around data, so it provides functions for the various aspects of data processing. Reading from an input file is a very important part of that, and R provides functions for reading data from and writing data to various file formats.

File Dump

R allows you to dump data into a file. Such a file can be read back only in R:

> df = mtcars
> save(file = "file.out", compress = T, list = c("df"))

This saves the contents of df into file.out. The same can be loaded back from the file using the load function:

> load(file = "file.out")
> head(df, 3)
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

Note the parameter compress = T. Naturally, this results in a compressed output file. If you inspect the generated file, it is an illegible binary file. You also have the option of ascii = T, that generates a file with…

Recurrent Neural Networks

Convolutional Neural Networks

Why Convolution?

Even a tiny 64x64 RGB image implies 64 x 64 x 3 = 12288 input values. A modest neural network analyzing such an image might have at least 1% of that, around 122 neurons, in its first layer. That alone means 12288 x 122 ≈ 1.5 million weights in the first layer of the network, and many more in the following layers. We would need a massive data set to avoid overfitting, and a huge amount of processing just for tiny 64x64 images. Any meaningful application would require decent images of at least 1024x1024 pixels, where this fully connected approach goes beyond all scales. Clearly, we need some improvement in the process.

If we think about the way vision works, an object in one corner of the image has little to do with the opposite corner of the image. Objects and edges are limited to a small area around themselves. Then why process the entire image at the same time? A CNN processes the image in small parts at a time. The Convolution Ne…
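The parameter counts above can be checked with a quick back-of-the-envelope calculation. The convolutional figures below assume an illustrative configuration (32 filters of 5x5) that is not from the original post:

```python
# Fully connected: every input value feeds every first-layer neuron.
inputs = 64 * 64 * 3           # 12288 input values for a 64x64 RGB image
neurons = 122                  # roughly 1% of the inputs
dense_weights = inputs * neurons
print(dense_weights)           # 1499136 -- about 1.5 million weights

# Convolutional: a small filter is slid across the image, so its weights
# are shared by every position. Assuming 32 filters of 5x5 over 3 channels:
filters, fh, fw, channels = 32, 5, 5, 3
conv_weights = filters * fh * fw * channels
print(conv_weights)            # 2400 weights, independent of image size
```

The key point: the convolutional count does not grow with the image size, while the fully connected count explodes.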

Introduction to Chatbots

Introduction to Natural Language Processing

Introduction to Keras

Introduction to TensorFlow

What is TensorFlow?

TensorFlow is an open-source software library from Google, meant for dataflow programming across a range of tasks. It is a symbolic math library, and is largely used for machine learning applications such as neural networks. It was originally developed by the Google Brain team for internal use at Google. As the AI research community grew more and more collaborative, TensorFlow was released under the Apache 2.0 open-source license. TensorFlow and its companion Keras are widely used for implementing deep learning algorithms.

Like most machine learning libraries, TensorFlow is "concept-heavy and code-lite". The syntax is not very difficult to learn, but the concepts behind it are very important. By design, TensorFlow is based on lazy execution (though we can force eager execution). That means it does not actually process the data as soon as it is available; it just gathers all the information that we feed into it, and processes it only when we finally ask it to process…
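The idea of lazy execution can be illustrated in plain Python, without TensorFlow itself. This is a toy sketch of the concept, not TensorFlow's actual API: each node only records what to do, and nothing is computed until we explicitly ask for the result:

```python
# Toy illustration of lazy (deferred) execution -- NOT TensorFlow's API.
class Node:
    def __init__(self, fn, *inputs):
        self.fn = fn          # the operation to perform
        self.inputs = inputs  # upstream nodes this node depends on

    def run(self):
        # Only now are the inputs evaluated, depth-first.
        return self.fn(*(n.run() for n in self.inputs))

def constant(value):
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: x + y, a, b)

def mul(a, b):
    return Node(lambda x, y: x * y, a, b)

# Building the graph performs no arithmetic at all...
graph = mul(add(constant(2), constant(3)), constant(4))

# ...the computation happens only when we finally ask for it.
print(graph.run())  # 20
```

TensorFlow 1.x follows the same pattern on a much larger scale: you first build a computation graph, then run it.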

Introduction to Pandas

Working with Pandas

Machine learning requires huge amounts of data, and it requires an efficient way of processing that data. Pandas helps us with the latter. It provides efficient data structures like Series, DataFrame and Panel for processing one-, two- and three-dimensional data. It provides a good chunk of methods for manipulating the data in these structures, and has good functionality for statistical processing of this data. It provides for indexing, selecting, grouping and filtering the data by columns and values - virtually everything that one would want to do while processing data. Pandas is also extensible, with built-in capabilities that allow us to add more functionality. It works very well with its cousins - NumPy and TensorFlow. All this makes it the library of choice for data handling. Let's now look into the important concepts in using Pandas: Series, DataFrame, Panel, Common Functions, Statistics, Data Handling, File IO. These blogs can serve as an…
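As a taste of the operations described above, here is a minimal sketch using made-up sample data (the city names and numbers are illustrative only). Note that Panel has since been deprecated in pandas in favor of MultiIndex DataFrames:

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a two-dimensional labeled table.
df = pd.DataFrame({"city": ["Pune", "Delhi", "Mumbai"],
                   "population": [3.1, 16.8, 12.4]})

# Indexing, filtering and simple statistics -- the everyday operations.
big = df[df["population"] > 10]       # filter rows by value
print(s["b"])                         # 20
print(big["city"].tolist())           # ['Delhi', 'Mumbai']
print(df["population"].mean())        # mean of the population column
```

The same select/filter/aggregate style carries through all of the concepts listed above.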

Introduction to SciKit Learn

The SciKit Learn library

The SciKit Learn library has a chunk of ready implementations of most basic machine learning algorithms. Most machine learning libraries are based on the principle of "concept-heavy and code-lite". Once we understand the concepts well, the syntax of implementation is quite simple. SciKit Learn offers ready configurable classes for most of the algorithms. We just instantiate a model, "fit" it to the training data and then verify it with the test data. All this can be achieved in just a couple of lines of code.

Implementation

It provides for most of the common scenarios and algorithms. Below are some typical usages of the common ones: Nearest Neighbors, Logistic Regression, Decision Trees, Random Forests, Support Vector Machines, K Means, Affinity Propagation, Mean Shift, Ward's Method, Neural Networks. This should be enough to give you a flavor of what you can expect in SciKit Learn. Of course, it provides a lot more than this.

Reference

The API Reference and…
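The instantiate-fit-predict pattern described above looks the same for most of the algorithms listed. A minimal sketch with Nearest Neighbors, using made-up toy data:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: 1-D points with two well-separated classes.
X_train = [[0], [1], [2], [10], [11], [12]]
y_train = [0, 0, 0, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)  # instantiate, with configuration
model.fit(X_train, y_train)                  # fit the model to training data

# Verify with held-out points near each cluster.
print(model.predict([[1.5], [10.5]]))        # [0 1]
```

Swapping in LogisticRegression, DecisionTreeClassifier or SVC changes only the class name and its configuration parameters; fit and predict stay the same.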