Introduction to Data Analytics

Introduction

Big data analytics is quickly gaining adoption. Enterprises have awakened to the reality that their big data stores represent a largely untapped gold mine that could help them lower costs, increase revenue and become more competitive. They don't just want to store their vast quantities of data; they want to convert that data into valuable insights that can help improve their companies. Clearly, the trend toward big data analytics is here to stay. IT professionals need to familiarize themselves with the topic if they want to remain relevant within their companies.

What is Data?

Before we try to understand data analytics, it is important to understand what data is and why the world is chasing it, whether structured or unstructured, big or small. Data is essentially a chunk of information generated from a given set of sources; it is a track record of events, their inputs and outputs. Such data is as valuable as the benefit one can obtain by processing and analyzing it. For example, every internet purchase generates information about the buyer, the seller and the purchase. Such information, if processed and analyzed properly, can provide valuable insights into the buyer’s choices, the seller’s capabilities and the popularity of the object sold. Ever since businesses understood this value, they have been investing in and developing more and more efficient ways of capturing, storing, processing and analyzing such data.

Structured Data

Information is unstructured by default. Any event contains a huge amount of information of various kinds. It is impossible to capture all of it, let alone store, process and analyze it. Traditional data analysis followed the basic approach of picking just what is necessary and storing it in a way that makes it easy to retrieve, process and analyze.
This gave rise to traditional relational databases. Relational databases had predefined tables, with predefined columns and predefined relations between records in different tables. Information picked from any recorded event was put into this predefined format and stored in the database. Such data was easy to store, access and process to obtain results in predefined formats. The main disadvantage was that much of the available information was never picked up.
For example, a telephone call can generate a huge amount of information. But a billing system would record just the phone numbers, the billing plan used and the duration of the call. It would pick only what was necessary and generate the bill with just the required result: the amount due. It was easy to pick, store and process such data. The limitation was that it had very little value beyond the bill amount. At most, some analysis of the data served as an input to the network team for predicting network traffic.
Structured data may be generated by humans or machines, as long as the data is created within the templates and follows the structure defined by the tables. This format is eminently searchable, both with human-generated queries and via algorithms, using field names and data types such as alphabetical or numeric, currency or date.
Common relational database applications with structured data include airline reservation systems, inventory control, sales transactions, and ATM activity. Structured Query Language (SQL) enables queries on this type of structured data within relational databases. Even today, a huge number of software applications continue to use structured data.
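The billing example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, column names, sample calls and per-minute rates are all made up, and a real billing system would be far more involved.

```python
import sqlite3

# In-memory relational database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE calls (
        caller       TEXT    NOT NULL,
        callee       TEXT    NOT NULL,
        plan         TEXT    NOT NULL,
        duration_sec INTEGER NOT NULL
    )
    """
)

# Made-up call records: only the predefined fields are stored,
# everything else about the call is discarded.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [
        ("555-0100", "555-0199", "basic", 120),
        ("555-0100", "555-0142", "basic", 300),
        ("555-0123", "555-0100", "premium", 60),
    ],
)

# Assumed price per minute for each plan, in cents.
RATE_CENTS_PER_MIN = {"basic": 10, "premium": 5}


def bill_cents(caller):
    """Return the amount due for one caller, in whole cents."""
    rows = conn.execute(
        "SELECT plan, SUM(duration_sec) FROM calls "
        "WHERE caller = ? GROUP BY plan",
        (caller,),
    ).fetchall()
    return sum(RATE_CENTS_PER_MIN[plan] * secs // 60 for plan, secs in rows)


print(bill_cents("555-0100"))
```

Because every record follows the same predefined columns, a single SQL query is enough to produce the bill. The same rigidity is why nothing beyond the bill can be learned from this table.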

Unstructured Data

As people realized what they were missing out on, research on processing and storing unstructured data gained momentum. Even today, we are at a nascent stage in this. But industries have started implementing such solutions and have started generating revenue from them.
What has changed today is the awareness of the importance of data, the storage capacity available, and the algorithms for processing such unstructured data. The essential difference between structured and unstructured data is the phase in which the data is captured. Raw data is always unstructured. But due to limitations in processing, storage and sensing capabilities, we were forced to extract just what we needed and forget the rest.
In essence, structured data was extracted out of unstructured data after some amount of processing, and this processed structured data was stored for analysis. Today, the order has changed. Raw data is now pumped into storage with very little initial processing, in the hope (or knowledge) that it will be useful. This is possible because we now have ways of storing such volumes of data, and we have enough processing power and sufficiently mature algorithms to make sense of such data at query time.
Each such data unit has its own internal structure, but that structure is not fixed by predefined data models or schemas. The data may be textual or non-textual, and generated by humans or machines. Such data can be stored in non-relational (NoSQL) databases.
The most inclusive Big Data analysis makes use of both structured and unstructured data.
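The idea of storing raw records first and making sense of them only at query time can be sketched as follows. The event records and field names here are invented for illustration, and plain JSON strings stand in for whatever a real non-relational store would hold.

```python
import json

# Made-up raw events, stored as-is. Each record has its own internal
# structure; no common schema is enforced when the data is written.
raw_events = [
    '{"type": "purchase", "buyer": "alice", "item": "book", "price": 12.5}',
    '{"type": "review", "user": "bob", "text": "Great book, fast delivery"}',
    '{"type": "purchase", "buyer": "carol", "item": "lamp", "price": 30.0, '
    '"coupon": "SPRING"}',
]

events = [json.loads(line) for line in raw_events]


def total_revenue(events):
    """Interpret the records at query time: sum prices of purchases only."""
    return sum(e.get("price", 0.0) for e in events if e.get("type") == "purchase")


def fields_seen(events):
    """List every field name that appears in at least one record."""
    return sorted({key for e in events for key in e})


print(total_revenue(events))
print(fields_seen(events))
```

Nothing is thrown away at capture time, so a later question (for example, how often coupons are used) can still be answered from the same records.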

Big Data

This refers to digital information that has the 3 V's: high volume, velocity and variety. Big data analytics refers to the process of identifying trends, patterns, correlations or other useful insights in such data using various software tools. Data analytics is not a new concept; businesses have been analyzing their own data for decades. But the software tools used for analysis have improved so much in capability and performance that they can handle a much wider variety and much larger volumes of data, at a much higher velocity. This is partly because of improved algorithms and partly because of improved performance of the underlying hardware.

Analysis of Data

Structured data is traditionally easier for Big Data applications to digest, yet today's data analytics solutions are making great strides with unstructured data as well. New tools are available to analyze unstructured data, particularly given specific use case parameters. Most of these tools are based on machine learning. Structured data analytics can use machine learning as well, but the massive volume and many different types of unstructured data require it.
Until a few years ago, analysts used keywords and key phrases to search through unstructured data and get a decent idea of what the data involved. But unstructured data has grown so fast and so large that users need to employ analytics that not only work at compute speeds, but also automatically learn from their activity and user decisions. Natural Language Processing (NLP), pattern sensing and classification, and text-mining algorithms are all common examples, as are document relevance analytics, sentiment analysis, and filter-driven Web harvesting.
Unstructured data analytics with machine-learning intelligence allows organizations to analyze digital communications for compliance. Pattern recognition and email threading analysis software searches massive amounts of email and chat data for potential noncompliance. Such software can also track high-volume customer conversations in social media. Text analytics and sentiment analysis let analysts review positive and negative results of marketing campaigns, or even identify online threats.
This level of analytics is far more sophisticated than simple keyword search, which can only report basics like how many Facebook posts mentioned the name of a given company. New analytics also include context: Was the mention positive or negative? Were the posters reacting to each other, or were they independent posts? What was the tone of reactions to announcements?
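The gap between counting mentions and judging their tone can be shown with a toy example. The word lists, the posts and the company name "Acme" are all invented, and a fixed word list is only a crude stand-in for the machine-learning sentiment models described above.

```python
import re

# Tiny hand-made sentiment lexicon; real tools learn this from data.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "broken", "angry"}

# Made-up social media posts about a hypothetical company.
posts = [
    "I love the new Acme phone, great battery",
    "My Acme phone arrived broken, terrible support",
    "Thinking about buying an Acme laptop",
]


def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def mentions(posts, name):
    """Keyword search: count the posts that mention the name at all."""
    return sum(1 for p in posts if name.lower() in tokens(p))


def sentiment(text):
    """Label a post by comparing counts of positive and negative words."""
    words = tokens(text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(mentions(posts, "Acme"))
print([sentiment(p) for p in posts])
```

The keyword count treats all three posts the same, while even this crude scoring separates the happy customer from the unhappy one.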
The analytics help understand the market pulse. AI analytics tools work quickly on massive numbers of documents to analyze customer behavior. For example, a magazine publisher can apply text mining to hundreds of thousands of articles, analyzing each separate publication by the popularity of major subtopics. Then they can extend analytics across all their content properties to see which overall topics got the most attention by customer demographic. Such analysis would not have been possible on structured data alone, which would have missed most of the information available.

Types of Analytics

There is no dearth of information in the data available to us. There are four major types of data analytics, each focused on extracting a different kind of information from the available data.

Descriptive Analytics

This tells us what happened. It helps create simple reports and visualisations that help you understand what happened at a given point in time or over a period of time. It is the least advanced type in terms of algorithms.
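A minimal sketch of descriptive analytics is a summary report over past figures. The daily sales numbers below are made up for illustration.

```python
from statistics import mean, median

# Made-up daily sales figures for one week.
daily_sales = [120, 135, 128, 160, 152, 90, 85]


def describe(values):
    """Summarize what happened over the period."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


report = describe(daily_sales)
for name, value in report.items():
    print(f"{name}: {value}")
```

The report only restates the past; it says nothing about why the last two days were weak or what next week will look like.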

Diagnostic Analytics

This helps explain why something happened. More advanced than descriptive analytics, it allows analysts to dive deep into the data and determine root causes for a given situation.
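One simple form of this drill-down is to break an overall change into segments and see which one is responsible. The regions and sales figures below are invented, and real root-cause analysis usually needs many more dimensions than this sketch uses.

```python
# Made-up weekly sales per region.
last_week = {"north": 500, "south": 480, "east": 510}
this_week = {"north": 495, "south": 300, "east": 505}


def change_by_segment(before, after):
    """Compute the change in each segment between two periods."""
    return {segment: after[segment] - before[segment] for segment in before}


def biggest_drop(before, after):
    """Return the segment with the largest decline."""
    changes = change_by_segment(before, after)
    return min(changes, key=changes.get)


changes = change_by_segment(last_week, this_week)
print(changes)
print(biggest_drop(last_week, this_week))
```

Here almost all of the overall decline comes from one region, which tells the analyst where to look next.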

Predictive Analytics

Among the most popular types of big data analytics available today, predictive analytics uses highly advanced algorithms to forecast what might happen next. It is based on various artificial intelligence and machine learning technologies.
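As a deliberately simple stand-in for those advanced algorithms, the sketch below fits a straight line to past values by ordinary least squares and extends it one step ahead. The monthly sales figures are made up, and a real predictive model would rarely be this simple.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)

# Forecast for month 6 by extending the fitted line.
forecast = slope * 6 + intercept
print(forecast)
```

The forecast is only as good as the assumption that the past trend continues, which is exactly the assumption more sophisticated models try to relax.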

Prescriptive Analytics

This goes a step beyond predictive analytics. After predicting what is going to happen, prescriptive analytics tells how the outcome can be enhanced or avoided (depending upon the desired results). Of course this requires very advanced machine learning capabilities, and few solutions on the market today offer true prescriptive capabilities.
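In its simplest form, the step from prediction to prescription means evaluating a prediction for each possible action and recommending the best one. The demand model and candidate prices below are entirely hypothetical and serve only to show the shape of the idea.

```python
def predicted_units(price):
    """Hypothetical demand model: each extra unit of price loses 10 sales."""
    return max(0, 200 - 10 * price)


def predicted_revenue(price):
    """Predicted outcome of choosing this price."""
    return price * predicted_units(price)


def recommend_price(candidates):
    """Prescribe the candidate action with the best predicted outcome."""
    return max(candidates, key=predicted_revenue)


candidate_prices = [5, 10, 15, 20]
best = recommend_price(candidate_prices)
print(best, predicted_revenue(best))
```

The recommendation is only as trustworthy as the prediction behind it, which is one reason true prescriptive capabilities are hard to build.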

Applications of Big Data Analytics

Big data analytics offers great potential to revolutionize businesses. The use of analytics helps organizations achieve a competitive advantage over others. Big data analytics can help companies develop products and services that appeal to their customers, and can help them identify new opportunities for revenue generation. It also helps improve customer perception: by examining social media, customer service, sales and marketing data, companies can better gauge customer sentiment and respond to customers in real time. IT security is another area where Big Data analytics has several advantages. Any security software creates an enormous amount of log data. Applying Big Data analytics techniques to this data can help identify and thwart cyberattacks that would otherwise have gone unnoticed.
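The security use case can be sketched as a scan over log lines that flags sources with repeated failed logins. The log format, the event names and the threshold are assumptions made for this example, and real security analytics works on far larger volumes with far subtler patterns.

```python
from collections import Counter

# Made-up log lines in an assumed "timestamp EVENT key=value ..." format.
log_lines = [
    "2024-01-01T10:00:01 LOGIN_FAILED ip=203.0.113.7 user=admin",
    "2024-01-01T10:00:02 LOGIN_FAILED ip=203.0.113.7 user=root",
    "2024-01-01T10:00:03 LOGIN_OK ip=198.51.100.4 user=alice",
    "2024-01-01T10:00:04 LOGIN_FAILED ip=203.0.113.7 user=admin",
    "2024-01-01T10:00:09 LOGIN_FAILED ip=198.51.100.4 user=alice",
]


def failed_logins_by_ip(lines):
    """Count failed login events per source IP address."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[1] == "LOGIN_FAILED":
            fields = dict(p.split("=", 1) for p in parts[2:] if "=" in p)
            if "ip" in fields:
                counts[fields["ip"]] += 1
    return counts


def suspicious_ips(lines, threshold=3):
    """Return IPs whose failed login count reaches the threshold."""
    counts = failed_logins_by_ip(lines)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


print(suspicious_ips(log_lines))
```

A single failed login is unremarkable; it is the aggregate view across many log lines that exposes the pattern.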

Challenges in Big Data Analytics

Implementing a big data analytics solution isn't always as straightforward as one would want it to be. There are several obstacles that can make it difficult to achieve the benefits promised. One of the biggest challenges is the explosive rate of data growth. The amount of data in the world's servers is roughly doubling every two years. Big data analytics solutions must be able to perform well at scale in order to extract anything useful from such data.
Most of the data stored in the systems is unstructured data, such as email messages, images, reports, audio files, videos and other types of files. This unstructured data can be very difficult to search unless you have advanced artificial intelligence capabilities. These technologies are still very nascent and may not perform as well as one would want.
An unstructured data store is often chaotic because the data comes from many different sources. Integrating data from all these sources is one of the most difficult challenges in any big data analytics project. And the most important challenge, as with any new technology, is the reluctance to adopt it. Data analytics looks good in magazines and whitepapers, but very few are ready to invest in its potential.

Latest Trends

Open Source

As the concept of big data analytics gains momentum, several open-source tools that help break down and analyze data are emerging. Hadoop, Spark and NoSQL databases are just a few. Most big data implementations in the commercial world are based on these leading open-source technologies. That seems unlikely to change in the foreseeable future.

Market Segments

Based on different market requirements, several big data analytics platforms that focus on specific domains have started emerging, such as security, marketing, CRM, application performance monitoring and hiring. Along with this, analytics tools are also being integrated into existing enterprise software at a rapid rate.

Artificial Intelligence and Machine Learning

AI is emerging as the foundation of Big Data Analytics. Any kind of predictive or prescriptive analysis is impossible without AI. Although current technologies still fall well short of what these techniques promise, one can guess from the pace of things that the day is not very far off.
But no amount of AI or Machine Learning can replace the need for humans. As Big Data analytics becomes a mainstream technology, it will possibly be just another tool, one that can help you analyze huge amounts of data. But it will always need a human to operate and improve that tool to get business value out of it.
