What is Big Data, and what does it look like?

If we want a definition of 'big data', it is data that can't be processed by conventional database systems: the data is too big, moves too fast, or doesn't fit the structures of traditional database architectures. To get information out of big data, we need to choose an alternative way of processing it. Although 'big data' is often presented as a recent breakthrough in IT, most people think of it simply as data that is big. We can't say that is wrong, but the defining point is that such data requires new kinds of processing before we can extract information from it.

Today, data has become a valuable asset. Because the data is so massive, many cost-effective approaches have emerged to tame its volume, velocity, and variability. The question remains: what is the value of big data to an organization? The answer is simple. Its value falls into two categories: analytical use, and enabling new products.

Many of the successful web startups of the past decade are examples of big data in action. The emergence of big data in the enterprise brings with it a necessary counterpart: agility. Successfully exploiting the value in big data requires experimentation and exploration. Whether creating new products or looking for ways to gain a competitive advantage, the job calls for curiosity and an entrepreneurial outlook.

As a catch-all term, “big data” can be pretty nebulous, in the same way that the term “cloud” covers diverse technologies. Input data to big data systems could come from social networks, web servers, traffic-flow sensors, satellite imagery, audio and video streams, banking transactions, music MP3s, the content of web pages, document scans, GPS trails, automotive telemetry, financial market data, and much more. So are these all the same thing? To clarify the issue, the three Vs of volume, velocity, and variety are commonly used to characterize different aspects of big data. They’re a helpful lens through which to view and understand the nature of the data and the software platforms available to exploit it. Most likely, you will contend with each of the Vs to one degree or another.

Building Blocks for Global Data Quality Success

On the topic of data quality success, I found the following building blocks in the December 2013 edition of MSDN Magazine (a small code sketch of two of these checks follows the list):

  • Address Verification
  • Phone Verification
  • Email Verification
  • Rooftop Geocoding
  • Name Parsing and Genderizing
  • Full Identity Verification
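
To give a feel for what checks like these look like in code, here is a minimal sketch of the two simplest ones, email and phone verification, using only Python's standard library. The function names and validation rules are my own illustrative assumptions, not from the magazine; production-grade verification (address verification, rooftop geocoding, identity verification) depends on external reference data or service APIs and is beyond a simple sketch like this.

```python
import re

# Illustrative syntactic checks only; real verification services also test
# deliverability, carrier records, postal reference data, and so on.

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

def verify_email(address: str) -> bool:
    """Check that an email address is syntactically plausible."""
    return bool(EMAIL_RE.match(address))

def verify_phone(number: str, expected_digits: int = 10) -> bool:
    """Strip punctuation and check the digit count (US-style default)."""
    digits = re.sub(r"\D", "", number)
    return len(digits) == expected_digits

if __name__ == "__main__":
    print(verify_email("jane.doe@example.com"))  # True
    print(verify_email("not-an-email"))          # False
    print(verify_phone("(555) 867-5309"))        # True
```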

Prediction

We all know what prediction is: in short, a statement about something that is likely to occur in the future. There are plenty of fortune tellers who claim to tell us the future; their predictions may or may not come true, and we tend not to believe them. In the computational world, however, a computer telling the future about something is more likely to be believed, because here the fortune is told by a system analyzing past data and records. Weather forecasting is one of the most successful examples. Whatever the domain, prediction is done with some knowledge of past elements, and perhaps some other available information. So how is this prediction actually done? Here is one basic version of the sequential prediction problem.

Basically, in the sequential prediction problem, the forecaster observes the elements of a sequence one by one and guesses the next element on the basis of the previous observations. In the classical statistical theory of sequential prediction, the elements are assumed to be a realization of a stationary stochastic process, and the properties of the process are estimated from past observations. The risk of a prediction rule can then be derived from a loss function that measures the difference between the predicted value and the actual outcome.
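
To make this concrete, here is a minimal sketch of a sequential forecaster under my own simplifying assumptions: it predicts each next element as the running mean of everything observed so far and accumulates squared loss. Both the mean rule and the squared loss are illustrative choices, not prescribed by the theory above.

```python
def sequential_forecast(sequence):
    """Predict each element as the mean of all previous observations
    and accumulate squared loss; the first prediction defaults to 0.0."""
    history = []
    total_loss = 0.0
    for outcome in sequence:
        # Make the forecast before the next outcome is revealed.
        prediction = sum(history) / len(history) if history else 0.0
        total_loss += (prediction - outcome) ** 2  # squared loss
        history.append(outcome)                    # outcome is now revealed
    return total_loss

# Example: a noisy but roughly stationary sequence.
print(sequential_forecast([1.0, 1.2, 0.9, 1.1, 1.0, 0.95]))
```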

Without proper probabilistic modeling, the notion of risk cannot be defined at all, and many possibilities may exist. In our basic model, the performance of the forecaster is measured by the loss accumulated over many rounds of prediction, calculated with the help of some fixed loss function. To provide a better baseline, reference forecasters are introduced; they, too, make their forecasts before the next outcome is revealed. The forecaster can form its own prediction by taking the reference forecasters' predictions as advice, and by keeping records of each reference forecaster's accumulated loss it can lean on the best-performing ones in further forecasting.
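
One standard way to turn such advice into a forecast is an exponentially weighted average of the reference forecasters' predictions. The sketch below is my own minimal version: the learning rate and the squared loss are arbitrary illustrative choices, and the two toy "experts" exist only to show the mechanics.

```python
import math

def weighted_average_forecast(expert_predictions, outcomes, eta=0.5):
    """Exponentially weighted average forecaster: average the experts'
    advice by weight, then shrink each expert's weight according to
    the loss it suffered once the outcome is revealed."""
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    total_loss = 0.0
    for advice, outcome in zip(expert_predictions, outcomes):
        w_sum = sum(weights)
        # Forecast, made before the outcome is revealed.
        prediction = sum(w * p for w, p in zip(weights, advice)) / w_sum
        total_loss += (prediction - outcome) ** 2
        # Experts with larger loss lose weight exponentially fast.
        weights = [w * math.exp(-eta * (p - outcome) ** 2)
                   for w, p in zip(weights, advice)]
    return total_loss

# Two toy experts: one always predicts 1.0, the other always 0.0.
advice_per_round = [(1.0, 0.0)] * 5
outcomes = [1.0, 0.9, 1.1, 1.0, 0.95]
print(weighted_average_forecast(advice_per_round, outcomes))
```

Because the weights decay exponentially with accumulated loss, the combined forecast quickly tracks whichever reference forecaster has been most accurate so far, which is exactly the "use past loss records to do better next time" idea described above.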