
Tuesday 23 March 2021

Big Data Characteristics

Big Data Characteristics: Volume, Velocity, Variety, Veracity and Value

For a dataset to be considered Big Data, it must possess one or more characteristics that require accommodation in the solution design and architecture of the analytic environment.

This post explores the five Big Data characteristics that help differentiate data categorized as “Big” from other forms of data.



1. Volume:

The name ‘Big Data’ itself refers to enormous size.

Volume is the sheer amount of data.

The size of a dataset plays a crucial role in determining its value. If the volume of data is very large, it is considered ‘Big Data’. In other words, whether a particular dataset can be considered Big Data depends in large part on its volume.

Hence, ‘Volume’ is a characteristic that must be considered when dealing with Big Data.

Example: Typical data sources that generate high data volumes include:

• online transactions, such as point-of-sale and banking

• scientific and research experiments

• sensors, such as Global Positioning System (GPS) receivers, RFID tags, smart meters and telematics

• social media, such as Facebook and Twitter

2. Velocity:

Velocity refers to the high speed at which data accumulates.

Data flows in from sources such as machines, networks, social media and mobile phones.

The flow of data is massive and continuous. Velocity determines the potential of the data: how fast it is generated and how fast it must be processed to meet demand.

Sampling the data can help in dealing with velocity.
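
One way to picture sampling a fast stream is reservoir sampling, which keeps a fixed-size uniform random sample without storing the whole stream. The post does not name a specific technique, so this is only an illustrative sketch; the names reservoir_sample and event_stream, and the event records themselves, are assumptions made for the example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def event_stream(n: int):
    """Hypothetical stand-in for a high-velocity source (e.g. sensor readings)."""
    for i in range(n):
        yield {"event_id": i}


sample = reservoir_sample(event_stream(100_000), k=5, seed=42)
print(len(sample), "events kept out of 100000")
```

Memory use stays proportional to k regardless of how many events arrive, which is why this kind of sampling suits data that cannot be stored in full before processing.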

Example: More than 3.5 billion searches are made on Google per day. Also, the number of Facebook users is increasing by approximately 22% year over year.

3. Variety:

Variety refers to the nature of data, which can be structured, semi-structured or unstructured.

It also refers to heterogeneous sources.

Variety is essentially the arrival of data from new sources, both inside and outside an enterprise, in any of these three forms.

Structured data: This is organized data. It generally refers to data with a defined length and format.

Semi-structured data: This is partially organized data. It generally does not conform to a formal data structure, although it still carries some organization. Log files are an example of this type of data.

Unstructured data: This is unorganized data. It generally refers to data that doesn’t fit neatly into the traditional row and column structure of a relational database. Text, pictures and videos are examples of unstructured data, which can’t be stored in the form of rows and columns.


Note: JavaScript Object Notation (JSON) is a standard text-based format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web applications (e.g., sending some data from the server to the client, so it can be displayed on a web page, or vice versa).
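
To make the note concrete, here is a minimal sketch of reading a JSON record with Python's standard json module and flattening it into a single row of columns. JSON records with nested or optional fields are often treated as semi-structured, so flattening is one way to move them toward the row and column form described above. The record contents and the flatten helper are hypothetical and exist only for illustration.

```python
import json

# Hypothetical event record; the field names are assumptions for this example.
RAW = '{"user": "alice", "action": "login", "tags": ["mobile", "eu"], "device": {"os": "android"}}'


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one row with dotted column names."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and prefix the child keys.
            row.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            # Lists have no fixed width, so keep them as JSON text in one column.
            row[name] = json.dumps(value)
        else:
            row[name] = value
    return row


record = json.loads(RAW)   # parse the JSON text into Python objects
row = flatten(record)      # one flat row, ready for a table-like store
print(sorted(row.keys()))
```

Keeping lists as JSON text is a design shortcut here; a real pipeline might instead split them into a separate table.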

4. Veracity:

Veracity refers to inconsistencies and uncertainty in data: the available data can be messy, and its quality and accuracy are difficult to control.

Data that is acquired in a controlled manner, for example via online customer registrations, usually contains less noise than data acquired via uncontrolled sources, such as blog postings.

Example: Data in bulk can create confusion, whereas too little data can convey partial or incomplete information.
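
The contrast between controlled and uncontrolled sources can be illustrated by measuring how many records fail simple quality checks. This is a rough sketch under stated assumptions: the email and age fields, the validity rules, and the is_clean and noise_ratio helpers are all hypothetical, not a standard measure of veracity.

```python
from typing import Iterable


def is_clean(record: dict) -> bool:
    """Return True if the record passes two basic, assumed quality rules."""
    email = record.get("email")
    age = record.get("age")
    has_email = isinstance(email, str) and "@" in email
    # Exclude bool explicitly, since bool is a subclass of int in Python.
    has_age = isinstance(age, int) and not isinstance(age, bool) and 0 < age < 120
    return has_email and has_age


def noise_ratio(records: Iterable[dict]) -> float:
    """Fraction of records that fail the checks (0.0 for an empty input)."""
    items = list(records)
    if not items:
        return 0.0
    noisy = sum(1 for r in items if not is_clean(r))
    return noisy / len(items)


# Hypothetical samples: a registration form versus text scraped from blog posts.
registrations = [{"email": "a@example.com", "age": 31}, {"email": "b@example.com", "age": 45}]
scraped = [{"email": "a@example.com", "age": 31}, {"email": "unknown", "age": -1}]

print(noise_ratio(registrations), noise_ratio(scraped))
```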


5. Value:

After taking the first four V’s into account, there is one more V, which stands for Value. A bulk of data that has no value is of no good to the company unless you turn it into something useful.

Data in itself is of no use or importance; it needs to be converted into something valuable in order to extract information. Hence, you can state that Value is the most important of the five V’s.
