Sunday 4 April 2021

Different Types of Data Used in Big Data

The data processed by Big Data solutions can be human-generated or machine-generated, although it is ultimately the responsibility of machines to generate the analytic results.

Human-generated data is the result of human interaction with systems, such as online services and digital devices.

Machine-generated data is generated by software programs and hardware devices in response to real-world events.

Examples of machine-generated data include web logs, sensor data, telemetry data, smart meter data and appliance usage data. 
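As a rough illustration of what machine-generated data looks like in practice, the sketch below parses one web log line into named fields. It assumes the log follows the widely used Common Log Format; the sample line, the `LOG_PATTERN` name and the `parse_log_line` helper are made up for this example and are not taken from any particular system.

```python
import re

# Assumed layout (Common Log Format):
# host ident user [timestamp] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the size field means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical log line written by a web server in response to one request.
sample = '192.0.2.10 - - [04/Apr/2021:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5120'
record = parse_log_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
```

Every line is produced by software in response to an event (here, an HTTP request), with no human typing the record.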

Human-generated and machine-generated data can come from a variety of sources and be represented in various formats or types. 

The primary types of data, with the approximate share of enterprise data usually attributed to each, are:

• structured data  ....          10%

• semi-structured data ....   10%

• unstructured data ....        80%

Structured Data :

Structured data conforms to a data model or schema and is often stored in tabular form. It is used to capture relationships between different entities and is therefore most often stored in a relational database.


Structured data is frequently generated by enterprise applications and information systems such as ERP (Enterprise Resource Planning) and CRM (Customer Relationship Management) systems. Examples of this type of data include banking transactions, invoices, and customer records.
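A minimal sketch of structured data, using Python's built-in `sqlite3` module as a stand-in for a relational database. The `customers` and `invoices` tables, their columns and the sample rows are hypothetical; they only illustrate how a schema captures a relationship between two entities.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
db = sqlite3.connect(":memory:")

# Hypothetical schema: each invoice row points at exactly one customer row.
db.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE invoices (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
""")

db.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi")],
)
db.executemany(
    "INSERT INTO invoices (customer_id, amount) VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)


def totals_by_customer(connection):
    """Sum invoice amounts per customer by joining the two tables."""
    return connection.execute("""
        SELECT c.name, SUM(i.amount)
        FROM customers AS c
        JOIN invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """).fetchall()


totals = totals_by_customer(db)
print(totals)
```

Because every row conforms to the schema, the relationship between customers and invoices can be queried directly with SQL.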

Semi-structured Data :

Semi-structured data has a defined level of structure and consistency, but is not relational in nature. Instead, semi-structured data is hierarchical or graph-based. This kind of data is commonly stored in files that contain text.

JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) files are common forms of semi-structured data. Because this data is textual and conforms to some level of structure, it is more easily processed than unstructured data.

Examples of common sources of semi-structured data include electronic data interchange (EDI) files, spreadsheets, RSS feeds and sensor data.
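The hierarchical nature of semi-structured data can be sketched with Python's standard `json` and `xml.etree.ElementTree` modules. The order documents below are invented for illustration, and the two helper functions are hypothetical names; the point is only that the same nested record can be walked without a relational schema.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical order, once as JSON and once as XML.
json_text = '{"order": {"id": 42, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}}'
xml_text = '<order id="42"><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>'


def total_quantity_json(text):
    """Walk the nested JSON structure and add up the item quantities."""
    doc = json.loads(text)
    # .get() tolerates items that omit "qty", since no schema enforces it.
    return sum(item.get("qty", 0) for item in doc["order"]["items"])


def total_quantity_xml(text):
    """Walk the XML tree and add up the qty attribute of every <item>."""
    root = ET.fromstring(text)
    return sum(int(item.get("qty", "0")) for item in root.iter("item"))


print(total_quantity_json(json_text), total_quantity_xml(xml_text))
```

The structure is carried by the nesting and the field names inside the text itself, which is why fields can be optional or vary from one record to the next.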

Unstructured Data :

Data that does not conform to a data model or data schema is known as unstructured data. It is commonly estimated that unstructured data makes up about 80% of the data within an enterprise, and it is growing faster than structured data.


This form of data is either textual or binary and often conveyed via files that are self-contained and non-relational. A text file may contain the contents of various tweets or blog postings. Binary files are often media files that contain image, audio or video data.

Unstructured data cannot be directly processed or queried using SQL. If it must be stored in a relational database, it is stored in a table as a Binary Large Object (BLOB). Alternatively, a Not-only SQL (NoSQL) database, which is non-relational, can be used to store unstructured data.
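Storing unstructured data as a BLOB can be sketched as follows, again using Python's built-in `sqlite3` module as a stand-in for a relational database. The `media` table, the file name and the few bytes of payload are hypothetical placeholders, not a real image.

```python
import sqlite3

store = sqlite3.connect(":memory:")

# Hypothetical table: the binary content sits in a single BLOB column.
store.execute(
    "CREATE TABLE media (id INTEGER PRIMARY KEY, filename TEXT, content BLOB)"
)

# Stand-in bytes; a real application would read them from a media file.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])
store.execute(
    "INSERT INTO media (filename, content) VALUES (?, ?)",
    ("photo.png", payload),
)

# The database returns the bytes unchanged.
stored = store.execute(
    "SELECT content FROM media WHERE filename = ?", ("photo.png",)
).fetchone()[0]

# SQL can report metadata such as the size, but not what the image depicts.
size = store.execute("SELECT length(content) FROM media").fetchone()[0]
print(stored == payload, size)
```

The database treats the BLOB as an opaque value: it can store and return it, but any analysis of the content has to happen outside SQL.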
