BigData: HIVE DATA TYPES

I. Hive Primitive Data Type

1. Numeric Type

The Numeric data type in Hive is categorized into

Integral data type
Floating data type

1.1 Integral data type

a. TINYINT

(1-byte signed integer ranging from -128 to 127)

b. SMALLINT

(2-byte signed integer ranging from -32, 768 to 32, 767)

c. INTEGER

(4-byte signed integer ranging from -2, 147, 483, 648 to 2, 147, 483, 647)

d. BIGINT

(8-byte signed integer ranging from -9, 223, 372, 036, 854, 775, 808 to 9, 223, 372, 036, 854, 775, 807)

In Hive, Integral literals are assumed to be INTEGER by default unless they cross the range of INTEGER values. If we want to use a low integral value like 100 to be treated as TINYINT, SMALLINT, or BIGINT, then we will use the following postfixes (shown in the below table) with the number.

Type	Postfix	Example
TINYINT	Y	100Y
SMALLINT	S	100S
BIGINT	L	100L

1.2 Floating data type

a. FLOAT

It is a 4-byte single-precision floating-point number.

b. DOUBLE

It is an 8-byte double-precision floating-point number.

c. DOUBLE PRECISION

It is an alias for DOUBLE. It is only available starting with Hive 2.2.0

d. DECIMAL

DECIMAL types support both scientific and non-scientific notations.

In Hive 0.11.0 and 0.12, the precision of the DECIMAL type is fixed and limited to 38 digits.

As of Hive 0.13, user can specify the scale and precision during table creation using the syntax:

DECIMAL(precision, scale)

If precision is not specified, then by default, it is equal to 10.

If the scale is not specified, then by default, it is equal to 0.

DECIMAL provides more precise values and greater range than DOUBLE.

e. NUMERIC

It started with Hive 3.0.0. The NUMERIC data type is the same as the DECIMAL type.

2. Date/Time data type:

a. TIMESTAMP

Timestamps were introduced in Hive 0.8.0. It supports traditional UNIX timestamp with the optional nanosecond precision.

The supported Timestamps format is yyyy-mm-dd hh:mm:ss[.f…] in the text files.

If they are in any other format, declare them as the appropriate type and use UDF(User Defined Function) to convert them to timestamps.

The supported conversions are:

Integer numeric type	UNIX timestamp in seconds
Floating-point numeric type	UNIX timestamp in seconds with decimal precision
Strings	java.sql.Timestamp format “YYYY-MM-DD HH:MM:SS.fffffffff” (9 decimal place precision)

b. DATE

Dates were introduced in Hive 0.12.0. DATE value describes a particular year/month/day in the form of YYYY-MM-DD.

For example- DATE ‘2020-02-04’

It does not have a time of day component. The range of value supported for the DATE type is 0000-01-01 to 9999-12-31.

c. INTERVAL

Hive Interval data types are available only after starting with Hive version 1.2 or above.

Hive accepts the interval syntax with unit specifications. We have to specify the units along with the interval value.

For example, INTERVAL ‘1’ DAY refers to the day time.

3. String data type

a. STRING

In Hive, String literals are represented either with the single quotes(‘ ’) or with double-quotes(“ ”).

Hive uses C-style escaping.

b. VARCHAR

In Hive, VARCHAR data types are of different lengths, but we have to specify the maximum number of characters allowed in the character string.

If the string value assigned to the varchar is less than the maximum length, then the remaining space will be freed out.

Also, if the string value assigned is more than the maximum length, then the string is silently truncated.

The length of the varchar is between(1 to 65535).

Trailing whitespace is important in varchar and will affect the comparison results.

c. CHAR

CHAR data types are fixed-length.

The values shorter than the specified length are padded with the spaces.

Unlike VARCHAR, trailing spaces are not significant in CHAR types during comparisons.

The maximum length of CHAR is fixed at 255.

4. Miscellaneous data type

a. BOOLEAN

Boolean types in Hive store either true or false.

b. BINARY

BINARY type in Hive is an array of bytes.

This is all about Hive Primitive Data Types. Let us now study Hive Complex Data Types.

II. Hive Complex Data Type:

1. arrays

Array in Hive is an ordered sequence of similar type elements that are indexable using the zero-based integers.

Arrays in Hive are similar to the arrays in JAVA.

array<datatype>

Example: array(‘Data’,’Flair’). The second element is accessed as array[1].

// Create an array to find the temperature using array

Step1: create a database with a file temp.txt using tab spaces as

1    vsp    44.1,34.5,38.4,31.1
2    hyd    45.4,42.2,39.1,40.2
3    vzm    44.4,42.1,39.4,43.1

Step2: How to load the data into a table temperature

hive> create table temperature(sno int, place string, mytemp array<double>)
    > row format delimited
    > fields terminated by '\t'
    > collection items terminated by ',' ;

hive> show tables ;

hive> load data local inpath '/home/rk/Desktop/temp.txt' into table temperature ;

Step3:How to use the complex data type array in a table temperature

hive> select * from temperature ;

hive> desc temperature ;
hive> select place, mytemp[0] from temperature ;

2. maps

Map in Hive is a collection of key-value pairs, where the fields are accessed using array notations of keys (e.g., [‘key’]).

map<primitive_type, data_type>

Example: ‘first’ -> ‘John’, ‘last’ -> ‘Deo’, represented as map(‘first’, ‘John’, ‘last’, ‘Deo’). Now ‘John’ can be accessed with map[‘first’].

// Create an map to find the total population

Step1: create a database with a file map.txt using tab spaces as

vsp    male    2018:44567,2019:23423,2020:56856
vzm    female    2018:77567,2019:88423,2020:54856
rjy    male    2018:84567,2019:93423,2020:66856

Step2: How to load the data into a table cdtmap1

hive> create table cdtmap1(city string, gender string, coll map<int,int>)
    > row format delimited
    > fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':' ;

hive> show tables ;

hive> desc cdtmap1 ;

hive> load data local inpath '/home/rk/Desktop/maptab.txt' into table cdtmap1 ;

Step 3: How to use the complex data type map in a table cdtmap1

hive> desc cdtmap1 ;

hive> select coll[2018] from cdtmap1 where city='vsp' ;

hive> select coll[2018] from cdtmap1 where gender='male' ;

3. structs

STRUCT in Hive is similar to the STRUCT in C language. It is a record type that encapsulates a set of named fields, which can be any primitive data type.

We can access the elements in STRUCT type using DOT (.) notation.

STRUCT <col_name : data_type [ COMMENT col_comment], ...>

Example: For a column c3 of type STRUCT {c1 INTEGER; c2 INTEGER}, the c1 field is accessed by the expression c3.c1.

// Create an struct to find the address

Step 1: create a database with a file structab.txt using tab spaces as

ram    vsp    srinagar,22.4
sita    vzm    ramnagar,24.4
rani    rjy    mvp,26.4

Step 2: How to load the data into a table cdtstruct

hive> create table cdtstruct(name string, city string, info struct<street:string,temp:float>)
    > row format delimited
    > fields terminated by '\t'
    > collection items terminated by ',' ;

hive> load data local inpath '/home/rk/Desktop/structab.txt' into table cdtstruct ;

Step 3: How to use the complex data type struct in a table cdtstruct

hive> select info.temp from cdtstruct ;
hive> select info.street from cdtstruct ;

4. union

UNION type in Hive is similar to the UNION in C. UNION types at any point of time can hold exactly one data type from its specified data types.

The full support for UNIONTYPE data type in Hive is still incomplete.

UNIONTYPE<data_type, data_type, ...>

Handling of NULL Values

In Hive data types, the missing values are represented by the special value NULL.

BigData

Pages

Friday, 11 June 2021

HIVE DATA TYPES

I. Hive Primitive Data Type

1. Numeric Type

1.1 Integral data type

1.2 Floating data type

2. Date/Time data type:

3. String data type

4. Miscellaneous data type

hive> create table cdtmap1(city string, gender string, coll map<int,int>)
    > row format delimited
    > fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':' ;

3. structs

Handling of NULL Values

No comments:

Post a Comment

Friends-of-friends-Map Reduce program

Report Abuse

Labels

Pages

Friday, 11 June 2021

HIVE DATA TYPES

I. Hive Primitive Data Type

1. Numeric Type

1.1 Integral data type

1.2 Floating data type

2. Date/Time data type:

3. String data type

4. Miscellaneous data type

hive> create table cdtmap1(city string, gender string, coll map<int,int>) > row format delimited > fields terminated by '\t' > collection items terminated by ',' > map keys terminated by ':' ;

3. structs

Handling of NULL Values

No comments:

Post a Comment

Friends-of-friends-Map Reduce program

hive> create table cdtmap1(city string, gender string, coll map<int,int>)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ','
> map keys terminated by ':' ;