Hive duplicates records when creating a table from HDFS data

I am trying to create a Hive table from a JSON file using org.apache.hive.hcatalog.data.JsonSerDe.

First, I load the file from the local filesystem into HDFS. Then I create the table in Hive:

CREATE EXTERNAL TABLE tweet8(
  user struct<userlocation:string, id:string, name:string>,
  tweetmessage string,
  createddate string,
  geolocation string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/tmp/hive/hello';

Hive duplicates every record in my file except the last one. For example, my text file contains 4 JSON objects A, B, C, D; after loading it into Hive, I get A, A, B, B, C, C, D.

As I understand it, when a file is loaded from the local filesystem into HDFS, Hadoop creates a number of replicas of it. I assume these replicas are what cause the duplicates in the Hive table. I see two possible solutions:

1 - set the replication factor to 1 when uploading the file from the local filesystem to HDFS;
2 - after creating the table, run a SELECT DISTINCT query on the tweet8 table to create a new table without duplicates.
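
To make option 2 concrete, I imagine it would look roughly like the sketch below. The table name tweet8_dedup and the flattened column names user_id and user_name are made up for illustration. I flatten the user struct into plain columns because I am not sure every Hive version allows DISTINCT on a struct, and I quote `user` with backticks in case it is treated as a reserved word.

```sql
-- Sketch of option 2: materialize a de-duplicated copy of tweet8.
-- tweet8_dedup, user_id and user_name are hypothetical names.
CREATE TABLE tweet8_dedup AS
SELECT DISTINCT
  `user`.userlocation AS userlocation,
  `user`.id           AS user_id,
  `user`.name         AS user_name,
  tweetmessage,
  createddate,
  geolocation
FROM tweet8;
```

Note that this would also collapse any records that are genuinely identical in the source file, not only the unwanted duplicates.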

Which is the best practice?

Thanks for any suggestions! (Feel free to ask if I need to clarify my question further, and sorry for my bad English.)
