SphereGen

spheregen logo on transparent background

HIVE_BAD_DATA: Error Parsing Field Value for Field ERROR in AWS Athena

Reviewed for accuracy on January 23, 2023

HIVE_BAD_DATA is generally related to bad data in source file or data type mismatch. I have experienced this error while working with JSON data read in AWS Athena. For example, I have below sample JSON data that is stored in s3 bucket

{“id”:”jggdfdysf866″,”socialdata”:{“facebook”:”Facebook data details”,”twitter”:””}}

{“id”:”676fdggdcdf4″,”socialdata”:”This is bad data”}

{“id”:”676fdggdcdf4″,”socialdata”:{“facebook”:””,”twitter”:”twitter data details”}}

And here is external table DDL statement. Since socialdata field forming a nested structural data, “struct” has been used to read inner set of data.

CREATE EXTERNAL TABLE extJSON (

id string,

socialdata struct< facebook : string , twitter : string>

)

ROW FORMAT SERDE ‘org.openx.data.jsonserde.JsonSerDe’

WITH SERDEPROPERTIES (

‘serialization.format’ = ‘1’,

‘ignore.malformed.json’ = ‘true’

)

LOCATION ‘s3:///’

TBLPROPERTIES (‘has_encrypted_data’=’false’);

Now, if I attempt to read data using below statement,

Select * from extJSON limit 10;

It gives error “HIVE_BAD_DATA: Error parsing field value for field 1: Data is not JSONObject but java.lang.String with value This is bad data”

This is due to data type mismatch for socialdata field. Since I have declared STRUCT to read inner set of data from socialdata field but one of the records is not forming the socialdata structure correctly, it causing error.

Solution

To overcome from above issue, I have removed “STRUCT” from external table DDL and used “string” as data type.

CREATE EXTERNAL TABLE extJSON (

id string,

socialdata string

)

ROW FORMAT SERDE ‘org.openx.data.jsonserde.JsonSerDe’

WITH SERDEPROPERTIES (

‘serialization.format’ = ‘1’,

‘ignore.malformed.json’ = ‘true’

)

LOCATION ‘s3:///’

TBLPROPERTIES (‘has_encrypted_data’=’false’);

And below sql statement returns expected result

select

id

,socialdata

,json_extract_scalar(socialdata, ‘$.facebook’)social_facebook_data

,json_extract_scalar(socialdata, ‘$.twitter’)social_twitter_data

from extJSON;

SQL screenshot

About SphereGen

SphereGen logo on white background

SphereGen is a unique solutions provider that specializes in cloud-based applications, Intelligent Automation, and Extended Reality (AR/VR/MR). We offer full-stack custom application development to help customers employ innovative technology to solve business problems.

Learn more about what we do in Application Development: https://www.spheregen.com/application-development

microsoft partner badge
uipath silver partner badge