I have a semi-structured CSV which looks like this:
    vts,01,0099,7022606164,sp,gp,33,060646,a,1258.9805,n,07735.9303,e,0.0,278.6,280515,0000,00,4000,11,999,842,4b61
    vts,01,0099,7022606164,nm,gp,20,060637,a,1258.9805,n,07735.9302,e,0.0,278.6,280515,0000,00,4000,11,999,841,7407+++
    vts,66,0065,7022606164,nm,0,gp,22,060648,280515,1258.9804,n,07735.9301,e,04ae+++
    vts,01,0099,7022606164,nm,gp,22,060656,a,1258.9804,n,07735.9301,e,0.0,278.6,280515,0000,00,4000,11,999,843,8feb+++
    vts,01,0099,7022606164,nm,gp,22,060721,a,1258.9803,n,07735.9304,e,0.0,278.6,280515,0000,00,4000,11,999,845,044d++++++
    vts,99,0065,7022606164,nm,0,a,gp,22,060648,280515,1258.9804,n,07735.9301,e,04ae+++
    vts,99,0065,7022606164,nm,0,a,gp,22,060648,280515,1258.9804,n,07735.9301,e,04ae
I want to make 3 different tables from this data, i.e. one each for vts,01, vts,99 and vts,66. I also need to remove the erroneous "+++" attached to each line. For this I have written the following Pig script:
    data = load '/user/simulator/skytrack/27thmay2015' using PigStorage('\n') as (f1:chararray);
    splt = foreach data generate FLATTEN(STRSPLIT($0, '\\+++'));
    data_pkt = filter splt by $0 matches '.*vts,01+.*';
    sos_pkt = filter splt by $1 matches '.*vts,99+.*';
    health_pkt = filter splt by $2 matches '.*vts,66+.*';
When I test the statements individually, only one of the three tables produces output; the other two produce nothing:
    dump data_pkt;
    dump sos_pkt;
    dump health_pkt;
I am new to Pig. Can anyone help me solve this issue? It would be appreciated.
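To remove the +++, you need to escape every "+", not just one. Not knowing exactly what these pluses mean (most lines end in +++, but one ends in ++++++), I'd rather split on a regex that matches three or more of them: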
"\\+{3,}"
Consequently, in the Pig script:
    splt = foreach data generate FLATTEN(STRSPLIT($0, '\\+{3,}'));
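If you want to check the separator regex outside Pig first, here is a small Python sketch. It assumes that Python's `re` treats `\+{3,}` the same way as the Java regex engine I believe Pig's STRSPLIT relies on (both read it as "three or more literal + characters"). Note that the Pig string literal '\\+{3,}' becomes the regex `\+{3,}`, which is the raw string `r"\+{3,}"` in Python. The sample line is shortened from the question's data, and `split_records` is a hypothetical helper name.

```python
import re

# Shortened, hypothetical sample: three records glued together by runs of pluses.
line = "vts,01,0099,7022606164,nm,gp,22,060721+++ vts,99,0065,7022606164,nm,0,a,gp++++++ vts,66,0065"

# \+{3,} matches three or more literal '+' characters, so both the
# "+++" and the "++++++" separators are consumed as a single match.
SEPARATOR = re.compile(r"\+{3,}")


def split_records(text):
    """Split on runs of 3+ pluses, trimming whitespace and dropping empty pieces."""
    return [piece.strip() for piece in SEPARATOR.split(text) if piece.strip()]


for record in split_records(line):
    print(record)
```

A lone "+" or "++" inside a field is left alone, because it is shorter than the three-plus minimum.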
Although Aman is correct, I would rather use SPLIT instead of FILTER to separate the datasets:

    a = load '/abc.txt' using PigStorage(',');
    split a into b01 if $1 == '01', b66 if $1 == '66', b99 if $1 == '99';
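The SPLIT above routes each record by its second comma-separated field ($1). Below is a minimal Python sketch of that same routing, assuming the "+++" has already been removed and each record sits on its own line. `partition_by_type`, `WANTED` and the sample rows are hypothetical and shortened from the question's data. The field is compared as a string so the leading zero in "01" is kept.

```python
import csv
import io

# Hypothetical, shortened sample: one cleaned record per line.
SAMPLE = """vts,01,0099,7022606164,sp,gp,33,060646
vts,66,0065,7022606164,nm,0,gp,22
vts,99,0065,7022606164,nm,0,a,gp
vts,01,0099,7022606164,nm,gp,20,060637
"""

# The three packet types from the question; field $1 in Pig is row[1] here.
WANTED = ("01", "66", "99")


def partition_by_type(text):
    """Group comma-separated records into one list per wanted packet type."""
    tables = {packet_type: [] for packet_type in WANTED}
    for row in csv.reader(io.StringIO(text)):
        # Skip short or blank lines and any packet type we did not ask for.
        if len(row) > 1 and row[1] in tables:
            tables[row[1]].append(row)
    return tables


tables = partition_by_type(SAMPLE)
for packet_type in WANTED:
    print(packet_type, len(tables[packet_type]))
```

Records whose second field is none of the three types end up in no table, which matches what the three SPLIT conditions above do with them.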