Writing an array of STRUCT from Dataflow to BigQuery - Java

I am trying to write an array-of-STRUCT field from my Dataflow pipeline to BigQuery. The schema of the generated table is correct, but no data gets populated in the fields.
My DoFn:
public class ProcessIpBlocks {
public static class IpBlocksToIp extends DoFn<TableRow, TableRow> {
private static final long serialVersionUID = 1L;
@Override
public void processElement(ProcessContext c) throws JSONException {
TableRow row = c.element();
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Calendar cal = Calendar.getInstance();
long startIp = 0L, endIp = 0L;
if(row.get("start_ip") != null)
startIp = Long.parseLong((String)row.get("start_ip"));
if(row.get("end_ip") != null)
endIp = Long.parseLong((String)row.get("end_ip"));
for(long i= startIp; i<=endIp; i++)
{
TableRow outputRow = new TableRow();
outputRow.set("start_ip", startIp);
outputRow.set("ip", i);
if(row.get("postal_code") != null && !((String)row.get("postal_code")).isEmpty()){
System.out.println("This is getting written to logs");
endIp = Long.parseLong((String)row.get("end_ip"));
JSONArray atrArray = new JSONArray();
JSONObject atr = new JSONObject();
atr.put("id", "zippostal_code");
JSONArray atrValueArray = new JSONArray();
atr.put("value", atrValueArray.put((String)row.get("postal_code")));
atr.put("pr", 0.5);
atr.put("dt", cal.getTime());
atrArray.put(atr);
outputRow.set("atr", atrArray);
}
c.output(outputRow);
}
}
}
}
My pipeline write step:
iPBlocksToIPData.apply("Foo", ParDo.of(new ProcessIpBlocks.IpBlocksToIp()))
.apply(BigQueryIO.Write
.named("WriteIPs")
.to(String.format("%1$s:%2$s.%3$s",projectId, eventDataset, ipBlocksToIpTable))
.withSchema(schema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

The solution below worked, using TableRow instead of JSONArray:
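public class ProcessIpBlocks {
// A nested class cannot share its enclosing class's name, so the names from the question are used here.
public static class IpBlocksToIp extends DoFn<TableRow, TableRow> {
@Override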
public void processElement(ProcessContext c) throws JSONException {
TableRow row = c.element();
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Calendar cal = Calendar.getInstance();
long startIp = 0L, endIp = 0L;
if(row.get("start_ip") != null)
startIp = Long.parseLong((String)row.get("start_ip"));
if(row.get("end_ip") != null)
endIp = Long.parseLong((String)row.get("end_ip"));
for(long i= startIp; i<=endIp; i++)
{
TableRow outputRow = new TableRow();
outputRow.set("start_ip", startIp);
outputRow.set("ip", i);
if(row.get("postal_code") != null && !((String)row.get("postal_code")).isEmpty()){
endIp = Long.parseLong((String)row.get("end_ip"));
TableRow atrRow = new TableRow();
atrRow.set("id", "zippostal_code");
atrRow.set("value", new String[] {(String)row.get("postal_code")});
outputRow.set("atr", atrRow);
}
System.out.println(outputRow);
c.output(outputRow);
}
}
}
}
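The shape of the nested field can be sketched without the Dataflow SDK. TableRow behaves like a Map<String, Object>, and a REPEATED RECORD column is normally populated with a List of nested rows, so if the atr column is declared REPEATED, wrapping the nested row in a list may be the safer form. In the sketch below plain java.util maps stand in for TableRow; the class and method names (AtrRowSketch, atrEntry, outputRow) are hypothetical and not part of the original pipeline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: plain Maps play the role of TableRow so this runs without the SDK.
final class AtrRowSketch {

    // Builds one element of the repeated "atr" record.
    static Map<String, Object> atrEntry(String postalCode) {
        Map<String, Object> atr = new LinkedHashMap<>();
        atr.put("id", "zippostal_code");
        // "value" is itself an array in the question, so it is modelled as a list of strings.
        List<String> values = new ArrayList<>();
        values.add(postalCode);
        atr.put("value", values);
        return atr;
    }

    // Builds an output row whose "atr" field is a list of nested records.
    static Map<String, Object> outputRow(long startIp, long ip, String postalCode) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("start_ip", startIp);
        row.put("ip", ip);
        if (postalCode != null && !postalCode.isEmpty()) {
            List<Map<String, Object>> atrList = new ArrayList<>();
            atrList.add(atrEntry(postalCode));
            row.put("atr", atrList);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(outputRow(10L, 11L, "94043"));
    }
}
```

With the real SDK the same structure would be built from TableRow objects and set(...) calls; whether a single nested row is also accepted depends on the schema mode of the column.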

Why do I get duplicated data from flink?

I'm new to Flink and I'm trying to read a stream from Kafka. However, I'm getting duplicate data processed, and I'm wondering why.
I believe the problem comes from Flink, because when I wrote a simple consumer in Java I got no duplicate data.
flink-connector-kafka_2.11 version 1.10.0
Flink version 1.11
Is there any way to check whether Flink processes the data provided by Kafka only once?
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
KafkaConsumer consumer = new KafkaConsumer("fashion","172.16.3.241:9092","fashion","org.apache.kafka.common.serialization.ByteBufferDeserializer");
FlinkKafkaConsumer<JsonNode> stream_consumer = new FlinkKafkaConsumer<>(consumer.getTopic(), new DeserializationSchema<JsonNode>() {
private final ObjectMapper objMapper = new ObjectMapper();
@Override
public JsonNode deserialize(byte[] bytes) throws IOException {
return objMapper.readValue(bytes,JsonNode.class);
}
@Override
public boolean isEndOfStream(JsonNode jsonNode) {
return false;
}
@Override
public TypeInformation<JsonNode> getProducedType() {
return TypeExtractor.getForClass(JsonNode.class);
}
}, consumer.getProperties());
DataStream<JsonNode> tweets = env.addSource(stream_consumer);
tweets.flatMap(new getTweetSchema());
env.execute("Flink Streaming Java API Skeleton");
}
private static class getTweetSchema implements FlatMapFunction<JsonNode, Tweet>{
private static final long serialVersionUID = -6867736771747690202L;
private JSONObject objTweet;
public void flatMap(JsonNode tweet, Collector<Tweet> out) throws JSONException, ParseException{
try{
if (objTweet == null){
objTweet = new JSONObject(tweet.asText());
}
HashSet<String> hashtag = new HashSet<>();
String text = objTweet.get("text").toString();
DateFormat dateFormat = new SimpleDateFormat("EEE MMM d HH:mm:ss Z yyyy", Locale.ENGLISH );
Date created_at = dateFormat.parse(objTweet.get("created_at").toString());
String source = objTweet.get("source").toString();
source = source.substring(source.length() - 11).replaceAll("</a>","");
String lang = objTweet.get("lang").toString();
Boolean isRT = text.matches("^RT.*");
Long id = Long.parseLong(objTweet.get("id").toString());
if (objTweet.has("extended_tweet")){
JSONArray arr = objTweet.getJSONObject("extended_tweet").getJSONObject("entities").getJSONArray("hashtags");
if(!(arr.isEmpty())){
for(int i = 0; i< arr.length();i++){
hashtag.add(arr.getJSONObject(i).get("text").toString());
}
System.out.println(arr);
}
}
out.collect(new Tweet(id, text,created_at,source,lang,isRT,hashtag));
}catch (JSONException | ParseException e){
System.out.println("e");
throw e;
}
}
}

How to store data from a custom Map in Redis?

I'm storing data in a Map and fetching the field values from it. Now, instead of the Map, I want to store the data in Redis. I'm new to Redis and am using the code below to store it:
public class CachingRedis {
private static HashMap<String, UserFields> Cache;
private static JedisPool pool = null;
private static final String redisHost = "localhost";
private static final Integer redisPort = 6379;
static Jedis jedis = null;
static User u;
public CachingRedis() {
pool = new JedisPool(redisHost, redisPort);
this.Cache = new HashMap<String, UserFields>();
}
public static void main(String[] args) throws ExecutionException {
CachingRedis gt = new CachingRedis();
gt.addSets();
}
private void addSets() {
InputStream in = ClassLoader.class.getResourceAsStream("/users.csv");
u = new User(in);
String[] consideredUserFields = { "Area","FirstName","LastName","Contact","Level"};
List<String[]> users = p.getUsers();
jedis = pool.getResource();
int count1 = 0;
String token = null;
String fieldName = null;
String fieldVal = null;
for (int i = 0; i < users.size(); i++) {
UserFields uFields = new UserFields();
String tmpId = Integer.toString(p.getUserId(i));
String[] tmpFields = Users.get(i);
for (int j = 0; j < tmpFields.length; j++) {
fieldName = consideredUsersFields[j];
fieldVal = tmpFields[j];
if (Cache != null && Cache.containsKey(tmpId)) {
uFields = Cache.get(tmpId);
uFields.setFieldKeyValues(fieldName, fieldVal);
Cache.put(tmpId, uFields);
jedis.hsetnx(tmpId, fieldName, fieldVal);
} else {
uFields.setFieldKeyValues(fieldName, fieldVal);
Cache.put(tmpId, pFields);
jedis.hsetnx(tmpId, fieldName, fieldVal);
}
}
}
} }
I'm getting the following error
Exception in thread "main" redis.clients.jedis.exceptions.JedisDataException: WRONGTYPE Operation against a key holding the wrong kind of value
at redis.clients.jedis.Protocol.processError(Protocol.java:117)
at redis.clients.jedis.Protocol.process(Protocol.java:142)
at redis.clients.jedis.Protocol.read(Protocol.java:196)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:288)
at redis.clients.jedis.Connection.getIntegerReply(Connection.java:213)
at redis.clients.jedis.Jedis.lpush(Jedis.java:814)
at net.cintell.CachingRedis.addSets(CachingRedis.java:85)
at net.cintell.CachingRedis.main(CachingRedis.java:48)
Can anyone tell me what I am doing wrong? I want to store the CSV file in Redis and fetch the respective fields.
public class CachingRedis {
private static HashMap<String, UserFields> Cache;
private static JedisPool pool = null;
private static final String redisHost = "localhost";
private static final Integer redisPort = 6379;
static Jedis jedis = null;
static User u;
public CachingRedis() {
pool = new JedisPool(redisHost, redisPort);
this.Cache = new HashMap<String, UserFields>();
}
public static void main(String[] args) throws ExecutionException {
CachingRedis gt = new CachingRedis();
gt.addSets();
}
private void addSets() {
InputStream in = ClassLoader.class.getResourceAsStream("/users.csv");
u = new User(in);
String[] consideredUserFields = { "Area","FirstName","LastName","Contact","Level"};
List<String[]> users = p.getUsers();
jedis = pool.getResource();
int count1 = 0;
String token = null;
String fieldName = null;
String fieldVal = null;
for (int i = 0; i < users.size(); i++) {
UserFields uFields = new UserFields();
String tmpId = Integer.toString(p.getUserId(i));
String[] tmpFields = Users.get(i);
for (int j = 0; j < tmpFields.length; j++) {
fieldName = consideredUsersFields[j];
fieldVal = tmpFields[j];
if (Cache != null && Cache.containsKey(tmpId)) {
uFields = Cache.get(tmpId);
uFields.setFieldKeyValues(fieldName, fieldVal);
Cache.put(tmpId, uFields);
} else {
uFields.setFieldKeyValues(fieldName, fieldVal);
Cache.put(tmpId, pFields);
}
}
}
Map<String, String> Properties = new HashMap<String, String>();
for (Map.Entry<String, PersonaFields> entry : Cache.entrySet()) {
Properties.put("Id", entry.getKey());
Properties.put("Area", entry.getValue()
//rest of the fields
jedis.hmset("Users"+ entry.getKey(), Properties);
}
} }
I loaded the entire cache map into Redis by copying each entry's key and field values into another map and writing that map with hmset, so that I can retrieve the fields from Redis by the same key.

Java - JSONObject Parsing only 1 string?

I'm fairly new to JSON parsing in Java, but when I try to parse this JSON string and read its "id" values, it returns the same one twice.
[
{"id":"{ID1}","time":123},
{"id":"{ID2}","time":124}
]
This is my Java code:
// v = json string, c = "id"
String output = v.replace("[", "").replace("]", "");
JSONObject obj = new JSONObject(output);
ArrayList<String> list = new ArrayList<String>();
for(int i = 0 ; i < obj.length(); i++){
System.out.println(obj.getString(c));
list.add(obj.getString(c));
}
return list.get(1);
It returns ID1 twice or more. Please help.
Your JSON represents an array - so that's how you should parse it. You can then easily get the id property from each JSONObject within the array. For example:
import org.json.*;
public class Test {
public static void main(String[] args) throws JSONException {
String json =
"[{\"id\":\"{ID1}\",\"time\":123}, {\"id\":\"{ID2}\",\"time\":124}]";
JSONArray array = new JSONArray(json);
for (int i = 0; i < array.length(); i++) {
JSONObject o = array.getJSONObject(i);
System.out.println(o.getString("id"));
}
}
}
Output:
{ID1}
{ID2}
I fixed my code by parsing it as a JSONArray (thanks @HotLicks):
JSONArray obj = new JSONArray(v);
ArrayList<String> list = new ArrayList<String>();
for(int i = 0 ; i < obj.length(); i++){
Logger.WriteOutput(obj.getJSONObject(i).getString(c), Logger.LogLevel.Info);
}
Try this:
// Stripping the brackets is unnecessary; parse the original string v as an array.
// String output = v.replace("[", "").replace("]", "");
JSONArray arr = new JSONArray(v);
ArrayList<String> list = new ArrayList<String>();
for(int i = 0 ; i < arr.length(); i++){
System.out.println(arr.getJSONObject(i).getString(c));
list.add(arr.getJSONObject(i).getString(c));
}
First, create a Java bean for your JSON, for example:
public class Item {
@JsonProperty("id")
private String id;
@JsonProperty("time")
private Integer time;
public final String getId() {
return id;
}
public final void setId(String id) {
this.id = id;
}
public final Integer getTime() {
return time;
}
public final void setTime(Integer time) {
this.time = time;
}
}
If you are using the Jackson JSON processor, you can create a List from the JSON string this way:
ObjectMapper objectMapper = new ObjectMapper();
try {
List<Item> items = objectMapper.readValue(
yourJSONString,
objectMapper.getTypeFactory().constructCollectionType(List.class, Item.class));
for (Item item : items) {
System.out.println(item.getId());
}
} catch (IOException e) {
e.printStackTrace();
}
Use the code below:
String v = "[{\"id\":\"ID1\",\"time\":123},{\"id\":\"ID2\",\"time\":124}]";
String c = "id";
JSONArray obj = null;
try {
obj = new JSONArray(v);
ArrayList<String> list = new ArrayList<String>();
for (int i = 0; i < obj.length(); i++) {
JSONObject j = (JSONObject) obj.get(i);
System.out.println(j.getString(c));
list.add(j.getString(c));
}
} catch (JSONException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Note that I have also slightly changed the JSON structure.
Before:
[
{"id":"{ID1}","time":123},
{"id":"{ID2}","time":124}
]
After:
[
{"id":"ID1","time":123},
{"id":"ID2","time":124}
]

No value for "Invoice"

Does anyone know why my "invoice" has no value? I checked against the PHP response, and the key is called invoice. The error occurs at this line: String invoice = jtransaction.getString("invoice");
public static ArrayList<Transaction> getMemberTransactions(String memberId)
{
String url= second_URL + "get_member_transactions.php";
String method = GET;
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair("member_id", memberId));
JSONObject result = makeHttpRequest(url, method, params);
try {
if (result.getInt("success") == 1) {
ArrayList<Transaction> list = new ArrayList<Transaction>();
JSONArray jItems = result.getJSONArray("transaction_info");
int count = jItems.length();
for (int i = 0; i < count; i++) {
JSONObject jtransaction = jItems.getJSONObject(i);
SimpleDateFormat sdf = new SimpleDateFormat(DATE_FORMAT,
Locale.US);
Date date = null;
try {
date = sdf.parse(jtransaction.getString("date"));
} catch (ParseException e) {
e.printStackTrace();
}
String invoice = jtransaction.getString("invoice");
String warehouse = jtransaction.getString("warehouse");
Transaction transaction = new Transaction(date,invoice, warehouse);
list.add(transaction);
}
return list;
}
} catch (JSONException e) {
e.printStackTrace();
}
return null;
}
PHP:
while ($row = sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC))
{
$transactionInfo[]["date"] = get_date($row['Transaction_Date']);
$transactionInfo[]["invoice"] = $row['Invoice_No'];
$transactionInfo[]["warehouse"] = $row['WarehouseName'];
}
if(!empty($transactionInfo)===true)
{
response_success($transactionInfo);
}
function response_success($transactionInfo) {
$response = array();
$response["success"] = 1;
$response["transaction_info"] = $transactionInfo;
echo json_encode($response);
exit;
}
This:
$transactionInfo[]["date"] = get_date($row['Transaction_Date']);
$transactionInfo[]["invoice"] = $row['Invoice_No'];
$transactionInfo[]["warehouse"] = $row['WarehouseName'];
will create three separate items in $transactionInfo, because every [] appends a new element: one containing date, one with invoice and one with warehouse.
Example:
array(3) {
[0]=>
array(1) {
["date"]=>
string(10) "2014-10-20"
}
[1]=>
array(1) {
["invoice"]=>
string(5) "08/15"
}
[2]=>
array(1) {
["warehouse"]=>
int(13)
}
}
I suppose you want them in one item, so you have to build it like this:
$item["date"] = get_date($row['Transaction_Date']);
$item["invoice"] = $row['Invoice_No'];
$item["warehouse"] = $row['WarehouseName'];
// now add the item to the array
$transactionInfo[] = $item;
Example:
array(1) {
[0]=>
array(3) {
["date"]=>
string(10) "2014-10-20"
["invoice"]=>
string(5) "08/15"
["warehouse"]=>
int(13)
}
}
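From the Java client's point of view, the difference between the two shapes can be sketched with plain collections standing in for the decoded transaction_info array. With the split shape, the first element carries only date, so a lookup of invoice on that element finds nothing, which is consistent with the "No value for invoice" error. The class and method names here (TransactionShapeSketch, splitShape, groupedShape) are hypothetical, and the sample values are the ones from the dumps above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain Java collections stand in for the decoded "transaction_info" JSON array.
final class TransactionShapeSketch {

    private static Map<String, String> item(String key, String value) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    // Shape produced by three separate "[]" appends per row: three one-key items.
    static List<Map<String, String>> splitShape(String date, String invoice, String warehouse) {
        List<Map<String, String>> items = new ArrayList<>();
        items.add(item("date", date));
        items.add(item("invoice", invoice));
        items.add(item("warehouse", warehouse));
        return items;
    }

    // Shape produced by building the item first and appending it once: one three-key item.
    static List<Map<String, String>> groupedShape(String date, String invoice, String warehouse) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("date", date);
        row.put("invoice", invoice);
        row.put("warehouse", warehouse);
        List<Map<String, String>> items = new ArrayList<>();
        items.add(row);
        return items;
    }

    public static void main(String[] args) {
        // Only the grouped shape has "invoice" available on its first element.
        System.out.println(splitShape("2014-10-20", "08/15", "13").get(0).containsKey("invoice"));
        System.out.println(groupedShape("2014-10-20", "08/15", "13").get(0).containsKey("invoice"));
    }
}
```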

Passing values in runnable class constructor in Java

I hope somebody can help me identify where I am going wrong with the code below.
public class CCFileImpl implements CCFileAbs {
private LogMe logMe = null;
private ExecutorService ccfileimpl_exsc = null;
private CCProcessorImpl cProc = null;
private DataUtil dUtil = null;
public CCFileImpl() {
this.logMe = LogMe.getLogger();
if (dUtil == null) {
dUtil = new DataUtil();
}
}
@Override
public void getFilesForProcess() {
CCHeader cHead = null;
Future future = null;
String sPath = PropReader.getPropValue(PropReader.FILEDIR); //D:\samples\
int iCtr = 0;
ccfileimpl_exsc = Executors.newFixedThreadPool(Integer.parseInt(PropReader.getPropValue(PropReader.TPool_File)));
Date dToday = new Date();
SimpleDateFormat sdf = new SimpleDateFormat("MM/dd/yyyy");
Iterator iter = dUtil.getFilesForProcess(sdf.format(dToday)).iterator();
String sFileGroupName = "", sFileName = "";
String sId = null; //"testFiles";
while (iter.hasNext()) {
cHead = (CCHeader) iter.next();
sFileName = cHead.getsFileName(); //(String) iter.next();
sId = cHead.getsId();
sFileGroupName = sFileName + "_" + iCtr++;
dUtil.updTPDHDRStatusById(sId); //Interface utility class // <=== And also here, when trying to update the db
// nothing happened.
cProc = new CCProcessorImpl(sId, sFileGroupName, sPath, sFileName); // <=== Problem is here?
future = ccfileimpl_exsc.submit(cProc);
}
ccfileimpl_exsc.shutdown();
}
}
The code above retrieves the files for processing, assigns each one to a Runnable class (below), and submits it to an ExecutorService.
Now I can't understand why the values passed to the constructor (below) are set to null or blank, and only the sPath variable has a definite value.
public class CCProcessorImpl implements Runnable{
private CCFileParser rpsCCParser;
private ExecutorService ccprocimpl_exsc;
private static LogMe logMe;
private final String sGroupName;
private final String sId;
private final String sFileName;
@Override
public void run() {
this.parseFiles(sId, sFileName);
}
public CCProcessorImpl(String sId, String sGroupName, String sPath, String sFileName) {
this.logMe = LogMe.getLogger();
this.sId = sId;
this.sGroupName = sGroupName;
this.sFileName = sPath + sFileName;
}
public void parseFiles(String sId, String sFileName) {
try {
Future future = null;
rpsCCParser = new CCFileParser(sId, sFileName);
ArrayList aList = rpsCCParser.getFileContent();
String sGroupName = sId + "_";
ccprocimpl_exsc = Executors.newFixedThreadPool(Integer.parseInt(PropReader.getPropValue(PropReader.TPool_Content)));
int iStart = 0, iSize = 9, iEnd = iSize;
for (int iCtr = 0; iCtr <= ((aList.size() / 10) - 1); iCtr++, iStart += iSize, iEnd += iSize) {
future = ccprocimpl_exsc.submit(new CCUpdater(aList.subList(iStart, iEnd), sGroupName + iCtr));
}
future.get();
ccprocimpl_exsc.shutdown();
} catch (ExecutionException e) {
throw new RuntimeException(e);
} catch (InterruptedException ie) {
throw new RuntimeException(ie);
}
}
}
As a supplementary question: why were no updates performed when I tried to update the DB table? Could this be related to running in a threaded environment?
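Why don't you use the futures returned by ccfileimpl_exsc.submit()?
Instead, you call ccfileimpl_exsc.shutdown() right after submitting the jobs and never wait for them. shutdown() only stops the pool from accepting new tasks; it does not block until the submitted ones finish, so getFilesForProcess() returns while they may still be running. Any exception thrown inside a task is also stored in its Future and stays invisible unless you call get() on it.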
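A minimal sketch of keeping the futures and waiting on them, using only java.util.concurrent. The class name SubmitAndWaitSketch, the method processAll, and the string-returning task are hypothetical stand-ins for the real per-file work, and the one-minute timeout is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class SubmitAndWaitSketch {

    // Submits one task per file name and blocks until every task has completed.
    static List<String> processAll(List<String> fileNames, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : fileNames) {
                // Hypothetical stand-in for the real per-file processing.
                futures.add(pool.submit(() -> "processed:" + name));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                // get() blocks until the task is done and rethrows a task failure
                // wrapped in an ExecutionException.
                results.add(f.get());
            }
            return results;
        } finally {
            // No new tasks are accepted; tasks already submitted are allowed to finish.
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(Arrays.asList("a.txt", "b.txt"), 2));
    }
}
```

Calling get() on every future also surfaces exceptions that would otherwise be swallowed, which can help when a task appears to do nothing.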
