I am fetching data from RestUI (a REST API), which comes in as com.ow.vo.computer.ApplicationUsage, and fetching the corresponding data from the database. I then compare the contents field by field, as shown below:
public void compare(com.ow.vo.computer.ApplicationUsage src, ApplicationUsage dest) {
    if (!Objects.equals(src.getApplicationItemCost(), dest.getApplicationItemCost())) {
        dest.setApplicationItemCost(src.getApplicationItemCost());
    }
    if (!Objects.equals(src.getAvgUsageTime(), dest.getAvgUsageTime())) {
        // copy the value across, as above
    }
    if (!Objects.equals(src.getBundleApplicationId(), dest.getBundleApplicationId())) {
        // ...
    }
    if (!Objects.equals(src.getBundleApplicationName(), dest.getBundleApplicationName())) {
        // ...
    }
    if (!Objects.equals(src.getDiscoveredDate(), dest.getDiscoveredDate())) {
        // ... (converting String to Date where the types differ)
    }
    // ... and so on for every remaining property
}
Only if something has changed should the record be updated in the database; otherwise it should be left alone.
The structure of src and dest is almost the same, with differences in the date types: the source uses String while the destination uses Date.
Could anyone suggest a more elegant way, or a design pattern, to handle this situation instead of comparing each field one by one?
Business logic: load a chunk of 100 records from the database and fetch the same 100 records from the REST API. Compare each record one by one (all properties of that record). If there is no difference, do nothing; if there is a difference, merge it; if the record is new, create it; and if it no longer exists in the REST API, delete it from the database as well.
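To make that flow concrete, here is a rough sketch of the reconciliation loop. The map parameters and the hasDifference helper are hypothetical placeholders, and the actual create/merge/delete work is left as comments:
// Sketch only: assumes both sides have been loaded into maps keyed by record id (uses java.util.Map).
void reconcileChunk(Map<Long, com.ow.vo.computer.ApplicationUsage> restRecords,
                    Map<Long, ApplicationUsage> dbRecords) {
    for (Map.Entry<Long, com.ow.vo.computer.ApplicationUsage> e : restRecords.entrySet()) {
        ApplicationUsage dest = dbRecords.get(e.getKey());
        if (dest == null) {
            // new in the REST API -> create the record in the database
        } else if (hasDifference(e.getValue(), dest)) {   // hypothetical field-by-field check
            // difference -> merge src into dest and update the database row
        }
        // no difference -> do nothing
    }
    for (Long id : dbRecords.keySet()) {
        if (!restRecords.containsKey(id)) {
            // no longer present in the REST API -> delete from the database
        }
    }
}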
We are using Spark for file processing. We are processing pretty big files, each around 30 GB with about 40-50 million lines. These files are formatted, and we load them into a DataFrame. The initial requirement was to identify records matching certain criteria and load them into MySQL. We were able to do that.
The requirement changed recently: records not meeting the criteria are now to be stored in an alternate DB. This is causing issues, as the size of the collection is too big. We are trying to collect each partition independently and merge it into a list, as suggested here:
https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/dont_collect_large_rdds.html
We are not familiar with Scala, so we are having trouble converting this to Java. How can we iterate over the partitions one by one and collect them?
Thanks
Use df.foreachPartition to execute your logic for each partition independently; nothing is returned to the driver. You can save the matching results into the DB at the executor level. If you want to collect the results in the driver, use mapPartitions, which is not recommended for your case.
Please refer to the link below:
Spark - Java - foreachPartition
dataset.foreachPartition(new ForeachPartitionFunction<Row>() {
    @Override
    public void call(Iterator<Row> r) throws Exception {
        while (r.hasNext()) {
            Row row = r.next();
            System.out.println(row.getString(1));
        }
        // do your business logic and load into MySQL (see the sketch below).
    }
});
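If the goal is to save matching rows to MySQL at the executor level, the business-logic comment above could be filled in with a plain JDBC batch insert, roughly like this (the JDBC URL, credentials, table and column names are placeholders I made up, and the 1000-row flush interval is arbitrary):
// Needs java.sql.Connection, java.sql.DriverManager and java.sql.PreparedStatement,
// plus the MySQL JDBC driver available on the executors.
dataset.foreachPartition(new ForeachPartitionFunction<Row>() {
    @Override
    public void call(Iterator<Row> rows) throws Exception {
        // One connection per partition; URL, table and column are placeholders.
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://dbhost:3306/mydb", "user", "pass");
             PreparedStatement ps = conn.prepareStatement("INSERT INTO matched_records (col1) VALUES (?)")) {
            int count = 0;
            while (rows.hasNext()) {
                Row row = rows.next();
                ps.setString(1, row.getString(1));
                ps.addBatch();
                if (++count % 1000 == 0) {
                    ps.executeBatch();   // flush every 1000 rows
                }
            }
            ps.executeBatch();           // flush the remainder
        }
    }
});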
For mapPartitions:
// You could use Row itself here, but for clarity I am defining this.
public class ResultEntry implements Serializable {
    // define your df properties ..
}

Dataset<ResultEntry> mappedData = data.mapPartitions(new MapPartitionsFunction<Row, ResultEntry>() {
    @Override
    public Iterator<ResultEntry> call(Iterator<Row> it) {
        List<ResultEntry> filteredResult = new ArrayList<ResultEntry>();
        while (it.hasNext()) {
            Row row = it.next();
            if (somecondition) {
                filteredResult.add(convertToResultEntry(row));
            }
        }
        return filteredResult.iterator();
    }
}, Encoders.javaSerialization(ResultEntry.class));
Hope this helps.
Ravi
I want to create an application that shows a user how many times he has opened or used the software. For this I have written the code below, but it is not showing the correct output: the first time I run the application it shows 1, and the second time I run it it also shows 1.
public Founder() {
    initComponents();
    int c = 0;
    c++;
    jLabel1.setText("" + c);
    return;
}
I’m unsure whether I’m helping you or giving you a load of new problems and unanswered questions. The following will store the count of times the class Founder has been constructed in a file called useCount.txt in the program’s working directory (probably the root binary directory, where your .class files are stored). Next time you run the program, it will read the count from the file, add 1 and write the new value back to the file.
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

static final Path counterFile = FileSystems.getDefault().getPath("useCount.txt");

public Founder() throws IOException {
    initComponents();
    // read use count from file
    int useCount;
    if (Files.exists(counterFile)) {
        List<String> line = Files.readAllLines(counterFile);
        if (line.size() == 1) { // one line in file as expected
            useCount = Integer.parseInt(line.get(0));
        } else { // not the right file, ignore lines from it
            useCount = 0;
        }
    } else { // program has never run before
        useCount = 0;
    }
    useCount++;
    jLabel1.setText(String.valueOf(useCount));
    // write new use count back to file
    Files.write(counterFile, Arrays.asList(String.valueOf(useCount)));
}
It’s not the most elegant nor robust solution, but it may get you started. If you run the program on another computer, it will not find the file and will start counting over from 0.
When you are running your code the first time, the data related to it will be stored in your system's RAM. Then when you close your application, all the data related to it will be deleted from the RAM (for simplicity let's just assume it will be deleted, although in reality it is a little different).
Now when you are opening your application second time, new data will be stored in the RAM. This new data contains the starting state of your code. So the value of c is set to 0 (c=0).
If you want to remember the data, you have to store it in the permanent storage (your system hard drive for example). But I think you are a beginner. These concepts are pretty advanced. You should do some basic programming practice before trying such things.
Here you need to store it in permanent storage.
Refer to the Properties class to store data permanently (see the sketch below): https://docs.oracle.com/javase/7/docs/api/java/util/Properties.html
You can also use data files, e.g. *.txt or *.csv.
Serialization also provides a way of achieving persistent storage.
You can create a class that implements Serializable with a field for each piece of data you want to store. Then you can write the entire class out to a file, and you can read it back in later. Learn about serialization here: https://www.tutorialspoint.com/java/java_serialization.htm
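If you go down the Properties route mentioned above, a minimal sketch could look like this (the file name useCount.properties and the key useCount are just examples I picked):
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

public class UseCounter {
    private static final File FILE = new File("useCount.properties"); // example file name

    /** Reads the stored count, increments it, writes it back, and returns the new value. */
    public static int incrementAndGet() throws IOException {
        Properties props = new Properties();
        if (FILE.exists()) {
            try (FileInputStream in = new FileInputStream(FILE)) {
                props.load(in);
            }
        }
        int count = Integer.parseInt(props.getProperty("useCount", "0")) + 1;
        props.setProperty("useCount", String.valueOf(count));
        try (FileOutputStream out = new FileOutputStream(FILE)) {
            props.store(out, "application use counter");
        }
        return count;
    }
}
In the constructor you would then call something like jLabel1.setText(String.valueOf(UseCounter.incrementAndGet()));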
I have the following DoFn that more or less converts a Datastore Entity into a BigQuery TableRow, but I could not find any documentation or questions about it.
Problem No. 1 is: how do I automatically translate keys so that they are constructed in BigQuery the same way BigQuery constructs them when importing from a Datastore backup file?
Problem No. 2 is: how do I handle timestamps? The code below breaks the pipeline with the following message:
JSON object specified for non-record field: timestamp
Here is the code I wrote:
public class SensorObservationEntityToRowFn extends DoFn<Entity, TableRow> {
    /**
     * In this example, put the whole string into a single BigQuery field.
     */
    @Override
    public void processElement(ProcessContext c) {
        Map<String, Value> props = getPropertyMap(c.element());
        TableRow row = new TableRow();
        row.set("id", c.element().getKey().getPathElement(c.element().getKey().getPathElementCount() - 1).getId());
        if (
            props.get("property1") != null &&
            props.get("property2") != null
        ) {
            // Map data from the source Entity to the destination TableRow
            row.set("property1", props.get("property1").getStringValue());
            row.set("property2", props.get("property2").getStringValue());
        }
        row.set("source_type", props.get("source_type").getStringValue());
        DateTime dateTime = new DateTime(props.get("timestamp").getTimestampMicrosecondsValue() / 1000L);
        row.set("timestamp", dateTime);
        // Output new TableRow only if all data is present in the source
        c.output(row);
    }
}
My expectation was to find something in helper classes, but I was unsuccessful. Guess Google is still adding new bits to their APIs. Maybe in the next version.
The biggest problem is that the API is a little unintuitive and inconsistent with other parts. The Entity's key should have its own accessor method instead of requiring you to dig into the ancestor path like this (get the last element of the path array):
getKey().getPathElement(c.element().getKey().getPathElementCount()-1).getId()
The second problem is the timestamps, which are a little inelegant as well. I couldn't find anywhere in the documentation how to format a timestamp in Datastore or in BigQuery from the API point of view (data type, length of the field, its format, etc.). The solution that works for now requires a third-party library (Joda-Time):
import org.joda.time.DateTime;
import org.joda.time.format.ISODateTimeFormat;
And the data translation below. You have to remember that the value is in milliseconds in one place and in microseconds in another, which is another unnecessary source of confusion.
DateTime dateTime = new DateTime(props.get("timestamp").getTimestampMicrosecondsValue()/1000L);
row.set("timestamp", ISODateTimeFormat.dateTime().print(dateTime));
Hope this helps others working with Dataflow and moving data from one place to another.
I have a Couchbase cluster which has around 25M documents. I am able to read them sequentially and also I have a function that can read a specific number of documents from the database. But my use case is slightly different since I cannot store all the 25M documents (each document is huge) in memory.
I need to process the documents in batches, say 1M/batch, push that batch to my memory, (do some operation on those documents) and push the next batch.
The function I have written to read a specific number of documents doesn't ensure that it returns a different set of documents when called again.
Is there a way I can achieve this? I also have a function which can create documents in batches, and I am not sure if I can write a similar function that reads documents in batches. The function is given below.
public void createMultipleCustomerDocuments(String docId, Customer myCust, long numDocs) {
    Gson gson = new GsonBuilder().create();
    JsonObject content = JsonObject.fromJson(gson.toJson(myCust));
    JsonDocument document = JsonDocument.create(docId, content);
    jsonDocuments.add(document);
    documentCounter++;
    if (documentCounter == numDocs) {
        Observable.from(jsonDocuments).flatMap(new Func1<JsonDocument, Observable<JsonDocument>>() {
            public Observable<JsonDocument> call(final JsonDocument docToInsert) {
                return theBucket.async().upsert(docToInsert);
            }
        }).last().toBlocking().single();
        documentCounter = 0;
        //System.out.println("Batch counter: " + batchCounter++);
    }
}
Can someone please help me with this?
I would try to create a view containing all of the documents, and then query the view with skip and limit, as sketched below. (You can use the .startKey() and .startKeyDocId() functions instead of skip() to avoid the overhead.)
But remember not to keep that view in a production environment; it will be a CPU hog.
Another option is to use the DCP protocol to replicate the database into your app, but that is more work.
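As a rough sketch of the view-plus-paging idea (assuming SDK 2.x, and assuming a view named all_docs in a design document named customers that emits every document; both names are made up here):
import java.util.Iterator;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.view.ViewQuery;
import com.couchbase.client.java.view.ViewResult;
import com.couchbase.client.java.view.ViewRow;

public void processInBatches(Bucket bucket, int batchSize) {
    int offset = 0;
    while (true) {
        // Page through the view; design doc and view names are placeholders.
        ViewResult result = bucket.query(
                ViewQuery.from("customers", "all_docs").skip(offset).limit(batchSize));
        int rowsInBatch = 0;
        Iterator<ViewRow> rows = result.rows();
        while (rows.hasNext()) {
            ViewRow row = rows.next();
            // do some operation on row.document() here
            rowsInBatch++;
        }
        if (rowsInBatch < batchSize) {
            break; // last batch processed
        }
        offset += batchSize;
        // For deep pagination, startKey()/startKeyDocId() from the last row of the
        // previous batch avoids the growing cost of skip(), as mentioned above.
    }
}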
I am trying to get a grasp on Google App Engine programming and wonder what the difference between these two methods is - if there even is a practical difference.
Method A)
public Collection<Conference> getConferencesToAttend(Profile profile)
{
    List<String> keyStringsToAttend = profile.getConferenceKeysToAttend();
    List<Conference> conferences = new ArrayList<Conference>();
    for (String conferenceString : keyStringsToAttend)
    {
        conferences.add(ofy().load().key(Key.create(Conference.class, conferenceString)).now());
    }
    return conferences;
}
Method B)
public Collection<Conference> getConferencesToAttend(Profile profile)
{
    List<String> keyStringsToAttend = profile.getConferenceKeysToAttend();
    List<Key<Conference>> keysToAttend = new ArrayList<>();
    for (String keyString : keyStringsToAttend) {
        keysToAttend.add(Key.<Conference>create(keyString));
    }
    return ofy().load().keys(keysToAttend).values();
}
the "conferenceKeysToAttend" list is guaranteed to only have unique Conferences - does it even matter then which of the two alternatives I choose? And if so, why?
Method A loads entities one by one, while Method B does a bulk load, which is cheaper since you're making just one network round trip to Google's datacenter. You can observe this by measuring the time taken by both methods while loading a bunch of keys multiple times.
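A quick and dirty way to compare the two could look like the sketch below (assuming keysToAttend has been built as in Method B; the 10 iterations are arbitrary, and ofy().clear() is used here on the assumption that it drops the session cache so every pass really hits the datastore):
// Illustrative micro-benchmark only; not production code.
long start = System.nanoTime();
for (int i = 0; i < 10; i++) {
    ofy().clear();                               // assumption: drops the session cache
    ofy().load().keys(keysToAttend).values();    // bulk load: one round trip
}
long bulkNanos = System.nanoTime() - start;

start = System.nanoTime();
for (int i = 0; i < 10; i++) {
    ofy().clear();
    for (Key<Conference> key : keysToAttend) {
        ofy().load().key(key).now();             // one round trip per key
    }
}
long oneByOneNanos = System.nanoTime() - start;

System.out.printf("bulk: %d ms, one by one: %d ms%n",
        bulkNanos / 1000000, oneByOneNanos / 1000000);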
While doing a bulk load, you need to be cautious about which entities were actually loaded if the datastore operation throws an exception. The operation might succeed even when some of the entities were not loaded.
The answer depends on the size of the list. If we are talking about hundreds or more, you should not make a single batch. I couldn't find documentation on what the limit is, but there is a limit. If it is not that many, definitely go with loading one by one, but make the calls asynchronous by not using the now() function:
List<LoadResult<Conference>> conferences = new ArrayList<LoadResult<Conference>>();
conferences.add(ofy().load().key(Key.create(Conference.class, conferenceString)));
And when you need the actual data:
for (LoadResult<Conference> loadResult : conferences) {
    Conference c = loadResult.now();
    // ......
}