I am writing a program that takes in a file and extracts data from a single string within it. I run into a problem when I try to separate the substrings the way I want: the goal is to split the larger chunks of the line apart from one another without splitting the smaller, comma-separated chunks inside each larger chunk.
An example of the file contents would be this (although it is slightly long, my files vary from short lists like this up to 50 or even 100 blocks of item sets):
{"timeStamp":1477474644345,"itemSets":[{"mode":"any","sortrank":4999,"type":"custom","priority":false,"isGlobalForMaps":true,"uid":"LOL_D957E9EC-39E4-943E-C55E-52B63E05D99C","isGlobalForChampions":false,"associatedMaps":[],"associatedChampions":[40],"blocks":[{"type":"starting","items":[{"id":"3303","count":1},{"id":"2031","count":1},{"id":"1082","count":1},{"id":"3340","count":1},{"id":"3363","count":1},{"id":"2043","count":1},{"id":"3364","count":1}]},{"type":"Support Build Items","items":[{"id":"2049","count":1},{"id":"1001","count":1},{"id":"3165","count":1},{"id":"3117","count":1},{"id":"2301","count":1},{"id":"3089","count":1},{"id":"3135","count":1},{"id":"3504","count":1}]},{"type":"AP Build Items","items":[{"id":"3165","count":1},{"id":"3020","count":1},{"id":"3089","count":1},{"id":"3135","count":1},{"id":"3285","count":1},{"id":"3116","count":1}]},{"type":"Other Items (Situational Items)","items":[{"id":"3026","count":1},{"id":"3285","count":1},{"id":"3174","count":1},{"id":"3001","count":1},{"id":"3504","count":1}]}],"title":"Janna Items","map":"any"},{"mode":"any","sortrank":0,"type":"custom","priority":false,"isGlobalForMaps":false,"uid":"LOL_F265D25A-EA44-5B86-E37A-C91BD73ACB4F","isGlobalForChampions":true,"associatedMaps":[10],"associatedChampions":[],"blocks":[{"type":"Searching","items":[{"id":"3508","count":1},{"id":"3031","count":1},{"id":"3124","count":1},{"id":"3072","count":1},{"id":"3078","count":1},{"id":"3089","count":1}]}],"title":"TEST","map":"any"}]}
The code I have written so far attempts to separate this into meaningful chunks:
cutString = dataFromFile.substring(dataFromFile.indexOf("itemSets\":") + 11, dataFromFile.indexOf("},{"));
stringContinue = dataFromFile.substring(cutString.length());
while (stringContinue.contains("},{"))
{
    // Do string manipulation to cut every part and re-attach it, then re-check to find if this ("},{\"id") is not there
    if (stringContinue.contains("},{\"id"))
    {
        //if(stringContinue.equals(anObject))
        cutString = cutString + stringContinue.substring(0, stringContinue.indexOf("},{\"id"));
    }
    else if (stringContinue.contains("},{\"count"))
    {
        cutString = cutString + stringContinue.substring(0, stringContinue.indexOf("},{\"count"));
    }
    else if (stringContinue.contains("},{"))
    {
        cutString = cutString + stringContinue.substring(0, stringContinue.indexOf("},{"));
    }
    stringContinue = stringContinue.substring(cutString.length());
    // Check if we see a string pattern that is the cut-off point
    //if()
    //System.out.println(stringContinue);
    System.out.println(cutString);
}
But when I run it, I get an output like this:
{"mode":"any","sortrank":4999,"type":"custom","priority":false,"isGlobalForMaps":true,"uid":"LOL_D957E9EC-39E4-943E-C55E-52B63E05D99C","isGlobalForChampions":false,"associatedMaps":[],"associatedChampions":[40],"blocks":[{"type":"starting","items":[{"id":"3303","count":1arting","items":[{"id":"3303","count":1
The output I want to achieve is this:
{"mode":"any","sortrank":4999,"type":"custom","priority":false,"isGlobalForMaps":true,"uid":"LOL_D957E9EC-39E4-943E-C55E-52B63E05D99C","isGlobalForChampions":false,"associatedMaps":[],"associatedChampions":[40],"blocks":[{"type":"starting","items":[{"id":"3303","count":1},{"id":"2031","count":1},{"id":"1082","count":1},{"id":"3340","count":1},{"id":"3363","count":1},{"id":"2043","count":1},{"id":"3364","count":1}]},{"type":"Support Build Items","items":[{"id":"2049","count":1},{"id":"1001","count":1},{"id":"3165","count":1},{"id":"3117","count":1},{"id":"2301","count":1},{"id":"3089","count":1},{"id":"3135","count":1},{"id":"3504","count":1}]},{"type":"AP Build Items","items":[{"id":"3165","count":1},{"id":"3020","count":1},{"id":"3089","count":1},{"id":"3135","count":1},{"id":"3285","count":1},{"id":"3116","count":1}]},{"type":"Other Items (Situational Items)","items":[{"id":"3026","count":1},{"id":"3285","count":1},{"id":"3174","count":1},{"id":"3001","count":1},{"id":"3504","count":1}]}],"title":"Janna Items","map":"any"}
{"mode":"any","sortrank":0,"type":"custom","priority":false,"isGlobalForMaps":false,"uid":"LOL_F265D25A-EA44-5B86-E37A-C91BD73ACB4F","isGlobalForChampions":true,"associatedMaps":[10],"associatedChampions":[],"blocks":[{"type":"Searching","items":[{"id":"3508","count":1},{"id":"3031","count":1},{"id":"3124","count":1},{"id":"3072","count":1},{"id":"3078","count":1},{"id":"3089","count":1}]}],"title":"TEST","map":"any"}
So my question is: how do I find the point where I can separate the blocks without Java matching the same pattern that separates the smaller chunks? Basically, I am looking for this pattern ("},{"), but not this ("},{\"id") or this ("},{\"count"). Does the String class offer any other functionality along these lines that I am not aware of?
Edit: Although a JSON parser would make this kind of problem easy and convenient, it raises another issue: the program would then only accept JSON files. This question is more about string manipulation, and about finding a part of the string that can separate the large blocks of information without touching (or changing as little as possible) the smaller blocks that are separated the same way. So far, regex and splitting the string to re-attach later seem to be the way to go, unless there is a more clear-cut answer.
You could split the string into an array based on a regex, like this:
//fileString is the String you get from your file
String[] chunksIWant = fileString.split("\\},\\{");
This returns the String array chunksIWant, split into the chunks you want. The split consumes the delimiter itself, in this case "},{", so if you need those characters for some reason you will have to add them back afterwards.
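If you only want to split between the large blocks (your edit's exact problem), a lookahead can make the regex more selective. A minimal sketch, assuming (as in your sample data) that every item-set block begins with the "mode" key:

    // Split only at "},{" boundaries immediately followed by "mode", so the
    // inner "},{\"id" and "},{\"type" separators are left untouched. The
    // lookahead is not consumed, but the "},{" itself still is, as above.
    String[] blocks = fileString.split("\\},\\{(?=\"mode\")");

You would still trim the leading {"timeStamp":..."itemSets":[ prefix and the trailing }]} from the first and last chunks, as your existing code does.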
The data you are getting from the file is in JSON format.
So when you receive it on the Java side, use a JSON parser to convert the data into a JsonArray.
Then you can iterate over that JsonArray, fetching each element as a JsonObject by name.
You can then read the values of each JsonObject as required.
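A minimal sketch of that approach using Gson 2.8.6+ (one of several JSON libraries; any would work similarly):

    import com.google.gson.JsonArray;
    import com.google.gson.JsonElement;
    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;

    public class ItemSetReader {
        public static void main(String[] args) {
            String dataFromFile = "{\"timeStamp\":0,\"itemSets\":[]}"; // file contents go here
            JsonObject root = JsonParser.parseString(dataFromFile).getAsJsonObject();
            JsonArray itemSets = root.getAsJsonArray("itemSets");
            for (JsonElement element : itemSets) {
                // Each element is one of the large blocks you were splitting out.
                JsonObject itemSet = element.getAsJsonObject();
                System.out.println(itemSet.get("title").getAsString());
            }
        }
    }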
I am trying to implement an RTSP server in Java for fun. Since I have no prior knowledge of RTSP, I am starting by analyzing already-written source code that I found on the internet.
Links:
http://nsl.cs.sfu.ca/teaching/09/820/streaming/Client.html
http://nsl.cs.sfu.ca/teaching/09/820/streaming/Server.html
http://nsl.cs.sfu.ca/teaching/09/820/streaming/VideoStream.html
http://nsl.cs.sfu.ca/teaching/09/820/streaming/RTPpacket.html
For this post, my question is about VideoStream.java. It has a method like the one below:
public int getnextframe(byte[] frame) throws Exception
{
    int length = 0;
    String length_string;
    byte[] frame_length = new byte[5];

    // read current frame length
    fis.read(frame_length, 0, 5);

    // transform frame_length to integer
    length_string = new String(frame_length);
    length = Integer.parseInt(length_string);

    return fis.read(frame, 0, length);
}
As you can see, it converts the byte[] to a String and then to an Integer. However, in my experience the String came out looking like hex, so I changed it like this:
Integer.parseInt(length_string.trim(), 16);
It looks OK sometimes, but other times it throws a NumberFormatException.
When I print the length_string variable, the console shows things like iso2a, vc1mp, 41���....
I cannot figure out what I am missing here. Can you explain the purpose of these lines?
length_string = new String(frame_length);
length = Integer.parseInt(length_string);
P.S. If anyone knows of a full implementation of this code, or of other samples that do not use extra third-party libs, it would be a big help for me.
I am answering my own question to close it.
Each video type has its own data structure. In this case, I set the data type for 'MJPEG' but loaded an 'MPEG' file, so the byte array came out looking like garbage.
FYI, MJPEG is just a pack of JPEGs, so every frame is a key frame.
Another type I handled before, for instance a TS file, has I-frames and B-frames: the former gives a clear image since it is a key frame, while the latter looks broken since it is not.
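For reference, here is a commented sketch of what those two lines are doing, assuming (as with the MJPEG sample files this code was written for) that each frame is stored as a 5-character ASCII decimal length followed by the raw JPEG bytes:

    public int getnextframe(byte[] frame) throws Exception
    {
        byte[] frame_length = new byte[5];

        // Read the 5-byte ASCII header, e.g. "01024" for a 1024-byte frame.
        fis.read(frame_length, 0, 5);

        // Interpret the header as decimal text. On anything other than a file
        // in this exact layout (e.g. an MPEG file), these 5 bytes are arbitrary
        // binary data and parseInt throws NumberFormatException.
        int length = Integer.parseInt(new String(frame_length));

        // Read exactly one JPEG frame into the caller's buffer.
        return fis.read(frame, 0, length);
    }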
I'm making a highscore implementation for my game. Here's what I want to do:
I have an object Score which contains a String name and an Integer score.
Now:
if the name isn't already in the file, add it;
if the name is in the file, take the String after the space and convert it into an integer, so I get the stored score.
Now, if the new score is better than the stored one, I have to OVERWRITE it in the file...
and here's my problem: how can I do that? How can I write a string exactly over another one at a certain point in the file?
Generally it's considered too fiddly to replace text in text files for this kind of requirement, so the usual way to do it is just to read in the whole file, make the replacement and write a new version of the whole file. If you have large amounts of data you would use a database or a NoSQL solution instead.
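A minimal sketch of that read-modify-rewrite approach, assuming one "name score" pair per line (the file name and layout are assumptions):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class HighscoreFile {
        // Rewrites the whole file, keeping the better of the old and new scores.
        public static void updateScore(Path file, String name, int newScore) throws IOException {
            Map<String, Integer> scores = new LinkedHashMap<>();
            if (Files.exists(file)) {
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    String[] parts = line.split(" ");
                    scores.put(parts[0], Integer.parseInt(parts[1]));
                }
            }
            scores.merge(name, newScore, Math::max);
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, Integer> e : scores.entrySet()) {
                lines.add(e.getKey() + " " + e.getValue());
            }
            Files.write(file, lines, StandardCharsets.UTF_8); // replaces the old file
        }
    }

Usage would be something like updateScore(Paths.get("highscores.txt"), "joe", 10);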
P.S. Consider using serialization; it can make things easier.
I have object Score which contains String Name and Integer score.
Use a Properties file for this instead. It provides an easy interface to load & save the file, get the keys (Name) and set or retrieve values (Score). String values are stored, but they can be converted to/from integer easily.
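A minimal sketch, assuming a file named highscores.properties (the name is arbitrary):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Properties;

    public class HighscoreProps {
        public static void main(String[] args) throws IOException {
            Properties scores = new Properties();
            File file = new File("highscores.properties");
            if (file.exists()) {
                try (FileInputStream in = new FileInputStream(file)) {
                    scores.load(in); // loads name=score pairs
                }
            }
            String name = "joe";
            int newScore = 42;
            int oldScore = Integer.parseInt(scores.getProperty(name, "0"));
            if (newScore > oldScore) {
                scores.setProperty(name, Integer.toString(newScore));
                try (FileOutputStream out = new FileOutputStream(file)) {
                    scores.store(out, "high scores"); // rewrites the whole file
                }
            }
        }
    }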
I concur that this is best done by fully re-serializing the entire database. On modern computers you can push 30MB/s per disk for linear writes (more if there's sufficient cache), and if you're dealing with more than 30MB of data you REALLY need a DB. HSQLDB, Derby and BerkeleyDB are trivial embedded DBs, or go all the way to postgres/mysql.
However, the way to overwrite a FIXED sized section of an existing file (or rather, the way to emulate doing so) is to use:

RandomAccessFile raf = new RandomAccessFile(fileName, "rw");
try {
    raf.seek(position);
    raf.writeInt(newScore);
} finally {
    raf.close();
}
Note the use of writeInt instead of raf.write(Integer.toHexString(newScore).getBytes()), because you really, really need the write to be fixed in size.
Now, if the text file is intrinsically ASCII (e.g. humans will read the file), the value can't be binary. Perhaps you could keep it a hex string (because that will be fixed in size), or you could use a zero-padded decimal string.
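For example (a sketch; the six-digit width is an arbitrary choice that must be large enough for your biggest score):

    // Always write exactly 6 ASCII digits so an in-place overwrite
    // never changes the file length.
    String padded = String.format("%06d", newScore); // 42 -> "000042"
    raf.seek(position);
    raf.write(padded.getBytes(StandardCharsets.US_ASCII));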
But what you absolutely, positively cannot do is grow the string by one byte.
So:
bob 15
joe 7
nina 981
You can't replace joe's score with 10, UNLESS you've padded it with a bunch of spaces.
If this is your data file, then you will absolutely have to rewrite the whole file (even if you write the extra code to rewrite only from the point of change onward - statistically that will be 50% of the file, and thus not worth the bother).
One other thing: if you do rewrite, you run the risk of shortening the file. For that you need to call
raf.setLength(0);
before writing the first byte. Otherwise you'll see phantom text beyond the end of your new file.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final Map<String, Long> symbol2Position = new ConcurrentHashMap<>();
final Map<String, Integer> symbol2Score = new ConcurrentHashMap<>();
final String fileName;
final RandomAccessFile raf;

// ... skipped code

void storeFull() throws IOException {
    // Rewrite the whole file, remembering where each fixed-width score lives.
    raf.seek(0);
    raf.setLength(0);
    for (Map.Entry<String, Integer> e : symbol2Score.entrySet()) {
        raf.write(e.getKey().getBytes(StandardCharsets.US_ASCII));
        raf.write(',');
        symbol2Position.put(e.getKey(), raf.getFilePointer());
        raf.write(String.format("%06d", e.getValue()).getBytes(StandardCharsets.US_ASCII));
        raf.write('\n');
    }
}

void updateScore(String key, int newScore) throws IOException {
    symbol2Score.put(key, newScore);
    Long pos = symbol2Position.get(key);
    if (pos != null) {
        // Known key: overwrite the fixed-width score in place.
        raf.seek(pos);
        raf.write(String.format("%06d", newScore).getBytes(StandardCharsets.US_ASCII));
    } else {
        // New key: append a whole new record at the end of the file.
        raf.seek(raf.length());
        raf.write(key.getBytes(StandardCharsets.US_ASCII));
        raf.write(',');
        symbol2Position.put(key, raf.getFilePointer());
        raf.write(String.format("%06d", newScore).getBytes(StandardCharsets.US_ASCII));
        raf.write('\n');
    }
}
I'd probably rather use a DB or a binary map file, meaning a file with 8B per record: 4B pointing to the user-name position and 4B representing the score. But the above allows for a human-readable data file, while making updates faster than just rewriting the property file.
Check out LevelDB - the fastest damn DB on the planet for embedded systems. :) The main thing it has over the above is thousands/millions of updates per second without the random-seek-rewrite cost of updating 6 bytes randomly across a multi-GB file.
Just a thought: any specific reason for the file storage of names and scores?
Seems like a Map<String, Integer> would serve you much better...
I am working with a large set of data stored in HBase. Many of the values stored in my columns are actually "vectors" of data -- multiple values. The way I've set out to handle storing multiple values is through a ByteBuffer. Since I know the type of data stored in every column in my column families, I have written a series of classes extending a base class that wraps around ByteBuffer and gives me an easy set of methods for reading individual values as well as appending additional values to the end. I have tested this class independently of my HBase project and it works as expected.
In order to update my database (nearly every row is updated in each update), I use a TableMapper mapreduce job to iterate over every row in my database. Each of my mappers (there are six in my cluster) loads the entire update file (rarely more than 50MB) into memory and then updates each row ID as it iterates over it.
The problem I am encountering is that every time I pull a data value out of the Result object, it has 4 bytes appended to the end of it. This makes things difficult for my update because I am not sure whether to expect this "padding" to be an extra 4 bytes every time or whether it could balloon out to something larger/smaller. Since I am loading this into my ByteBuffer wrapper, it is important that there is no padding because that would cause there to be gaps in my data when I appended additional data points to it which would make it impossible to read them out later without error.
I've written up a test to confirm my hypothesis by creating a test table and class. The table only has one data point per column (a single double -- I have confirmed that the length of the bytes going in is 8) and I have written the following code to retrieve and examine it.
HTable table = new HTable("test");
byte[] rowId = Bytes.toBytes("myid");
Get get = new Get(rowId);
byte[] columnFamily = Bytes.toBytes("data");
byte[] column = Bytes.toBytes("column");
get.addColumn(columnFamily, column);

Result result = table.get(get);
byte[] value = result.value();
System.out.println("Value size: " + value.length);

double doubleVal = Bytes.toDouble(value);
System.out.println("Fetch yielded: " + doubleVal);

// Copy only the first 8 bytes, dropping the 4 trailing bytes.
byte[] test = new byte[8];
for (int i = 0; i < value.length - 4; i++)
    test[i] = value[i];

double dval = Bytes.toDouble(test);
System.out.println("dval: " + dval);

table.close();
Which results in:
Value size: 12
Fetch yielded: 0.3652
dval: 0.3652
The double values are what I expect, but the reported size shows the 4 extra bytes.
Any thoughts on how to tackle this problem? I'm aware of the existence of serialization engines like Avro but I'm trying to avoid using them for the time being and my data is so straightforward that I feel as though I shouldn't have to.
EDIT: I've moved on by truncating my data to the greatest multiple of my data type's size. In my experience, these extra bytes are exclusively appended to the end of my byte[] array. I've written a few classes that handle this automatically in a rather clean manner, but I'm still curious as to why this might be happening.
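A sketch of that truncation workaround (assuming the column holds only doubles, so the usable length is the largest multiple of 8):

    // value is the byte[] returned by result.value()
    int usable = (value.length / Bytes.SIZEOF_DOUBLE) * Bytes.SIZEOF_DOUBLE;
    byte[] trimmed = Arrays.copyOf(value, usable); // drops the trailing junk bytes
    double doubleVal = Bytes.toDouble(trimmed);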
I had a similar problem when importing data using MapReduce into HBase. There were junk bytes appended to my rowkeys, due to this code:
public class MyReducer extends TableReducer<Text, CustomWritable, Text> {

    protected void reduce(Text key, Iterable<CustomWritable> values, Context context)
            throws IOException, InterruptedException {
        // only get first value for the example
        CustomWritable value = values.iterator().next();
        Put put = new Put(key.getBytes());
        put.add(columnFamily, columnName, value.getBytes());
        context.write(key, put);
    }
}
The problem is that Text.getBytes() returns the actual byte array from the backend (see Text) and the Text object is reused by the MapReduce framework. So the byte array will have junk chars from previous values it held. This change fixed it for me:
Put put = new Put(Arrays.copyOf(key.getBytes(), key.getLength()));
If you're using Text as your value type in your job somewhere, it could be doing the same thing.
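For example (hypothetical, assuming the value is also a Text, or any other reused Writable that exposes getBytes() and getLength()):

    // Defensive copy of the value as well, trimming the reused backing array.
    put.add(columnFamily, columnName,
            Arrays.copyOf(value.getBytes(), value.getLength()));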
Is it a JDK 7 vs. JDK 6 issue? Are you running two different JVM versions?
Could be related to something a playorm user ran into:
https://github.com/deanhiller/playorm/commit/5e6ede13477a60c2047daaf1f7a7ce55550b0289
Dean