I am writing a program that takes in a file and extracts data from a single string within it. I run into a problem when I try to separate the substrings the way I want: the goal is to split the larger chunks of the line apart from one another without splitting the smaller, comma-separated chunks inside each larger chunk.
An example of the file contents would be this (although it is slightly long, my files vary from short lists like this up to 50 or even 100 blocks of item sets):
{"timeStamp":1477474644345,"itemSets":[{"mode":"any","sortrank":4999,"type":"custom","priority":false,"isGlobalForMaps":true,"uid":"LOL_D957E9EC-39E4-943E-C55E-52B63E05D99C","isGlobalForChampions":false,"associatedMaps":[],"associatedChampions":[40],"blocks":[{"type":"starting","items":[{"id":"3303","count":1},{"id":"2031","count":1},{"id":"1082","count":1},{"id":"3340","count":1},{"id":"3363","count":1},{"id":"2043","count":1},{"id":"3364","count":1}]},{"type":"Support Build Items","items":[{"id":"2049","count":1},{"id":"1001","count":1},{"id":"3165","count":1},{"id":"3117","count":1},{"id":"2301","count":1},{"id":"3089","count":1},{"id":"3135","count":1},{"id":"3504","count":1}]},{"type":"AP Build Items","items":[{"id":"3165","count":1},{"id":"3020","count":1},{"id":"3089","count":1},{"id":"3135","count":1},{"id":"3285","count":1},{"id":"3116","count":1}]},{"type":"Other Items (Situational Items)","items":[{"id":"3026","count":1},{"id":"3285","count":1},{"id":"3174","count":1},{"id":"3001","count":1},{"id":"3504","count":1}]}],"title":"Janna Items","map":"any"},{"mode":"any","sortrank":0,"type":"custom","priority":false,"isGlobalForMaps":false,"uid":"LOL_F265D25A-EA44-5B86-E37A-C91BD73ACB4F","isGlobalForChampions":true,"associatedMaps":[10],"associatedChampions":[],"blocks":[{"type":"Searching","items":[{"id":"3508","count":1},{"id":"3031","count":1},{"id":"3124","count":1},{"id":"3072","count":1},{"id":"3078","count":1},{"id":"3089","count":1}]}],"title":"TEST","map":"any"}]}
The code I have written so far attempts to separate this into meaningful chunks:
cutString = dataFromFile.substring(dataFromFile.indexOf("itemSets\":") + 11, dataFromFile.indexOf("},{"));
stringContinue = dataFromFile.substring(cutString.length());
while (stringContinue.contains("},{"))
{
    // Do string manipulation to cut every part and re-attach it, then re-check to find if this ("},{\"id") is not there
    if (stringContinue.contains("},{\"id"))
    {
        //if(stringContinue.equals(anObject))
        cutString = cutString + stringContinue.substring(0, stringContinue.indexOf("},{\"id"));
    }
    else if (stringContinue.contains("},{\"count"))
    {
        cutString = cutString + stringContinue.substring(0, stringContinue.indexOf("},{\"count"));
    }
    else if (stringContinue.contains("},{"))
    {
        cutString = cutString + stringContinue.substring(0, stringContinue.indexOf("},{"));
    }
    stringContinue = stringContinue.substring(cutString.length());
    // Check if we see a string pattern that is the cut-off point
    //if()
    //System.out.println(stringContinue);
    System.out.println(cutString);
}
But when I run it, I get an output like this:
{"mode":"any","sortrank":4999,"type":"custom","priority":false,"isGlobalForMaps":true,"uid":"LOL_D957E9EC-39E4-943E-C55E-52B63E05D99C","isGlobalForChampions":false,"associatedMaps":[],"associatedChampions":[40],"blocks":[{"type":"starting","items":[{"id":"3303","count":1arting","items":[{"id":"3303","count":1
The output I want to achieve is this:
{"mode":"any","sortrank":4999,"type":"custom","priority":false,"isGlobalForMaps":true,"uid":"LOL_D957E9EC-39E4-943E-C55E-52B63E05D99C","isGlobalForChampions":false,"associatedMaps":[],"associatedChampions":[40],"blocks":[{"type":"starting","items":[{"id":"3303","count":1},{"id":"2031","count":1},{"id":"1082","count":1},{"id":"3340","count":1},{"id":"3363","count":1},{"id":"2043","count":1},{"id":"3364","count":1}]},{"type":"Support Build Items","items":[{"id":"2049","count":1},{"id":"1001","count":1},{"id":"3165","count":1},{"id":"3117","count":1},{"id":"2301","count":1},{"id":"3089","count":1},{"id":"3135","count":1},{"id":"3504","count":1}]},{"type":"AP Build Items","items":[{"id":"3165","count":1},{"id":"3020","count":1},{"id":"3089","count":1},{"id":"3135","count":1},{"id":"3285","count":1},{"id":"3116","count":1}]},{"type":"Other Items (Situational Items)","items":[{"id":"3026","count":1},{"id":"3285","count":1},{"id":"3174","count":1},{"id":"3001","count":1},{"id":"3504","count":1}]}],"title":"Janna Items","map":"any"}
{"mode":"any","sortrank":0,"type":"custom","priority":false,"isGlobalForMaps":false,"uid":"LOL_F265D25A-EA44-5B86-E37A-C91BD73ACB4F","isGlobalForChampions":true,"associatedMaps":[10],"associatedChampions":[],"blocks":[{"type":"Searching","items":[{"id":"3508","count":1},{"id":"3031","count":1},{"id":"3124","count":1},{"id":"3072","count":1},{"id":"3078","count":1},{"id":"3089","count":1}]}],"title":"TEST","map":"any"}
So my question is: how do I find the point where I can separate the blocks without Java matching the same pattern that separates the smaller chunks? Basically, I am looking for this pattern ("},{"), but not this ("},{\"id") or this ("},{\"count"). Does the String class offer any other functionality along these lines that I am not aware of?
Edit: Although a JSON parser would make this kind of problem easy and convenient, it raises another issue: the program would then only accept JSON files. This question is more about string manipulation, and about finding a part of the string that can separate the large blocks of information without touching (or changing as little as possible) the smaller blocks that are separated the same way. So far, regex and splitting the string to re-attach later seem to be the way to go, unless there is a more clear-cut answer.
You could split the string into an array based on a regex, like this:
//fileString is the String you get from your file
String[] chunksIWant = fileString.split("\\},\\{");
This returns the String array chunksIWant, split into the chunks you want. The split consumes the delimiter itself, in this case "},{", so if you need those characters for some reason you will have to add them back afterwards.
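If you only want to split between the large blocks (your edit's exact problem), a lookahead can make the regex more selective. A minimal sketch, assuming (as in your sample data) that every item-set block begins with the "mode" key:

    // Split only at "},{" boundaries immediately followed by "mode", so the
    // inner "},{\"id" and "},{\"type" separators are left untouched. The
    // lookahead is not consumed, but the "},{" itself still is, as above.
    String[] blocks = fileString.split("\\},\\{(?=\"mode\")");

You would still trim the leading {"timeStamp":..."itemSets":[ prefix and the trailing }]} from the first and last chunks, as your existing code does.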
The data you are getting from the file is in JSON format.
So when you receive it on the Java side, use a JSON parser to convert the data into a JsonArray.
Then you can iterate over that JsonArray, fetching each element as a JsonObject by name.
You can then read the values of each JsonObject as required.
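A minimal sketch of that approach using Gson 2.8.6+ (one of several JSON libraries; any would work similarly):

    import com.google.gson.JsonArray;
    import com.google.gson.JsonElement;
    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;

    public class ItemSetReader {
        public static void main(String[] args) {
            String dataFromFile = "{\"timeStamp\":0,\"itemSets\":[]}"; // file contents go here
            JsonObject root = JsonParser.parseString(dataFromFile).getAsJsonObject();
            JsonArray itemSets = root.getAsJsonArray("itemSets");
            for (JsonElement element : itemSets) {
                // Each element is one of the large blocks you were splitting out.
                JsonObject itemSet = element.getAsJsonObject();
                System.out.println(itemSet.get("title").getAsString());
            }
        }
    }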
I am trying to implement an RTSP server in Java for fun. Since I have no prior knowledge of RTSP, I am starting by analyzing already-written source code that I found on the internet.
Links:
http://nsl.cs.sfu.ca/teaching/09/820/streaming/Client.html
http://nsl.cs.sfu.ca/teaching/09/820/streaming/Server.html
http://nsl.cs.sfu.ca/teaching/09/820/streaming/VideoStream.html
http://nsl.cs.sfu.ca/teaching/09/820/streaming/RTPpacket.html
For this post, my question is about VideoStream.java. It has a method like the one below:
public int getnextframe(byte[] frame) throws Exception
{
    int length = 0;
    String length_string;
    byte[] frame_length = new byte[5];

    // read current frame length
    fis.read(frame_length, 0, 5);

    // transform frame_length to integer
    length_string = new String(frame_length);
    length = Integer.parseInt(length_string);

    return fis.read(frame, 0, length);
}
As you can see, it converts the byte[] to a String and then to an Integer. However, in my experience the String came out looking like hex, so I changed it like this:
Integer.parseInt(length_string.trim(), 16);
It looks OK sometimes, but other times it throws a NumberFormatException.
When I print the length_string variable, the console shows things like iso2a, vc1mp, 41���....
I cannot figure out what I am missing here. Can you explain the purpose of these lines?
length_string = new String(frame_length);
length = Integer.parseInt(length_string);
P.S. If anyone knows of a full implementation of this code, or of other samples that do not use extra third-party libs, it would be a big help for me.
I am answering my own question to close it.
Each video type has its own data structure. In this case, I set the data type for 'MJPEG' but loaded an 'MPEG' file, so the byte array came out looking like garbage.
FYI, MJPEG is just a pack of JPEGs, so every frame is a key frame.
Another type I handled before, for instance a TS file, has I-frames and B-frames: the former gives a clear image since it is a key frame, while the latter looks broken since it is not.
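For reference, here is a commented sketch of what those two lines are doing, assuming (as with the MJPEG sample files this code was written for) that each frame is stored as a 5-character ASCII decimal length followed by the raw JPEG bytes:

    public int getnextframe(byte[] frame) throws Exception
    {
        byte[] frame_length = new byte[5];

        // Read the 5-byte ASCII header, e.g. "01024" for a 1024-byte frame.
        fis.read(frame_length, 0, 5);

        // Interpret the header as decimal text. On anything other than a file
        // in this exact layout (e.g. an MPEG file), these 5 bytes are arbitrary
        // binary data and parseInt throws NumberFormatException.
        int length = Integer.parseInt(new String(frame_length));

        // Read exactly one JPEG frame into the caller's buffer.
        return fis.read(frame, 0, length);
    }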
I'm making a highscore implementation for my game. Here's what I want to do:
I have an object Score which contains a String name and an Integer score.
Now:
if the name isn't already in the file, add it;
if the name is in the file, take the String after the space and convert it into an integer, so I get the stored score.
Now, if the new score is better than the stored one, I have to OVERWRITE it in the file...
and here's my problem: how can I do that? How can I write a string exactly over another one at a certain point in the file?
Generally it's considered too fiddly to replace text in text files for this kind of requirement, so the usual way to do it is just to read in the whole file, make the replacement and write a new version of the whole file. If you have large amounts of data you would use a database or a NoSQL solution instead.
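A minimal sketch of that read-modify-rewrite approach, assuming one "name score" pair per line (the file name and layout are assumptions):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class HighscoreFile {
        // Rewrites the whole file, keeping the better of the old and new scores.
        public static void updateScore(Path file, String name, int newScore) throws IOException {
            Map<String, Integer> scores = new LinkedHashMap<>();
            if (Files.exists(file)) {
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    String[] parts = line.split(" ");
                    scores.put(parts[0], Integer.parseInt(parts[1]));
                }
            }
            scores.merge(name, newScore, Math::max);
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, Integer> e : scores.entrySet()) {
                lines.add(e.getKey() + " " + e.getValue());
            }
            Files.write(file, lines, StandardCharsets.UTF_8); // replaces the old file
        }
    }

Usage would be something like updateScore(Paths.get("highscores.txt"), "joe", 10);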
P.S. Consider using serialization; it can make things easier.
I have object Score which contains String Name and Integer score.
Use a Properties file for this instead. It provides an easy interface to load & save the file, get the keys (Name) and set or retrieve values (Score). String values are stored, but they can be converted to/from integer easily.
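A minimal sketch, assuming a file named highscores.properties (the name is arbitrary):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Properties;

    public class HighscoreProps {
        public static void main(String[] args) throws IOException {
            Properties scores = new Properties();
            File file = new File("highscores.properties");
            if (file.exists()) {
                try (FileInputStream in = new FileInputStream(file)) {
                    scores.load(in); // loads name=score pairs
                }
            }
            String name = "joe";
            int newScore = 42;
            int oldScore = Integer.parseInt(scores.getProperty(name, "0"));
            if (newScore > oldScore) {
                scores.setProperty(name, Integer.toString(newScore));
                try (FileOutputStream out = new FileOutputStream(file)) {
                    scores.store(out, "high scores"); // rewrites the whole file
                }
            }
        }
    }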
I concur that this is best done by fully re-serializing the entire database. On modern computers you can push 30MB/s per disk for linear writes (more if there's sufficient cache), and if you're dealing with more than 30MB of data you REALLY need a DB. HSQLDB, Derby and BerkeleyDB are trivial embedded DBs, or go all the way to postgres/mysql.
However, the way to overwrite a FIXED sized section of an existing file (or rather, the way to emulate doing so) is to use:

RandomAccessFile raf = new RandomAccessFile(fileName, "rw");
try {
    raf.seek(position);
    raf.writeInt(newScore);
} finally {
    raf.close();
}
Note the use of writeInt instead of raf.write(Integer.toHexString(newScore).getBytes()), because you really, really need the write to be fixed in size.
Now, if the text file is intrinsically ASCII (e.g. humans will read the file), the value can't be binary. Perhaps you could keep it a hex string (because that will be fixed in size), or you could use a zero-padded decimal string.
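For example (a sketch; the six-digit width is an arbitrary choice that must be large enough for your biggest score):

    // Always write exactly 6 ASCII digits so an in-place overwrite
    // never changes the file length.
    String padded = String.format("%06d", newScore); // 42 -> "000042"
    raf.seek(position);
    raf.write(padded.getBytes(StandardCharsets.US_ASCII));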
But what you absolutely, positively cannot do is grow the string by one byte.
So:
bob 15
joe 7
nina 981
You can't replace joe's score with 10, UNLESS you've padded it with a bunch of spaces.
If this is your data file, then you will absolutely have to rewrite the whole file (even if you write the extra code to rewrite only from the point of change onward - statistically that will be 50% of the file, and thus not worth the bother).
One other thing: if you do rewrite, you run the risk of shortening the file. For that you need to call
raf.setLength(0);
before writing the first byte. Otherwise you'll see phantom text beyond the end of your new file.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final Map<String, Long> symbol2Position = new ConcurrentHashMap<>();
final Map<String, Integer> symbol2Score = new ConcurrentHashMap<>();
final String fileName;
final RandomAccessFile raf;

// ... skipped code

void storeFull() throws IOException {
    // Rewrite the whole file, remembering where each fixed-width score lives.
    raf.seek(0);
    raf.setLength(0);
    for (Map.Entry<String, Integer> e : symbol2Score.entrySet()) {
        raf.write(e.getKey().getBytes(StandardCharsets.US_ASCII));
        raf.write(',');
        symbol2Position.put(e.getKey(), raf.getFilePointer());
        raf.write(String.format("%06d", e.getValue()).getBytes(StandardCharsets.US_ASCII));
        raf.write('\n');
    }
}

void updateScore(String key, int newScore) throws IOException {
    symbol2Score.put(key, newScore);
    Long pos = symbol2Position.get(key);
    if (pos != null) {
        // Known key: overwrite the fixed-width score in place.
        raf.seek(pos);
        raf.write(String.format("%06d", newScore).getBytes(StandardCharsets.US_ASCII));
    } else {
        // New key: append a whole new record at the end of the file.
        raf.seek(raf.length());
        raf.write(key.getBytes(StandardCharsets.US_ASCII));
        raf.write(',');
        symbol2Position.put(key, raf.getFilePointer());
        raf.write(String.format("%06d", newScore).getBytes(StandardCharsets.US_ASCII));
        raf.write('\n');
    }
}
I'd probably rather use a DB or a binary map file, meaning a file with 8B per record: 4B pointing to the user-name position and 4B representing the score. But the above allows for a human-readable data file, while making updates faster than just rewriting the property file.
Check out LevelDB - the fastest damn DB on the planet for embedded systems. :) The main thing it has over the above is thousands/millions of updates per second without the random-seek-rewrite cost of updating 6 bytes randomly across a multi-GB file.
Just a thought: any specific reason for the file storage of names and scores?
Seems like a Map<String, Integer> would serve you much better...
I am working with a large set of data stored in HBase. Many of the values stored in my columns are actually "vectors" of data -- multiple values. The way I've set out to handle storing multiple values is through a ByteBuffer. Since I know the type of data stored in every column in my column families, I have written a series of classes extending a base class that wraps around ByteBuffer and gives me an easy set of methods for reading individual values as well as appending additional values to the end. I have tested this class independently of my HBase project and it works as expected.
In order to update my database (nearly every row is updated in each update), I use a TableMapper mapreduce job to iterate over every row in my database. Each of my mappers (there are six in my cluster) loads the entire update file (rarely more than 50MB) into memory and then updates each row ID as it iterates over it.
The problem I am encountering is that every time I pull a data value out of the Result object, it has 4 bytes appended to the end of it. This makes things difficult for my update because I am not sure whether to expect this "padding" to be an extra 4 bytes every time or whether it could balloon out to something larger/smaller. Since I am loading this into my ByteBuffer wrapper, it is important that there is no padding because that would cause there to be gaps in my data when I appended additional data points to it which would make it impossible to read them out later without error.
I've written up a test to confirm my hypothesis by creating a test table and class. The table only has one data point per column (a single double -- I have confirmed that the length of the bytes going in is 8) and I have written the following code to retrieve and examine it.
HTable table = new HTable("test");
byte[] rowId = Bytes.toBytes("myid");
Get get = new Get(rowId);
byte[] columnFamily = Bytes.toBytes("data");
byte[] column = Bytes.toBytes("column");
get.addColumn(columnFamily, column);

Result result = table.get(get);
byte[] value = result.value();
System.out.println("Value size: " + value.length);

double doubleVal = Bytes.toDouble(value);
System.out.println("Fetch yielded: " + doubleVal);

// Copy only the first 8 bytes, dropping the 4 trailing bytes.
byte[] test = new byte[8];
for (int i = 0; i < value.length - 4; i++)
    test[i] = value[i];

double dval = Bytes.toDouble(test);
System.out.println("dval: " + dval);

table.close();
Which results in:
Value size: 12
Fetch yielded: 0.3652
dval: 0.3652
The double values are what I expect, but the reported size shows the 4 extra bytes.
Any thoughts on how to tackle this problem? I'm aware of the existence of serialization engines like Avro but I'm trying to avoid using them for the time being and my data is so straightforward that I feel as though I shouldn't have to.
EDIT: I've moved on by truncating my data to the greatest multiple of my data type's size. In my experience, these extra bytes are exclusively appended to the end of my byte[] array. I've written a few classes that handle this automatically in a rather clean manner, but I'm still curious as to why this might be happening.
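A sketch of that truncation workaround (assuming the column holds only doubles, so the usable length is the largest multiple of 8):

    // value is the byte[] returned by result.value()
    int usable = (value.length / Bytes.SIZEOF_DOUBLE) * Bytes.SIZEOF_DOUBLE;
    byte[] trimmed = Arrays.copyOf(value, usable); // drops the trailing junk bytes
    double doubleVal = Bytes.toDouble(trimmed);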
I had a similar problem when importing data using MapReduce into HBase. There were junk bytes appended to my rowkeys, due to this code:
public class MyReducer extends TableReducer<Text, CustomWritable, Text> {

    protected void reduce(Text key, Iterable<CustomWritable> values, Context context)
            throws IOException, InterruptedException {
        // only get first value for the example
        CustomWritable value = values.iterator().next();
        Put put = new Put(key.getBytes());
        put.add(columnFamily, columnName, value.getBytes());
        context.write(key, put);
    }
}
The problem is that Text.getBytes() returns the actual byte array from the backend (see Text) and the Text object is reused by the MapReduce framework. So the byte array will have junk chars from previous values it held. This change fixed it for me:
Put put = new Put(Arrays.copyOf(key.getBytes(), key.getLength()));
If you're using Text as your value type in your job somewhere, it could be doing the same thing.
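For example (hypothetical, assuming the value is also a Text, or any other reused Writable that exposes getBytes() and getLength()):

    // Defensive copy of the value as well, trimming the reused backing array.
    put.add(columnFamily, columnName,
            Arrays.copyOf(value.getBytes(), value.getLength()));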
Is it a JDK 7 vs. JDK 6 issue? Are you running two different JVM versions?
Could be related to something a playorm user ran into:
https://github.com/deanhiller/playorm/commit/5e6ede13477a60c2047daaf1f7a7ce55550b0289
Dean