Fastest way to parse single value from JSON string - java

I have a string that I get from a websocket in the below JSON format. I want a very low-latency way to parse the value c associated with the key C. The key names are the same for every packet I get, but the values may differ. So, the key C will stay the same but the value can change from c to something perhaps longer. In my real life application, the number of entries inside X and Y can be much longer.
I've tried a few different approaches, from parsing a single field using Jackson [1] to parsing the full string into a JsonNode, but they're all too slow (over 50 microseconds). I thought of finding the position of "C" in the String using [2] and then taking the next few characters after that, but the value, c, is variable length, which makes that tricky.
String s = "{\"A\":\"a\",\"B\":\"b\",\"data\":[{\"C\":\"c\",\"X\":[[\"3.79\",\"28.07\",\"1\"],[\"3.791\",\"130.05\",\"3\"],[\"3.792\",\"370.8958\",\"5\"]],\"Y\":[[\"3.789\",\"200\",\"1\"],[\"3.788\",\"1238.1709\",\"4\"],[\"3.787\",\"513.4051\",\"3\"]]}]}";
I'd like something like this:
String valueOfC = getValueOfC(s); // should return in only a few microseconds
[1] How to read single JSON field with Jackson
[2] Java: method to get position of a match in a String?

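int start = s.indexOf("\"data\":[{\"C\":\"") + 14; // 14 = length of the search prefix, so start points at the first character of the value
String valueOfC = s.substring(start, s.indexOf('"', start)); // read up to the closing quote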
This is sub-microsecond.
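A minimal sketch of the same idea wrapped in the getValueOfC method the question asks for. The class name FastField and the null return for a missing key are assumptions made for illustration. It also assumes the packet never puts whitespace around the colon, that the first occurrence of "C":" is the one wanted, and that the value contains no escaped quotes.

```java
class FastField {
    // Search prefix: the key, the colon, and the opening quote of the value.
    private static final String KEY = "\"C\":\"";

    /** Returns the string value of key C, or null if the key is not found. */
    static String getValueOfC(String s) {
        int start = s.indexOf(KEY);
        if (start < 0) {
            return null;
        }
        start += KEY.length();            // first character of the value
        int end = s.indexOf('"', start);  // closing quote of the value
        return end < 0 ? null : s.substring(start, end);
    }
}
```

Because it only scans forward to the first match, the cost does not depend on how long the X and Y arrays after it are.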

Related

Implementing a very efficient bit structure

I'm looking for a solution in pseudocode, Java, or JS for the following problem:
We need to implement an efficient bit structure to hold data for N bits (you could think of the bits as booleans as well, on/off).
We need to support the following methods:
init(n)
get(index)
set(index, True/False)
setAll(True/false)
I got to a solution that is O(1) for everything except init, which is O(n). The idea was to create an array where each index holds the value of one bit. To support setAll, I would also save a timestamp with each bit value, to know whether to take the value from the array or from the last setAll call. The O(n) in init comes from having to go through the array to zero it out, since otherwise it contains garbage, which can be ANYTHING. I was then asked to find a solution where init is also O(1) (we can allocate an array, but we can't clear the garbage; the garbage might even look like valid data, which would make the solution incorrect, and we need a solution that works 100% of the time).
Update:
This is an algorithmic question and not a language-specific one. I encountered it as an interview question. Also, using a single integer to represent the bit array is not good enough because of memory limits. I was given a hint that it has something to do with smart handling of the garbage data in the array without cleaning it in init, using some kind of mechanism so that the structure does not fail because of the garbage data in the array (but I'm not sure how).
Build a lazy data structure on top of a hash map (bearing in mind that a hash map's access time can occasionally be worse than O(1)) that stores 32-bit words (8-, 16-, or 64-bit integers are suitable too), plus an auxiliary field InitFlag.
To clear all, make an empty map with InitFlag = 0 (deleting the old map is the GC's work in Java, isn't it?)
To set all, make an empty map with InitFlag = 1
When changing a bit, check whether the corresponding integer key bitnum/32 exists. If it does, just change bit bitnum%32 of that word. If it does not and the new bit value differs from InitFlag, create the key with a value based on InitFlag (all zeros or all ones) and then change the needed bit.
When retrieving a bit, check whether the corresponding key exists. If it does, extract the bit; if not, return the InitFlag value.
SetAll(0): ifl = 0, map - {}
SetBit(35): ifl = 0, map - {1 : 0x08}
SetBit(32): ifl = 0, map - {1 : 0x09}
ClearBit(32): ifl = 0, map - {1 : 0x08}
ClearBit(1): do nothing, ifl = 0, map - {1 : 0x08}
GetBit(1): key=0 doesn't exist, return ifl=0
GetBit(35): key=1 exists, return map[1]>>3 =1
SetAll(1): ifl = 1, map = {}
SetBit(35): do nothing
ClearBit(35): ifl = 1, map - {1 : 0xFFFFFFF7 = 0b...11110111}
and so on
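The scheme above can be sketched in Java roughly as follows. The class name LazyBitSet and its method names are made up for illustration, and the sketch relies on the hash map's expected (not worst-case) O(1) access mentioned in the answer.

```java
import java.util.HashMap;
import java.util.Map;

class LazyBitSet {
    // Only words that differ from the default are stored; key = bit index / 32.
    private Map<Integer, Integer> words = new HashMap<>();
    // Value of every bit that has not been touched since the last setAll.
    private boolean initFlag = false;

    /** O(1): drop the old map and remember the default value. */
    void setAll(boolean value) {
        words = new HashMap<>();
        initFlag = value;
    }

    boolean get(int index) {
        Integer word = words.get(index >>> 5);
        if (word == null) {
            return initFlag;              // untouched word: use the default
        }
        return ((word >>> (index & 31)) & 1) != 0;
    }

    void set(int index, boolean value) {
        int key = index >>> 5;
        Integer word = words.get(key);
        if (word == null) {
            if (value == initFlag) {
                return;                   // nothing to record
            }
            word = initFlag ? -1 : 0;     // all ones or all zeros
        }
        int mask = 1 << (index & 31);
        words.put(key, value ? (word | mask) : (word & ~mask));
    }
}
```

Creating the structure allocates only an empty map, so no pass over n bits is needed, at the price of hashing on every access.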
If this is a college/high-school computer science test or homework assignment question - I suspect they are trying to get you to use BOOLEAN BIT-WISE LOGIC - specifically, saving the bit inside of an int or a long. I suspect (but I'm not a mind-reader - and I could be wrong!) that using "Arrays" is exactly what your teacher would want you to avoid.
For instance, this quote is copied from Google's search results:
long: The long data type is a 64-bit two's complement integer. The
signed long has a minimum value of -2^63 and a maximum value of 2^63-1.
In Java SE 8 and later, you can use the long data type to represent an
unsigned 64-bit long, which has a minimum value of 0 and a maximum
value of 2^64-1
What that means is that a single long variable in Java could store 64 of your bit-wise values:
long storage = 0L;
// To read a bit, mask it with bitwise AND ('&') and test whether the result is non-zero.
boolean result1 = (storage & 0b00000001) != 0; // Gets the first bit in 'storage'
boolean result2 = (storage & 0b00000010) != 0; // Gets the second
boolean result3 = (storage & 0b00000100) != 0; // Gets the third
...
boolean result8 = (storage & 0b10000000) != 0; // Gets the eighth result.
I could write the entire thing for you, but I'm not 100% sure of your actual specifications - if you use a long, you can only store 64 separate binary values. If you want an arbitrary number of values, you would have to use as many 'long' as you need.
Here is an SO post about binary / boolean values:
Binary representation in Java
Here is a SO post about bit-shifting:
Java - Circular shift using bitwise operations
Again, it would be a job, and I'm not going to write the entire project. However, the get(int index) and set(int index, boolean val) methods would involve bit-wise shifting of the number 1.
long pos = 1L;
pos = pos << 5; // A mask that acts as a 'pointer' to bit index 5 (the sixth bit, counting from 0).
boolean bit = (storage & pos) != 0; // Retrieves the value stored at that position.
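For an arbitrary number of bits, the shift-and-mask idea extends to an array of long values. This is only a sketch with a made-up class name (LongBits), and it does not address the O(1) init requirement from the question, since allocating the array is still proportional to n.

```java
class LongBits {
    private final long[] words;

    LongBits(int n) {
        words = new long[(n + 63) / 64];   // 64 bits per long, rounded up
    }

    boolean get(int index) {
        long mask = 1L << (index & 63);    // position inside the word
        return (words[index >>> 6] & mask) != 0;
    }

    void set(int index, boolean value) {
        long mask = 1L << (index & 63);
        if (value) {
            words[index >>> 6] |= mask;    // turn the bit on
        } else {
            words[index >>> 6] &= ~mask;   // turn the bit off
        }
    }
}
```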

Java & MySQL: Store an Read a 365 position of bitarray. HOW?

I am currently working with Java and MySQL, and I found an issue I don't know how to solve.
I have a class that stores a String of 365 positions that represents a Binary String "010111010010100...", and I would like to be able to store and read that field from the database.
Once it is read, I will perform an AND Logic operation with another bitarray.
I read about the BitSet class, that allows the logical operators (AND, OR, XOR, ...) between them. I tried it, but I didn't like the solutions I got. I could also try to transform the String to a byte array, and then store and read it from the database, in order to later perform the logic AND operation, but not sure if I would need to always create a BitSet, and how performant could it be.
I don't know which is the most performant way to do what I want:
Convert the Binary String in another element.
Store that element in the database (in the case of BitSet I tried to define the Database field as BLOB, but I had a lot of issues transforming the BitSet to BLOB and reading the BLOB to a BitSet).
Read the element from the database (at this point would be great to directly work with the element without making any cast or transformation).
Perform a logic AND with another bitarray and get the result.
I have tried a lot of options, but they didn't work.
Could someone help me with this problem and how to better approach it from the performance point of view?
Thanks!
Storing bits in a string is a bit odd. I used a long to store a number and did bitwise operations on that, but that won't work for you, since you need many more bits. If it has to remain a string, you can write a short loop that applies the AND operator to each character of the string, something like this:
StringBuilder data = new StringBuilder(365);
for (int i = 0; i < 365; i++) {
    data.append(stringname.charAt(i) == '1' && binarystring.charAt(i) == '1' ? '1' : '0');
}
Go through your string and compare each position with the other binary string (the one you want to AND it with): if both characters are 1, append 1; otherwise append 0.
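As an alternative to the character loop, here is a hedged sketch of the BitSet route the question mentions. The class and method names are hypothetical. It assumes the column is a binary type large enough for 46 bytes (365 bits), written and read as a byte[] (for example with JDBC's setBytes and getBytes). Note that BitSet.toByteArray() omits trailing zero bytes, so the stored length can vary.

```java
import java.util.BitSet;

class YearBits {
    /** Parses a string such as "0101..." where character i is bit i. */
    static BitSet fromBinaryString(String bits) {
        BitSet set = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                set.set(i);
            }
        }
        return set;
    }

    /** Bytes to store in the database column. */
    static byte[] toColumnValue(BitSet set) {
        return set.toByteArray();
    }

    /** Rebuilds the BitSet from the bytes read back from the column. */
    static BitSet fromColumnValue(byte[] stored) {
        return BitSet.valueOf(stored);
    }

    /** Returns a AND b without modifying either argument. */
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }
}
```

The AND itself works on whole 64-bit words internally, so for 365 bits it touches only a handful of longs.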

Parsing a string into different variable types

Relatively new to programming here so I apologize if this is rather basic.
I am trying to convert string lines into actual variables of different types.
My input is a file in the following format:
double d1, d2 = 3.14, d3;
int a, b = 17, c, g;
global int gInt = 1;
final int fInt = 2;
String s1, s2 = "Still with me?", s3;
These lines are all strings at this point. I wish to extract the variables from the strings and receive the actual variables so I can use and manipulate them.
So far I've tried using regex but I'm stumbling here. Would love some direction as to how this is possible.
I thought of making a general type format for example:
public class IntType{
boolean finalFlag;
boolean globalFlag;
String variableName;
IntType(String variableName, boolean finalFlag, boolean globalFlag){
this.finalFlag = finalFlag;
this.globalFlag = globalFlag;
this.variableName = variableName;
}
}
Creating a new wrapper for each of the variable types.
By "using and manipulating" I mean that I would then compare the wrappers I've created with each other and check for duplicate declarations, etc.
But I don't know if I'm on the right path.
Note: Disregard bad format (i.e. no ";" at the end and so on)
While others said that this is not possible, it actually is. However it goes somewhat deep into Java. Just search for java dynamic classloading. For example here:
Method to dynamically load java class files
It allows you do dynamically load a java file at runtime. However your current input does not look like a java file but it can easily be converted to one by wrapping it with a small wrapper class like:
public class CodeWrapper {
// Insert code from file here
}
You can do this with simple file or text manipulation before loading the resource as a class.
After you have loaded the class you can access its variables via reflection, for example by
Field[] fields = myClassObject.getClass().getDeclaredFields(); // getFields() would return only the public fields
This allows you to access the visibility modifier, the type of the variable, the name, the content and more.
Of course this approach presumes that your code actually is valid java code.
If it is not and you are trying to confirm if it is, you can try to load it. If it fails, it was non-valid.
I have no experience with Java, but as far as my knowledge serves me, it is not possible to actually create variables using a file in any language. You'll want to create some sort of list object which can hold a variable amount of items of a certain type. Then you can read the values from a file, parse them to the type you want it to be, and then save it to the list of the corresponding type.
EDIT:
If I were you, I would change my file layout if possible. It would then look something like this:
1 2 3 4 //1 int, 2 floats, 3 booleans and 4 strings
53
3.14
2.8272
true
false
false
#etc.
In pseudo code, you would then read it as follows:
string[] input = file.Readline().split(' '); // Read the first line and split on the space character
int[] integers = new int[int.Parse(input[0])] // initialise an array with the specified number of elements
// Make an array for floats and booleans and strings the same way
while(not file.eof) // While you have not reached the end of the file
{
integers.insert(int.Parse(file.ReadLine())) // parse your values according to the size which was given on the first line of the file
}
If you can not change the file layout, then you'll have to do some smart string splitting to extract the values from the file and then create some sort of dynamic array which resizes as you add more values to it.
MORE EDITS:
Based on your comment:
You'll want to split on the '=' character first. From the first half of the split, you'll want to search for a type and from the second half, you can split again on the ',' to find all the values.
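A small string-splitting sketch for the declaration lines in the question. It uses a slightly different order than the one described above: it peels off the modifiers and the type first, then splits on ',' and finally on '=' within each part, because a line such as "double d1, d2 = 3.14, d3;" has declarations on both sides of the '='. The class and field names are hypothetical, and it assumes well-formed lines whose string literals contain no commas or '=' characters.

```java
import java.util.ArrayList;
import java.util.List;

class DeclarationParser {
    /** One declared variable; value is null when there is no initializer. */
    static final class Declaration {
        final String type, name, value;
        final boolean isFinal, isGlobal;

        Declaration(String type, String name, String value,
                    boolean isFinal, boolean isGlobal) {
            this.type = type;
            this.name = name;
            this.value = value;
            this.isFinal = isFinal;
            this.isGlobal = isGlobal;
        }
    }

    static List<Declaration> parseLine(String line) {
        String rest = line.trim();
        if (rest.endsWith(";")) {
            rest = rest.substring(0, rest.length() - 1);
        }
        boolean isFinal = false, isGlobal = false;
        // Peel off leading modifiers in any order.
        while (true) {
            if (rest.startsWith("final ")) {
                isFinal = true;
                rest = rest.substring(6).trim();
            } else if (rest.startsWith("global ")) {
                isGlobal = true;
                rest = rest.substring(7).trim();
            } else {
                break;
            }
        }
        int space = rest.indexOf(' ');           // end of the type name
        String type = rest.substring(0, space);
        List<Declaration> out = new ArrayList<>();
        for (String part : rest.substring(space + 1).split(",")) {
            String[] nameValue = part.split("=", 2);
            String value = nameValue.length == 2 ? nameValue[1].trim() : null;
            out.add(new Declaration(type, nameValue[0].trim(), value, isFinal, isGlobal));
        }
        return out;
    }
}
```

Collecting the resulting names in a Set is then enough to detect duplicate declarations.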

How to get a unique alphanumeric based on a unique integer

My web application has a table in the database with an id column which will always be unique for each row. In addition to this I want to have another column called code that will hold a unique 6-character alphanumeric code made of the digits 0-9 and the letters A-Z. Letters and digits can repeat within a code, e.g. FFQ77J. I understand that the chance of generating an unused 6-character code decreases over time as more rows are added, but for now I am OK with this.
Requirement (update)
- The code should be at least of length 6
- Each code should be Alphanumeric
So I want to generate this Alphanumeric code.
Question
What is a good way to do this?
Should I generate the code and after the generation, run a query to the database and check if it already exists, and if so then generate a new one? To ensure the uniqueness, does this piece of code need to be synchronized so that only one thread runs it?
Is there something built-in to the database that will let me do this?
For the generation I will be using something like this which I saw in this answer
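import java.util.Random;

public class RandomCode {
    // The 36 allowed symbols: 0-9 followed by A-Z.
    private static final char[] symbols = new char[36];
    static {
        for (int idx = 0; idx < 10; ++idx)
            symbols[idx] = (char) ('0' + idx);
        for (int idx = 10; idx < 36; ++idx)
            symbols[idx] = (char) ('A' + idx - 10);
    }
    private final Random random = new Random();
    private final char[] buf = new char[6]; // one slot per character of the code

    public String nextString()
    {
        for (int idx = 0; idx < buf.length; ++idx)
            buf[idx] = symbols[random.nextInt(symbols.length)];
        return new String(buf);
    }
}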
Since it's a requirement for the shortcode to not be guessable, you don't want to tie it to your uniqueID row ID. Otherwise that means your rowID needs to be random, in addition to unique. Starting with a counter 0, and incrementing, makes it pretty obvious when your codes are: 000001, 000002, 000003, and so forth.
For your short code, generate a random 32bit int, omit the sign and convert to base36. Make a call to your database, to ensure it's available.
You haven't explicitly called out scalability, but I think it's important to understand the limitations of your design wrt to scale.
With 2^31 possible values (which fit in 6 base-36 characters), you should expect collisions at roughly 50-60k rows (see Birthday Paradox questions)
From your comment, modify your code:
public String nextString()
{
return Integer.toString(random.nextInt() & Integer.MAX_VALUE, 36); // clear the sign bit so no '-' appears
}
I would simply do this:
String s = Integer.toString(i, 36).toUpperCase();
Choosing base-36 will use characters 0-9a-z for the digits. To get a string that uses uppercase letters (as per your question) you would need to fold the result to upper case.
If you use an auto increment column for your id, set the next value to at least 60,466,176, which when rendered to base 36 is 100000 - always giving you a 6 digit number.
I would start with 0 for an empty table and do a
SELECT MAX(ID) FROM table
to find the largest id so far. Store it in an AtomicInteger and convert it using toString
AtomicInteger counter = new AtomicInteger(maxSoFar);
String nextId = Integer.toString(counter.incrementAndGet(), 36);
or, for padding to six characters (36^6 = 2176782336):
String nextId = Long.toString(2176782336L + counter.incrementAndGet(), 36).substring(1);
This will give you uniqueness and no duplicates to worry about. (it's not random either)
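The padding trick above can be packaged as a small helper. This is a sketch with a hypothetical class name. It adds 36^6 so that the base-36 string always has seven characters, then drops the leading 1 to leave exactly six.

```java
class SequentialCode {
    private static final long BASE_POW_6 = 2176782336L; // 36^6

    /** Maps an id in [0, 36^6) to a fixed-width, upper-case base-36 code. */
    static String toCode(long id) {
        if (id < 0 || id >= BASE_POW_6) {
            throw new IllegalArgumentException("id out of range: " + id);
        }
        return Long.toString(BASE_POW_6 + id, 36).substring(1).toUpperCase();
    }
}
```

For example, toCode(0) gives 000000, toCode(35) gives 00000Z, and toCode(60466176) gives 100000. As noted above, these codes are unique but sequential, so they are easy to guess.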
Simply, you can use Integer.toString(int i, int radix). Since you have base 36(26 letters+10 digits) you set the radix to 36 and i to your integer. For example, to use 16501, do:
String identifier=Integer.toString(16501, 36);
You can uppercase it with .toUpperCase()
Now onto your other questions: yes, you should query the database first to ensure the code doesn't already exist. Depending on the database, the check may need to be synchronized, or it may not, as the database will use its own locking system. In any case, you'd need to tell us which database.
On the question of whether there's a builtin, we'd need to know the DB type as well.
To create a random but unique value within a small range here are some ideas I know of:
Create a new random value and try to insert it.
Let a database constraint catch violations. This column should also likely be indexed. The DML may need to be tried several times until a unique ID is found. This will lead to more collisions as time progresses, as noted (see the birthday problem).
Create a "free IDs" table ahead of time and on usage mark the ID as being used (or delete it from the "free IDs" table). This is similar to #1 but shifts when the work is done.
This allows the work of finding "free IDs" to be done at another time, perhaps during a cron job, so that there will not be a constraint violation during the insert, keeping the insert itself the "same speed" throughout the usage of said domain. Make sure to use transactions.
Create a 1-to-1/injective "mixer" function such that the output "appears random". The point is this function must be 1-to-1 to inherently avoid duplicates.
This output number would then be "base 36 encoded" (which is also injective); but it would be guaranteed unique as long as the input (say, an auto-increment PK) was unique. This would likely be less random than the other approaches, but should still create a nice-looking non-linear output.
A custom injective function can be created around an 8-bit lookup table fairly trivially - just process a byte at a time and shuffle the map appropriately. I really like this idea, but it can still lead to somewhat predictable output
To find free IDs, approaches #1 and #2 above can use "probing with IN" to minimize the number of SQL statements used. That is, generate a bunch of random values and query for them using IN (keeping in mind what sizes of IN your database likes) and then see which values were free (as having no results).
To create a unique ID not constrained to such a small space, a GUID or even hashing (e.g. SHA1) might be useful. However, these only "guarantee" uniqueness because they have 128/160-bit spaces, so that the chance of collision (for different input/time-space) is currently accepted as improbable.
I actually really like the idea of using an injective function. Bearing in mind that it is not good "random" output, consider this pseudo-code:
byte_map = [0..255]
map[0] = shuffle(byte_map, seed[0])
..
map[n] = shuffle(byte_map, seed[n])
output[0] = map[0][input[0]]
..
output[n] = map[n][input[n]]
output_str = base36_encode(output[0] .. output[n])
While a very simple setup, numbers like 0x200012 and 0x200054 will still share common output - e.g. 0x1942fe and 0x1942a9 - although the lines will be changed a bit due to the later application of the base-36 encoding. This could probably be further improved to "make it look more random".
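The pseudo-code above could look roughly like this in Java for a 32-bit id. The class name ByteMixer, the seeding scheme (seed + byte position), and the 7-character padding are assumptions made for illustration. Each byte goes through its own shuffled 256-entry table, which keeps the whole mapping one-to-one. A 32-bit value needs up to 7 base-36 characters, which still satisfies the "at least 6" requirement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ByteMixer {
    // One permutation of 0..255 per byte position of the int.
    private final int[][] maps = new int[4][256];

    ByteMixer(long seed) {
        for (int i = 0; i < 4; i++) {
            List<Integer> perm = new ArrayList<>();
            for (int b = 0; b < 256; b++) {
                perm.add(b);
            }
            // Seeded shuffle, so the same seed always rebuilds the same tables.
            Collections.shuffle(perm, new Random(seed + i));
            for (int b = 0; b < 256; b++) {
                maps[i][b] = perm.get(b);
            }
        }
    }

    /** Substitutes each byte independently; injective because each table is a permutation. */
    int mix(int id) {
        int out = 0;
        for (int i = 0; i < 4; i++) {
            int inByte = (id >>> (8 * i)) & 0xFF;
            out |= maps[i][inByte] << (8 * i);
        }
        return out;
    }

    /** Base-36 encoding of the mixed value, treated as unsigned and zero-padded to 7 characters. */
    String code(int id) {
        String s = Long.toString(mix(id) & 0xFFFFFFFFL, 36).toUpperCase();
        return "0000000".substring(s.length()) + s;
    }
}
```

As the answer warns, this is obfuscation rather than encryption: ids that share bytes produce outputs that share bytes before the base-36 step.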
For efficient usage, try caching generated code in a HashSet<String> in your application:
HashSet<String> codes = new HashSet<String>();
This way you don't have to make a db call every time to check whether the generated code is unique or not. All you have to do is:
codes.contains(newCode);
And yes, you should synchronize the method that updates the cache:
public synchronized String getCode()
{
    String newCode;
    do {
        newCode = nextString();
    }
    while (codes.contains(newCode));
    codes.add(newCode); // remember the code so it is never handed out again
    return newCode;
}
You mentioned in your comments that the relationship between id and code should not be easily guessable. For this you basically need encryption; there are plenty of encryption programs and modules out there that will perform encryption for you, given a secret key that you initially generate. To employ this approach, I would recommend converting your id into ascii (i.e., representing as base-256, and then interpreting each base-256 digit as a character) and then running the encryption, and then converting the encrypted ascii (base-256) into base 36 so you get your alpha-numeric, and then using 6 randomly chosen locations in the base 36 representation to get your code. You can resolve collisions e.g. by just choosing the nearest unused 6-digit alpha-numeric code when a collision occurs, and noting the re-assigned alpha-numeric code for the id in a (code <-> id) table that you will have to maintain anyway since you cannot decrypt directly if you only store 6 base-36 digits of the encrypted id.

Hector does not handle Control Characters correctly in Java Strings - how to get Hexadecimal from Hector instead of text string?

I have a problem with Hector's handling of control-characters in Key and Column names. I am writing a program using Hector to talk with a Cassandra instance, and there are pre-existing Keys and Column names with e.g. hexadecimal "594d69e0b8e611e10000242d50cf1ff7".
I have inputted that hexadecimal into a Java String and plugged it through some simple conversion-to-text code:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s1.length() - 1; i+=2 ){
/*Grab the hex in pairs*/
String output = s1.substring(i, (i + 2));
/*Convert Hex to Decimal*/
int decimal = Integer.parseInt(output, 16);
sb.append((char)decimal);
}
return sb.toString();
(Converting the returned Java String back to hexadecimal by calling hexString.append(Integer.toHexString(textString.charAt(i))); for every character, returns the original hexadecimal, so Java should be capable of handling this data.) Printing said Java String yields the top line in the below image:
[Image not posted because new users aren't allowed to post images.]
Image here: http://i.stack.imgur.com/yUJxs.png
Unfortunately, the bottom line (corrupted) is what Hector is returning to me when I call the following code (lots of checks and setup omitted, for simplicity of the question):
OrderedRows<String, String, String> orderedRows;
orderedRows = rangeSlicesQuery.execute().get();
Row<String,String,String> lastRow = orderedRows.peekLast();
for (Row<String, String, String> r : orderedRows) {
String key = r.getKey();
System.out.println(key);
...
So, Hector is not handling control characters properly when returning the Java String. How can I get Hector to return the Keys and Columns in hexadecimal instead of a (corrupted) text-based Java String? I tried to look it up, but the documentation on how to do so is essentially missing (http://hector-client.github.com/hector//source/content/API/core/1.0-1/me/prettyprint/hector/api/beans/OrderedRows.html - what are K, V, and N?). I imagine it should be simple, as the Cassandra CLI assumes hexadecimal if you do not wrap the input with ascii(''), but I cannot figure out how to do it.
In Cassandra, everything is stored as raw bytes (which tools such as the CLI display as hex). The Cassandra thrift API also accepts binary. In real life, however, people like to deal with human-friendly types like String, integer etc. Hector makes it easy for you to use the thrift API by abstracting out the serializing/deserializing logic.
K, N and V are types of the row key, column name and column value respectively. When you use String, String, String, you are telling hector that all the three types for your column family are Strings.
If you are storing the row key and column names as Bytes, you should use byte[] instead for retrievals and BytesArraySerializer for serializing.
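Once the keys and column names come back as byte[] rather than String, turning them into the hexadecimal form from the question is plain Java. The helper below is a sketch with a made-up class name and deliberately uses no Hector calls.

```java
class HexKeys {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Renders each byte as two lower-case hex digits. */
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;          // treat the byte as unsigned
            out[2 * i] = DIGITS[v >>> 4];
            out[2 * i + 1] = DIGITS[v & 0x0F];
        }
        return new String(out);
    }

    /** Parses a hex string with an even number of digits back into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Working on bytes avoids the character decoding step that mangles values such as 0xe0 when the key is treated as text.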
