Unique representation of String for in-memory cache in Java


I am trying to build an in-memory cache (using a ConcurrentHashMap in Java 8). Each key-value pair would be a JSON string and the result of a complex operation on that string.
The objective is to avoid repeating the complex operation every time and to run it only when the JSON string changes.
Is there a way to uniquely represent this string, given that the value of any of the JSON keys can change within the application at any time?
I have looked at the hashCode() method but saw its shortcomings.
Right now I am trying to see whether the MD5 representation of the string would serve as a good key for the JSON string.
If anyone has already faced such a situation, can you please provide your inputs?

As I understand it, a Java String instance is immutable, so even if the JSON object is a very long string, the String class calculates its hash code only once (lazily, on the first call to hashCode()) and keeps it as an instance attribute for the lifetime of the String. So there is no problem (in terms of performance penalty) in using the JSON string itself as the key in a ConcurrentHashMap. This is the same way a Java HashSet works, being backed by a HashMap.
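A minimal sketch of that idea, assuming the JSON text itself is used as the key. The class name JsonResultCache and the stand-in "complex operation" (the string length) are hypothetical and only serve the illustration:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical wrapper: caches the result of an expensive operation per JSON string.
class JsonResultCache<R> {
    private final ConcurrentMap<String, R> cache = new ConcurrentHashMap<>();
    private final Function<String, R> expensiveOperation;

    JsonResultCache(Function<String, R> expensiveOperation) {
        this.expensiveOperation = expensiveOperation;
    }

    // Runs the expensive operation only if this exact JSON text is not cached yet.
    R resultFor(String json) {
        return cache.computeIfAbsent(json, expensiveOperation);
    }
}

class JsonResultCacheDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        JsonResultCache<Integer> cache = new JsonResultCache<>(json -> {
            calls.incrementAndGet();
            return json.length(); // stand-in for the complex operation
        });
        cache.resultFor("{\"a\":1}");
        cache.resultFor("{\"a\":1}"); // same text: served from the cache
        cache.resultFor("{\"a\":2}"); // changed text: computed again
        System.out.println("operation ran " + calls.get() + " times");
    }
}
```

Note that the key is the raw text, so two JSON strings that differ only in whitespace or key order count as different entries.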

This should be an easy task for Cacheonix, and you will save time on building your own caching solution:
Cache<String, ResultOfCalculation> cache = Cacheonix.getInstance().getCache("my.cache");
cache.put(myJsonString, myResultOfCalculation);
...
ResultOfCalculation result = cache.get(myJsonString);

Related

Mongodb document format consistency

I am relatively new to MongoDB and I have an idea in mind, but I am not sure how to go about it.
What I would like to do is essentially hash a MongoDB document (preferably its JSON format, so it is not database specific) and store that hash somewhere with a reference to that specific document. This needs to allow me to retrieve the document in the future via a query and compare it against the stored hash.
My idea is to get the JSON representation of the DBObject, hash it, and then add the hash as a field to that specific document before persisting it. Then, when querying for the object, make sure to exclude the hash field from the answer so that the returned DBObject yields the same hash.
1 - Does MongoDB always return a consistent DBObject format which will always convert to the same JSON, so that the hash would always be the same?
2 - Would such an implementation even be viable? As in storing the hash with the object itself, essentially changing the object (thus making the hash invalid), but getting around that by not retrieving that field in the response.
3 - If the implementation would not work, what would be the simplest way to store the hash? Another object with a reference to the original document?

1 - Does MongoDB always return a consistent DBObject format which will always convert to the same JSON, so that the hash would always be the same? No. MongoDB does not guarantee field order, so the JSON can differ depending on what kinds of updates were applied to the document. There is no guarantee that the field order will be consistent, or the same, after an update. If no such order-changing updates were applied, the order should be preserved (see "MongoDB update on Field Order").
But when you deserialize the JSON into an object using Jackson or something similar, it will deserialize to the same object and should have the same hash.
2 - Would such an implementation even be viable? As in storing the hash with the object itself, essentially changing the object (thus making the hash invalid), but getting around that by not retrieving that field in the response.
It looks like, from this answer, you can use Jackson or Gson to hash the JSON object, even though it is not ordered.
Excluding a field should not be a problem.
If you store the hash as a field in the object itself, every write query that saves (which is an overwrite of the entire document) will have to write the hash into it. If any of them fails to do so, the hash will be lost.
An update query has another problem: along with changing the data, it also has to update the hash of the document. This involves reading the object, modifying it, computing the hash, and storing it back. You will not be able to use the primitive update queries.
Making the hash the primary key (the _id field) would mitigate this problem, although you probably need _id for something else.
3 - The simplest way would be to store the _id of the document to be hashed in another collection, with the hash as the _id in that new collection.
{
"_id":<hash code of document>,
"refer":<_id of the document to be hashed>
}
This would involve multiple read writes which will hurt performance and depending on your use case it
In my opinion, MongoDB is a simple database designed to store and retrieve objects. If you need to do something more complicated with it than fast reads and writes, it is probably not fit for the task.
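To illustrate the field-order point from item 1, here is a rough sketch of an order-independent hash using only the JDK. It is not the Jackson or Gson approach mentioned above: it assumes a flat document given as a Map whose values have a stable toString(), and the names DocumentHasher and hashField are hypothetical. Nested documents, or values containing '=' or newlines, would need a proper canonical encoding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class DocumentHasher {
    // Hashes a flat document so that field order does not affect the result.
    // The field that stores the hash itself is left out of the computation.
    static String hash(Map<String, ?> document, String hashField) {
        Map<String, Object> sorted = new TreeMap<>(document); // sorts by field name
        sorted.remove(hashField);

        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, Object> field : sorted.entrySet()) {
            canonical.append(field.getKey()).append('=').append(field.getValue()).append('\n');
        }
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With this, a document read back with its fields in a different order, or with the stored hash field still attached, produces the same hash as the original.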

Storing various object types in single column of database

I am working in Java. I have a class called Command. This class stores a variable-length List of parameters that are primitives (mostly int and double). The type, number, and order of parameters are specific to each command, so the List is of type Object. I won't ever query the table based on these parameter values, so I figured I would concatenate them into a single String or serialize them in some way. I think this may be a better approach than normalizing the table, because I would have to join every time and that table would grow huge pretty quickly. (Edit: The Command object also stores some other members that won't be serialized, such as a String to identify the type of command and a Timestamp for when it was issued.)
So I have 2 questions:
Should I turn them into a delimited String? If so, how do I get each object as a String without knowing which type to cast it to? I attempted to loop through and use the .toString method, but that is not working. It seems to be returning null.
Or is there some way to just serialize the data of the array into a column of the DB? I read about serialization, and it seems to be aimed at serializing whole classes.

I would use a JSON serializer and deserializer like Jackson to store and retrieve those command objects in the DB without losing the specific type information. On a side note, I would have these commands implement a common interface and store them in a list of commands, not in a list of objects.
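As a JDK-only stand-in for the idea of keeping type information next to each value (this is not Jackson, just a hand-rolled sketch), the parameters could be written as type-tagged tokens in a single column. The class ParameterCodec and the tag format are hypothetical, and the sketch assumes that only Integer and Double parameters occur:

```java
import java.util.ArrayList;
import java.util.List;

class ParameterCodec {
    // Encodes each parameter as "<type tag>:<value>", joined with ';'.
    static String encode(List<Object> parameters) {
        StringBuilder column = new StringBuilder();
        for (Object p : parameters) {
            if (column.length() > 0) {
                column.append(';');
            }
            if (p instanceof Integer) {
                column.append("i:").append(p);
            } else if (p instanceof Double) {
                column.append("d:").append(p);
            } else {
                throw new IllegalArgumentException("Unsupported parameter: " + p);
            }
        }
        return column.toString();
    }

    // Rebuilds the parameter list, restoring the original type from each tag.
    static List<Object> decode(String column) {
        List<Object> parameters = new ArrayList<>();
        if (column.isEmpty()) {
            return parameters;
        }
        for (String token : column.split(";")) {
            if (token.startsWith("i:")) {
                parameters.add(Integer.valueOf(token.substring(2)));
            } else if (token.startsWith("d:")) {
                parameters.add(Double.valueOf(token.substring(2)));
            } else {
                throw new IllegalArgumentException("Unknown token: " + token);
            }
        }
        return parameters;
    }
}
```

A JSON library does the same job more robustly once other parameter types appear.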

Key-Value on top of Appengine

Although App Engine is already schema-less, you still need to define the entities to be stored in the Datastore through the DataNucleus persistence layer. So I am thinking of a way to get around this by having a layer that stores key-value pairs at runtime, instead of compile-time entities.
The way this is done with Redis is by creating a key like this:
private static final String USER_ID_FORMAT = "user:id:%s";
private static final String USER_NAME_FORMAT = "user:name:%s";
From the docs, the Redis types are: String, Linked list, Set, Sorted set. I am not sure if there are more.
As far as the GAE datastore is concerned, an entity holding a String "Key" and a "Value" is what would be stored.
Like:
public class KeyValue {
private String key;
private Value value; // value can be a String, Linked-list, Set or Sorted set etc.
// Code omitted
}
The justification for this scheme is rooted in RESTful access to the datastore (provided by datanucleus-api-rest).
Using this REST API, to persist an object or entity:
POST http://datanucleus.appspot.com/dn/guestbook.Greeting
{"author":null,
"class":"guestbook.Greeting",
"content":"test insert",
"date":1239213923232}
The problem with this approach is that in order to persist an entity, the actual class needs to be defined at compile time. With a key-value store mechanism, we can instead simplify the method call:
POST http://datanucleus.appspot.com/dn/org.myframework.KeyValue
{ "class":"org.myframework.KeyValue",
"key":"user:id:johnsmith;followers",
"value":"the_list"
}
Passing a single string as "value" is fairly easy, and I can use a JSON array for a list, set, or sorted set. The real question is how to actually persist the different types of data passed into the interface. Should there be multiple KeyValue entities, each representing one of the basic types it supports: KeyValueString? KeyValueList? etc.

Looks like you're using a JSON based REST API, so why not just store Value as a JSON string?

You do not need to use the Datanucleus layer, or any of the other fine ORM layers (like Twig or Objectify). Those are optional, and are all based on the low-level API. If I interpret what you are saying properly, perhaps it already has the functionality that you want. See: https://developers.google.com/appengine/docs/java/datastore/entities

Datanucleus is a specific framework that runs on top of GAE. You can however access the database at a lower, less structured, more key/value-like level - the low-level API. That's the lowest level you can access directly.
BTW, the low-level-"GAE datastore" internally runs on 6 global Google Megastore tables, which in turn are hosted on the Google Big Table database system.
Saving JSON as a String works fine. But you will need ways to retrieve your objects other than by ID. That is, you need a way to index your data to support any kind of useful query on it.

Mapping data structure in Java

I have to devise a function that will take as input a keyword and will output a category id.
Ex:
f('dog') returns _ANIMAL
f('chair') returns _FURNITURE
I already have the mapping and I could just iterate over the tag array each time, but I have a feeling this is not the best solution.
Is there a special data structure (I'm thinking of ternary search trees) in the Java libraries for this specific task? Should I just use HashMap (or maybe Set (since there are few categories))?
P.S. This mapping is fixed, I do not need to add or remove elements from it once it is built.

If I understand you correctly, then HashMap sounds like exactly what you want. You wouldn't want to iterate through an entire array each time, because with many function calls and/or a large array your program would wind up running slowly. With a HashMap, pulling a value (your category) from a key (your keyword) happens more or less immediately, in constant time.
You can build the map like this:
import java.util.HashMap;
import java.util.Map;

Map<String, String> map = new HashMap<>();
map.put("dog", "animal");
map.put("chair", "furniture");
map.put("cat", "animal");
And then map.get("dog") returns "animal", map.get("chair") returns "furniture".
As others have indicated, enums would work well (and a tiny bit faster) for this too—with the caveat that they're fixed at compile time and thus cannot be changed during execution.
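Since the mapping is fixed once built, the two suggestions can be combined: an enum for the categories and a read-only map for the lookup. This is only a sketch; the names KeywordCategories, Category, and categoryOf are made up for the example:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class KeywordCategories {
    enum Category { ANIMAL, FURNITURE }

    // Built once at class load and never modified afterwards.
    private static final Map<String, Category> BY_KEYWORD;
    static {
        Map<String, Category> m = new HashMap<>();
        m.put("dog", Category.ANIMAL);
        m.put("cat", Category.ANIMAL);
        m.put("chair", Category.FURNITURE);
        BY_KEYWORD = Collections.unmodifiableMap(m);
    }

    // Returns the category for a keyword, or null if the keyword is unknown.
    static Category categoryOf(String keyword) {
        return BY_KEYWORD.get(keyword);
    }
}
```

Here KeywordCategories.categoryOf("dog") yields Category.ANIMAL, and an unknown keyword yields null, which the caller has to handle.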

You can change your enum like the following:
public enum Things {
    _ANIMAL("Dog"), _FURNITURE("Chair");

    private final String description;

    Things(String description) {
        this.description = description;
    }

    @Override
    public String toString() {
        return description;
    }
}
Whenever you want to retrieve the string representation of your enum, just call toString.
Example:
Things._ANIMAL.toString() will return "Dog"

Empty list in App Engine Datastore: Java vs Python

I have the following java model class in App Engine:
public class Xyz ... {
    @Persistent
    private Set<Long> uvw;
}
When saving an object Xyz with an empty set uvw in Java, I get a "null" field (as listed in the appengine datastore viewer).
When I try to load the same object in Python (through remote_api), as defined by the following python model class:
class Xyz(db.Model):
    uvw = db.ListProperty(int)
I get a "BadValueError: Property uvw is required".
When saving another object of the same class in Python with an empty uvw list, the Datastore viewer prints a "missing" field.
Apparently the handling of empty list storage differs between Java and Python and leads to "incompatible" objects.
Thus my question: is there a way to either:
force Java to store an empty list as a "missing" field, or
force Python to gracefully accept a "null" list as an empty list when loading the object?
Any other suggestion on how to handle an empty list field in both languages is welcome too.
Thanks for your answers!

It should work if you assign a default value to your Python property:
uvw = db.ListProperty(int, default=[])

I use the low-level java api, so perhaps what I am doing would be different. But before I save a collection-type data structure to the datastore, I convert it into something that the datastore naturally handles. This would include mainly Strings and ByteArrays.
It sounds like java app engine is interpreting the empty set as a null value. And python is not reading this null value correctly. You might try saving an empty set as the String value "empty set". And then have python check to see if the datastore holds that string value. If it does, it could allocate a new empty set, if not, it could read the property as a set.

The Java Set behavior is because Java's Collections are reference types, which default to being null.
To actually create an empty Set, declare it like this:
@Persistent
private Set<Long> uvw = new HashSet<Long>();
or using some other implementation of Set on the right side. HashSet is the most commonly used Set type, though. Other interesting set types are the two thread-safe Sets CopyOnWriteArraySet and ConcurrentSkipListSet; also the Ordered Set type LinkedHashSet and the Sorted Set type TreeSet.
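The difference between the two declarations can be seen in plain Java, without any persistence layer involved. The class FieldDefaults below is a made-up example:

```java
import java.util.HashSet;
import java.util.Set;

class FieldDefaults {
    Set<Long> uninitialized;                  // reference field: defaults to null
    Set<Long> initialized = new HashSet<>();  // an actual, empty set
}

class FieldDefaultsDemo {
    public static void main(String[] args) {
        FieldDefaults f = new FieldDefaults();
        System.out.println(f.uninitialized == null); // true
        System.out.println(f.initialized.isEmpty()); // true
    }
}
```

How the persistence layer then stores the empty (non-null) set is a separate matter that this sketch does not cover.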

This may work for you:
uvw = db.ListProperty(int, default=[])
It's the most common way to sort it out.
