Caching in Java

Hi guys,
I am implementing a simple example of a two-level cache in Java:
1st level is memory
2nd level is the filesystem
I am new to Java and am doing this just to understand caching in Java. And sorry for my English, it is not my native language :)
I have completed the 1st level using the LinkedHashMap class and its removeEldestEntry method, and it looks like this:
import java.util.*;

public class Level1 {
    private static final int MAX_CACHE = 50;

    // Access-ordered LinkedHashMap: the eldest entry is the least recently used one
    private final Map<String, String> cache = new LinkedHashMap<String, String>(MAX_CACHE, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > MAX_CACHE;
        }
    };

    public Level1() {
        for (int i = 1; i < 52; i++) {
            String string = String.valueOf(i);
            cache.put(string, string);
            System.out.println("\rCache size = " + cache.size() +
                    "\tRecent value = " + i +
                    "\tLast value = " + cache.get(string) +
                    "\tValues in cache = " + cache.values());
        }
    }
}
Now I am going to code my 2nd level. What code and methods should I write to implement these tasks?
1) When the 1st level cache is full, the value shouldn't just be removed by removeEldestEntry; it should be moved to the 2nd level (to a file).
2) When a new value is added to the 1st level, it should first be checked for in the file (2nd level), and if it exists there it should be moved from the 2nd to the 1st level.
Also, I tried to use LRUMap to upgrade my 1st level, but the compiler couldn't find the class LRUMap in the library. What's the problem? Is some special syntax needed?

You can either use the built-in Java serialization mechanism and just send your stuff to a file by wrapping a FileOutputStream in an ObjectOutputStream and then calling writeObject(). This method is simple but not flexible enough; for example, you will fail to read an old cache from the file if your classes have changed.
Alternatively, you can use serialization to XML, e.g. JAXB or XStream. I used XStream in the past and it worked just fine; you can easily store any collection in a file and then restore it.
Obviously you can also store the stuff in a DB, but that is more complicated.
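To make this concrete, here is a minimal sketch of the spill-to-file idea, combining the question's removeEldestEntry with plain Java serialization. The directory name, the one-file-per-key layout, and the assumption that keys are filesystem-safe are all my own choices, not anything from the question:

import java.io.*;
import java.util.*;

public class TwoLevelCache {
    private static final int MAX_CACHE = 50;
    private final File dir = new File("cache-level2"); // hypothetical spill directory

    // Level 1: access-ordered LinkedHashMap that spills instead of discarding
    private final Map<String, String> level1 = new LinkedHashMap<String, String>(MAX_CACHE, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            if (size() > MAX_CACHE) {
                writeToLevel2(eldest.getKey(), eldest.getValue()); // task 1: move to file
                return true; // still evict from level 1
            }
            return false;
        }
    };

    public void put(String key, String value) {
        level1.put(key, value); // may trigger a spill via removeEldestEntry
    }

    public String get(String key) {
        String value = level1.get(key);
        if (value == null) {
            value = readFromLevel2(key); // task 2: promote from level 2 on a hit
            if (value != null) {
                new File(dir, key).delete();
                level1.put(key, value);
            }
        }
        return value;
    }

    private void writeToLevel2(String key, String value) {
        dir.mkdirs();
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(new File(dir, key)))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private String readFromLevel2(String key) {
        File file = new File(dir, key);
        if (!file.exists()) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (String) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}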

One remark: you are not taking thread safety into consideration for your cache! By default, LinkedHashMap is not thread-safe, so you would need to synchronize your access to it. Even better, you could use ConcurrentHashMap, which handles synchronization internally and by default supports 16 concurrent threads (you can increase this number via one of its constructors).
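For example, wrapping the question's map is the smallest change (a sketch; note that ConcurrentHashMap, while more scalable, does not preserve access order, so you would lose the LRU eviction that removeEldestEntry relies on):

Map<String, String> cache = Collections.synchronizedMap(
        new LinkedHashMap<String, String>(50, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > 50; // same LRU policy as before, now behind one lock
            }
        });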
I don't know your exact requirements or how complicated you want this to be, but have you looked at existing cache implementations like the Ehcache library?

Related

Java - Most efficient way to remove a file extension

I want to remove the extension of a file. For example:
ActualFile = image.png
ExpectedFile = image
Which method is the most efficient to use?
removeExtension() method provided by org.apache.commons.io
fileName.substring(0, fileName.lastIndexOf('.'))
No difference, except one: do you really want to add the whole Apache library just to use one method? If you already use Apache Commons in your application, then use it here as well. If not, create your own implementation; this is not rocket science.
Looking at removeExtension from commons-io, you can see that substring is also used under the hood in that method, in a similar way to what you describe:
public static String removeExtension(final String fileName) {
    if (fileName == null) {
        return null;
    }
    failIfNullBytePresent(fileName);
    final int index = indexOfExtension(fileName);
    if (index == NOT_FOUND) {
        return fileName;
    }
    return fileName.substring(0, index);
}
Your method is faster since fewer operations are being done, but the removeExtension method has the failIfNullBytePresent check, which states:
Check the input for null bytes, a sign of unsanitized data being passed to file level functions.
This may be used for poison byte attacks.
and indexOfExtension to get the index of the extension, which performs more checks (as you can see in the javadoc of that method).
Conclusion
Your method is faster, but I'd say using commons-io is safer and more consistent across situations. What to use depends on how complex your situation is and whether it's a critical feature of an application or just a home-made project for yourself. removeExtension is not so complex or slow that you shouldn't use it per se.
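For illustration, here are the two approaches side by side (a sketch; note that the bare substring version breaks on file names without a dot, which is exactly the kind of edge case commons-io handles for you):

import org.apache.commons.io.FilenameUtils;

String name = "image.png";

// commons-io: handles null input, names without an extension, and path separators
String viaCommons = FilenameUtils.removeExtension(name); // "image"

// bare substring: throws StringIndexOutOfBoundsException if there is no '.'
String viaSubstring = name.substring(0, name.lastIndexOf('.')); // "image"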

Is there a way to clear the to visit queue in crawler4j during crawling

I am trying to figure out a way to change the seed at crawling runtime and completely delete the "to visit" database/queue.
In particular, I would like to remove all the current URLs in the queue and add a new seed. Something along the lines of:
public class MyCrawler extends WebCrawler {

    private int discarded = 0;

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        boolean isDiscarded = checkPage(referringPage, url);
        if (isDiscarded) {
            this.discarded++;
            if (discarded >= 100) {
                // Clear all the urls that need to be visited
                ?_____?
                // Add the new seed
                this.myController.addSeed("http://new_seed.com");
                discarded = 0;
            }
        }
        return !isDiscarded; // visit only pages that were not discarded
    }
    ....
....
I know I can call controller.shutdown() and start everything again, but it's kind of slow.
There is no built-in functionality for achieving this without modifying the original source code (by forking it or via the Reflection API).
Every WebCrawler obtains new URLs through a Frontier instance, which stores the current (discovered and not yet fetched) URLs for all web crawlers. Sadly, this variable has private access in WebCrawler.
If you want to remove all current URLs, you need to reset the Frontier object. Without implementing a custom Frontier (see the source code) that offers this functionality, resetting will not be possible.
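If you do try the Reflection API route, getting hold of the crawler's private Frontier would look roughly like this. This is a sketch only: the field name "frontier" is an assumption about crawler4j's internals (verify it against the version you use), and even then you would still need a custom Frontier to actually clear the queue:

import java.lang.reflect.Field;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.frontier.Frontier;

// "frontier" is an assumed private field name in WebCrawler
static Frontier getFrontier(WebCrawler crawler) throws ReflectiveOperationException {
    Field field = WebCrawler.class.getDeclaredField("frontier");
    field.setAccessible(true);
    return (Frontier) field.get(crawler);
}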

Obfuscate Strings in Java

I'm working on a project and I need specific URL calls to be hidden; I do not want this URL to be seen. Here is an example of the method the URL call would appear in:
public void example(View view) {
    goToUrl("example.com");
}
You really can't. You can obfuscate method names because, in the end, the original method name never needs to be known; you can just work with the obfuscation. Here, you do eventually need to know the real URL, so there would need to be a de-obfuscation function, which means you could trivially get the result from there. Or, you know, just track which URL outgoing HTTP requests use via a proxy.
Taking a look back at this question after almost 2 years: it has received quite a lot of attention. I have found some obfuscators that I ended up using for string obfuscation, but every obfuscation can be broken. This is my list of obfuscators that encrypt strings; I will start by listing the paid ones.
1. Zelix KlassMaster
Their official website is https://zelix.com
This is one of the best Java obfuscators, for either jars or Android, in my opinion.
However, it's not cheap, as you would expect given how good the obfuscator is.
A single license costs $239 if you are a small developer, or $479 if you are a team of developers (a company).
You can see the list of features here.
2. DexGuard
Their official website is https://www.guardsquare.com/en
DexGuard is an obfuscator made by the people behind ProGuard.
This is the second best obfuscator, in my opinion. Its name obfuscation is way better than the name obfuscation in Zelix.
I am not sure about their pricing since I have never used it, but I have seen it being used in applications. However, you can request pricing here.
Free obfuscators
You can find free alternatives such as StringCare and Paranoid.
They aren't as good as the ones I listed above; it would take at most 5 seconds for someone with basic knowledge of Java to crack your program with these two tools.
One possibility is to use an array of integers with a shift operation for the strings which need to be obfuscated. But as others have already mentioned, this is just a mechanism to hide the plain string; with some effort it can be decoded easily.
Getting the int array code:
public static String getIntArrayCode(String string) {
    int[] result = new int[string.length()];
    for (int i = 0; i < string.length(); i++) {
        int numericValue = Character.codePointAt(string, i);
        result[i] = numericValue << 3; // shift left to hide the plain code point
    }
    StringBuilder arrayCode = new StringBuilder();
    arrayCode.append("new int[]{");
    for (int i = 0; i < result.length; i++) {
        arrayCode.append(result[i]);
        if (i < result.length - 1) {
            arrayCode.append(",");
        }
    }
    arrayCode.append("}");
    return arrayCode.toString();
}
This int array needs to be copied into the code.
To de-obfuscate it, use the method:
public static String getString(int[] data) {
    StringBuilder text = new StringBuilder();
    for (int i = 0; i < data.length; i++) {
        int t = data[i] >> 3; // shift right to restore the original code point
        text.append((char) t);
    }
    return text.toString();
}
Usage:
//"Hello12345&%$";
int []data1 = new int[]{1152,1616,1728,1728,1776,784,800,816,832,848,608,592,576};
System.out.println(getString(data1));
Obfuscating the application (e.g. using ProGuard) would help to hide the decode function to some extent.
For obfuscating strings you can now use a new Gradle plugin + library; please check it here:
https://github.com/MichaelRocks/paranoid
There is also a new plugin which can obfuscate resources as well; please check it below:
https://github.com/shwenzhang/AndResGuard
Share these so more developers can use them, and so more and more developers will contribute to their further development.

Google App Engine Objectify - load single objects or list of keys?

I am trying to get a grasp of Google App Engine programming and wonder what the difference between these two methods is, if there even is a practical difference.
Method A)
public Collection<Conference> getConferencesToAttend(Profile profile)
{
    List<String> keyStringsToAttend = profile.getConferenceKeysToAttend();
    List<Conference> conferences = new ArrayList<Conference>();
    for (String conferenceString : keyStringsToAttend)
    {
        conferences.add(ofy().load().key(Key.create(Conference.class, conferenceString)).now());
    }
    return conferences;
}
Method B)
public Collection<Conference> getConferencesToAttend(Profile profile)
{
    List<String> keyStringsToAttend = profile.getConferenceKeysToAttend();
    List<Key<Conference>> keysToAttend = new ArrayList<>();
    for (String keyString : keyStringsToAttend) {
        keysToAttend.add(Key.<Conference>create(keyString));
    }
    return ofy().load().keys(keysToAttend).values();
}
the "conferenceKeysToAttend" list is guaranteed to only have unique Conferences - does it even matter then which of the two alternatives I choose? And if so, why?
Method A loads entities one by one, while Method B does a bulk load, which is cheaper since you're making just one network round trip to Google's datacenter. You can observe this by measuring the time taken by both methods while loading a bunch of keys multiple times.
While doing a bulk load, you need to be careful about the loaded entities if the datastore operation throws an exception: the operation might succeed even though some of the entities were not loaded.
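A quick way to observe the difference yourself (a sketch; loadOneByOne and loadInBulk are hypothetical names standing in for Method A and Method B above):

long start = System.nanoTime();
Collection<Conference> viaSingles = loadOneByOne(profile); // Method A
long singlesNanos = System.nanoTime() - start;

start = System.nanoTime();
Collection<Conference> viaBulk = loadInBulk(profile); // Method B
long bulkNanos = System.nanoTime() - start;

System.out.printf("one by one: %d ms, bulk: %d ms%n",
        singlesNanos / 1_000_000, bulkNanos / 1_000_000);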
The answer depends on the size of the list. If we are talking about hundreds or more, you should not make a single batch call. I couldn't find documentation on what the limit is, but there is one. If it is not that much, definitely go with loading one by one, but make the calls asynchronous by not using the now function:
List<LoadResult<Conference>> conferences = new ArrayList<>();
conferences.add(ofy().load().key(Key.create(Conference.class, conferenceString))); // no now(): asynchronous
And when you need the actual data:
for (LoadResult<Conference> conference : conferences) {
    Conference c = conference.now(); // blocks until this entity has loaded
    ......
}
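If the list can grow into the hundreds, a middle ground is to bulk-load in fixed-size chunks, so each datastore call stays below whatever the batch limit turns out to be. A sketch; the chunk size of 200 is an arbitrary assumption, and ofy() is assumed to be statically imported as in the question:

public Collection<Conference> loadInChunks(List<Key<Conference>> keys) {
    final int chunkSize = 200; // arbitrary; tune based on testing
    List<Conference> conferences = new ArrayList<>();
    for (int i = 0; i < keys.size(); i += chunkSize) {
        List<Key<Conference>> chunk = keys.subList(i, Math.min(i + chunkSize, keys.size()));
        conferences.addAll(ofy().load().keys(chunk).values()); // one round trip per chunk
    }
    return conferences;
}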

BooleanQuery$TooManyClauses exception when using wildcard queries

I'm using Hibernate Search / Lucene to maintain a really simple index to find objects by name - no fancy stuff.
My model classes all extend a class NamedModel which looks basically as follows:
@MappedSuperclass
public abstract class NamedModel {

    @Column(unique = true)
    @Field(store = Store.YES, index = Index.UN_TOKENIZED)
    protected String name;
}
My problem is that I get a BooleanQuery$TooManyClauses exception when querying the index for objects whose names start with a specific letter, e.g. "name:l*".
A query like "name:lin*" will work without problems; in fact, any query using more than one letter before the wildcard will work.
While searching the net for similar problems, I only found people using pretty complex queries, and that always seemed to cause the exception. I don't want to increase maxClauseCount because I don't think it's good practice to change limits just because you reach them.
What's the problem here?
Lucene tries to rewrite your query from the simple name:l* to a query containing all indexed terms that start with l (something like name:lou OR name:la OR name:...), I believe because this is meant to be faster.
As a workaround, you may use a ConstantScorePrefixQuery instead of a PrefixQuery:
// instead of new PrefixQuery(prefix)
new ConstantScoreQuery(new PrefixFilter(prefix));
However, this changes the scoring of documents (and hence the sorting, if you rely on score for sorting). As we faced the challenge of needing score (and boost), we went for a solution where we use PrefixQuery if possible and fall back to the constant-score variant where needed:
new PrefixQuery(prefix) {
    @Override
    public Query rewrite(final IndexReader reader) throws IOException {
        try {
            return super.rewrite(reader);
        } catch (final TooManyClauses e) {
            log.debug("falling back to ConstantScoreQuery for prefix " + prefix + " (" + e + ")");
            final Query q = new ConstantScoreQuery(new PrefixFilter(prefix));
            q.setBoost(getBoost());
            return q;
        }
    }
};
(As an enhancement, one could use some kind of LRU map to cache prefixes that failed before, to avoid going through the costly rewrite again.)
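Such a cache can be built with the same access-ordered LinkedHashMap trick from the first question. A sketch; the capacity of 1000 is an arbitrary assumption:

// Remembers prefixes whose rewrite already failed, so the next query can go
// straight to the ConstantScoreQuery fallback without another costly rewrite.
private final Map<String, Boolean> failedPrefixes = Collections.synchronizedMap(
        new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > 1000; // arbitrary capacity
            }
        });

// In the catch block above: failedPrefixes.put(prefix.text(), Boolean.TRUE);
// Before calling super.rewrite: check failedPrefixes.containsKey(prefix.text()).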
I can't help you with integrating this into Hibernate Search though. You might ask after you've switched to Compass ;)
