Java ConcurrentHashMap corrupt values

I have a ConcurrentHashMap that exhibits strange behavior on occasion.
When my app first starts up, I read a directory from the file system and load contents of each file into the ConcurrentHashMap using the filename as the key. Some files may be empty, in which case I set the value to "empty".
Once all files have been loaded, a pool of worker threads will wait for external requests. When a request comes in, I call the getData() function where I check if the ConcurrentHashMap contains the key. If the key exists I get the value and check if the value is "empty". If value.contains("empty"), I return "file not found". Otherwise, the contents of the file is returned. When the key does not exist, I try to load the file from the file system.
private String getData(String name) {
String reply = null;
if (map.containsKey(name)) {
reply = map.get(name);
} else {
reply = getDataFromFileSystem(name);
}
if (reply != null && !reply.contains("empty")) {
return reply;
}
return "file not found";
}
On occasion, the ConcurrentHashMap will return the contents of a non-empty file (i.e. value.contains("empty") == false), however the line:
if (reply != null && !reply.contains("empty"))
returns FALSE. I broke down the IF statement into two parts: if (reply != null) and if (!reply.contains("empty")). The first part of the IF statement returns TRUE. The second part returns FALSE. So I decided to print out the variable "reply" in order to determine if the contents of the string does in fact contain "empty". This was NOT the case i.e. the contents did not contain the string "empty". Furthermore, I added the line
int indexOf = reply.indexOf("empty");
Since the variable reply did not contain the string "empty" when I printed it out, I was expecting indexOf to return -1. Instead, the function returned a value close to the length of the string: for example, if reply.length() == 15100, then reply.indexOf("empty") returned 15099.
I experience this issue on a weekly basis, approx 2-3 times a week. This process is restarted on a daily basis therefore the ConcurrentHashMap is re-generated regularly.
Has anyone seen such behavior when using Java's ConcurrentHashMap?
EDIT
private String getDataFromFileSystem(String name) {
String contents = "empty";
try {
File folder = new File(dir);
File[] fileList = folder.listFiles();
for (int i = 0; i < fileList.length; i++) {
if (fileList[i].isFile() && fileList[i].getName().contains(name)) {
String fileName = fileList[i].getAbsolutePath();
FileReader fr = null;
BufferedReader br = null;
try {
fr = new FileReader(fileName);
br = new BufferedReader(fr);
String sCurrentLine;
while ((sCurrentLine = br.readLine()) != null) {
contents += sCurrentLine.trim();
}
if (contents.equals("")) {
contents = "empty";
}
return contents;
} catch (Exception e) {
e.printStackTrace();
if (contents.equals("")) {
contents = "empty";
}
return contents;
} finally {
if (fr != null) {
try {
fr.close();
} catch (Exception e) {
e.printStackTrace();
}
}
if (br != null) {
try {
br.close();
} catch (Exception e) {
e.printStackTrace();
}
}
if (map.containsKey(name)) {
map.remove(name);
}
map.put(name, contents);
}
}
}
} catch (Exception e) {
e.printStackTrace();
if (contents.equals("")) {
contents = "empty";
}
return contents;
}
return contents;
}

I think your problem is that some of your operations should be atomic and they aren't.
For example, one possible thread interleaving scenario is the following:
Thread 1 reads this line in the getData method:
if (map.containsKey(name)) // (1)
the result is false and Thread 1 goes to
reply = getDataFromFileSystem(name); // (2)
in getDataFromFileSystem, you have the following code:
if (map.containsKey(name)) { // (3)
map.remove(name); // (4)
}
map.put(name, contents); // (5)
Now imagine that another thread (Thread 2) arrives at (1) while Thread 1 is between (4) and (5): name is not in the map, so Thread 2 also goes to (2) and reads from the file system again.
Now that does not explain the specific issue you are observing but it illustrates the fact that when you let many threads run concurrently in a section of code without synchronization, weird things can and do happen.
As it stands, I can't find an explanation for the scenario you describe, unless you call reply = map.get(name) more than once in your tests, in which case it is very possible that the 2 calls don't return the same result.
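One way to make the check-then-load sequence atomic is ConcurrentHashMap.computeIfAbsent. The following is only a minimal sketch: the FileCache class and its loader parameter (standing in for getDataFromFileSystem) are hypothetical names, not part of the original code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache wrapper; the names are illustrative, not from the question.
final class FileCache {
    private final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // stands in for getDataFromFileSystem

    FileCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // computeIfAbsent performs "check, load, store" atomically per key, so two
    // threads asking for the same missing name cannot both trigger a load.
    String getData(String name) {
        return map.computeIfAbsent(name, loader);
    }
}
```

Keep in mind that the loader runs while the map holds a lock for that key's bin, so a slow file read can delay other updates that land in the same bin.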

First off, don't even think that there is a bug in ConcurrentHashMap. JDK faults are very rare and even entertaining the idea will pull you away from properly debugging your code.
I think your bug is as follows. Since you are using contains("empty") what happens if the line from the file has the word "empty" in it? Isn't that going to screw things up?
Instead of using contains("empty") I would use ==. Make the "empty" a private static final String then you can use equality on it.
private final static String EMPTY_STRING_REFERENCE = "empty";
...
if (reply != null && reply != EMPTY_STRING_REFERENCE) {
return reply;
}
...
String contents = EMPTY_STRING_REFERENCE;
...
// really this should be if (contents.isEmpty())
if (contents.equals("")) {
contents = EMPTY_STRING_REFERENCE;
}
This is, btw, the only time you should be using == to compare strings. In this case you want to test it by reference and not by contents since lines from your files could actually contain the magic string.
Here are some other points:
In general, whenever you are using the same String in multiple places in your program, it should be pulled up to a static final field. Java will probably do this for you anyway but it makes the code a lot cleaner as well.
@assylias is spot on about race conditions when you make 2 calls to ConcurrentHashMap. For example, instead of doing:
if (map.containsKey(name)) {
reply = map.get(name);
} else {
You should do the following, so that you make only one call:
reply = map.get(name);
if (reply == null) {
In your code you do this:
if (map.containsKey(name)) {
map.remove(name);
}
map.put(name, contents);
That should be rewritten as the following. There is no need to remove before the put, which introduces race conditions as @assylias mentioned.
map.put(name, contents);
You said:
if reply.length == 15100, then reply.indexOf("empty") was returning 15099.
This is not possible with the same reply String. I suspect that you were looking at different threads or in some other way misinterpreting the output. Again, don't be fooled into thinking that there are bugs in java.lang.String.
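To illustrate the sentinel idea, here is a small sketch. The EmptyMarker class and toReply method are hypothetical names, and the sketch uses new String("empty") rather than a literal so that the sentinel is guaranteed to be a distinct object; that is an assumption of this example, not something the original code does.

```java
// Hypothetical helper illustrating the sentinel idea; not the asker's actual class.
final class EmptyMarker {
    // 'new String' guarantees a reference distinct from any literal or file content.
    static final String EMPTY = new String("empty");

    // Reference comparison on purpose: only the sentinel object itself counts as
    // "empty", never a file whose text happens to contain or equal the word.
    static String toReply(String value) {
        if (value == null || value == EMPTY) {
            return "file not found";
        }
        return value;
    }
}
```

With this check, a file whose contents include the word "empty" is returned normally, while only values that were explicitly set to the EMPTY sentinel are reported as missing.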

First, using ConcurrentHashMap does not protect you if you call several of its methods in sequence from multiple threads. If you call containsKey and then get, and another thread calls remove in between, get will return null. Be sure to call only get and check for null instead of calling containsKey followed by get. It also performs better, because the two methods have nearly the same cost.
Second, the weird indexOf call result is either due to a programming error, or points to memory corruption. Is there any native code involved in your application? What are you doing in getDataFromFileSystem? I observed memory corruption when using FileChannel objects from multiple threads.

Related

how do you return a boolean for if scanner finds a specified string within a file?

I'm not very familiar with File & Scanner objects so please bear with me:
I'm attempting to have a scanner look through a file and see if a specific string exists, then return true/false - I thought there would be a method for this but either I'm reading the docs wrong or it doesn't exist.
What I'm able to come up with is the following but I'm sure there's a simpler way.
public boolean findString(File f, String s) throws FileNotFoundException {
Scanner scan = new Scanner(f);
if(scan.findWithinHorizon(s, 0) != null) {
return true;
} else {
return false;
}
}

Well, there are many ways to check whether a certain file contains a certain string, but I can't think of a single method which opens the file, scans for the given pattern and then returns a boolean indicating whether the pattern has been found within the file.
I think that use case would be too small: in many cases you want to do more with the contents than check whether they contain a specific pattern. Often you want to process the file contents.
Regarding your code, it is already fairly short. However, I would change two things here. First, as scan.findWithinHorizon(s, 0) != null already is a boolean expression, you could immediately return it, instead of using if-else. Second, you should close the file you opened. You can use the try-with-resources construct for this.
Here is the updated code:
try (Scanner scan = new Scanner(f)) {
return scan.findWithinHorizon(s, 0) != null;
}
Note that this code finds a pattern. If you want to find a literal string, then use scan.findWithinHorizon(Pattern.compile(s, Pattern.LITERAL), 0).
More on finding a pattern in a file: this Stack Overflow post
More on try-with-resources: Oracle Java documentation, Baeldung
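Putting the literal-pattern variant into a complete method might look like the sketch below. The class name FileSearch and method name containsLiteral are made up for this example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class; only the Scanner and Pattern calls come from the JDK.
final class FileSearch {
    // Returns true if the file contains 'literal' exactly as written,
    // with no regex interpretation of characters such as '.', '$' or '('.
    static boolean containsLiteral(File f, String literal) throws FileNotFoundException {
        Pattern p = Pattern.compile(literal, Pattern.LITERAL);
        try (Scanner scan = new Scanner(f)) {
            // A horizon of 0 means "no bound": the scanner may search to the end of the input.
            return scan.findWithinHorizon(p, 0) != null;
        }
    }
}
```

Because an unbounded horizon may buffer the whole input, this approach is best suited to files that fit comfortably in memory.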

I would use a while-loop and simply use indexOf to compare the currentLine to your string.
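public boolean findString(File f, String s) throws FileNotFoundException {
// try-with-resources closes the scanner (and the file) when done
try (Scanner scan = new Scanner(f)) {
while (scan.hasNextLine()) {
String currentLine = scan.nextLine();
// indexOf returns -1 when s does not occur in the line
if (currentLine.indexOf(s) >= 0) {
return true;
}
}
}
return false;
}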
An advantage of doing it this way is that you can also have an integer which you increase with every run of the loop to get the line in which the string is included (if you want/need to).
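The line-counting variant mentioned above could be sketched as follows. LineSearch and findLine are hypothetical names, and returning -1 for "not found" is a convention chosen for this example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical helper; shows the loop counter doubling as a line number.
final class LineSearch {
    // Returns the 1-based number of the first line containing s, or -1 if none does.
    static int findLine(File f, String s) throws FileNotFoundException {
        try (Scanner scan = new Scanner(f)) {
            int lineNumber = 0;
            while (scan.hasNextLine()) {
                lineNumber++; // incremented once per line read
                if (scan.nextLine().contains(s)) {
                    return lineNumber;
                }
            }
        }
        return -1;
    }
}
```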

Best way to check if an object is null to prepare for mapping

I’m trying to map an object to another and I’m having trouble deciding what’s the best practice to check if the object from where I want to map is null
1 -
public DTOIntIdentityDocument mapIdentityDocument(Identitydocument in) {
if (in == null) {
return null;
} else {
DTOIntIdentityDocument out = new DTOIntIdentityDocument();
out.setDocumentType(this.mapDocumentType(in.getDocumenttype()));
out.setDocumentNumber(in.getDocumentnumber());
return out;
}
}
2 -
public DTOIntIdentityDocument mapIdentityDocument(Identitydocument in) {
DTOIntIdentityDocument out = null;
if (in != null) {
out = new DTOIntIdentityDocument();
out.setDocumentType(this.mapDocumentType(in.getDocumenttype()));
out.setDocumentNumber(in.getDocumentnumber());
}
return out;
}
Any ideas on what the best practice is here?

Obviously, this boils down to style, so there are no hard rules that tell us which version is "best". If all the code your team writes follows scheme 1, then that is the best code for you.
Having said that, I prefer a simple initial guard, followed by the code computing the "real" result, like this:
if (in == null)
return null;
DTOIntIdentityDocument out = new DTOIntIdentityDocument();
out.setDocumentType(this.mapDocumentType(in.getDocumenttype()));
out.setDocumentNumber(in.getDocumentnumber());
return out;
You want to write code that is easy to read and understand. Your version one has that else block ... that actually doesn't need to be in its own block, with additional indents. On the other hand, your second snippet is using three different layers of abstraction: a simple assignment, an if-block, a simple return. That is definitely "more complex" than option 1, or the modified code I used above. But note: option 2 has its advantages, too. If you want/have to trace/log the result of that method, with option 2, you add a single trace(out) right before the return statement.
And for the record: when you go "hardcore" clean code, the method would finally read:
if (in == null)
return null;
return createDocumentFrom(in);
or something similar. Meaning: you push the code that actually creates and configures the result object into its own private method. And that method doesn't need to worry about a null parameter being passed in!
Finally: the ideal solution does not need to have to worry about null parameters. Simply because you avoid null like the plague. Not always possible, but always desirable!
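As a rough sketch of the guard-plus-private-method shape, with minimal stand-ins for the question's types: the fields and accessors below are assumed, and the document-type mapping is left out to keep the example short.

```java
// Minimal stand-ins for the question's types; fields and accessors are assumed.
final class Identitydocument {
    private final String documentnumber;
    Identitydocument(String documentnumber) { this.documentnumber = documentnumber; }
    String getDocumentnumber() { return documentnumber; }
}

final class DTOIntIdentityDocument {
    private String documentNumber;
    void setDocumentNumber(String n) { this.documentNumber = n; }
    String getDocumentNumber() { return documentNumber; }
}

final class IdentityDocumentMapper {
    // The guard clause handles null once, up front.
    DTOIntIdentityDocument mapIdentityDocument(Identitydocument in) {
        if (in == null) {
            return null;
        }
        return createDocumentFrom(in);
    }

    // Only reached with a non-null argument, so no null check is needed here.
    private DTOIntIdentityDocument createDocumentFrom(Identitydocument in) {
        DTOIntIdentityDocument out = new DTOIntIdentityDocument();
        out.setDocumentNumber(in.getDocumentnumber());
        return out;
    }
}
```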

// Check for null at the call site, so the mapper itself never receives null:
if (in != null)
mapIdentityDocument(in);
public DTOIntIdentityDocument mapIdentityDocument(Identitydocument in) {
DTOIntIdentityDocument out = new DTOIntIdentityDocument();
out.setDocumentType(this.mapDocumentType(in.getDocumenttype()));
out.setDocumentNumber(in.getDocumentnumber());
return out;
}

Iterator over TreeSet causes infinite loop

For this assignment, I'm required to save instances of a custom data class (called User) each containing 2 strings into a TreeSet. I must then search the TreeSet I created for a string taken from each line of another file. The first file is a .csv file in which each line contains an email address and a name, the .txt file contains only addresses. I have to search for every line in the .txt file, and I also have to repeat the entire operation 4000 times.
I can't use .contains to search the TreeSet because I can't search by User, since the .txt file only contains one of the two pieces of information that User does. According to information I've found in various places, I can take the iterator from my TreeSet and use that to retrieve each User in it, and then get the User's username and compare that directly to the string from the second file. I wrote my code exactly as every site I found suggested, but my program still gets stuck at an infinite loop. Here's the search code I have so far:
for (int i = 0; i < 4000; i++)//repeats search operation 4000 times
{
try
{
BufferedReader fromPasswords = new BufferedReader(new FileReader("passwordInput.txt"));
while ((line = fromPasswords.readLine()) != null)
{
Iterator it = a.iterator();
while (it.hasNext())
{
//the infinite loop happens about here, if I put a println statement here it prints over and over
if(it.next().userName.compareTo(line) == 0)
matches++; //this is an int that is supposed to go up by 1 every time a match is found
}
}
}
catch (Exception e)
{
System.out.println("Error while searching TreeSet: " + e);
System.exit(0);
}
}
For some additional info, here's my User class.
class User implements Comparable<User>
{
String userName;
String password;
public User() { userName = "none"; password = "none"; }
public User(String un, String ps) { userName = un; password = ps; }
public int compareTo(User u)
{
return userName.compareToIgnoreCase(u.userName);
}
} //User
I've done everything seemingly correctly but it looks to me like iterator doesn't move its pointer even when I call next(). Does anyone see something I'm missing?
Edit: Thanks to KevinO for pointing this out- a is the name of the TreeSet.
Edit: Here's the declaration of TreeSet.
TreeSet<User> a = new TreeSet<User>();

Are you certain there's an infinite loop? You're opening a file 4000 times and iterating through a collection for every line in the file. Depending on size of the file and the collection this could take a very long time.
Some other things to be aware of:
Later versions of Java have a more succinct way of opening a file and iterating through all the lines: Files.lines
You don't need an Iterator to iterate through a collection. A normal for-each loop will do or convert it to a stream
If all you want to do is count the matches then a stream is just as good
Putting all that together:
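Path path = Paths.get("passwordInput.txt");
Set<User> users = new TreeSet<>(); // fill this from the .csv file first
long matches;
// Files.lines reads lazily and throws IOException; try-with-resources closes the file
try (Stream<String> lines = Files.lines(path)) {
matches = lines
.mapToLong(l -> users.stream()
.map(u -> u.userName).filter(l::equals).count())
.sum();
}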
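For comparison, the plain for-each version mentioned above might look like this sketch. MatchCounter and countMatches are hypothetical names, and the nested User class is a reduced stand-in for the one in the question.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper; counts matches with ordinary for-each loops.
final class MatchCounter {
    // Stand-in for the question's User class, reduced to what the search needs.
    static final class User implements Comparable<User> {
        final String userName;
        User(String userName) { this.userName = userName; }
        @Override
        public int compareTo(User u) { return userName.compareToIgnoreCase(u.userName); }
    }

    // For every input line, counts the users whose name equals that line.
    static long countMatches(Set<User> users, List<String> lines) {
        long matches = 0;
        for (String line : lines) {
            for (User user : users) {
                if (user.userName.equals(line)) {
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

If the search has to be repeated 4000 times, reading the file once (for example with Files.readAllLines) and passing the same list to each repetition avoids reopening the file on every pass.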

What is the most elegant way of doing null checks in Java

For example, let's say we have code like this:
String phone = currentCustomer.getMainAddress().getContactInformation().getLandline();
As we know there is no elvis operator in Java and catching NPE like this:
String phone = null;
try {
phone = currentCustomer.getMainAddress().getContactInformation().getLandline();
} catch (NullPointerException npe) {}
Is not something anyone would advise. Using Java 8 Optional is one solution but the code is far from clear to read -> something along these lines:
String phone = Optional.ofNullable(currentCustomer)
.map(Customer::getMainAddress)
.map(Address::getContactInformation)
.map(ContactInfo::getLandline)
.orElse(null);
So, is there any other robust solution that does not sacrifice readability?
Edit: There were some good ideas already below, but let's assume the model is either auto generated (not convenient to alter each time) or inside a third party jar that would need to be rebuild from source to be modified.

The "heart" of the problem
This pattern, currentCustomer.getMainAddress().getContactInformation().getLandline(), is called a train wreck and should be avoided. Had you avoided it, you would not only have better encapsulation and less coupled code; as a side effect, you also wouldn't have to deal with the problem you're currently facing.
How to do it?
Simple, the class of currentCustomer should expose a new method: getPhoneNumber() this way the user can call: currentCustomer.getPhoneNumber() without worrying about the implementation details (which are exposed by the train-wreck).
Does it completely solve my problem?
No. But now you can use Java 8 Optional to tweak the last step. Unlike the example in the question, Optionals are meant to be returned from a method when the returned value might be null. Let's see how it can be implemented (inside class Customer):
Optional<String> getPhoneNumber() {
Optional<String> phone = Optional.empty();
try {
phone = Optional.of(mainAddress.getContactInformation().getLandline());
} catch (NullPointerException npe) {
// you might want to do something here:
// print to log, report error metric etc
}
return phone;
}
Per Nick's comment below, ideally, the method getLandline() would return an Optional<String>, this way we can skip the bad practice of swallowing up exceptions (and also raising them when we can avoid it), this would also make our code cleaner as well as more concise:
Optional<String> getPhoneNumber() {
Optional<String> phone = mainAddress.getContactInformation().getLandline();
return phone;
}
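When the model's getters cannot be changed and return plain, possibly null values, the same getPhoneNumber() method can still avoid catching exceptions by chaining Optional.map inside Customer. The classes below are hypothetical stand-ins for the question's model, not its real definitions.

```java
import java.util.Optional;

// Hypothetical model classes; the getters are assumed to return plain, possibly null values.
final class ContactInfo {
    private final String landline;
    ContactInfo(String landline) { this.landline = landline; }
    String getLandline() { return landline; }
}

final class Address {
    private final ContactInfo contactInformation;
    Address(ContactInfo contactInformation) { this.contactInformation = contactInformation; }
    ContactInfo getContactInformation() { return contactInformation; }
}

final class Customer {
    private final Address mainAddress;
    Customer(Address mainAddress) { this.mainAddress = mainAddress; }

    // Each map step is skipped once a null appears, so nothing is thrown or caught.
    Optional<String> getPhoneNumber() {
        return Optional.ofNullable(mainAddress)
                .map(Address::getContactInformation)
                .map(ContactInfo::getLandline);
    }
}
```

Callers then decide what a missing number means, for example with currentCustomer.getPhoneNumber().orElse("unknown").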

String s = null;
System.out.println(s == null);
or
String s = null;
if(s == null)System.out.println("Bad Input, please try again");
If your question was with the object being null, you should have made that clear in your question...
PhoneObject po = null;
if(po==null) System.out.println("This object is null");
If your problem is with checking whether all the parts of the line are null, then you should have also made that clear...
if(phone == null) return -1;
Customer c = phone.currentCustomer();
if(c == null)return -1;
MainAddress ma = c.getMainAddress();
if(ma == null) return -1;
ContactInfo ci = ma.getContactInformation();
if(ci == null)return -1;
LandLine ll = ci.getLandline();
if(ll == null)return -1;
else return ll.toNumber(); // or whatever method
Honestly, code that's well written shouldn't have this many opportunities to return null.

Check if file with "ftp" url exists using java

Could anyone please tell me how I can check whether a file exists at a URL that is only reachable over FTP? I'm using this code:
public boolean exists(String URLName) throws IOException {
input = null;
boolean result = false;
try {
input = new URL(URLName).openStream();
System.out.println("SUCCESS");
result = true;
} catch (Exception e) {
System.out.println("FAIL");
} finally {
if (input != null) {
input.close();
input = null;
}
}
return result;
}
It doesn't work when I check more than one or two URLs; it just says:
sun.net.ftp.FtpProtocolException: Welcome message: 421 Too many connections (2) from this IP
at sun.net.ftp.FtpClient.openServer(FtpClient.java:490)
at sun.net.ftp.FtpClient.openServer(FtpClient.java:475)
at sun.net.www.protocol.ftp.FtpURLConnection.connect(FtpURLConnection.java:270)
at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(FtpURLConnection.java:352)
at java.net.URL.openStream(URL.java:1010)
at bibparsing.PDFImage.exists(PDFImage.java:168)
at bibparsing.PDFImage.main(PDFImage.java:189)
It works great when the protocol is HTTP. The FTP addresses I mean look like this:
ftp://cmp.felk.cvut.cz/pub/cmp/articles/chum/Chum-TR-2001-27.pdf
ftp://cmp.felk.cvut.cz/pub/cmp/articles/martinec/Kamberov-ISVC2006.pdf
and something like that

The problem here is that this method isn't thread safe: if two threads use this method simultaneously, one can overwrite the instance variable named input, causing the other thread to not close the connection it opened (and to close either nothing, or the connection opened by the other thread).
This is easily fixed, by making the input variable local:
InputStream input=null;
Code style: within a method, you can return the result as soon as you know it. Beginners often declare the variables first, then execute the logic and return the result at the end of the method. You can save a lot of code and complexity by
declaring variables as late as possible (when you first need them)
declaring as few variables as necessary (readability is always a good reason to add variables, but fewer variables means less complexity)
returning as soon as you know the result (reducing paths through your code, and thus reducing complexity)
The code can be simply written as:
public static boolean exists (String urlName) throws IOException {
try {
new URL(urlName).openStream().close();
return true;
} catch (IOException e) {
return false;
}
}
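A variant of the same idea using try-with-resources is sketched below. The class name UrlCheck is made up, and the sketch builds the URL through URI.create(...).toURL() instead of the URL constructor, which is a choice of this example rather than something the original code does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical helper class; the stream is local, so threads cannot share it by accident.
final class UrlCheck {
    // Opens the URL and closes the stream again as soon as the try block exits.
    static boolean exists(String urlName) {
        try (InputStream in = URI.create(urlName).toURL().openStream()) {
            return true;
        } catch (IOException | IllegalArgumentException e) {
            // Covers malformed URLs, missing files, and refused connections alike.
            return false;
        }
    }
}
```

Note that this treats every failure as "does not exist", including a server that rejects the connection, so it cannot tell a missing file apart from a connection limit being hit.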
