How to detect whether String.substring copies the character data - java

I know that for Oracle Java 1.7 update 6 and newer, when using String.substring,
the internal character array of the String is copied, and for older versions, it is shared.
But I found no official API that would tell me the current behavior.
Use Case
My use case is:
In a parser, I would like to detect whether String.substring copies or shares the underlying character array.
The problem is that if the character array is shared, my parser needs to explicitly "un-share" it using new String(s) to avoid
memory problems. However, if String.substring copies the data anyway, this is not necessary, and the explicit copy in the parser could be avoided. For example:
// possibly the query is very very large
String query = "select * from test ...";
// the identifier is used outside of the parser
String identifier = query.substring(14, 18);
// avoid if possible for speed,
// but needed if identifier internally
// references the large query char array
identifier = new String(identifier);
What I Need
Basically, I would like to have a static method boolean isSubstringCopyingForSure() that would detect if new String(..) is not needed. I'm OK if detection doesn't work if there is a SecurityManager. Basically, the detection should be conservative (to avoid memory problems, I'd rather use new String(..) even if not necessary).
Options
I have a few options, but I'm not sure if they are reliable, especially for non-Oracle JVMs:
Checking for the String.offset field
/**
* @return true if substring is copying, false if not or if it is not clear
*/
static boolean isSubstringCopyingForSure() {
if (System.getSecurityManager() != null) {
// we can not reliably check it
return false;
}
try {
for (Field f : String.class.getDeclaredFields()) {
if ("offset".equals(f.getName())) {
return false;
}
}
return true;
} catch (Exception e) {
// weird, we do have a security manager?
}
return false;
}
Checking the JVM version
static boolean isSubstringCopyingForSure() {
// but what about non-Oracle JREs?
return System.getProperty("java.vendor").startsWith("Oracle") &&
System.getProperty("java.version").compareTo("1.7.0_45") >= 0;
}
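One pitfall with the version check above is that String.compareTo compares character by character, so a three-digit update such as "1.7.0_101" would rank below "1.7.0_45". A minimal sketch of a numeric comparison follows, assuming the usual "1.7.0_NN" and "11.0.2" style version strings. The class and method names are hypothetical, and the vendor question is left open:

```java
final class JavaVersionCheck {
    /**
     * Hypothetical helper: returns true only if the version string clearly
     * denotes Java 7 update 6 or later, and false whenever parsing fails.
     */
    static boolean isAtLeast7u6(String version) {
        try {
            // "1.7.0_45-b18" -> ["1", "7", "0", "45", "18"]; "11.0.2" -> ["11", "0", "2"]
            String[] parts = version.split("[^0-9]+");
            int first = Integer.parseInt(parts[0]);
            // The legacy "1.x" scheme puts the major version second; newer schemes put it first.
            int major = (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
            if (major != 7) {
                return major > 7;
            }
            // For "1.7.0_NN" the update number is the fourth numeric component.
            int update = parts.length > 3 ? Integer.parseInt(parts[3]) : 0;
            return update >= 6;
        } catch (RuntimeException e) {
            // Null or unparseable version string: stay conservative.
            return false;
        }
    }
}
```

It could be called as JavaVersionCheck.isAtLeast7u6(System.getProperty("java.version")).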
Checking the behavior
There are two options, both rather complicated. One is to create a string using a custom charset, then create a new string b using substring, then modify the original string and check whether b has also changed. The second option is to create a huge string, then a few substrings, and check the memory usage.

Right, indeed this change was made in 7u6. There is no API change for this, as this change is strictly an implementation change, not an API change, nor is there an API to detect which behavior the running JDK has. However, it is certainly possible for applications to notice a difference in performance or memory utilization because of the change. In fact, it's not difficult to write a program that works in 7u4 but fails in 7u6 and vice-versa. We expect that the tradeoff is favorable for the majority of applications, but undoubtedly there are applications that will suffer from this change.
It's interesting that you're concerned about the case where string values are shared (prior to 7u6). Most people I've heard from have the opposite concern, where they like the sharing and the 7u6 change to unshared values is causing them problems (or, they're afraid it will cause problems).
In any case the thing to do is measure, not guess!
First, compare the performance of your application between similar JDKs with and without the change, e.g. 7u4 and 7u6. Probably you should be looking at GC logs or other memory monitoring tools. If the difference is acceptable, you're done!
Assuming that the shared string values prior to 7u6 cause a problem, the next step is to try the simple workaround of new String(s.substring(...)) to force the string value to be unshared. Then measure that. Again, if the performance is acceptable on both JDKs, you're done!
If it turns out that in the unshared case, the extra call to new String() is unacceptable, then probably the best way to detect this case and make the "unsharing" call conditional is to reflect on a String's value field, which is a char[], and get its length:
int getValueLength(String s) throws Exception {
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
return ((char[])field.get(s)).length;
}
Consider a string resulting from a call to substring() that returns a string shorter than the original. In the shared case, the substring's length() will differ from the length of the value array retrieved as shown above. In the unshared case, they'll be the same. For example:
String s = "abcdefghij".substring(2, 5);
int logicalLength = s.length();
int valueLength = getValueLength(s);
System.out.printf("%d %d ", logicalLength, valueLength);
if (logicalLength != valueLength) {
System.out.println("shared");
} else {
System.out.println("unshared");
}
On JDKs older than 7u6, the value's length will be 10, whereas on 7u6 or later, the value's length will be 3. In both cases, of course, the logical length will be 3.
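Putting the length comparison together with the conservative requirement from the question gives roughly the following sketch. The class name is made up, and it assumes that any failure to read the field (a SecurityManager, a missing field, module encapsulation on recent JDKs, or a value field that is not a char[]) should be treated as "not sure":

```java
import java.lang.reflect.Field;

final class SubstringProbe {
    /** @return true only if substring() was observed to copy; false if it shares or if unclear */
    static boolean isSubstringCopyingForSure() {
        try {
            String s = "abcdefghij".substring(2, 5);
            Field field = String.class.getDeclaredField("value");
            // May be refused by a SecurityManager or by module encapsulation.
            field.setAccessible(true);
            Object value = field.get(s);
            // Only trust the comparison when the field really is a char[].
            return value instanceof char[] && ((char[]) value).length == s.length();
        } catch (Exception e) {
            // Field missing or inaccessible: report "not sure" so callers keep using new String(..).
            return false;
        }
    }
}
```

Since the answer cannot change while the JVM is running, the result could be computed once and kept in a static final field.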

This is not a detail you need to care about. No really! Just call identifier = new String(identifier) in both cases (JDK6 and JDK7). Under JDK6 it will create a copy (as desired). Under JDK7, because the substring is already a unique string, the constructor is essentially a no-op (no copy is performed; read the code). Sure, there is a slight overhead of object creation, but because of object reuse in the young generation, I challenge you to quantify a performance difference.

In older Java versions, String.substring(..) will use the same char array as the original, with a different offset and count.
In the latest Java versions (according to the comment by Thomas Mueller: since 1.7 Update 6), this has changed, and substrings are now created with a new char array.
If you parse lots of sources, the best way to deal with it is to avoid checking the Strings' internals, but anticipate this effect and always create new Strings where you need them (as in the first code block in your question).
String identifier = query.substring(14, 18);
// older Java versions: backed by same char array, different offset and count
// newer Java versions: copy of the desired run of the original char array
identifier = new String(identifier);
// older Java versions: when the backing char array is larger than count, a copy of the desired run will be made
// newer Java versions: trivial operation, creates a new String instance backed by the same char array, no copy needed.
That way, you end up with the same result with both variants, without having to distinguish them and without unnecessary array copy overhead.

Are you sure that making a string copy is really expensive? I believe that the JVM optimizer has intrinsics for strings and avoids unnecessary copies. Also, large texts are parsed with one-pass algorithms such as LALR automata, generated by compiler compilers. So the parser input would usually be a java.io.Reader or another streaming interface, not a solid String. Parsing itself is usually expensive, though still not as expensive as type checking. I don't think that copying strings is a real bottleneck. You had better experiment with a profiler and with microbenchmarks before acting on your assumptions.

Related

Java: valueOf vs copyValueOf

What is the difference between valueOf and copyValueOf? I looked on GrepCode, only to find that both return the exact same thing.
copyValueOf:
Parameters:
data the character array.
Returns:
a String that contains the characters of the character array.
public static String copyValueOf(char data[]) {
return new String(data);
}
valueOf:
Returns the string representation of the char array argument. The contents of the character array are copied; subsequent modification of the character array does not affect the returned string.
Parameters: data the character array.
Returns:
a String that contains the characters of the character array.
public static String valueOf(char data[]) {
return new String(data);
}
So if both do the same thing, then how come one isn't deprecated?
As others have pointed out:
The two methods are equivalent.
The javadocs clearly state that the two methods are equivalent. And copyValueOf clearly points the reader to the (mildly) preferred valueOf method.
There is NO performance difference between the two versions. The implementations are identical.
Deprecating one or other method would be counter-productive because it would prompt people to "fix" code that isn't broken. That is (arguably) a waste of time, and it would annoy a lot of people.
Removing one or other method would break backwards compatibility ... for no good reason. That would really annoy a lot of people.
The only other issue is why there isn't an annotation to flag a method as "out of date". I think that the answer to that is that it doesn't matter if you use an API method that is out of date. Certainly, it doesn't matter enough for the Java team to implement such a mechanism ... and then spend a lot of time deciding whether such-and-such an API is "out of date enough" to warrant flagging, etc.
(Most folks would not want the Java team to waste their time on such things. We would prefer them to spend their time on delivering improvements to Java that will make a real difference to program performance and programmer productivity.)
A more appropriate way to deal with this issue would be for someone to write or enhance a 3rd-party style checker or bug checker tool to flag use of (so-called) out of date methods. This is clearly not Oracle's problem, but if you (Dear Reader) are really concerned about this, you could make it your problem.
A program element annotated @Deprecated is one that programmers are discouraged from using, typically because it is dangerous, or because a better alternative exists. Compilers warn when a deprecated program element is used or overridden in non-deprecated code (Annotation Type Deprecated).
The two methods literally do the same thing; meaning that neither method is dangerous or better. This is perhaps the reason they haven't bothered to deprecate either method.
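A quick way to see the equivalence on a current JDK is to build a string with each method and then change the source array. This is only an illustrative sketch, and the class and method names are made up:

```java
final class ValueOfDemo {
    /** Builds one string with each method, then mutates the source array. */
    static String[] buildThenMutate(char[] data) {
        String viaValueOf = String.valueOf(data);
        String viaCopyValueOf = String.copyValueOf(data);
        data[0] = 'X'; // neither string should observe this write
        return new String[] {viaValueOf, viaCopyValueOf};
    }

    public static void main(String[] args) {
        String[] result = buildThenMutate(new char[] {'a', 'b', 'c'});
        System.out.println(result[0] + " " + result[1]); // prints "abc abc"
    }
}
```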
Both methods serve the same purpose, but they differ a little in their inner implementation (according to comments in String.java in older implementations):
copyValueOf(char data[])
Returns a string that is equivalent to the specified character array. It creates a new array and copies the characters into it.
valueOf(char data[])
Returns a string that is equivalent to the specified character array. Uses the original array as the body of the string (i.e. it does not copy it to a new array).
The older version has redundancy in its code, but in newer versions of Java the implementations are the same. The older method remains available so that programmers who are unaware of the newer one can still use it.

Android - deletion of resources in Java [duplicate]

String secret="foo";
WhatILookFor.securelyWipe(secret);
And I need to know that it will not be removed by the Java optimizer.
A String cannot be "wiped". It is immutable, and short of some really dirty and dangerous tricks you cannot alter that.
So the safest solution is to not put the data into a string in the first place. Use a StringBuilder or an array of characters instead, or some other representation that is not immutable. (And then clear it when you are done.)
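A minimal sketch of the char[] approach follows. SecretHolder and use() are hypothetical stand-ins for whatever code actually consumes the secret. Note that this only clears the array you hold and does not guarantee that no other copy exists in memory:

```java
import java.util.Arrays;

final class SecretHolder {
    /** Hypothetical consumer of the secret; stands in for real authentication code. */
    static int use(char[] secret) {
        return secret.length;
    }

    /** Uses the secret, then overwrites the array even if use() throws. */
    static int useAndWipe(char[] secret) {
        try {
            return use(secret);
        } finally {
            Arrays.fill(secret, '\0');
        }
    }
}
```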
For the record, there are a couple of ways that you can change the contents of a String's backing array. For example, you can use reflection to fish out a reference to the String's backing array, and overwrite its contents. However, this involves doing things that the JLS states have unspecified behaviour so you cannot guarantee that the optimizer won't do something unexpected.
My personal take on this is that you are better off locking down your application platform so that unauthorized people can't gain access to the memory / memory dump in the first place. After all, if the platform is not properly secured, the "bad guys" may be able to get hold of the string contents before you erase it. Steps like this might be warranted for small amounts of security critical state, but if you've got a lot of "confidential" information to process, it is going to be a major hassle to not be able to use normal strings and string handling.
You would need direct access to the memory.
You really wouldn't be able to do this with String, since you don't have reliable access to the string, and don't know if it's been interned somewhere, or if an object was created that you don't know about.
If you really needed to do this, you'd have to do something like
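public class SecureString implements CharSequence {
private final char[] data;
public SecureString(char[] data) { this.data = data; }
public int length() { return data.length; } // required by CharSequence
public char charAt(int index) { return data[index]; }
public CharSequence subSequence(int start, int end) {
return new SecureString(java.util.Arrays.copyOfRange(data, start, end));
}
public void wipe() {
for (int i = 0; i < data.length; i++) data[i] = '.'; // overwrite every character
}
}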
That being said, if you're worried about data still being in memory, you have to realize that if it was ever in memory at one point, then an attacker probably already got it. The only thing you realistically protect yourself from is a core dump being flushed to a log file.
Regarding the optimizer, I very much doubt it will optimize away the operation. If you really needed to be sure, you could do something like this:
public int wipe() {
// wipe the array to a random value
java.util.Arrays.fill(data, (char) rand.nextInt(60000)); // rand is assumed to be a java.util.Random field
// compute hash to force optimizer to do the wipe
int hash = 0;
for(int i = 0; i < data.length; i++) {
hash = hash * 31 + (int)data[i];
}
return hash;
}
This will force the compiler to do the wipe. It makes it roughly twice as long to run, but it's a pretty fast operation as it is, and doesn't increase the order of complexity.
Store the data off-heap using the "Unsafe" methods. You can then zero over it when done and be certain that it won't be pushed around the heap by the JVM.
Here is a good post on Unsafe:
http://highlyscalable.wordpress.com/2012/02/02/direct-memory-access-in-java/
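As a rough illustration of the off-heap idea, here is a sketch that uses a direct java.nio.ByteBuffer instead of the Unsafe methods. That is a substitution on my part, not what the linked post describes, and the class name is hypothetical. The byte[] passed in still lives on the heap, so the caller would have to clear it separately:

```java
import java.nio.ByteBuffer;

final class OffHeapSecret {
    private final ByteBuffer buffer;

    OffHeapSecret(byte[] secret) {
        // allocateDirect reserves memory outside the garbage-collected heap.
        buffer = ByteBuffer.allocateDirect(secret.length);
        buffer.put(secret);
        buffer.flip();
    }

    int length() {
        return buffer.limit();
    }

    byte get(int index) {
        return buffer.get(index);
    }

    /** Overwrites every byte of the off-heap copy with zero. */
    void wipe() {
        for (int i = 0; i < buffer.capacity(); i++) {
            buffer.put(i, (byte) 0);
        }
    }
}
```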
If you're going to use a String then I think you are worried about it appearing in a memory dump. I suggest using String.replace() on key-characters so that when the String is used at run-time it will change and then go out of scope after it is used and won't appear correctly in a memory dump. However, I strongly recommend that you not use a String for sensitive data.

Why do we have String class, if StringBuilder or StringBuffer can do what a String does? [duplicate]

I've always wondered why does JAVA and C# has String (immutable & threadsafe) class, if they have StringBuilder (mutable & not threadsafe) or StringBuffer (mutable & threadsafe) class. Isn't StringBuilder/StringBuffer superset of String class? I mean, why should I use String class, if I've option of using StringBuilder/StringBuffer?
For example, Instead of using following,
String str;
Why can't I always use following?
StringBuilder strb; //or
StringBuffer strbu;
In short, my question is: how will my code be affected if I replace String with the StringBuffer class? Also, StringBuffer has the added advantage of mutability.
I mean, why should I use String class, if I've option of using StringBuilder/StringBuffer?
Precisely because it's immutable. Immutability has a whole host of benefits, primarily that it makes it much easier to reason about your code without creating copies of the data everywhere "just in case" something decides to mutate the value. For example:
private readonly String name;
public Person(string name)
{
if (string.IsNullOrEmpty(name)) // Or whatever
{
// Throw some exception
}
this.name = name;
}
// All the rest of the code can rely on name being a non-null
// reference to a non-empty string. Nothing can mutate it, leaving
// evil reflection aside.
Immutability makes sharing simple and efficient. That's particularly useful for multi-threaded code. It makes "modifying" (i.e. creating a new instance with different data) more painful, but in many situations that's absolutely fine, because values pass through the system without ever being modified.
Immutability is particularly useful for "simple" types such as strings, dates, numbers (BigDecimal, BigInteger etc). It allows them to be used within maps more easily, it allows a simple equality definition, etc.
1) StringBuilder and StringBuffer are both mutable, so they cause problems when used in collections, for example as keys in a HashMap.
Another example of the advantage of immutability is what Jon mentioned in his comments. I am just pasting it here.
Someone can call Person p = new Person(builder); with a builder which initially passes my validation criteria - and then modify it afterwards, without the Person class having any say in it. In order to avoid that, the Person class would need to copy the validated data.
Immutability ensures this does not happen.
2) As String is the most extensively used object in Java, the string pool allows the same string to be reused, thus saving memory.
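The validation problem from point 1 can be sketched in Java as follows. MutablePerson is a hypothetical class that stores the caller's StringBuilder directly, with no defensive copy:

```java
final class MutablePerson {
    // Hypothetical: keeps a reference to the caller's builder.
    private final StringBuilder name;

    MutablePerson(StringBuilder name) {
        if (name == null || name.length() == 0) {
            throw new IllegalArgumentException("name must be non-empty");
        }
        this.name = name; // no defensive copy
    }

    String name() {
        return name.toString();
    }
}
```

After new MutablePerson(builder), a call to builder.setLength(0) leaves the person with an empty name even though validation passed. Storing name.toString() in a String field instead would avoid this.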
I completely agree with Jon Skeet that immutability is one reason to use String. Another reason (from a C# perspective) is that String is actually lighter weight than StringBuilder. If you look at the reference source for both String and StringBuilder you will see that StringBuilder actually has a number of String constants in it. As a developer, you should only use what you need, so unless you need the added benefits provided by StringBuilder you should use String.
Many answers have already outlined that there are shortcomings to using mutable variants such as StringBuilder. To illustrate the problem, one thing that you cannot achieve with StringBuilder is associative memory, i.e. hash tables. Sure, most implementations will allow you to use StringBuilder as a key for hashtables, but they will only find the values for the exact same instance of StringBuilder. However, the typical behavior that you would want to achieve is that it does not matter where the string comes from, as only the characters are important, e.g. when you read the string from a database or file (or any other external resource).
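In Java the lookup behaviour described here can be seen with a small sketch (the class and method names are made up). StringBuilder does not override equals or hashCode, so a second builder with the same characters is a different key:

```java
import java.util.HashMap;
import java.util.Map;

final class KeyLookupDemo {
    /** Looks up a StringBuilder key using a different instance with the same characters. */
    static Integer lookupWithBuilderKey() {
        Map<StringBuilder, Integer> map = new HashMap<>();
        map.put(new StringBuilder("id"), 1);
        return map.get(new StringBuilder("id")); // null: keys are compared by identity
    }

    /** Same lookup with String keys, which compare by content. */
    static Integer lookupWithStringKey() {
        Map<String, Integer> map = new HashMap<>();
        map.put(new String("id"), 1);
        return map.get(new String("id")); // 1
    }
}
```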
However, as far as I understood your question, you were mainly asking about field types. And indeed, I see your point particularly taking into account that we are doing the exact same thing with collections of other objects which are usually not immutable objects but mutable collections, such as List or ArrayList in C# or Java, respectively. In the end, a string is only a collection of characters, so why not making it mutable?
The answer I would give here is that the usual behavior of how such a string is changed is very different to usual collections. If you have a collection of subsequent elements, it is very common to only add a single element to the collection, leaving most of the collection untouched, i.e. you would not discard a list to insert an item, at least unless you are programming in Haskell :). For many strings like names, this is different as you typically replace the whole string. Given the importance of a string data type, the platforms usually offer a lot of optimization for strings such as interned strings, making the choice even more biased towards strings.
However, in the end, every program is different and you might have requirements that make it more reasonable to use StringBuilder by default, but for the given reasons, I think that these cases are rather rare.
EDIT: As you were asking for examples. Consider the following code:
stopwatch.Start();
var s = "";
for (int i = 0; i < 100000; i++)
{
s = "." + s;
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
stopwatch.Restart();
var s2 = new StringBuilder();
for (int i = 0; i < 100000; i++)
{
s2.Insert(0, ".");
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
Technically, both bits are doing a very similar thing, they will insert a character at the first position and shift whatever comes after. Both versions will involve copying the whole string that has been there before. The version with string completes in 1750ms on my machine whereas StringBuilder took 2245ms. However, both versions are reasonably fast, making the performance impact negligible in this case.
I would like to add some differences between String and StringBuilder classes:
Yes, as mentioned above, String is an immutable class and its content cannot be changed after the string has been created. This allows the same string objects to be used from different threads without locking.
If you need to concatenate a lot of strings together, use the StringBuilder class. When you use the "+" operator, it creates a lot of string objects on the managed heap and hurts performance.
StringBuilder is a mutable class. StringBuilder stores characters in an array and can manipulate them (add, remove, replace, append) without creating a new string object.
If you know the approximate length of the resulting string, you should set the capacity. The default capacity is 16 (.NET 4.5). Setting it improves performance because StringBuilder has an inner array of chars, and that array is reallocated when the character count exceeds the current capacity.
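The points above are written for .NET, but Java's StringBuilder has an analogous constructor that takes an initial capacity (its documented default is also 16). A small sketch, with a made-up helper name:

```java
final class BuilderCapacityDemo {
    /** Concatenates piece the given number of times using a presized builder. */
    static String repeat(String piece, int times) {
        // Presize to the final length so the internal array never has to grow.
        StringBuilder sb = new StringBuilder(piece.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(piece);
        }
        return sb.toString();
    }
}
```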
String:
is immutable (so you can use it in collections)
every operation creates a new instance on the heap (technically speaking, this really depends on the code).
For performance and memory consumption purposes it makes sense to use StringBuilder.

"Immutable" strings in Java - actually, it's a lie [duplicate]

We all know that String is immutable in Java, but check the following code:
String s1 = "Hello World";
String s2 = "Hello World";
String s3 = s1.substring(6);
System.out.println(s1); // Hello World
System.out.println(s2); // Hello World
System.out.println(s3); // World
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
char[] value = (char[])field.get(s1);
value[6] = 'J';
value[7] = 'a';
value[8] = 'v';
value[9] = 'a';
value[10] = '!';
System.out.println(s1); // Hello Java!
System.out.println(s2); // Hello Java!
System.out.println(s3); // World
Why does this program operate like this? And why is the value of s1 and s2 changed, but not s3?
String is immutable* but this only means you cannot change it using its public API.
What you are doing here is circumventing the normal API, using reflection. The same way, you can change the values of enums, change the lookup table used in Integer autoboxing etc.
Now, the reason s1 and s2 change value, is that they both refer to the same interned string. The compiler does this (as mentioned by other answers).
The reason s3 does not was actually a bit surprising to me, as I thought it would share the value array (it did in earlier version of Java, before Java 7u6). However, looking at the source code of String, we can see that the value character array for a substring is actually copied (using Arrays.copyOfRange(..)). This is why it goes unchanged.
You can install a SecurityManager to prevent malicious code from doing such things. But keep in mind that some libraries depend on this kind of reflection trick (typically ORM tools, AOP libraries etc).
*) I initially wrote that Strings aren't really immutable, just "effective immutable". This might be misleading in the current implementation of String, where the value array is indeed marked private final. It's still worth noting, though, that there is no way to declare an array in Java as immutable, so care must be taken not to expose it outside its class, even with the proper access modifiers.
As this topic seems overwhelmingly popular, here's some suggested further reading: Heinz Kabutz's Reflection Madness talk from JavaZone 2009, which covers a lot of the issues in the OP, along with other reflection... well... madness.
It covers why this is sometimes useful. And why, most of the time, you should avoid it. :-)
In Java, if two String variables are initialized to the same literal, the same reference is assigned to both variables:
String test1 = "Hello World";
String test2 = "Hello World";
System.out.println(test1 == test2); // true
That is the reason the comparison returns true. The third string is created using substring() which makes a new string instead of pointing to the same.
When you access a string using reflection, you get the actual pointer:
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
So a change to this array changes every string holding a pointer to it, but as s3 is a new string created by substring(), it does not change.
You are using reflection to circumvent the immutability of String - it's a form of "attack".
There are lots of examples you can create like this (eg you can even instantiate a Void object too), but it doesn't mean that String is not "immutable".
There are use cases where this type of code may be used to your advantage and be "good coding", such as clearing passwords from memory at the earliest possible moment (before GC).
Depending on the security manager, you may not be able to execute your code.
You are using reflection to access the "implementation details" of string object. Immutability is the feature of the public interface of an object.
Visibility modifiers and final (i.e. immutability) are not a measurement against malicious code in Java; they are merely tools to protect against mistakes and to make the code more maintainable (one of the big selling points of the system). That is why you can access internal implementation details like the backing char array for Strings via reflection.
The second effect you see is that all Strings change while it looks like you only change s1. It is a certain property of Java String literals that they are automatically interned, i.e. cached. Two String literals with the same value will actually be the same object. When you create a String with new it will not be interned automatically and you will not see this effect.
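The interning behaviour can be checked with reference comparisons. This is a small sketch with a made-up class name:

```java
final class InternDemo {
    /** Returns { literal == literal, literal == new String, literal == interned new String }. */
    static boolean[] compare() {
        String a = "Hello World";
        String b = "Hello World";             // same literal, same pooled object
        String c = new String("Hello World"); // explicitly a separate object
        return new boolean[] {a == b, a == c, a == c.intern()};
    }
}
```

The result is {true, false, true}: the two literals are one object, the new String is not, and intern() maps it back to the pooled instance.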
#substring until recently (Java 7u6) worked in a similar way, which would have explained the behaviour in the original version of your question. It didn't create a new backing char array but reused the one from the original String; it just created a new String object that used an offset and a length to present only a part of that array. This generally worked as Strings are immutable - unless you circumvent that. This property of #substring also meant that the whole original String couldn't be garbage collected when a shorter substring created from it still existed.
As of current Java and your current version of the question there is no strange behaviour of #substring.
String immutability is from the interface perspective. You are using reflection to bypass the interface and directly modify the internals of the String instances.
s1 and s2 are both changed because they are both assigned to the same "intern" String instance. You can find out a bit more about that part from this article about string equality and interning. You might be surprised to find out that in your sample code, s1 == s2 returns true!
Which version of Java are you using? From Java 1.7.0_06, Oracle has changed the internal representation of String, especially the substring.
Quoting from Oracle Tunes Java's Internal String Representation:
In the new paradigm, the String offset and count fields have been removed, so substrings no longer share the underlying char [] value.
With this change, it may happen without reflection (???).
There are really two questions here:
Are strings really immutable?
Why is s3 not changed?
To point 1: Except for ROM there is no immutable memory in your computer. Nowadays even ROM is sometimes writable. There is always some code somewhere (whether it's the kernel or native code sidestepping your managed environment) that can write to your memory address. So, in "reality", no they are not absolutely immutable.
To point 2: This is because substring is probably allocating a new string instance, which is likely copying the array. It is possible to implement substring in such a way that it won't do a copy, but that doesn't mean it does. There are tradeoffs involved.
For example, should holding a reference to reallyLargeString.substring(reallyLargeString.length - 2) cause a large amount of memory to be held alive, or only a few bytes?
That depends on how substring is implemented. A deep copy will keep less memory alive, but it will run slightly slower. A shallow copy will keep more memory alive, but it will be faster. Using a deep copy can also reduce heap fragmentation, as the string object and its buffer can be allocated in one block, as opposed to 2 separate heap allocations.
In any case, it looks like your JVM chose to use deep copies for substring calls.
To add to @haraldK's answer: this is a security hack which could have a serious impact on the app.
The first thing is the modification of a constant string stored in the string pool. When a string is declared as String s = "Hello World";, it is placed into a special object pool for potential reuse. The issue is that the compiler places a reference to the pooled string at compile time, and once the user modifies the string stored in this pool at runtime, all references in the code will point to the modified version. This would result in the following bug:
System.out.println("Hello World");
Will print:
Hello Java!
There was another issue I experienced when I was implementing a heavy computation over such risky strings. There was a bug which happened about 1 out of 1000000 times during the computation and made the result nondeterministic. I was able to find the problem by switching off the JIT - I was always getting the same result with JIT turned off. My guess is that the reason was this String security hack, which broke some of the JIT optimization contracts.
According to the concept of pooling, all the String variables containing the same value will point to the same memory address. Therefore s1 and s2, both containing the same value of “Hello World”, will point towards the same memory location (say M1).
On the other hand, s3 contains “World”, hence it will point to a different memory allocation (say M2).
So now what's happening is that the value of s1 is being changed (by using the char [ ] value). So the value at the memory location M1, pointed to by both s1 and s2, has been changed.
Hence as a result, memory location M1 has been modified which causes change in the value of s1 and s2.
But the value of location M2 remains unaltered, hence s3 contains the same original value.
The reason s3 does not actually change is because in Java when you do a substring the value character array for a substring is internally copied (using Arrays.copyOfRange()).
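The effect of such a copy can be reproduced without reflection by working on plain arrays. This is an illustrative sketch with made-up names, not the actual String source:

```java
import java.util.Arrays;

final class CopyOfRangeDemo {
    /** Copies the tail of the array, mutates the original, and returns the copied tail. */
    static String tailAfterMutation() {
        char[] original = "Hello World".toCharArray();
        char[] tail = Arrays.copyOfRange(original, 6, 11); // independent copy of "World"
        original[6] = 'J'; // only the original array changes
        return new String(tail);
    }
}
```

tailAfterMutation() returns "World", because the copied range no longer refers to the original array.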
s1 and s2 are the same because in Java they both refer to the same interned string. It's by design in Java.
String is immutable, but through reflection you're allowed to change the String class. You've just redefined the String class as mutable in real-time. You could redefine methods to be public or private or static if you wanted.
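Strings are created in the permanent area of the JVM heap memory (this applies to interned strings on HotSpot JVMs before Java 7; later versions keep the string pool in the main heap). So yes, it's really immutable and cannot be changed after being created.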
Because in the JVM, there are three types of heap memory:
1. Young generation
2. Old generation
3. Permanent generation.
When any object is created, it goes into the young generation heap area, while the PermGen area is reserved for String pooling.
Here is more detail you can go and grab more information from:
How Garbage Collection works in Java .
[Disclaimer this is a deliberately opinionated style of answer as I feel a more "don't do this at home kids" answer is warranted]
The sin is the line field.setAccessible(true); which says to violate the public API by allowing access to a private field. That's a giant security hole which can be locked down by configuring a security manager.
The phenomena in the question are implementation details which you would never see when not using that dangerous line of code to violate the access modifiers via reflection. Clearly two (normally) immutable strings can share the same char array. Whether a substring shares the same array depends on whether it can and whether the developer thought to share it. Normally these are invisible implementation details which you should not have to know unless you shoot the access modifier through the head with that line of code.
It is simply not a good idea to rely upon such details which cannot be experienced without violating the access modifiers using reflection. The owner of that class only supports the normal public API and is free to make implementation changes in the future.
Having said all that, the line of code is really very useful when you have a gun held to your head forcing you to do such dangerous things. Using that back door is usually a code smell telling you to upgrade to better library code where you don't have to sin. Another common use of that dangerous line of code is to write a "voodoo framework" (ORM, injection container, ...). Many folks get religious about such frameworks (both for and against them), so I will avoid inviting a flame war by saying nothing other than that the vast majority of programmers don't have to go there.
String is immutable by nature because there is no method that modifies a String object.
That is the reason the StringBuilder and StringBuffer classes were introduced.
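To make the contrast concrete, here is a small sketch (class and method names are hypothetical) showing that a String method returns a new object and leaves the receiver alone, while a StringBuilder changes its own buffer:

```java
class MutabilityDemo {
    // String methods return a new object; the receiver is never modified.
    static String stringStaysUnchanged() {
        String s = "abc";
        s.concat("def");          // result discarded, s is untouched
        return s;
    }

    // StringBuilder modifies its own buffer in place.
    static String builderChangesInPlace() {
        StringBuilder sb = new StringBuilder("abc");
        sb.append("def");         // same object, new content
        sb.setCharAt(0, 'X');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stringStaysUnchanged());   // abc
        System.out.println(builderChangesInPlace());  // Xbcdef
    }
}
```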
Here is a quick guide to the most common String methods:
// Character array
char[] chr = {'O', 'K', '!'};
// this is String class
String str1 = new String(chr);
// this is concat
str1 = str1.concat("another string's ");
// this is format
System.out.println(String.format(str1 + " %s ", "string"));
// this is equals
System.out.println(str1.equals("another string"));
//this is split
for(String s: str1.split(" ")){
System.out.println(s);
}
// this is length
System.out.println(str1.length());
// compareTo: difference between the first pair of characters that differ
System.out.println(str1.compareTo("OK!another string string's"));
// trim
System.out.println(str1.trim());
// intern
System.out.println(str1.intern());
// character at
System.out.println(str1.charAt(5));
// substring
System.out.println(str1.substring(5, 12));
// to uppercase
System.out.println(str1.toUpperCase());
// to lowerCase
System.out.println(str1.toLowerCase());
// replace
System.out.println(str1.replace("another", "hello"));
// output
// OK!another string's string
// false
// OK!another
// string's
// 20
// 7
// OK!another string's
// OK!another string's
// o
// other s
// OK!ANOTHER STRING'S
// ok!another string's
// OK!hello string's

How can I securely wipe confidential data in memory in Java, with a guarantee that the wipe will not be 'optimized' away?

String secret="foo";
WhatILookFor.securelyWipe(secret);
And I need to know that the wipe will not be removed by the Java optimizer.
A String cannot be "wiped". It is immutable, and short of some really dirty and dangerous tricks you cannot alter that.
So the safest solution is to not put the data into a string in the first place. Use a StringBuilder or an array of characters instead, or some other representation that is not immutable. (And then clear it when you are done.)
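A minimal sketch of the char[] approach, assuming the secret can be obtained as a character array in the first place (the class name and the use() method are placeholders, not part of any real API). It only clears this one array; it cannot reach copies that other code or the JVM may have made.

```java
import java.util.Arrays;

class CharArraySecret {
    // Hypothetical consumer of the secret; stands in for real authentication code.
    static int use(char[] secret) {
        return secret.length;
    }

    // Overwrites every element so the secret no longer sits in this array.
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] secret = {'f', 'o', 'o'};
        try {
            use(secret);
        } finally {
            wipe(secret); // clear even if use() throws
        }
        System.out.println(Arrays.toString(secret));
    }
}
```

Putting the wipe in a finally block keeps the lifetime of the secret as short as the code that needs it.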
For the record, there are a couple of ways that you can change the contents of a String's backing array. For example, you can use reflection to fish out a reference to the String's backing array, and overwrite its contents. However, this involves doing things that the JLS states have unspecified behaviour so you cannot guarantee that the optimizer won't do something unexpected.
My personal take on this is that you are better off locking down your application platform so that unauthorized people can't gain access to the memory / memory dump in the first place. After all, if the platform is not properly secured, the "bad guys" may be able to get hold of the string contents before you erase it. Steps like this might be warranted for small amounts of security critical state, but if you've got a lot of "confidential" information to process, it is going to be a major hassle to not be able to use normal strings and string handling.
You would need direct access to the memory.
You really wouldn't be able to do this with String, since you don't have reliable access to the string, and don't know if it's been interned somewhere, or if an object was created that you don't know about.
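If you really needed to do this, you'd have to do something like:
public class SecureString implements CharSequence {
private final char[] data;
public SecureString(char[] data) { this.data = data; }
public int length() { return data.length; }
public char charAt(int i) { return data[i]; }
public CharSequence subSequence(int s, int e) { return new SecureString(java.util.Arrays.copyOfRange(data, s, e)); }
// overwrite the backing array in place
public void wipe() {
for(int i = 0; i < data.length; i++) data[i] = '.'; // arbitrary filler char
}
}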
That being said, if you're worried about data still being in memory, you have to realize that if it was ever in memory at some point, then an attacker probably already got it. The only thing you realistically protect yourself from is a core dump being flushed to a log file.
Regarding the optimizer, I strongly doubt it will optimize away the operation. If you really needed to be sure, you could do something like this:
public int wipe() {
// wipe the array with a random value (rand is assumed to be a java.util.Random field)
java.util.Arrays.fill(data, (char) rand.nextInt(60000));
// compute hash to force optimizer to do the wipe
int hash = 0;
for(int i = 0; i < data.length; i++) {
hash = hash * 31 + (int)data[i];
}
return hash;
}
Because the returned hash is computed from the wiped array, the wipe is much less likely to be treated as dead code. It makes the method take roughly twice as long to run, but it's a pretty fast operation as it is, and it doesn't increase the order of complexity.
Store the data off-heap using the "Unsafe" methods. You can then zero over it when done and be certain that it won't be pushed around the heap by the JVM.
Here is a good post on Unsafe:
http://highlyscalable.wordpress.com/2012/02/02/direct-memory-access-in-java/
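As a rough illustration of the off-heap idea, the sketch below uses a direct ByteBuffer instead of the Unsafe methods the answer refers to; that substitution is my assumption, chosen because ByteBuffer.allocateDirect is a public API. The OffHeapSecret class and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class OffHeapSecret {
    private final ByteBuffer buf;

    OffHeapSecret(byte[] secret) {
        buf = ByteBuffer.allocateDirect(secret.length); // memory outside the Java heap
        buf.put(secret);
        buf.flip();
    }

    // Absolute get; does not move the buffer's position.
    byte byteAt(int i) {
        return buf.get(i);
    }

    int length() {
        return buf.limit();
    }

    // Overwrites the off-heap bytes with zeros.
    void wipe() {
        for (int i = 0; i < buf.limit(); i++) {
            buf.put(i, (byte) 0);
        }
    }

    public static void main(String[] args) {
        // Demo only: a real secret should never start out as a String literal.
        byte[] raw = "foo".getBytes(StandardCharsets.US_ASCII);
        OffHeapSecret s = new OffHeapSecret(raw);
        Arrays.fill(raw, (byte) 0);       // the on-heap staging copy needs wiping too
        System.out.println(s.byteAt(0));  // 102, i.e. 'f'
        s.wipe();
        System.out.println(s.byteAt(0));  // 0
    }
}
```

Note that the secret still passes through an ordinary byte[] on its way into the buffer, so that staging array has to be cleared as well.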
If you're going to use a String then I think you are worried about it appearing in a memory dump. I suggest using String.replace() on key-characters so that when the String is used at run-time it will change and then go out of scope after it is used and won't appear correctly in a memory dump. However, I strongly recommend that you not use a String for sensitive data.
