Whilst handling passwords in Java, its my understanding that they should always be handled in char[]'s to allow GC and remove hanging references.
My question is would,
char[] password = String.valueOf(authentication.getCredentials()).toCharArray();
Could the value of authentication.getCredentials() to be interned or not?
It's not a question of interning the String, any security concerns around using Strings to store passwords arise from the amount of time they are present in memory.
With a char array you have the ability to wipe the contents once you've finished reading them. With a String (which is immutable) you're left relying on the garbage collector, this means that if someone has access to your server and dumps the memory there may be password visible.
String.valueOf() doesn't intern Strings. The only way to intern Strings during runtime is with password.intern(). There's no need to use char[] for passwords. Using char[] allows you to clear the array directly after use, narrowing the attacker's timeframe to dump the memory and retrieve the plaintext password.
A String by itself is nothing special to the GC. Interning affects it a bit, but in regular use you wouldn't encounter anything out of the ordinary.
Related
I've started looking into string interning and it seems like a great feature however I haven't found a great reason for why you would want to create a string using the string constructor, after some digging I came up with this, could someone confirm (or deny) if this is a valid reason to create a string with new?
Say you have 2 strings:
String novel = "The contents of a very long novel..."
String page = new String("The contents of a single page...")
By default all string literals are stored in the string pool (such as with String novel) and by default all sub-strings of novel will be interned (assuming they are created as a string literal) to optimizing memory allocation. Creating a string using the new keyword results in the string being created on the heap rather than in string pool. A particular case when you may want to avoid interning is if you wanted to create a string that is a sub-string of a very large string literal (such as page).
For example; Say you had a very large string literal (e.g. the contents of a novel) that you wanted to process only a portion of (e.g. a single page). It may be beneficial to use the string constructor (via new keyword) when creating the string that only contains a single page of the novel. That way the very large string may be free'd from the string pool sooner and keep only the string that contains the contents of a page on the heap. In contrast, if you created a string literal that is an interned sub-string of an entire novel, a larger amount of novel may be kept alive in the string pool despite only needing a small portion of the novel string.
TL;DR: There is no good / valid reason to new a String in a modern JVM, or to call String.intern() explicitly.
Your question contains false statements of fact, and that means that the conclusions that you are drawing are incorrect.
By default all string literals are stored in the string pool (such as with String novel)
That is correct, though it is not "by default". (It is like saying "by default a square has 4 sides". Squares have 4 sides, period. There are no exceptions. And no defaults.)
and by default all sub-strings of novel will be interned (assuming they are created as a string literal) to optimizing memory allocation.
Incorrect.
A String created by the String.substring() method is NOT interned. Not in current Java releases, or (AFAIK) in any previous release. (But see below.)
Creating a string using the new keyword results in the string being created on the heap rather than in string pool.
Correct.
A particular case when you may want to avoid interning is if you wanted to create a string that is a sub-string of a very large string literal (such as page).
Incorrect.
I think you are confusing "interning" with something else.
Actually, in a modern JVM you always want to avoid interning. It is expensive, and it causes string objects to be (artificially) kept for longer than they need to me.
In fact, the only real reason that interning is still a thing is that it is necessary to guarantee certain semantic properties specified in the JLS about compile-time constant strings.
A modern JVM (Java 9 and later) performs string deduping in the garbage collector for strings that live long enough. This happens transparently ... and in cases where it is likely to be beneficial.
Historic note.
In some old JVMs, there used to be a good reason to call new String in conjunction with substring. The problem was the substring method has a "clever optimization" whereby it created the substrings to share the backing char[] with the original string1. This had the problem that references to (small) substrings could keep the (large) backing array reachable. It was a subtle kind of memory leak. You could avoid the leak by using new.
However:
The optimization was NOT interning. The substrings were created in the regular heap, and they did not have the semantics of interned strings.
The problem only affected certain String use-cases. And in practice they didn't involve large String literals.
The problem was solved long ago. The String.substring now creates a new String with its own backing array.
In summary, using new String might have been a good idea in some cases with old Java versions, but it isn't anymore. It was fixed in Java 7.
1 - Interestingly, the source code for String describes this as a speed optimization rather than a space optimization.
I feel strings can replace character array in all the scenarios. Even considering the immutability characteristic of Strings, declaration of strings in appropriate scope and java's garbage collection feature should help us avoid any memory leaks. I want to know if there is any corner case where character array should be used instead of Strings in Java.
Character arrays have some slight advantage over plain strings when it comes to storing security sensitive data. There's a lot of resources on that, for example this question: Why is char[] preferred over String for passwords? (with an answer by Jon Skeet himself).
In general it boils down to two things:
You have very little influence on how long a String stays in memory. Because of that you might leak sensitive data through a memory dump.
Leaking sensitive data accidentally in application logs as clear text is much more likely with plain strings
More reading:
Why we read password from console in char array instead of String
https://www.codebyamir.com/blog/use-character-arrays-to-store-sensitive-data-java
https://www.geeksforgeeks.org/use-char-array-string-storing-passwords-java/amp/
https://www.baeldung.com/java-storing-passwords
https://javarevisited.blogspot.com/2012/03/why-character-array-is-better-than.html
https://javainsider.wordpress.com/2012/12/10/character-array-is-better-than-string-for-storing-password-in-java/amp/
String is a class, not a build in type. It most likely does what it does by using a char array underneath, but there is no guarantee. "We dont care how it is implemented". It has methods that make sense for strings, like comparing strings. Comparing arrays?? Hmm. Doesn't really make sense to do it. You could check if they are equal sure, but less or greater than...
Back in point. One scenario is you want to operate with chars, not a string. For example you have letters of the alphabet and want to sort them. Or grades in A-F system and you want to sort them. Generally where it makes sense having chars that are not connected to have some meaning together (like in a message string, or a text message). You would not generally need to sort the chars of a text message now, would you? So, you use an array.
To sort, you can take advantage of the Arrays.sort() method for example, while i dont think there is a method that does it for strings. Perhaps 3rd part libraries.
On another note(unrelated to question) , you can use StringBuilder to if you want to modify strings often. Its better at performace.
You don't have to look much further than at methods in the JDK core API that use char[].
Such as this one (java.io.Reader):
public int read(char[] cbuf)
throws IOException
Reads characters into an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
Parameters:
cbuf - Destination buffer
Returns:
The number of characters read, or -1 if the end of the stream has been reached
Throws:
IOException - If an I/O error occurs
Instead of returning a String they ask you to pass in a char[] to use as a buffer to write the result into. The reason is efficiency.
You might be knowing String is immutable and how Substring can cause memory leak in Java.
Since Strings are immutable in Java if you store password as plain text it will be available in memory until Garbage collector clears it and since String are used in String pool for reusability there is pretty high chance that it will be remain in memory for long duration, which pose a security threat. Since any one who has access to memory dump can find the password in clear text. Since Strings are immutable there is no way contents of Strings can be changed because any change will produce new String, while if you char[] you can still set all his element as blank or zero. So Storing password in character array clearly mitigates security risk of stealing password.
Java provides the URLEncoder class for URL-encoding Strings. But it is considered insecure to store passwords as Strings. Is this code to send a password via POST over an HttpsURLConnection output stream wr secure enough?
try (DataOutputStream wr = new DataOutputStream(con.getOutputStream())) {
// Write some post params
for (char c : pass) {
wr.writeBytes(URLEncoder.encode(String.valueOf(c), "UTF-8"));
}
// Write some more
}
On the one hand, it is using Strings. On the other hand, those Strings are 1 character long, and conceptually the same after encoding. Also, it seems to me that this could fail on multi-byte characters. Would an attacker be able to locate these 1-char Strings in memory and reconstruct the original password? Is there a better way to do this?
It's true that when using Strings for passwords, you can't reliably discard them since you can't zero out the String contents manually (except through reflection) and the contents will stay in memory for an unknown time. Therefore char[] is often recommended to be used for any password input, followed by a manual zeroing of the said char[].
However the attack is in no way easy to mount since it requires access to the memory, luck with timing. It's unlikely the password will stay in the memory for very long amounts of time, as the GC does its work and the memory gets reused. In most cases this attack vector can be infeasible compared to other, simpler attacks.
An OutputStream has no method writeBytes; but it offers a write(byte[]) method that could be used to write all chars of the password with one call.
And beyond that: the whole idea of using HTTPS is that the connection itself is securely encrypted; so it really should not matter whether you transmit such content with single byte or multi byte bursts.
Then: keep in mind that all these classes are abstractions, that are layered upon each other. HTTPS uses TCP; and TCP packets have a certain size; in that sense there aren't single bytes going over the network anyway.
And regarding the second part of your question: you are iterating an array of char values. So it really depends how that char array was created; but normally, you don't need to worry (see here on that topic).
After reading this beautiful question: Why is char[] preferred over String for passwords?, I'm curious as to how this applies to servlet based web applications. Say your UI has some input field for the password, the password will be retrievable with request.getParameter("passwordFieldName") which returns a String. Even if you then convert it to a char[], you have to wait for GC to clear the String object.
Also, many of the Encryption/Hashing libraries I'm looking into using for password hashing have a method such as checkPassword(plaintext, hashed) that takes two Strings and returns true if the entered plain text string gives a hash equal to hashed. With this, even if you had a char[], you would still need to convert the array to a String with the new String(char[]) constructor.
It seems to me like you can't really avoid having your password as a String for comparing it to its stored representation. If you are worried about attacks during that small window, how do you protect yourself?
This is an overreaction and really just "security theater". There is really no scenario in which having a long String as a password in a Java application would be at all desirable to an attacker. If a memory exhaustion attack is a concern, then don't use Strings anywhere.
That being said CWE-521 states that passwords must have a max size. Strings don't really have a max size, and using a char[x] is a good way of enforcing a max size.
In reference to the link: File IO Tuning, last section titled "Further Tuning" where the author suggests using char[] to avoid generating String objects for n lines in the file, I need to understand how does
char[] arr = new char{'a','u','t','h', 'o', 'r'}
differ with
String s = "author"
in terms of memory consumption or any other performance factor? Isn't String object internally stored as a character array? I feel silly since I never thought of this before. :-)
In Oracle's JDK a String has four instance-level fields:
A character array
An integral offset
An integral character count
An integral hash value
That means that each String introduces an extra object reference (the String itself), and three integers in addition to the character array itself. (The offset and character count are there to allow sharing of the character array among String instances produced through the String#substring() methods, a design choice that some other Java library implementers have eschewed.) Beyond the extra storage cost, there's also one more level of access indirection, not to mention the bounds checking with which the String guards its character array.
If you can get away with allocating and consuming just the basic character array, there's space to be saved there. It's certainly not idiomatic to do so in Java though; judicious comments would be warranted to justify the choice, preferably with mention of evidence from having profiled the difference.
In the example you've referred to, it's because there's only a single character array being allocated for the whole loop. It's repeatedly reading into that same array, and processing it in place.
Compare that with using readLine which needs to create a new String instance on each iteration. Each String instance will contain a few int fields and a reference to a char[] containing the actual data - so it would need two new instances per iteration.
I'd usually expect the differences to be insignificant (with a decent GC throwing away unused "young" objects very efficiently) compared with the IO involved in reading the data - assuming it's from disk - but I believe that's the point the author was trying to make.
The author didn't get the reason right. The real overhead in in.readLine() is the copying a char[] buffer when making a String out of it. The additional copying is the most damning cost when dealing with large data.
It is possible to optimize this within JDK so that the additional copying is not needed.
Here are few reasons which makes sense to believe that character array is better choice in Java than String:
Say for Storing the Password
1) Since Strings are immutable in Java, if you store password as plain text it will be available in memory until Garbage collector clears it and since String are used in String pool for reusability there is pretty high chance that it will be remain in memory for long duration, which pose a security threat.
Since any one who has access to memory dump can find the password in clear text and that's another reason you should always used an encrypted password than plain text.
Since Strings are immutable there is no way contents of Strings can be changed because any change will produce new String, while if you char[] you can still set all his element as blank or zero. So Storing password in character array clearly mitigates security risk of stealing password.
2) Java itself recommends using getPassword() method of JPasswordField which returns a char[] and deprecated getText() method which returns password in clear text stating security reason. Its good to follow advice from Java team and adhering to standard rather than going against it.
3) With String there is always a risk of printing plain text in log file or console but if use Array you won't print contents of array instead its memory location get printed. though not a real reason but still make sense.
For this simple program
String strPassword="Unknown";
char[] charPassword= new char[]{'U','n','k','n','o','w','n'};
System.out.println("String password: " + strPassword);
System.out.println("Character password: " + charPassword);
Output:
String password: Unknown
Character password: [C#110b053
That's all on why character array is better choice than String for storing passwords in Java. Though using char[] is not just enough you need to erase content to be more secure.
Hope this will help.
My answer is going to focus on other stack questions along this similar line, others have already posted more direct answers.
There have been other questions similar to this, advice seems to go along the lines of using StringBuilder.
If you're concerned with string concentenation this have a look at the performance as described here between three different implementations. With another stack post which can give you some additional pointers and examples you could try yourself to see the performance.