StringBuilder constructor implementation vulnerable to exception - java

The initial capacity of a StringBuilder, when it is initialized with an existing String or CharSequence, is the length of the original text plus 16, as seen in the StringBuilder constructor:
super(str.length() + 16);
My question is: what if the original length is close to Integer.MAX_VALUE?
Will it throw a NegativeArraySizeException, or will it promote the int to a long so it executes properly?

A NegativeArraySizeException is expected here:
As per the String implementation, a String internally uses a char[] to hold the individual characters, so the maximum String length is actually bounded by the maximum char[] size.
Java internally uses int (not Integer) to index the individual locations of an array, and hence the maximum length of a String can be Integer.MAX_VALUE; anything greater than that cannot be indexed, because the JVM is unable to address positions beyond the maximum int value.
Due to this constraint, str.length() + 16 overflows to a negative int, and requesting an array with a negative size is what produces the NegativeArraySizeException.

It will throw a NegativeArraySizeException since the integer will have wrapped around.
Effectively, it's the same as:
int len = Integer.MAX_VALUE;
// here we are trying to create an array of size -2147483633
char [] value = new char[len + 16];

It will throw a NegativeArraySizeException.
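A minimal sketch showing both the wrap-around and the resulting exception without allocating anything huge (the negative size makes the allocation fail immediately):
int len = Integer.MAX_VALUE;
System.out.println(len + 16); // -2147483633: the int arithmetic silently wrapped
try {
    char[] value = new char[len + 16]; // same negative size the constructor would request
} catch (NegativeArraySizeException e) {
    System.out.println("Caught: " + e);
}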


StringBuilder not being initialized to the length value given

I am trying to create a StringBuilder object with a given length, but every time I try to do so, the length of the StringBuilder object is printed as 0. Anyone know why?
String s = "i";
StringBuilder sb = new StringBuilder(s.length()+1);
System.out.println(sb.length());
In the above code, the length of the StringBuilder object should be 2 (1 from the length of "i" + 1), but when I print the StringBuilder's length in Eclipse, I get 0. I changed the length to 17 and 100, but I still got 0 as sb's length.
You're not setting its length in that constructor, you're setting its capacity.
Its length is the number of chars actually in it and you haven't yet appended any characters to it.
That is because you are setting the initial capacity of the StringBuilder, not its length. The length represents the actual number of characters in it so far. Let me demonstrate:
StringBuilder sb = new StringBuilder(5);
System.out.println("Length after initialization with capacity: " + sb.length());
sb.append("abcd");
System.out.println("Length after appending: " + sb.length());
Output:
Length after initialization with capacity: 0
Length after appending: 4
As per the official documentation for the StringBuilder(int capacity) constructor,
Constructs a string builder with no characters in it and an initial capacity specified by the capacity argument.
There are no characters in the StringBuilder instance yet. It’s just setting an initial capacity of the resource it uses to store the data based on the int argument you passed in.
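If what you actually want is a builder whose length() is 2, a minimal sketch using setLength (which pads with '\u0000' characters up to the requested length) would be:
String s = "i";
StringBuilder sb = new StringBuilder(s.length() + 1); // capacity 2, length still 0
sb.setLength(2);                                      // now length() == 2, padded with '\u0000'
System.out.println(sb.length());                      // prints 2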

Java String UTF-8 limits

I'm trying to deserialize Strings from files directly and I have a question about very long Strings: Java Strings have a character count limit equal to Integer.MAX_VALUE, which is 2^31 - 1.
But here comes my question: what happens when I have a String a little smaller than that size, but formed by characters that take more than 1 byte in UTF-8, and I then ask Java to give me the byte array?
To make it clearer, what happens if I could run this code? (I don't have enough RAM):
String toPrint = "";
String string100 = "";
int max = Integer.MAX_VALUE -100;
for (int i = 0; i < 100; i += 10) {
string100 += "1234567ñ90";
}
for (int i = 0; i < max; i += 100) {
toPrint += string100;
}
System.out.println("String complete!");
byte[] byteArray = toPrint.getBytes(StandardCharsets.UTF_8);
System.out.println(byteArray.length);
System.exit(0);
Does it print "String complete!"? Or does it break before?
Fundamentally, the limit on Strings is that the char arrays inside of them can't be longer than the maximum array length, which is roughly Integer.MAX_VALUE and greater than your variable max. Strings store their characters in UTF-16 and therefore the UTF-16 representation of a string can't exceed the maximum array length. The number of bytes in UTF-8 and the number of logical characters (Unicode code points, or UTF-32 characters) ultimately don't matter.
Now let's move to your particular example. Since each of the 10 characters in "1234567ñ90" is a single UTF-16 value, that string takes up 10 values of a String's char array. Despite your code's horrible performance and high memory requirement, it should eventually get to "String complete!" if there is sufficient available memory. However, it will break when converting to UTF-8 because the UTF-8 representation of the string is longer than the maximum array length, since "ñ" requires more than one byte.
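To see that expansion concretely, a minimal sketch that just measures the 10-character block used in the question:
import java.nio.charset.StandardCharsets;

String block = "1234567ñ90";
System.out.println(block.length());                                // 10 UTF-16 chars
System.out.println(block.getBytes(StandardCharsets.UTF_8).length); // 11 UTF-8 bytes, since "ñ" needs 2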
Array size is also limited to Integer.MAX_VALUE (which is why String size is limited; after all, there's a char[] backing it), so it's impossible to get the byte array if the encoding uses more bytes than that, no matter what the size of the String is in characters.
The end result would be an OutOfMemoryError, but creating the String in the first place would succeed.

Can an ArrayList contain more elements than the maximum value of int?

I was testing how Java (SE7) would deal with the int that exceed its maximum value through the following code:
int index = 2147483647; // the maximum value of int
long size = 2147483648L; // more than the maximum value of int by 1
int safeCounter = 0; // to prevent the infinite loop
while (index < size) {
    System.out.println("Index now is : " + index); // show the int value
    index++; // increment the int value
    safeCounter++; // increment the number of desired loops
    if (safeCounter == 3) {
        break; // to break the loop after 3 turns
    }
}
and what I got is:
Index now is : 2147483647
Index now is : -2147483648
Index now is : -2147483647
So after being confused by this (without the safeCounter it would keep cycling forever between the maximum and minimum values of int, and no exception is thrown), I was wondering: how would an ArrayList handle a situation where the number of elements exceeds the maximum value of int (assuming that heap space is not an issue)?
And if ArrayList can't handle this, Is there other data structure which can?
Can you also explain the behavior I got from the int variable?
Can an ArrayList contain more elements than the maximum value of int?
In practice no. An ArrayList is backed by a single Java array, and the maximum size of an array is Integer.MAX_VALUE.
(Hypothetically, Oracle could redo the implementation of ArrayList to use an array of arrays without breaking user code. But the chances of them doing that are pretty small.)
A LinkedList can handle as many elements as you can represent in memory. Or you could implement your own list type. Indeed, you could even implement a list type that can hold more elements than you could store in memory ... or even an unbounded number of elements if your list is actually a generator.
The fact that size() returns an int result (etcetera) is not actually an impediment. The List API spec deals with this anomaly.
The behaviour of your code is simply explained. Integer arithmetic in Java has silent overflow. If you add 1 to the largest positive value for an integer type, it wraps around to the largest negative value; i.e. MAX_VALUE + 1 == MIN_VALUE ... for integer types.
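A minimal sketch that makes the wrap-around visible, alongside Math.addExact (Java 8+) as the checked alternative:
System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true: silent wrap-around
try {
    Math.addExact(Integer.MAX_VALUE, 1); // same addition, but overflow is detected
} catch (ArithmeticException e) {
    System.out.println("Caught: " + e.getMessage()); // reports the integer overflow
}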
ArrayList can't handle that. The maximum size of an ArrayList is Integer.MAX_VALUE. You can use a LinkedList, which can contain any number of elements (depending on your memory, actually) :-)
From ArrayList.java:
/**
* The array buffer into which the elements of the ArrayList are stored.
* The capacity of the ArrayList is the length of this array buffer.
*/
private transient Object[] elementData;
As it uses array in its implementation, you cannot index beyond Integer.MAX_VALUE, so that's a limit.
For the int behaviour, you can take a look at this question.
This is because Java uses signed integers. ArrayList indices start from 0 and there is no way to pass a negative index to an ArrayList.
One possible solution to your problem is to first convert the signed integer to its unsigned value and then use it.
You could convert a signed int to unsigned using the following snippet:
public static long getUnsigned(int signed) {
    if (signed >= 0) {
        return signed;
    }
    // Negative values map to the upper half of the unsigned 32-bit range.
    return (long) Math.pow(2, 32) + signed;
}
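For what it's worth, on Java 8 and later the standard library already provides this conversion via Integer.toUnsignedLong:
long unsigned = Integer.toUnsignedLong(-1); // 4294967295
System.out.println(unsigned);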

How do I convert a BitSet initialized with false into a byte containing 0 in Java

I'm working on a small Java project aiming to transform a BitSet into several BitSets and then those into several arrays of bytes.
For example, I wish to split a BitSet into two parts and convert each part into an int:
byte[] bytesToBeConverted = {(byte) 0x05, (byte) 0x00};
BitSet bitSetToBeConverted = BitSet.valueOf(bytesToBeConverted);
BitSet BitSetPart1 = new BitSet(8);
BitSetPart1 = bitSetToBeConverted.get(0, 8);
int intPart1 = (int) (BitSetPart1.toByteArray()[0]); // intPart1 == 5
BitSet BitSetPart2 = new BitSet(8);
BitSetPart2 = bitSetToBeConverted.get(8, 16);
int intPart2 = (int) (BitSetPart2.toByteArray()[0]); // intPart2 == 0 is wanted
Whereas no problem occurs in the first part (converting BitSetPart1 into intPart1), the second part, where BitSetPart2 has to be initialized with false, raises a java.lang.ArrayIndexOutOfBoundsException when accessing the result of the toByteArray() method.
toByteArray() seems to return null in that case.
Does that mean that zero is a forbidden value for that type of operation?
In that case, would you rather extend the BitSet class and override the toByteArray() method?
Or create a class completely separate from BitSet with an extra method to overcome that problem?
Or is there another way to perform that kind of operation that I haven't mentioned?
Thanks a lot for your answers!
From the Javadoc of toByteArray():
More precisely, if
byte[] bytes = s.toByteArray();
then
bytes.length == (s.length()+7)/8
and from the Javadoc of length():
Returns the "logical size" of this BitSet: the index of the highest set bit in the BitSet plus one. Returns zero if the BitSet contains no set bits.
Since the second BitSet contains no set bits, it returns an array of length zero, as the Javadoc clearly specifies.
If you want to pad the result of toByteArray() out to a specified number of bytes, then use Arrays.copyOf(bitSet.toByteArray(), desiredLength).
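Applied to the code in the question, a minimal sketch of that padding approach (assuming each 8-bit slice should always come back as exactly one byte):
import java.util.Arrays;

byte[] part2Bytes = Arrays.copyOf(BitSetPart2.toByteArray(), 1); // zero-padded to length 1
int intPart2 = part2Bytes[0]; // 0, as wanted, even when the BitSet has no set bits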
The empty bitset returns an empty array, hence getting [0] is indeed illegal.
Try
BitSetPart2 = bitSetToBeConverted.get(8,16);
byte[] temp = BitSetPart2.toByteArray();
int intPart2 = temp.length == 0 ? 0 : (int)(temp[0]);
instead

String's Maximum length in Java - calling length() method

In Java, what is the maximum size a String object may have, referring to the length() method call?
I know that length() returns the size of the String's backing char[];
Considering the String class' length method returns an int, the maximum length that would be returned by the method would be Integer.MAX_VALUE, which is 2^31 - 1 (or approximately 2 billion.)
In terms of lengths and indexing of arrays, (such as char[], which is probably the way the internal data representation is implemented for Strings), Chapter 10: Arrays of The Java Language Specification, Java SE 7 Edition says the following:
The variables contained in an array have no names; instead they are referenced by array access expressions that use nonnegative integer index values. These variables are called the components of the array. If an array has n components, we say n is the length of the array; the components of the array are referenced using integer indices from 0 to n - 1, inclusive.
Furthermore, the indexing must be by int values, as mentioned in Section 10.4:
Arrays must be indexed by int values;
Therefore, it appears that the limit is indeed 2^31 - 1, as that is the maximum value for a nonnegative int value.
However, there probably are going to be other limitations, such as the maximum allocatable size for an array.
java.io.DataInput.readUTF() and java.io.DataOutput.writeUTF(String) say that a String object is represented by two bytes of length information followed by the modified UTF-8 representation of every character in the string. It follows that, when used with DataInput and DataOutput, the length of a String is limited by the number of bytes of its modified UTF-8 representation.
In addition, The specification of CONSTANT_Utf8_info found in the Java virtual machine specification defines the structure as follows.
CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}
You can find that the size of 'length' is two bytes.
That the return type of a certain method (e.g. String.length()) is int does not always mean that its allowed maximum value is Integer.MAX_VALUE. Instead, in most cases, int is chosen just for performance reasons. The Java language specification says that integers whose size is smaller than that of int are converted to int before calculation (if my memory serves me correctly) and it is one reason to choose int when there is no special reason.
The maximum length at compilation time is therefore 65535 bytes. Note again that this length is the number of bytes of the modified UTF-8 representation, not the number of characters in a String object.
String objects may be able to hold many more characters at runtime. However, if you want to use String objects with the DataInput and DataOutput interfaces, it is better to avoid overly long String objects. I found this limitation when I implemented Objective-C equivalents of DataInput.readUTF() and DataOutput.writeUTF(String).
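A minimal sketch that makes the writeUTF limit observable (assuming plain ASCII content, so one byte per character in modified UTF-8):
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

StringBuilder sb = new StringBuilder(70_000);
for (int i = 0; i < 70_000; i++) {
    sb.append('a'); // 70,000 ASCII chars -> 70,000 bytes in modified UTF-8, over the 65,535-byte limit
}
DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
try {
    out.writeUTF(sb.toString());
} catch (IOException e) {
    // In practice this is a java.io.UTFDataFormatException
    System.out.println("Caught: " + e);
}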
Since arrays must be indexed with int values, the maximum length of an array is Integer.MAX_VALUE (2^31 - 1, or 2,147,483,647). This is assuming you have enough memory to hold an array of that size, of course.
I have a 2010 iMac with 8GB of RAM, running Eclipse Neon.2 Release (4.6.2) with Java 1.8.0_25. With the VM argument -Xmx6g, I ran the following code:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
    try {
        sb.append('a');
    } catch (Throwable e) {
        System.out.println(i);
        break;
    }
}
System.out.println(sb.toString().length());
This prints:
Requested array size exceeds VM limit
1207959550
So, it seems that the max array size is ~1,207,959,549. Then I realized that we don't actually care if Java runs out of memory: we're just looking for the maximum array size (which seems to be a constant defined somewhere). So:
for (int i = 0; i < 1_000; i++) {
    try {
        char[] array = new char[Integer.MAX_VALUE - i];
        Arrays.fill(array, 'a');
        String string = new String(array);
        System.out.println(string.length());
    } catch (Throwable e) {
        System.out.println(e.getMessage());
        System.out.println("Last: " + (Integer.MAX_VALUE - i));
        System.out.println("Last: " + i);
    }
}
Which prints:
Requested array size exceeds VM limit
Last: 2147483647
Last: 0
Requested array size exceeds VM limit
Last: 2147483646
Last: 1
Java heap space
Last: 2147483645
Last: 2
So, it seems the max is Integer.MAX_VALUE - 2, or (2^31) - 3
P.S. I'm not sure why my StringBuilder maxed out at 1207959550 while my char[] maxed out at (2^31)-3. It seems that AbstractStringBuilder doubles the size of its internal char[] to grow it, so that probably causes the issue.
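The 1207959550 ceiling lines up with the capacity-growth rule found in JDK 8-era AbstractStringBuilder sources (assumed here to be newCapacity = oldCapacity * 2 + 2, starting from the default capacity of 16). A minimal sketch that just replays that rule:
// Replay the assumed growth rule until the next doubling would no longer fit in an int.
long capacity = 16;
while (capacity * 2 + 2 <= Integer.MAX_VALUE) {
    capacity = capacity * 2 + 2;
}
System.out.println(capacity); // 1207959550: the last capacity reachable before overflow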
Apparently it's bound to an int, which is 0x7FFFFFFF (2147483647).
The return type of the length() method of the String class is int.
public int length()
Refer http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#length()
So the maximum value of int is 2147483647.
A String is treated as a char array internally, so indexing is done within the maximum int range. This means we cannot index the 2147483648th element, so the maximum length of a String in Java is 2147483647.
The primitive data type int is 4 bytes (32 bits) in Java. As 1 bit (the MSB) is used as the sign bit, the range is constrained to -2^31 to 2^31 - 1 (-2147483648 to 2147483647).
We cannot use negative values for indexing, so the usable range is 0 to 2147483647.
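As a tiny illustration of that valid index range (a minimal sketch; any index outside 0 to length() - 1 fails the same way):
String s = "abc";
System.out.println(s.charAt(0));              // 'a': the first valid index
System.out.println(s.charAt(s.length() - 1)); // 'c': the last valid index, length() - 1
System.out.println(s.charAt(-1));             // throws StringIndexOutOfBoundsException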
As mentioned in Takahiko Kawasaki's answer, Java represents Unicode strings in the form of modified UTF-8, and in the JVM spec's CONSTANT_Utf8_info structure, 2 bytes are allocated to the length (not to the number of characters of the String).
To extend the answer, the ASM JVM bytecode library's putUTF8 method contains this:
public ByteVector putUTF8(final String stringValue) {
    int charLength = stringValue.length();
    if (charLength > 65535) {
        // If the number of characters > 65535, the UTF-8 encoded length won't fit in 2 bytes either.
        throw new IllegalArgumentException("UTF8 string too large");
    }
    for (int i = 0; i < charLength; ++i) {
        char charValue = stringValue.charAt(i);
        if (charValue >= '\u0001' && charValue <= '\u007F') {
            // Unicode code-point encoding in UTF-8 fits in 1 byte.
            currentData[currentLength++] = (byte) charValue;
        } else {
            // Doesn't fit in 1 byte.
            length = currentLength;
            return encodeUtf8(stringValue, i, 65535);
        }
    }
    ...
}
But when a code point maps to more than 1 byte, it calls the encodeUtf8 method:
final ByteVector encodeUtf8(final String stringValue, final int offset, final int maxByteLength /* = 65535 */) {
    int charLength = stringValue.length();
    int byteLength = offset;
    for (int i = offset; i < charLength; ++i) {
        char charValue = stringValue.charAt(i);
        if (charValue >= 0x0001 && charValue <= 0x007F) {
            byteLength++;
        } else if (charValue <= 0x07FF) {
            byteLength += 2;
        } else {
            byteLength += 3;
        }
    }
    ...
}
In this sense, the maximum string length in the constant pool is 65535 bytes, i.e. the modified UTF-8 encoded length, not the character count.
You can find the JVM's modified UTF-8 code-point ranges at the CONSTANT_Utf8_info link above.
