StringBuilder not being initialized to the length value given - java

I am trying to create a StringBuilder object with a given length, but every time I try to do so, the length of the StringBuilder object is printed as 0. Anyone know why?
String s = "i";
StringBuilder sb = new StringBuilder(s.length()+1);
System.out.println(sb.length());
In the above code, the length of the StringBuilder object should be 2 (1 from the length of "i" + 1), but when I print the StringBuilder's length in Eclipse, I get 0. I changed the length to 17 and 100, but I still got 0 as sb's length.

You're not setting its length in that constructor, you're setting its capacity.
Its length is the number of chars actually in it and you haven't yet appended any characters to it.

That is because you are setting the maximum capacity of the StringBuilder. The length represents the actual size of the constructed string. Let me demonstrate:
StringBuilder sb = new StringBuilder(5);
System.out.println("Length after initialization with capacity: " + sb.length());
sb.append("abcd");
System.out.println("Length after appending: " + sb.length());
Output:
Length after initialization with capacity: 0
Length after appending: 4

As per the official documentation for the StringBuilder(int capacity) constructor,
Constructs a string builder with no characters in it and an initial capacity specified by the capacity argument.
There are no characters in the StringBuilder instance yet. It’s just setting an initial capacity of the resource it uses to store the data based on the int argument you passed in.

Related

Split a String variable into a random substring with a specified length

In java.
It should use the random number generator to return a randomly chosen substring of text that has the specified length. If the length is either negative or greater than the length of text, the method should throw an IllegalArgumentException. For example, chooseSubstring("abcde", 4, new Random()) should return "abcd" about half the time and "bcde" about half the time.
public static String chooseSubstring (String text, int length, Random rand)
{
int randomNum = rand.nextInt(length);
String answer = text.substring(randomNum);
return answer;
}
Basically, I want to return a substring from the variable text. The substring must be the length of the variable length. The beginning of this substring should start at a random location determined by a random number generator. My problem is that the random number generator is not making sure the substring is the correct length.
System.out.println(chooseSubstring("abcde", 4, new Random()));
Should return abcd and bcde about the same amount of times. Instead it is returning:
bcde
cde
de
abcde.
Any info about how to solve this would greatly help thanks!
Your code is taking a substring at a random index between 0 and length, exclusive. You have to specify the end index so it doesn't extend to the end of the string. You also need to reduce the range of the starting index so the end index doesn't go past the string:
int randomNum = rand.nextInt(text.length() - length + 1);
String answer = text.substring(randomNum, randomNum + length);

StringBuilder constructor implementation vulnerable to exception

The initial capacity of a StringBuilder when initialized with an existing String or CharSequence is the length of the original text + 16 from the code in StringBuilder constructor:
super(str.length() + 16);
My query what if the original length is close to Integer.MAX_VALUE ?
Will it throw NegativeArraySizeExceptionor will it change int to long for proper execution ?
The NegativeArraySizeException is expected here as :
As per the String implementation, a String internally use a char[] to hold the individual characters. So the maximum String length is actually dependent on the char[] size .
Java internally use int (not Integer) to index the individual locations of an array and hence the maximum length of a String can be Integer.MAX_VALUE and any thing greater than that should throw an exception as JVM will not be able to index the individual locations which are more than maximum int value.
Due to this constraint NegativeArraySizeException is thrown for any array which expands beyond the max allowed limit which in this case is max int value.
It will throw a NegativeArraySizeException since the integer will have wrapped around.
Effectively, it's the same as:
int len = Integer.MAX_VALUE;
// here we are trying to create an array of size -2147483633
char [] value = new char[len + 16];
it will throw NegativeArraySizeException

Java String UTF-8 limits

I'm trying to deserialize Strings from files directly and I have a question about very long Strings: Java Strings have a character count limit equal to Integer.MAX_VALUE, which is 31^2-1.
But here comes my question: what happens when I have a UTF-8 String with little less than that size but formed by characters with size more than 1 byte and then I ask Java to give me the byte array?
To make it clearer, what happens if I could run this code? (I haven't got RAM enough):
String toPrint = "";
String string100 = "";
int max = Integer.MAX_VALUE -100;
for (int i = 0; i < 100; i += 10) {
string100 += "1234567ñ90";
}
for (int i = 0; i < max; i += 100) {
toPrint += string100;
}
System.out.println("String complete!");
byte[] byteArray = toPrint.getBytes(StandardCharsets.UTF_8);
System.out.println(byteArray.length);
System.exit(0);
Does it print "String complete!"? Or does it break before?
Fundamentally, the limit on Strings is that the char arrays inside of them can't be longer than the maximum array length, which is roughly Integer.MAX_VALUE and greater than your variable max. Strings store their characters in UTF-16 and therefore the UTF-16 representation of a string can't exceed the maximum array length. The number of bytes in UTF-8 and the number of logical characters (Unicode code points, or UTF-32 characters) ultimately don't matter.
Now let's move to your particular example. Since each of the 10 characters in "1234567ñ90" is a single UTF-16 value, that string takes up 10 values of a String's char array. Despite your code's horrible performance and high memory requirement, it should eventually get to "String complete!" if there is sufficient available memory. However, it will break when converting to UTF-8 because the UTF-8 representation of the string is longer than the maximum array length, since "ñ" requires more than one byte.
Array size is also limited to Integer.MAX_VALUE (which is why String size is limited, after all there's a char[] backing it) , so it's impossible to get the byte array if the encoding uses more bytes than that, no matter what the size of the String is in characters.
The end result would be an OutOfMemoryError, but creating the String in the first place would succeed.

Can Java String handle 10^15 characters ? [duplicate]

I'm trying The Next Palindrome problem from Sphere Online Judge (SPOJ) where I need to find a palindrome for a integer of up to a million digits. I thought about using Java's functions for reversing Strings, but would they allow for a String to be this long?
You should be able to get a String of length
Integer.MAX_VALUE always 2,147,483,647 (231 - 1)
(Defined by the Java specification, the maximum size of an array, which the String class uses for internal storage)
OR
Half your maximum heap size (since each character is two bytes) whichever is smaller.
I believe they can be up to 2^31-1 characters, as they are held by an internal array, and arrays are indexed by integers in Java.
While you can in theory Integer.MAX_VALUE characters, the JVM is limited in the size of the array it can use.
public static void main(String... args) {
for (int i = 0; i < 4; i++) {
int len = Integer.MAX_VALUE - i;
try {
char[] ch = new char[len];
System.out.println("len: " + len + " OK");
} catch (Error e) {
System.out.println("len: " + len + " " + e);
}
}
}
on Oracle Java 8 update 92 prints
len: 2147483647 java.lang.OutOfMemoryError: Requested array size exceeds VM limit
len: 2147483646 java.lang.OutOfMemoryError: Requested array size exceeds VM limit
len: 2147483645 OK
len: 2147483644 OK
Note: in Java 9, Strings will use byte[] which will mean that multi-byte characters will use more than one byte and reduce the maximum further. If you have all four byte code-points e.g. emojis, you will only get around 500 million characters
Have you considered using BigDecimal instead of String to hold your numbers?
Integer.MAX_VALUE is max size of string + depends of your memory size but the Problem on sphere's online judge you don't have to use those functions
Java9 uses byte[] to store String.value, so you can only get about 1GB Strings in Java9. Java8 on the other hand can have 2GB Strings.
By character I mean "char"s, some character is not representable in BMP(like some of the emojis), so it will take more(currently 2) chars.
The heap part gets worse, my friends. UTF-16 isn't guaranteed to be limited to 16 bits and can expand to 32

String's Maximum length in Java - calling length() method

In Java, what is the maximum size a String object may have, referring to the length() method call?
I know that length() return the size of a String as a char [];
Considering the String class' length method returns an int, the maximum length that would be returned by the method would be Integer.MAX_VALUE, which is 2^31 - 1 (or approximately 2 billion.)
In terms of lengths and indexing of arrays, (such as char[], which is probably the way the internal data representation is implemented for Strings), Chapter 10: Arrays of The Java Language Specification, Java SE 7 Edition says the following:
The variables contained in an array
have no names; instead they are
referenced by array access expressions
that use nonnegative integer index
values. These variables are called the
components of the array. If an array
has n components, we say n is the
length of the array; the components of
the array are referenced using integer
indices from 0 to n - 1, inclusive.
Furthermore, the indexing must be by int values, as mentioned in Section 10.4:
Arrays must be indexed by int values;
Therefore, it appears that the limit is indeed 2^31 - 1, as that is the maximum value for a nonnegative int value.
However, there probably are going to be other limitations, such as the maximum allocatable size for an array.
java.io.DataInput.readUTF() and java.io.DataOutput.writeUTF(String) say that a String object is represented by two bytes of length information and the modified UTF-8 representation of every character in the string. This concludes that the length of String is limited by the number of bytes of the modified UTF-8 representation of the string when used with DataInput and DataOutput.
In addition, The specification of CONSTANT_Utf8_info found in the Java virtual machine specification defines the structure as follows.
CONSTANT_Utf8_info {
u1 tag;
u2 length;
u1 bytes[length];
}
You can find that the size of 'length' is two bytes.
That the return type of a certain method (e.g. String.length()) is int does not always mean that its allowed maximum value is Integer.MAX_VALUE. Instead, in most cases, int is chosen just for performance reasons. The Java language specification says that integers whose size is smaller than that of int are converted to int before calculation (if my memory serves me correctly) and it is one reason to choose int when there is no special reason.
The maximum length at compilation time is at most 65536. Note again that the length is the number of bytes of the modified UTF-8 representation, not the number of characters in a String object.
String objects may be able to have much more characters at runtime. However, if you want to use String objects with DataInput and DataOutput interfaces, it is better to avoid using too long String objects. I found this limitation when I implemented Objective-C equivalents of DataInput.readUTF() and DataOutput.writeUTF(String).
Since arrays must be indexed with integers, the maximum length of an array is Integer.MAX_INT (231-1, or 2 147 483 647). This is assuming you have enough memory to hold an array of that size, of course.
I have a 2010 iMac with 8GB of RAM, running Eclipse Neon.2 Release (4.6.2) with Java 1.8.0_25. With the VM argument -Xmx6g, I ran the following code:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
try {
sb.append('a');
} catch (Throwable e) {
System.out.println(i);
break;
}
}
System.out.println(sb.toString().length());
This prints:
Requested array size exceeds VM limit
1207959550
So, it seems that the max array size is ~1,207,959,549. Then I realized that we don't actually care if Java runs out of memory: we're just looking for the maximum array size (which seems to be a constant defined somewhere). So:
for (int i = 0; i < 1_000; i++) {
try {
char[] array = new char[Integer.MAX_VALUE - i];
Arrays.fill(array, 'a');
String string = new String(array);
System.out.println(string.length());
} catch (Throwable e) {
System.out.println(e.getMessage());
System.out.println("Last: " + (Integer.MAX_VALUE - i));
System.out.println("Last: " + i);
}
}
Which prints:
Requested array size exceeds VM limit
Last: 2147483647
Last: 0
Requested array size exceeds VM limit
Last: 2147483646
Last: 1
Java heap space
Last: 2147483645
Last: 2
So, it seems the max is Integer.MAX_VALUE - 2, or (2^31) - 3
P.S. I'm not sure why my StringBuilder maxed out at 1207959550 while my char[] maxed out at (2^31)-3. It seems that AbstractStringBuilder doubles the size of its internal char[] to grow it, so that probably causes the issue.
apparently it's bound to an int, which is 0x7FFFFFFF (2147483647).
The Return type of the length() method of the String class is int.
public int length()
Refer http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#length()
So the maximum value of int is 2147483647.
String is considered as char array internally,So indexing is done within the maximum range.
This means we cannot index the 2147483648th member.So the maximum length of String in java is 2147483647.
Primitive data type int is 4 bytes(32 bits) in java.As 1 bit (MSB) is used as a sign bit,The range is constrained within -2^31 to 2^31-1 (-2147483648 to 2147483647).
We cannot use negative values for indexing.So obviously the range we can use is from 0 to 2147483647.
As mentioned in Takahiko Kawasaki's answer, java represents Unicode strings in the form of modified UTF-8 and in JVM-Spec CONSTANT_UTF8_info Structure, 2 bytes are allocated to length (and not the no. of characters of String).
To extend the answer, the ASM jvm bytecode library's putUTF8 method, contains this:
public ByteVector putUTF8(final String stringValue) {
int charLength = stringValue.length();
if (charLength > 65535) {
// If no. of characters> 65535, than however UTF-8 encoded length, wont fit in 2 bytes.
throw new IllegalArgumentException("UTF8 string too large");
}
for (int i = 0; i < charLength; ++i) {
char charValue = stringValue.charAt(i);
if (charValue >= '\u0001' && charValue <= '\u007F') {
// Unicode code-point encoding in utf-8 fits in 1 byte.
currentData[currentLength++] = (byte) charValue;
} else {
// doesnt fit in 1 byte.
length = currentLength;
return encodeUtf8(stringValue, i, 65535);
}
}
...
}
But when code-point mapping > 1byte, it calls encodeUTF8 method:
final ByteVector encodeUtf8(final String stringValue, final int offset, final int maxByteLength /*= 65535 */) {
int charLength = stringValue.length();
int byteLength = offset;
for (int i = offset; i < charLength; ++i) {
char charValue = stringValue.charAt(i);
if (charValue >= 0x0001 && charValue <= 0x007F) {
byteLength++;
} else if (charValue <= 0x07FF) {
byteLength += 2;
} else {
byteLength += 3;
}
}
...
}
In this sense, the max string length is 65535 bytes, i.e the utf-8 encoding length. and not char count
You can find the modified-Unicode code-point range of JVM, from the above utf8 struct link.

Categories

Resources