CharBuffer has methods array() and hasArray().
Why do we ever need hasArray()?
After CharBuffer buf = CharBuffer.allocate(20), hasArray() always returns true; before that line runs, we cannot use the buf reference at all because it is uninitialized.
The condition for hasArray is
(hb != null) && !isReadOnly
isReadOnly becomes true if you use asReadOnlyBuffer, for example
CharBuffer.allocate(20).asReadOnlyBuffer();
So yes, we need it.
Take this line
final CharBuffer cb = instance.getCharBuffer(...);
Is it read-only or not? Does it hold a valid char[] array? We don't really know. If we do
cb.array();
and it is a read-only Buffer, we get a ReadOnlyBufferException.
If it isn't backed by a char[] array, we get an UnsupportedOperationException.
So what we might do is
if (cb.hasArray()) {
    final char[] arr = cb.array();
}
Now we are Exception-safe.
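For instance, a minimal sketch exercising the guard with the read-only case from above (the class name is just for illustration):

import java.nio.CharBuffer;

public class HasArrayDemo {
    public static void main(String[] args) {
        // A read-only view of a heap buffer: hasArray() returns false here,
        // so the guard saves us from a ReadOnlyBufferException.
        CharBuffer cb = CharBuffer.allocate(20).asReadOnlyBuffer();
        if (cb.hasArray()) {
            System.out.println(cb.array().length); // not reached for this buffer
        } else {
            System.out.println("no accessible backing array");
        }
    }
}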
Also, you can be sure Oracle/OpenJDK/whateverJDK engineers know what they're doing ;)
Yes, but:
System.out.println(ByteBuffer.allocateDirect(100).asCharBuffer().hasArray());
returns false.
And even if it didn't, it isn't known beforehand how the buffer has been retrieved. You could imagine that the OS allocates the buffer and that it is just used by Java, e.g. when opening a text file. That you cannot directly allocate it yourself is inconsequential.
Besides that, hasArray() is a method defined in the parent class Buffer, so it needs to be there for that reason alone.
As the other answer indicates, hasArray() also returns false for a buffer that is read-only; from the documentation:
true if, and only if, this buffer is backed by an array and is not read-only
This makes sense, as you don't want to pass a read-only buffer only to have it altered by somebody retrieving the backing array and writing data to the array that way; Java arrays are always mutable, after all.
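A short sketch of why that restriction matters: writes to the array obtained from array() are immediately visible through the buffer, so exposing the array of a read-only buffer would defeat its purpose (plain java.nio; the class name is illustrative):

import java.nio.CharBuffer;

public class BackingArrayDemo {
    public static void main(String[] args) {
        CharBuffer buf = CharBuffer.allocate(4);
        char[] backing = buf.array();   // legal: heap buffer, not read-only

        backing[0] = 'X';               // write to the array directly...
        System.out.println(buf.get(0)); // ...and the buffer sees 'X'
    }
}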
Related
It seems that String.getBytes() will create a new byte array, so there is an extra memory copy. Can I encode a String directly to a ByteBuffer without an intermediate byte array?
for example:
void putString(ByteBuffer bb, String s) {
    byte[] arr = s.getBytes(StandardCharsets.UTF_8);
    bb.put(arr);
}
This piece of code will create a byte array, encode the string into the byte array, then copy the content of the byte array into the ByteBuffer.
I think the intermediate byte array is unnecessary; it adds GC pressure and an extra memory copy.
You can use CharsetEncoder to write directly to a ByteBuffer:
static void putString(ByteBuffer buffer, String str, Charset charset) {
    CharsetEncoder encoder = charset.newEncoder();
    encoder.encode(CharBuffer.wrap(str), buffer, true); // encode straight into the buffer
    encoder.flush(buffer);                              // write any remaining encoder state
}
It's your responsibility to make sure enough space has been allocated. You can also check the result of the encode() method to see if it was successful.
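For example, a minimal sketch of checking those results; the method name is mine, and throwException() is the standard way to turn a failed CoderResult into the matching exception:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

static void putStringChecked(ByteBuffer buffer, String str, Charset charset)
        throws CharacterCodingException {
    CharsetEncoder encoder = charset.newEncoder();

    CoderResult result = encoder.encode(CharBuffer.wrap(str), buffer, true);
    if (!result.isUnderflow()) {
        result.throwException(); // overflow (buffer too small) or a coding error
    }
    result = encoder.flush(buffer);
    if (!result.isUnderflow()) {
        result.throwException();
    }
}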
I can't think of a simple way to completely eliminate intermediate byte arrays.
However if you're worrying about this because the String is huge, you can break it into chunks:
int chunkSize = 1024;
for (int offset = 0; offset < str.length(); offset += chunkSize) {
    // Math.min avoids running past the end on the last chunk;
    // beware that a chunk boundary can split a surrogate pair.
    String chunk = str.substring(offset, Math.min(str.length(), offset + chunkSize));
    byteBuffer.put(chunk.getBytes(StandardCharsets.UTF_8));
}
However, if your input strings are huge enough that this optimisation is necessary, the overall architecture of your program is probably ill-conceived.
You should not worry about GC performance unless you've seen something unusual while profiling. The JRE is brilliant at efficient GC.
No, it is not possible. String objects don't have an encoding.
String objects are immutable on purpose. The whole idea of that class is to disallow manipulation of the underlying data structures (mainly for security and performance-optimization reasons).
In that sense: there is no better approach for acquiring the bytes that make up a String object in Java.
Recently I created a wrapper to read and write data into a byte array. To do it, I've been using an ArrayList<Byte>, but I was wondering if this is the most efficient way to do it, because:
addAll() doesn't work with byte arrays (even using Arrays.asList(), which gives me a List<byte[]>). To fix it I'm just looping and adding one byte per iteration, but I suppose this means a lot of method calls and so a performance cost.
The same happens for getting a byte[] from the ArrayList. I can't cast from Byte[] to byte[], so I have to use a loop for it.
I suppose storing Byte instead of byte uses more memory.
I know ByteArrayInputStream and ByteArrayOutputStream could be used for this, but they have some inconveniences:
I wanted to implement methods for reading different data types in different byte orders (for example, readInt, readLEInt, readUInt, etc.), while those classes can only read/write a single byte or a byte array. This isn't really a problem because I could fix that in the wrapper. But here comes the second problem.
I wanted to be able to write and read at the same time because I'm using this to decompress some files. To create a wrapper for that I would need to include both a ByteArrayInputStream and a ByteArrayOutputStream. I don't know if those could be synchronized in some way, or whether I'd have to write the entire data of one to the other each time I wrote to the wrapper.
And so, here comes my question: would using a ByteBuffer be more efficient? I know you can take integers, floats, etc. from it, and you can even change the byte order. What I was wondering is if there is a real performance difference between using a ByteBuffer and an ArrayList<Byte>.
Definitely ByteBuffer or ByteArrayOutputStream. In your case ByteBuffer seems fine. Inspect the Javadoc, as it has nice methods. For putInt/getInt and such, you might want to set the byte order (of those 4 bytes):
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
With files you could use getChannel() or variants and then use a MappedByteBuffer.
A ByteBuffer may wrap an existing byte array, or allocate its own.
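A minimal sketch of that read/write pattern, assuming a fixed-size buffer (the class name is just for illustration):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferDemo {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);

        bb.putInt(0xCAFEBABE);   // written little-endian
        bb.putShort((short) 42);

        bb.flip();               // switch from writing to reading
        System.out.printf("%x%n", bb.getInt()); // cafebabe
        System.out.println(bb.getShort());      // 42
    }
}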
Keep in mind that every object has overhead associated with it including a bit of memory per object and garbage collection once it goes out of scope.
Using List<Byte> would mean creating / garbage collecting an object per byte which is very wasteful.
ByteBuffer is a wrapper class around a byte array; it doesn't have a dynamic size like ArrayList, but it consumes less memory per byte and is faster.
If you know the size you need, then use ByteBuffer; if you don't, you could use ByteArrayOutputStream (maybe wrapped by an ObjectOutputStream, which has methods for writing different kinds of data). To read the data you have written to a ByteArrayOutputStream, you can extend it and access the fields buf[] and count. Those fields are protected, so you can reach them from the extending class; it looks like:
public class ByteArrayOutputStream extends OutputStream {
    /**
     * The buffer where data is stored.
     */
    protected byte buf[];

    /**
     * The number of valid bytes in the buffer.
     */
    protected int count;
    ...
}

public class ReadableBAOS extends ByteArrayOutputStream {
    public byte readByte(int index) {
        if (index >= count) {
            throw new IndexOutOfBoundsException();
        }
        return buf[index];
    }
}
so you can add methods in your extending class to read bytes from the underlying buffer without making a copy of its content each time, as the toByteArray() method does.
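A hypothetical usage sketch, pairing the subclass with a DataOutputStream for typed writes (the demo class name is mine):

import java.io.DataOutputStream;
import java.io.IOException;

public class ReadableBAOSDemo {
    public static void main(String[] args) throws IOException {
        ReadableBAOS out = new ReadableBAOS();
        DataOutputStream data = new DataOutputStream(out);

        data.writeInt(0x01020304);           // DataOutputStream writes big-endian
        System.out.println(out.readByte(0)); // 1: bytes are readable immediately
        System.out.println(out.readByte(3)); // 4, with no toByteArray() copy
    }
}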
When I initialize an array in Java like:
float[] array = new float[1000];
all elements are initialized to 0. Is that also the case when I allocate a direct buffer like this:
FloatBuffer buffer = ByteBuffer.allocateDirect(4*1000).asFloatBuffer();
? I always seem to get only zeroes, but perhaps it's implementation dependent...
It looks like the answer is probably.
Looking at the implementation of ByteBuffer, it uses DirectByteBuffer under the hood. Taking a look at the implementation source code of Android, it has this comment:
Constructs a new direct byte buffer of the given capacity on newly allocated OS memory. The memory will have been zeroed.
So, when you allocate a buffer, all of the memory contents will be initialized to zero. The Oracle implementation also does this zeroing.
This is an implementation detail though. Since the javadoc says nothing about the zeroing, it's technically incorrect to rely on it. To be correct, you should really zero the buffer yourself. In practice, if you're really worried about performance for some reason, you could leave it out, but be warned that some implementations of the JVM might not do this zeroing.
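If you do decide to zero it yourself, an absolute-put loop is enough; a minimal sketch:

FloatBuffer buffer = ByteBuffer.allocateDirect(4 * 1000).asFloatBuffer();

// Explicitly zero every element instead of relying on the implementation.
for (int i = 0; i < buffer.capacity(); i++) {
    buffer.put(i, 0f); // absolute put: the buffer's position is left untouched
}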
From the documentation of the parent abstract class Buffer:
The initial content of a buffer is, in general, undefined.
In the absence of anything to the contrary, I would assume that this applies to buffers allocated by ByteBuffer.allocateDirect(). Interestingly, I suppose that strictly it applies to ordinary array-backed buffers as well, though it is implicit in the allocation of a Java array that the array will be zeroed.
Looking at the Javadoc for Java 7 and also Java 8
it now says
The new buffer's position will be zero, its limit will be its capacity, its mark will be undefined, and each of its elements will be initialized to zero. Whether or not it has a backing array is unspecified.
So there is no longer any need for you to zero them yourself.
There is no way to tell, so the question is futile. The initial position is zero, so there is no API call that will return a part of the buffer that hasn't been 'put' to yet.
All,
I was wondering if clearing a StringBuffer's contents using setLength(0) would make sense. i.e. Is it better to do:
while (<some condition>)
{
    stringBufferVariable = new StringBuffer(128);
    stringBufferVariable.append(<something>)
                        .append(<more>)
                        ... ;
    // append stringBufferVariable.toString() to a file
    stringBufferVariable.setLength(0);
}
My questions:
1 > Will this still have better performance than using a String object to append the contents?
I am not really sure how reinitializing the StringBuffer variable would affect the performance and hence the question.
Please pour in your comments
[edit]: Removed the 2nd question about comparing to StringBuilder since I have understood there nothing more to look into based on responses.
Better than concatenating strings?
If you're asking whether
stringBufferVariable.append("something")
                    .append("more");
...
will perform better than concatenating with +, then yes, usually. That's the whole reason these classes exist. Object creation is expensive compared to updating the values in a char array.
It appears most if not all compilers now convert string concatenation into using StringBuilder in simple cases such as str = "something" + "more" + "...";. The only performance difference I can then see is that the compiler won't have the advantage of setting the initial size. Benchmarks would tell you whether the difference is enough to matter. Using + would make for more readable code though.
From what I've read, the compiler apparently can't optimize concatenation done in a loop when it's something like
String str = "";
for (int i = 0; i < 10000; i++) {
str = str + i + ",";
}
so in those cases you would still want to explicitly use StringBuilder.
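For comparison, the explicit StringBuilder version of that loop would look something like:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++) {
    sb.append(i).append(',');
}
String str = sb.toString();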
StringBuilder vs StringBuffer
StringBuilder is not thread-safe while StringBuffer is, but they are otherwise the same. The synchronization performed in StringBuffer makes it slower, so StringBuilder is faster and should be used unless you need the synchronization.
Should you use setLength?
The way your example is currently written I don't think the call to setLength gets you anything, since you're creating a new StringBuffer on each pass through the loop. What you should really do is
StringBuilder sb = new StringBuilder(128);
while (<some condition>) {
    sb.append(<something>)
      .append(<more>)
      ... ;
    // append sb.toString() to a file
    sb.setLength(0);
}
This avoids unnecessary object creation and setLength will only be updating an internal int variable in this case.
I'm just focusing on this part of the question. (The other parts have been asked and answered many times before on SO.)
I was wondering if clearing a StringBuffer contents using the setLength(0) would make sense.
It depends on the Java class libraries you are using. In some older Sun releases of Java, StringBuffer.toString() was implemented on the assumption that a call to sb.toString() is the last thing that is done with the buffer. The StringBuffer's original backing array becomes part of the String returned by toString(). A subsequent attempt to use the StringBuffer resulted in a new backing array being created and initialized by copying the String contents. Thus, reusing a StringBuffer as your code tries to do would actually make the application slower.
With Java 1.5 and later, a better way to code this is as follows:
bufferedWriter.append(stringBufferVariable);
stringBufferVariable.setLength(0);
This should copy the characters directly from the StringBuilder into the file buffer without any need to create a temporary String. Providing that the StringBuffer declaration is outside the loop, the setLength(0) then allows you to reuse the buffer.
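Put together, the reuse pattern could look something like this sketch (the file name and the Iterator data source are placeholders for whatever your loop actually does; try-with-resources needs Java 7+):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Iterator;

static void writeAll(Iterator<String> chunks) throws IOException {
    StringBuilder sb = new StringBuilder(128);
    try (BufferedWriter writer = new BufferedWriter(new FileWriter("out.txt"))) {
        while (chunks.hasNext()) {
            sb.append(chunks.next());
            writer.append(sb); // copies the chars without a temporary String
            sb.setLength(0);   // reuse the builder's backing array
        }
    }
}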
Finally, you should only be worrying about all of this if you have evidence that this part of the code is (or is likely to be) a bottleneck. "Premature optimization is the root of all evil" blah, blah.
For question 2, StringBuilder will perform better than StringBuffer. StringBuffer is thread-safe, meaning its methods are synchronized. StringBuilder is not synchronized. So if the code you have is going to be run by ONE thread, then StringBuilder is going to have better performance since it does not have the overhead of doing synchronization.
As camickr suggest, please check out the API for StringBuffer and StringBuilder for more information.
Also you may be interested in this article: The Sad Tragedy of Micro-Optimization Theater
1 > Will this still have better performance than having a String object to append the contents?
Yes, concatenating Strings is slow, since you keep creating new String objects.
2 > If using StringBuilder would perform better than StringBuffer here, than why?
Have you read the API description for StringBuilder and/or StringBuffer? This issue is addressed there.
I am not really sure how reinitializing the StringBuffer variable would affect the performance and hence the question.
Well, create a test program. Create a test that creates a new StringBuffer/Builder every time. Then rerun the test, just resetting the length to 0 instead, and compare the times.
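A rough sketch of such a test (a crude timing loop, not a rigorous benchmark; use a harness like JMH for real measurements):

static long millis(Runnable task) {
    long start = System.nanoTime();
    task.run();
    return (System.nanoTime() - start) / 1_000_000;
}

public static void main(String[] args) {
    final int N = 100_000;

    long fresh = millis(() -> {
        for (int i = 0; i < N; i++) {
            StringBuilder sb = new StringBuilder(128); // new builder every pass
            sb.append("some").append("thing");
        }
    });

    long reused = millis(() -> {
        StringBuilder sb = new StringBuilder(128);     // one builder, reset each pass
        for (int i = 0; i < N; i++) {
            sb.setLength(0);
            sb.append("some").append("thing");
        }
    });

    System.out.println("fresh: " + fresh + " ms, reused: " + reused + " ms");
}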
Perhaps I am misunderstanding something in your question... why are you setting the length to 0 at the bottom if you are just creating a new one at the start of each iteration?
Assuming the variable is local to the method, or that it will not be used by multiple threads if it is declared outside of a method (if it is outside of a method your code probably has issues though), then make it a StringBuilder.
If you declare the StringBuilder outside of the loop then you don't need to make a new one each time you enter the loop but you would want to set the length to 0 at the end of the loop.
If you declare the StringBuilder inside of the loop then you don't need to set the length to 0 at the end of the loop.
It is likely that declaring it outside of the loop and setting the length to 0 will be faster, but I would measure both and if there isn't a large difference declare the variable inside the loop. It is good practice to limit the scope of variables.
Yup! setLength(0) is a great idea! That's what it's for. The only thing that might be quicker is to discard the StringBuffer and make a new one; that's fast, but I can't say anything about it being memory efficient :)
In Java, is there a way to truncate an array without having to make a copy of it? The common idiom is Arrays.copyOf(foo, n) (where the new array is n elements long). I don't think there is an alternative, but I'm curious as to whether there is a better approach.
An array's length in Java cannot be altered after initialization, so you're forced to make a copy with the new size. Actually, the length member of a Java array is public final, so it cannot be changed once it's set.
If you need to change an array's size, I'd use an ArrayList.
I was thinking about it some more... and just for kicks, how about something like the below.
Note: This is just a "can it be done?" intellectual exercise in Java hacking. Anybody who attempts to actually use this idea in production code will deserve all the pain that will undoubtedly follow.
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class Foo
{
    private static byte[] array = new byte[10];

    public static void main(String[] arg) throws Exception
    {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        Unsafe unsafe = (Unsafe) field.get(null);

        Field arrayField = Foo.class.getDeclaredField("array");
        long ptr = unsafe.staticFieldOffset(arrayField);

        // doesn't work... there's gotta be a way though!
        unsafe.reallocateMemory(ptr, 5);

        System.out.println("New array size is: " + array.length);
    }
}
I don't believe so. An array is allocated as a contiguous block of memory, and I can't imagine that there is any way of releasing a part of that block.
Succinctly: No, there isn't, as far as I know. A Java array is a fixed-size data structure. The only way to "logically" resize it is to create a new array and copy the wanted elements into it.
Instead: You could (possibly) implement a class which wraps an array into a collection and uses a "size" variable to logically reduce the length of the array without actually copying the values. This approach has limited utility... The only case I can imagine where it's practical is when you're dealing with a huge array which just can't be copied because of memory limitations.
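A minimal sketch of that idea (the class name and the read-only get/length interface are my own choices):

// A hypothetical view that logically truncates an array without copying it.
public class TruncatedArray {
    private final int[] data;
    private final int size;

    public TruncatedArray(int[] data, int size) {
        if (size < 0 || size > data.length) {
            throw new IllegalArgumentException("bad size: " + size);
        }
        this.data = data; // shared with the caller, not copied
        this.size = size;
    }

    public int length() {
        return size;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }
}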
Copying an array is relatively inexpensive time-wise... unless you're doing it millions of times, in which case you probably need to think up an algorithm to avoid this, and not waste your time mucking around with a "variable length array".
And of course... you could just use an ArrayList instead. Yes?