I'm creating a StringBuilder to collect strings that I periodically flush to a server. If the flush fails, I want to keep the strings to try again next time, although in the meantime I might get additional strings to send which must be added to the StringBuilder.
What I want to know is what the most efficient way to do this would be, as this is being done in an Android app where battery usage and thus CPU usage is a big concern. Does calling StringBuilder's toString() function store the resulting string it returns internally so that a subsequent call doesn't have to do the work of copying all the original strings over? Or if the call fails, should I create a new StringBuilder initialized with the return value from toString()?
Here is the OpenJDK source code for StringBuilder:
public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}
The source for the String constructor with those parameters is:
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.offset = 0;
    this.count = count;
    this.value = Arrays.copyOfRange(value, offset, offset + count);
}
So yes, it does create a new String every time, and yes, it makes a copy of the char[] every time.
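A quick check illustrates this on the implementation shown above (other implementations could legally behave differently):
StringBuilder sb = new StringBuilder("hello");
String first = sb.toString();
String second = sb.toString();
System.out.println(first == second);      // false here: each call copies the characters
System.out.println(first.equals(second)); // true: the contents are identical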
It's important to note that this is one implementation of toString; another implementation may obviously differ.
It is an implementation detail. Since Java strings are immutable, a conforming implementation could legitimately choose either to share its internal data or to create a new String on every call to StringBuilder.toString().
As everyone says, you should test whether this is actually a real performance issue for you. If it is, one (clunky) workaround is to wrap StringBuilder and cache the resulting String, using a dirty flag to indicate whether the content has been modified since the last call.
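A minimal sketch of such a wrapper (the class and method names here are made up, and only append is shown):
class CachingStringBuilder {
    private final StringBuilder sb = new StringBuilder();
    private String cached; // last toString() result, or null if the content changed

    public CachingStringBuilder append(CharSequence s) {
        sb.append(s);
        cached = null; // dirty: the next toString() must rebuild
        return this;
    }

    @Override
    public String toString() {
        if (cached == null) {
            cached = sb.toString(); // rebuild only when the content has changed
        }
        return cached;
    }
}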
The StringBuilder.toString() API documentation says that a new String object is allocated and initialized to contain the character sequence currently represented by the builder.
Related
Given a byte[] object, when we want to operate on such an object we often need only pieces of it. In my particular example I get a byte[] from the wire where the first 4 bytes describe the length of the message, the next 4 bytes the type of the message (an integer that maps to a concrete protobuf class), and the remaining bytes are the actual content of the message... like this
length|type|content
In order to parse this message I have to pass the content part to a specific class which knows how to parse an instance from it... the problem is that often no methods are provided that let you tell the parser from where to where it should read the array...
So what we end up doing is copying the remaining chunks of that array, which is not efficient...
As far as I know, in Java it is not possible to create another byte[] reference that refers to part of an original bigger byte[] array using just 2 indexes (this was the approach taken with String, and it led to memory leaks)...
I wonder how we solve situations like this? I suppose giving up on protobuf just because it does not provide some parseFrom(byte[], int, int) does not make sense... protobuf is just an example, anything could lack that API...
So does this force us to write inefficient code, or is there something that can be done? (apart from adding that method)...
Normally you would tackle this kind of thing with streams.
A stream is an abstraction for reading just what you need to process the current block of data. So you can read the correct number of bytes into a byte array and pass it to your parse function.
You ask, 'So does this force us to write inefficient code, or is there something that can be done?'
Usually you get your data in the form of a stream, and then the technique demonstrated below is more performant because you skip making one copy (two copies instead of three: one by the OS and one by you, since you no longer copy the whole byte array before you start parsing). If you actually start out with a byte[] that you construct yourself, you may instead want to build an object such as { int length, int type, byte[] contentBytes } and pass contentBytes to your parse function.
If you really, really have to start out with byte[] then the below technique is just a more convenient way to parse it, it would not be more performant.
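For reference, the holder object suggested above might look something like this (the class and field names are made up):
class Message {
    final int length;
    final int type;
    final byte[] contentBytes;

    Message(int length, int type, byte[] contentBytes) {
        this.length = length;
        this.type = type;
        this.contentBytes = contentBytes;
    }
}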
So suppose you got a buffer of bytes from somewhere and you want to read the contents of that buffer. First you convert it to a stream:
private static List<Content> read(byte[] buffer) {
    try {
        ByteArrayInputStream bytesStream = new ByteArrayInputStream(buffer);
        return read(bytesStream);
    } catch (IOException e) {
        e.printStackTrace();
        return Collections.emptyList(); // make sure every path returns something
    }
}
The above function wraps the byte array with a stream and passes it to the function that does the actual reading.
If you can start out from a stream then obviously you can skip the above step and just pass that stream into the below function directly:
private static List<Content> read(InputStream bytesStream) throws IOException {
    List<Content> results = new ArrayList<Content>();
    try {
        // read the content...
        Content content1 = readContent(bytesStream);
        results.add(content1);

        // I don't know if there's more than one content block but assuming
        // that there is, you can just continue reading the stream...
        //
        // If it's a fixed number of content blocks then just read them one
        // after the other... Otherwise make this a loop
        Content content2 = readContent(bytesStream);
        results.add(content2);
    } finally {
        bytesStream.close();
    }
    return results;
}
Since your byte-array contains content you will want to read Content blocks from the stream. Since you have a length and a type field, I am assuming that you have different kinds of content blocks. The next function reads the length and type and passes the processing of the content bytes on to the proper class depending on the read type:
private static Content readContent(InputStream stream) throws IOException {
    final int CONTENT_TYPE_A = 10;
    final int CONTENT_TYPE_B = 11;

    // wrap the InputStream in a DataInputStream because the latter has
    // convenience functions to convert bytes to integers, etc.
    // Note that DataInputStream handles the stream in a BigEndian way,
    // so check that your bytes are in the same byte order. If not you'll
    // have to find another stream reader that can convert to ints from
    // LittleEndian byte order.
    DataInputStream data = new DataInputStream(stream);
    int length = data.readInt();
    int type = data.readInt();

    // I'm assuming that the above length field is the number of bytes of
    // content. So, read that many bytes into a buffer and pass it to your
    // `parseFrom(byte[])` function
    byte[] contentBytes = new byte[length];
    int readCount = data.read(contentBytes, 0, contentBytes.length);
    if (readCount < contentBytes.length)
        throw new IOException("Unexpected end of stream");

    switch (type) {
        case CONTENT_TYPE_A:
            return ContentTypeA.parseFrom(contentBytes);
        case CONTENT_TYPE_B:
            return ContentTypeB.parseFrom(contentBytes);
        default:
            throw new UnsupportedOperationException();
    }
}
I have made up the below Content classes. I don't know what protobuf is but it can apparently convert from a byte array to an actual object with its parseFrom(byte[]) function, so take this as pseudocode:
class Content {
    // common functionality
}

class ContentTypeA extends Content {
    public static ContentTypeA parseFrom(byte[] contentBytes) {
        return null; // do the actual parsing of a type A content
    }
}

class ContentTypeB extends Content {
    public static ContentTypeB parseFrom(byte[] contentBytes) {
        return null; // do the actual parsing of a type B content
    }
}
In Java, an array is not just a section of memory; it is an object that has some additional fields (at least a length). So you cannot create a reference to just part of an array. You have to either:
Use the array-copy functions, or
Implement and use an algorithm that works on only part of the byte array (a sketch of both options follows below).
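A minimal sketch of both options, assuming the 4-byte length and 4-byte type header described in the question (the method names are made up):
import java.util.Arrays;

class SliceExamples {
    static final int HEADER_SIZE = 8; // 4 bytes length + 4 bytes type

    // Option 1: copy just the slice you need (costs one extra copy)
    static byte[] contentOf(byte[] message) {
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }

    // Option 2: no copy; the parser works directly against the original array
    static int checksum(byte[] message, int offset, int length) {
        int sum = 0;
        for (int i = offset; i < offset + length; i++) {
            sum += message[i]; // only the slice [offset, offset + length) is touched
        }
        return sum;
    }
}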
The concern seems to be that there is no way to create a view over an array (e.g., an array equivalent of List#subList()). A workaround might be to make your parsing methods take a reference to the entire array plus two indices (or an index and a length) that specify the sub-array the method should work on.
This would not prevent the methods from reading or modifying sections of the array they should not touch. Perhaps a ByteArrayView class could be written to add a little bit of safety if this is a concern:
public class ByteArrayView {
    private final byte[] array;
    private final int start;
    private final int length;

    public ByteArrayView(byte[] array, int start, int length) {
        this.array = array;
        this.start = start;
        this.length = length;
    }

    public byte get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(); // or some other relevant exception
        }
        return array[start + index];
    }
}
But if, on the other hand, performance is a concern, then a method call to get() for fetching each byte is probably undesirable.
The code is for illustration; it's not tested or anything.
EDIT
On a second reading of my own answer, I realized that I should point this out: having a ByteArrayView will copy each byte you read from the original array -- just byte by byte rather than as a chunk. It would be inadequate for the OP's concerns.
I am doing a java code inspection. Here is a function (snippet):
String getValue() {
    String res;
    StringBuilder strBuilder = new StringBuilder();
    // More code here that sets strBuilder
    return res = strBuilder.toString();
}
First, there is a warning that the value of res is not used. Secondly, I don't understand the return. Why don't they just return strBuilder.toString()? Is there some sort of advantage?
res is not used, so there is no reason to return like that. You can remove it:
String getValue() {
    StringBuilder bs = new StringBuilder();
    //
    // More code here that sets bs
    return bs.toString();
}
That sort of code can sometimes result from incomplete removal of debug artifacts:
String getValue() {
    String res;
    StringBuilder bs = new StringBuilder();
    //
    // More code here that sets bs
    res = bs.toString();
    // Test and/or display res here
    return res;
}
It certainly seems like a good candidate for the next round of refactoring and clean-up.
Just guessing, but some (most?) IDEs don't allow you to directly inspect the value of function returns. With this scheme, you could put a breakpoint at the end of the method, and mouse over "res" to get the return value.
You're absolutely right; the assignment to res makes no sense; return bs.toString(); would do the same.
P.S. +1 for not ignoring compiler warnings.
You can do either
String res = strBuilder.toString();
return res;
Or, directly,
return strBuilder.toString();
Now, since you asked whether there is any benefit: I always prefer returning directly. My personal logic is simple:
You write one line less code (declaring variables all over is not a good feeling to me, and you also don't have to think about variable names, conflicts and other silly matters).
The value is not kept alive in an extra local variable waiting for the GC to collect it, so slightly less memory is used.
Writing to a variable and then reading it back just to return it is an extra read/write, isn't it?
None of these things is a big deal; I only mention them because you asked.
Can also be written as:
String getValue() {
    return new StringBuilder().toString();
}
Consider such method:
@Override
public String toString()
{
    final StringBuilder sb = new StringBuilder();
    for (final Room room : map)
    {
        sb.append(room.toString());
        sb.append(System.getProperty("line.separator")); // THIS IS IMPORTANT
    }
    return sb.toString();
}
System.getProperty("line.separator") can be called many times.
Should I cache this value in a field such as public static final String lineSeparator = System.getProperty("line.separator")
and later use only lineSeparator?
Or is System.getProperty("line.separator") as fast as using a static field?
I see your question as presenting a false dichotomy. I would neither call getProperty every time, nor declare a static field for it. I'd simply extract it to a local variable in toString.
@Override
public String toString()
{
    final StringBuilder sb = new StringBuilder();
    final String newline = System.getProperty("line.separator");
    for (final Room room : map) sb.append(room.toString()).append(newline);
    return sb.toString();
}
BTW I have benchmarked the call. The code:
public class GetProperty
{
    static char[] ary = new char[1];

    @GenerateMicroBenchmark public void everyTime() {
        for (int i = 0; i < 100_000; i++) ary[0] = System.getProperty("line.separator").charAt(0);
    }

    @GenerateMicroBenchmark public void cache() {
        final char c = System.getProperty("line.separator").charAt(0);
        for (int i = 0; i < 100_000; i++) ary[0] = (char) (c | ary[0]);
    }
}
The results:
Benchmark              Mode   Thr  Cnt  Sec    Mean  Mean error     Units
GetProperty.cache      thrpt    1    3    5  10.318       0.223  ops/msec
GetProperty.everyTime  thrpt    1    3    5   0.055       0.000  ops/msec
The cached approach is more than two orders of magnitude faster.
Do note that the overall impact of getProperty call against all that string building is very, very unlikely to be noticeable.
You do not need to fear that the line separator will change while your code is running, so I see no reason against caching it.
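If you do want the class-level cache the question asks about, a minimal sketch would be:
// cached once when the class is initialized
private static final String LINE_SEPARATOR = System.getProperty("line.separator");
On Java 7 and later you can also simply call System.lineSeparator(), which returns the same value without repeating the property lookup yourself.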
Caching a value is certainly faster than executing a call over and over, but the difference will probably be negligible.
If you have become aware of a performance problem that you know relates to this, yes.
If you haven't, then no, the lookup is unlikely to have enough overhead to matter.
This would fall under either or both of the general categories "micro-optimization" and "premature optimization." :-)
But if you're worried about efficiency, you probably have a much bigger opportunity in that your toString method is regenerating the string every time. If toString will be called a lot, rather than caching the line terminator, cache the generated string, and clear that whenever your map of rooms changes. E.g.:
@Override
public String toString()
{
    if (cachedString == null)
    {
        final StringBuilder sb = new StringBuilder();
        final String ls = System.getProperty("line.separator");
        for (final Room room : map)
        {
            sb.append(room.toString());
            sb.append(ls);
        }
        cachedString = sb.toString();
    }
    return cachedString;
}
...and when your map changes, do
cachedString = null;
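For instance, a hypothetical mutator on the same class (addRoom and the map field are assumptions about the surrounding code) would invalidate the cache like this:
public void addRoom(Room room)
{
    map.add(room);       // whatever mutation the class actually performs
    cachedString = null; // force the next toString() call to rebuild
}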
That's a lot more bang for the buck (the buck being the overhead of an extra field). Granted it's per-instance rather than per-class, so (reference earlier comment about efficiency) only do it if you have a good reason to.
Since it's so easy to do, why not? At the very least the implementation of System.getProperty() will have to do a hash table lookup (even if cached internally) to find the property you are requesting, then the virtual method getString() will be called on the resulting Object. None of these are very expensive but will need to be called multiple times. Not to mention many String temporaries will be created and need GCing after.
If you move this out to the top of your loop and reuse the same value, you avoid all of these problems. So why not?
If the system property is guaranteed to remain constant during the application, it can be cached, but in general you will lose the feature of the property, which is changing the behavior when you change it.
For instance, a text generator could use the property to generate text for Windows or for Linux and allow the property to be changed dynamically in the application, so why not?
In general, caching a property makes the setProperty function useless.
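A short sketch of the difference:
String cached = System.getProperty("line.separator");   // snapshot taken once
System.setProperty("line.separator", "<EOL>");          // value changed at runtime
System.out.println(cached.equals(System.getProperty("line.separator"))); // false: the cached copy is stale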
I have a little method that amongst other things also converts a string into an integer. Since the string is a parameter of the method, I want to make sure that that string is convertible. So I was just wondering what would be the safest and/or fastest way.
Version A: Just leave it as it is and take the risks (which I'm trying to avoid)
public static int stringToInt(String param) {
    return Integer.valueOf(param);
}
(in terms of speed, what kind of difference would it make to version B and C?)
Version B: Catch the exception
public static int stringToInt(String param) {
    try {
        return Integer.valueOf(param);
    } catch (NumberFormatException e) {
        return -1;
    }
}
Version C: Check each character of the string to see if it's a digit or not
public static int stringToInt(String param) {
    for (char c : param.toCharArray()) {
        if (!Character.isDigit(c))
            return -1;
    }
    return Integer.valueOf(param);
}
Note that the parameter has to be a positive number and that -1 is supposed to be the "error value" in my little program; in other words, all three versions of the method would work perfectly fine in my program.
I'm very open to any other suggestion you can give me, so feel free to create your own version, if you think yours is better.
Thank you very much for your support in advance.
Guava offers a utility method for this which returns null in case your String can't be parsed.
https://google.github.io/guava/releases/19.0/api/docs/com/google/common/primitives/Ints.html#tryParse(java.lang.String)
Integer result = Ints.tryParse("1"); //returns 1
Integer result = Ints.tryParse("-1"); //returns -1
Integer result = Ints.tryParse("a"); //returns null
First, note that version C is not bulletproof: it would reject negative numbers, and would not catch numbers that are too large.
Version B is OK, yet it makes the caller change the coding style: rather than catching an error and processing it together with other errors, the caller would need to check for -1 all the time. This may be suboptimal in situations where you read multiple integers, but the error processing does not depend on which particular one has failed. In addition, new coders using your API may forget to check for -1, and use the error code inadvertently.
That's why I would stay with the first option: the code using version A would look instantly familiar to anyone who knows Java API, without the need to learn what happens inside your function.
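For illustration, a caller of version A handles a bad input the same way it handles any other NumberFormatException (the surrounding names are made up):
try {
    int port = stringToInt(args[0]);
    // ... use port ...
} catch (NumberFormatException e) {
    System.err.println("Not a valid number: " + args[0]);
}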
I believe a modified version B that throws the exception rather than returning -1 is the best choice. It is good to throw the exception up to the level where it can be processed to send the proper response to the user. Returning a magic value like -1 makes your code error-prone: assume that a different programmer consumes your method with only its signature available; it is not clear from that signature how to handle the error scenario. If you throw the exception and add it to your method declaration, the other programmer can consume your method properly, along with the required exception handling. For me this looks best:
public static int stringToInt(String param) throws NumberFormatException {
    try {
        return Integer.valueOf(param);
    } catch (NumberFormatException e) {
        // return -1;
        throw e;
    }
}
Java 8 without any API:
Integer result = Optional.ofNullable(strNum)
        .map(Integer::valueOf).orElse(null);
public Integer stringToInt(String param) throws NumberFormatException {
    return Optional.ofNullable(param)
            .map(s -> s.replaceAll("\\s+", ""))
            .map(Integer::valueOf)
            .orElse(null);
    /*
     * or, replacing only plain spaces:
     * .map(s -> s.replaceAll(" ", ""))
     */
}
Use the replaceAll to strip white space; the + in the pattern is CPU-friendly even though it may seem unnecessary.
I used a combination of 2 answers to make it safe for nulls, empty or blank strings, and non-numeric characters:
public static Integer safeStringToInt(String param) {
    return Optional.ofNullable(param).map(Ints::tryParse).orElse(null);
}
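A quick usage sketch (assuming Guava's Ints is on the classpath):
safeStringToInt(null);   // null
safeStringToInt("");     // null
safeStringToInt(" ");    // null
safeStringToInt("abc");  // null
safeStringToInt("42");   // 42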
I am fully aware that what I am doing is:
1) Unsafe: I have no guarantees and no type safety, and the JVM could therefore crash
2) That I could do similar operations using ByteBuffers or JNI
3) That Unsafe is an internal class and could well disappear.
I am doing this for experimentation purposes only and am aware of the consequences.
With this in mind, I am trying to extract the data from an array using Unsafe and reflection:
I first find the field offset of the array:
public long findFieldOffset(Event event) {
    try {
        Class cl = event.getClass();
        Field data_field = cl.getDeclaredField("data");
        data_field.setAccessible(true);
        long offset = unsafe.objectFieldOffset(data_field);
        return offset;
    } catch (NoSuchFieldException e) {
        e.printStackTrace();
    }
    return 0;
}
I also extract the base location of the array:
int base = unsafe.arrayBaseOffset(byte[].class);
I subsequently try to extract the array from the class Event, and copy it to a buffer (byte_offset is the result of findFieldOffset).
In the code below, the first part is just a test function and prints out the correct string, whilst the second part extracts what should be the array, but when testing returns garbage unicode values:
/* Testing */
active_buffer.getBuffer().position(1);
active_buffer.getBuffer().put(event.getData());
active_buffer.getBuffer().position(1);
active_buffer.getBuffer().get(tuple, 0, (int)tuple_size);
System.out.println("Test1: " + new String(tuple)) ;
/* Test1 prints out the correct string */
unsafe.copyMemory( (Object) event, byte_offset + base, (Object) null, active_buffer.getAddress() + 1, tuple_size);
active_buffer.getBuffer().position(1);
active_buffer.getBuffer().get(tuple, 0, (int)tuple_size);
System.out.println("Test2: " + new String(tuple));
/* Garbage unicode values gets printed*/
Can anyone see anything wrong with this code?
As data is a field, it is either a primitive or a reference; in either case you cannot copy the field itself somewhere and treat it as a byte[].
If data is a byte[], you need to treat the array as the true source object, not the Event.
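A hedged sketch of what that means in code, reusing the names from the question (byte_offset is the field offset found earlier; the usual Unsafe caveats apply):
// Step 1: read the reference stored in the 'data' field of the Event.
byte[] data = (byte[]) unsafe.getObject(event, byte_offset);

// Step 2: copy from the array object itself, starting at its element base offset.
long arrayBase = unsafe.arrayBaseOffset(byte[].class);
unsafe.copyMemory(data, arrayBase,                       // source: the array, element 0
                  null, active_buffer.getAddress() + 1,  // destination: raw address
                  tuple_size);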