Java - Creating string with constant value inside function - java

Which one is better?
public class A {
private static final String DOSOMETHING_METRICS = "doSomethingmetrics";
private static final String SAYSOMETHING_METRICS = "saySomethingmetrics";
public void doSomething() {
...
System.out.println("Metrics for " + DOSOMETHING_METRICS + "is something");
}
public void saySomething() {
...
System.out.println("Metrics for " + SAYSOMETHING_METRICS + "is something");
}
}
OR
public class A {
public void doSomething() {
final String DOSOMETHING_METRICS = "doSomethingmetrics";
...
System.out.println("Metrics for " + DOSOMETHING_METRICS + "is something");
}
public void saySomething() {
final String SAYSOMETHING_METRICS = "saySomethingmetrics";
...
System.out.println("Metrics for " + SAYSOMETHING_METRICS + "is something");
}
}
I think Method 1 wins on memory, since the memory is allocated only once and the garbage collector doesn't need to deallocate a string created on every function call. However, I think good coding practice recommends that a variable be bound to the scope in which it is used, and that constants be defined as close as possible to their first use in the program, which is where Method 2 wins.
What is your take on this? Which aspect is more important? The functions here will be called multiple times (let's say at least 100K times).

In both cases, these are constant variables as defined in JLS 4.12.4. So not only are the strings "doSomethingmetrics" and "saySomethingmetrics" interned, but so are "Metrics for doSomethingmetricsis something" and "Metrics for saySomethingmetricsis something". (Yeah, you need to add a space before "is".)
The first version logically has a slightly smaller stack, but I'd expect the JIT to optimize that away anyway.
I would use whichever form you find most readable. If you want to know for sure about the performance in your particular app, then as ever, the right thing to do is test both ways.
Looking at the results of javap -v, it looks like the second method actually has a slight advantage in that the unconcatenated strings don't even need to appear in the constant pool, as there's no way of reaching them. So you should see your class file being ever-so-slightly smaller that way. But again, I very much doubt that it'll make any difference.
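A minimal sketch of the constant-folding behaviour described above. The class and method names are made up for illustration, and the missing space before "is" has been added. A static final field and a final local initialised with a literal are both constant variables, so the whole concatenation is a compile-time constant and is interned; a non-final local is not, so its concatenation is built at run time.

```java
final class ConstantFoldingDemo {
    private static final String FIELD_METRICS = "doSomethingmetrics";

    // Concatenation of constants only: folded by the compiler into one literal.
    static String viaField() {
        return "Metrics for " + FIELD_METRICS + " is something";
    }

    // A final local with a constant initializer is also a constant variable.
    static String viaFinalLocal() {
        final String localMetrics = "doSomethingmetrics";
        return "Metrics for " + localMetrics + " is something";
    }

    // Not final, so not a constant variable: a new String is created at run time.
    static String viaNonFinalLocal() {
        String localMetrics = "doSomethingmetrics";
        return "Metrics for " + localMetrics + " is something";
    }

    public static void main(String[] args) {
        String literal = "Metrics for doSomethingmetrics is something";
        System.out.println(viaField() == literal);              // true
        System.out.println(viaFinalLocal() == literal);         // true
        System.out.println(viaNonFinalLocal() == literal);      // false
        System.out.println(viaNonFinalLocal().equals(literal)); // true
    }
}
```

Comparing with == is used here only to observe identity; ordinary code should still compare strings with equals().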

I think Method 1 wins in case of memory optimization
In both cases your string constants go into the string pool and are stored there. In the second case you allocate space for a reference to your variable in the stack frame on each call. That's why I think the first one is preferable (but the compiler can optimize the second case, and then they would be the same).

Related

In Java Is it efficient to use new String() instead of double quotes when string is really large and not using again?

I have so many use cases where I have to initialize a large string and not use the same string anywhere else.
//Code-1
public class Engine{
public void run(){
String q = "fjfjljkljflajlfjalkjdfkljaflkjdllllllllllllllsjfkjdaljdfkdjfnvnvnrrukvnfknv";
//do something
}
}
I call this run method very few times.
In Code-1 the string fjfjljkljflaj..... will get added to the string pool and will never get collected by GC. So I am thinking to initialize with the new operator.
//Code-2
public class Engine{
public void run(){
String q = new String("fjfjljkljflajlfjalkjdfkljaflkjdllllllllllllllsjfkjdaljdfkdjfnvnvnrru");
//do something
}
}
Will the second version save some memory, or are there other factors to consider when deciding which one is more efficient?
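First, new String("...") does not keep the text out of the pool: the argument passed to the constructor is itself a string literal, so it is still interned like any other literal. new String() then creates an additional String object on the heap every time run() is called (calling intern() on that object just returns the pooled instance). So Code-2 does not save memory.
In terms of optimization, we should use string-literal notation whenever possible. It is easier to read and it gives the compiler a chance to optimize our code.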
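A small check of how a literal, new String() and intern() relate. This is a sketch with made-up names and a short stand-in for the large string; it only observes object identity and does not measure memory.

```java
final class NewStringDemo {
    // Returns the pooled instance that backs the literal.
    static String literal() {
        return "fjfjljkljflaj-example";
    }

    // Allocates a fresh String object on every call; the literal argument
    // is still a literal in the class file.
    static String copy() {
        return new String("fjfjljkljflaj-example");
    }

    public static void main(String[] args) {
        System.out.println(copy() == literal());          // false: distinct object
        System.out.println(copy().equals(literal()));     // true: same characters
        System.out.println(copy().intern() == literal()); // true: pooled instance
        System.out.println(copy() == copy());             // false: new object each call
    }
}
```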

Is there a way to create an anonymous Object array with data and an overridden toString method?

Object[] x = new Object[] {"Skye", "Eyks", 123}
{
@Override
public String toString()
{
return this[0] + " " + this[1] + " (" + this[2] + ")";
}
};
So that x.toString() would return "Skye Eyks (123)".
NetBeans says it expects a semi-colon and that it's an Illegal start of expression.
Why I want to use an anonymous array class is to display the data in a combo-box and get all the other data in my array once the user submits the form.
No, this is impossible.
Your code paste strongly suggests an alternate solution:
Java is a statically and nominally typed object-oriented language. Java is very, very bad at dealing with data stored in a heterogeneous, untyped, and unnamed 'grabbag of unknown mystery', which is what Object[] is.
This is presumably what you're looking for:
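public class Word {
final String word;
final int score;
public Word(String word, int score) {
this.word = word;
this.score = score;
}
public String getWord() {
return word;
}
public String getReverse() {
// calculated on the fly rather than stored
return new StringBuilder(word).reverse().toString();
}
public int getScore() {
return score;
}
@Override public String toString() {
return word + " " + getReverse() + " (" + score + ")";
}
}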
Now, it's got structure (the compiler now knows, and your editor can now help you out): a Word has properties like getWord() and getReverse(); getReverse() is far more informative than [1]. You now have a place to add documentation if you want (how do you intend to 'document' new Object[]?), and you have room for flexibility (for example, getReverse() could be calculated on the fly instead of passed in at construction).
You can now write methods that take a Word. This:
public void printWord(Word word) {}
is almost self evident. Compare to this:
/**
* This function requires that you pass a 3-sized object array, with:
* The first argument must be a string - this is the word to print.
* The second argument must be that string, reversed. It will be printed in light grey for a mirror effect.
* The third argument must be a boxed integer, this will be printed in the lower right corner as page number
*/
public void printWord(Object[] o) {}
That's a ton of documentation, and this is considerably worse: This documentation is unstructured - whereas with an actual class, the names of methods can carry most of this meaning and lets you document each fragment independently. You can also farm out any checks and other code to the right place, instead of ending up in a scenario where the code to check if the input array is proper needs to be called in many places, and you need to go out of your way to document, for everything, what happens if you pass invalid input (vs. having to do that only once, in Word's constructors).
If you end up with an Object[] due to external forces, such as, say, the arguments passed along to your main function, then the general aim is to convert that to a proper object once, and as soon as possible, so that your Java code remains as uninfected by this heterogeneous, untyped and unnamed mysterymeat as possible.
NB: Yes, that means you need to make a ton of classes, for everything you can think of, so you end up with clean code. Lombok's @Value can help with this, as can records (a preview feature in Java 14 and 15, standard since Java 16).
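A rough sketch of the "convert once, as early as possible" idea. The class name WordEntry, the factory fromArray and the assumed array layout {word, reverse, score} are illustrative and not taken from the question; here the reverse is stored as given rather than computed.

```java
final class WordEntry {
    private final String word;
    private final String reverse;
    private final int score;

    WordEntry(String word, String reverse, int score) {
        if (word == null || reverse == null) {
            throw new IllegalArgumentException("word and reverse must not be null");
        }
        this.word = word;
        this.reverse = reverse;
        this.score = score;
    }

    // Validates the untyped array in one place and turns it into a typed object.
    static WordEntry fromArray(Object[] raw) {
        if (raw == null || raw.length != 3
                || !(raw[0] instanceof String)
                || !(raw[1] instanceof String)
                || !(raw[2] instanceof Integer)) {
            throw new IllegalArgumentException(
                    "expected {String word, String reverse, Integer score}");
        }
        return new WordEntry((String) raw[0], (String) raw[1], (Integer) raw[2]);
    }

    String getWord() { return word; }
    String getReverse() { return reverse; }
    int getScore() { return score; }

    @Override
    public String toString() {
        return word + " " + reverse + " (" + score + ")";
    }

    public static void main(String[] args) {
        WordEntry entry = WordEntry.fromArray(new Object[] {"Skye", "Eyks", 123});
        System.out.println(entry); // Skye Eyks (123)
    }
}
```

Because toString() is overridden, an instance like this can be handed to a UI component that displays items via toString(), while the other fields stay available through the getters.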

How to use method result properly in Java

I will use result of a method call in some calculation. I have two ways:
Invoke method once and store the return into a local value, then use the local value in some calculation.
Use call method many times.
Please see my sample code:
public class TestMethod {
public void doSomething_way1() {
String prop1 = this.getProp1();
if (prop1 != null) {
String value = prop1 + " - another value";
System.out.println(value);
}
}
public void doSomething_way2() {
if (this.getProp1() != null) {
String value = this.getProp1() + " - another value";
System.out.println(value);
}
}
public String getProp1() {
return "return the same value";
}
}
NOTE: the method doSomething will be invoked many times (in a web environment).
Can someone show me which way I should use in the case the result of method will be used at least 3 times?
I believe using the method call many times is more intuitive and makes the code more readable.
In your case it won't matter even if you call the getProp1() method multiple times, because it does not perform any computation, object creation or I/O operation that might cause performance issues.
You could go a step further:
public void doSomething_way2() {
if (this.getProp1() != null) {
System.out.println(this.getProp1() + " - another value");
}
}
If the method is called a lot (I mean many, many times a second), creating the extra variable could change performance a tiny bit with respect to garbage collection and whatnot... I think it's trivial.
In some cases, getting the value more than once could raise thread-safety issues, if the value weren't final, whereas if you fetch the value once, at least the entire operation of way1 will be consistent with a single value for prop1.
But even if threading weren't an issue, I still think it's better, stylistically, to 'cache' the value in a local variable which is well named.
(I'm assuming that your real code does something more significant than return the fixed String "something") - the getProp1 method as written is pretty thread-safe. :)
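To illustrate the consistency point, here is a single-threaded sketch in which a hypothetical getter returns a different value on its second call, standing in for another thread changing the property between the null check and the use. The names are assumptions made for the example.

```java
final class SnapshotDemo {
    private int calls = 0;

    // Hypothetical getter: non-null on the first call, null afterwards,
    // simulating a value that changes between two reads.
    String getProp1() {
        return (calls++ == 0) ? "value" : null;
    }

    // Reads once, so the check and the use see the same value.
    String describeWithLocal() {
        String prop1 = getProp1();
        return prop1 != null ? prop1 + " - another value" : "absent";
    }

    // Reads twice, so the value may change after the null check has passed.
    String describeWithRepeatedCalls() {
        return getProp1() != null ? getProp1() + " - another value" : "absent";
    }

    public static void main(String[] args) {
        System.out.println(new SnapshotDemo().describeWithLocal());         // value - another value
        System.out.println(new SnapshotDemo().describeWithRepeatedCalls()); // null - another value
    }
}
```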
From a performance standpoint, at least in the examples given, there does not appear to be any fundamental difference between doing it one way or the other. Object allocations for small numbers of iterations (unless they are heavyweight objects) should be minimal.
However, from a programming design and implementation standpoint, it may be helpful to keep the program 'cohesive', i.e. have classes more closely represent things.
In that case, store the method's return value in a local variable (as it is a different 'thing') and use that variable in the subsequent calculation.
e.g.:
interface Dog{
void wagTail();
Dung eat(DogFood f);
}
interface Human{
void pickUpDung(Dung d);
}
void codeLoop(Human m, Dog d, DogFood f){
d.wagTail();
Dung poo = d.eat(f);
m.pickUpDung(poo);
}
whereas a less cohesive example would be
interface Dog{
void wagTail();
void eatAndHaveHumanPickUp(DogFood f);
}
// you can fill in the rest...
it wouldn't follow the principle of cohesion, because you wouldn't normally expect a dog call to have this kind of method...

Java Best Practices: Performance with method parameters

Which is faster and/or less resources consuming:
class Foo
{
public int value;
}
This way?
public int doSomeStuff(Foo f)
{
return (f.value + 1);
}
public int doOtherStuff()
{
...
Foo f = new Foo();
int x = doSomeStuff(f);
...
}
or this way?
public int doSomeStuff(int v)
{
return (v + 1);
}
public int doOtherStuff()
{
...
Foo f = new Foo();
int x = doSomeStuff(f.value);
...
}
In both cases, "doSomeStuff" will not change anything in the Foo class. It just needs to know the "value".
They both perform the same, the same sequence of operations occurs. Your main concern is maintainability and sensible design here. Think carefully about which methods need which data and design it properly.
If you do have issues, you can optimise later. But you should always optimise last.
In terms of resource consuming, it is exactly the same.
But the second option is clearly better in terms of programming because if doSomeStuff only needs value, then there is no point to passing f.
I don't think there is any performance difference at all. And Java compiler will optimize to the best one anyway...
Depends how often you're going to call doSomeStuff without calling doOtherStuff, but generally performance difference is negligible and if you only call doOtherStuff then they'll be equally performant.
Probably even better:
Declare doSomeStuff() as a method of foo, and invoke it as f.doSomeStuff().
This is much more readable and easier to maintain: if you have a subclass of foo, say Bar, and you want to calculate things a bit differently, all you have to do is override doSomeStuff() in Bar.
You should prefer readability over micro optimizations - let the compiler take care of those for you.
code snap:
class foo {
public int value;
public int doSomeStuff() {
return value + 1;
}
}
and:
public int doOtherStuff() {
...
foo f = new foo();
int x = f.doSomeStuff();
...
}
The difference between doing:
object.intvariable + 1
and
int + 1
is so negligible as to be irrelevant for real world apps. It's probably one or two more JVM opcodes to look up foo and find its value variable which is not worth mentioning. You'd never notice that unless you were trying to create a pseudo real-time app in Java (which is all but an exercise in futility).
However, that said, the way you are doing it is very bad. You should not be exposing value directly, but be using proper data encapsulation via getter and setter methods.
It does not matter from performance perspective.
The recommendation is: do not think about premature optimization. Think about correctness and good design of your code.
For example your code
Does not follow naming conventions: class names must start with a capital letter.
Contains public fields, which is strongly discouraged. Use bean notation (getters and setters).
Cannot be compiled (there is no type integer; choose between int and Integer).
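The suggestions from the answers above (a private field behind a getter, the calculation as a method on the class, and a subclass that overrides it) can be sketched together as follows. The Bar subclass and its "+ 2" rule are invented purely for illustration.

```java
class Foo {
    private final int value; // encapsulated instead of public

    Foo(int value) {
        this.value = value;
    }

    int getValue() {
        return value;
    }

    // The calculation lives with the data it needs.
    int doSomeStuff() {
        return value + 1;
    }
}

class Bar extends Foo {
    Bar(int value) {
        super(value);
    }

    // A subclass can calculate things a bit differently (hypothetical rule).
    @Override
    int doSomeStuff() {
        return getValue() + 2;
    }
}

final class FooDemo {
    public static void main(String[] args) {
        Foo f = new Foo(41);
        Foo b = new Bar(41);
        System.out.println(f.doSomeStuff()); // 42
        System.out.println(b.doSomeStuff()); // 43
    }
}
```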

String.intern() vs manual string-to-identifier mapping?

I recall seeing a couple of string-intensive programs that do a lot of string comparison but relatively little string manipulation, and that have used a separate table to map strings to identifiers for efficient equality and lower memory footprint, e.g.:
public class Name {
public static Map<String, Name> names = new SomeMap<String, Name>();
public static Name from(String s) {
Name n = names.get(s);
if (n == null) {
n = new Name(s);
names.put(s, n);
}
return n;
}
private final String str;
private Name(String str) { this.str = str; }
@Override public String toString() { return str; }
// equals() and hashCode() are not overridden!
}
I'm pretty sure one of these programs was javac from OpenJDK, so not some toy application. Of course the actual class was more complex (and also I think it implemented CharSequence), but you get the idea - the entire program was littered with Name in any location you would expect String, and on the rare cases where string manipulation was needed, it converted to strings and then cached them again, conceptually like:
Name newName = Name.from(name.toString().substring(5));
I think I understand the point of this - especially when there are a lot of identical strings all around and a lot of comparisons - but couldn't the same be achieved by just using regular strings and interning them? The documentation for String.intern() explicitly says:
...
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
...
So, what are the advantages and disadvantages of manually managing a Name-like class vs using intern()?
What I've thought about so far was:
Manually managing the map means using regular heap, intern() uses the permgen.
When manually managing the map you enjoy type-checking that can verify something is a Name, while an interned string and a non-interned string share the same type so it's possible to forget interning in some places.
Relying on intern() means reusing an existing, optimized, tried-and-tested mechanism without coding any extra classes.
Manually managing the map results in code that is more confusing to new users, and string operations become more cumbersome.
... but I feel like I'm missing something else here.
Unfortunately, String.intern() can be slower than a simple synchronized HashMap. It doesn't need to be so slow, but as of today in Oracle's JDK, it is slow (probably due to JNI)
Another thing to consider: you are writing a parser; you collected some chars in a char[], and you need to make a String out of them. Since the string is probably common and can be shared, we'd like to use a pool.
String.intern() uses such a pool; yet to look up, you'll need a String to begin with. So we need to new String(char[],offset,length) first.
We can avoid that overhead in a custom pool, where lookup can be done directly based on a char[],offset,length. For example, the pool is a trie. The string most likely is in the pool, so we'll get the String without any memory allocation.
If we don't want to write our own pool, but use the good old HashMap, we'll still need to create a key object that wraps char[],offset,length (something like CharSequence). This is still cheaper than a new String, since we don't copy chars.
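A minimal sketch of the HashMap variant described above, assuming a hypothetical key class Slice that views a region of a char[] without copying it. A hit allocates only the small probe key; a miss creates the String once and stores it under a key backed by its own array. This is not thread-safe and is not how any particular parser implements it.

```java
import java.util.HashMap;
import java.util.Map;

final class CharSlicePool {
    // Key that views buf[off .. off+len) without copying the characters.
    private static final class Slice {
        final char[] buf;
        final int off;
        final int len;
        final int hash;

        Slice(char[] buf, int off, int len) {
            this.buf = buf;
            this.off = off;
            this.len = len;
            int h = 0;
            for (int i = 0; i < len; i++) {
                h = 31 * h + buf[off + i];
            }
            this.hash = h;
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Slice)) return false;
            Slice other = (Slice) o;
            if (other.len != len) return false;
            for (int i = 0; i < len; i++) {
                if (buf[off + i] != other.buf[other.off + i]) return false;
            }
            return true;
        }
    }

    private final Map<Slice, String> pool = new HashMap<>();

    // Returns the pooled String for the given region, creating it on a miss.
    String intern(char[] buf, int off, int len) {
        String pooled = pool.get(new Slice(buf, off, len));
        if (pooled == null) {
            pooled = new String(buf, off, len);
            // The stored key owns its array, so later changes to buf cannot corrupt the pool.
            pool.put(new Slice(pooled.toCharArray(), 0, len), pooled);
        }
        return pooled;
    }

    int size() {
        return pool.size();
    }
}
```

For example, interning the region "hello" from two different buffers returns the same String instance the second time, without building a new String for the lookup.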
I would always go with the Map because intern() has to do a (probably linear) search inside the internal String's pool of strings. If you do that quite often it is not as efficient as Map - Map is made for fast search.
what are the advantages and disadvantages of manually managing a Name-like class vs using intern()
Type checking is a major concern, but invariant preservation is also a significant concern.
Adding a simple check to the Name constructor
Name(String s) {
if (!isValidName(s)) { throw new IllegalArgumentException(s); }
...
}
can ensure* that there exist no Name instances corresponding to invalid names like "12#blue,," which means that methods that take Names as arguments and that consume Names returned by other methods don't need to worry about where invalid Names might creep in.
To generalize this argument, imagine your code is a castle with walls designed to protect it from invalid inputs. You want some inputs to get through so you install gates with guards that check inputs as they come through. The Name constructor is an example of a guard.
The difference between String and Name is that Strings can't be guarded against. Any piece of code, malicious or naive, inside or outside the perimeter, can create any string value. Buggy String manipulation code is analogous to a zombie outbreak inside the castle. The guards can't protect the invariants because the zombies don't need to get past them. The zombies just spread and corrupt data as they go.
That a value "is a" String satisfies fewer useful invariants than that a value "is a" Name.
See stringly typed for another way to look at the same topic.
* - usual caveat re deserializing of Serializable allowing bypass of constructor.
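A sketch of the guarded-constructor idea combined with the question's map-based factory. The class name ValidatedName and the validity rule (a Java identifier) are assumptions chosen for the example; the answer does not say what isValidName checks.

```java
import java.util.HashMap;
import java.util.Map;

final class ValidatedName {
    private static final Map<String, ValidatedName> NAMES = new HashMap<>();

    private final String str;

    // The only way to create an instance goes through this check.
    private ValidatedName(String str) {
        if (!isValidName(str)) {
            throw new IllegalArgumentException(str);
        }
        this.str = str;
    }

    // Hypothetical rule: a valid name is a Java identifier.
    static boolean isValidName(String s) {
        if (s == null || s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) return false;
        }
        return true;
    }

    // One instance per distinct valid string, so == works as equality.
    static synchronized ValidatedName from(String s) {
        ValidatedName n = NAMES.get(s);
        if (n == null) {
            n = new ValidatedName(s); // throws before anything is stored
            NAMES.put(s, n);
        }
        return n;
    }

    @Override
    public String toString() {
        return str;
    }
}
```

With this in place, ValidatedName.from("blue") returns the same instance for every equal string, while ValidatedName.from("12#blue,,") throws IllegalArgumentException and never enters the map.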
String.intern() in Java 5.0 & 6 uses the perm gen space which usually has a low maximum size. It can mean you run out of space even though there is plenty of free heap.
Java 7 uses the regular heap to store intern()ed Strings.
String comparison is pretty fast, and I don't imagine there is much advantage in cutting comparison times when you consider the overhead.
Another reason this might be done is if there are many duplicate strings. If there is enough duplication, this can save a lot of memory.
A simpler way to cache Strings is to use an LRU cache such as a LinkedHashMap:
private static final int MAX_SIZE = 10000;
private static final Map<String, String> STRING_CACHE = new LinkedHashMap<String, String>(MAX_SIZE*10/7, 0.70f, true) {
@Override
protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
return size() > MAX_SIZE;
}
};
public static String intern(String s) {
// s2 is a String equal to s, or null if it's not there.
String s2 = STRING_CACHE.get(s);
if (s2 == null) {
// put the string in the map if it's not there already.
s2 = s;
STRING_CACHE.put(s2,s2);
}
return s2;
}
Here is an example of how it works.
public static void main(String... args) {
String lo = "lo";
for (int i = 0; i < 10; i++) {
String a = "hel" + lo + " " + (i & 1);
String b = intern(a);
System.out.println("String \"" + a + "\" has an id of "
+ Integer.toHexString(System.identityHashCode(a))
+ " after interning is has an id of "
+ Integer.toHexString(System.identityHashCode(b))
);
}
System.out.println("The cache contains "+STRING_CACHE);
}
prints
String "hello 0" has an id of 237360be after interning is has an id of 237360be
String "hello 1" has an id of 5736ab79 after interning is has an id of 5736ab79
String "hello 0" has an id of 38b72ce1 after interning is has an id of 237360be
String "hello 1" has an id of 64a06824 after interning is has an id of 5736ab79
String "hello 0" has an id of 115d533d after interning is has an id of 237360be
String "hello 1" has an id of 603d2b3 after interning is has an id of 5736ab79
String "hello 0" has an id of 64fde8da after interning is has an id of 237360be
String "hello 1" has an id of 59c27402 after interning is has an id of 5736ab79
String "hello 0" has an id of 6d4e5d57 after interning is has an id of 237360be
String "hello 1" has an id of 2a36bb87 after interning is has an id of 5736ab79
The cache contains {hello 0=hello 0, hello 1=hello 1}
This ensures the cache of intern()ed Strings will be limited in number.
A faster but less effective way is to use a fixed array.
private static final int MAX_SIZE = 10191;
private static final String[] STRING_CACHE = new String[MAX_SIZE];
public static String intern(String s) {
int hash = (s.hashCode() & 0x7FFFFFFF) % MAX_SIZE;
String s2 = STRING_CACHE[hash];
if (!s.equals(s2))
STRING_CACHE[hash] = s2 = s;
return s2;
}
The test above works the same except you need
System.out.println("The cache contains "+ new HashSet<String>(Arrays.asList(STRING_CACHE)));
to print out the contents, which shows the following, including one null for the empty entries.
The cache contains [null, hello 1, hello 0]
The advantage of this approach is speed, and that it can be safely used by multiple threads without locking, i.e. it doesn't matter if different threads have a different view of STRING_CACHE.
So, what are the advantages and disadvantages of manually managing a
Name-like class vs using intern()?
One advantage is:
It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.
In a program where many many small strings must be compared often, this may pay off.
Also, it saves space in the end. Consider a source program that uses names like AbstractSyntaxTreeNodeItemFactorySerializer quite often. With intern(), this string will be stored once and that is it. Everything else is just references to it, but you have the references anyway.
