Please have a look at the following code
private StringBuffer populateStringWithUnmatchingWords(ArrayList<String> unmatchingWordsHolder)
{
    StringBuffer unMatchingWordsStr = new StringBuffer("");
    for (int u = 0; u < unmatchingWordsHolder.size(); u++)
    {
        Iterator iterInWordMap = wordMap.entrySet().iterator();
        while (iterInWordMap.hasNext())
        {
            Map.Entry mEntry = (Map.Entry) iterInWordMap.next();
            if (mEntry.getValue().equals(unmatchingWordsHolder.get(u)))
            {
                //out.println(matchingWords.get(m)+" : "+true);
                unMatchingWordsStr.append(mEntry.getKey());
                unMatchingWordsStr.append(",");
            }
        }
    }
    return unMatchingWordsStr;
}
This for loop takes 8387ms to complete. The unmatchingWordsHolder list is pretty big too, and wordMap is a HashMap that contains somewhere around 5000 elements as well.
This loop checks whether the elements in unmatchingWordsHolder appear as values in wordMap. If they do, the corresponding keys are appended to unMatchingWordsStr.
Is there any way for me to speed up this task?
Does using Collection.contains() help at all? It would be much more readable, if nothing else, to my mind. It does depend on the relative sizes of the List and the Map, though: the easiest way to do it would be something like the following, but since you're iterating over the Map and doing the lookup on the List, this isn't going to be ideal if the Map is far larger than the List.
private StringBuffer populateStringWithUnmatchingWords(ArrayList<String> unmatchingWordsHolder) {
    StringBuffer unMatchingWordsStr = new StringBuffer();
    for (Entry<String, String> entry : wordMap.entrySet()) {
        if (unmatchingWordsHolder.contains(entry.getValue())) {
            //out.println(matchingWords.get(m)+" : "+true);
            unMatchingWordsStr.append(entry.getKey());
            unMatchingWordsStr.append(",");
        }
    }
    return unMatchingWordsStr;
}
As noted elsewhere, if you don't need thread safety, StringBuilder is generally preferred to StringBuffer, but I didn't want to mess with your method signatures.
You are iterating through every element in the Map for each word in the list. A better way is to build a structure that gives you a constant-time contains()/get() lookup (for example a HashSet of the values, or a reverse HashMap keyed by value) and check each word against that instead.
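As a minimal sketch of that idea, assuming wordMap is a Map<String, String> field as in the question (the reverse-index variable and its name are my own, not from the original code): building the index is one pass over the map, and each word lookup afterwards is O(1) instead of a full scan.
private StringBuffer populateStringWithUnmatchingWords(ArrayList<String> unmatchingWordsHolder) {
    // Build a reverse index (value -> keys) once.
    Map<String, List<String>> valueToKeys = new HashMap<String, List<String>>();
    for (Map.Entry<String, String> entry : wordMap.entrySet()) {
        List<String> keys = valueToKeys.get(entry.getValue());
        if (keys == null) {
            keys = new ArrayList<String>();
            valueToKeys.put(entry.getValue(), keys);
        }
        keys.add(entry.getKey());
    }

    StringBuffer unMatchingWordsStr = new StringBuffer();
    for (String word : unmatchingWordsHolder) {
        List<String> keys = valueToKeys.get(word); // O(1) lookup instead of scanning the whole map
        if (keys != null) {
            for (String key : keys) {
                unMatchingWordsStr.append(key).append(",");
            }
        }
    }
    return unMatchingWordsStr;
}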
Not sure if I got your problem statement correctly, but if you want to return a comma-separated string of all the words that are found in another set of words, then here's how you would do it in Java 8:
private String populateContainedWords(List<String> words, Set<String> wordSet)
{
    StringJoiner joiner = new StringJoiner(", ");
    words.stream().filter(wordSet::contains).forEach(joiner::add);
    return joiner.toString();
}
And if you only want to have distinct words in this comma separated string, then use the following approach:
private String populateDistinctlyContainedWords(List<String> words, Set<String> wordSet)
{
    StringJoiner joiner = new StringJoiner(", ");
    words.stream().distinct().filter(wordSet::contains).forEach(joiner::add);
    return joiner.toString();
}
And if you want a comma separated string of words from the words list that are NOT contained in the wordSet then here's how that's done:
private String populateDisjointWords(List<String> words, Set<String> wordSet)
{
    StringJoiner joiner = new StringJoiner(", ");
    words.stream().filter(n -> !wordSet.contains(n)).forEach(joiner::add);
    return joiner.toString();
}
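Applied back to the original wordMap example, a sketch along the same lines might look like this on Java 8 (assuming wordMap is a Map<String, String> field as in the question and java.util.stream.Collectors is imported; the HashSet is built once so each contains() check is O(1)):
private String populateStringWithUnmatchingWords(List<String> unmatchingWordsHolder) {
    Set<String> unmatched = new HashSet<>(unmatchingWordsHolder); // O(1) contains()
    return wordMap.entrySet().stream()
            .filter(e -> unmatched.contains(e.getValue()))
            .map(Map.Entry::getKey)
            .collect(Collectors.joining(","));
}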
Related
I have a condition like :
public String createId(List<String> list)
{
    String id = "";
    if (list.contains("name"))
        id += "TEST VALUE NAME";
    if (list.contains("age"))
        id += "Test Value AGE";
    // ... likewise many more if conditions
    return id;
}
As per my understanding, we should use StringBuilder for concatenation inside loops and plain String for simple concatenation. So here I wanted to ask: should I use String or StringBuilder? Kindly suggest.
StringBuilder is the better choice for this scenario because it's mutable. String is immutable, so every time you modify the string it creates a new object.
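As a rough sketch of what that looks like for the method above (keeping the same if conditions, just accumulating into a StringBuilder instead of +=):
public String createId(List<String> list) {
    StringBuilder id = new StringBuilder();
    if (list.contains("name"))
        id.append("TEST VALUE NAME");
    if (list.contains("age"))
        id.append("Test Value AGE");
    // ... likewise for the remaining conditions
    return id.toString();
}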
It seems that for the given task it would be better to get rid of the multiple duplicated if statements by defining a list of the keys to match against the input list, and then using the Stream API to generate the string id, e.g. Collectors.joining with or without a delimiter.
Assuming that there is a single rule to create a part of the id: append "Test Value " + key.toUpperCase(), the implementation may look as follows:
final List<String> keys = Arrays.asList(
    "name", "age" /* and other needed keys */
);

public String createId(List<String> list) {
    return keys
        .stream()
        .filter(list::contains)
        .map(String::toUpperCase)
        .map(str -> "Test Value " + str)
        .collect(Collectors.joining("_")); // or Collectors.joining()
}

System.out.println(createId(Arrays.asList("age", "name", "surname")));
// output: Test Value NAME_Test Value AGE
If custom parts should be provided for name, age, etc., a Map of matches should be prepared and used; it may also make sense to convert the input list into a Set<String> to facilitate look-ups:
final Map<String, String> keys = new LinkedHashMap<>();
{
    // fill the map in special order
    keys.put("name", "Name Part");
    keys.put("age", "Test Age");
    /* and other needed keys */
}

public String createId(List<String> list) {
    Set<String> words = new HashSet<>(list);
    return keys.keySet()
        .stream()
        .filter(words::contains) // faster lookup O(1) at the cost of another collection
        .map(keys::get)
        .collect(Collectors.joining("_")); // or Collectors.joining()
}

System.out.println(createId(Arrays.asList("age", "surname", "name")));
// output: Name Part_Test Age
In general your understanding is correct about when to use String concatenation vs. StringBuilder. The Java Language Specification says:

To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.

For the large majority of cases, you should use whichever method results in better readability and maintainability.
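For illustration (a minimal example of my own, not from the original question), the case where an explicit StringBuilder clearly pays off is concatenation inside a loop, where += would otherwise build a new intermediate String on every iteration:
// Simple one-off concatenation: plain String is fine.
String id = "TEST VALUE NAME" + "Test Value AGE";

// Repeated concatenation in a loop: each += allocates a new intermediate String,
// so an explicit StringBuilder is the better choice here.
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000; i++) {
    sb.append(i);
}
String result = sb.toString();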
I want to print the content of this attribute:
private HashMap<RegionVersObj, ArrayList<String[]>> region;
I have done this with the below code.
Inside the toString() method I am iterating over the map entries.
I have created a StringBuffer object and appending the content in it.
public String toString() {
    StringBuffer regionToPrint = new StringBuffer();
    for (Map.Entry<RegionVersObj, ArrayList<String[]>> entry : region.entrySet())
    {
        regionToPrint.append(entry.getKey().toString());
        regionToPrint.append("=[");
        for (String[] s : entry.getValue()) {
            regionToPrint.append("[");
            for (String s1 : s) {
                regionToPrint.append(s1);
                regionToPrint.append(",");
            }
            regionToPrint.append("],");
        }
    }
    return "region=" + regionToPrint.toString();
}
This is the way I am trying to solve it, but I want to know: is there a better way to do this?
Kindly use Java 1.8, since it has more built-in functions. For example, we can turn the String[] array into comma-separated values using String.join():
region.forEach((k, v) -> {
    regionToPrint.append(k.toString());
    regionToPrint.append("=[");
    for (String[] s : v) {
        regionToPrint.append("[");
        regionToPrint.append(String.join(",", s));
        regionToPrint.append("],");
    }
    regionToPrint.append("],");
});
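If you prefer streams, here is a rough sketch of an equivalent toString() using Collectors.joining (assuming java.util.stream.Collectors is imported; note it drops the trailing commas that the original format produced, so the output differs slightly):
@Override
public String toString() {
    return "region=" + region.entrySet().stream()
            .map(entry -> entry.getKey().toString() + "=["
                    + entry.getValue().stream()
                            .map(arr -> "[" + String.join(",", arr) + "]") // each String[] as [a,b,c]
                            .collect(Collectors.joining(","))
                    + "]")
            .collect(Collectors.joining(","));
}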
And how do I add 2 enchantments at once in Bukkit with
myItem.addEnchantments( Enchantment.KNOCKBACK, 1 /* TODO Here goes 2nd enchantment */ );
addEnchantments accepts a Map: addEnchantments(Map<Enchantment, Integer>).
You should rather use addEnchantment twice (or more):
myItem.addEnchantment(Enchantment.KNOCKBACK, 1);
myItem.addEnchantment(Enchantment.THORNS, 2);
If you insist on using addEnchantments you'll need to create a map, populate it and pass it:
Map<Enchantment, Integer> map = new HashMap<Enchantment, Integer>();
map.put(Enchantment.KNOCKBACK, 1);
map.put(Enchantment.THORNS, 2);
myItem.addEnchantments(map);
In your case, I would go with the first option (calling addEnchantment twice).
@John Smith's second question (how to convert a String back to a HashMap):
You can convert a HashMap to a String, but Java (as far as I know) doesn't have a built-in way to convert that String back to a HashMap. You can write a pair of functions, one that converts the HashMap to a String and one that converts the String back to a HashMap. In this case you want a HashMap of Enchantment to Integer, so you would simply do something like this:
public /*static*/ String hashMapToString(HashMap<Enchantment, Integer> hashMap) {
    StringBuilder serializedString = new StringBuilder();
    for (Enchantment enchant : hashMap.keySet()) {
        serializedString.append(enchant.getName()).append("<>").append(hashMap.get(enchant)).append(";");
    }
    return serializedString.toString();
}
Then you would create a function to convert that back into a HashMap:
public /*static*/ HashMap<Enchantment, Integer> stringToHashMap(String hashMapString) {
    HashMap<Enchantment, Integer> hashMap = new HashMap<>();
    for (String split : hashMapString.split(";")) {
        String[] splited = split.split("<>");
        hashMap.put(Enchantment.getByName(splited[0]), Integer.valueOf(splited[1]));
    }
    return hashMap;
}
You can even make them static: just remove the comment marks around static (or, if you don't want them static at all, delete what is inside the comment marks along with the marks themselves).
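For example, a quick round trip with the two helpers above might look like this (the enchantment values are just placeholders, and the exact serialized text depends on what Enchantment.getName() returns):
HashMap<Enchantment, Integer> enchants = new HashMap<>();
enchants.put(Enchantment.KNOCKBACK, 1);
enchants.put(Enchantment.THORNS, 2);

String serialized = hashMapToString(enchants);            // e.g. "KNOCKBACK<>1;THORNS<>2;"
HashMap<Enchantment, Integer> restored = stringToHashMap(serialized);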
I have a basic method which reads in ~1000 files with ~10,000 lines each from the hard drive. Also, I have an array of String called userDescription which has all the "description words" of the user. I have created a HashMap whose data structure is HashMap<String, HashMap<String, Integer>> which corresponds to HashMap<eachUserDescriptionWords, HashMap<TweetWord, Tweet_Word_Frequency>>.
The file is organized as:
<User=A>\t<Tweet="tweet...">\n
<User=A>\t<Tweet="tweet2...">\n
<User=B>\t<Tweet="tweet3...">\n
....
My method to do this is:
for (File file : tweetList) {
    if (file.getName().endsWith(".txt")) {
        System.out.println(file.getName());
        BufferedReader in;
        try {
            in = new BufferedReader(new FileReader(file));
            String str;
            while ((str = in.readLine()) != null) {
                // String split[] = str.split("\t");
                String split[] = ptnTab.split(str);
                String user = ptnEquals.split(split[1])[1];
                String tweet = ptnEquals.split(split[2])[1];
                // String user = split[1].split("=")[1];
                // String tweet = split[2].split("=")[1];
                if (tweet.length() == 0)
                    continue;

                if (!prevUser.equals(user)) {
                    description = userDescription.get(user);
                    if (description == null)
                        continue;
                    if (prevUser.length() > 0 && wordsCount.size() > 0) {
                        for (String profileWord : description) {
                            if (wordsCorr.containsKey(profileWord)) {
                                HashMap<String, Integer> temp = wordsCorr.get(profileWord);
                                wordsCorr.put(profileWord, addValues(wordsCount, temp));
                            } else {
                                wordsCorr.put(profileWord, wordsCount);
                            }
                        }
                    }
                    // wordsCount = new HashMap<String, Integer>();
                    wordsCount.clear();
                }

                setTweetWordCount(wordsCount, tweet);
                prevUser = user;
            }
        } catch (IOException e) {
            System.err.println("Something went wrong: " + e.getMessage());
        }
    }
}
Here, the method setTweetWordCount counts the word frequency of all the tweets of a single user. The method is:
private void setTweetWordCount(HashMap<String, Integer> wordsCount, String tweet) {
    ArrayList<String> currTweet = new ArrayList<String>(
            Arrays.asList(removeUnwantedStrings(tweet)));
    if (currTweet.size() == 0)
        return;

    for (String word : currTweet) {
        try {
            if (word.equals("") || word.equals(null))
                continue;
        } catch (NullPointerException e) {
            continue;
        }

        Integer countWord = wordsCount.get(word);
        wordsCount.put(word, (countWord == null) ? 1 : countWord + 1);
    }
}
The method addValues checks whether wordsCount has words that are already in the giant HashMap wordsCorr. If it does, it increases the count of the word in the original HashMap wordsCorr.
Now, my problem is that no matter what I do, the program is very, very slow. I ran this version on my server, which has fairly good hardware, but it's been 28 hours and the number of files scanned is just ~450. I tried to see if I was doing anything repeatedly that might be unnecessary, and I corrected a few of those. But the program is still very slow.
Also, I have increased the heap size to 1500m, which is the maximum I can go to.
Is there anything I might be doing wrong?
Thank you for your help!
EDIT: Profiling Results
First of all, I really want to thank you guys for the comments. I have changed some things in my program: I now have precompiled regexes instead of direct String.split() calls, among other optimizations. However, after profiling, my addValues method is taking the most time. So here's my code for addValues. Is there something that I should be optimizing here? Oh, and I've also changed my startProcess method a bit.
private HashMap<String, Integer> addValues(
        HashMap<String, Integer> wordsCount, HashMap<String, Integer> temp) {
    HashMap<String, Integer> merged = new HashMap<String, Integer>();
    for (String x : wordsCount.keySet()) {
        Integer y = temp.get(x);
        if (y == null) {
            merged.put(x, wordsCount.get(x));
        } else {
            merged.put(x, wordsCount.get(x) + y);
        }
    }
    for (String x : temp.keySet()) {
        if (merged.get(x) == null) {
            merged.put(x, temp.get(x));
        }
    }
    return merged;
}
EDIT 2: Even after trying so hard, the program didn't run as expected. I did all the optimization of the "slow method" addValues, but it didn't help. So I went down a different path: creating a word dictionary and assigning an index to each word first, and then doing the processing. Let's see where it goes. Thank you for your help!
Two things come to mind:
You are using String.split(), which uses a regular expression to do the splitting. That's complete overkill. Use one of the many splitXYZ() methods from Apache Commons StringUtils instead (see the sketch after these two points).
You are probably creating really huge hash maps. With very large hash maps, hash collisions will make the hashmap operations much slower. This can be improved by using more widely spread hash values. See an example here: Java HashMap performance optimization / alternative
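For the first point, a minimal sketch with Commons Lang might look like this (assuming the library is on the classpath; this split takes a literal separator character, so no regex is compiled, and the indexing mirrors the question's own code):
import org.apache.commons.lang.StringUtils; // or org.apache.commons.lang3.StringUtils, depending on version

String[] split = StringUtils.split(str, '\t');
String user = StringUtils.split(split[1], '=')[1];
String tweet = StringUtils.split(split[2], '=')[1];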
One suggestion (I don't know how much of an improvement you'll get from it) is based on the observation that currTweet is never modified, so there is no need to create a copy. I.e.
ArrayList<String> currTweet = new ArrayList<String>(
Arrays.asList(removeUnwantedStrings(tweet)));
can be replaced with
List<String> currTweet = Arrays.asList(removeUnwantedStrings(tweet));
or you can use the array directly (which will be marginally faster). I.e.
String[] currTweet = removeUnwantedStrings(tweet);
Also,
word.equals(null)
is always false by the definition of the contract of equals. The right way to null-check is:
if (null == word || word.equals(""))
Additionally, you won't need that null-pointer-exception try-catch if you do this. Exception handling is expensive when it happens, so if your word array tends to contain lots of nulls, this could be slowing down your code.
More generally though, this is one of those cases where you should profile the code and figure out where the actual bottleneck is (if there is a bottleneck) instead of looking for things to optimize ad-hoc.
You would gain from a few more optimizations:
String.split recompiles the input regex (in string form) to a pattern every time. You should have a single static final Pattern ptnTab = Pattern.compile( "\\t" ), ptnEquals = Pattern.compile( "=" ); and call, e.g., ptnTab.split( str ). The resulting performance should be close to StringTokenizer.
word.equals( "" ) || word.equals( null ). Lots of wasted cycles here. If you are actually seeing null words, then you are catching NPEs, which is very expensive. See the response from #trutheality above.
You should allocate the HashMap with a very large initial capacity to avoid all the resizing that is bound to happen (see the sketch below, which combines this with the precompiled Pattern suggestion).
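A minimal sketch of the precompiled-Pattern and initial-capacity suggestions together (the capacity of 200000 is just a placeholder; size it to your expected number of distinct profile words):
// Compile the patterns once and reuse them for every line.
private static final Pattern ptnTab = Pattern.compile("\\t");
private static final Pattern ptnEquals = Pattern.compile("=");

// Size the big map up front so it never has to rehash while loading.
private final HashMap<String, HashMap<String, Integer>> wordsCorr =
        new HashMap<String, HashMap<String, Integer>>(200000);

// Inside the read loop:
String[] split = ptnTab.split(str);
String user = ptnEquals.split(split[1])[1];
String tweet = ptnEquals.split(split[2])[1];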
split() uses regular expressions, which are not "fast". Try using a StringTokenizer or something similar instead.
Have you thought about using a database instead of Java? Using the data-load tools that come with the DB, you can load the data into tables and then do set-based processing there. One challenge I see is loading the data into tables, since the fields are not delimited with a common separator like "'" or ":".
You could rewrite addValues like this to make it faster - a few notes:
I have not tested the code, but I think it is equivalent to yours.
I have not tested that it is quicker (but I would be surprised if it wasn't).
I have assumed that wordsCount is larger than temp; if not, swap them in the code.
I have also replaced all the HashMaps with Maps, which makes no difference for you but makes the code easier to change later on.
private Map<String, Integer> addValues(Map<String, Integer> wordsCount, Map<String, Integer> temp) {
    Map<String, Integer> merged = new HashMap<String, Integer>(wordsCount); // copies everything from wordsCount
    for (Map.Entry<String, Integer> e : temp.entrySet()) {
        Integer countInWords = merged.get(e.getKey()); // the number in wordsCount
        Integer countInTemp = e.getValue();
        int newCount = countInTemp + (countInWords == null ? 0 : countInWords); // the sum
        merged.put(e.getKey(), newCount);
    }
    return merged;
}
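On Java 8 you could also let Map.merge do the null handling, which avoids the separate lookups entirely (a sketch that should produce the same result as the version above):
private Map<String, Integer> addValues(Map<String, Integer> wordsCount, Map<String, Integer> temp) {
    Map<String, Integer> merged = new HashMap<>(wordsCount);
    for (Map.Entry<String, Integer> e : temp.entrySet()) {
        merged.merge(e.getKey(), e.getValue(), Integer::sum); // inserts or sums in one call
    }
    return merged;
}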
I have a few Set<String>s and want to transform each of these into a single String where each element of the original Set is separated by a whitespace " ".
A naive first approach is doing it like this
Set<String> set_1;
Set<String> set_2;

StringBuilder builder = new StringBuilder();
for (String str : set_1) {
    builder.append(str).append(" ");
}
this.string_1 = builder.toString();

builder = new StringBuilder();
for (String str : set_2) {
    builder.append(str).append(" ");
}
this.string_2 = builder.toString();
Can anyone think of a faster, prettier or more efficient way to do this?
With commons/lang you can do this using StringUtils.join:
String str_1 = StringUtils.join(set_1, " ");
You can't really beat that for brevity.
Update:
Re-reading this answer, I would prefer the other answer regarding Guava's Joiner now. In fact, these days I don't go near apache commons.
Another Update:
Java 8 introduced the method String.join()
String joined = String.join(",", set);
While this isn't as flexible as the Guava version, it's handy when you don't have the Guava library on your classpath.
If you are using Java 8, you can use the native
String.join(CharSequence delimiter, Iterable<? extends CharSequence> elements)
method:
Returns a new String composed of copies of the CharSequence elements joined together with a copy of the specified delimiter.
For example:
Set<String> strings = new LinkedHashSet<>();
strings.add("Java"); strings.add("is");
strings.add("very"); strings.add("cool");
String message = String.join("-", strings);
//message returned is: "Java-is-very-cool"
Set implements Iterable, so simply use:
String.join(" ", set_1);
As a counterpoint to Seanizer's commons-lang answer, if you're using Google's Guava Libraries (which I'd consider the 'successor' to commons-lang, in many ways), you'd use Joiner:
Joiner.on(" ").join(set_1);
with the advantage of a few helper methods to do things like:
Joiner.on(" ").skipNulls().join(set_1);
// If the 2nd item was null, would produce "1 3"
or
Joiner.on(" ").useForNull("<unknown>").join(set_1);
// If the 2nd item was null, would produce "1 <unknown> 3"
It also has support for appending direct to StringBuilders and Writers, and other such niceties.
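For example, appending straight into an existing StringBuilder might look like this (a small sketch, with set_1 being the set from the question):
StringBuilder builder = new StringBuilder("words: ");
Joiner.on(" ").appendTo(builder, set_1); // joins the elements directly into the builder, no intermediate String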
Maybe a shorter solution:
public String test78(Set<String> set) {
    return set
            .stream()
            .collect(Collectors.joining(" "));
}
or
public String test77(Set<String> set) {
    return set
            .stream()
            .reduce((a, b) -> a + " " + b) // no identity, so no leading space
            .orElse("");
}
but the native String.join is definitely faster:
public String test76(Set<String> set) {
    return String.join(" ", set);
}
I don't have the StringUtils library available (I have no choice over that), so using standard Java I came up with this...
If you're confident that your set data won't include any commas or square brackets, you could use:
mySet.toString().replaceAll("\\[|\\]", "").replaceAll(", ", " ");
A set of "a", "b", "c" converts via .toString() to the string "[a, b, c]".
Then replace the extra punctuation as necessary.
Filth.
I use this method:
public static String join(Set<String> set, String sep) {
String result = null;
if(set != null) {
StringBuilder sb = new StringBuilder();
Iterator<String> it = set.iterator();
if(it.hasNext()) {
sb.append(it.next());
}
while(it.hasNext()) {
sb.append(sep).append(it.next());
}
result = sb.toString();
}
return result;
}
I'm confused by the code duplication; why not factor it into a function that takes one set and returns one string?
Other than that, I'm not sure there is much you can do, except maybe giving the StringBuilder a hint about the expected capacity (if you can calculate it based on the set size and a reasonable expectation of string length).
There are library functions for this as well, but I doubt they're significantly more efficient.
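Putting those two suggestions together, a sketch of such a factored-out helper might look like this (the 16-characters-per-element estimate is just a guess to illustrate the capacity hint):
private static String joinWithSpaces(Set<String> set) {
    // Capacity hint: assume roughly 16 characters per element to avoid repeated resizing.
    StringBuilder builder = new StringBuilder(set.size() * 16);
    for (String str : set) {
        builder.append(str).append(' ');
    }
    return builder.toString();
}

// Usage:
this.string_1 = joinWithSpaces(set_1);
this.string_2 = joinWithSpaces(set_2);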
This can be done by creating a stream out of the set and then combining the elements using a reduce operation, as shown below (for more details about Java 8 streams, check here):
Optional<String> joinedString = set1.stream().reduce(new BinaryOperator<String>() {
    @Override
    public String apply(String t, String u) {
        return t + " " + u;
    }
});
return joinedString.orElse("");