I have a few lines of code which I'm trying to convert to a functional style. The code is:
private String test(List<String> strings, String input) {
    for (String s : strings) {
        input = input.replace(s, ", ");
    }
    return input;
}
I need to make this a single instruction chain. It must replace all strings from the given list with a comma in the given String input. I tried to do it with the map method, but with no success. I'm aware I could do it if I appended the input string to the list first and then called map, but the list is immutable, so I cannot do that.
I believe you can do this with a simple reduce:
strings.stream().reduce(input, (in, s) -> in.replace(s, ", "));
It takes the input and replaces each occurrence of the first string with ", ". It then takes that result, uses it as the input along with the next string, and repeats for every item in the list.
As Louis Wasserman points out, this approach cannot be used with parallelStream, so it won't work if you want parallelization.
The only thing I can think of -- which is pretty awkward -- is
strings.stream()
    .map(s -> (Function<String, String>) (x -> x.replace(s, ", ")))
    .reduce(Function.identity(), Function::andThen)
    .apply(input)
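For reference, here is the same chain dropped back into the shape of the original method (just a sketch; it should behave the same as the loop version):

private String test(List<String> strings, String input) {
    return strings.stream()
            .map(s -> (Function<String, String>) (x -> x.replace(s, ", ")))
            .reduce(Function.identity(), Function::andThen)
            .apply(input);
}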
The following does pretty much the same thing.
private String test(List<String> strings, String input) {
    return input.replaceAll(
            strings.stream()
                   .map(Pattern::quote)
                   .collect(Collectors.joining("|")),
            ", ");
}
The main difference is that it first combines all the search strings into a single regex and applies them all at once. Depending on the size of your input strings, this may perform even better than your original version.
If the list of strings is fixed, or changes only rarely, you can get some more speed from precompiling the joined pattern.
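For illustration, a rough sketch of what that precompilation could look like (this assumes strings is available as a field; the names here are made up):

// Assumption: 'strings' is a field of the enclosing class; compile the alternation once.
private final Pattern replacementPattern = Pattern.compile(
        strings.stream()
               .map(Pattern::quote)
               .collect(Collectors.joining("|")));

private String test(String input) {
    // Matcher.replaceAll applies the precompiled pattern to the whole input.
    return replacementPattern.matcher(input).replaceAll(", ");
}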
I have a somewhat unusual problem. I am currently trying to program a chat filter for Discord in Java 16.
Here I ran into the problem that in German there are several ways to write a word to get around this filter.
As an example I now take the insult "Hurensohn".
Now you could simply write "Huränsohn" or "Hur3nsohn" in the chat and thus bypass the filter quite easily.
Since I don't want to manually put every possibility into the filter, I thought about how I could do it automatically. So the first thing I did was to create a HashMap with all possible alternative letters, which looked something like this:
Map<String, List<String>> alternativeCharacters = new HashMap<>();
alternativeCharacters.put( "E", List.of( "ä", "3" ) );
I tried to change the corresponding letters in the words and add them to the chat filter, which actually worked.
But now we come to the problem:
To be able to cover all possible combinations, it doesn't do me much good to change only one type of letter in a word.
If we now take the word "Einschalter" and change the letter "e", we could simply replace the "e" with a "3" or with an "ä", which would produce the following:
3einschal3r
Einschalt3r
3inschalter
and
Äinschalär
Einschaltär
Äinschalter
But now I also want "mixed" words to be created, e.g. "3inschalär", where both the "Ä" and the "3" are used to create a word. That would produce combinations like the following:
3inschalär
Äinschalt3r
Does anyone know how I can realize something like that? With the normal replace() method I haven't found a way yet to create "mixed" replacements.
I hope people understand what kind of problem I have and what I want to do. :D
Current method used for replacing:
public static List<String> replace( String word, String from, String... to ) {
    final int[] index = { 0 };
    List<String> strings = new ArrayList<>();
    /* Replaces all letters */
    List.of( to ).forEach( value -> strings.add( word.replaceAll( from, value ) ) );
    /* Here is the problem. Here only one letter is edited at a time and thus changed in the word */
    List.of( to ).forEach( value -> {
        List.of( word.split( "" ) ).forEach( letters -> {
            if ( letters.equalsIgnoreCase( from ) ) {
                strings.add( word.substring( 0, index[0] ) + value + "" + word.substring( index[0] + 1 ) );
            }
            index[0]++;
        } );
        index[0] = 0;
    } );
    return strings;
}
As said by others, you can’t keep up with the creativity of people. But if you want to continue using such a check, you should use the right tool for the job, i.e. a RuleBasedCollator.
RuleBasedCollator c = new RuleBasedCollator("<i,I=1=!<e=ä,E=3=Ä<o=0,O");
c.setStrength(Collator.PRIMARY);
String a = "3inschaltär", b = "Einschalter";
if(c.compare(a, b) == 0) {
System.out.println(a + " matches " + b);
}
3inschaltär matches Einschalter
This class even allows efficient hash lookups
// using c from above
// prepare map
var map = new HashMap<CollationKey, String>();
for (String s : List.of("Einschalter", "Hicks-Boson")) {
    map.put(c.getCollationKey(s), s);
}

// use map for lookup
for (String s : List.of("Ä!nschalt3r", "H1cks-B0sOn")) {
    System.out.println(s);
    String match = map.get(c.getCollationKey(s));
    if (match != null) System.out.println("\ta variant of " + match);
}
Ä!nschalt3r
a variant of Einschalter
H1cks-B0sOn
a variant of Hicks-Boson
While a Collator can be used for sorting, you’re only interested in identifying equal strings. Therefore, I didn’t care to specify a useful order, which simplifies the rules, as we only need to specify the characters supposed to be equal.
The linked documentation explains the syntax; in short, I=1=! defines the characters I, 1, and ! as equal, whereas prepending i, defines i to be a different case of those characters. Likewise, e=ä,E=3=Ä defines e as equal to ä, and both as a different case than the characters E, 3, and Ä. Finally, the < separator defines characters to be different. It also defines a sorting order which, as said, we don’t care about in this usage.
As an addendum, the following can be used to remove accents and other markings from characters, except for umlauts, since you want to match German words. This removes the need to deal with an exploding number of obfuscated character combinations, especially from people who know about Zalgo text converters:
String s = "òñę ảëîöū";
String n = Normalizer.normalize(s, Normalizer.Form.NFD)
.replaceAll("(?!(?<=[aou])\u0308)\\p{Mn}", "");
System.out.println(s + " -> " + n);
òñę ảëîöū -> one aeiöu
Off the top of my head, you may try to approach this using regular expressions, compiling patterns by replacing the respective letters where multiple ways of writing may occur in your dictionary.
E.g. in the direction of:
record LetterReplacements(String letter, List<String> replacements) {}

public Predicate<String> generatePredicateForDictionaryWord(String word) {
    var letterA = new LetterReplacements("a", List.of("a", "A", "4"));
    var writingStyles = letterA.replacements().stream()
            .collect(Collectors.joining("|", "(", ")"));
    var pattern = word.replaceAll(letterA.letter(), writingStyles);
    return Pattern.compile(pattern).asPredicate();
}
Example usage:
@ParameterizedTest
@CsvSource({
        "maus,true",
        "m4us,true",
        "mAus,true",
        "mous,false"
})
void testDictionaryPredicates(String word, boolean expectedResult) {
    var predicate = underTest.generatePredicateForDictionaryWord("maus");
    assertThat(predicate.test(word)).isEqualTo(expectedResult);
}
However, I doubt that any approach in this direction would yield sufficient performance, especially since I expect your dictionary to grow rather fast and the number of different writing "styles" to be rather large.
So please regard the snippet above only as an explanation of the approach I was talking about. Again, I doubt you would get sufficient performance, even if you precompiled all patterns and the predicate combinations beforehand.
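For what it's worth, the precompiling I mean could look roughly like this (dictionary and chatMessage are placeholders here, not part of your code):

// Build one predicate per dictionary word once, at startup.
List<Predicate<String>> precompiled = dictionary.stream()
        .map(word -> generatePredicateForDictionaryWord(word))
        .toList();

// Later, per incoming message, check it against all precompiled predicates.
boolean blocked = precompiled.stream().anyMatch(p -> p.test(chatMessage));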
I'm having a lot of trouble trying to average the values of a map in Java. My method takes in a text file and finds the average length of the words starting with a certain letter (case-insensitive), going through all words in the text file.
For example, let's say I have a text file that contains the following:
"Apple arrow are very common Because bees behave Cant you come home"
My method currently returns:
{A=5, a=8, B=7, b=10, c=10, C=5, v=4, h=4, y=3}
Because it is looking at the letters and finding the average length of the word, but it is still case sensitive.
It should return:
{a=4.3, b=5.5, c=5.0, v=4.0, h=4.0, y=3}
This is what I have so far.
public static Map<String, Integer> findAverageLength(String filename) {
    Map<String, Integer> wordcount = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    try
    {
        Scanner in = new Scanner(new File(filename));
        List<String> wordList = new ArrayList<>();
        while (in.hasNext())
        {
            wordList.add(in.next());
        }
        wordcount = wordList.stream().collect(Collectors.toConcurrentMap(w -> w.substring(0, 1), w -> w.length(), Integer::sum));
        System.out.println(wordcount);
    }
    catch (IOException e)
    {
        System.out.println("File: " + filename + " not found");
    }
    return wordcount;
}
You are almost there.
You could try the following.
We group by the first character of the word, converted to lowercase. This lets us collect into a Map<Character, …>, where the key is the first letter of each word. A typical map entry would then look like
a = [ Apple, arrow, are ]
Then, the average of each group of word lengths is calculated, using the averagingDouble method. A typical map entry would then look like
a = 4.33333333
Here is the code:
// groupingBy and averagingDouble are static imports from
// java.util.stream.Collectors
Map<Character, Double> map = Arrays.stream(str.split(" "))
.collect(groupingBy(word -> Character.toLowerCase(word.charAt(0)),
averagingDouble(String::length)));
Note that, for brevity, I left out additional things like null checks, empty strings and Locales.
Also note that this code was heavily improved responding to the comments of Olivier Grégoire and Holger below.
You can try with the following:
String str = "Apple arrow are very common Because bees behave Cant you come home";
Map<String, Double> map = Arrays.stream(str.split(" "))
.collect(Collectors.groupingBy(s -> String.valueOf(Character.toLowerCase(s.charAt(0))),
Collectors.averagingDouble(String::length)));
The split method will split the string into an array of strings using the delimiter " ". Then you want to group the strings and average their lengths. Hence the use of the Collectors.groupingBy method with the downstream collector Collectors.averagingDouble(String::length). Finally, given the constraints you have described, we need to group by the lower case (or upper case) of the first char in the String (i.e., Character.toLowerCase(s.charAt(0))).
and then print the map:
map.entrySet().forEach(System.out::println);
If you do not need to keep the map structure you can do it in one go:
Arrays.stream(str.split(" "))
.collect(Collectors.groupingBy(s -> String.valueOf(Character.toLowerCase(s.charAt(0))), Collectors.averagingDouble(String::length)))
.entrySet().forEach(System.out::println);
Just convert the first letter, which you obtain using substring, to the same case. Upper or lower, doesn't matter.
w.substring(0,1).toLowerCase()
You've defined a case-insensitive map, but you haven't used it. Try Collectors.toMap(w->w.substring(0,1), w -> w.length(), Integer::sum, () -> new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER)), or just Collectors.toMap(w->w.toUpperCase().substring(0,1), w -> w.length(), Integer::sum)
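For completeness, a minimal sketch of the first variant dropped into your method in place of the toConcurrentMap line (note that it still sums the lengths; the averaging itself is covered in the answers above):

wordcount = wordList.stream()
        .collect(Collectors.toMap(
                w -> w.substring(0, 1),
                w -> w.length(),
                Integer::sum,
                () -> new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER)));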
I have an assignment where we're reading text files and counting the occurrences of each word (ignoring punctuation). We don't have to use streams but I want to practice using them.
So far I am able to read a text file and put each line in a string, and all the strings in a list using this:
try (Stream<String> p = Files.lines(FOLDER_OF_TEXT_FILES)) {
list = p.map(line -> line.replaceAll("[^A-Za-z0-9 ]", ""))
.collect(Collectors.toList());
}
However, so far it simply turns each line into a single String, so each element of the list is not a word but a line. Is there a way using streams that I can have each element be a single word, using something like String's split method with regex? Or will I have to handle this outside the stream itself?
I may have misunderstood your question, but if you just want comma-separated words you can try the code below.
Replace line.replaceAll("[^A-Za-z0-9 ]", "") with Arrays.asList(line.replaceAll("[^A-Za-z0-9 ]", "").split(" ")).stream().collect(Collectors.joining(","))
Then use the joining collector on the list again to get a comma-separated String of words.
String commaSeperated = list.stream().collect(Collectors.joining(",")) ;
You can perform further operations on the final string as per your requirement.
Instead of applying replaceAll on a line, do it on words of the line as follows:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
public class Main {
    public static void main(String[] args) {
        String str = "Harry is a good cricketer. Tanya is an intelligent student. Bravo!";
        List<String> words = Arrays.stream(str.split("\\s+"))
                .map(s -> s.replaceAll("[^A-Za-z0-9 ]", ""))
                .collect(Collectors.toList());
        System.out.println(words);
    }
}
Output:
[Harry, is, a, good, cricketer, Tanya, is, an, intelligent, student, Bravo]
Note: the regex \\s+ splits the string on one or more whitespace characters.
try this:
String fileName = "file.txt";
try {
    Map<String, Long> wordCount = Files.lines(Path.of(fileName))
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .filter(w -> w.matches("[a-zA-Z]+"))
            .sorted(Comparator.comparing(String::length)
                    .thenComparing(String.CASE_INSENSITIVE_ORDER))
            .collect(Collectors.groupingBy(w -> w,
                    LinkedHashMap::new, Collectors.counting()));
    wordCount.entrySet().forEach(System.out::println);
} catch (Exception e) {
    e.printStackTrace();
}
This is relatively simple. It just splits on white space and counts the words by putting them in a map where the Key is the word and the Value is a long containing the count.
I included a filter to only capture words of nothing but letters.
The way this works is that the lines are put into a stream. Each line is then split into words using String.split. Since this creates an array, flatMap converts all these individual streams of words into a single stream where they are processed.
The workhorse here is Collectors.groupingBy, which groups the values in a particular way for each key. In this case, I specified the Collectors.counting() collector to increase the count each time the key (i.e. the word) appears.
As an option, I sorted the words first on length and then alphabetically, ignoring case.
First, for each line, we're removing all non-alphanumeric characters (excluding spaces), then we split on spaces, so all elements are single words. Since we're flatmapping, the stream consists of all words. Then we simply collect using the groupingBy collector and use counting() as the downstream collector. That leaves us with a Map<String, Long> where the key is the word and the value is the number of occurrences.
Map<String, Long> wordCounts = p
        .flatMap(line -> Arrays.stream(line.replaceAll("[^0-9A-Za-z ]+", "").split("\\s+")))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
Since line boundaries are irrelevant when you want to process words, the preferred way is not to bother with splitting into lines and then splitting lines into words, but to split the file into words in the first place. You can use something like:
Map<String,Long> wordsAndCounts;
try(Scanner s = new Scanner(Paths.get(path))) {
wordsAndCounts = s.findAll("\\w+")
.collect(Collectors.groupingBy(MatchResult::group, Collectors.counting()));
}
wordsAndCounts.forEach((w,c) -> System.out.println(w+":\t"+c));
The findAll method of Scanner requires Java 9 or newer. This answer contains an implementation of findAll for Java 8. This allows you to use it on Java 8 and to migrate easily to newer versions by just switching to the standard method.
For the entire "read a text file and count each word using streams", I suggest using something like this:
try (Stream<String> lines = Files.lines(FOLDER_OF_TEXT_FILES)) {
    Map<String, Long> wordCounts = lines.flatMap(l -> Arrays.stream(l.split(" ")))
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}
There is no need to first collect everything into a list; this can be done inline.
Also it's good that you used try-with-resources.
One could use Pattern.splitAsStream to split a string in a performant way and at the same time strip all non-word characters before creating a map of occurrence counts:
Pattern splitter = Pattern.compile("(\\W*\\s+\\W*)+");
String fileStr = Files.readString(Path.of(FOLDER_OF_TEXT_FILES));
Map<String, Long> collect = splitter.splitAsStream(fileStr)
.collect(groupingBy(Function.identity(), counting()));
System.out.println(collect);
For splitting and removal of non-word characters we use the pattern (\W*\s+\W*)+, which looks for optional non-word characters, then whitespace, and then again optional non-word characters.
In the someString String below, I want to remove FAIL:, extract the last ID_ number, and ignore all other ID_ numbers in the string. Why does the approach in the first System.out.println not work while the second one does?
Or what is the best way to achieve this?
public static void main(String[] args) {
    String someString = "FAIL: some random message with ID_temptemptemp and with original ID_1234567890";
    System.out.println(someString.split("FAIL: ")[1].substring(someString.lastIndexOf("ID_")));

    String newString = someString.split("FAIL: ")[1];
    System.out.println(newString.substring(newString.lastIndexOf("ID_") + 3));
}
Output:
4567890
1234567890
In my opinion, this sort of problem is usually best solved with a regular expression. Using a combination of replace, substring, and indexOf can work, but it can be difficult for the next dev to understand the real logic.
This is the regular expression solution:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class testmain {
    public static void main(String[] args) {
        String someString = "FAIL: some random message with ID_temptemptemp and with original ID_1234567890";
        Pattern pattern3 = Pattern.compile("ID_(\\d+)");
        Matcher matcher3 = pattern3.matcher(someString);
        String s = null;
        while (matcher3.find()) {
            s = matcher3.group(1); // Keep overriding until the last set of captured value
        }
        System.out.println(s);
    }
}
Sample output:
1234567890
The expression ID_(\\d+) means that we are looking for occurrences of the substring "ID_" and, if matched, capture the digits that follow.
The while loop just goes through all the matches and keeps overwriting the s variable with the captured value until the last one, which fits your requirement.
The original problem:
The issue with your initial solution is that after the .split(...), the string at position [1] is no longer the same length as the original string, so the index returned by someString.lastIndexOf("ID_") points to the wrong place. You should call lastIndexOf on the split result itself, i.e. someString.split("FAIL: ")[1].lastIndexOf("ID_").
That gives you the output ID_1234567890.
Example:
System.out.println(someString.split("FAIL: ")[1].substring(someString.split("FAIL: ")[1].lastIndexOf("ID_")));
The remaining code works just fine.
Tips:
A debugging tip: get an IDE like IntelliJ and step through the code to see what it is doing at each step. That will give you a better idea of what is going on.
Another way would be to split on spaces and then use replace:
String test = "FAIL: some random message with ID_temptemptemp and with original ID_1234567890";
String[] arr = test.split(" ");
System.out.println(arr[arr.length - 1].replace("ID_", ""));
Arrays.stream(str.split(" "))
.filter(x -> x.startsWith("ID_"))
.map(x -> x.substring(3))
.reduce((x, y) -> y)
.ifPresent(System.out::println);
or
String id = Arrays.stream(str.split(" "))
.filter(x -> x.startsWith("ID_"))
.map(x -> x.substring(3))
.reduce((x, y) -> y)
.orElse(null);
Your first attempt does not work because once you have done someString.split("FAIL: ")[1], the immutable string someString is split and new string objects are created. So substring(someString.lastIndexOf("ID_")) goes wrong because you are trying to use an index from the original string for a substring operation on a smaller string.
The best way to solve this, as I see it, is someString.substring(someString.lastIndexOf("ID_") + 3), because I see no use in stripping out the "FAIL" part.
It works in the second one because your String is immutable and cannot be changed in place. In the first one you try to do it inline, and hence it fails. In the second one you do the split and store the result in another string first:
String newString = someString.split("FAIL: ")[1];
System.out.println(newString.substring(newString.lastIndexOf("ID_") + 3));
If you want to do it inline, then you may try StringBuffer or StringBuilder.
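For example, a small sketch with StringBuilder (just to illustrate the mutable approach; the logic mirrors the working second line):

// StringBuilder is mutable, so edits happen in place.
StringBuilder sb = new StringBuilder(someString);
sb.delete(0, "FAIL: ".length());                      // strip the leading "FAIL: "
String id = sb.substring(sb.lastIndexOf("ID_") + 3);  // take everything after the last "ID_"
System.out.println(id);                               // prints 1234567890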
I have searched a lot but couldn't find anything that will allow me to insert words at certain indexes simultaneously. For example:
I have a string:
rock climbing is fun, I love rock climbing.
I have a hashmap for certain words which indicates their indexes in the string, e.g.:
rock -> 0,29
climbing -> 5,34
fun -> 17
Now my question is:
I want to put a [start] tag at the start of all these words and an [end] tag at the end of them in the string. I can't do this one by one, since once I insert [start] at index 0, all the other indexes will shift and I'll have to recalculate them.
Is there a way in which I can insert all of the tags at once or something? Can somebody suggest some other solution to this problem?
I can't use regular expressions (the replaceAll method), since sometimes I'll have a sentence like:
rocks are hard.
and hashmap will be :
rock -> 0
I am looking for faster solutions here.
Edit:
For the sentence:
rocks are hard but frocks are beautiful.
rocks -> 0
Here I don't want to replace frocks with the tags.
This is not exactly doing what you want, but consider it as an alternative solution that tries to achieve your goal via different means.
I will consider two different cases. First, suppose there exists a List<String> words, which contains the words you want to replace.
Then the code will be:
public String insertTags(final String input) {
    String result = input;
    for (String word : words) {
        // String.replace returns a new string, so the result has to be captured
        result = result.replace(word, "[start]" + word + "[end]");
    }
    return result;
}
Second case, closer to your example but still not using the indices: suppose there exists a Map<String, List<Integer>> words, which contains the words and the indices to replace them at.
Then the code would be:
public String insertTags(final String input) {
    String result = input;
    for (Map.Entry<String, List<Integer>> entry : words.entrySet()) {
        String word = entry.getKey();
        result = result.replace(word, "[start]" + word + "[end]");
    }
    return result;
}
The latter is definitely more complex and does not even use the indices, so you should probably prefer the former.
Hope this helps you without having to worry about the indices.