Extracting strings that contain particular words - java

This code extracts the sentences that contain a particular word. The problem is that if I want to extract sentences based on several different words, I have to copy it several times. Is there a way of doing this with several words, possibly by feeding an array to it?
String o = "Trying to extract this string. And also the one next to it.";
String[] sent = o.split("\\.");
List<String> output = new ArrayList<String>();
for (String sentence : sent) {
if (sentence.contains("this")) {
output.add(sentence);
}
}
System.out.println(">>output=" + output);

You can try this:
String o = "Trying to extract this string. And also the one next to it.";
String[] sent = o.split("\\.");
List<String> keyList = new ArrayList<String>();
keyList.add("this");
keyList.add("these");
keyList.add("that");
List<String> output = new ArrayList<String>();
for (String sentence : sent) {
for (String key : keyList) {
if (sentence.contains(key)) {
output.add(sentence);
break;
}
}
}
System.out.println(">>output=" + output);

String sentence = "First String. Second Int. Third String. Fourth Array. Fifth Double. Sixth Boolean. Seventh String";
List<String> output = new ArrayList<String>();
for(String each: sentence.split("\\.")){
if(inKeyword(each)) output.add(each);
}
System.out.println(output);
Helper Function:
public static Boolean inKeyword(String currentSentence){
String[] keyword = {"int", "double"};
for(String each: keyword){
if(currentSentence.toLowerCase().contains(each)) return true;
}
return false;
}

If you have a collection of words to filter for, called filter, and an array of sentences, you could use Collections.disjoint to check whether the words of a sentence overlap with the words to filter for. Sadly, this does not work if you filter for "However" and your sentence contains "However,".
Collection<String> filter = /**/;
String[] sentences = /**/;
List<String> result = new ArrayList<>();
for(String sentence : sentences) {
Collection<String> words = Arrays.asList(sentence.split(" "));
// disjoint returns true when there is no common element, so negate it
if (!Collections.disjoint(words, filter)) {
result.add(sentence);
}
}
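One possible way around the punctuation problem mentioned above, sketched under the assumption that a "word" is a run of letters, digits or underscores: split each sentence on non-word characters instead of spaces, so that "However," yields the token "However". The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class DisjointFilter {

    // Returns the sentences that share at least one whole word with filter.
    static List<String> matching(String[] sentences, Collection<String> filter) {
        List<String> result = new ArrayList<>();
        for (String sentence : sentences) {
            // \W+ splits on runs of non-word characters, which drops punctuation
            List<String> words = Arrays.asList(sentence.split("\\W+"));
            if (!Collections.disjoint(words, filter)) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sentences = {"However, it rains", "It is sunny"};
        System.out.println(matching(sentences, Arrays.asList("However")));
    }
}
```

The comparison is still case sensitive; lowercasing both the sentence and the filter words first would be one way to relax that.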

You can use String.matches as follows.
String sentence = ...;
if (sentence.matches(".*(you|can|use).*")) { // Or:
if (sentence.matches(".*\\b(you|can|use)\\b.*")) { // With word boundaries
if (sentence.matches("(?i).*(you|can|use).*")) { // Case insensitive ("You")
In Java 8 the following variations might do:
String pattern = ".*(you|can|use).*";
// Or, equivalently, built from the individual words:
String pattern = new StringJoiner("|", ".*(", ").*")
.add("you")
.add("can")
.add("use")
.toString();
// Or a stream on the words with a joining collector
Arrays.stream(o.split("\\.\\s*"))
.filter(sentence -> sentence.matches(pattern))
.forEach(System.out::println);
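The "joining collector" variant hinted at in the comment could look roughly like this. It is a sketch, not the answerer's code: the class and method names are invented, each word is passed through Pattern.quote so that regex metacharacters in a keyword are taken literally, and the pattern is compiled once and used with find() rather than rebuilt by matches() for every sentence.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordSentences {

    // Builds \b(?:w1|w2|...)\b with every word quoted as a literal.
    static Pattern wholeWordPattern(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "\\b(?:", ")\\b"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }

    // Splits the text into sentences and keeps those containing any of the words.
    static List<String> extract(String text, List<String> words) {
        Pattern pattern = wholeWordPattern(words);
        return Arrays.stream(text.split("\\.\\s*"))
                .filter(sentence -> pattern.matcher(sentence).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String o = "Trying to extract this string. And also the one next to it.";
        System.out.println(extract(o, Arrays.asList("this", "also")));
    }
}
```

Because of the word boundaries, a keyword such as "the" does not match inside "this"; drop the \b parts if substring matches are wanted.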

With streams (splitting into sentences and words):
String o = "Trying to extract this string. And also the one next to it.";
Set<String> words = new HashSet<>(Arrays.asList("this", "also"));
List<String> output = Arrays.stream(o.split("\\.")).filter(
sentence -> Arrays.stream(sentence.split("\\s")).anyMatch(
word -> words.contains(word)
)
).collect(Collectors.toList());
System.out.println(">>output=" + output);

Related

Transform string format to list - it is not a json format string

How to transform a string with this format into a list?
[[["Census_county_divisions","Populated_places_in_the_United_States","Populated_places_by_country","Geography_by_country","Geography_by_place","Geography","Main_topic_classifications"]],[["example","text","thanks"]],[["name","surname","age"]]]
From that string I would like to have 3 lists:
List 1:
"Census_county_divisions","Populated_places_in_the_United_States","Populated_places_by_country","Geography_by_country","Geography_by_place","Geography","Main_topic_classifications"
List 2:"example","text","thanks"
List 3:"name","surname","age"
I have tried different approaches to process this string: split, StringUtils.substringBetween, indexOf, regex, a JSON parser... I always get an error. Is there an easier way?
Comments: I don't see this string as JSON, since JSON would look like "name":"John". If I'm wrong, please let me know how I could process it as JSON.
I have also tried JsonParser and got: Exception in thread "main" java.lang.IllegalStateException: Not a JSON Object:
[[["Census_county_divisions","Popula
I wrote this code, which does the following:
removes the [[[ and ]]] strings
replaces each ]],[[ with a | character
splits the string
///The String to convert
String arg = "[[[\"Census_county_divisions\",....
[[\"example\",\"text\",\"thanks\"]],[[\"name\",\"surname\",\"age\"]]]";
System.out.println(arg);
////Replace
arg = arg.replace("[[[", "");
arg = arg.replace("]],[[", "|");
arg = arg.replace("]]]", "");
System.out.println(arg);
////Split
String[] array=arg.split("\\|");
List<String> list = Arrays.asList(array);
///Verify
for(String s: list) {
System.out.println(s);
}
Regards
This worked for me. I only trimmed the start and end character set and then split it into 3 different strings which yielded the lists.
str = new StringBuilder(str).replace(0, 4, "")
.reverse().replace(0, 4, "")
.reverse()
.toString();
String[] arr1 = str.split("\"\\]\\],\\[\\[\"");
List<String> list1 = Arrays.asList(arr1[0]
.split("\",\""));
List<String> list2 = Arrays.asList(arr1[1]
.split("\",\""));
List<String> list3 = Arrays.asList(arr1[2]
.split("\",\""));
Maybe you could use a regex to extract everything inside each "[[" "]]" pair, trimming the "[" and "]" at the start and the end of each group. After that, you could split the result and put it into a List.
Here is a simple example:
List<List<String>> lists = new ArrayList<>();
String string = "[[[\"Census_county_divisions\",\"Populated_places_in_the_United_States\",\"Populated_places_by_country\",\"Geography_by_country\",\"Geography_by_place\",\"Geography\",\"Main_topic_classifications\"]],[[\"example\",\"text\",\"thanks\"]],[[\"name\",\"surname\",\"age\"]]]";
Pattern pattern = Pattern.compile("\\[\\[(.*?)\\]\\]");
Matcher matcher = pattern.matcher(string);
String processed;
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
processed = StringUtils.strip(matcher.group(i), "[");
processed = StringUtils.strip(processed, "]");
lists.add(Arrays.asList(processed.split(",")));
}
}
int i = 0;
for(List<String> stringList : lists){
System.out.printf("List # %s \n", i);
for(String elementOfList:stringList){
System.out.printf("Element %s \n", elementOfList);
}
i++;
}
Here you will have a dynamic list depending on the initial String.
I've used StringUtils from Apache Commons Lang (org.apache.commons.lang3) to strip the matches.
I hope it's useful.
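For what it's worth, the input is valid JSON: its root is an array of arrays rather than an object. A "Not a JSON Object" error typically means the code asked the parser for an object, so reading the root as an array with whatever JSON library you use should also work. If you would rather avoid a dependency, here is a minimal sketch using only java.util.regex. It assumes the items contain no escaped quotes and no square brackets, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedListParser {

    // One [[ ... ]] group per inner list
    private static final Pattern GROUP = Pattern.compile("\\[\\[(.*?)\\]\\]");
    // One double-quoted item inside a group
    private static final Pattern ITEM = Pattern.compile("\"([^\"]*)\"");

    static List<List<String>> parse(String input) {
        List<List<String>> lists = new ArrayList<>();
        Matcher group = GROUP.matcher(input);
        while (group.find()) {
            List<String> items = new ArrayList<>();
            Matcher item = ITEM.matcher(group.group(1));
            while (item.find()) {
                items.add(item.group(1)); // the quotes are not part of the capture
            }
            lists.add(items);
        }
        return lists;
    }

    public static void main(String[] args) {
        String s = "[[[\"example\",\"text\",\"thanks\"]],[[\"name\",\"surname\",\"age\"]]]";
        System.out.println(parse(s));
    }
}
```

Unlike splitting on ",", this keeps the surrounding quotes out of the list elements.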

Replace multiple sub-strings in a string

This function is used to replace certain substrings in a string with respective values.
// map(string_to_replace, string_to_replace_with)
String template = "ola ala kala pala sala";
StringBuilder populatedTemplate = new StringBuilder();
HashMap<String, String> map = new HashMap<>();
map.put("ola", "patola");
map.put("pala", "papala");
// Split on spaces so each space-separated token can be looked up in the map
for (String word : template.split(" ")) {
populatedTemplate.append( map.getOrDefault(word, word));
populatedTemplate.append(" ");
}
System.out.println(populatedTemplate.toString());
The above code works fine if the substring to be replaced is surrounded by spaces.
Ex- String => "Hey {how} are $=you"
If the substring to be replaced is "Hey" or "are", it works fine. The issue is when I want to replace "how" and "you".
How can I achieve this without additional complexity?
If you want to replace only the words that you have in the map and keep the rest as it is, you can proceed as follows:
String template = "Hey {how} are $=you";
StringBuilder populatedTemplate = new StringBuilder();
Map<String, String> map = new HashMap<>();
map.put("how", "HH");
map.put("you", "YY");
// Pattern allowing to extract only the words
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher(template);
int fromIndex = 0;
while (matcher.find(fromIndex)) {
// The start index of the current word
int startIdx = matcher.start();
if (fromIndex < startIdx) {
// Add what we have between two words
populatedTemplate.append(template, fromIndex, startIdx);
}
// The current word
String word = matcher.group();
// Replace the word by itself or what we have in the map
populatedTemplate.append(map.getOrDefault(word, word));
// Start the next find from the end index of the current word
fromIndex = matcher.end();
}
if (fromIndex < template.length()) {
// Add the remaining sub String
populatedTemplate.append(template, fromIndex, template.length());
}
System.out.println(populatedTemplate);
Output:
Hey {HH} are $=YY
Response Update:
Assuming that you want to be able to replace not only words but anything like ${questionNumber}, you will need to create the regular expression dynamically like this:
String template = "Hey {how} are $=you id=minScaleBox-${questionNumber}";
...
map.put("${questionNumber}", "foo");
StringBuilder regex = new StringBuilder();
boolean first = true;
for (String word : map.keySet()) {
if (first) {
first = false;
} else {
regex.append('|');
}
regex.append(Pattern.quote(word));
}
Pattern pattern = Pattern.compile(regex.toString());
...
Output:
Hey {HH} are $=YY id=minScaleBox-foo
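Put together, the dynamic-regex idea could be written as one method. This is a sketch under a few assumptions of my own: TemplateFiller and fill are invented names, keys are sorted longest first so that a key is not shadowed by a shorter key that is its prefix, and Matcher.appendReplacement is used in place of the manual index bookkeeping.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TemplateFiller {

    static String fill(String template, Map<String, String> map) {
        if (map.isEmpty()) {
            return template; // an empty alternation would match at every position
        }
        // Longest keys first so a key is not shadowed by one of its prefixes
        List<String> keys = new ArrayList<>(map.keySet());
        keys.sort(Comparator.comparingInt(String::length).reversed());
        Pattern pattern = Pattern.compile(
                keys.stream().map(Pattern::quote).collect(Collectors.joining("|")));

        Matcher matcher = pattern.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the replacement value literal
            matcher.appendReplacement(out, Matcher.quoteReplacement(map.get(matcher.group())));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("how", "HH");
        map.put("you", "YY");
        map.put("${questionNumber}", "foo");
        System.out.println(fill("Hey {how} are $=you id=minScaleBox-${questionNumber}", map));
    }
}
```

Note that this replaces a key wherever it occurs, including inside longer words (for example "you" inside "young"), which the \w+ based version does not do.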

How to remove duplicate values in string which has delimiters

I have a string with the value ||HelpDesk||IT Staff||IT Staff||Admin||Audit||HelpDesk||
I am trying to write code that removes duplicates and returns the unique values while retaining the delimiters, like this: ||HelpDesk||IT Staff||Admin||Audit||
My code uses a HashSet to remove duplicates, but the problem is that it also removes the delimiters. How can I retain the delimiters while removing only the duplicate values?
Below is my code, which removes the duplicates and then adds the delimiters back. I am not sure whether there is an easier way of doing this.
public static void main(String[] args) {
TestDuplicates testDuplicates = new TestDuplicates();
String bRole = "||HelpDesk||IT Staff||IT Staff||Admin||Audit||HelpDesk||";
List<String> listWithoutDuplicates = new ArrayList<String>();
String noDup = "";
List<String> splittedStringList =
new ArrayList<String>();
SplitOperations splitOperations =
new SplitOperations();
splittedStringList =
splitOperations.splitString(bRole);
for (int i = 0; i < splittedStringList.size(); i++) {
HashSet<String> listToSet = new HashSet<String>(splittedStringList);
listWithoutDuplicates = new ArrayList<String>(listToSet);
}
for(int i=0;i<listWithoutDuplicates.size();i++){
noDup = noDup + "||"+listWithoutDuplicates.get(i);
System.out.println(listWithoutDuplicates.get(i));
}
System.out.println("No Duplicate is::"+ noDup+"||");
}
Thanks
You could use a LinkedHashSet to preserve insertion order. Once you have split the String by "||", just add the delimiters back when reconstructing the String.
String s = "||HelpDesk||IT Staff||IT Staff||Admin||Audit||HelpDesk||";
Set<String> set = new LinkedHashSet<>(Arrays.asList(s.split(Pattern.quote("||"))));
String noDup = "||";
for(String st : set) {
if(st.isEmpty()) continue;
noDup += st+"||";
}
Or using the Java 8 Stream API:
String noDup = "||"+
Arrays.stream(s.split(Pattern.quote("||")))
.distinct()
.filter(st -> !st.isEmpty()) //we need to remove the empty String produced by the split
.collect(Collectors.joining("||"))+"||";
Both approaches yield the same result (||HelpDesk||IT Staff||Admin||Audit||).
public String removeDuplicate() {
String str = "||HelpDesk||IT Staff||IT Staff||Admin||Audit||HelpDesk||";
String split[] = str.split("\\|\\|");
String newStr = "";
for (String s : split) {
if (!s.isEmpty() && !newStr.contains(s)) {
newStr += "||" + s;
}
}
newStr += "||";
return newStr;
}
Something like that? str could be an argument.
Edit #1
If you want to get rid of && !newStr.contains(s) you can use a HashSet<String> instead. I think that is overkill however. .contains(s) will do the trick when the string is small such as this.
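One caveat with the substring check is worth illustrating: it compares against the whole accumulated string, so a value that happens to be contained in an earlier one is dropped as well. The sketch below uses a hypothetical role list in which "IT" appears next to "IT Staff" (not the data from the question) and contrasts the contains version with a set-based one; the class and method names are invented.

```java
import java.util.HashSet;
import java.util.Set;

class ContainsPitfall {

    // Substring check: "IT" is skipped because "||IT Staff" already contains it.
    static String withContains(String str) {
        String result = "";
        for (String s : str.split("\\|\\|")) {
            if (!s.isEmpty() && !result.contains(s)) {
                result += "||" + s;
            }
        }
        return result + "||";
    }

    // Set membership: values are compared as whole strings.
    static String withSet(String str) {
        Set<String> seen = new HashSet<>();
        StringBuilder result = new StringBuilder();
        for (String s : str.split("\\|\\|")) {
            if (!s.isEmpty() && seen.add(s)) { // add returns false for a repeat
                result.append("||").append(s);
            }
        }
        return result.append("||").toString();
    }

    public static void main(String[] args) {
        String roles = "||IT Staff||IT||IT Staff||";
        System.out.println(withContains(roles)); // loses the "IT" role
        System.out.println(withSet(roles));
    }
}
```

For the exact string in the question both versions give the same result, so the simpler check is fine as long as no value is a substring of another.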
This should work, also it will maintain sequence of elements if you want. Note that I have not written code to put delimiters again.
public static void main(String s[]){
String a = "||HelpDesk||IT Staff||IT Staff||Admin||Audit||HelpDesk||";
a = a.replaceAll("\\|\\|",",");
String arr[] = a.split(",");
//linked hash set in case you want to maintain the sequence of elements
Set<String> set = new LinkedHashSet<String>(Arrays.asList(arr));
set.remove("");
System.out.println(set);
//Iterate through the set and put your delimiters here again
}
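Using the Guava library, splitting and rejoining is a one-liner:
Joiner.on("||").skipNulls().join(Splitter.on("||").trimResults().split(<target_string>));
As written this only splits and rejoins; to drop the duplicates, pass the split result through a LinkedHashSet (or Guava's ImmutableSet.copyOf) before joining.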
Here is my attempt at it:
import java.util.*;
public class Seperator {
public static void main(String[] args) {
String bRole = "||HelpDesk||IT Staff||IT Staff||Admin||Audit||HelpDesk||";
List<String> listWithoutDuplicates = new ArrayList<String>();
String noDup = "";
List<String> splittedStringList = new ArrayList<String>();
splittedStringList = Arrays.asList(bRole.split("\\|\\|"));
LinkedHashSet<String> listToSet = new LinkedHashSet<String>(splittedStringList);
noDup = Seperator.join(listToSet, "||");
System.out.println("No Duplicate is::"+ noDup+"||");
}
public static String join(Set<String> set, String sep) {
String result = null;
if(set != null) {
StringBuilder sb = new StringBuilder();
Iterator<String> it = set.iterator();
if(it.hasNext()) {
sb.append(it.next());
}
while(it.hasNext()) {
sb.append(sep).append(it.next());
}
result = sb.toString();
}
return result;
}
}
A LinkedHashSet is used mainly to preserve the order and, of course, to get unique elements. Joining is pretty standard, but we can also use Google's Guava library (Joiner):
So, instead of Seperator.join(listToSet, "||");
You'll have: Joiner.on("||").join(listToSet);
Here's a regex-based one-liner:
str = str.replaceAll("(\\|\\|[^|]+)(?=.*\\1\\|\\|)", "");
This works by removing every term that is followed by itself somewhere ahead, via a look-ahead assertion that uses a back reference. Note that it keeps the last occurrence of each value rather than the first.
Here's a Java 8 stream one-liner:
Arrays.stream(str.split("[|]+")).filter(s -> !s.isEmpty()).distinct().collect(Collectors.joining("||", "||", "||"));

Counting occurrences of string in an arraylist using case-insensitive comparison

I have to count the occurrences of each String in an ArrayList that includes Strings and ints. Right now I have to ignore the int variable that corresponds to the quantity of every item and just count the repetitions of each String in that list.
My problem is that in class we only did this with ints. Now with Strings I'm having a problem with the casing because "abc" is different than "Abc" and "def of Ghi" is different from "Def of ghi".
Right now the code I have is this:
Map<String, Integer> getCount1 = new HashMap<>();
{
for (ItemsList i : list) {
Integer count = getCount1.get(i.name);
if (count == null) {
count = 0;
}
getCount1.put(i.name, (count.intValue() + 1));
}
for (Map.Entry<String, Integer> entry : getCount1.entrySet())
f.println(entry.getKey() + " : " + entry.getValue());
}
But as I said: it is not counting the occurrences correctly. I have, for example, one occurrence in my list called "Abc of abc" then in the input.txt file list I have that same occurrence 4 times - "Abc of abc"; "abc of Abc"; "Abc of Abc" and "abc Of abc" - all written differently and it's counting them separately instead of the same occurrence 4 times.
Early on when I was working on the totals and averages I was able to use equalsIgnoreCase() so it works fine in there, so regardless of the casing, it's counting them in the right list but not as just one occurrence several times.
Is there a way that I could use the ignore case or convert everything to the same case before counting them?
Just an update: instead of calling .toLowerCase() there, I used it where the .txt file is read, and it worked: i.name = name.toLowerCase();
Thanks for your time and help anyway
Try this:
public void getCount(){
Map<String, Integer> countMap = new HashMap<String, Integer>();
for(ItemsList i : itemsList){
// Normalise the case so that "Abc" and "abc" share one key
String key = i.name.toLowerCase();
if(countMap.containsKey(key)){
countMap.put(key, countMap.get(key) + 1);
}
else{
countMap.put(key, 1);
}
}
}
String.hashCode() and String.equals() are case sensitive, so a HashMap treats "Abc" and "abc" as different keys. You need to uppercase or lowercase the string before using it as a key, for both the lookup and the insert. See your modified code below:
Map<String, Integer> getCount1 = new HashMap<>();
{
for (ItemsList i : list) {
String key = i.name.toLowerCase();
Integer count = getCount1.get(key);
if (count == null) {
count = 0;
}
getCount1.put(key, count + 1);
}
for (Map.Entry<String, Integer> entry : getCount1.entrySet())
f.println(entry.getKey() + " : " + entry.getValue());
}
As a point of style I'd use a more descriptive name for the items in your ItemsList like item.
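An alternative that avoids changing the stored names at all is a TreeMap ordered by String.CASE_INSENSITIVE_ORDER, which treats keys differing only in case as the same key. The sketch below works on a plain list of strings rather than on the ItemsList objects from the question (an assumption made to keep it self-contained), and the class name is invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CaseInsensitiveCount {

    static Map<String, Integer> count(List<String> names) {
        // Keys that differ only by case are treated as the same key
        Map<String, Integer> counts = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String name : names) {
            counts.merge(name, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Abc of abc", "abc of Abc", "Abc of Abc", "abc Of abc", "Def");
        System.out.println(count(names));
    }
}
```

Lookups on the resulting map are case insensitive too, so counts.get("ABC OF ABC") finds the same entry.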
Instead of calling .toLowerCase() there, I used it where the .txt file is read, and it worked: i.name = name.toLowerCase();
So in the end my code there was this:
static void readFile(ArrayList<Items> list) throws IOException {
BufferedReader in = new BufferedReader(
new FileReader("input.txt")
);
String text;
while( (text = in.readLine()) != null ) {
if(text.length()==0) break;
Scanner line = new Scanner(text);
line.useDelimiter("\\s*:\\s*");
String name = line.next();
int qtt = line.nextInt();
Items i = new Items();
i.name = name.toLowerCase();
i.qtt = qtt;
list.add(i);
}
in.close();
}

How to load a line of string input from data file into an array

I am a complete beginner. I have a data input file with multiple lines like this:
name, number, number,...
I can read each line as a string, but I need to put the values into an array. How do I do that? I've spent 2 days trying to figure it out.
As you mentioned, each line has words separated by commas, so you can read each line and split it to get an array per line. You may want to add these arrays to another collection such as an ArrayList.
I am assuming you already have the file reading logic and that each line ends up in a String called line. You need to put these pieces at the right places in your code:
// collection to hold array of words in each line
ArrayList<String[]> listOfWords = new ArrayList<String[]>();
// split your line into an array of words
String[] wordsInLine = line.split("[ ,]+");
// add words array to your collection
listOfWords.add(wordsInLine);
Things to check out:
String.split()
Scanner
StringTokenizer
But particularly, String.split():
String list = "first second third";
String[] items = list.split(" "); // = { "first", "second", "third" };
Or with regular expressions:
String list = "first, second, third";
String[] items = list.split("[ ,]+"); // = { "first", "second", "third" };
Then you have things like StringTokenizer, which can do it all, e.g.:
String values = "name, 1, 2, 3";
String name = null;
List<Integer> numbers = new ArrayList<Integer>();
StringTokenizer tokenizer = new StringTokenizer(values, " ,");
if (tokenizer.hasMoreTokens()) // first token is name
name = tokenizer.nextToken();
else
throw new Exception("No name!");
while (tokenizer.hasMoreTokens()) // rest are numbers
numbers.add(Integer.parseInt(tokenizer.nextToken()));
Or you can do fun stuff with Scanners:
String values = "name, 1, 2, 3";
Scanner scanner = new Scanner(values);
scanner.useDelimiter("[ ,]+"); // regular expression, 1 or more spaces/commas
String name = null;
List<Integer> numbers = new ArrayList<Integer>();
if (scanner.hasNext()) // first token is name
name = scanner.next();
else
throw new Exception("No name!");
while (scanner.hasNextInt()) // rest are numbers
numbers.add(scanner.nextInt());
There's a lot of ways to skin this cat.
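As one more variation, here is a sketch that turns a single "name, number, number,..." line into a name plus an int array using String.split. The Entry holder and the LineParser class are hypothetical names, and the code assumes every field after the first is a valid integer.

```java
import java.util.Arrays;

class LineParser {

    // Hypothetical holder for one parsed line
    static class Entry {
        final String name;
        final int[] numbers;

        Entry(String name, int[] numbers) {
            this.name = name;
            this.numbers = numbers;
        }
    }

    static Entry parse(String line) {
        // Split on commas, ignoring any spaces around them
        String[] parts = line.trim().split("\\s*,\\s*");
        int[] numbers = new int[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            // parseInt throws NumberFormatException if a field is not an integer
            numbers[i - 1] = Integer.parseInt(parts[i]);
        }
        return new Entry(parts[0], numbers);
    }

    public static void main(String[] args) {
        Entry e = parse("name, 1, 2, 3");
        System.out.println(e.name + " " + Arrays.toString(e.numbers));
    }
}
```

Calling parse once per line read from the file and adding each Entry to a list gives you the whole file in memory.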
Arrays are fixed sized, so you would have to know how many elements you want to add via a loop. I would recommend using an ArrayList.
This should help: http://docs.oracle.com/javase/6/docs/api/java/util/ArrayList.html
ArrayList<String> listOfStrings = new ArrayList<String>(); // instantiating the object
listOfStrings.add(STRING_VARIABLE); // adding onto the ArrayList.
listOfStrings.get(INDEX_NUMBER); // gets the actual String object, using the index number
String strFromFile = "name,number,number,";
if(strFromFile.contains(","))
{
String[] splitInput = strFromFile.split(",");
String part1 = splitInput[0]; //gives you name
String part2 = splitInput[1]; // gives you number
.....
}
else
{
throw new IllegalArgumentException("String " + strFromFile + " does not contain ,");
}
List<String[]> finalList = new ArrayList<String[]>();
finalList.add(splitInput);
