Chunking of corresponding data in java

I am new to Java. I want to chunk phrases in a string when those phrases are present in an ArrayList.
For example, if my string is "I am going to Department of agriculture science near to my house", I want to chunk it as Department-of-agriculture-science, a phrase which I added to the ArrayList in my code.
Another example: "You can use either Iterator or ListIterator for traversing on Java ArrayList". My code gives the wrong result for this input. The output I want is "You can use either Iterator or ListIterator-for-traversing-on-Java-ArrayList". "ListIterator for traversing" and "traversing on Java ArrayList" are already in the ArrayList.
I only want to chunk the phrases which I added to the list, such as "Department of ayurveda" or "ListIterator for traversing". If my string is "You can use either Iterator or ListIterator for traversing on Java ArrayList", then the output should be "You can use either Iterator or ListIterator-for-traversing-on-Java-ArrayList".
My code:
String input = "You can use either Iterator or ListIterator for traversing on Java ArrayList";
List<String> alPatterns = new ArrayList<>();
alPatterns.add("Department of ayurveda");
alPatterns.add("science");
alPatterns.add("ListIterator for traversing");
alPatterns.add("for traversing on Java ArrayList");
final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
for (String sPat : alPatterns) {
    try {
        Pattern p = Pattern.compile(sPat, flags);
        Matcher m = p.matcher(input);
        while (m.find()) {
            String sMatchFound = m.group();
            String newstring = sMatchFound.replaceAll(" ", "-");
            input = input.replaceAll(sMatchFound, newstring);
        }
        System.out.println(input);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
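One possible way to get the merged chunk, offered as a minimal sketch rather than a definitive answer. The loop above rewrites input after every pattern, so once "ListIterator for traversing" has become "ListIterator-for-traversing", the overlapping phrase "for traversing on Java ArrayList" no longer matches. The sketch below instead marks every character covered by any phrase first, and only then replaces the spaces inside the marked regions. The class name PhraseChunker and the method chunk are made up for illustration, and the phrases are matched literally (via Pattern.quote), which is an assumption about what is wanted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class PhraseChunker {
    static String chunk(String input, List<String> phrases) {
        // covered[i] is true when character i lies inside at least one matched phrase
        boolean[] covered = new boolean[input.length()];
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        for (String phrase : phrases) {
            // Always match against the ORIGINAL input, so overlapping phrases are all found
            Matcher m = Pattern.compile(Pattern.quote(phrase), flags).matcher(input);
            while (m.find()) {
                Arrays.fill(covered, m.start(), m.end(), true);
            }
        }
        // Replace only the spaces that fall inside a covered region
        StringBuilder sb = new StringBuilder(input);
        for (int i = 0; i < sb.length(); i++) {
            if (covered[i] && sb.charAt(i) == ' ') {
                sb.setCharAt(i, '-');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList(
                "Department of ayurveda",
                "science",
                "ListIterator for traversing",
                "for traversing on Java ArrayList");
        System.out.println(chunk(
                "You can use either Iterator or ListIterator for traversing on Java ArrayList",
                phrases));
    }
}
```

Because overlapping matches simply mark the same characters twice, the two phrases end up as one chunk, while phrases that merely sit next to each other stay separate (the space between them is not covered by either).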

Related

Java replace word in curly braces by name

I have a string like:
String message = "This is a message for {ID_PW}. Your result is {exam_result}. Please quote {ID_PW} if replying";
I am importing data from CSV that I would like to use to replace the items between curly braces.
// Using OpenCSV to read in CSV...code omitted for brevity
values = (Map<String, String>) reader.readMap();
// values has 'ID_PW', 'exam_result', etc keys
How can I replace the items in curly braces in the message with the equivalent value of the key in values?

Probably you are looking for:
String s = "I bought {0,number,integer} mangos. From {1}, the fruit seller. Out of them {2,number,percent} were bad.";
MessageFormat formatter = new MessageFormat(s);
Object[] argz = {22, "John", 0.3};
System.out.println(formatter.format(argz));
This outputs:
I bought 22 mangos. From John, the fruit seller. Out of them 30% were bad.
Refer to https://docs.oracle.com/javase/8/docs/api/java/text/MessageFormat.html for more details.

String message = "This is a message for {ID_PW}. Your result is {exam_result}. Please quote {ID_PW} if replying";
Set<String> fields = new LinkedHashSet<>(); // A set 'automatically' handles duplicates
Pattern p = Pattern.compile("\\{([^}]*)\\}");
Matcher m = p.matcher(message);
// Find the 'fields' in the message that are wrapped in curly braces and add them to the set
while (m.find()) {
    fields.add(m.group(1));
}
// Go through the CSV and fill in the message with the associated fields
while ((values = (Map<String, String>) reader.readMap()) != null) {
    String newMsg = message;
    for (String field : fields) {
        String value = values.get(field);
        if (value != null) {
            // String.replace treats both arguments literally, so a '$' or '\' in a value is safe
            newMsg = newMsg.replace("{" + field + "}", value);
        }
    }
    // newMsg now holds the filled-in message for this CSV row
}
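A single-pass variant of the same idea, as a rough sketch: instead of collecting the field names first and then replacing them one by one, each {key} is looked up while the message is scanned. The class name BraceTemplate and the method fill are invented for this example, and the choice to leave unknown keys untouched is an assumption. Matcher.quoteReplacement is used because appendReplacement gives special meaning to '$' and '\' in the replacement text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class BraceTemplate {
    private static final Pattern FIELD = Pattern.compile("\\{([^}]*)\\}");

    static String fill(String message, Map<String, String> values) {
        Matcher m = FIELD.matcher(message);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String value = values.get(m.group(1));
            // Assumption: keys missing from the map keep their original "{key}" text
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("ID_PW", "SimpleOne");
        values.put("exam_result", "84");
        System.out.println(fill(
                "This is a message for {ID_PW}. Your result is {exam_result}. "
                        + "Please quote {ID_PW} if replying", values));
    }
}
```

The pattern is compiled once and the message is walked once per CSV row, regardless of how many distinct fields it contains.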

Use StringBuilder. StringBuilder is explicitly designed to be a mutable type of String. Next, don't use regular expressions in a loop. Regular expressions can be powerful, but since you will be using a loop to search for multiple patterns there is nothing regular involved (multiple patterns means multiple expressions).
I would just search left to right for { and then }, extract the key between them, and look it up in the values map. Something like,
Map<String, String> values = new HashMap<>();
values.put("ID_PW", "SimpleOne");
values.put("exam_result", "84");
String message = "This is a message for {ID_PW}. Your result "
+ "is {exam_result}. Please quote {ID_PW} if replying";
StringBuilder sb = new StringBuilder(message);
int p = -1;
while ((p = sb.indexOf("{", p + 1)) > -1) {
int e = sb.indexOf("}", p + 1);
if (e > -1) {
String key = sb.substring(p + 1, e);
if (values.containsKey(key)) {
sb.replace(p, p + key.length() + 2, values.get(key));
}
}
}
System.out.println(sb);
Outputs
This is a message for SimpleOne. Your result is 84. Please quote SimpleOne if replying

How to merge many List<String> elements in one based on double quote delimiter in java

I have a CSV file generated on another platform (Salesforce). By default, it seems Salesforce does not handle line breaks inside some large text fields when generating the file, so my CSV file has some rows with line breaks like this that I need to fix:
"column1","column2","my column with text
here the text continues
more text in the same field
here we finish this","column3","column4"
The same situation, reproduced with this piece of code:
List<String> listWords = new ArrayList<String>();
listWords.add("\"Hi all");
listWords.add("This is a test");
listWords.add("of how to remove");
listWords.add("");
listWords.add("breaklines and merge all in one\"");
listWords.add("\"This is a new Line with the whole text in one row\"");
In this case I would like to merge the elements. My first approach was to check for lines where the last character is not a double quote ("), concatenate the next line, and continue like that until the last character is a double quote.
This is a non-working sample of what I was trying to achieve, but I hope it gives you an idea:
String[] csvLines = csvContent.split("\n");
Integer iterator = 0;
String mergedRows = "";
for(String row:csvLines){
newCsvfile.add(row);
if(row != null){
if(!row.isEmpty()){
String lastChar = String.valueOf(row.charAt(row.length()-1));
if(!lastChar.contains("\"")){
//row += row+" "+csvLines[iterator+1].replaceAll("\r", "").replaceAll("\n", "").replaceAll("","").replaceAll("\r\n?|\n", "");
mergedRows += row+" "+csvLines[iterator+1].replaceAll("\r", "").replaceAll("\n", "").replaceAll("","").replaceAll("\r\n?|\n", "");
row = mergedRows;
csvLines[iterator+1] = null;
}
}
newCsvfile.add(row);
}
iterator++;
}
My final result should look like (based on the list sample):
"Hi all This is a test of how to remove break lines and merge all in one"
"This is a new Line with the whole text in one row".
What is the best approach to achieve this?

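In case you don't want to use a CSV reading library like @RealSkeptic suggested...
Going from your listWords to your expected solution is fairly simple:
List<String> listSentences = new ArrayList<>();
StringBuilder tmp = new StringBuilder();
for (String s : listWords) {
    if (s.isEmpty()) {
        continue; // skip blank lines so they do not add extra spaces
    }
    if (tmp.length() > 0) {
        tmp.append(' '); // separate merged lines with a single space
    }
    tmp.append(s);
    if (s.endsWith("\"")) {
        // a closing quote ends the current sentence
        listSentences.add(tmp.toString());
        tmp.setLength(0);
    }
}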
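For the raw CSV text from the question, where a row is only complete when its last field has been closed, a rough sketch is to track whether the scan is currently inside a quoted field rather than only looking at the last character. The class CsvLineMerger and the method mergeBrokenRows are invented names, and the sketch assumes quotes inside fields are escaped by doubling them (""), which keeps the open/closed state consistent. A CSV library remains the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class CsvLineMerger {
    static List<String> mergeBrokenRows(String csvContent) {
        List<String> rows = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (String line : csvContent.split("\r?\n", -1)) {
            // Every double quote toggles the state; a doubled quote toggles twice
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    insideQuotes = !insideQuotes;
                }
            }
            if (!line.isEmpty()) {
                if (current.length() > 0) {
                    current.append(' '); // join continuation lines with one space
                }
                current.append(line);
            }
            // The row is complete once no quoted field is left open
            if (!insideQuotes && current.length() > 0) {
                rows.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            rows.add(current.toString()); // unterminated field at end of input
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "\"column1\",\"column2\",\"my column with text\n"
                + "here the text continues\n"
                + "more text in the same field\n"
                + "here we finish this\",\"column3\",\"column4\"";
        mergeBrokenRows(csv).forEach(System.out::println);
    }
}
```

Unlike the last-character check, this also copes with a continuation line that happens to end in a quote belonging to a field in the middle of the row.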

Extracting strings that contains particular words

This code can extract sentences that contain a particular word. The problem is if I want to extract several sentences based on different words I must copy it several times. Is there a way of doing this with several words? possibly feeding an array to it?
String o = "Trying to extract this string. And also the one next to it.";
String[] sent = o.split("\\.");
List<String> output = new ArrayList<String>();
for (String sentence : sent) {
if (sentence.contains("this")) {
output.add(sentence);
}
}
System.out.println(">>output=" + output);

You can try this:
String o = "Trying to extract this string. And also the one next to it.";
String[] sent = o.split("\\.");
List<String> keyList = new ArrayList<String>();
keyList.add("this");
keyList.add("these");
keyList.add("that");
List<String> output = new ArrayList<String>();
for (String sentence : sent) {
for (String key : keyList) {
if (sentence.contains(key)) {
output.add(sentence);
break;
}
}
}
System.out.println(">>output=" + output);

String sentence = "First String. Second Int. Third String. Fourth Array. Fifth Double. Sixth Boolean. Seventh String";
List<String> output = new ArrayList<String>();
for(String each: sentence.split("\\.")){
if(inKeyword(each)) output.add(each);
}
System.out.println(output);
Helper Function:
public static boolean inKeyword(String currentSentence){
String[] keyword = {"int", "double"};
for(String each: keyword){
if(currentSentence.toLowerCase().contains(each)) return true;
}
return false;
}

If you have a list of words to filter for, called filter, and an array of sentences, you could use Collections.disjoint to check whether the words of a sentence overlap with the words to filter for. Sadly, this does not work if you filter for "However" and your sentence contains "However,", because splitting on spaces leaves the comma attached to the word.
Collection<String> filter = /**/;
String[] sentences = /**/;
List<String> result = new ArrayList<>();
for (String sentence : sentences) {
    Collection<String> words = Arrays.asList(sentence.split(" "));
    // If the two collections are not disjoint, they share at least one word
    if (!Collections.disjoint(words, filter)) {
        result.add(sentence);
    }
}
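A possible way around the "However," limitation, sketched under the assumption that words consist only of letters, digits and underscores: split each sentence on runs of non-word characters instead of on single spaces, so trailing punctuation is dropped before the comparison. The class SentenceFilter and the method containingAny are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class SentenceFilter {
    static List<String> containingAny(String[] sentences, Collection<String> filter) {
        List<String> result = new ArrayList<>();
        for (String sentence : sentences) {
            // "\\W+" splits on punctuation and whitespace alike, so "However," yields "However"
            List<String> words = Arrays.asList(sentence.split("\\W+"));
            if (!Collections.disjoint(words, filter)) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sentences = {"However, this one matches", "This one does not"};
        System.out.println(containingAny(sentences, Arrays.asList("However")));
    }
}
```

The comparison is still case sensitive; lowercasing both the words and the filter would be one way to relax that.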

You can use String.matches as follows.
String sentence = ...;
// Pick one of these conditions:
if (sentence.matches(".*(you|can|use).*")) { ... }        // Plain substring match
if (sentence.matches(".*\\b(you|can|use)\\b.*")) { ... }  // With word boundaries
if (sentence.matches("(?i).*(you|can|use).*")) { ... }    // Case insensitive ("You")
In Java 8 the following variations might do:
// Either write the pattern out by hand:
String pattern = ".*(you|can|use).*";
// Or build it with a StringJoiner:
String pattern = new StringJoiner("|", ".*(", ").*")
        .add("you")
        .add("can")
        .add("use")
        .toString();
// Or a stream on the words with a joining collector
Arrays.stream(o.split("\\.\\s*"))
        .filter(sentence -> sentence.matches(pattern))
        .forEach(System.out::println);
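To feed the words in as a list, as the question asks, the word-boundary pattern can be built from the list and compiled once. This is only a sketch: the class KeywordSentences and the method extract are made-up names, the keyword list is assumed to be non-empty, and each keyword is quoted so it is matched literally. Using find() on a compiled Pattern avoids the leading and trailing ".*" that String.matches needs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
class KeywordSentences {
    static List<String> extract(String text, List<String> keywords) {
        // Builds \b(?:w1|w2|...)\b from the keyword list
        String alternatives = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(?:" + alternatives + ")\\b",
                Pattern.CASE_INSENSITIVE);
        // Split into sentences on '.', then keep those containing any keyword
        return Arrays.stream(text.split("\\.\\s*"))
                .filter(sentence -> pattern.matcher(sentence).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String o = "Trying to extract this string. And also the one next to it.";
        System.out.println(extract(o, Arrays.asList("this", "that")));
    }
}
```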

With streams (splitting into sentences and words):
String o = "Trying to extract this string. And also the one next to it.";
Set<String> words = new HashSet<>(Arrays.asList("this", "also"));
List<String> output = Arrays.stream(o.split("\\.")).filter(
sentence -> Arrays.stream(sentence.split("\\s")).anyMatch(
word -> words.contains(word)
)
).collect(Collectors.toList());
System.out.println(">>output=" + output);

replace multiple sub-strings in a string

This function is used to replace certain substrings in a string with respective values.
// map(string_to_replace, string_to_replace_with)
String template = "ola ala kala pala sala";
StringBuilder populatedTemplate = new StringBuilder();
HashMap<String, String> map = new HashMap<>();
map.put("ola", "patola");
map.put("pala", "papala");
int i=0;
for (String word : template.split(" ")) {
populatedTemplate.append( map.getOrDefault(word, word));
populatedTemplate.append(" ");
}
System.out.println(populatedTemplate.toString());
The above function works fine if the substring to be replaced is surrounded by spaces.
For example, take the string "Hey {how} are $=you".
If the substring to be replaced is "Hey" or "you", then it works fine. The issue is when I want to replace "how" and "you".
How can I achieve this without additional complexity?

If you want to replace only the words that you have in the map and keep the rest as it is, you can proceed as follows:
String template = "Hey {how} are $=you";
StringBuilder populatedTemplate = new StringBuilder();
Map<String, String> map = new HashMap<>();
map.put("how", "HH");
map.put("you", "YY");
// Pattern allowing to extract only the words
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher(template);
int fromIndex = 0;
while (matcher.find(fromIndex)) {
// The start index of the current word
int startIdx = matcher.start();
if (fromIndex < startIdx) {
// Add what we have between two words
populatedTemplate.append(template, fromIndex, startIdx);
}
// The current word
String word = matcher.group();
// Replace the word by itself or what we have in the map
populatedTemplate.append(map.getOrDefault(word, word));
// Start the next find from the end index of the current word
fromIndex = matcher.end();
}
if (fromIndex < template.length()) {
// Add the remaining sub String
populatedTemplate.append(template, fromIndex, template.length());
}
System.out.println(populatedTemplate);
Output:
Hey {HH} are $=YY
Response Update:
Assuming that you want to be able to replace not only words but anything like ${questionNumber}, you will need to create the regular expression dynamically like this:
String template = "Hey {how} are $=you id=minScaleBox-${questionNumber}";
...
map.put("${questionNumber}", "foo");
StringBuilder regex = new StringBuilder();
boolean first = true;
for (String word : map.keySet()) {
if (first) {
first = false;
} else {
regex.append('|');
}
regex.append(Pattern.quote(word));
}
Pattern pattern = Pattern.compile(regex.toString());
...
Output:
Hey {HH} are $=YY id=minScaleBox-foo
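Put together as one compilable unit, the dynamic-regex variant might look like the sketch below. The class MultiReplace and the method replaceKeys are invented names. Two details go beyond the snippet above and are assumptions: the keys are sorted longest first, so a key is not shadowed by a shorter key that is its prefix, and the replacement is passed through Matcher.quoteReplacement so values containing '$' are inserted literally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
class MultiReplace {
    static String replaceKeys(String template, Map<String, String> map) {
        if (map.isEmpty()) {
            return template; // an empty alternation would match the empty string everywhere
        }
        // Quote each key so '$', '{' and '}' are matched literally; longest keys first
        String regex = map.keySet().stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher matcher = Pattern.compile(regex).matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // matcher.group() is exactly one of the keys, so the lookup cannot miss
            matcher.appendReplacement(out, Matcher.quoteReplacement(map.get(matcher.group())));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("how", "HH");
        map.put("you", "YY");
        map.put("${questionNumber}", "foo");
        System.out.println(replaceKeys(
                "Hey {how} are $=you id=minScaleBox-${questionNumber}", map));
    }
}
```

Note that this replaces a key wherever it occurs, including inside longer words, which is the trade-off for being able to match keys that are not plain words.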

Counting occurrences of string in an arraylist using case-insensitive comparison

I have to count the occurrences of each String in an ArrayList that includes Strings and ints. Right now I have to ignore the int variable that corresponds to the quantity of every item and just count the repetitions of each String in that list.
My problem is that in class we only did this with ints. Now with Strings I'm having a problem with the casing because "abc" is different than "Abc" and "def of Ghi" is different from "Def of ghi".
Right now the code I have is this:
Map<String, Integer> getCount1 = new HashMap<>();
{
for (ItemsList i : list) {
Integer count = getCount1.get(i.name);
if (count == null) {
count = 0;
}
getCount1.put(i.name, (count.intValue() + 1));
}
for (Map.Entry<String, Integer> entry : getCount1.entrySet())
f.println(entry.getKey() + " : " + entry.getValue());
}
But as I said: it is not counting the occurrences correctly. I have, for example, one occurrence in my list called "Abc of abc" then in the input.txt file list I have that same occurrence 4 times - "Abc of abc"; "abc of Abc"; "Abc of Abc" and "abc Of abc" - all written differently and it's counting them separately instead of the same occurrence 4 times.
Early on when I was working on the totals and averages I was able to use equalsIgnoreCase() so it works fine in there, so regardless of the casing, it's counting them in the right list but not as just one occurrence several times.
Is there a way that I could use the ignore case or convert everything to the same case before counting them?
Just an update: Instead of trying the .toLowerCase() there I used it in the FileReader when it reads the .txt file and it worked i.name = name.toLowerCase();
Thanks for your time and help anyway

Try this:
public void getCount() {
    Map<String, Integer> countMap = new HashMap<String, Integer>();
    for (ItemsList i : itemsList) {
        // Normalise the case once so differently cased names share one key
        String key = i.name.toLowerCase();
        if (countMap.containsKey(key)) {
            countMap.put(key, countMap.get(key) + 1);
        } else {
            countMap.put(key, 1);
        }
    }
}

String keys in a HashMap are compared case-sensitively (String.hashCode and String.equals both depend on case), so you need to uppercase or lowercase the string values. See below your modified code:
Map<String, Integer> getCount1 = new HashMap<>();
{
    for (ItemsList i : list) {
        // Use the same lowercased key for both the lookup and the update
        String key = i.name.toLowerCase();
        Integer count = getCount1.get(key);
        if (count == null) {
            count = 0;
        }
        getCount1.put(key, count + 1);
    }
    for (Map.Entry<String, Integer> entry : getCount1.entrySet())
        f.println(entry.getKey() + " : " + entry.getValue());
}
As a point of style I'd use a more descriptive name for the items in your ItemsList like item.
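If the original spelling should be kept rather than lowercased, one alternative is a TreeMap ordered by String.CASE_INSENSITIVE_ORDER, which treats keys that differ only in case as equal. This is a sketch on plain strings; the class CaseInsensitiveCounter and the method count are hypothetical, and it works on a list of names rather than on the ItemsList objects from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original answer.
class CaseInsensitiveCounter {
    static Map<String, Integer> count(List<String> names) {
        // The comparator makes "Abc of abc" and "abc Of Abc" the same key;
        // the first spelling encountered is the one stored in the map
        Map<String, Integer> counts = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String name : names) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Abc of abc", "abc of Abc", "Abc of Abc", "abc Of abc");
        for (Map.Entry<String, Integer> entry : count(names).entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```

The trade-off is that a TreeMap does lookups in O(log n) instead of the HashMap's expected O(1), which rarely matters for a list read from a small text file.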

Instead of trying the .toLowerCase() there I used it in the FileReader when it reads the .txt file and it worked i.name = name.toLowerCase();
So in the end my code there was this:
static void readFile(ArrayList<Items> list) throws IOException {
BufferedReader in = new BufferedReader(
new FileReader("input.txt")
);
String text;
while( (text = in.readLine()) != null ) {
if(text.length()==0) break;
Scanner line = new Scanner(text);
line.useDelimiter("\\s*:\\s*");
String name = line.next();
int qtt = line.nextInt();
Items i = new Items();
i.name = name.toLowerCase();
i.qtt = qtt;
list.add(i);
}
in.close();
}
