How can I find compound string in text

I have been looking for a way to find a compound string such as howareyou in a sentence and remove the words that form it. For example:
We have a sentence - Hello there, how are you?
And a compound - how are you
As a result I want to get this string - Hello there, ? with the compound removed.
My current solution splits the string into words and checks whether the compound contains each word. It does not work well, because any other word that happens to occur inside the compound is also removed, e.g.:
If we look for foreseenfuture in the string - I have foreseen future for all of you - then my solution also removes for, because it is a substring of the compound.
Code
String[] words = text.split("[^a-zA-Z]");
String compound = "foreseenfuture";
int startIndex = -1;
int endIndex = -1;
for(String word : words){
if(compound.contains(word)){
if(startIndex == -1){
startIndex = text.indexOf(word);
}
endIndex = text.indexOf(word) + word.length() - 1;
}
}
if(startIndex != -1 && endIndex != -1){
text = text.substring(0, startIndex) + "" + text.substring(endIndex + 1, text.length() - 1);
}
So, is there any other way to solve this?

I'm going to assume that when you form a compound you only remove whitespace. Under this assumption, "for,seen future. for seen future" would become "for,seen future. ", since the comma breaks up the first occurrence. In that case, this should work:
String example1 = "how are you?";
String example2 = "how, are you... here?";
String example3 = "Madam, how are you finding the accommodations?";
String example4 = "how are you how are you how are you taco";
String compound = "howareyou";
StringBuilder compoundRegexBuilder = new StringBuilder();
//This matches to a word boundary before the first word
compoundRegexBuilder.append("\\b");
// inserts each character into the regex
for(int i = 0; i < compound.length(); i++) {
compoundRegexBuilder.append(compound.charAt(i));
// between each letter there could be any amount of whitespace
if(i<compound.length()-1) {
compoundRegexBuilder.append("\\s*");
}
}
// Makes sure the last word isn't part of a larger word
compoundRegexBuilder.append("\\b");
String compoundRegex = compoundRegexBuilder.toString();
System.out.println(compoundRegex);
System.out.println("Example 1:\n" + example1 + "\n" + example1.replaceAll(compoundRegex, ""));
System.out.println("\nExample 2:\n" + example2 + "\n" + example2.replaceAll(compoundRegex, ""));
System.out.println("\nExample 3:\n" + example3 + "\n" + example3.replaceAll(compoundRegex, ""));
System.out.println("\nExample 4:\n" + example4 + "\n" + example4.replaceAll(compoundRegex, ""));
The output is as follows:
\bh\s*o\s*w\s*a\s*r\s*e\s*y\s*o\s*u\b
Example 1:
how are you?
?
Example 2:
how, are you... here?
how, are you... here?
Example 3:
Madam, how are you finding the accommodations?
Madam, finding the accommodations?
Example 4:
how are you how are you how are you taco
taco
You can also use this to match any other alpha-numeric compound.
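The same idea can be wrapped in a reusable method. The following is only a sketch: the class and method names (CompoundRemover, buildCompoundRegex, removeCompound) are made up for illustration, and each character is passed through Pattern.quote so that a compound containing regex metacharacters would still be matched literally. Note that \b only behaves as expected when the compound starts and ends with a word character.

```java
import java.util.regex.Pattern;

class CompoundRemover {
    // Builds \b c1 \s* c2 \s* ... cn \b, quoting each character so that
    // regex metacharacters in the compound are matched literally.
    static String buildCompoundRegex(String compound) {
        StringBuilder regex = new StringBuilder("\\b");
        for (int i = 0; i < compound.length(); i++) {
            if (i > 0) {
                regex.append("\\s*"); // any amount of whitespace between letters
            }
            regex.append(Pattern.quote(String.valueOf(compound.charAt(i))));
        }
        return regex.append("\\b").toString();
    }

    // Removes every occurrence of the compound, ignoring whitespace inside it.
    static String removeCompound(String text, String compound) {
        return text.replaceAll(buildCompoundRegex(compound), "");
    }

    public static void main(String[] args) {
        // prints: Hello there, ?
        System.out.println(removeCompound("Hello there, how are you?", "howareyou"));
        // "for" survives because it is not part of a full match
        System.out.println(removeCompound("I have foreseen future for all of you", "foreseenfuture"));
    }
}
```

The removal leaves the surrounding whitespace untouched, so the second call keeps both spaces around the removed words.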

Related

How to split a string into 2 equal parts in Java

I want to divide a string like this:
String = "Titanic";
into two strings of equal length, and if the length isn't divisible by 2, one of the parts should get the extra letter, like this:
//if divisible by 2
Str1 = "BikG";
Str2 = "amer";
//if it isn't divisible by 2
Str1 = "Tita";
Str2 = "nic";

You can do it for example like this:
String base = "somestring";
int half = base.length() % 2 == 0 ? base.length()/2 : base.length()/2 + 1;
String first = base.substring(0, half);
String second = base.substring(half);
Simply put, if n is the string's length and n is divisible by 2, split the string at n/2; otherwise split at n/2 + 1, so that the first substring is one character longer than the second.

What do you do to divide an odd number e.g. 15 with the same requirement?
You store the result of 15 / 2 into an int variable say
int half = 15 / 2;
which gives you 7. As per your requirement, you need to add 1 to half to make the first half (i.e. 8) and the remaining half will be 15 - 8 = 7.
On the other hand, in case of an even number, you simply divide it by 2 to have two halves.
You have to apply the same logic in the case of a String as well. Given below is a demo:
import java.util.Scanner;
public class Main {
public static void main(String[] args) {
int half;
String str1 = "Titanic";
half = str1.length() / 2;
String str1Part1 = str1.substring(0, half + 1);
String str1Part2 = str1.substring(half + 1);
System.out.println(str1Part1 + ", " + str1Part2);
String str2 = "HelloWorld";
half = str2.length() / 2;
String str2Part1 = str2.substring(0, half);
String str2Part2 = str2.substring(half);
System.out.println(str2Part1 + ", " + str2Part2);
Scanner in = new Scanner(System.in);
do {
System.out.print("Enter a string: ");
String str = in.nextLine();
half = str.length() / 2;
System.out.println(str.length() % 2 == 1 ? str.substring(0, half + 1) + ", " + str.substring(half + 1)
: str.substring(0, half) + ", " + str.substring(half));
System.out.print("Enter Y to continue or any input to exit: ");
} while (in.nextLine().toUpperCase().equals("Y"));
}
}
A sample run:
Tita, nic
Hello, World
Enter a string: Arvind
Arv, ind
Enter Y to continue or any input to exit: y
Enter a string: Kumar
Kum, ar
Enter Y to continue or any input to exit: Y
Enter a string: Avinash
Avin, ash
Enter Y to continue or any input to exit: n
Note:
% is a modulo operator.
Check String substring(int beginIndex, int endIndex) and String substring(int beginIndex) to learn more about substring functions of String.
Check https://docs.oracle.com/javase/tutorial/java/nutsandbolts/op2.html to learn about the ternary operator.

You could use String.substring() for this purpose:
String s = "Titanic";
int half = (s.length()+1)/2;
String part1 = s.substring(0, half);
String part2 = s.substring(half);
System.out.println(part1); // Prints: Tita
System.out.println(part2); // Prints: nic
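Here (s.length()+1)/2 is an integer division, so any fractional part is discarded: an odd length n gives (n+1)/2, which puts the extra character in the first part, while an even length n still gives n/2.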
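To see the rounding at work for both cases, here is a small sketch that wraps the same computation in a helper. The names HalfSplitter and splitInHalf are hypothetical and used only for this illustration.

```java
class HalfSplitter {
    // Returns {firstPart, secondPart}; for odd lengths the first part
    // receives the extra character because (n + 1) / 2 rounds up.
    static String[] splitInHalf(String s) {
        int half = (s.length() + 1) / 2; // integer division
        return new String[] { s.substring(0, half), s.substring(half) };
    }

    public static void main(String[] args) {
        String[] odd = splitInHalf("Titanic");
        System.out.println(odd[0] + ", " + odd[1]);   // prints: Tita, nic
        String[] even = splitInHalf("BikGamer");
        System.out.println(even[0] + ", " + even[1]); // prints: BikG, amer
    }
}
```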

How to return the 3 middle characters of an odd string using the substring method?

I'm trying to return the middle 3 characters of a word using the substring method but how do I return the middle 3 letters of a word if the word can be any size (ODD only)?
My code looks like this.
import java.util.Scanner;
public class Main {
public static void main(String[] args) {
Scanner scnr = new Scanner(System.in);
String inputWord;
inputWord = scnr.next();
System.out.println("Enter word: " + inputWord + " Midfix: " + inputWord.substring(2,5));
}
}
The reason I have a 2 and 5 in the substring method is that I tried it with the word "puzzled" and it returned the middle three letters as it was supposed to. But if I try, for instance, "xxxtoyxxx", it prints out "xto" instead of "toy".
P.S. Please don't bash me I'm new to coding :)

Consider the following code:
String str = originalString.substring(startingPoint, startingPoint + length);
To determine the startingPoint, find the middle of the String and go back half the number of characters you want to retrieve (in your case 3):
int startingPoint = (originalString.length() / 2) - (length / 2);
You could even build a helper method for this:
private String getMiddleString(String str, int length) {
if (str.length() <= length) {
return str;
}
final int startingPoint = (str.length() / 2) - (length / 2);
return "[" + str.substring(startingPoint, startingPoint + length) + "]";
}
Complete Example:
class Sample {
public static void main(String[] args) {
String text = "apple";
System.out.println(getMiddleString(text, 3));
}
private static String getMiddleString(String str, int length) {
// Just return the entire string if the length is greater than or equal to the size of the String
if (str.length() <= length) {
return str;
}
// Determine the starting point of the text: first find the midpoint of the String, then go back
// half of the length we want to get.
final int startingPoint = (str.length() / 2) - (length / 2);
return "[" + str.substring(startingPoint, startingPoint + length) + "]";
}
}
Here, I've put the output in [] brackets to reflect any spaces that may exist. The output of the above example is: [ppl]
Using this dynamic approach will allow you to run the same method on any length of String. For example, if our text String is "This is a much longer String..." our output would be: [ lo]
Considerations:
What if the input text has an even number of characters, but the length is odd? You would need to determine if you want to round the length up/down or return a slightly off-center set of characters.

I think what you can do is calculate the string length and divide it by 2. This gives you the index of the middle character; then subtract 1 for the start index and add 2 for the (exclusive) end index. If you want the window to start one character earlier, subtract 2 from the start index and add 1 to the end index instead.
int word_length = inputWord.length() / 2;
System.out.println("Enter word: " + inputWord + " Midfix: " + inputWord.substring(word_length - 1, word_length + 2));
Hope this helps.

This will get the middle of the string, and return the characters at the middle, and +- 1 from the middle index.
public static String getMiddleThree(String str) {
int position, length;
if (str.length() % 2 == 0) {
position = str.length() / 2 - 1;
length = 2;
} else {
position = str.length() / 2;
length = 1;
}
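int start = position >= 1 ? position - 1 : position;
// include the middle character(s) plus one on each side, clamped to the end of the string
int end = Math.min(position + length + 1, str.length());
return str.substring(start, end);
}
The end index is clamped with Math.min so that it never exceeds the length of the string; an empty string still needs to be handled separately.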
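For the case the question actually asks about (odd-length words only), a stricter variant is possible. This is a minimal sketch under that assumption; the class and method names are hypothetical, and rejecting even-length or too-short input with an exception is just one possible design choice.

```java
class MiddleThree {
    // Returns the three characters centred on the middle of an odd-length word.
    static String middleThree(String word) {
        if (word.length() < 3 || word.length() % 2 == 0) {
            throw new IllegalArgumentException(
                    "expected an odd-length word of at least 3 characters");
        }
        int mid = word.length() / 2;           // index of the middle character
        return word.substring(mid - 1, mid + 2); // end index is exclusive
    }

    public static void main(String[] args) {
        System.out.println(middleThree("puzzled"));   // prints: zzl
        System.out.println(middleThree("xxxtoyxxx")); // prints: toy
    }
}
```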

Java adding whitespace to whitespace

I have a small problem that looks unsolvable:
GOAL: Justify lines (strings) in an ArrayList, by adding whitespaces to single whitespace characters, as much as needed for the text to get justified.
package com.mycompany.app;
import java.util.ArrayList;
import java.util.List;
public class MaxLengthLine {
String[] words;
int size;
int qtySpaces;
public MaxLengthLine (String text, int size){
this.words = text.split(" ");
this.size = size;
}
List<String> lines = new ArrayList<String>();
public void lineResize() {
int index = 0;
for (int i = 0; i < words.length - index; i++){
String curLine = "";
while((curLine + words[index]).length() <= size){
curLine += words[index] + " ";
index++;
}
curLine = curLine.substring(0, curLine.length()-1);
lines.add(curLine);
}
String curLine = "";
while(index < words.length){
curLine += words[index] + " ";
index++;
}
curLine = curLine.substring(0, curLine.length()-1);
lines.add(curLine);
}
public void lineJustify() {
for (int i = 0; i < lines.size(); i++){
while (lines.get(i).length() < size){
String test = lines.get(i).replaceFirst(" ", "  ");
lines.set(i, test);
}
}
}
public String getTextFull (){
String output = "";
for(int i = 0; i < lines.size();i++){
output += lines.get(i) + "\n";
}
while (output.contains("  ")){
output = output.replace("  ", " ");
}
return output;
}
}
This code is the most straightforward solution I thought of at first (I have already tried plenty of others), but for some reason the result keeps coming out the same.
Actual output:
In the beginning God created the heavens
and the earth. Now the earth was
formless and empty, darkness was over
the surface of the deep, and the Spirit
of God was hovering over the waters.
And God said, "Let there be light," and
there was light. God saw that the light
was good, and he separated the light
from the darkness. God called the light
"day," and the darkness he called
"night." And there was evening, and
there was morning - the first day.
Desired output:
In the beginning God created the heavens
and the earth. Now the earth was
formless and empty, darkness was over
the surface of the deep, and the Spirit
of God was hovering over the waters.
And God said, "Let there be light," and
there was light. God saw that the light
was good, and he separated the light
from the darkness. God called the light
"day," and the darkness he called
"night." And there was evening, and
there was morning - the first day.
Edit: The input:
In the beginning God created the heavens and the earth. Now the earth was formless and empty, darkness was over the surface of the deep, and the Spirit of God was hovering over the waters.
And God said, "Let there be light," and there was light. God saw that the light was good, and he separated the light from the darkness. God called the light "day," and the darkness he called "night." And there was evening, and there was morning - the first day.
(I already have code that breaks lines correctly at 40 chars without breaking words, so the last part is that function to justify the text to 40 chars)
EDIT 2: I replaced that piece of code with the whole class to make it clearer; the size set in my test is 40.

public static List<String> justifyLines(String input, int lineLength) {
String[] words = input.split(" ");
List<String> result = new ArrayList<>();
StringBuilder line = new StringBuilder();
//here we store positions of all spaces in the current line to add more spaces there
List<Integer> spacesPositions = new ArrayList<>();
for (String word : words) {
if (word.length() <= lineLength - line.length()) {
line.append(word).append(" ");
spacesPositions.add(line.length() - 1);
} else {
result.add(justifyLine(line, lineLength, spacesPositions));
line.setLength(0);
spacesPositions.clear();
line.append(word).append(" ");
spacesPositions.add(line.length() - 1);
}
}
if (line.length() > 0) {
result.add(justifyLine(line, lineLength, spacesPositions));
}
return result;
}
private static String justifyLine(StringBuilder line, int lineLength, List<Integer> spacesPositions) {
//if line ends with space - remove it
if (line.lastIndexOf(" ") == line.length() - 1) line.setLength(line.length() - 1);
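// a line with a single word has no inner gaps to widen (and would cause a division by zero below)
if (spacesPositions.size() < 2) return line.toString();
int spacesToAdd = lineLength - line.length();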
for (int j = 0; j < spacesToAdd; j++) {
//It's the most complicated part, but I'll try to explain
line.insert(
// We're adding one space to each space in the line and then, if there are still spaces to insert,
// repeating this process from the beginning - that's why we're using %
spacesPositions.get(j % (spacesPositions.size() - 1))
// But each time we insert a new space, we need to take it into account for the following positions
// j % (spacesPositions.size() - 1) is the number of space in the line
// j / (spacesPositions.size() - 1) + 1 is the iteration number
+ j % (spacesPositions.size() - 1) * (j / (spacesPositions.size() - 1) + 1), " ");
}
return line.toString();
}
So for (String s : justifyLines("In the beginning...", 40)) System.out.println(s); prints:
In the beginning God created the heavens
and the earth. Now the earth was
formless and empty, darkness was over
the surface of the deep, and the Spirit
of God was hovering over the waters. And
God said, "Let there be light," and
there was light. God saw that the light
was good, and he separated the light
from the darkness. God called the light
"day," and the darkness he called
"night." And there was evening, and
there was morning - the first day.
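As an alternative to inserting spaces one at a time, the padding for each gap can be computed up front. The sketch below is not the code from the answer above; it assumes the line has already been wrapped to at most the target width, and the names LineJustifier and justify are hypothetical. Leftover spaces that cannot be divided evenly go to the leftmost gaps.

```java
class LineJustifier {
    // Pads the gaps between words so the line is exactly `width` characters wide.
    static String justify(String line, int width) {
        String[] words = line.trim().split("\\s+");
        int gaps = words.length - 1;
        int letters = 0;
        for (String w : words) {
            letters += w.length();
        }
        // Single word, or line already at/over the width: nothing to distribute.
        if (gaps == 0 || letters + gaps >= width) {
            return String.join(" ", words);
        }
        int totalSpaces = width - letters;
        int base = totalSpaces / gaps;   // spaces every gap receives
        int extra = totalSpaces % gaps;  // leftmost gaps receive one more
        StringBuilder sb = new StringBuilder(width);
        for (int i = 0; i < words.length; i++) {
            sb.append(words[i]);
            if (i < gaps) {
                int n = base + (i < extra ? 1 : 0);
                for (int k = 0; k < n; k++) {
                    sb.append(' ');
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + justify("and the earth. Now the earth was", 40) + "]");
    }
}
```

Whether the last line of a paragraph should also be stretched is a separate decision; this sketch stretches whatever line it is given.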

From Java String class:
public String replaceFirst(String regex,
String replacement)
Replaces the first substring of this string that matches the given regular expression with the given replacement.
So the first argument is interpreted as a regex. In Java, the regex for a whitespace character is "\s" (written "\\s" in a string literal):
String test = lines.get(i).replaceFirst("\\s", "  ");
Also, as something to consider, replaceFirst only replaces the first substring that matches the regex, so this code will keep adding whitespace at the first whitespace it finds rather than distributing it evenly as you want (the first space of the double space "  " will still match the regex "\s").
Do check on this.

Get all captured groups in Java

I want to match a single word inside brackets(including the brackets), my Regex below is working but it's not returning me all groups.
Here's my code:
String text = "This_is_a_[sample]_text[notworking]";
Matcher matcher = Pattern.compile("\\[([a-zA-Z_]+)\\]").matcher(text);
if (matcher.find()) {
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println("------------------------------------");
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
Also I've tested it in Regex Planet and it seems to work.
It must return 4 groups:
------------------------------------
Group 0: [sample]
------------------------------------
Group 1: sample
------------------------------------
Group 2: [notworking]
------------------------------------
Group 3: notworking
But it's returning just this:
------------------------------------
Group 0: [sample]
------------------------------------
Group 1: sample
What's wrong?

Java does not offer a global flag to find all the matches at once, so you need a while loop here:
int i = 0;
while (matcher.find()) {
for (int j = 0; j <= matcher.groupCount(); j++) {
System.out.println("------------------------------------");
System.out.println("Group " + i + ": " + matcher.group(j));
i++;
}
}

Groups are not meant for finding several matches. They are meant to identify several subparts of a single match, e.g. the expression "([A-Za-z]*):([A-Za-z]*)" would match a key-value pair, and you could get the key as group 1 and the value as group 2.
There is only 1 group (= one pair of parentheses) in your expression, and therefore only group 0 (always the whole matched expression, independently of your manually defined groups) and group 1 (the single group you defined) are returned.
In your case, call find repeatedly if you want more matches.
int i = 0;
while (matcher.find()) {
System.out.println("Match " + i + ": " + matcher.group(1));
i++;
}
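Put together with the pattern and text from the question, a complete runnable version might look like the sketch below. The class name BracketWords and the method findAll are made up for illustration; for each match the method collects group 0 (brackets included) followed by group 1 (the word only).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketWords {
    private static final Pattern BRACKETED = Pattern.compile("\\[([a-zA-Z_]+)\\]");

    // Calls find() repeatedly; each successful call advances to the next match.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = BRACKETED.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group(0)); // whole match, e.g. [sample]
            found.add(matcher.group(1)); // captured word, e.g. sample
        }
        return found;
    }

    public static void main(String[] args) {
        // prints: [[sample], sample, [notworking], notworking]
        System.out.println(findAll("This_is_a_[sample]_text[notworking]"));
    }
}
```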

Alternatively, you can collect the captured groups of every match into a list and then take values from the list wherever you need them:
public static List<String> groupList(String text, Pattern pattern) {
List<String> list = new ArrayList<>();
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
list.add(matcher.group(i));
}
}
return list;
}

Why does a string that clearly contains a piece of another not result in a hit using .contains()?

This program scans a radio music log stored in a text file and then matches the songs to a directory of wav files. All files are named with the same convention: artist-title, i.e. lukebryan-kickthedustup.wav. I swap the positions of the title and artist by splitting on the delimiter, which allows for easy comparison with the music log, which is already formatted the same way: title, artist.
Now, let's say I'm searching for the term "lovingyoueasyza", which is Loving You Easy by the Zac Brown Band... When it reaches the file in the directory with the assigned string "lovingyoueasyzacbrownband", it ignores it, even though it contains that string. You'll see I'm calling:
if(searchMe.contains(findMe))
yet it doesn't return a hit. It returns matches if the findMe string contains only the song title, but if any part of the artist name creeps into that string, it stops working. Why!? For shorter titles it's critical that I can search by artist name as well, which is why I can't just search by song title.
I've tried using .trim() to no avail. Here is some sample output from when a match is found:
searching term: "onehellofanamen"
comparing to: "onehellofanamendbrantleygilbert"
Match found!
value of findMe: onehellofanamen
value of searchMe: onehellofanamendbrantleygilbert
value of y: 49
value of x: 79
here is sample output of a failed attempt to match:
searching term: "lovingyoueasyza"
comparing to: "keepmeinminddzacbrownband"
searching term: "lovingyoueasyza"
comparing to: "lovingyoueasydzacbrownband"
searching term: "lovingyoueasyza"
comparing to: "nohurrydzacbrownband"
searching term: "lovingyoueasyza"
comparing to: "toesdzacbrownband"
searching term: "lovingyoueasyza"
this is what the findMe's go into the method as:
fileToProcess var is: C:\test\06012015.TXT
slot #0: topofhouridplac
slot #1: lovemelikeyoume
slot #2: wearetonightbil
slot #3: turnitoneliyoun
slot #4: lonelytonightbl
slot #5: stopset
slot #6: alrightdariusru
slot #7: lovingyoueasyza
slot #8: sundazefloridag
slot #9: stopset
the final output of matchesFound is like this:
Item Number: 0 ****TOP OF HOUR****
Item Number: 1 d:\tocn\kelseaballerini-lovemelikeyoumeanit.wav
Item Number: 2 null
Item Number: 3 null
Item Number: 4 null
Item Number: 5 ****STOP SET****
Item Number: 6 null
... through 82.
public static String[] regionMatches(String[] directoryArray,
String[] musicLogArray) throws InterruptedException {
String[] matchesFound = new String[musicLogArray.length];
String[] originalFileList = new String[directoryArray.length];
for (int y = 0; y < directoryArray.length; y++) {
originalFileList[y] = directoryArray[y];
System.out.println("o value: " + originalFileList[y]);
System.out.println("d value: " + directoryArray[y]);
}
for (int q = 0; q < originalFileList.length; q++) {
originalFileList[q] = originalFileList[q].replaceAll(".wav", "");
originalFileList[q] = originalFileList[q].replaceAll("\\\\", "");
originalFileList[q] = originalFileList[q].replaceAll("[+.^:,]", "");
originalFileList[q] = originalFileList[q].replaceAll("ctestmusic",
"");
originalFileList[q] = originalFileList[q].replaceAll("tocn", "");
originalFileList[q] = originalFileList[q].toLowerCase();
String[] parts = originalFileList[q].split("-");
originalFileList[q] = parts[1] + parts[0];
System.out.println(originalFileList[q]);
}
for (int x = 0; x < musicLogArray.length; x++) {
for (int y = 0; y < directoryArray.length; y++) {
//System.out.println("value of x: " + x);
//System.out.println("value of y: " + y);
String searchMe = originalFileList[y];
String findMe = musicLogArray[x];
int searchMeLength = searchMe.length();
int findMeLength = findMe.length();
boolean foundIt = false;
updateDisplay("searching term: " + "\"" + findMe+"\"");
updateDisplay("comparing to: " + "\"" + searchMe + "\"");
//for (int i = 0; i <= (searchMeLength - findMeLength); i++) {
if(searchMe.contains(findMe)){
updateDisplay("Match found!");
updateDisplay("value of findMe: " + findMe);
updateDisplay("value of searchMe: " + searchMe);
updateDisplay("value of y: " + y);
updateDisplay("value of x: " + x);
matchesFound[x] = directoryArray[y];
break;
// if (searchMe.regionMatches(i, findMe, 0, findMeLength)) {
// foundIt = true;
// updateDisplay("MATCH FOUND!: "
// + searchMe.substring(i, i + findMeLength));
//
// matchesFound[x] = directoryArray[y];
//
// break;
} else if (findMe.contains("stopset")){
matchesFound[x] = "****STOP SET****";
break;
} else if (findMe.contains("topofho")) {
matchesFound[x] = "****TOP OF HOUR****";
break;
}
}
//if (!foundIt) {
// updateDisplay("No match found.");
//}
}
//}
return matchesFound;
}

It seems to me that the strings built from your music directory have an unwanted d at the point where you put the pieces back together.
searching term: "lovingyoueasyza"
comparing to: "lovingyoueasydzacbrownband"
The "comparing to" string does not contain the search term because there is a "d" after "easy", which breaks the match. That is why the search fails whenever the term includes part of the artist name.

Here:
searching term: "lovingyoueasyza"
comparing to: "lovingyoueasydzacbrownband"
In your second string, note that there is an extra d after easy.
So the second string does not contain the first string.
I think you are adding an extra 'd' when combining song name with the artist name.
The same thing is happening for all your other strings, e.g.
searching term: "onehellofanamen"
comparing to: "onehellofanamendbrantleygilbert"
which I suppose is one hell of an amen + the extra 'd' + brantley gilbert.
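Judging from the code in the question, the extra d is most likely the drive letter: for a path such as d:\tocn\..., the replaceAll calls strip the backslashes, the colon and "tocn", but leave the leading "d" glued to the artist name. One way to avoid this is to cut the directory part off before building the key. The sketch below assumes that approach; the class name SongKey, the method toSearchKey and the example file name are hypothetical, and it does not reproduce the question's removal of punctuation characters.

```java
class SongKey {
    // Turns "<dir>\artist-title.wav" into "titleartist" for comparison with the log.
    static String toSearchKey(String fullPath) {
        // Keep only the file name: drop everything up to the last '\' or '/'.
        int lastSeparator = Math.max(fullPath.lastIndexOf('\\'), fullPath.lastIndexOf('/'));
        String fileName = fullPath.substring(lastSeparator + 1).toLowerCase();
        if (fileName.endsWith(".wav")) {
            fileName = fileName.substring(0, fileName.length() - ".wav".length());
        }
        // artist-title -> title + artist (split on the first '-' only)
        String[] parts = fileName.split("-", 2);
        if (parts.length < 2) {
            return fileName; // no delimiter: leave the name as it is
        }
        return parts[1] + parts[0];
    }

    public static void main(String[] args) {
        // Hypothetical file name following the artist-title convention
        String key = toSearchKey("d:\\tocn\\zacbrownband-lovingyoueasy.wav");
        System.out.println(key);                             // prints: lovingyoueasyzacbrownband
        System.out.println(key.contains("lovingyoueasyza")); // prints: true
    }
}
```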
