How to pull out substrings (words) from string?


I'm trying to input a four-word sentence and then print out each word individually using indexOf and substring. Any ideas what I'm doing wrong?
Edited
So is this what it should look like? I've run it twice and received two different answers, so I'm not sure whether the program I use to run it is faulty or my program itself is faulty.
import java.util.Scanner;
public class arithmetic {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String sentence;
String word1, word2, word3, word4;
int w1, w2, w3, w4;
int p, p2, p3, p4;
System.out.print("Enter a sentence with 4 words: ");
sentence = in.nextLine();
p = sentence.indexOf(" ");
word1 = sentence.substring(0,p)+" ";
w1 = 1 + word1.length();
p2 = word1.indexOf(" ");
word2 = sentence.substring(w1,p2);
w2 = w1+1+word2.length();
p3 = word2.indexOf(" ");
word3 = sentence.substring(w2,p3);
w3 = w1+w2+1+word3.length();
p4 = word3.indexOf(" ");
word4 = sentence.substring(w3,p4);
w4 = w1+w2+w3+1+word4.length();

I see at least two things:
You're not computing the indices correctly. The starting index for the third word should be something like length of first word + 1 + length of second word + 1, but it looks like you're leaving out the length of the first word. Similarly, when you're getting the fourth word, you're leaving out the lengths of the first two words.
indexOf(" ") will only get you the index of the first occurrence of a space. After you get the first space, you're reusing that index instead of using the indices of the other spaces.
Lastly, after you fix those two, if you know that the words are going to be delimited by spaces, then you might want to look at the String.split function. Using that, you could split your sentence without having to do all of the space-finding yourself.
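The two points above can be sketched as follows. This is only an illustration, assuming words are separated by single spaces; the class and method names (WordExtractor, wordsByIndexOf, wordsBySplit) are made up for the example. It relies on the two-argument form indexOf(String, int), which resumes the search from a given position instead of always returning the first space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordExtractor { // hypothetical name, for illustration only

    // Walks the sentence with indexOf(" ", fromIndex) so that each search
    // resumes just past the previously found space.
    static List<String> wordsByIndexOf(String sentence) {
        List<String> words = new ArrayList<>();
        int start = 0;
        int space = sentence.indexOf(" ", start);
        while (space != -1) {
            words.add(sentence.substring(start, space));
            start = space + 1;
            space = sentence.indexOf(" ", start);
        }
        words.add(sentence.substring(start)); // the last word has no trailing space
        return words;
    }

    // Same result via String.split, assuming single spaces between words.
    static List<String> wordsBySplit(String sentence) {
        return Arrays.asList(sentence.split(" "));
    }

    public static void main(String[] args) {
        System.out.println(wordsByIndexOf("one two three four"));
        System.out.println(wordsBySplit("one two three four"));
    }
}
```

Both calls print [one, two, three, four]. The indexOf version never needs to add up word lengths, because the start of each word is simply one past the previous space.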

I strongly suggest not using substring and indexOf here, for reasons of performance, readability, and the risk of bugs. Consider any of the following (all of these treat a word as a run of non-whitespace characters):
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.NoSuchElementException;
import java.util.Scanner;

public class WordDemo { // class name is arbitrary
    public static void main (String[] args) throws java.lang.Exception
    {
        int wordNo = 0;
        System.out.println("using a Scanner (exactly 4 words):");
        InputStream in0 = new ByteArrayInputStream("a four word sentence".getBytes("UTF-8"));
        Scanner scanner = new Scanner(/*System.*/in0);
        try {
            String word1 = scanner.next();
            String word2 = scanner.next();
            String word3 = scanner.next();
            String word4 = scanner.next();
            System.out.printf("1: %s, 2: %s, 3: %s, 4: %s\n", word1, word2, word3, word4);
        } catch(NoSuchElementException ex) {
            System.err.println("The sentence is shorter than 4 words");
        }
        System.out.println("\nusing a Scanner (general):");
        InputStream in1 = new ByteArrayInputStream("this is a sentence".getBytes("UTF-8"));
        for(Scanner scanner1 = new Scanner(/*System.*/in1); scanner1.hasNext(); ) {
            String word = scanner1.next();
            System.out.printf("%d: %s\n", ++wordNo, word);
        }
        wordNo = 0; // restart the numbering for the next example
        System.out.println("\nUsing BufferedReader and split:");
        InputStream in2 = new ByteArrayInputStream("this is another sentence".getBytes("UTF-8"));
        BufferedReader reader = new BufferedReader(new InputStreamReader(/*System.*/in2));
        String line = null;
        while((line = reader.readLine()) != null) {
            // split on any run of whitespace
            for(String word : line.split("\\s+")) {
                System.out.printf("%d: %s\n", ++wordNo, word);
            }
        }
    }
}

Related

java regex parse

Thanks for checking out my question.
Here the user enters a string in the format "xD xS xP xH". The program takes the string, splits it on spaces, then uses a regex to parse it. There is an issue with my "final String regex" and I am not sure where.
final String regex = "([0-9]+)[D|d]| ([0-9]+)[S|s]| ([0-9]+)[P|p]| ([0-9]+)[H|h]";
Lastly, the loop prints out only the value for D so I suspect it reaches an error moving to match S or s.
public class parseStack
{
public parseStack()
{
System.out.print('\u000c');
String CurrencyFormat = "xD xS xP xH";
System.out.println("Please enter currency in the following format: \""+CurrencyFormat+"\" where x is any integer");
Scanner scan = new Scanner(System.in);
String currencyIn = scan.nextLine();
currencyFinal = currencyIn.toUpperCase();
System.out.println("This is the currency you entered: "+currencyFinal);
String[] tokens = currencyFinal.split(" ");
final String input = tokens[0];
final String regex = "([0-9]+)[D|d]| ([0-9]+)[S|s]| ([0-9]+)[P|p]| ([0-9]+)[H|h]";
if (input.matches(regex) == false) {
throw new IllegalArgumentException("Input is malformed.");
}
long[] values = Arrays.stream(input.replaceAll(regex, "$1 $2 $3 $4").split(" "))
.mapToLong(Long::parseLong)
.toArray();
for (int i=0; i<values.length; i++)
{
System.out.println("value of i: "+i+ " |" +values[i]+ "|");
}
//pause to print
System.out.println("Please press enter to continue . . . ");
Scanner itScan = new Scanner(System.in);
String nextIt = itScan.nextLine();
}
}

Your regular expression should be [\d]+[DdSsPpHh].
The problem you are having is you split the string into chunks, then you match chunks with a RegEx that matches the original string that you have split.
HOWEVER, this answer only addresses one problem in your code. Your routine doesn't seem to do what you expect, and what you expect is not clear at all.
EDIT
Added the multidigit requirement.

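Your regex can be simplified somewhat.
"(?i)(\\d+d) (\\d+s) (\\d+p) (\\d+h)"
will do a case-insensitive match against one or more digits (\d+) followed by each unit letter. (The backslashes are doubled because this is a Java string literal.)
This can be further simplified to
"(?i)(\\d+[dsph])"
which, applied repeatedly with Matcher.find(), will match each group in your currency string in turn.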
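A minimal sketch of that iterative matching, assuming the "xD xS xP xH" format from the question. The class name CurrencyParser and the choice of a map keyed by unit letter are illustrative only; the pattern is a small variation of the one above that captures the number and the unit in separate groups.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrencyParser { // hypothetical name, for illustration only

    // (?i) makes the unit letter case-insensitive.
    // Group 1 is the number, group 2 is the unit letter.
    private static final Pattern TOKEN = Pattern.compile("(?i)(\\d+)([dsph])");

    static Map<Character, Long> parse(String input) {
        Map<Character, Long> values = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(input);
        // find() moves on to the next match each time it is called
        while (m.find()) {
            char unit = Character.toUpperCase(m.group(2).charAt(0));
            values.put(unit, Long.parseLong(m.group(1)));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("3D 12s 7P 40h"));
    }
}
```

With this approach there is no need to split the input first. Note that the sketch does not validate the overall format; input that contains no matching tokens simply yields an empty map.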

First of all, your regex looks a bit too complex. Your input format is "xD xS xP xH", and you are also converting the input to uppercase with currencyIn = currencyIn.toUpperCase();, but this isn't the problem.
The problem is
String[] tokens = currencyIn.split(" ");
final String input = tokens[0];
You are splitting the input and only using the first part, which would be "xD".
The fixed code would look like:
String currencyIn = scan.nextLine();
currencyIn = currencyIn.toUpperCase();
System.out.println("This is the currency you entered: "+currencyIn);
final String regex = "([0-9]+)D ([0-9]+)S ([0-9]+)P ([0-9]+)H";
if (!currencyIn.matches(regex)) {
throw new IllegalArgumentException("Input is malformed.");
}
long[] values = Arrays.stream(currencyIn.replaceAll(regex, "$1 $2 $3 $4").split(" "))
.mapToLong(Long::parseLong)
.toArray();
for (int i=0; i<values.length; i++) {
System.out.println("value of i: "+i+ " |" +values[i]+ "|");
}

Search a text file for two strings and display how many strings are in between

How can I use binary search in Java to find how many strings lie between the two strings given by the user? I have a large text file to search through.
I was thinking ((word position 2 - word position 1)-1) would give the count, using the positions in an array, but I am not quite sure how to put it into code. I got stuck after checking the file for the words.
String[] allWords = new String[400000];
int wordCount = 0;
Scanner input = new Scanner(new File("C:\\text.txt"));
while (input.hasNext()) {
String word = input.next();
allWords[wordCount] = word;
wordCount++;
System.out.println(wordCount);
}
Scanner sc = new Scanner(new File("C:\\text.txt"));
while(sc.hasNextLine()){
String in = sc.nextLine();
System.out.println("Enter a string:");
Scanner sc2 = new Scanner(System.in);
String str = sc2.nextLine();
System.out.println("Enter a string:");
Scanner sc3 = new Scanner(System.in);
String str2 = sc3.nextLine();
if (str.contains(str)) {
System.out.println("yes");
}
if (str.contains(str2)) {
System.out.println("yes");
}

Your math is correct. As you have surmised, subtract one from the difference of the positions. If you have any issues with the code, post your attempt to your question.

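You could try something like this sketch, where a and b are the two strings entered by the user and stringFromFile holds the file contents.
String[] lines = stringFromFile.split("\n");
int start = -1;
int end = -1;
for (int i = 0; i < lines.length; i++) {
    if (lines[i].equals(a)) {
        start = i; // position of the first string
    }
    if (lines[i].equals(b)) {
        end = i; // position of the second string
    }
}
// copy only the elements strictly between the two positions
// (assumes both strings were found and start < end)
String[] newLines = Arrays.copyOfRange(lines, start + 1, end);
return newLines.length;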
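Since the question asks specifically about binary search, here is a hedged sketch using Arrays.binarySearch. It assumes the word array is sorted and contains no duplicates or null entries, which the question does not confirm; the class and method names are made up for the example.

```java
import java.util.Arrays;

class BetweenCounter { // hypothetical name, for illustration only

    // Returns how many entries lie strictly between first and second in a
    // sorted array, or -1 if either word is absent.
    static int countBetween(String[] sortedWords, String first, String second) {
        int i = Arrays.binarySearch(sortedWords, first);
        int j = Arrays.binarySearch(sortedWords, second);
        if (i < 0 || j < 0) {
            return -1; // binarySearch returns a negative value when the key is missing
        }
        // (position 2 - position 1) - 1, regardless of which word comes first
        return Math.max(0, Math.abs(j - i) - 1);
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "cherry", "banana"};
        Arrays.sort(words); // binary search requires sorted input
        System.out.println(countBetween(words, "apple", "fig")); // prints 2
    }
}
```

If the words are read into a fixed-size array such as the 400000-element one in the question, the unused tail would need to be dropped first (for example with Arrays.copyOf(allWords, wordCount)) before sorting, because sorting an array that contains null entries fails.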

How can I move the punctuation from the end of a string to the beginning?

I am attempting to write a program that reverses the word order of a string, including the punctuation. But when my backwards string prints, the punctuation mark at the end of the last word stays at the end of the word instead of being treated as an individual character.
How can I split the end punctuation mark from the last word so I can move it around?
For example:
When I type in : Hello my name is jason!
I want: !jason is name my Hello
instead I get: jason! is name my Hello
import java.util.*;
class Ideone
{
public static void main(String[] args) {
Scanner userInput = new Scanner(System.in);
System.out.print("Enter a sentence: ");
String input = userInput.nextLine();
String[] sentence= input.split(" ");
String backwards = "";
for (int i = sentence.length - 1; i >= 0; i--) {
backwards += sentence[i] + " ";
}
System.out.print(input + "\n");
System.out.print(backwards);
}
}

Manually rearranging Strings tends to become complicated in no time. It's usually better (if possible) to code what you want to do, not how you want to do it.
String input = "Hello my name is jason! Nice to meet you. What's your name?";
// this is *what* you want to do, part 1:
// split the input at each ' ', '.', '?' and '!', keep delimiter tokens
StringTokenizer st = new StringTokenizer(input, " .?!", true);
StringBuilder sb = new StringBuilder();
while(st.hasMoreTokens()) {
String token = st.nextToken();
// *what* you want to do, part 2:
// add each token to the start of the string
sb.insert(0, token);
}
String backwards = sb.toString();
System.out.print(input + "\n");
System.out.print(backwards);
Output:
Hello my name is jason! Nice to meet you. What's your name?
?name your What's .you meet to Nice !jason is name my Hello
This will be a lot easier to understand for the next person working on that piece of code, or your future self.
This assumes that you want to move every punctuation char. If you only want the one at the end of the input string, you'd have to cut it off the input, do the reordering, and finally place it at the start of the string:
String punctuation = "";
String input = "Hello my name is jason! Nice to meet you. What's your name?";
System.out.print(input + "\n");
if(input.substring(input.length() -1).matches("[.!?]")) {
punctuation = input.substring(input.length() -1);
input = input.substring(0, input.length() -1);
}
StringTokenizer st = new StringTokenizer(input, " ", true);
StringBuilder sb = new StringBuilder();
while(st.hasMoreTokens()) {
sb.insert(0, st.nextToken());
}
sb.insert(0, punctuation);
System.out.print(sb);
Output:
Hello my name is jason! Nice to meet you. What's your name?
?name your What's you. meet to Nice jason! is name my Hello

As in the other answers, you need to separate out the punctuation first, then reorder the words, and finally place the punctuation at the beginning.
You could take advantage of String.join() and Collections.reverse(), String.endsWith() for a simpler answer...
String input = "Hello my name is jason!";
String punctuation = "";
if (input.endsWith("?") || input.endsWith("!")) {
punctuation = input.substring(input.length() - 1, input.length());
input = input.substring(0, input.length() - 1);
}
List<String> words = Arrays.asList(input.split(" "));
Collections.reverse(words);
String reordered = punctuation + String.join(" ", words);
System.out.println(reordered);

The code below should work for you:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ReplaceSample {
public static void main(String[] args) {
String originalString = "TestStr?";
String updatedString = "";
String regex = "end\\p{Punct}+|\\p{Punct}+$";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(originalString);
while (matcher.find()) {
int start = matcher.start();
updatedString = matcher.group() + originalString.substring(0, start);
}
System.out.println("Original -->" + originalString + "\nReplaced -->" + updatedString);
}
}

You need to follow the below steps:
(1) Check for the ! character in the input
(2) If input contains ! then prefix it to the empty output string variable
(3) If input does not contain ! then create empty output string variable
(4) Split the input string and iterate in reverse order (you are already doing this)
You can refer to the code below:
public static void main(String[] args) {
Scanner userInput = new Scanner(System.in);
System.out.print("Enter a sentence: ");
String originalInput = userInput.nextLine();
String backwards = "";
String input = originalInput;
//Define your punctuation chars in an array
char[] punctuationChars = {'!', '?' , '.'};
//Remove the trailing punctuation mark from the input
for(int i=0;i<punctuationChars.length;i++) {
if(input.charAt(input.length()-1) == punctuationChars[i]) {
input = input.substring(0, input.length()-1);
backwards = punctuationChars[i]+"";
break;
}
}
String[] sentence= input.split(" ");
for (int i = sentence.length - 1; i >= 0; i--) {
backwards += sentence[i] + " ";
}
System.out.print(originalInput + "\n");
System.out.print(input + "\n");
System.out.print(backwards);
}

Don't split by spaces; split by word boundaries. Then you don't need to care about punctuation or even putting spaces back, because you just reverse them too!
And it's only 1 line:
Arrays.stream(input.split("\\b"))
.reduce((a, b) -> b + a)
.ifPresent(System.out::println);

How to separate words by spaces?

What I'm trying to do in this code is separate each word of a five-word input into the five words that it's made of. I managed to get the first word separated from the rest of the input using indexOf and substring, but I have problems separating the rest of the words. I am just wondering what I could do to fix this.
import java.util.Scanner;
public class CryptographyLab {
public static void main (String [] args) {
fiveWords();
}
public static void fiveWords () {
Scanner input = new Scanner(System.in);
for (int i = 1; i <= 3; i++) {
if (i > 1) {
String clear = input.nextLine();
// I was having some problems with the input buffer not clearing, and I know this is a funky way to clear it but my skills are pretty limited where I am right now
}
System.out.print("Enter five words: ");
String fW = input.nextLine();
System.out.println();
// What I'm trying to do here is separate a Scanner input into each word, by finding the index of the space.
int sF = fW.indexOf(" ");
String fS = fW.substring(0, sF);
System.out.println(fS);
int dF = fW.indexOf(" ");
String fD = fW.substring(sF, dF);
System.out.println(fD);
int gF = fW.indexOf(" ");
String fG = fW.substring(dF, gF);
//I stopped putting println commands here because it wasn't working.
int hF = fW.indexOf(" ");
String fH = fW.substring(gF, hF);
int jF = fW.indexOf(" ");
String fJ = fW.substring(hF, jF);
System.out.print("Enter five integers: ");
int fI = input.nextInt();
int f2 = input.nextInt();
int f3 = input.nextInt();
int f4 = input.nextInt();
int f5 = input.nextInt();
//this part is unimportant because I haven't worked out the rest yet
System.out.println();
}
}
}

The Scanner class has a next() method that returns the next "token" from the input. In this case, I think calling next() five times in succession should return your 5 words.
As Alex Yan points out in his answer, you can also use the split method on a string to split on some delimiter (in this case, a space).
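A short sketch of the next() approach. The helper name readWords is made up, and the Scanner here reads from a fixed string rather than System.in so the example can run on its own; swapping in new Scanner(System.in) should behave the same way for typed input.

```java
import java.util.Scanner;

class FiveWords { // hypothetical name, for illustration only

    // Reads exactly count whitespace-delimited tokens from the scanner.
    static String[] readWords(Scanner in, int count) {
        String[] words = new String[count];
        for (int i = 0; i < count; i++) {
            words[i] = in.next(); // throws NoSuchElementException if the input runs out
        }
        return words;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("this should be five words");
        for (String word : readWords(in, 5)) {
            System.out.println(word);
        }
    }
}
```

Because next() skips any amount of whitespace between tokens, this also copes with double spaces, which a plain split(" ") would turn into empty strings.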

You're extracting the strings incorrectly. But there is another, simpler solution, which I'll explain afterwards.
The problem with your approach is that you're not supplying the indices correctly.
After the first round of extraction:
fW = "this should be five words"
sF = fW.indexOf(" ") = 4
fS = fW.substring(0, sF) = "this"
This appears correct. But after the second round:
fW = "this should be five words". Nothing changed.
dF = fW.indexOf(" ") = 4. Same as above.
fD = fW.substring(sF, dF) = fW.substring(4, 4). You get an empty string.
We see that the problem is that indexOf() finds the first occurrence of the supplied substring, and substring() doesn't remove the portion it returns from the original string. If you want to keep doing it this way, you should cut off the word you just extracted.
space = input.indexOf(" ");
firstWord = input.substring(0, space);
input = input.substring(space).trim(); // sets input to "should be five words" so that indexOf() looks for the next space during the next round
A simpler solution is to just use String.split() to split the input into an array of substrings.
String[] words = fW.split(" ");
for (int i = 0; i < words.length; ++i)
System.out.println(words[i]);
If the input is "this should be five words", this should print:
this
should
be
five
words

java regex: capitalize words with certain number of characters

I am trying to capitalize the words in a string with more than 5 characters.
I was able to retrieve the number of words that are longer than 5 characters using .length, and I could exclude the words that are longer than 5 characters, but I couldn't capitalize them.
Ex. input: "i love eating pie"
Ex. output: "i love Eating pie"
Here's my code:
public static void main(String[] args) {
String sentence = "";
Scanner input = new Scanner(System.in);
System.out.println("Enter a sentence: ");
sentence = input.nextLine();
String[] myString = sentence.split("\\s\\w{6,}".toUpperCase());
for (String myStrings : myString) {
System.out.println(sentence);
System.out.println(myStrings);
}

String sentence = "";
StringBuilder sb = new StringBuilder(sentence.length());
Scanner input = new Scanner(System.in);
System.out.println("Enter a sentence: ");
sentence = input.nextLine();
/*
* \\s (match a whitespace character)
* (?<g1> (named group with name g1)
* \\w{6,}) (match a word of 6 or more characters) (end of g1)
* | (or)
* (?<g2> (named group with name g2)
* \\S+) (match any run of non-whitespace characters) (end of g2)
*/
Pattern pattern = Pattern.compile("\\s(?<g1>\\w{6,})|(?<g2>\\S+)");
Matcher matcher = pattern.matcher(sentence);
//check if the matcher found a match
while (matcher.find())
{
//get value from g1 group (null if not found)
String g1 = matcher.group("g1");
//get value from g2 group (null if not found)
String g2 = matcher.group("g2");
//if g1 is not null and is not an empty string
if (g1 != null && g1.length() > 0)
{
//get the first character of this word and uppercase it, then append it to the StringBuilder
sb.append(Character.toUpperCase(g1.charAt(0)));
//sanity check to stop us from getting IndexOutOfBoundsException
//check if g1 length is more than 1 and append the rest of the word to the StringBuilder
if(g1.length() > 1) sb.append(g1.substring(1, g1.length()));
//append a space
sb.append(" ");
}
//we only need to check if g2 is not null here
if (g2 != null)
{
//g2 did not qualify as a long word (it has 5 or fewer characters, or it is the first word), so just append it to the StringBuilder
sb.append(g2);
//append a space
sb.append(" ");
}
}
System.out.println("Original Sentence: " + sentence);
System.out.println("Modified Sentence: " + sb.toString());

Split the input sentence with a space as the delimiter and use the intiCap method if a word's length is greater than 5:
PS: System.out.print should be replaced with a StringBuilder.
String delim = " ";
String[] myString = sentence.split(delim);
for (int i = 0; i < myString.length; i++) {
if (i != 0) System.out.print(delim);
if (myString[i].length() > 5)
System.out.print(intiCap(myString[i]));
else
System.out.print(myString[i]);
}
private static String intiCap(String string) {
return Character.toUpperCase(string.charAt(0)) + string.substring(1);
}
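The StringBuilder variant mentioned in the PS might look like the sketch below. It keeps the same logic (split on a single space, capitalize words longer than 5 characters); the class and method names are made up for the example.

```java
class LongWordCapitalizer { // hypothetical name, for illustration only

    // Capitalizes the first letter of every word longer than 5 characters.
    static String capitalizeLongWords(String sentence) {
        String delim = " ";
        StringBuilder sb = new StringBuilder(sentence.length());
        String[] words = sentence.split(delim);
        for (int i = 0; i < words.length; i++) {
            if (i != 0) {
                sb.append(delim); // put the delimiter back between words
            }
            String word = words[i];
            if (word.length() > 5) {
                sb.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
            } else {
                sb.append(word);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeLongWords("i love eating pie")); // prints: i love Eating pie
    }
}
```

Returning the string instead of printing it piece by piece makes the result easier to reuse and avoids a trailing delimiter.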

You can use the following (short and sweet :P):
Pattern p = Pattern.compile("(?=\\b\\w{6,})([a-z])\\w+");
Matcher m = p.matcher(sentence);
StringBuffer s = new StringBuffer();
while (m.find()){
m.appendReplacement(s, m.group(1).toUpperCase() + m.group(0).substring(1));
}
m.appendTail(s); // copy the text after the last match, otherwise it is dropped
System.out.println(s.toString());

