Parsing a string that contains multiple symbols results in crash


I'm trying to split a string in Java that contains ",", ":" and "-".
For instance, if the input is 58,1:2-4, it should produce the following output:
Booknumber: 58
Chapter Number: 1
Verses = [2,3,4] (since 2-4 means the values from 2 to 4)
Following is the code that I have tried:
private int getBookNumber() {
bookNumber = chapterNumber.split("[,]")[0];
return Integer.valueOf(bookNumber);
}
private int getChapterNumber() {
chapterNumber = sample.split("[:]")[0];
verseNumbers = sample.split("[:]")[1];
return Integer.valueOf(chapterNumber);
}
private List<Integer> getVerseNumbers(String bookValue) {
List<Integer> verseNumList = new ArrayList<>();
if (bookValue.contains("-")) {
//TODO parse - separated string
} else {
verseNumList.add(Integer.valueOf(bookValue));
}
return verseNumList;
}
I would invoke them in the following manner sequentially
int chapterNumber = getChapterNumber();
int bookNumber = getBookNumber();
List<Integer> verseNumbers = getVerseNumbers(this.verseNumbers);
But I'm getting Caused by: java.lang.NumberFormatException: Invalid int: "58 , 1 " on the line int chapterNumber = getChapterNumber();
Is there an efficient way to parse this string?

You should change getChapterNumber like this:
private int getChapterNumber() {
chapterNumber = sample.split("[:]")[0];
verseNumbers = sample.split("[:]")[1];
return Integer.valueOf(chapterNumber.split("[,]")[1]);
}
But the best option would be to use a Matcher:
String line = "58,1:2-4";
Pattern pattern = Pattern.compile("(\\d+),(\\d+):(.*)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println("group 1: " + matcher.group(1));
System.out.println("group 2: " + matcher.group(2));
System.out.println("group 3: " + matcher.group(3));
}
Output:
group 1: 58
group 2: 1
group 3: 2-4

I might approach this using base string methods to avoid the heavy equipment which comes with a regex matcher:
String input = "58,1:2-4";
int commaIndex = input.indexOf(",");
int colonIndex = input.indexOf(":");
int bookNumber = Integer.valueOf(input.substring(0, commaIndex));
int chapterNumber = Integer.valueOf(input.substring(commaIndex+1, colonIndex));
String verseString = input.substring(colonIndex+1);
String[] verses = verseString.split("-");
int startVerse = Integer.valueOf(verses[0]);
int endVerse = Integer.valueOf(verses[1]);
int[] allVerses = new int[endVerse - startVerse + 1];
for (int i=0; i < allVerses.length; ++i) {
allVerses[i] = startVerse + i;
}
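The answers above stop short of the "-" range that the question left as a TODO. A minimal sketch that puts the pieces together is below. The class name VerseReferenceSketch and its fields are made up for illustration, and it assumes the input always has the shape book,chapter:verse or book,chapter:first-last, possibly with stray spaces as in the error message from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder class, not part of the original question.
class VerseReferenceSketch {
    final int book;
    final int chapter;
    final List<Integer> verses;

    private VerseReferenceSketch(int book, int chapter, List<Integer> verses) {
        this.book = book;
        this.chapter = chapter;
        this.verses = verses;
    }

    // Assumed input shape: "book,chapter:verse" or "book,chapter:first-last".
    static VerseReferenceSketch parse(String input) {
        int comma = input.indexOf(',');
        int colon = input.indexOf(':', comma + 1);
        if (comma < 0 || colon < 0) {
            throw new IllegalArgumentException("Expected book,chapter:verses but got: " + input);
        }
        // trim() tolerates inputs such as "58 , 1 : 2-4"
        int book = Integer.parseInt(input.substring(0, comma).trim());
        int chapter = Integer.parseInt(input.substring(comma + 1, colon).trim());
        String versePart = input.substring(colon + 1).trim();

        List<Integer> verses = new ArrayList<>();
        int dash = versePart.indexOf('-');
        if (dash < 0) {
            // single verse, e.g. "7"
            verses.add(Integer.parseInt(versePart));
        } else {
            // inclusive range, e.g. "2-4" becomes 2, 3, 4
            int first = Integer.parseInt(versePart.substring(0, dash).trim());
            int last = Integer.parseInt(versePart.substring(dash + 1).trim());
            for (int v = first; v <= last; v++) {
                verses.add(v);
            }
        }
        return new VerseReferenceSketch(book, chapter, verses);
    }
}
```

With this, VerseReferenceSketch.parse("58,1:2-4") yields book 58, chapter 1 and verses [2, 3, 4]. Malformed numbers still surface as NumberFormatException, which seems reasonable for input that does not follow the assumed shape.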

Related

How to use a regular expression to print repeating characters only once and non repeating characters in the same order as they appear in a String?

I'm writing a function to print the decimal representation of a rational number (given as a numerator and denominator), with the repeating digits printed inside parentheses and the non-repeating part of the decimal left unchanged.
For example:
1) 2/3 = 0.(6)
2) 2/4 = 0.5(0)
3) 22/7 = 3.(142857)
For this I tried using a regular expression to capture the repeating digits of the decimal part, but my regular expression returns each repeating group once together with the non-repeating characters.
Here is my code. Can someone help me with this?
div = ((double) num)/deno;
String str = String.valueOf(div);
String arr[] = str.split("\\.");
String wp = arr[0];
String dp = arr[1];
String repeated = dp.replaceAll("(.+?)\\1+", "$1");
System.out.println("repeated is " + repeated);
System.out.println(wp + "." + "(" + repeated + ")");
output I'm getting is:-
Input given 22/7
Integer part: 3
Decimal part: 142857142857143
repeating characters captured by regular expression- 142857143
final output-3.(142857143)

When you replace the repeating part, the trailing 143 is not replaced with an empty string, so it remains in the output. (The last digit is 3 rather than 2 because the double value is rounded.)
You can use the Pattern class with the regex (\d+)+\1, like this:
public class Test
{
public static void main(String[] args) throws Exception
{
double[] nums = {2.0/3, 2.0/4, 22.0/7};
for(double d : nums)
print(d);
}
static void print(double div) {
String str = String.valueOf(div);
String arr[] = str.split("\\.");
String wp = arr[0];
String dp = arr[1];
String repeated = dp;
Pattern ptrn = Pattern.compile("(\\d+)+\\1");
Matcher m = ptrn.matcher(dp);
if(m.find()) {
repeated = m.group(1);
System.out.println(str + " -> "+ wp + "." + "(" + repeated + ")");
} else {
System.out.println(str + " -> "+ wp + "." + dp +"(0)");
}
}
}
Output:
0.6666666666666666 -> 0.(6)
0.5 -> 0.5(0)
3.142857142857143 -> 3.(142857)

Your regex is pretty close to working.
Alternative:
Matcher matcher = Pattern.compile("(.+?)\\1").matcher(decimalPart);
String repeated = matcher.find() ? matcher.group(1) : "0";
See alternative in context:
public static void main(String[] args) {
List<String> divisions = Arrays.asList("2/3", "2/4", "22/7");
List<String> quotientsAsString = getQuotientsAsString(divisions);
List<String> repeatedResult = getRepeatedResult(quotientsAsString);
printResult(divisions, quotientsAsString, repeatedResult);
}
private static void printResult(List<String> divisions, List<String> quotientsAsString,
List<String> repeatedResult) {
for (int i = 0; i < divisions.size(); i++) {
System.out.printf("%d) %s = %s => %s%n", (i + 1), divisions.get(i)
, quotientsAsString.get(i), repeatedResult.get(i));
}
}
private static List<String> getRepeatedResult(List<String> quotientsAsString) {
//Pre-compile regex before enter loop
Pattern dotSignPattern = Pattern.compile("\\.");
Pattern repeatedDecimalPattern = Pattern.compile("(.+?)\\1");
List<String> repeatedResult = new ArrayList<>();
for (String quotient : quotientsAsString) {
String[] quotientParts = dotSignPattern.split(quotient);
String integerPart = quotientParts[0];
String decimalPart = quotientParts[1];
// Pattern in context!!!
Matcher matcher = repeatedDecimalPattern.matcher(decimalPart);
String repeated = matcher.find() ? matcher.group(1) : "0";
String resultRepeated = String.format("%s.(%s)", integerPart, repeated);
String resultZeroRepeated = String.format("%s.%s(%s)", integerPart, decimalPart, repeated);
String result = repeated.equals("0") ? resultZeroRepeated : resultRepeated;
repeatedResult.add(result);
}
return repeatedResult;
}
private static List<String> getQuotientsAsString(List<String> divisions) {
//Pre-compile regex before enter loop
Pattern divSignPattern = Pattern.compile("/");
List<String> quotientsAsString = new ArrayList<>();
for (String div : divisions) {
String[] divParts = divSignPattern.split(div);
Double dividend = Double.valueOf(divParts[0]);
Double divisor = Double.valueOf(divParts[1]);
Double quotient = dividend / divisor;
quotientsAsString.add(String.valueOf(quotient));
}
return quotientsAsString;
}
Output:
1) 2/3 = 0.6666666666666666 => 0.(6)
2) 2/4 = 0.5 => 0.5(0)
3) 22/7 = 3.142857142857143 => 3.(142857)
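Both regex answers work on the string form of a double, so they only see about 16 digits and are affected by rounding (the trailing 143 above). A different technique, not the one used in the answers, is schoolbook long division on the integers: the digits start repeating as soon as a remainder shows up a second time. Here is a rough sketch of that idea. The class and method names are invented for illustration, and it assumes a non-negative numerator and a positive denominator small enough that remainder * 10 does not overflow a long.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not taken from the answers above.
class RepeatingDecimalSketch {
    // Assumes numerator >= 0 and denominator > 0.
    static String toDecimal(long numerator, long denominator) {
        StringBuilder out = new StringBuilder();
        out.append(numerator / denominator).append('.');

        long remainder = numerator % denominator;
        StringBuilder digits = new StringBuilder();
        // remainder -> index in 'digits' where that remainder was first seen
        Map<Long, Integer> seenAt = new HashMap<>();

        while (remainder != 0 && !seenAt.containsKey(remainder)) {
            seenAt.put(remainder, digits.length());
            remainder *= 10;
            digits.append(remainder / denominator);
            remainder %= denominator;
        }

        if (remainder == 0) {
            // terminating decimal; follow the question's convention of a trailing (0)
            return out.append(digits).append("(0)").toString();
        }
        int cycleStart = seenAt.get(remainder);
        return out.append(digits, 0, cycleStart)
                  .append('(')
                  .append(digits, cycleStart, digits.length())
                  .append(')')
                  .toString();
    }
}
```

For the three inputs from the question this gives 0.(6), 0.5(0) and 3.(142857). It also handles a non-repeating prefix, for example 1/6 comes out as 0.1(6), which the regex on the double string would not reliably produce.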

Regex/Java Pattern : Finding occurrences of a sub string in a string with fault tolerance of 1 or more characters

How to find occurrences of a sub string in a string with fault tolerance of 1 or more characters?
Example.
Source : John Smith
With Fault tolerance 1 Character:
Sub String 1: Jahn should result to 1
Sub String 2: Jonn should result to 1
Sub String 3: Johm should result to 1
Sub String 4: johm should result to 1 //ignore case
With Fault tolerance 2 Character:
Sub String 1: Jann should result to 1
Sub String 2: Joom should result to 1
and so on...
Is there any regex solution?
Or Java pattern matching? In this case, a method like this:
int countOccurrenceWithFaultTolerance(String source, String subString, int faultTolerance) {
// TODO
return 0;
}

Maybe I found this:
public class HelloWorld{
public static void main(String []args){
final int mismatchTolerance = 1;
final String text = "bubbles";
final String pattern = "bu";
final int textIndexMax = text.length() - pattern.length() + 1;
for (int textIndex = 0; textIndex < textIndexMax; textIndex++) {
int missed = 0;
for (int patternIndex = 0; patternIndex < pattern.length(); patternIndex++) {
final char textChar = text.charAt(textIndex + patternIndex);
final char patternChar = pattern.charAt(patternIndex);
if (textChar != patternChar) {
missed++;
}
if (missed > mismatchTolerance) {
break;
}
}
if (missed <= mismatchTolerance) {
final String match = text.substring(textIndex, textIndex + pattern.length());
System.out.println("Index: " + textIndex + " Match: " + match);
}
}
}
}
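The snippet above prints the matches for a hard-coded text. Wrapped into the method signature from the question, the same sliding comparison could look roughly like the sketch below. It assumes that "fault tolerance" means substituted characters only (no inserted or deleted characters) and that case is ignored, as in the johm example. The class name is made up.

```java
// Hypothetical wrapper class; the method signature is the one from the question.
class FuzzyCountSketch {
    static int countOccurrenceWithFaultTolerance(String source, String subString, int faultTolerance) {
        // ignore case, as requested in the question
        String text = source.toLowerCase();
        String pattern = subString.toLowerCase();

        int count = 0;
        // slide a window of pattern.length() characters over the text
        for (int start = 0; start + pattern.length() <= text.length(); start++) {
            int missed = 0;
            // stop comparing this window as soon as the tolerance is exceeded
            for (int k = 0; k < pattern.length() && missed <= faultTolerance; k++) {
                if (text.charAt(start + k) != pattern.charAt(k)) {
                    missed++;
                }
            }
            if (missed <= faultTolerance) {
                count++;
            }
        }
        return count;
    }
}
```

For example, countOccurrenceWithFaultTolerance("John Smith", "johm", 1) returns 1, and so does the call with "Jann" and a tolerance of 2. Overlapping windows are counted separately, so a short pattern with a generous tolerance can match many times.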

fast way to get an element from a string in java

I am trying to get an element from my string, which I obtained through getSelectedValue().toString() on a JList.
It returns [1] testString
What I am trying to do is get only the 1 from the string. Is there a way to get only that element from the string, or to remove everything else from it?
I have tried:
String longstring = Customer_list.getSelectedValue().toString();
int index = shortstring.indexOf(']');
String firstPart = myStr.substring(0, index);

You have many ways to do it, for example
Regex
String#replaceAll
String#substring
See below code to use all methods.
import java.util.*;
import java.lang.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Test {
public static void main(String args[]) {
String[] data = { "[1] test", " [2] [3] text ", " just some text " };
for (String s : data) {
String r0 = null;
Matcher matcher = Pattern.compile("\\[(.*?)\\]").matcher(s);
if (matcher.find()) {
r0 = matcher.group(1);
}
System.out.print(r0 + " ");
}
System.out.println();
for (String s : data) {
String r1 = null;
r1 = s.replaceAll(".*\\[|\\].*", "");
System.out.print(r1 + " ");
}
System.out.println();
for (String s : data) {
String r2 = null;
int i = s.indexOf("[");
int j = s.indexOf("]");
if (i != -1 && j != -1) {
r2 = s.substring(i + 1, j);
}
System.out.print(r2 + " ");
}
System.out.println();
}
}
However results may vary, for example String#replaceAll will give you wrong results when input is not what you expecting.
1 2 null
1 3 just some text
1 2 null

What worked best for me is String#replace(CharSequence, CharSequence) combined with String#substring(int, int).
I did it as follows:
String longstring = Customer_list.getSelectedValue().toString();
String shortstring = longstring.substring(0, longstring.indexOf("]") + 1);
String shorta = shortstring.replace("[", "");
String shortb = shorta.replace("]", "");
The string is shortened first, and then the [ and ] are removed in two steps.
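If the value inside the brackets is always a number, it may be handier to get it as an int straight away. A small sketch under that assumption follows (the class and method names are invented); it only accepts digits in brackets at the start of the string and reports the "no match" case through OptionalInt instead of null.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes the list entries look like "[1] testString".
class BracketIdSketch {
    // optional leading whitespace, then digits inside square brackets
    private static final Pattern LEADING_ID = Pattern.compile("^\\s*\\[(\\d+)\\]");

    static OptionalInt leadingId(String value) {
        Matcher m = LEADING_ID.matcher(value);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }
}
```

Usage would be something like BracketIdSketch.leadingId(longstring).orElse(-1), where -1 is just a placeholder default.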

How can I parse the content of file to variables?

I have text file which looks like this :
ABC=-1 Temp=2 Try=34 Message="some text" SYS=3
ABC=-1 Temp=5 Try=40 Message="some more and different text" SYS=6
and the pattern continues but only the numeric values and text inside the " " is changed.
NOTE: the Message= could have multiple quotes as well.
I want to store the value of ABC,Temp,Try and SYS to int variables
And Message to a String variable.
I am currently using:
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
int count = line.indexOf("ABC=");
if (count >= 0) {
int clear = line.charAt(count + 3);
}
}
scanner.close();
I thought of using the Scanner class and read line by line, but I am confused about how can I classify the line in different variables?

First make a class that represents the data:
public static class MyData { // please pick a better name
final int abc;
final int temp;
final int tryNumber; // try is a keyword
final String message;
final int sys;
public MyData(int abc, int temp, int tryNumber, String message, int sys) {
this.abc = abc;
this.temp = temp;
this.tryNumber = tryNumber;
this.message = message;
this.sys = sys;
}
}
Then make a method that transforms a String into this class using Regex capture groups:
private static Pattern p =
Pattern.compile("ABC=([^ ]+) Temp=([^ ]+) Try=([^ ]+) Message=\"(.+)\" SYS=([^ ]+)");
private static MyData makeData(String input) {
int abc = 0, temp = 0, tryNumber = 0, sys = 0;
String message = "";
Matcher m = p.matcher(input);
if (!m.find()) return null;
abc = Integer.parseInt(m.group(1));
temp = Integer.parseInt(m.group(2));
tryNumber = Integer.parseInt(m.group(3));
message = m.group(4);
sys = Integer.parseInt(m.group(5));
return new MyData(abc, temp, tryNumber, message, sys);
}
Then read the file using a scanner:
public static void main (String... args) throws Exception {
File file = new File("/path/to/your/file.txt");
List<MyData> dataList = new ArrayList<>();
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
MyData data = makeData(line);
if(data != null) dataList.add(data);
}
scanner.close();
}

You can use regex for this kind of parsing with a pattern of:
"ABC=([+-]?\\d+) Temp=([+-]?\\d+) Try=([+-]?\\d+) Message=\"(.+)\" SYS=([+-]?\\d+)"
Pattern Breakdown (Pattern Reference):
ABC= - literal string
([+-]?\\d+) - captures a positive or negative number in capture group 1
Temp= - literal string
([+-]?\\d+) - captures a positive or negative number in capture group 2
Try= - literal string
([+-]?\\d+) - captures a positive or negative number in capture group 3
Message= - literal string
\"(.+)\" captures a string in between double quotes in capture group 4
SYS= - literal string
([+-]?\\d+) - captures a positive or negative number in capture group 5
If the String matches the pattern you can extract your values like this:
public static void main(String[] args) throws Exception {
List<String> data = new ArrayList<String>() {{
add("ABC=-1 Temp=2 Try=34 Message=\"some text\" SYS=3");
add("ABC=-1 Temp=5 Try=40 Message=\"some more \"and\" different text\" SYS=6");
}};
String pattern = "ABC=([+-]?\\d+) Temp=([+-]?\\d+) Try=([+-]?\\d+) Message=\"(.+)\" SYS=([+-]?\\d+)";
int abc = 0;
int temp = 0;
int tryNum = 0;
String message = "";
int sys = 0;
for (String d : data) {
Matcher matcher = Pattern.compile(pattern).matcher(d);
if (matcher.matches()) {
abc = Integer.parseInt(matcher.group(1));
temp = Integer.parseInt(matcher.group(2));
tryNum = Integer.parseInt(matcher.group(3));
message = matcher.group(4);
sys = Integer.parseInt(matcher.group(5));
System.out.printf("%d %d %d %s %d%n", abc, temp, tryNum, message, sys);
}
}
}
Results:
-1 2 34 some text 3
-1 5 40 some more "and" different text 6

If you are already using the indexOf approach, the following code will work
String a = "ABC=-1 Temp=2 Try=34 Message=\"some text\" SYS=3";
int abc_index = a.indexOf("ABC");
int temp_index = a.indexOf("Temp");
int try_index = a.indexOf("Try");
int message_index = a.indexOf("Message");
int sys_index = a.indexOf("SYS");
int length = a.length();
int abc = Integer.parseInt(a.substring(abc_index + 4, temp_index - 1));
int temp = Integer.parseInt(a.substring(temp_index + 5, try_index - 1));
int try_ = Integer.parseInt(a.substring(try_index + 4, message_index - 1));
String message = a.substring(message_index + 9, sys_index - 2);
int sys = Integer.parseInt(a.substring(sys_index + 4, length));
System.out.println("abc : " + abc);
System.out.println("temp : " + temp);
System.out.println("try : " + try_);
System.out.println("message : " + message);
System.out.println("sys : " + sys);
This will give you the following
abc : -1
temp : 2
try : 34
message : some text
sys : 3
This will work only if the string data you get has this exact syntax, i.e. it contains ABC, Temp, Try, Message, and SYS in this order. Hope this helps.

Find Shortest Part of Sentence containing given words

Example:
Given the sentence:
My name is not eugene. my pet name is not eugene.
we have to find the smallest part of the sentence that contains the given words
my and eugene
The answer will be
eugene. my.
There is no need to consider uppercase or lowercase, special characters, or numerics.
I have pasted my code, but it gives the wrong answer for some test cases.
Does anyone have any idea what the problem with the code is? I don't have the test cases for which it fails.
import java.io.*;
import java.util.*;
public class ShortestSegment
{
static String[] pas;
static String[] words;
static int k,st,en,fst,fen,match,d;
static boolean found=false;
static int[] loc;
static boolean[] matches ;
public static void main(String s[]) throws IOException
{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
pas = in.readLine().replaceAll("[^A-Za-z ]", "").split(" ");
k = Integer.parseInt(in.readLine());
words = new String[k];
matches = new boolean[k];
loc = new int[k];
for(int i=0;i<k;i++)
{
words[i] = in.readLine();
}
en = fen = pas.length;
find(0);
if(found==false)
System.out.println("NO SUBSEGMENT FOUND");
else
{
for(int j=fst;j<=fen;j++)
System.out.print(pas[j]+" ");
}
}
private static void find(int min)
{
if(min==pas.length)
return;
for(int i=0;i<k;i++)
{
if(pas[min].equalsIgnoreCase(words[i]))
{
if(matches[i]==false)
{
loc[i]=min;
matches[i] =true;
match++;
}
else
{
loc[i]=min;
}
if(match==k)
{
en=min;
st = min();
found=true;
if((fen-fst)>(en-st))
{
fen=en;
fst=st;
}
match--;
matches[getIdx()]=false;
}
}
}
find(min+1);
}
private static int getIdx()
{
for(int i=0;i<k;i++)
{
if(words[i].equalsIgnoreCase(pas[st]))
return i;
}
return -1;
}
private static int min()
{
int min=loc[0];
for(int i=1;i<loc.length;i++)
if(min>loc[i])
min=loc[i];
return min;
}
}

The code you've given will produce incorrect output for the following input. I'm assuming the word length also matters when you want to 'Find Shortest Part of Sentence containing given words'.
String: 'My firstname is eugene. My fn is eugene.'
Number of search strings: 2
string1: 'my'
string2: 'is'
Your solution is: 'My firstname is'
The correct answer is: 'My fn is'
The problem in your code is that it treats 'firstname' and 'fn' as the same length. In the comparison (fen-fst)>(en-st) you only check whether the number of words has decreased, not whether the total length of the words has decreased.
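To make the difference concrete, here is a rough sketch that compares candidate segments by character count instead of word count. It is not the asker's algorithm: it walks the words once, remembers where each search word was last seen, and measures the segment from the earliest of those positions to the current word. The class and method names are hypothetical, and it assumes the same preprocessing as the question (non-letters stripped, case ignored).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; returns null when no segment contains every target word.
class ShortestSegmentSketch {
    static String shortestSegment(String sentence, List<String> targets) {
        // same cleanup as in the question: keep only letters and blanks
        String[] words = sentence.replaceAll("[^A-Za-z ]", "").trim().split("\\s+");

        // prefix[i] = number of letters in words[0..i-1]
        int[] prefix = new int[words.length + 1];
        for (int i = 0; i < words.length; i++) {
            prefix[i + 1] = prefix[i] + words[i].length();
        }

        Set<String> wanted = new HashSet<>();
        for (String t : targets) {
            wanted.add(t.toLowerCase());
        }

        Map<String, Integer> lastSeen = new HashMap<>(); // target -> latest word index
        int bestStart = -1, bestEnd = -1, bestLen = Integer.MAX_VALUE;

        for (int end = 0; end < words.length; end++) {
            String w = words[end].toLowerCase();
            if (!wanted.contains(w)) continue;
            lastSeen.put(w, end);
            if (lastSeen.size() < wanted.size()) continue; // not all targets seen yet

            int start = Collections.min(lastSeen.values());
            // letters in the segment plus one blank between neighbouring words
            int len = prefix[end + 1] - prefix[start] + (end - start);
            if (len < bestLen) {
                bestLen = len;
                bestStart = start;
                bestEnd = end;
            }
        }
        if (bestStart < 0) return null;
        return String.join(" ", Arrays.copyOfRange(words, bestStart, bestEnd + 1));
    }
}
```

For 'My firstname is eugene. My fn is eugene.' with the words my and is, this picks 'My fn is' rather than 'My firstname is'. Finding the minimum over lastSeen on every hit costs time proportional to the number of search words, which should be fine for small word lists.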

The following code (JUnit):
@Test
public void testIt() {
final String s = "My name is not eugene. my pet name is not eugene.";
final String tmp = s.toLowerCase().replaceAll("[^a-zA-Z]", " ");//here we need the placeholder (blank)
final String w1 = "my "; // leave a blank at the end to avoid those words e.g. "myself", "myth"..
final String w2 = "eugene ";//same as above
final List<Integer> l1 = getList(tmp, w1); //indexes list
final List<Integer> l2 = getList(tmp, w2);
int min = Integer.MAX_VALUE;
final int[] idx = new int[] { 0, 0 };
//loop to find out the result
for (final int i : l1) {
for (final int j : l2) {
if (Math.abs(j - i) < min) {
min = Math.abs(j - i);
idx[0] = Math.min(i, j);
// end = start of the later word + its length, minus the trailing blank
idx[1] = j > i ? j + w2.length() - 1 : i + w1.length() - 1;
}
}
}
System.out.println("indexes: " + Arrays.toString(idx));
System.out.println("result: " + s.substring(idx[0], idx[1]));
}
private List<Integer> getList(final String input, final String search) {
final List<Integer> list = new ArrayList<Integer>();
// indexOf with a start offset returns absolute positions directly
int from = input.indexOf(search);
while (from >= 0) {
list.add(from);
from = input.indexOf(search, from + search.length());
}
return list;
}
This gives the output:
indexes: [15, 25]
result: eugene. my
I think the code with its inline comments is pretty easy to understand. Basically, it plays with index + word length.
Note:
the "Not Found" case is not implemented.
the code just shows the idea and can be optimized, e.g. at least one abs() call could be saved.
Hope it helps.

I think it can be handled in another way: first find a matching result, narrow the bounds to that result, and then search for a shorter match inside it. It can be coded as follows:
/**This method intends to check the shortest interval between two words
* @param s : the string to be processed
* @param first : one of the words
* @param second : one of the words
*/
public static void getShortestInterval(String s , String first , String second)
{
String situationOne = first + "(.*?)" + second;
String situationTwo = second + "(.*?)" + first;
Pattern patternOne = Pattern.compile(situationOne,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
Pattern patternTwo = Pattern.compile(situationTwo,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
List<Integer> result = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
/**first , test the first choice*/
Matcher matcherOne = patternOne.matcher(s);
findTheMax(first.length(),matcherOne, result);
/**then , test the second choice*/
Matcher matcherTwo = patternTwo.matcher(s);
findTheMax(second.length(),matcherTwo,result);
if(result.get(0)!=Integer.MAX_VALUE)
{
System.out.println("The shortest length is " + result.get(0));
System.out.println("Which start # " + result.get(1));
System.out.println("And end # " + result.get(2));
}else
System.out.println("No matching result is found!");
}
private static void findTheMax(int headLength , Matcher matcher , List<Integer> result)
{
int length = result.get(0);
int startIndex = result.get(1);
int endIndex = result.get(2);
while(matcher.find())
{
int temp = matcher.group(1).length();
int start = matcher.start();
List<Integer> minimize = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
System.out.println(matcher.group().substring(headLength));
findTheMax(headLength, matcher.pattern().matcher(matcher.group().substring(headLength)), minimize);
if(minimize.get(0) != Integer.MAX_VALUE)
{
start = start + minimize.get(1) + headLength;
temp = minimize.get(0);
}
if(temp<length)
{
length = temp;
startIndex = start;
endIndex = matcher.end();
}
}
result.set(0, length);
result.set(1, startIndex);
result.set(2, endIndex);
}
Note that this can handle two situations , regardless of the sequence of the two words!

You can use the Knuth-Morris-Pratt algorithm to find the indexes of all occurrences of every given word in your text. Imagine you have a text of length N and M words (w1 ... wM). Using the KMP algorithm you can build an array:
occur = int[N];
occur[i] = 1, if w1 starts at position i
...
occur[i] = M, if wM starts at position i
occur[i] = 0, if no word from w1...wM starts at position i
You loop through this array and, from every non-zero position, search forward for the other M-1 words.
This is approximate pseudocode, just to convey the idea. It definitely won't work if you simply transcribe it into Java:
for i=0 to N-1 {
if occur[i] != 0 {
for j = i + w[occur[i] - 1].length - 1 { // searching forward
if occur[j] != 0 and !foundWords.contains(occur[j]) {
foundWords.add(occur[j]);
lastWordInd = j;
if foundWords.containAllWords() break;
}
foundTextPeaceLen = j + w[occur[lastWordInd]].length - i;
if foundTextPeaceLen < minTextPeaceLen {
minTextPeaceLen = foundTextPeaceLen;
// also remember start and end indexes of text peace
}
}
}
}
