Parsing a string with letters and numbers


Running this method returns a string containing the total post count:
String numberOfPosts = test.runNewAdvancedSearch(query, waitTime, startDate, endDate, selectedBrowser, data1, "");
int numberOfPostsInt = Integer.parseInt(numberOfPosts.replace(",", ""));
This parseInt call does not work: it throws a NumberFormatException.
How do I parse the number out of the string below?
"Total Posts: 5,203"

Try this fully functional example:
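public class Temp {
    public static void main(String[] args) {
        String s = "Total Posts: 5,203";
        // Remove every run of non-digit characters, leaving "5203"
        s = s.replaceAll("[^0-9]+", "");
        int count = Integer.parseInt(s);
        System.out.println(count); // 5203
    }
}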
Meaning of the regex pattern [^0-9]+: it matches one or more (+) consecutive characters that do NOT (^) belong to the character class [ ] of digits 0 to 9, and replaceAll removes each such run.
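One caveat: [^0-9]+ removes every non-digit, so any other digits in the text are concatenated into the result (for "Top 10 Posts: 5,203" it would yield "105203"). If only the trailing count is wanted, a capturing group anchored at the end of the string is stricter. A minimal sketch; the class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingNumber {
    // A digit followed by digits or commas, then optional whitespace, at the end of the input.
    private static final Pattern TRAILING_NUMBER = Pattern.compile("(\\d[\\d,]*)\\s*$");

    static int parseTrailingCount(String text) {
        Matcher m = TRAILING_NUMBER.matcher(text);
        if (!m.find()) {
            throw new NumberFormatException("No trailing number in: " + text);
        }
        // Drop the grouping commas before parsing.
        return Integer.parseInt(m.group(1).replace(",", ""));
    }

    public static void main(String[] args) {
        System.out.println(parseTrailingCount("Total Posts: 5,203"));  // 5203
        System.out.println(parseTrailingCount("Top 10 Posts: 5,203")); // 5203
    }
}
```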

If the comma is a decimal separator (here s is "Total Posts: 5,203"):
double d = Double.parseDouble(s.substring(s.lastIndexOf(' ') + 1).replace(",", "."));
If the comma is a grouping separator:
long n = Long.parseLong(s.substring(s.lastIndexOf(' ') + 1).replace(",", ""));
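Instead of choosing the replacement by hand, java.text.NumberFormat can apply a locale's separator rules. A minimal sketch, assuming the number is the last space-separated token; Locale.US treats the comma as a grouping separator and Locale.GERMANY treats it as the decimal separator. The class and method names are illustrative:

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class LocaleNumberParse {
    // Parses the token after the last space using the separators of the given locale.
    static Number parseLastToken(String s, Locale locale) {
        String token = s.substring(s.lastIndexOf(' ') + 1);
        try {
            return NumberFormat.getNumberInstance(locale).parse(token);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + token, e);
        }
    }

    public static void main(String[] args) {
        String s = "Total Posts: 5,203";
        // Locale.US: ',' groups thousands.
        System.out.println(parseLastToken(s, Locale.US).longValue());        // 5203
        // Locale.GERMANY: ',' separates the decimals.
        System.out.println(parseLastToken(s, Locale.GERMANY).doubleValue()); // 5.203
    }
}
```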

Try this:
String numberOfPost = test.runNewAdvancedSearch(query, waitTime, startDate, endDate, selectedBrowser, data1, ""); // returns the string "Total Posts: 5,203"
int index = numberOfPost.lastIndexOf(":");
String number = numberOfPost.substring(index + 1);
int numOfPost = Integer.parseInt(number.replace(",", "").trim());
System.out.println(numOfPost); // 5203

Related

Replace part of substring with specific characters based on delimiter

String s = "abc//jason:1234567#123.123.213:212/";
I want to replace the substrings immediately before and after the first ":" delimiter (between "//" and "#") with dots, one dot per character.
I want my final output to be:
"abc//.....:.......#123.123.213:212/"
I tried the following, but since there is a second : in the string, the result gets messed up. Is there a better way to get my output?
String [] headersplit;
headersplit = s.split(":");

If you only want to mask the characters between "//" and "#", the algorithm is simple, provided that these delimiters are always present (String.repeat requires Java 11 or later):
public class Main {
    public static void main(String[] args) {
        String s = "abc//jason:1234567#123.123.213:212/";
        System.out.println(replaceSensitiveInfo(s)); // abc//.....:.......#123.123.213:212/
    }
    static String replaceSensitiveInfo(String src) {
        int slashes = src.indexOf("//");
        int colon = src.indexOf(":", slashes); // first ':' after "//"
        int at = src.indexOf("#", colon);      // first '#' after that ':'
        StringBuilder sb = new StringBuilder(src);
        // Same-length replacements, so the indices stay valid
        sb.replace(slashes + 2, colon, ".".repeat(colon - slashes - 2));
        sb.replace(colon + 1, at, ".".repeat(at - colon - 1));
        return sb.toString();
    }
}

This is not the best way, but it works for your example and should work for others:
String s = "abc//jason:1234567#123.123.213:212/";
String result = replaceSensitiveInfo(s);
private String replaceSensitiveInfo(String info){
StringBuilder sb = new StringBuilder(info);
String substitute = ".";
int start = sb.indexOf("//") + 2;
int end = sb.indexOf(":");
String firstReplace = substitute.repeat(end - start);
sb.replace(start, end, firstReplace);
int start2 = sb.indexOf(":") + 1;
int end2 = sb.indexOf("#");
String secondReplace = substitute.repeat(end2 - start2);
sb.replace(start2, end2, secondReplace);
return sb.toString();
}
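Both answers assume the delimiters exist; if "//", ":" or "#" is missing, the index arithmetic can throw an exception. A regex with two capturing groups locates both spans at once and leaves the string unchanged when the pattern is absent. A minimal sketch (Java 11+ for String.repeat); mask is an illustrative name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaskCredentials {
    // Group 1: text between "//" and the first ':'; group 2: text between that ':' and '#'.
    private static final Pattern CREDENTIALS = Pattern.compile("//([^:#]*):([^#]*)#");

    static String mask(String src) {
        Matcher m = CREDENTIALS.matcher(src);
        if (!m.find()) {
            return src; // delimiters missing: leave the input unchanged
        }
        StringBuilder sb = new StringBuilder(src);
        // Same-length replacements, so the group offsets stay valid after the first one.
        sb.replace(m.start(1), m.end(1), ".".repeat(m.end(1) - m.start(1)));
        sb.replace(m.start(2), m.end(2), ".".repeat(m.end(2) - m.start(2)));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("abc//jason:1234567#123.123.213:212/"));
        // abc//.....:.......#123.123.213:212/
    }
}
```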

How to use a regular expression to print repeating characters only once and non repeating characters in the same order as they appear in a String?

I'm writing a function to print the decimal representation of a rational number (given as a numerator and a denominator), printing the repeating part of the digits inside parentheses and leaving the non-repeating decimal part as it is.
For example:
1) 2/3 = 0.(6)
2) 2/4 = 0.5(0)
3) 22/7 = 3.(142857)
For this I tried using a regular expression to capture the repeating characters of the decimal part, but my regular expression captures the repeating characters once and also keeps the non-repeating characters.
Here is my code. Can someone help me with this?
double div = ((double) num) / deno;
String str = String.valueOf(div);
String arr[] = str.split("\\.");
String wp = arr[0];
String dp = arr[1];
String repeated = dp.replaceAll("(.+?)\\1+", "$1");
System.out.println("repeated is " + repeated);
System.out.println(wp + "." + "(" + repeated + ")");
The output I'm getting is:
Input given: 22/7
Integer part: 3
Decimal part: 142857142857143
Repeating characters captured by the regular expression: 142857143
Final output: 3.(142857143)

replaceAll collapses the repeated block 142857142857 into 142857, but the trailing 143 is not part of any repetition, so it is left untouched and remains in the output.
You can use the Pattern class with the regex (\d+)+\1 and take the captured group instead, like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test
{
public static void main(String[] args) throws Exception
{
double[] nums = {2.0/3, 2.0/4, 22.0/7};
for(double d : nums)
print(d);
}
static void print(double div) {
String str = String.valueOf(div);
String arr[] = str.split("\\.");
String wp = arr[0];
String dp = arr[1];
String repeated = dp;
Pattern ptrn = Pattern.compile("(\\d+)+\\1");
Matcher m = ptrn.matcher(dp);
if(m.find()) {
repeated = m.group(1);
System.out.println(str + " -> "+ wp + "." + "(" + repeated + ")");
} else {
System.out.println(str + " -> "+ wp + "." + dp +"(0)");
}
}
}
Output:
0.6666666666666666 -> 0.(6)
0.5 -> 0.5(0)
3.142857142857143 -> 3.(142857)

Your regex is close to working.
Alternative:
Matcher matcher = Pattern.compile("(.+?)\\1").matcher(decimalPart);
String repeated = matcher.find() ? matcher.group(1) : "0";
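See the alternative in context (these methods belong in one class that imports java.util.ArrayList, java.util.Arrays, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern):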
public static void main(String[] args) {
List<String> divisions = Arrays.asList("2/3", "2/4", "22/7");
List<String> quotientsAsString = getQuotientsAsString(divisions);
List<String> repeatedResult = getRepeatedResult(quotientsAsString);
printResult(divisions, quotientsAsString, repeatedResult);
}
private static void printResult(List<String> divisions, List<String> quotientsAsString,
List<String> repeatedResult) {
for (int i = 0; i < divisions.size(); i++) {
System.out.printf("%d) %s = %s => %s%n", (i + 1), divisions.get(i)
, quotientsAsString.get(i), repeatedResult.get(i));
}
}
private static List<String> getRepeatedResult(List<String> quotientsAsString) {
// Pre-compile the regexes before entering the loop
Pattern dotSignPattern = Pattern.compile("\\.");
Pattern repeatedDecimalPattern = Pattern.compile("(.+?)\\1");
List<String> repeatedResult = new ArrayList<>();
for (String quotient : quotientsAsString) {
String[] quotientParts = dotSignPattern.split(quotient);
String integerPart = quotientParts[0];
String decimalPart = quotientParts[1];
// Apply the alternative pattern in context
Matcher matcher = repeatedDecimalPattern.matcher(decimalPart);
String repeated = matcher.find() ? matcher.group(1) : "0";
String resultRepeated = String.format("%s.(%s)", integerPart, repeated);
String resultZeroRepeated = String.format("%s.%s(%s)", integerPart, decimalPart, repeated);
String result = repeated.equals("0") ? resultZeroRepeated : resultRepeated;
repeatedResult.add(result);
}
return repeatedResult;
}
private static List<String> getQuotientsAsString(List<String> divisions) {
// Pre-compile the regex before entering the loop
Pattern divSignPattern = Pattern.compile("/");
List<String> quotientsAsString = new ArrayList<>();
for (String div : divisions) {
String[] divParts = divSignPattern.split(div);
Double dividend = Double.valueOf(divParts[0]);
Double divisor = Double.valueOf(divParts[1]);
Double quotient = dividend / divisor;
quotientsAsString.add(String.valueOf(quotient));
}
return quotientsAsString;
}
Output:
1) 2/3 = 0.6666666666666666 => 0.(6)
2) 2/4 = 0.5 => 0.5(0)
3) 22/7 = 3.142857142857143 => 3.(142857)
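The regex answers above work on String.valueOf(double), which keeps only about 16 to 17 significant digits, so a repeating block longer than that (1/17 repeats every 16 digits) cannot be detected reliably, and the last printed digit is rounded (the final 3 in 3.142857142857143). A different technique avoids floating point entirely: exact long division, where the digits start repeating as soon as a remainder repeats. A minimal sketch, assuming a non-negative numerator and a positive denominator small enough that remainder * 10 fits in a long; the class and method names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class RepeatingDecimal {
    // Exact decimal expansion of num/den with the repeating block in parentheses.
    // Assumes num >= 0, den > 0 and den small enough that rem * 10 cannot overflow.
    static String toDecimal(long num, long den) {
        StringBuilder sb = new StringBuilder();
        sb.append(num / den).append('.');
        long rem = num % den;
        // Remainder -> position in sb of the digit produced from it.
        Map<Long, Integer> seen = new HashMap<>();
        while (rem != 0 && !seen.containsKey(rem)) {
            seen.put(rem, sb.length());
            rem *= 10;
            sb.append(rem / den); // next decimal digit
            rem %= den;
        }
        if (rem == 0) {
            sb.append("(0)"); // terminating decimal, written as in the question (0.5(0))
        } else {
            int cycleStart = seen.get(rem); // the cycle begins where this remainder first appeared
            sb.insert(cycleStart, '(').append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDecimal(2, 3));  // 0.(6)
        System.out.println(toDecimal(2, 4));  // 0.5(0)
        System.out.println(toDecimal(22, 7)); // 3.(142857)
        System.out.println(toDecimal(1, 17)); // 16-digit cycle
    }
}
```

Because every remainder is smaller than den, a remainder must repeat after at most den - 1 digits, so the loop always terminates.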

Split String from the last iteration

This post is a follow-up to an earlier one: "get specific character in a string with regex and remove unused zero".
Originally, I wanted to use a regular expression to remove the unused zeros in the last match.
I found that a regular expression is a bit overkill for what I need.
Here is what I would like now:
I would like to use the split() method
to get from this:
String myString = "2020-LI50532-3329-00100";
to this:
String data1 = "2020";
String data2 = "LI50532";
String data3 = "3329";
String data4 = "00100";
Then I can remove the unused leading zeros from the LAST part,
converting "00100" to "100",
and then concatenate all the parts to get this:
"2020-LI50532-3329-100"
I'm not familiar with the split method; can anyone enlighten me?

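You can use the substring method to get rid of the leading zeros. Note that substring(2) always removes exactly two characters, so this works only when the last part has exactly two leading zeros: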
String myString = "2020-LI50532-3329-00100";
String[] data = myString.split("-");
data[3] = data[3].substring(2);
StringBuilder sb = new StringBuilder();
sb.append(data[0] + "-" + data[1] + "-" + data[2] + "-" + data[3]);
String result = sb.toString();
System.out.println(result);

Assuming that we want to remove the leading zeroes of ONLY the last block, maybe we can:
Extract the last block
Convert it to Integer and back to String to remove leading zeroes
Replace the last block with the String obtained in above step
Something like this:
public String removeLeadingZeroesFromLastBlock(String text) {
int indexOfLastDelimiter = text.lastIndexOf('-');
if (indexOfLastDelimiter >= 0) {
String lastBlock = text.substring(indexOfLastDelimiter + 1);
String lastBlockWithoutLeadingZeroes = String.valueOf(Integer.valueOf(lastBlock)); // will throw exception if last block is not an int
return text.substring(0, indexOfLastDelimiter + 1).concat(lastBlockWithoutLeadingZeroes);
}
return text;
}

Solution using regex:
public class Main {
public static void main(String[] args) {
// Test
System.out.println(parse("2020-LI50532-3329-00100"));
System.out.println(parse("2020-LI50532-3329-00001"));
System.out.println(parse("2020-LI50532-03329-00100"));
System.out.println(parse("2020-LI50532-03329-00001"));
}
static String parse(String str) {
return str.replaceAll("0+(?=[1-9]\\d*$)", "");
}
}
Output:
2020-LI50532-3329-100
2020-LI50532-3329-1
2020-LI50532-03329-100
2020-LI50532-03329-1
Explanation of the regex:
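One or more zeros followed by a non-zero digit, which can optionally be followed by more digits up to the end of the string ($). Because the digit part is inside a lookahead (?=...), it is only checked, not consumed, so replaceAll removes just the leading zeros of the last block.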
Solution without using regex:
You can do it also by using Integer.parseInt which can parse a string like 00100 into 100.
public class Main {
public static void main(String[] args) {
// Test
System.out.println(parse("2020-LI50532-3329-00100"));
System.out.println(parse("2020-LI50532-3329-00001"));
System.out.println(parse("2020-LI50532-03329-00100"));
System.out.println(parse("2020-LI50532-03329-00001"));
}
static String parse(String str) {
String[] parts = str.split("-");
try {
parts[parts.length - 1] = String.valueOf(Integer.parseInt(parts[parts.length - 1]));
} catch (NumberFormatException e) {
// The last part is not an integer: leave it unchanged
}
return String.join("-", parts);
}
}
Output:
2020-LI50532-3329-100
2020-LI50532-3329-1
2020-LI50532-03329-100
2020-LI50532-03329-1

You can convert the last part to an int, as below, to remove the unused zeros:
String myString = "2020-LI50532-3329-00100";
String[] data = myString.split("-");
StringBuilder sb = new StringBuilder();
sb.append(data[0] + "-" + data[1] + "-" + data[2] + "-" + Integer.parseInt(data[3])); // parseInt("00100") is 100
String result = sb.toString();
System.out.println(result);

You should avoid String manipulation where possible and rely on existing types in the Java language. One such type is Integer. It looks like your value consists of 4 parts: year (Integer), String, Integer, Integer.
So to validate it properly, I would use the following code (Scanner is java.util.Scanner):
Scanner scan = new Scanner("2020-LI50532-3329-00100");
scan.useDelimiter("-");
Integer firstPart = scan.nextInt();
String secondPart = scan.next();
Integer thirdPart = scan.nextInt();
Integer fourthPart = scan.nextInt();
Or alternatively something like:
String str = "00100";
int num = Integer.parseInt(str);
System.out.println(num);
If you want to reconstruct your original value, you should probably use a NumberFormat to add the missing 0s.
The main points are:
Always try to reuse existing code and tools available in your language
Always try to use available types (LocalDate, Integer, Long)
Create your own types (classes) and use the expressiveness of the Object Oriented language
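The Scanner approach and the NumberFormat suggestion can be combined: parse the parts, rebuild the string without leading zeros, and pad a number back when the original width is needed. A minimal sketch, assuming the input always has exactly four "-"-separated parts, with the first, third and fourth being integers; rebuild and pad are illustrative names. Note that nextInt() also drops leading zeros from the third part (03329 becomes 3329), which goes beyond the question's "last part only" requirement.

```java
import java.text.NumberFormat;
import java.util.Locale;
import java.util.Scanner;

public class RebuildCode {
    // Reads "year-id-number-number" with Scanner and rebuilds it; nextInt() drops leading zeros.
    static String rebuild(String code) {
        try (Scanner scan = new Scanner(code)) {
            scan.useDelimiter("-");
            scan.useLocale(Locale.ROOT); // avoid locale-specific number parsing
            int year = scan.nextInt();
            String id = scan.next();
            int third = scan.nextInt();
            int fourth = scan.nextInt();
            return year + "-" + id + "-" + third + "-" + fourth;
        }
    }

    // Restores zero padding with a NumberFormat, e.g. 100 -> "00100" for 5 digits.
    static String pad(int value, int digits) {
        NumberFormat nf = NumberFormat.getIntegerInstance(Locale.ROOT);
        nf.setGroupingUsed(false);
        nf.setMinimumIntegerDigits(digits);
        return nf.format(value);
    }

    public static void main(String[] args) {
        System.out.println(rebuild("2020-LI50532-3329-00100")); // 2020-LI50532-3329-100
        System.out.println(pad(100, 5));                         // 00100
    }
}
```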

public class Test {
public static void main(String[] args) {
System.out.println(trimLeadingZeroesFromLastPart("2020-LI50532-03329-00100"));
}
private static String trimLeadingZeroesFromLastPart(String input) {
String delem = "-";
String result = "";
if (input != null && !input.isEmpty()) {
String[] data = input.split(delem);
StringBuilder tempStrBldr = new StringBuilder();
for (int idx = 0; idx < data.length; idx++) {
if (idx == data.length - 1) {
tempStrBldr.append(trimLeadingZeroes(data[idx]));
} else {
tempStrBldr.append(data[idx]);
}
tempStrBldr.append(delem);
}
result = tempStrBldr.substring(0, tempStrBldr.length() - 1);
}
return result;
}
private static String trimLeadingZeroes(String input) {
int idx;
for (idx = 0; idx < input.length() - 1; idx++) {
if (input.charAt(idx) != '0') {
break;
}
}
return input.substring(idx);
}
}
Output:
2020-LI50532-03329-100

printing a split string

I am trying to print my strings in the following format: ua, login, login ---> ua, navigation, fault Average = 500 milliseconds. I concatenate the two strings into one string called keyString, separated by "|", and use it as the key in the HashMap. When iterating over the key set, I split each key to get back the format stated above, but it is showing up like this ---> ua, ctiq, export|ua, ctiq export, transfer Average = 600 milliseconds. Any ideas?
public static void ProcessLines(Map<String, NumberHolder> uaCount,String firstLine, String secondLine) throws ParseException
{
String [] arr1 = firstLine.split("-- ", 2);
String [] arr2 = secondLine.split("-- ", 2);
String str1 = arr1[1];
String str2 = arr2[1];
......
String keyString = str1 + "|" + str2;
NumberHolder hashValue = uaCount.get(keyString);
if(hashValue == null)
{
hashValue = new NumberHolder();
uaCount.put(keyString, hashValue);
}
hashValue.sumtime_in_milliseconds += diffMilliSeconds;
hashValue.occurences++;
public static class NumberHolder
{
public int occurences;
public int sumtime_in_milliseconds;
}
and heres the printing part
for(String str : uaCount.keySet())
{
String [] arr = str.split("|",2);
long average = uaCount.get(str).sumtime_in_milliseconds / uaCount.get(str).occurences;
//System.out.println(str);
System.out.println(arr[0] + " ---> " + arr[1] + " Average = " + average + " milliseconds");
}

split takes a regular expression that describes where to split, and in a regex | means OR (alternation). To use | as a literal, you need to escape it with \, which inside a Java string literal is written as "\\". Alternatively, you can use the character class [|]. Try:
str.split("\\|", 2);
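A small self-contained sketch of the fixed printing step, assuming (as in the question) that every key contains exactly one "|"; describe is a hypothetical helper name. Pattern.quote("|") produces a regex that matches the pipe literally, equivalent to "\\|":

```java
import java.util.regex.Pattern;

public class SplitOnPipe {
    // Splits the composite map key at the first literal '|' and formats one report line.
    static String describe(String key, long average) {
        String[] parts = key.split(Pattern.quote("|"), 2); // same result as key.split("\\|", 2)
        return parts[0] + " ---> " + parts[1] + " Average = " + average + " milliseconds";
    }

    public static void main(String[] args) {
        System.out.println(describe("ua, login, login|ua, navigation, fault", 500));
        // ua, login, login ---> ua, navigation, fault Average = 500 milliseconds
    }
}
```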

How to get a string between two characters?

I have a string,
String s = "test string (67)";
I want to get 67, which is the string between ( and ).
Can anyone please tell me how to do this?

There's probably a neat regex for this, but I'm a novice in that area, so instead:
String s = "test string (67)";
s = s.substring(s.indexOf("(") + 1);
s = s.substring(0, s.indexOf(")"));
System.out.println(s);

A very useful solution to this issue, which doesn't require you to call indexOf yourself, is to use the Apache Commons Lang library:
StringUtils.substringBetween(s, "(", ")");
Because this method looks for the closing string only after the opening string, it also handles input in which the closing string occurs more than once, which is not easy to do by calling indexOf for the closing string.
You can download this library from here:
https://mvnrepository.com/artifact/org.apache.commons/commons-lang3/3.4

Try it like this:
String s = "test string(67)";
String requiredString = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
The signature of substring is:
public String substring(int beginIndex, int endIndex) // beginIndex is inclusive, endIndex is exclusive
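To see the inclusive/exclusive rule concretely, the sketch below prints the indices that indexOf finds and the resulting substring; inner is an illustrative name, and the input is assumed to contain '(' before ')':

```java
public class SubstringIndices {
    // Returns the text between the first '(' and the first ')' of s.
    static String inner(String s) {
        int open = s.indexOf('(');  // index of '(' itself
        int close = s.indexOf(')'); // index of ')' itself
        // beginIndex is inclusive and endIndex exclusive:
        // open + 1 skips the '(' and close stops just before the ')'.
        return s.substring(open + 1, close);
    }

    public static void main(String[] args) {
        String s = "test string (67)";
        System.out.println(s.indexOf('(') + " " + s.indexOf(')')); // 12 15
        System.out.println(inner(s));                               // 67
    }
}
```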

By using regular expression :
String s = "test string (67)";
Pattern p = Pattern.compile("\\(.*?\\)");
Matcher m = p.matcher(s);
if(m.find())
System.out.println(m.group().subSequence(1, m.group().length()-1));
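A variation on the same idea puts a capturing group inside the parentheses, so that group(1) already excludes them and no subSequence trimming is needed; looping over find() also collects every parenthesized number. A minimal sketch, assuming only digits appear between the parentheses; the names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenNumbers {
    // Group 1 captures the digits inside each "( ... )" pair.
    private static final Pattern IN_PARENS = Pattern.compile("\\((\\d+)\\)");

    static List<Integer> numbersInParens(String s) {
        List<Integer> result = new ArrayList<>();
        Matcher m = IN_PARENS.matcher(s);
        while (m.find()) {
            result.add(Integer.parseInt(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(numbersInParens("test string (67)"));          // [67]
        System.out.println(numbersInParens("test string (67) and (77)")); // [67, 77]
    }
}
```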

Java supports regular expressions, but they're somewhat cumbersome if you actually want to use them to extract matches. I think the easiest way to get the string you want in your example is to use the regular expression support in the String class's replaceAll method:
String x = "test string (67)".replaceAll(".*\\(|\\).*", "");
// x is now the String "67"
This deletes everything up to and including the last ( (because .* is greedy), as well as the first ) after it and everything that follows. This leaves only the text between the parentheses.
However, the result of this is still a String. If you want an integer result instead then you need to do another conversion:
int n = Integer.parseInt(x);
// n is now the integer 67

In a single line, I suggest:
String input = "test string (67)";
input = input.substring(input.indexOf("(") + 1, input.lastIndexOf(")"));
System.out.println(input);

You could use the StringUtils class from the Apache Commons Lang library to do this.
import org.apache.commons.lang3.StringUtils;
...
String s = "test string (67)";
s = StringUtils.substringBetween(s, "(", ")");
....

Test string: "test string (67)", from which you need to get the string nested between two other strings.
String str = "test string (67) and (77)", open = "(", close = ")";
Some possible ways are listed below.
Simple generic solution:
String subStr = str.substring(str.indexOf( open ) + 1, str.indexOf( close ));
System.out.format("String[%s] Parsed IntValue[%d]%n", subStr, Integer.parseInt( subStr ));
Apache Software Foundation commons.lang3:
The StringUtils.substringBetween() function gets the String that is nested between two Strings. Only the first match is returned.
String substringBetween = StringUtils.substringBetween(str, open, close);
System.out.println("Commons Lang3 : "+ substringBetween);
Replaces the given String with the String that is nested between two Strings. #395
Pattern with Regular-Expressions: (\()(.*?)(\)).*
The Dot Matches (Almost) Any Character
.? = .{0,1}, .* = .{0,}, .+ = .{1,}
String patternMatch = patternMatch(generateRegex(open, close), str);
System.out.println("Regular expression Value : "+ patternMatch);
Regular expression with a couple of helper functions; Pattern.quote escapes the delimiters so that characters such as ( and ) are matched literally.
Pattern.DOTALL: . matches any character, including a line terminator.
Pattern.MULTILINE: ^ and $ match at the start and end of each line, not only at the start and end of the whole input.
public static String generateRegex(String open, String close) {
return "(" + Pattern.quote(open) + ")(.*?)(" + Pattern.quote(close) + ").*";
}
}
public static String patternMatch(String regex, CharSequence string) {
final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
final Matcher matcher = pattern .matcher(string);
String returnGroupValue = null;
if (matcher.find()) { // while() { Pattern.MULTILINE }
System.out.println("Full match: " + matcher.group(0));
System.out.format("Character Index [Start:End]«[%d:%d]\n",matcher.start(),matcher.end());
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
if( i == 2 ) returnGroupValue = matcher.group( 2 );
}
}
return returnGroupValue;
}

String s = "test string (67)";
int start = 0; // '(' position in string
int end = 0; // ')' position in string
for(int i = 0; i < s.length(); i++) {
if(s.charAt(i) == '(') // Looking for '(' position in string
start = i;
else if(s.charAt(i) == ')') // Looking for ')' position in string
end = i;
}
String number = s.substring(start+1, end); // you take value between start and end

String result = s.substring(s.indexOf("(") + 1, s.indexOf(")"));

public String getStringBetweenTwoChars(String input, String startChar, String endChar) {
try {
int start = input.indexOf(startChar);
if (start != -1) {
int end = input.indexOf(endChar, start + startChar.length());
if (end != -1) {
return input.substring(start + startChar.length(), end);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return input; // no match: return the input unchanged (alternatively return null or "")
}
Usage :
String input = "test string (67)";
String startChar = "(";
String endChar = ")";
String output = getStringBetweenTwoChars(input, startChar, endChar);
System.out.println(output);
// Output: "67"

Another way, using the split method:
public static void main(String[] args) {
String s = "test string (67)";
String[] ss;
ss= s.split("\\(");
ss = ss[1].split("\\)");
System.out.println(ss[0]);
}

Use Pattern and Matcher
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Chk {
public static void main(String[] args) {
String s = "test string (67)";
ArrayList<String> arL = new ArrayList<String>();
ArrayList<String> inL = new ArrayList<String>();
Pattern pat = Pattern.compile("\\(\\w+\\)");
Matcher mat = pat.matcher(s);
while (mat.find()) {
arL.add(mat.group());
System.out.println(mat.group());
}
for (String sx : arL) {
Pattern p = Pattern.compile("(\\w+)");
Matcher m = p.matcher(sx);
while (m.find()) {
inL.add(m.group());
System.out.println(m.group());
}
}
System.out.println(inL);
}
}

The "generic" way of doing this is to parse the string from the start, throwing away all the characters before the first bracket, recording the characters after the first bracket, and throwing away the characters after the second bracket.
I'm sure there's a regex library or something to do it though.
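The single-pass approach described above can be sketched with a small state flag, without regex. This is a minimal illustration (between is a hypothetical name) that stops at the first closing bracket after the first opening bracket and returns null when no complete pair exists:

```java
public class BetweenBrackets {
    // One left-to-right pass: skip characters until the first open bracket,
    // record characters until the next close bracket, then ignore the rest.
    static String between(String s, char open, char close) {
        StringBuilder recorded = new StringBuilder();
        boolean inside = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!inside) {
                if (c == open) {
                    inside = true; // start recording after the opening bracket
                }
            } else if (c == close) {
                return recorded.toString(); // everything after this is thrown away
            } else {
                recorded.append(c);
            }
        }
        return null; // no complete bracket pair found
    }

    public static void main(String[] args) {
        System.out.println(between("test string (67)", '(', ')')); // 67
    }
}
```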

The least generic way I found to do this uses a regex with the Pattern / Matcher classes:
String text = "test string (67)";
String START = "\\("; // A literal "(" character in regex
String END = "\\)"; // A literal ")" character in regex
// Captures the word(s) between the above two characters
String regex = START + "(\\w+)" + END;
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group(1)); // group 1 holds only the captured text, without the delimiters
}
This may help for more complex regex problems where you want to get the text between two sets of characters.

Another possible solution is lastIndexOf, which searches for a character or String backward from the end.
In my scenario, I had the following String and had to extract <<UserName>>:
1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc
indexOf and StringUtils.substringBetween were not helpful because they start searching from the beginning.
So, I used lastIndexOf
String str = "1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc";
String userName = str.substring(str.lastIndexOf("_") + 1, str.lastIndexOf("."));
And, it gives me
<<UserName>>

String s = "test string (67)";
System.out.println(s.substring(s.indexOf("(")+1,s.indexOf(")")));

Something like this:
public static String innerSubString(String txt, char prefix, char suffix) {
if(txt != null && txt.length() > 1) {
int start = 0, end = 0;
char token;
for(int i = 0; i < txt.length(); i++) {
token = txt.charAt(i);
if(token == prefix)
start = i;
else if(token == suffix)
end = i;
}
if(start + 1 < end)
return txt.substring(start+1, end);
}
return null;
}

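Use the \D+ regex and the job is done.
\D+ matches any run of non-digit characters, so removing every match leaves only the digits, with no need to complicate things:
String digits = "test string (67)".replaceAll("\\D+", ""); // "67"
Note that this keeps every digit in the string, not only those between the parentheses.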

replaceFirst returns the original string if the regex does not match:
var iAm67 = "test string (67)".replaceFirst("test string \\((.*)\\)", "$1");
To guard against a non-matching input, check matches first:
String str = "test string (67)";
String regx = "test string \\((.*)\\)";
if (str.matches(regx)) {
var iAm67 = str.replaceFirst(regx, "$1");
}
---EDIT---
I use https://www.freeformatter.com/java-regex-tester.html#ad-output to test regexes.
It turns out it's better to add ? after * so that it matches as little as possible (a lazy quantifier), something like this:
String str = "test string (67)(69)";
String regx1 = "test string \\((.*)\\).*";
String regx2 = "test string \\((.*?)\\).*";
String ans1 = str.replaceFirst(regx1, "$1");
String ans2 = str.replaceFirst(regx2, "$1");
System.out.println("ans1:"+ans1+"\nans2:"+ans2);
// ans1:67)(69
// ans2:67

String s = "(69)";
System.out.println(s.substring(s.lastIndexOf('(')+1,s.lastIndexOf(')')));

A small extension of the top (MadProgrammer) answer:
public static String getTextBetween(final String wholeString, final String str1, String str2){
String s = wholeString.substring(wholeString.indexOf(str1) + str1.length());
s = s.substring(0, s.indexOf(str2));
return s;
}
