Java string parsing with different regex to split

str="Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME 0;PRICE 3957.890000;MIC XDUBIND;"
I don't have any control over the format in which this string is created.
I tried the code below, but I can't get the values of the first keys ("Tick for symbol", "timestamp_sec", etc.).
Beyond this specific string, I am curious about how to parse a string using multiple regex splits. Any help will be appreciated.
String[] s = line.split(";");
Map<String, String> m = new HashMap<String, String>();
for (int i = 0; i < s.length; i++)
{
String[] split = s[i].split("\\s+");
for (String string2 : split)
{
//Adding key value pair. to a map for further usage.
m.put(split[0], split[1]);
}
}
Edit
Desired output into a map:
(Tick for Symbol, .ISEQ-IDX)
(descriptor id, 1)
(timestamp_sec,20130628030105)
(timestamp_usec,384000)
(EXCH_TIME,1372388465384)
(SENDING_TIME,0)
(PRICE, 3957.890000)
(MIC, XDUBIND)

How about the following? You specify a list of key-value pattern pairs. Keys are specified directly as strings, values as regexes. Then you go through this list and search the text for each key followed by its value pattern; if you find it, you extract the value.
I assume the keys can be in any order, not all of them have to be present, and there might be more than one space separating them. If you know the order of the keys, you can always start each find at the place where the previous find ended. If you know all keys are obligatory, you can throw an exception when you do not find what you are looking for.
static String test="Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME 0;PRICE 3957.890000;MIC XDUBIND;";
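static List<String> patterns = Arrays.asList(
"Tick for symbol", "\\S+",
"descriptor id", "\\d+",
"timestamp_sec", "\\d+",
"timestamp_usec", "\\d+",
"EXCH_TIME", "\\d+",
"SENDING_TIME","\\d+",
// Escape the dot and match all decimal digits, not just the first one
"PRICE", "\\d+\\.\\d+",
// Stop before the trailing semicolon so the value is XDUBIND, not XDUBIND;
"MIC", "[^\\s;]+"
);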
public static void main(String[] args) {
Map<String,String> map = new HashMap<>();
for (int i = 0; i<patterns.size();i+=2) {
String key = patterns.get(i);
String val = patterns.get(i+1);
String pattern = "\\Q" +key + "\\E\\s+(" + val + ")";
Matcher m = Pattern.compile(pattern).matcher(test);
if (m.find()) {
map.put(key, m.group(1));
}
}
System.out.println(map);
}
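The ordered variant mentioned above (start each search where the previous match ended, and fail when a key is missing) could be sketched roughly as follows. This is only an illustration under assumptions, not part of the original answer: the class name OrderedKeyParser and the method parseOrdered are hypothetical, and it assumes every key is present in exactly this order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrderedKeyParser {
    // Each key is followed by a regex for its value, in the order the keys appear in the input.
    static final List<String> PATTERNS = Arrays.asList(
        "Tick for symbol", "\\S+",
        "descriptor id", "\\d+",
        "timestamp_sec", "\\d+",
        "timestamp_usec", "\\d+",
        "EXCH_TIME", "\\d+",
        "SENDING_TIME", "\\d+",
        "PRICE", "\\d+\\.\\d+",
        "MIC", "[^\\s;]+");

    // Hypothetical helper: parses the keys strictly in order and throws if one is missing.
    static Map<String, String> parseOrdered(String text) {
        Map<String, String> map = new LinkedHashMap<>();
        int pos = 0; // where the previous match ended
        for (int i = 0; i < PATTERNS.size(); i += 2) {
            String key = PATTERNS.get(i);
            String regex = Pattern.quote(key) + "\\s+(" + PATTERNS.get(i + 1) + ")";
            Matcher m = Pattern.compile(regex).matcher(text);
            // find(int) resets the matcher and starts searching at the given index
            if (!m.find(pos)) {
                throw new IllegalArgumentException("Missing key: " + key);
            }
            map.put(key, m.group(1));
            pos = m.end();
        }
        return map;
    }

    public static void main(String[] args) {
        String test = "Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 "
            + "timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME 0;PRICE 3957.890000;MIC XDUBIND;";
        System.out.println(parseOrdered(test));
    }
}
```

A LinkedHashMap is used here only so that the printed map keeps the order of the keys; a HashMap works just as well if order does not matter.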

I don't think a regex will help you here, whoever designed that output String clearly didn't have splitting in mind.
I suggest simply parsing through the String with a loop and doing the whole thing manually. Alternatively, you can just look through the String for substrings (such as "Tick for symbol"), then take whatever word comes after (until the next space), since the second parameter always seems to be one word.
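A minimal sketch of that manual approach, assuming the set of keys is known in advance and every value is a single token that ends at a space or a semicolon. The class name ManualTickParser and its parse method are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ManualTickParser {
    // Assumption: these are all the keys that can occur in the line.
    static final List<String> KEYS = Arrays.asList(
        "Tick for symbol", "descriptor id", "timestamp_sec", "timestamp_usec",
        "EXCH_TIME", "SENDING_TIME", "PRICE", "MIC");

    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : KEYS) {
            // Look for the key followed by a single space
            int start = line.indexOf(key + " ");
            if (start < 0) {
                continue; // key not present in this line
            }
            int valueStart = start + key.length() + 1;
            int valueEnd = valueStart;
            // The value runs until the next space, semicolon, or end of line
            while (valueEnd < line.length()
                    && line.charAt(valueEnd) != ' '
                    && line.charAt(valueEnd) != ';') {
                valueEnd++;
            }
            result.put(key, line.substring(valueStart, valueEnd));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 "
            + "timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME 0;PRICE 3957.890000;MIC XDUBIND;";
        System.out.println(parse(line));
    }
}
```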

Use the Pattern class from the java.util.regex package, described step by step in this Java regex tutorial:
private static final Pattern splitPattern = Pattern.compile("^Tick for symbol (.*) descriptor id (\\d+) timestamp_sec (\\d+) timestamp_usec (\\d+);EXCH_TIME (\\d+);SENDING_TIME ?(\\d+);PRICE (.*);MIC (\\w+);$");
private static String printExtracted(final String str) {
final Matcher m = splitPattern.matcher(str);
if (m.matches()) {
final String tickForSymbol = m.group(1);
final long descriptorId = Long.parseLong(m.group(2), 10);
final long timestampSec = Long.parseLong(m.group(3), 10);
final long timestampUsec = Long.parseLong(m.group(4), 10);
final long exchTime = Long.parseLong(m.group(5), 10);
final long sendingTime = Long.parseLong(m.group(6), 10);
final double price = Double.parseDouble(m.group(7));
final String mic = m.group(8);
return "(Tick for Symbol, " + tickForSymbol + ")\n" +
"(descriptor id, " + descriptorId + ")\n" +
"(timestamp_sec, " + timestampSec + ")\n" +
"(timestamp_usec, " + timestampUsec + ")\n" +
"(EXCH_TIME, " + exchTime + ")\n" +
"(SENDING_TIME, " + sendingTime +")\n" +
"(PRICE, " + price + ")\n" +
"(MIC, " + mic + ")";
} else {
throw new IllegalArgumentException("Argument " + str + " doesn't match pattern.");
}
}
Edit: Using group instead of replaceAll, as it makes more sense and is also faster.

Related

Get a specific data values from a string in Java (String without comma)

I want to get the value from a string.
I have a string value like this:
String myData= "Number: 34678 Type: Internal Qty: 34";
How can I get the Number, Type, Qty values separately?
Give me any suggestion on this.
Input:
String myData= "Number: 34678 Type: Internal Qty: 34";
Output:
Number value is 34678
Type values is Internal
Qty value is 34

Here is one way to do it. It looks for a word followed by a colon, followed by zero or more spaces, followed by another word. This works regardless of the order or names of the fields.
String myData = "Number: 34678 Type: Internal Qty: 34";
Matcher m = Pattern.compile("(\\S+):\\s*(\\S+)").matcher(myData);
while (m.find()) {
System.out.println(m.group(1) + " value is " + m.group(2));
}

You can use regex to do this cleanly:
Pattern p = Pattern.compile("Number: (\\d*) Type: (.*) Qty: (\\d*)");
Matcher m = p.matcher(myData);
m.find();
This gives you the number with m.group(1), the type with m.group(2) and the quantity with m.group(3).
I assume you accept a limited number of types, so you can change the regex to match only if the type is valid, for example either Internal or External: "Number: (\\d*) Type: (Internal|External) Qty: (\\d*)"
Here's a nice explanation of how this works
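As a rough sketch of the restricted-type variant, the following checks the result of the match before reading any groups. The class name TypedRecordParser and the choice to return null on a mismatch are assumptions made for this example; it also uses \\d+ rather than \\d* so that empty numbers are rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TypedRecordParser {
    // Only Internal and External are accepted as types (an assumption for this sketch).
    private static final Pattern RECORD =
        Pattern.compile("Number: (\\d+) Type: (Internal|External) Qty: (\\d+)");

    // Returns {number, type, qty}, or null when the input does not match the pattern.
    static String[] parse(String data) {
        Matcher m = RECORD.matcher(data);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] fields = parse("Number: 34678 Type: Internal Qty: 34");
        if (fields != null) {
            System.out.println("Number value is " + fields[0]);
            System.out.println("Type value is " + fields[1]);
            System.out.println("Qty value is " + fields[2]);
        }
    }
}
```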

If you just want to print them and the input data has a fixed pattern, the simplest way is shown below (just for fun!):
System.out.print(myData.replace(" Type", "\nType")
.replace(" Qty", "\nQty")
.replace(":", " value is"));

I suppose the string is always formatted like that. I.e., n attribute names each followed by a value that does not contain spaces. In other words, the 2n entities are separated from each other by 1 or more spaces.
If so, try this:
String[] parts;
int limit;
int counter;
String name;
String value;
parts = myData.split("[ ]+");
limit = (parts.length / 2) * 2; // Make sure an even number of elements is considered
for (counter = 0; counter < limit; counter += 2)
{
name = parts[counter].replace(":", "");
value = parts[counter + 1];
System.out.println(name + " value is " + value);
}

This should work:
// Turn "Number: 34678" into "Number|34678" so that each field becomes one space-free token
String replace = myData.replace(": ", "|");
StringBuilder number = new StringBuilder();
StringBuilder type = new StringBuilder();
StringBuilder qty = new StringBuilder();
String[] getValues = replace.split(" ");
int i=0;
while(i<getValues.length-1){
String[] splitNumber = getValues[i].split("\\|");
number.append(splitNumber[1]);
String[] splitType = getValues[i+=1].split("\\|");
type.append(splitType[1]);
String[] splitQty = getValues[i+=1].split("\\|");
qty.append(splitQty[1]);
}
System.out.println(String.format("Number value is %s",number.toString()));
System.out.println(String.format("Type value is %s",type.toString()));
System.out.println(String.format("Qty value is %s",qty.toString()));
Output
Number value is 34678
Type value is Internal
Qty value is 34

How to remove text between brackets in multiple lines

I have a big text files and I want to remove everything that is between
double curly brackets.
So given the text below:
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
String cleanedText = Pattern.compile("(?<=\\{\\{).*?\\}\\}", Pattern.DOTALL).matcher(text).replaceAll("");
System.out.println(cleanedText);
I want the output to be:
This is what I want.
I have googled around and tried many different things but I couldn't find anything close to my case and as soon as I change it a little bit everything gets worse.
Thanks in advance

You can use this :
public static void main(String[] args) {
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
String cleanedText = text.replaceAll("\\n", "");
while (cleanedText.contains("{{") && cleanedText.contains("}}")) {
cleanedText = cleanedText.replaceAll("\\{\\{[a-zA-Z\\s]*\\}\\}", "");
}
System.out.println(cleanedText);
}

A regular expression cannot express arbitrarily nested structures; i.e. any syntax that requires a recursive grammar to describe.
If you want to solve this using Java Pattern, you need to do it by repeated pattern matching. Here is one solution:
String res = input;
while (true) {
String tmp = res.replaceAll("\\{\\{[^}]*\\}\\}", "");
if (tmp.equals(res)) {
break;
}
res = tmp;
}
This is not very efficient ...
That can be transformed into an equivalent, but more concise form:
String res = input;
String tmp;
while (!(tmp = res.replaceAll("\\{\\{[^}]*\\}\\}", "")).equals(res)) {
res = tmp;
}
... but I prefer the first version because it is (IMO) a lot more readable.

I am not an expert in regular expressions, so I just wrote a loop that does this for you. If you can't or don't want to use a regex, it could be helpful for you ;)
public static void main(String args[]) {
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
int openBrackets = 0;
String output = "";
char[] input = text.toCharArray();
for(int i=0;i<input.length;i++){
if(input[i] == '{'){
openBrackets++;
continue;
}
if(input[i] == '}'){
openBrackets--;
continue;
}
if(openBrackets==0){
output += input[i];
}
}
System.out.println(output);
}

My suggestion is to remove anything between curly brackets, starting at the innermost pair:
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
Pattern p = Pattern.compile("\\{\\{[^{}]+?}}", Pattern.MULTILINE);
while (p.matcher(text).find()) {
text = p.matcher(text).replaceAll("");
}
resulting in the output
This is
what I
want.
This might fail if the text contains single curly brackets or unpaired double brackets, but it could be good enough for your case.
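If single curly brackets must be left alone, one possible alternative is a single pass that tracks the nesting depth of the two-character markers {{ and }} only. This is a sketch under the assumption that the double brackets are balanced; the class name DoubleBraceStripper is hypothetical.

```java
public class DoubleBraceStripper {
    // Removes everything between "{{" and the matching "}}", including nested pairs.
    static String strip(String text) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // number of currently open "{{" markers
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (depth > 0 && text.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                // Keep the character only when we are outside all double brackets
                if (depth == 0) {
                    out.append(text.charAt(i));
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "This is {{\n" +
                "{{the multiline\n" +
                "text}} file }}\n" +
                "what I\n" +
                "{{ to {{be\n" +
                "changed}}\n" +
                "}} want.";
        System.out.println(strip(text));
    }
}
```

Line breaks outside the brackets are kept, so the result still spans three lines; collapse whitespace afterwards if a single line is wanted.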

Multiple string replacements without affecting substituted text in subsequent iterations

I've posted about letters earlier, but this is another topic. I have a JSON response whose entries each contain two fields, from and to: from is the text to change, and to is what it will be changed to.
My code is :
// for example, the EnteredText is "ab b test a b" .
EnteredString = EnteredText.getText().toString();
for (int i = 0; i < m_jArry.length(); i++) {
JSONObject jo_inside = m_jArry.getJSONObject(i);
String Original = jo_inside.getString("from");
String To = jo_inside.getString("to");
if(isMethodConvertingIn){
EnteredString = EnteredString.replace(" ","_");
EnteredString = EnteredString.replace(Original,To + " ");
} else {
EnteredString = EnteredString.replace("_"," ");
EnteredString = EnteredString.replace(To + " ", Original);
}
}
LoadingProgress.setVisibility(View.GONE);
SetResultText(EnteredString);
ShowResultCardView();
For example, the json response is :
{
"Response":[
{"from":"a","to":"bhduh"},{"from":"b","to":"eieja"},{"from":"tes","to":"neesj"}
]
}
The String.replace() method won't work here: it first replaces a with bhduh, then b with eieja, but here's the problem: it also converts the b in bhduh to eieja, which I don't want.
I want to convert the letters and "words" in the string exactly according to the JSON, but that is what I'm failing at.
New Code :
if(m_jArry.length() > 0){
HashMap<String, String> m_li;
EnteredString = EnteredText.getText().toString();
Log.i("TestAf_","Before Converting: " + EnteredString);
HashMap<String,String> replacements = new HashMap<String,String>();
for (int i = 0; i < m_jArry.length(); i++) {
JSONObject jo_inside = m_jArry.getJSONObject(i);
String Original = jo_inside.getString("from");
String To = jo_inside.getString("to");
if(isMethodConvertingIn){
//EnteredString = EnteredString.replace(" ","_");
replacements.put(Original,To);
Log.i("TestAf_","From: " + Original + " - To: " + To + " - Loop: " + i);
//EnteredString = EnteredString.replace(" ","_");
//EnteredString = EnteredString.replace(Original,To + " ");
} else {
EnteredString = EnteredString.replace("_"," ");
EnteredString = EnteredString.replace("'" + To + "'", Original);
}
}
Log.i("TestAf_","After Converting: " + replaceTokens(EnteredString,replacements));
// Replace Logic Here
// When Finish, Do :
LoadingProgress.setVisibility(View.GONE);
SetResultText(replaceTokens(EnteredString,replacements));
ShowResultCardView();
Output :
10-10 19:51:19.757 12113-12113/? I/TestAf_: Before Converting: ab a ba
10-10 19:51:19.757 12113-12113/? I/TestAf_: From: a - To: bhduh - Loop: 0
10-10 19:51:19.757 12113-12113/? I/TestAf_: From: b - To: eieja - Loop: 1
10-10 19:51:19.757 12113-12113/? I/TestAf_: From: o - To: neesj - Loop: 2
10-10 19:51:19.758 12113-12113/? I/TestAf_: After Converting: ab a ba

Your question would be clearer if you gave the expected output for the function.
Assuming it is: ab b test a b >>>> bhduheieja eieja neesjt bhduh eieja
then see the following, the key point in the Javadoc being "This will not repeat"
http://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/StringUtils.html#replaceEach(java.lang.String,%20java.lang.String[],%20java.lang.String[])
Replaces all occurrences of Strings within another String.
A null reference passed to this method is a no-op, or if any "search
string" or "string to replace" is null, that replace will be ignored.
This will not repeat. For repeating replaces, call the overloaded
method.
Example 1
import org.apache.commons.lang3.StringUtils;
public class StringReplacer {
public static void main(String[] args) {
String input = "ab b test a b";
String output = StringUtils.replaceEach(input, new String[] { "a", "b", "tes" },
new String[] { "bhduh", "eieja", "neesj" });
System.out.println(input + " >>>> " + output);
}
}
Example 2
import org.apache.commons.lang3.StringUtils;
public class StringReplacer {
public static void main(String[] args) {
String input = "this is a test string with foo";
String output = StringUtils.replaceEach(input, new String[] { "a", "foo" },
new String[] { "foo", "bar"});
System.out.println(input + " >>>> " + output);
}
}

Try following:
Solution 1:
Traverse the string's characters one by one and append the translated output to a new StringBuffer or StringBuilder, then call toString() to get the result. This requires you to implement a string matching algorithm.
Solution 2 (Using Regex):
For this, you must know the domain of your string. For example, if it is [a-zA-Z], then other arbitrary characters (not part of the domain) can be used for an intermediate step. First replace the actual characters with arbitrary ones, then replace the arbitrary ones with the targets. In the example below, [!@#] are the arbitrary characters. These can be any random \uxxxx values as well.
String input = "a-b-c";
String output = input.replaceAll("[a]", "!").replaceAll("[b]", "@").replaceAll("[c]", "#");
output = output.replaceAll("[!]", "bcd").replaceAll("[@]", "cde").replaceAll("[#]", "def");
System.out.println("input: " + input);
System.out.println("Expected: bcd-cde-def");
System.out.println("Actual: " + output);
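A single pass over the input (the idea behind Solution 1) can also be sketched with the JDK regex classes alone: build one alternation of all search strings and replace each match as it is found, so replaced text is never scanned again. This is an illustration rather than a definitive implementation; the class name SinglePassReplacer is hypothetical, and sorting the keys longest first is a design choice so that "tes" wins over a shorter key starting at the same position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassReplacer {
    static String replaceAll(String input, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return input;
        }
        // Longest keys first, so a longer key is preferred over its own prefix
        List<String> keys = new ArrayList<>(replacements.keySet());
        keys.sort((x, y) -> Integer.compare(y.length(), x.length()));
        StringBuilder alternation = new StringBuilder();
        for (String key : keys) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key)); // treat keys as literal text
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the replacement literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacements.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new HashMap<>();
        replacements.put("a", "bhduh");
        replacements.put("b", "eieja");
        replacements.put("tes", "neesj");
        System.out.println(replaceAll("ab b test a b", replacements));
    }
}
```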

Your issue is quite common. To sum things up :
String test = "this is a test string with foo";
System.out.println(test.replace("a", "foo").replace("foo", "bar"));
Gives : this is bar test string with bar
Expected by you : this is foo test string with bar
You can use StrSubstitutor from Apache Commons Lang
But first you will have to inject placeholders in your string :
String test = "this is a test string with foo";
Map<String, String> valuesMap = new HashMap<>();
valuesMap.put("a", "foo");
valuesMap.put("foo", "bar");
String testWithPlaceholder = test;
// Preparing the placeholders
for (String value : valuesMap.keySet())
{
testWithPlaceholder = testWithPlaceholder.replace(value, "${"+value+"}");
}
And then, use StrSubstitutor
System.out.println(StrSubstitutor.replace(testWithPlaceholder, valuesMap));
It gives : this is foo test string with bar

Here is a method that uses only the standard Java library. I tried not to use any Java 8 methods here.
public static String translate(final String str, List<String> from, List<String> to, int index) {
StringBuilder components = new StringBuilder();
String token, replace;
int p;
if (index < from.size()) {
token = from.get(index);
replace = to.get(index);
p = 0;
for (int i = str.indexOf(token, p); i != -1; i = str.indexOf(token, p)) {
if (i != p) {
components.append(translate(str.substring(p, i), from, to, index + 1));
}
components.append(replace);
p = i + token.length();
}
return components.append(translate(str.substring(p), from, to, index + 1)).toString();
}
return str;
}
public static String translate(final String str, List<String> from, List<String> to) {
if (null == str) {
return null;
}
return translate(str, from, to, 0);
}
Sample test program
public static void main(String []args) {
String EnteredString = "aa hjkyu batesh a";
List<String> from = new ArrayList<>(Arrays.asList("a", "b", "tes"));
List<String> to = new ArrayList<>(Arrays.asList("bhduh", "eieja", "neesj"));
System.out.println(translate(EnteredString, from, to));
}
Output:
bhduhbhduh hjkyu eiejabhduhneesjh bhduh
Explanation
The algorithm is recursive, and it simply does the following:
If the current pattern from the from list is found in the string
if there is any string before that pattern, apply the algorithm to that string
replace the found pattern with the corresponding pattern in the to list
append the replacement to the new string
discard the pattern in the from list and repeat the algorithm for the rest of the string
Otherwise append the rest of the string to the new string

You could use split like:
// split() takes a regex, so the curly brackets must be escaped
String[] pieces = jsonResponse.split("\\},\\{");
Then you just parse the from and to in each piece, apply them with replace(), and put the string back together again. (And please fix the capitalization of your variables and methods; as written, the code is very hard to read.)

Apache Commons StringUtils::replaceEach does this.
String[] froms = new String[] {"a", "b"};
String[] tos = new String[] {"b","c"};
String result = StringUtils.replaceEach("ab", froms, tos);
// result is "bc"

Why not keep it very simple (if the JSON is always in same format, EG: from the same system). Instead of replacing from with to, replace the entire markup:
replace "from":"*from*" with "from":"*to*"

Why not just change the actual "to" and "from" labels? That way, you don't run into a situation where "bhduh" becomes "eieja". Just do a string replace on "from" and "to".

How to replace a word with specific word

I have a String:
String s="<p>Dear <span>{customerName}, your {accountName} is actived </span></p><p> </p><p><span>Congrats!.....</span></p>";
I want to take the customerName and accountName placeholders and replace them with the customer's details. Can anyone please tell me how to do this? The customerName and accountName names change dynamically, because they are database columns and the columns are sometimes different. So I want to find the words within { and } and replace them with the column data.

Use the following code
s = s.replace("{customerName}", realCustomerName);
s = s.replace("{accountName}", realAccountName);
With String's replace function, the first argument is the string you want to replace, and the second argument is the string you want to insert.

Try:
s=s.replace("{customerName}",CustomerName ).replace("{accountName}",accountName);
where CustomerName and accountName are the strings holding your customer's details.

If you simply want to replace the words, you could do the following:
String s="<p>Dear <span>{customerName}, your {accountName} is actived </span></p><p> </p><p><span>Congrats!.....</span></p>";
// Strings are immutable, so the result of replace() must be assigned back
s = s.replace( "{customerName}", customer.getName() );
s = s.replace( "{accountName}", account.getName() );
Or, if you are building the string yourself and can modify it, it might be better to do the following:
String s="<p>Dear <span>%1$s, your %2$s is actived </span></p><p> </p><p><span>Congrats!.....</span></p>";
// You may also just create a new String object...
s = String.format( s, customer.getName(), account.getName() );

Finally, I found a way to replace the words using regular expressions. Here the words between ~ characters need to be replaced; these words are not fixed and are added to the string dynamically from a UI text area.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularEx {
/**
* @param args
*/
public static void main(String args[]) {
Pattern pattern = Pattern.compile("\\~.*?\\~");
StringBuilder s = new StringBuilder(
"~ABCD~~BBCc~All the best ~ABCD~~BBCc~~in~~Raja~ Such kind of people ~in~~Raja~~ABCD~~BBCc~~in~~Raja~rajasekhar~ABCD~~BBCc~~in~~Raja~ Bayanapalli ~Chinthalacheruvu~");
Matcher matcher = pattern.matcher(s);
// using Matcher find(), group(), start() and end() methods
String s1 =new String("~ABCD~~BBCc~All the best ~ABCD~~BBCc~~in~~Raja~ Such kind of people ~in~~Raja~~ABCD~~BBCc~~in~~Raja~rajasekhar~ABCD~~BBCc~~in~~Raja~ Bayanapalli ~Chinthalacheruvu~");
int i = 0;
while (matcher.find()) {
String grp = matcher.group();
int si = matcher.start();
int ei = matcher.end();
System.out.println("Found the text \"" + grp
+ "\" starting at " + si + " index and ending at index " + ei);
s1=s1.replaceAll(grp, "Raja");
//System.out.println("FinalString" + s1);
}
System.out.println("------------------------------------\nFinalString" + s1);
}
}

s = s.replace("{customerName}", "John Doe");
s = s.replace("{accountName}", "jdoe");
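Since the question says the placeholder names are not known in advance, another option is to find every {name} with a regex and look the name up in a map. This is a minimal sketch under assumptions: the class name PlaceholderFiller is hypothetical, the map stands in for the database column data, placeholder names are assumed to consist of word characters only, and unknown placeholders are left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderFiller {
    // Matches {name} where name consists of letters, digits, or underscores
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Leave the placeholder as it is when no value is known for it
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new HashMap<>();
        columns.put("customerName", "John Doe");
        columns.put("accountName", "jdoe");
        String s = "<p>Dear <span>{customerName}, your {accountName} is actived </span></p>";
        System.out.println(fill(s, columns));
    }
}
```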

How to create variables from an input file

In my program I need to loop through a variety of dates. I am writing this program in java, and have a bit of experience with readers, but I do not know which reader would complete this task the best, or if another class would work better.
The dates would be input into a text file in the format as follows:
1/1/2013 to 1/7/2013
1/8/2013 to 1/15/2013
Or something of this manner. I would need to break each range of dates into 6 local variables for the loop, then change them for the next loop. The variables would be coded for example:
private static String startingMonth = "1";
private static String startingDay = "1";
private static String startingYear = "2013";
private static String endingMonth = "1";
private static String endingDay = "7";
private static String endingYear = "2013";
I imagine this could be done by creating several delimiters to look for, but I do not know whether that would be the easiest way. I have been looking at this post for help, but can't seem to find a relevant answer. What would be the best way to go about this?

There are several options.
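You could use a Scanner and set the delimiter to include the slash. If you want the values as ints and not strings, just use sc.nextInt().
// Use \\s+ rather than \\s*: a delimiter that can match the empty string would split every character
Scanner sc = new Scanner(input).useDelimiter("\\s+|/");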
// You can skip the loop to just read a single line.
while(sc.hasNext()) {
startingMonth = sc.next();
startingDay = sc.next();
startingYear = sc.next();
// skip "to"
sc.next();
endingMonth = sc.next();
endingDay = sc.next();
endingYear = sc.next();
}
You can use regex, as alfasin suggests, but this case is rather simple, so you can just locate the first and last space.
String str = "1/1/2013 to 1/7/2013";
String startDate = str.substring(0,str.indexOf(" "));
String endDate = str.substring(str.lastIndexOf(" ")+1);
// The rest is the same:
String[] start = startDate.split("/");
System.out.println(start[0] + "-" + start[1] + "-" + start[2]);
String[] end = endDate.split("/");
System.out.println(end[0] + "-" + end[1] + "-" + end[2]);

String str = "1/1/2013 to 1/7/2013";
Pattern pattern = Pattern.compile("(\\d+/\\d+/\\d+)");
Matcher matcher = pattern.matcher(str);
matcher.find();
String startDate = matcher.group();
matcher.find();
String endDate = matcher.group();
String[] start = startDate.split("/");
System.out.println(start[0] + "-" + start[1] + "-" + start[2]);
String[] end = endDate.split("/");
System.out.println(end[0] + "-" + end[1] + "-" + end[2]);
...
OUTPUT
1-1-2013
1-7-2013
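If Java 8 or later is available, the java.time classes offer another route: split each line on the word "to" and parse both halves as dates, then read month, day, and year from the resulting objects. This is a sketch under assumptions rather than a drop-in solution: the class name DateRangeParser is hypothetical, and it assumes each line has exactly the form "M/d/yyyy to M/d/yyyy".

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateRangeParser {
    // "M" and "d" accept one or two digits, so both 1/7/2013 and 1/15/2013 parse
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Returns {start, end} for a line such as "1/1/2013 to 1/7/2013"
    static LocalDate[] parseRange(String line) {
        String[] parts = line.trim().split("\\s+to\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Not a date range: " + line);
        }
        return new LocalDate[] {
            LocalDate.parse(parts[0], FORMAT),
            LocalDate.parse(parts[1], FORMAT)
        };
    }

    public static void main(String[] args) {
        String[] lines = { "1/1/2013 to 1/7/2013", "1/8/2013 to 1/15/2013" };
        for (String line : lines) {
            LocalDate[] range = parseRange(line);
            // The six values from the question, taken from the two parsed dates
            String startingMonth = String.valueOf(range[0].getMonthValue());
            String startingDay = String.valueOf(range[0].getDayOfMonth());
            String startingYear = String.valueOf(range[0].getYear());
            String endingMonth = String.valueOf(range[1].getMonthValue());
            String endingDay = String.valueOf(range[1].getDayOfMonth());
            String endingYear = String.valueOf(range[1].getYear());
            System.out.println(startingMonth + "-" + startingDay + "-" + startingYear
                + " to " + endingMonth + "-" + endingDay + "-" + endingYear);
        }
    }
}
```

Parsing into LocalDate also rejects impossible dates such as 13/45/2013, which plain string splitting would accept silently.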
