Regex to extract float from String

I have a file which contains the following string data:
...
v -0.570000 -0.950000 -0.100000
v 0.570000 -0.950000 -0.100000
v -0.570000 -0.760000 -0.100000
v 0.570000 -0.760000 -0.100000
...
f 34 22
f 3 35 3
f 345 22
f 55 632 76
f 55 632
....
From this file I want to extract all the numbers from the lines starting with 'v' and 'f'. I have written the following regexes for it:
v(?:\s([0-9\-\.]+))+
Output:
group 1: -0.100000
f(?:\s([0-9]+))+
Output:
group 1: 22
But as you can see, each regex extracts only the last number from its line. I want the output to be as follows:
Output:
group 1: -0.570000
group 2: -0.950000
group 3: -0.100000
Output:
group 1: 34
group 2: 22
Can someone please help me here?
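A capturing group inside a repeated group keeps only the text of its last iteration, which is why group 1 holds just the final number on each line. A minimal sketch of the usual workaround follows: match one number at a time and call find() in a loop. The class name, method names and the per-number pattern are illustrative assumptions, not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedGroupDemo {
    // The question's pattern: group 1 sits inside a repeated non-capturing group.
    private static final Pattern REPEATED = Pattern.compile("v(?:\\s([0-9\\-\\.]+))+");
    // Assumed per-number pattern: optional sign, digits, optional fraction.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    // Returns what group 1 holds after the repeated match (the last iteration only).
    static String lastIteration(String line) {
        Matcher m = REPEATED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Collects every number on the line by calling find() repeatedly.
    static List<String> allNumbers(String line) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String line = "v -0.570000 -0.950000 -0.100000";
        System.out.println(lastIteration(line));
        System.out.println(allNumbers(line));
    }
}
```

The same allNumbers loop works for the 'f' lines, since the integer indices also fit the number pattern.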

Here is the solution I finally used, in case someone needs it.
I used two regexes per line type (one to find the line, one to pull out its values) instead of the string splitting suggested in the comments above.
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ObjParser {
    // Line patterns: capture everything after the leading "v " or "f ".
    private static final Pattern vertexLinePattern = Pattern.compile("^v\\s(.+)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static final Pattern faceLinePattern = Pattern.compile("^f\\s(.+)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    // Value patterns: applied to the captured remainder of each line.
    private static final Pattern vertexValuePattern = Pattern.compile("([0-9\\-\\.]+)");
    private static final Pattern faceValuePattern = Pattern.compile("([0-9]+)");

    private List<Float> vertices = new LinkedList<Float>();
    private List<Short> faces = new LinkedList<Short>();

    public void parseVertices(String data) {
        Matcher matcher1 = vertexLinePattern.matcher(data);
        while (matcher1.find()) {
            String line = matcher1.group(1);
            Matcher matcher2 = vertexValuePattern.matcher(line);
            while (matcher2.find()) {
                vertices.add(Float.parseFloat(matcher2.group(1)));
            }
        }
    }

    public void parseFaces(String data) {
        Matcher matcher1 = faceLinePattern.matcher(data);
        while (matcher1.find()) {
            String line = matcher1.group(1);
            Matcher matcher2 = faceValuePattern.matcher(line);
            while (matcher2.find()) {
                short no = (short) (Integer.parseInt(matcher2.group(1)) - 1); // -1 due to 1-based indices in OBJ files
                faces.add(no);
            }
        }
    }

    public List<Float> getVertices() {
        return vertices;
    }

    public List<Short> getFaces() {
        return faces;
    }
}
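For comparison, the string-splitting approach mentioned above could look roughly like the following. This is only a sketch under the assumption that each line is handled separately and that fields are separated by whitespace; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ObjLineSplitter {
    // Returns the numbers on one "v ..." line, or an empty list for any other line.
    static List<Float> vertexValues(String line) {
        List<Float> values = new ArrayList<>();
        String[] tokens = line.trim().split("\\s+");
        if (!tokens[0].equals("v")) {
            return values; // not a vertex line
        }
        for (int i = 1; i < tokens.length; i++) {
            values.add(Float.parseFloat(tokens[i]));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(vertexValues("v -0.570000 -0.950000 -0.100000"));
    }
}
```

Splitting avoids the second regex pass, at the cost of throwing NumberFormatException on a malformed field instead of silently skipping it.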

Related

RegEx for matching the last number in vectors of different sizes

I have this string:
values="[72, 216, 930],[250],[72],[228, 1539],[12]";
I am trying to combine two patterns in order to get the last number from each multi-number [] group and the single number from each one-number [] group.
pattern="\\, ([0-9]+)\\]|\\[([0-9]+)\\]"
But it outputs null:
930, null, null, 1539, null
How do I solve this problem?
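The nulls appear because each match satisfies only one side of the alternation, so only one of the two groups is set per match. A small sketch, assuming the pattern from the question is kept as is, that picks whichever group participated (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationGroups {
    // The question's pattern: group 1 for ", N]" and group 2 for "[N]".
    private static final Pattern P = Pattern.compile("\\, ([0-9]+)\\]|\\[([0-9]+)\\]");

    static List<String> lastOrOnly(String values) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(values);
        while (m.find()) {
            // Exactly one of the two groups is non-null for any given match.
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lastOrOnly("[72, 216, 930],[250],[72],[228, 1539],[12]"));
    }
}
```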

Here, we might not want to bound the match on the left. We can simply anchor on the ] on the right and collect the digits immediately before it, with an expression like this:
([0-9]+)\]
If you like, we can also bound it from the left, similar to this expression:
([\[\s,])([0-9]+)(\])
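In Java, the right-anchored expression ([0-9]+)\] could be applied roughly as follows. This is a sketch only; the class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastNumberPerBracket {
    // Digits immediately followed by a closing bracket.
    private static final Pattern LAST = Pattern.compile("([0-9]+)\\]");

    static List<Integer> lastNumbers(String values) {
        List<Integer> out = new ArrayList<>();
        Matcher m = LAST.matcher(values);
        while (m.find()) {
            out.add(Integer.valueOf(m.group(1)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lastNumbers("[72, 216, 930],[250],[72],[228, 1539],[12]"));
    }
}
```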

Try this.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = ", ([0-9]+)]";
final String string = "[72, 216, 930],[250],[72],[228, 1539],[12]";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
Output:
Full match: , 930]
Group 1: 930
Full match: , 1539]
Group 1: 1539

package Sample;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class StackOverFlow{
final static String regex = "\\d*]";
final static String string = "[72, 216, 930],[250],[72],[228, 1539],[12]";
final static Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final static Matcher matcher = pattern.matcher(string);
public static void main(String[] args) {
while (matcher.find()) {
String val = matcher.group(0).replace("]", "");
System.out.println(val);
}
}
}
output
930
250
72
1539
12

To make sure that the data is actually between square brackets, you could use a capturing group, starting the match with [ and ending it with ]:
\[(?:\d+,\h+)*(\d+)]
In Java
\\[(?:\\d+,\\h+)*(\\d+)]
\[ Match [
(?:\d+,\h+)* Repeat 0+ times matching 1+ digit, comma and 1+ horizontal whitespace chars
(\d+) Capture in group 1 matching 1+ digit
] Match closing square bracket
For example:
String regex = "\\[(?:\\d+,\\h+)*(\\d+)]";
String string = "[72, 216, 930],[250],[72],[228, 1539],[12]";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Result:
930
250
72
1539
12

For a structure such as this, it is likely beneficial to parse the whole thing into memory and then index into the elements you are interested in. Should the structure change unexpectedly, you won't need to rewrite your regex; just index as needed:
import java.util.*;
class Main {
public static void main(String[] args) {
String values = "[72, 216, 930],[250],[72],[228, 1539],[12]";
String[] data = values.substring(1, values.length() - 1).split("\\]\\s*,\\s*\\[");
ArrayList<String[]> result = new ArrayList<>();
for (String d : data) {
result.add(d.split("\\s*,\\s*"));
}
System.out.println(result.get(0)[result.get(0).length-1]); // => 930
System.out.println(result.get(1)[0]); // => 250
}
}

REGEX: Get double (positive or negative) from string [duplicate]

Let's say I have a string like this:
eXamPLestring>1.67>>ReSTOfString
My task is to extract only 1.67 from the string above.
I assume a regex will be useful, but I can't figure out how to write a proper expression.

If you want to extract all ints and floats from a String, you can follow my solution:
private ArrayList<String> parseIntsAndFloats(String raw) {
ArrayList<String> listBuffer = new ArrayList<String>();
Pattern p = Pattern.compile("[0-9]*\\.?[0-9]+");
Matcher m = p.matcher(raw);
while (m.find()) {
listBuffer.add(m.group());
}
return listBuffer;
}
If you also want to parse negative values, you can add [-]? to the pattern like this:
Pattern p = Pattern.compile("[-]?[0-9]*\\.?[0-9]+");
And if you also want to allow , as a separator, you can add ,? to the pattern like this:
Pattern p = Pattern.compile("[-]?[0-9]*\\.?,?[0-9]+");
To test the patterns you can use this online tool: http://gskinner.com/RegExr/
Note: when trying my examples in this tool, remember to unescape them (just take off one \ from each \\).

You could try matching the digits using a regular expression
\\d+\\.\\d+
This could look something like
Pattern p = Pattern.compile("\\d+\\.\\d+");
Matcher m = p.matcher("eXamPLestring>1.67>>ReSTOfString");
while (m.find()) {
System.out.println(Float.parseFloat(m.group()));
}

Here's how to do it in one line:
String f = input.replaceAll("^\\D*?(-?\\d+(?:\\.\\d+)?).*$|^.*$", "$1");
This returns a blank String if no float is found.
If you actually want a float, you can also do it in one line:
float f = Float.parseFloat(input.replaceAll(".*?(-?[\\d.]+).*", "$1"));
but neither a blank string nor unmatched input can be parsed as a float, so if it's possible for there to be no float you have to do it in two steps: extract with the first form, test whether the result is blank, and only then parse.
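The two-step variant might be sketched like this, using find() and an Optional to represent the "no float present" case instead of a blank string. The class name, method name and the exact number pattern are assumptions for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstFloat {
    // Assumed number shape: optional minus sign, digits, optional fraction.
    private static final Pattern FLOAT = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    // Step 1: look for a number. Step 2: parse it only if one was found.
    static Optional<Float> firstFloat(String input) {
        Matcher m = FLOAT.matcher(input);
        return m.find() ? Optional.of(Float.parseFloat(m.group())) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstFloat("eXamPLestring>1.67>>ReSTOfString"));
        System.out.println(firstFloat("no numbers here"));
    }
}
```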

String s = "eXamPLestring>1.67>>ReSTOfString>>0.99>>ahgf>>.9>>>123>>>2323.12";
Pattern p = Pattern.compile("\\d*\\.\\d+");
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(">> "+ m.group());
}
Gives only floats
>> 1.67
>> 0.99
>> .9
>> 2323.12

You can use the regex \d*[.,]?\d+, which works for floats written as 1.0 and as 1,0. Requiring at least one digit keeps the pattern from matching the empty string.
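Note that Float.parseFloat and Double.parseDouble accept only a dot as the decimal separator, so a value matched with a comma has to be normalized before parsing. A minimal sketch, assuming the pattern \d*[.,]?\d+ and that a comma is always a decimal separator rather than a thousands separator (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DecimalSeparators {
    // Optional integer part, optional '.' or ',' separator, at least one digit.
    private static final Pattern NUMBER = Pattern.compile("\\d*[.,]?\\d+");

    static List<Double> parseAll(String text) {
        List<Double> out = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            // Normalize a decimal comma to a dot before parsing.
            out.add(Double.parseDouble(m.group().replace(',', '.')));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parseAll("price 1,5 and 2.25"));
    }
}
```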

Have a look at this link; it also explains a few things that you need to keep in mind when building such a regex.
[-+]?[0-9]*\.?[0-9]+
example code:
String[] strings = new String[3];
strings[0] = "eXamPLestring>1.67>>ReSTOfString";
strings[1] = "eXamPLestring>0.57>>ReSTOfString";
strings[2] = "eXamPLestring>2547.758>>ReSTOfString";
Pattern pattern = Pattern.compile("[-+]?[0-9]*\\.?[0-9]+");
for (String string : strings)
{
Matcher matcher = pattern.matcher(string);
while(matcher.find()){
System.out.println("# float value: " + matcher.group());
}
}
output:
# float value: 1.67
# float value: 0.57
# float value: 2547.758

/**
 * Extracts the first number out of a text.
 * Works for 1.000,1 and also for 1,000.1, returning 1000.1 (1000 plus 1 decimal).
 * When only a , or a . is used, it is assumed to be the decimal separator.
 *
 * @param sample The sample text.
 *
 * @return A float representation of the number, or null if the text contains none.
 */
static public Float extractFloat(String sample) {
    Pattern pattern = Pattern.compile("[\\d.,]+");
    Matcher matcher = pattern.matcher(sample);
    if (!matcher.find()) {
        return null;
    }
    String floatStr = matcher.group();
    if (floatStr.matches("\\d+,+\\d+")) {
        // only commas: treat as decimal separator
        floatStr = floatStr.replaceAll(",+", ".");
    } else if (floatStr.matches("\\d+\\.+\\d+")) {
        // only dots: collapse repeated dots into one
        floatStr = floatStr.replaceAll("\\.\\.+", ".");
    } else if (floatStr.matches("(\\d+\\.+)+\\d+(,+\\d+)?")) {
        // dots as thousands separators, comma as decimal separator
        floatStr = floatStr.replaceAll("\\.+", "").replaceAll(",+", ".");
    } else if (floatStr.matches("(\\d+,+)+\\d+(\\.+\\d+)?")) {
        // commas as thousands separators, dot as decimal separator
        floatStr = floatStr.replaceAll(",", "").replaceAll("\\.\\.+", ".");
    }
    try {
        return Float.valueOf(floatStr);
    } catch (NumberFormatException ex) {
        throw new AssertionError("Unexpected non float text: " + floatStr);
    }
}

check if the text has more than one link

I want to check whether a text contains more than one link.
I started with the following code:
private static void twoOrMorelinks(String commentstr){
String urlPattern = "^.*((?:http|https):\\/\\/\\S+){1,}.*((?:http|https):\\/\\/\\S+){1,}.*$";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.find()) {
System.out.println("yes");
}
}
But the above code is not very elegant, and I am looking for something like the following:
private static void twoOrMorelinks(String commentstr){
String urlPattern = "^.*((?:http|https):\\/\\/\\S+){2,}.*$";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.find()) {
System.out.println("yes");
}
}
But this code does not work. For instance, I expect it to report a match for the following text, but it does not:
They say 2's company watch live on...? http://www.le testin this code http://www.lexilogos.com
Any ideas?

Just use this to count how many links you have:
private static int countLinks(String str) {
int total = 0;
Pattern p = Pattern.compile("(?:http|https):\\/\\/");
Matcher m = p.matcher(str);
while (m.find()) {
total++;
}
return total;
}
Then
boolean hasTwoOrMore = countLinks("They say 2's company watch live on...? http://www.le testin this code http://www.lexilogos.com") >= 2;
If you only want to know whether there are two or more, you can stop counting as soon as the second one is found.
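The early-exit variant could be sketched as follows. The class name, the method name and the threshold parameter n are assumptions for illustration; like countLinks, it counts scheme prefixes rather than validating whole URLs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkCounter {
    private static final Pattern SCHEME =
            Pattern.compile("https?://", Pattern.CASE_INSENSITIVE);

    // Returns true as soon as n scheme prefixes have been seen; stops scanning then.
    static boolean hasAtLeast(String text, int n) {
        Matcher m = SCHEME.matcher(text);
        int found = 0;
        while (found < n && m.find()) {
            found++;
        }
        return found >= n;
    }

    public static void main(String[] args) {
        String text = "They say 2's company watch live on...? "
                + "http://www.le testin this code http://www.lexilogos.com";
        System.out.println(hasAtLeast(text, 2));
    }
}
```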

I suggest using the find method instead of matches, which must check the whole string. I rewrote your pattern to limit the amount of backtracking:
String urlPattern = "\\bhttps?://[^h]*+(?:(?:\\Bh|h(?!ttps?://))[^h]*)*+https?://";
Pattern p = Pattern.compile(urlPattern, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
if (m.find()) {
// true
} else {
// false
}
pattern details:
\\b                      # word boundary
https?://                # scheme for http or https
[^h]*+                   # everything that is not an "h"
(?:
    (?:
        \\Bh             # an "h" not preceded by a word boundary
      |                  # OR
        h(?!ttps?://)    # an "h" not followed by "ttp://" or "ttps://"
    )
    [^h]*
)*+
https?://                # another scheme

Regex to match a number or nothing

I need a regex that can match something like this:
1234 <CIRCLE> 12 12 12 </CIRCLE>
1234 <RECTANGLE> 12 12 12 12 </RECTANGLE>
I came up with this regex:
(\\d+?) <([A-Z]+?)> (\\d+?) (\\d+?) (\\d+?) (\\d*)? (</[A-Z]+?>)
It works fine when I try to match the rectangle, but it doesn't work for the circle.
The problem is that my fifth group is not capturing, though it should be.

Try
(\\d+?) <([A-Z]+?)> (\\d+?) (\\d+?) (\\d+?) (\\d+ )?(</[A-Z]+?>)
(I changed the last "\d" group to make the space optional too.)

That is because only the (\\d*)? part is optional, while the spaces before and after it are mandatory, so when the last (\\d*) matches nothing you still need two consecutive spaces before the closing tag. Try something like
(\\d+?) <([A-Z]+?)> (?:(\\d+?) ){3,4}(</[A-Z]+?>)
Also, if you want to make sure that the closing tag is the same as the opening one, you can use a backreference: \\1 stands for whatever the first group matched, and here the tag name is group 2, so use \\2. You could update your regex to something like
(\\d+?) <([A-Z]+?)> (?:(\\d+?) ){3,4}(</\\2>)
//       ^^^^^^^^^----------------------^^^
//       group 2                        must match the same text as group 2
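A short sketch of how the backreference version behaves in Java follows. The pattern is a slightly simplified variant of the one above (greedy quantifiers, no capture of the individual numbers), and the class and method names are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ShapeLine {
    // id, opening tag (group 2), three or four "number + space" items, closing tag = \2.
    private static final Pattern SHAPE =
            Pattern.compile("(\\d+) <([A-Z]+)> (?:\\d+ ){3,4}</\\2>");

    static boolean isValid(String line) {
        return SHAPE.matcher(line).matches();
    }

    // Returns the tag name, or null when the line does not match.
    static String tag(String line) {
        Matcher m = SHAPE.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(isValid("1234 <CIRCLE> 12 12 12 </CIRCLE>"));
        System.out.println(isValid("1234 <CIRCLE> 12 12 12 </RECTANGLE>"));
    }
}
```

Because the space is part of each repeated item, a line with three numbers needs no extra space before the closing tag, which is the situation that broke the original pattern.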

Solution for just the numbers:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.annotation.Nonnull;
public class Q26005150
{
private static final Pattern P = Pattern.compile("(\\d+)");
public static void main(String[] args)
{
final String s1 = "1234 <CIRCLE> 12 12 12 </CIRCLE>";
final String s2 = "1234 <RECTANGLE> 12 12 12 12 </RECTANGLE>";
final List<Integer> l1 = getAllMatches(s1);
final List<Integer> l2 = getAllMatches(s2);
System.out.println("l1 = " + l1);
System.out.println("l2 = " + l2);
}
private static List<Integer> getAllMatches(@Nonnull final String s)
{
final Matcher m = P.matcher(s);
final List<Integer> matches = new ArrayList<Integer>();
while(m.find())
{
matches.add(Integer.valueOf(m.group(1)));
}
return matches;
}
}
Outputs:
l1 = [1234, 12, 12, 12]
l2 = [1234, 12, 12, 12, 12]

Solution for the numbers and the tags:
private static final Pattern P = Pattern.compile("(</?(\\w+)>|(\\d+))");
public static void main(String[] args)
{
final String s1 = "1234 <CIRCLE> 12 12 12 </CIRCLE>";
final String s2 = "1234 <RECTANGLE> 12 12 12 12 </RECTANGLE>";
final List<String> l1 = getAllMatches(s1);
final List<String> l2 = getAllMatches(s2);
System.out.println("l1 = " + l1);
System.out.println("l2 = " + l2);
}
private static List<String> getAllMatches(@Nonnull final String s)
{
final Matcher m = P.matcher(s);
final List<String> matches = new ArrayList<String>();
while(m.find())
{
final String match = m.group(1);
matches.add(match);
}
return matches;
}
Outputs:
l1 = [1234, <CIRCLE>, 12, 12, 12, </CIRCLE>]
l2 = [1234, <RECTANGLE>, 12, 12, 12, 12, </RECTANGLE>]

Assuming the labels between "<" and ">" have to match and the numbers in between are identical, use this pattern:
^\d+\s<([A-Z]+)>\s(\d+\s)(\2)+<\/(\1)>$
Or, if the numbers in the middle do not have to be identical and/or are optional:
^\d+\s<([A-Z]+)>\s(\d+\s)*<\/(\1)>$

How do you replace groups in a regular expression?

How, exactly, do you replace groups while appending them to a string buffer?
For Example:
(a)(b)(c)
How can you replace group 1 with d, group 2 with e and so on?
I'm working with the Java regex engine.
Thanks in advance.

You could use Matcher's appendReplacement.
Here is an example:
input: "hello bob How is your cat?"
regular expression: "(bob|cat)"
output: "hello alice How is your dog?"
public static void main(String[] args) {
Pattern p = Pattern.compile("(bob|cat)");
Matcher m = p.matcher("hello bob How is your cat?");
StringBuffer s = new StringBuffer();
while (m.find()) {
m.appendReplacement(s, doReplace(m.group(1)));
}
m.appendTail(s);
System.out.println(s.toString());
}
public static String doReplace(String s) {
if(s.equals("bob")) {
return "alice";
}
if(s.equals("cat")) {
return "dog";
}
return "";
}
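On Java 9 or later, the same find/append loop can be written with Matcher.replaceAll taking a function. The sketch below assumes a lookup map in place of doReplace; the class, field and method names are illustrative. Matcher.quoteReplacement guards against replacement text that contains $ or \.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordSwap {
    // Assumed lookup table standing in for doReplace.
    private static final Map<String, String> SWAPS = Map.of("bob", "alice", "cat", "dog");
    private static final Pattern WORDS = Pattern.compile("(bob|cat)");

    static String swap(String input) {
        // Requires Java 9+: replaceAll(Function<MatchResult, String>).
        return WORDS.matcher(input)
                .replaceAll(mr -> Matcher.quoteReplacement(SWAPS.get(mr.group(1))));
    }

    public static void main(String[] args) {
        System.out.println(swap("hello bob How is your cat?"));
    }
}
```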

You could use Matcher#start(group) and Matcher#end(group) to build a generic replacement method:
public static String replaceGroup(String regex, String source, int groupToReplace, String replacement) {
return replaceGroup(regex, source, groupToReplace, 1, replacement);
}
public static String replaceGroup(String regex, String source, int groupToReplace, int groupOccurrence, String replacement) {
Matcher m = Pattern.compile(regex).matcher(source);
for (int i = 0; i < groupOccurrence; i++)
if (!m.find()) return source; // pattern not met, may also throw an exception here
return new StringBuilder(source).replace(m.start(groupToReplace), m.end(groupToReplace), replacement).toString();
}
public static void main(String[] args) {
// replace with "%" what was matched by group 1
// input: aaa123ccc
// output: %123ccc
System.out.println(replaceGroup("([a-z]+)([0-9]+)([a-z]+)", "aaa123ccc", 1, "%"));
// replace with "!!!" what was matched the 4th time by the group 2
// input: a1b2c3d4e5
// output: a1b2c3d!!!e5
System.out.println(replaceGroup("([a-z])(\\d)", "a1b2c3d4e5", 2, 4, "!!!"));
}

Are you looking for something like this?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Program1 {
public static void main(String[] args) {
Pattern p = Pattern.compile("(a)(b)(c)");
String str = "111abc222abc333";
String out = null;
Matcher m = p.matcher(str);
out = m.replaceAll("z$3y$2x$1");
System.out.println(out);
}
}
This gives 111zcybxa222zcybxa333 as output.
The replacement string reorders the groups: $3, $2 and $1 refer to what groups 3, 2 and 1 matched.
As far as I know, though, there is no ready-made built-in method that lets you say, for example:
- replace group 3 with zzz
- replace group 2 with yyy
- replace group 1 with xxx
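Such a helper can be built by hand from Matcher#start(group) and Matcher#end(group). The following is a rough sketch, assuming the groups are not nested and appear left to right in the pattern; the class name, method name and varargs signature are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupReplacer {
    // Replaces group 1 with replacements[0], group 2 with replacements[1], and so on,
    // in every match. Text outside the groups is copied unchanged.
    static String replaceGroups(String regex, String source, String... replacements) {
        Matcher m = Pattern.compile(regex).matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the text already copied
        while (m.find()) {
            for (int g = 1; g <= m.groupCount() && g <= replacements.length; g++) {
                if (m.start(g) < 0) {
                    continue; // group did not participate in this match
                }
                out.append(source, last, m.start(g)).append(replacements[g - 1]);
                last = m.end(g);
            }
        }
        out.append(source, last, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceGroups("(a)(b)(c)", "111abc222abc333", "d", "e", "f"));
    }
}
```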
