cannot parse String with Java Regex


I have a string formatted as below:
source1.type1.8371-(12345)->source2.type3.3281-(38270)->source4.type2.903..
It's a path; the number in parentheses is the weight of the edge. I tried to split it using Java's Pattern with the following expressions:
[a-zA-Z.0-9]+-{1}({1}\\d+){1}
[a-zA-Z_]+.[a-zA-Z_]+.(\\d)+-(\\d+)
[a-zA-Z.0-9]+-{1}({1}\\d+){1}-{1}>{1}
I expected it to split the string into fields like
source1.type1.8371-(12345)
source2.type3.3281-(38270)
..
but none of them work; each one always returns the whole string as a single field.

It looks like you just want String.split("->") (javadoc). This splits on the symbol -> and returns an array containing the parts between ->.
String str = "source1.type1.8371-(12345)->source2.type3.3281-(38270)->source4.type2.903..";
for (String s : str.split("->")) {
    System.out.println(s);
}
Output
source1.type1.8371-(12345)
source2.type3.3281-(38270)
source4.type2.903..

It seems to me that you want to split at the ->'s, so you could use something like str.split("->"). If you were more specific about why you need this, maybe we could understand why you were trying to use those complicated regexes.
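If the node name and the edge weight are needed separately, a regex can still do the job. The attempts in the question likely fail because the unescaped ( and ) are treated as grouping constructs rather than literal parentheses. The following is only a minimal sketch under that assumption; the class name PathEdges and the method parse are made up for illustration, and the character class for node names is a guess based on the sample string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathEdges {
    // Group 1: node name (letters, digits, '_' and '.').
    // Then a literal "-(", group 2: the weight digits, and a literal ")".
    private static final Pattern EDGE =
            Pattern.compile("([A-Za-z0-9_.]+)-\\((\\d+)\\)");

    // Returns one {node, weight} pair per "node-(weight)" segment found.
    static List<String[]> parse(String path) {
        List<String[]> edges = new ArrayList<>();
        Matcher m = EDGE.matcher(path);
        while (m.find()) {
            edges.add(new String[] { m.group(1), m.group(2) });
        }
        return edges;
    }

    public static void main(String[] args) {
        String path = "source1.type1.8371-(12345)->source2.type3.3281-(38270)->source4.type2.903..";
        for (String[] edge : parse(path)) {
            System.out.println(edge[0] + " weight=" + edge[1]);
        }
    }
}
```

The final node in the sample has no weight, so this sketch does not return it; str.split("->") remains the simpler choice when only the segments are needed.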

Related

Splitting string with similar starting pattern

So, I've been trying to split something I'm reading from a file. But everything that I've tried does not give me only the part that I want.
What I have as string is this:
Scenario:
Bunch of stuf here
Just typing stuff for the example...
Scenario:
More stuff here
A lot more stuff here
XX123
I want to get everything from 'Scenario:' to 'XX123'
Like this:
Scenario:
More stuff here
A lot more stuff here
XX123
The file that I'm reading from has a lot of those 'Scenario:' blocks, and using Java's Pattern doesn't give me only the part that I want. Instead, it matches from the first 'Scenario:' it finds all the way to 'XX123'.
I also tried to use StringUtils.substringBetween, same result.
Thanks in advance

The old-fashioned way to do it would look something like this:
String inputText = /* text read from the file */;
String END_MARKER = "XX123";
// find the end marker first
int indexOfEnd = inputText.indexOf(END_MARKER);
// then search backwards from it for the nearest preceding "Scenario:"
int indexOfScenario = inputText.lastIndexOf("Scenario:", indexOfEnd);
String result = inputText.substring(indexOfScenario,
        indexOfEnd + END_MARKER.length());
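If a regex is preferred, one possible approach is to forbid the match from crossing another 'Scenario:' header by using a negative lookahead. This is a sketch rather than a tested solution for the real file; the class name LastScenario and the method extract are hypothetical, and the markers are taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastScenario {
    // (?s) lets '.' match line breaks. The negative lookahead refuses to
    // consume a character at which another "Scenario:" header begins, so the
    // match can only start at the header closest to the end marker.
    private static final Pattern BLOCK =
            Pattern.compile("(?s)Scenario:(?:(?!Scenario:).)*?XX123");

    // Returns the block from the nearest "Scenario:" up to and including
    // "XX123", or null when no such block exists.
    static String extract(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Scenario:\nBunch of stuf here\nJust typing stuff for the example...\n"
                + "Scenario:\nMore stuff here\nA lot more stuff here\nXX123";
        System.out.println(extract(text));
    }
}
```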

How to replace xml empty tags using regex

I have a lot of empty XML tags that need to be removed from a string.
String someData = dealDataWriter.toString();
someData = someData.replaceAll("<somerandomField1/>", "");
someData = someData.replaceAll("<somerandomField2/>", "");
someData = someData.replaceAll("<somerandomField3/>", "");
someData = someData.replaceAll("<somerandomField4/>", "");
This uses a lot of string operations, which is not efficient. What would be a better way to avoid these operations?

I would not suggest using regex when operating on HTML/XML, but for a simple case like yours it may be fine to use a rule like this one:
someData = someData.replaceAll("<\\w+?\\/>", "");
If you also want to allow optional spaces before and after the tag name:
someData = someData.replaceAll("<\\s*\\w+?\\s*\\/>", "");

Try the following code. It removes every self-closing tag that contains no spaces.
someData = someData.replaceAll("<\\w+/>", "");

Alternatively to using regex or string matching, you can use an xml parser to find empty tags and remove them.
See the answers given over here: Java Remove empty XML tags

If you would like to remove <tagA></tagA> as well as <tagB/>, you can use the following regex. Note that \1 is a backreference to the first capturing group.
// identifies empty tags, i.e. <tag1></tag1> or <tag/>
// it also allows whitespace around or within the tag; however, tags whose value is only whitespace will not match.
private static final String EMPTY_VALUED_TAG_REGEX = "\\s*<\\s*(\\w+)\\s*></\\s*\\1\\s*>|\\s*<\\s*\\w+\\s*/\\s*>";
Run the code on ideone
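To show how such a regex might be applied in a single pass, here is a minimal sketch that compiles the pattern once and reuses it. The class name EmptyTagRemover and the method strip are hypothetical; the pattern is the one given above. Because \w+ is used for the tag name, tags with attributes or namespace prefixes are not matched.

```java
import java.util.regex.Pattern;

class EmptyTagRemover {
    // Compiled once and reused, instead of one replaceAll call per tag name.
    private static final Pattern EMPTY_TAG = Pattern.compile(
            "\\s*<\\s*(\\w+)\\s*></\\s*\\1\\s*>|\\s*<\\s*\\w+\\s*/\\s*>");

    // Removes <tag></tag> and <tag/> occurrences, together with any
    // whitespace directly in front of them.
    static String strip(String xml) {
        return EMPTY_TAG.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<root><a/><b></b><c>text</c><d /></root>"));
    }
}
```

A single pass does not remove a parent element that only becomes empty after its children are stripped; an XML parser is the more robust option for cases like that.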

Java: reading a string in a particular format

I am not posting the code I am stuck with. I am trying to do this in Java:
Issue:
I have words like:
,xxxx-1223
yyyyy,xxdd-345
$,xxxxr-7
sdsdsdd-18
So whatever format I have, I should be able to read the last part:
xxxx-1223
xxdd-345
xxxxr-7
sdsdsdd-18
Whatever the words may be, all I need is to get the words as shown.

Use String#lastIndexOf(int) to find where the last comma occurs, and use String#substring(int) to get the rest of the string that follows.
String input = /* whatever */;
int lastComma = input.lastIndexOf(',');
String output = input.substring(lastComma + 1);

String[] str = yourWord.split(",");
String output = str[str.length - 1];

You can use this regex:
(\\w+-\\d+)$
Or this specific problem can simply be solved using the String.split() or String.substring(int) methods.
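For completeness, the regex above could be applied with a Matcher roughly as follows. This is a sketch only; the class name LastToken and the method extract are made up for illustration, and the inputs are the samples from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastToken {
    // Word characters, a hyphen, then digits, anchored at the end of the input.
    private static final Pattern TAIL = Pattern.compile("(\\w+-\\d+)$");

    // Returns the trailing "word-number" token, or null if there is none.
    static String extract(String input) {
        Matcher m = TAIL.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = { ",xxxx-1223", "yyyyy,xxdd-345", "$,xxxxr-7", "sdsdsdd-18" };
        for (String sample : samples) {
            System.out.println(extract(sample));
        }
    }
}
```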

Add exact number of spaces to start of string using java.util.Formatter

I am using Formatter to output Java code to a file. I want to add a specific number of spaces to the start of each line. My problem is I cannot find a way to do this "neatly". The standard options seem to only allow adding a minimum number of spaces but not a specific number of spaces.
As a workaround, I am currently doing the following:
out.format("%7s%s", "", "My text"); but I'd like to do it with only two arguments, like this: out.format("%7s", "My text");.
Does anyone know if there is a way to do this using the standard Formatter options?

I'm not exactly sure what you want here:
out.format("xxx%10sxxx", "My text");
// prints: xxx   My textxxx
While:
out.format("xxx%-10sxxx", "My text");
// prints: xxxMy text   xxx
As far as I know, Java's Formatter has no equivalent of the old C-style "%*s", which takes the field width as an argument; with it you could have passed in (str.length() + 7).
I'm afraid that your way seems to be the most "neat". If you can explain why you don't like it, maybe we can find a better workaround.
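One possible workaround for the missing "%*s" is to build the width into the format string at run time. The sketch below assumes that approach; the class name IndentDemo and the method indent are hypothetical and not part of java.util.Formatter.

```java
class IndentDemo {
    // Prefixes text with exactly `spaces` blanks by asking for a field that is
    // text.length() + spaces wide; %Ns right-justifies and pads with spaces.
    static String indent(String text, int spaces) {
        if (spaces <= 0) {
            return text; // a width of 0 is not a valid format width
        }
        String format = "%" + (text.length() + spaces) + "s";
        return String.format(format, text);
    }

    public static void main(String[] args) {
        System.out.println("[" + indent("My text", 7) + "]");
    }
}
```

The same format string can be passed to a Formatter's format method, since String.format uses the same format syntax.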

You can prepend text to your string.
Another way is to repeat a string; for that you can use this code:
String str = "abc";
String repeated = StringUtils.repeat(str, 3);
Here StringUtils is the org.apache.commons.lang3.StringUtils class.

Use Commons Lang
String line = "Hello World!";
int numberOfSpaces = 2;
String lineWithSpacePadding = StringUtils.leftPad(line, line.length() + numberOfSpaces);

Java regular expression for extracting the data between tags

I am trying to write a regular expression which extracts the data from a string like
<B Att="text">Test</B><C>Test1</C>
The extracted output needs to be Test and Test1. This is what I have done so far:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {
    public static void main(String[] args) {
        String s = "<B>Test</B>";
        String reg = "<.*?>(.*)<\\/.*?>";
        Pattern p = Pattern.compile(reg);
        Matcher m = p.matcher(s);
        while (m.find()) {
            String s1 = m.group();
            System.out.println(s1);
        }
    }
}
But this is producing the result <B>Test</B>. Can anybody point out what I am doing wrong?

Three problems:
Your test string is incorrect.
You need a non-greedy modifier in the group.
You need to specify which group you want (group 1).
Try this:
String s = "<B Att=\"text\">Test</B><C>Test1</C>"; // <-- Fix 1
String reg = "<.*?>(.*?)</.*?>"; // <-- Fix 2
// ...
String s1 = m.group(1); // <-- Fix 3
You also don't need to escape a forward slash, so I removed that.
See it running on ideone.
(Also, don't use regular expressions to parse HTML - use an HTML parser.)
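Putting the three fixes together, a complete version might look like the sketch below. The class name TagText and the method extract are made up for illustration; the pattern and the test string are the ones from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagText {
    // Lazy quantifiers keep each match inside a single open/close tag pair.
    private static final Pattern TAG = Pattern.compile("<.*?>(.*?)</.*?>");

    // Collects the text between each opening tag and the next closing tag.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            values.add(m.group(1)); // group 1 is the text, group 0 the whole match
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("<B Att=\"text\">Test</B><C>Test1</C>"));
    }
}
```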

If you are using Eclipse, there is a nice plugin that helps you check your regular expressions without writing a class to test them.
Here is the link:
http://regex-util.sourceforge.net/update/
You will need to show the view by choosing Window -> Show View -> Other, and then Regex Util.
I hope it helps you in fighting with regular expressions.

It almost looks like you're trying to use regex on XML and/or HTML. I'd suggest not using regex and instead creating a parser or lexer to handle this type of arrangement.

I think the best way to handle and get the values of XML nodes is to treat the input as XML.
If you really want to stick to regex, try:
<B[^>]*>(.+?)</B\s*>
bearing in mind that this will only ever give you the value of the B tag.
Or, if you want the value of any tag, use something like:
<.*?>(.*?)</.*?>

