regex or string parsing

I am trying to parse a string which has a specific pattern. An example valid string is as follows:
<STX><DATA><ETX>
<STX>A?123<ETX>
<STX><DATA><ETX>
<STX>name!xyz<ETX>
<STX>age!27y<ETX>
<STX></DATA><ETX>
<STX>A?234<ETX>
<STX><DATA><ETX>
<STX>name!abc<ETX>
<STX>age!24y<ETX>
<STX></DATA><ETX>
<STX>A?345<ETX>
<STX><DATA><ETX>
<STX>name!bac<ETX>
<STX>age!22y<ETX>
<STX></DATA><ETX>
<STX>OK<ETX>
<STX></DATA><ETX>
This data is sent by a device. All I need is to parse this string into records such as id:123, name:xyz, age:27y.
I am trying to use this regex:
final Pattern regex = Pattern.compile("<DATA>(.*?)</DATA>", Pattern.DOTALL);
This does output the required data:
<ETX>
<STX>A?123<ETX>
<STX><DATA><ETX>
<STX>name!xyz<ETX>
<STX>age!27y<ETX>
<STX>
How can I loop over the string and copy all matches into a list of strings?
I am trying to loop over it and delete each extracted match, but the match doesn't get deleted.
final Pattern regex = Pattern.compile("<DATA>(.*?)</DATA>", Pattern.DOTALL); // Q?(.*?)
final StringBuffer buff = new StringBuffer(frame);
final Matcher matcher = regex.matcher(buff);
while (matcher.find()) {
    final String dataElements = matcher.group();
    System.out.println("Data:" + dataElements);
}
Are there any better ways to do this?
This is the output I am currently getting:
Data:<DATA><ETX><STX>A?123<ETX><STX><DATA><ETX><STX>name!xyz<ETX><STX>age!27y<ETX><STX> </DATA>
Data:<DATA><ETX><STX>name!abc<ETX><STX>age!24y<ETX><STX></DATA>
Data:<DATA><ETX><STX>name!bac<ETX><STX>age!22y<ETX><STX></DATA>
I am missing the A?234 and A?345 in the next two matches.

I really don't know exactly what you want to achieve by this, but if you want to remove the occurrences of that pattern, this line:
buff.toString().replace(dataElements, "")
doesn't look good. You are only editing a String copy of buff: replace returns a new string and leaves the buffer untouched. You have to write the edited string back into buff, or delete from buff directly.
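A minimal sketch of that point, assuming the goal is simply to drop every <DATA>...</DATA> block from the frame. The class and method names are made up for illustration. The key detail is that the result of the replacement has to be kept, because neither String.replace nor Matcher.replaceAll modifies its input.

```java
import java.util.regex.Pattern;

final class DataBlockRemover {
    // Same pattern as in the question; DOTALL lets '.' span line breaks.
    private static final Pattern DATA =
            Pattern.compile("<DATA>(.*?)</DATA>", Pattern.DOTALL);

    // Returns a new string with every <DATA>...</DATA> block removed.
    static String stripDataBlocks(String frame) {
        return DATA.matcher(frame).replaceAll("");
    }

    // Same idea for a StringBuffer: the buffer only changes if the
    // stripped text is written back into it.
    static void stripDataBlocks(StringBuffer buff) {
        String stripped = stripDataBlocks(buff.toString());
        buff.setLength(0);
        buff.append(stripped);
    }
}
```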

Using this regex solves my issue:
<STX>(A*)(.*?)<DATA>(.*?)</DATA>
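For the original goal (one id/name/age record per device entry), a sketch along the same lines could look like this. It assumes every record has exactly the shape shown in the question: an A?<id> frame, an opening <DATA> frame, a name! frame, an age! frame, and a closing </DATA> frame, with optional whitespace between frames. FrameRecordParser and the output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FrameRecordParser {
    // Group 1: id, group 2: name, group 3: age.
    // \s* between frames tolerates line breaks or spaces in the raw data.
    private static final Pattern RECORD = Pattern.compile(
            "<STX>A\\?(\\d+)<ETX>\\s*<STX><DATA><ETX>\\s*"
            + "<STX>name!(.*?)<ETX>\\s*<STX>age!(.*?)<ETX>\\s*<STX>\\s*</DATA><ETX>");

    // Collects one formatted line per record found in the frame.
    static List<String> parse(String frame) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(frame);
        while (m.find()) {
            records.add("id:" + m.group(1) + " name:" + m.group(2) + " age:" + m.group(3));
        }
        return records;
    }
}
```

Because the pattern starts at the A? frame, the outer <DATA> wrapper and the trailing OK frame are simply skipped, so no deletion from the buffer is needed.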

Related

Regex Redirect URL excludes token

I'm trying to create a redirect URL for my client. We have a service where you specify "fromUrl" -> "toUrl", and it uses a Java regex Matcher. But I can't get it to include the token when it converts the URL. For example:
/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
Should be:
/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
but it excludes the token so the result I get is:
/fromurl/login/
/tourl/login/
I tried various regex patterns such as ?.* and [%5E//?]+)/([^/?]+)/(?.*)?$ and (/*), but none of them seems to work.
I'm not that familiar with regex. How can I solve this?

This can be easily done using simple string replace but if you insist on using regular expressions:
Pattern p = Pattern.compile("fromurl");
String originalUrlAsString = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String newRedirectedUrlAsString = p.matcher(originalUrlAsString).replaceAll("tourl");
System.out.println(newRedirectedUrlAsString);

If I understand you correctly you need something like this?
String from = "/my/old/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceAll("\\/(.*)\\/", "/my/new/url/");
System.out.println(to); // /my/new/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
This will replace everything between the first and the last forward slash.

Can you describe in more detail what the original URLs look like? This is necessary because the regular expression depends on it.
If only the first occurrence of fromurl needs to be replaced, the following code is enough:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceFirst("fromurl", "tourl");
But if it is necessary to use more complex rules to determine the substring to replace, you can use:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = "";
String regularExpresion = "(<<pre>>)(fromurl)(<<pos>>)";
Pattern pattern = Pattern.compile(regularExpresion);
Matcher matcher = pattern.matcher(from);
if (matcher.matches()) {
to = from.replaceAll(regularExpresion, "$1tourl$3");
}
NOTE: <<pre>> and <<pos>> are placeholders, because I don't know the real structure of the URL.
NOTE 2: $1 and $3 refer to the first and the third group.
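To make the group-reference idea concrete, here is a sketch under the assumption that the path always starts with /fromurl and that the query string, if any, begins at the first '?'. RedirectRewriter and the exact pattern are illustrative, not the configuration format of the redirect service from the question.

```java
import java.util.regex.Pattern;

final class RedirectRewriter {
    // Group 1: the rest of the path after /fromurl (may be absent).
    // Group 2: the query string including '?' (may be absent).
    private static final Pattern FROM =
            Pattern.compile("^/fromurl(/[^?]*)?(\\?.*)?$");

    static String rewrite(String url) {
        // A group that did not participate in the match contributes nothing
        // to the replacement; a URL that does not match is returned unchanged.
        return FROM.matcher(url).replaceFirst("/tourl$1$2");
    }
}
```

Capturing the query string in its own group is what keeps the token: whatever follows the '?' is copied into the result verbatim.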

Although the existing answers should solve the issue and some are similar, the solution below may also help. It uses a fairly simple regex (assuming your input always has the same format as your example):
private static String replaceUrl(String inputUrl){
String regex = "/.*(/login\\?token=.*)";
String toUrl = "/tourl";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(inputUrl);
if (matcher.find()) {
return toUrl + matcher.group(1);
} else
return null;
}
You can write a test to check that it works for other expected inputs and outputs if you change the format and adjust the regex:
String inputUrl = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String expectedUrl = "/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
if (expectedUrl.equals(replaceUrl(inputUrl))){
System.out.println("Success");
}

How to remove an id out of a path using a Java Regex?

I am trying to get rid of an "id" in URI paths and I can only use Java regex transformation.
The paths look like this:
/web/service/1223345/add
/web/service/1223345/delete
/web/service/v2/1223345/add
/web/service/1223345
/web/service/do
The id is always a series of numbers. In the example above it is "1223345".
I have tried a couple of regexes but none of them worked. Here are my tries:
(/\w.*)/?[0-9]*/(.*)
([^0-9]+){0,}
(/.*/)[0-9]*(/.*)
Thanks for your help

String input = "/web/service/1223345/add";
System.out.println(input.replaceAll("/\\d*/","/"));
Output:
/web/service/add

If you are after removing id, you could do the following:
String input = "/web/service/v2/1223345/add";
String removed = input.replaceAll("/\\d*/?", "/");
System.out.println(removed);
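Note that arnoud's regex "/\d*/" will not match when the id is the last segment, e.g. /web/service/1223345.
The question mark at the end of the regex takes care of such cases: "/\d*/?". In that case the result keeps a trailing slash (/web/service/).
If, on the other hand, you are after extracting the id: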
Pattern pattern = Pattern.compile(".*?/(\\d*?)(/.*)?$");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
String id = matcher.group(1);
System.out.println(id);
}
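A variant that avoids the trailing slash is sketched below. It assumes the id is any path segment made up entirely of digits; PathIdRemover is a made-up name. It uses \d+ so that an empty segment is never treated as an id, and a lookahead so that the slash after the id is left in place.

```java
final class PathIdRemover {
    // Removes every path segment that consists only of digits.
    static String removeId(String path) {
        // "/\\d+" matches a slash followed by digits; the lookahead requires
        // that the digits end the segment (next char is '/' or end of input),
        // so segments like "v2" are left alone.
        return path.replaceAll("/\\d+(?=/|$)", "");
    }
}
```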

regex: Java: match word between 2 spaces

How can I extract the "id" from the following string using regex?
string = 11,"col=""book"" id=""title"" length=""10""
I need to be able to extract the "id" header along with the value "title".
outcome: id=""title""
I am trying to use the split function with a regex to extract the identifier from the string.

Try this:
String result = "col=\"book\" id=\"title\" length=\"10\"";
String pattern = ".*(id\\s*=\\s*\"[^\"]*\").*";
System.out.println(result.replaceAll(pattern,"$1"));
Cheers!

Use the Pattern and Matcher classes to find what you are looking for. Try the regex \\bid=[^ ]*.
String data = "string = 11,\"col=\"\"book\"\" id=\"\"title\"\" length=\"\"10\"\"";
Matcher m = Pattern.compile("\\bid=[^ ]*").matcher(data);
if (m.find())
System.out.println(m.group());
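If only the bare value is needed rather than the whole id=""title"" token, a capture group can pull it out. This is a sketch that assumes the value itself contains no quote characters; IdAttribute is a hypothetical helper name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IdAttribute {
    // id= followed by one or more quotes, the value (group 1),
    // then one or more closing quotes. \b keeps "valid=" from matching.
    private static final Pattern ID = Pattern.compile("\\bid=\"+([^\"]*)\"+");

    // Returns the bare value, or null if there is no id attribute.
    static String idValue(String data) {
        Matcher m = ID.matcher(data);
        return m.find() ? m.group(1) : null;
    }
}
```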

How to extract word from string?

Suppose I have a string:
String message = "you should try http://google.com/";
Now, I want to send "http://google.com/" to a new
String url
What I want to do is:
check if a "word" in the string begins with "http://" and extract that word, where a word is
something that's surrounded by spaces (general english definition of word).
I have no idea how to extract the string, and the best I can do is use startsWith on the string. How do I use startsWith on a word, and extract the word?
Sorry if this is a little bit difficult to explain.
Thanks in advance!
EDIT: Also, what should I do to extract the word from the REGEX operation? And how should I handle it if there is more than 1 url in the string?

Use Pattern & Matcher classes.
String str = "blabla http://www.mywebsite.com blabla";
String regex = "((https?:\\/\\/)?(www.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_/.0-9#:+?%=&;,]*)?)?)";
Matcher m = Pattern.compile(regex).matcher(str);
if (m.find()) {
String url = m.group(); //value "http://www.mywebsite.com"
}
This regex will work for http://..., https://... and even www... URLs. Other regexes can easily be found on the net.

You can try this:
String str = "blabla http://www.mywebsite.com blabla";
Matcher m = Pattern.compile("(http://.*)").matcher(str);
if (m.find()) {
String url = (new StringTokenizer(m.group(), " ")).nextToken();
}

The "correct" way to perform this task is to split the String on whitespace (String#split("\\s+")) and then pass each word to the URL constructor. If a word starts with your prefix and a MalformedURLException is thrown, it is invalid. The URL class constructor is far better tested and more robust than any solution that you or I could come up with. So please use it, and don't reinvent the wheel.
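A sketch of the split-and-validate approach, returning every URL-like word so that messages with more than one URL are handled too. UrlWords is a made-up name. As a deliberate swap, it goes through java.net.URI and toURL() instead of calling the URL constructor directly, because the URL constructors are deprecated in recent JDKs; the validation idea is the same.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

final class UrlWords {
    // Returns all whitespace-separated words that start with http://
    // and parse as a URL, in the order they appear.
    static List<String> extractUrls(String message) {
        List<String> urls = new ArrayList<>();
        for (String word : message.trim().split("\\s+")) {
            if (!word.startsWith("http://")) {
                continue;
            }
            try {
                new URI(word).toURL(); // throws if the word is not a valid URL
                urls.add(word);
            } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
                // Not a usable URL: skip this word.
            }
        }
        return urls;
    }
}
```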

You can use Java Regex for this:
The following regex matches any string starting with http:// or https:// up to the next whitespace character:
Pattern urlPattern = Pattern.compile("(https?://\\S+)");
Matcher matcher = urlPattern.matcher(myString);
if (matcher.find()) {
String url = matcher.group();
}

How can i derive specific data from the string?

I have the following string and I want to extract the number (104321) from the a href tag. How can I extract this number?
Hello this is testing string Ap<img src=\"Image Url" width=\"222\" height=\"149\"/><br/><br/>test\u00e4n p\u00e4\u00e4ll\u00e4 test, test\u00e4, test?
I want the final output to be like this:
String[] strExample= {"testing", "104321","test\u00e4n p\u00e4\u00e4ll\u00e4 test, test\u00e4, test?"};
Any help is appreciated.

You could try a simple Pattern matcher with the regexp:
String THE_PATTERN = "<a\\s+href\\s*=\\s*\"/([a-zA-Z]+)/([0-9]+)";
Matcher m = Pattern.compile(THE_PATTERN).matcher(THE_INPUT_STRING);
String[] results = new String[2];
if (m.find()) {
results[0] = m.group(1);
results[1] = m.group(2);
}
Haven't tried it though, so there could be small/easy-to-fix errors.
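A runnable version of this idea is sketched below. The anchor tag is missing from the string as posted, so the input used in the usage note is an assumption based on the expected output ("testing", "104321"). HrefParts is a hypothetical name, and the pattern additionally tolerates a backslash-escaped quote (href=\") as seen in the posted string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefParts {
    // Group 1: first path segment (letters), group 2: numeric id.
    // "\\\\?" allows an optional literal backslash before the quote.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+href\\s*=\\s*\\\\?\"/([a-zA-Z]+)/([0-9]+)");

    // Returns {segment, id}, or an empty array if no matching link is found.
    static String[] parts(String html) {
        Matcher m = HREF.matcher(html);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

With an assumed input such as Hello <a href="/testing/104321">link</a>, parts returns the two values testing and 104321.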

For that single case
String[] strExample = str.split("^.+?\\\"/|\\\\\">.+<br/>|/");
will work. It will break if the string you want to parse changes much, though. Some more examples would help if there are more patterns you need to account for.
