How to use regular expression for fetching specific data? - java

I have input stream with the following data:
---------------------------------------------
manil#manil-ubvm:~$ db2level
DB21085I Instance "manil" uses "64" bits and DB2 code release "SQL10010" with
level identifier "0201010E".
Informational tokens are "DB2 v10.1.0.0", "s120403", "LINUXAMD64101", and Fix
Pack "0".
Product is installed at "/home/manil/sqllib".
---------------------------------------------
From the above, I need v10.1.0.0 to be stored in a String variable.
How can I do that using a Java regular expression?

Use something like this to capture the version pattern:
import java.util.regex.*;

public class RTest {
    public static void main(String[] args) {
        String raw_data = "asdkgjasdbf984 sdkjfashfiu 4qwsadkfjnv w98sa-asdf08gywbfsd v1231.123.12.11.1 fkjsdfn9823isd";
        // "v" followed by one or more dot-separated groups of digits
        Pattern version_find = Pattern.compile("v\\d+(\\.\\d+)*");
        Matcher version_finder = version_find.matcher(raw_data);
        while (version_finder.find()) {
            System.out.println(version_finder.group());
        }
    }
}
Output is:
v1231.123.12.11.1
You really need to understand regexes deeply if you are a programmer. They are one of the essentials. They are hard at first, but once you 'crack' them you don't forget them. Like riding a bike.

This will suit your needs:
String version = yourLine.replaceAll(".*(v\\d+([.]\\d+){3}).*", "$1");
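The one-liner above assumes yourLine is a single line, because "." does not match line breaks by default. A minimal sketch of an alternative that searches the whole db2level output with Matcher.find() follows; the class and method names are made up for illustration, and the pattern assumes the version always looks like "v" followed by dot-separated numbers.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Db2VersionExtractor {
    // "v" followed by at least two dot-separated groups of digits, e.g. v10.1.0.0
    private static final Pattern VERSION = Pattern.compile("v\\d+(?:\\.\\d+)+");

    /** Returns the first version-like token in the text, if there is one. */
    static Optional<String> firstVersion(String text) {
        Matcher m = VERSION.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String output = String.join("\n",
            "DB21085I Instance \"manil\" uses \"64\" bits and DB2 code release \"SQL10010\" with",
            "level identifier \"0201010E\".",
            "Informational tokens are \"DB2 v10.1.0.0\", \"s120403\", \"LINUXAMD64101\", and Fix",
            "Pack \"0\".");
        // find() scans across line breaks, so no per-line handling is needed.
        System.out.println(firstVersion(output).orElse("no version found"));
    }
}
```

Returning an Optional makes the "no version in the output" case explicit instead of handing back the unmodified input, which is what replaceAll does when nothing matches.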

You don't need a regular expression here.
Just use the String contains() and substring() methods.
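A rough sketch of that regex-free approach is below. It assumes the version always follows the literal text "DB2 " inside double quotes, as in the sample output; the class name, method name and marker string are illustrative choices, not part of the original answer.

```java
class Db2VersionBySubstring {
    /** Returns the text between "DB2 and the next double quote, or null if absent. */
    static String extractVersion(String text) {
        String marker = "\"DB2 ";
        if (!text.contains(marker)) {
            return null;
        }
        int start = text.indexOf(marker) + marker.length();
        int end = text.indexOf('"', start);
        return end < 0 ? null : text.substring(start, end);
    }

    public static void main(String[] args) {
        String line = "Informational tokens are \"DB2 v10.1.0.0\", \"s120403\", \"LINUXAMD64101\", and Fix";
        System.out.println(extractVersion(line));
    }
}
```

This is more brittle than a regex if the surrounding text ever changes, but it is easy to read and has no pattern syntax to get wrong.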

Related

Manipulate strings in mongodb that match the regex search

In Java this is possible:
public static void main(String[] args) {
    Matcher m = Pattern
            .compile("^(.*?[.].*?[.].*?[.].*?)[.].*")
            .matcher("com.SEM.Google.Generico.space.test");
    if (m.matches()) {
        System.out.println(m.group(1));
    }
}
This would give me as result: com.SEM.Google.Generico
If I have a string in mongodb
"dv" : "com.SEM.Google.Generico.space.test"
can I use the mongo aggregation framework somehow to get com.SEM.Google.Generico as result?
It should be as generic as possible. So not something like
$project: {
    pathString: {
        $substr: ["$path.dv", 0, 23]
    }
}
Is this possible at all?
Thanks.
No, there is no way to do this.
This feature was requested two years ago, but it hasn't been implemented yet (see the open JIRA issue: https://jira.mongodb.org/browse/SERVER-11947).
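If you don't want to use $substr, I guess you should apply the regex to the query results...
var pattern = "Your Regex Pattern"
db.yourCollectionName.find( { "dv": { $regex: pattern} } )
should return the documents whose dv field matches the pattern. Note that $regex in find() only selects documents; it does not extract the matching part of the string.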
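Applying the regex to the query results on the client side could look roughly like the sketch below. It deliberately leaves out the MongoDB driver and assumes the dv values have already been read into a list of strings; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DvPrefixExtractor {
    // Captures the first four dot-separated segments when at least a fifth one follows.
    private static final Pattern FIRST_FOUR =
            Pattern.compile("^((?:[^.]+\\.){3}[^.]+)\\..*");

    /** Returns the first four segments, or the input unchanged if it has four or fewer. */
    static String firstFourSegments(String dv) {
        Matcher m = FIRST_FOUR.matcher(dv);
        return m.matches() ? m.group(1) : dv;
    }

    public static void main(String[] args) {
        // Stand-in for the dv values returned by the query.
        List<String> dvValues = Arrays.asList("com.SEM.Google.Generico.space.test", "a.b");
        List<String> prefixes = dvValues.stream()
                .map(DvPrefixExtractor::firstFourSegments)
                .collect(Collectors.toList());
        System.out.println(prefixes);
    }
}
```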

One line check if String contains bannedSubstrings

I have a String title and a List<String> bannedSubstrings. Now I want to perform a one line check if title is free of those bannedSubstrings.
My approach:
if(bannedSubstrings.stream().filter(bannedSubstring -> title.contains(bannedSubstring)).isEmpty()){
...
}
Unfortunately, there is no isEmpty() method for streams. So how would you solve the problem? Is there a one line solution?
Sounds like you want to read up on anyMatch:
if (bannedSubstrings.stream().anyMatch(title::contains)) {
// bad words!
}
Inversely, there's also noneMatch:
if (bannedSubstrings.stream().noneMatch(title::contains)) {
// no bad words :D
}
This isn't very efficient if title is a long string (but titles usually aren't supposed to be long, I suppose).
If you want an efficient solution and you have many bannedSubstrings, I guess it would be faster to join them into a single regexp like this:
Pattern badWords = Pattern.compile(bannedSubstrings.stream().map(Pattern::quote)
.collect(Collectors.joining("|")));
Then use it like this:
if (badWords.matcher(title).find()) {
...
}
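The idea is that the title is scanned once instead of once per banned substring. Note that java.util.regex is a backtracking engine and is not guaranteed to turn the alternation into a prefix tree, so measure before relying on a speedup. If performance is not a concern in your case, use the other answers.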
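One detail worth checking with the joined-pattern approach: if the list is empty, the joined pattern is the empty regex, which matches every input. A small sketch with a guard for that case follows; the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BannedSubstringPattern {
    /** Returns true if the title contains at least one of the banned substrings. */
    static boolean containsAny(String title, Collection<String> banned) {
        if (banned.isEmpty()) {
            // An empty alternation compiles to "", and "" is found in every string.
            return false;
        }
        Pattern pattern = Pattern.compile(
                banned.stream()
                      .map(Pattern::quote) // treat each substring literally
                      .collect(Collectors.joining("|")));
        return pattern.matcher(title).find();
    }

    public static void main(String[] args) {
        System.out.println(containsAny("Buy cheap pills", Arrays.asList("cheap", "spam")));
        System.out.println(containsAny("Hello world", Collections.<String>emptyList()));
    }
}
```

Pattern.quote matters here because banned substrings such as "C++" would otherwise be read as regex syntax.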
I suppose you are looking for something like this:
if(bannedSubstrings.stream().anyMatch(title::contains)){
}
The answer you've selected is pretty good, but for real performance you'd probably be better off pre-compiling the list of bad words into a regex.
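import java.util.Collection;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BannedWordChecker {
    public final Pattern bannedWords;

    public BannedWordChecker(Collection<String> bannedWords) {
        // Quote each word so it is matched literally, then join into one alternation.
        this.bannedWords =
            Pattern.compile(
                bannedWords.stream()
                    .map(Pattern::quote)
                    .collect(Collectors.joining("|")));
    }

    public boolean containsBannedWords(String string) {
        return bannedWords.matcher(string).find();
    }
}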

Conditional code generation using StringTemplate

In my project, I have a class Device like this:
public class Device {
    private Set<String> abilities = new HashSet<String>();

    public Device(Set<String> abilities) {
        this.abilities = abilities;
    }

    public Set<String> getAbilities() {
        return abilities;
    }
}
I am initializing this Device class with:
Set<String> device1Abilities = new HashSet<String>();
device1Abilities.add("BadgeReader");
device1Abilities.add("TemperatureSensor");
device1Abilities.add("xyz");
Device d1 = new Device(device1Abilities);
In my stringTemplateFile, I am retrieving abilities using
$device.abilities :{ sc | abilities.add("$sc$"); }$
which generates the following code:
abilities.add("BadgeReader");
abilities.add("TemperatureSensor");
abilities.add("xyz");
Now, my requirement is that I do not want to generate this line of code:
abilities.add("xyz");
What condition should I specify in
$device.abilities :{ sc | abilities.add("$sc$"); }$
so that it does not generate that line?
That computation really belongs in the model, so you should filter the list before you pass it to the template. The template should not figure out which data to display. It should display the data that your model says it should display. Hope this helps.
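Filtering in the model before the template sees the data might look like the following sketch. It only shows the filtering step; the class and method names are hypothetical, and how the filtered set is handed to the Device and the template is left as in the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

class AbilityFilter {
    /** Returns a copy of abilities without any entry listed in excluded. */
    static Set<String> withoutExcluded(Set<String> abilities, Set<String> excluded) {
        return abilities.stream()
                .filter(ability -> !excluded.contains(ability))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        Set<String> device1Abilities = new LinkedHashSet<>(
                Arrays.asList("BadgeReader", "TemperatureSensor", "xyz"));
        Set<String> forTemplate =
                withoutExcluded(device1Abilities, Collections.singleton("xyz"));
        // Pass forTemplate (for example via new Device(forTemplate)) to the template.
        System.out.println(forTemplate);
    }
}
```

With this in place the template line from the question can stay unchanged, because "xyz" never reaches it.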
See here. You are using an anonymous sub-template abilities.add("$sc$");. Instead you can use a template call with sc as parameter. And there you can test on "xyz". Though maybe someone with more StringTemplate experience knows a shorter notation.

Java Cut Links from shoutcast stream url

Given the string
[playlist]numberofentries=2File1=http://66.162.107.142/cpr1_K128OV.oggTitle1=KCFR NewsLength1=-1File2=http://66.162.107.141:8000/cpr1_K128OV.oggTitle2=KCFR News BackupLength2=-1Version=2
I want to cut all of the links out of this file. How can I do that?
The following class
package regexpso;

import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        // The dot before "ogg" is escaped so it only matches a literal "."
        Pattern p = Pattern.compile("(http:.*?\\.ogg)");
        Matcher m = p.matcher("[playlist]numberofentries=2File1=http://66.162.107.142/cpr1_K128OV.oggTitle1=KCFR NewsLength1=-1File2=http://66.162.107.141:8000/cpr1_K128OV.oggTitle2=KCFR News BackupLength2=-1Version=2");
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
prints
http://66.162.107.142/cpr1_K128OV.ogg
http://66.162.107.141:8000/cpr1_K128OV.ogg
as result.
Use a regular expression to find and replace the URLs. Be aware this sort of thing is fraught with peril. Post an example of what you want the end result to look like for a better answer. Are all the URLs IP addresses?
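If "cut" means removing the links from the string rather than extracting them, a find-and-replace along these lines might do. This is only a sketch under the assumption that every link starts with http:// or https:// and ends in .ogg; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class PlaylistUrlStripper {
    // Shortest run from the scheme up to and including the literal ".ogg".
    private static final Pattern OGG_URL = Pattern.compile("https?://.*?\\.ogg");

    /** Returns the playlist text with every matching URL removed. */
    static String stripUrls(String playlist) {
        return OGG_URL.matcher(playlist).replaceAll("");
    }

    public static void main(String[] args) {
        String playlist = "[playlist]numberofentries=2"
                + "File1=http://66.162.107.142/cpr1_K128OV.oggTitle1=KCFR NewsLength1=-1"
                + "File2=http://66.162.107.141:8000/cpr1_K128OV.oggTitle2=KCFR News BackupLength2=-1"
                + "Version=2";
        System.out.println(stripUrls(playlist));
    }
}
```

URLs with other extensions or with ".ogg" in the middle of the path would need a stricter pattern, which is part of why this kind of matching is fragile.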

How to replace tokens in a string without StringTokenizer

Given a string like so:
Hello {FIRST_NAME}, this is a personalized message for you.
Where FIRST_NAME is an arbitrary token (a key in a map passed to the method), I need to write a routine that turns that string into:
Hello Jim, this is a personalized message for you.
given a map with an entry FIRST_NAME -> Jim.
It would seem that StringTokenizer is the most straightforward approach, but the Javadocs say you should prefer the regex approach. How would you do that in a regex-based solution?
Thanks everyone for the answers!
Gizmo's answer was definitely out of the box, and a great solution, but unfortunately not appropriate as the format can't be limited to what the Formatter class does in this case.
Adam Paynter really got to the heart of the matter, with the right pattern.
Peter Nix and Sean Bright had a great workaround to avoid all of the complexities of the regex, but I needed to raise some errors if there were bad tokens, which that didn't do.
But in terms of both doing a regex and a reasonable replace loop, this is the answer I came up with (with a little help from Google and the existing answer, including Sean Bright's comment about how to use group(1) vs group()):
private static Pattern tokenPattern = Pattern.compile("\\{([^}]*)\\}");

public static String process(String template, Map<String, Object> params) {
    StringBuffer sb = new StringBuffer();
    Matcher myMatcher = tokenPattern.matcher(template);
    while (myMatcher.find()) {
        String field = myMatcher.group(1);
        myMatcher.appendReplacement(sb, "");
        sb.append(doParameter(field, params));
    }
    myMatcher.appendTail(sb);
    return sb.toString();
}
Where doParameter gets the value out of the map and converts it to a string and throws an exception if it isn't there.
Note also I changed the pattern to find empty braces (i.e. {}), as that is an error condition explicitly checked for.
EDIT: Note that appendReplacement is not agnostic about the content of the string. Per the javadocs, it recognizes $ and backslash as a special character, so I added some escaping to handle that to the sample above. Not done in the most performance conscious way, but in my case it isn't a big enough deal to be worth attempting to micro-optimize the string creations.
Thanks to the comment from Alan M, this can be made even simpler to avoid the special character issues of appendReplacement.
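Another way to sidestep the special handling of $ and backslash in appendReplacement is Matcher.quoteReplacement. The sketch below is self-contained and inlines a lookup that throws on unknown tokens; that behaviour stands in for doParameter, whose real implementation is not shown in the post, and the class name is made up.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateProcessor {
    // Matches {KEY}; empty braces also match so they can be reported as an error.
    private static final Pattern TOKEN = Pattern.compile("\\{([^}]*)\\}");

    /** Replaces every {KEY} with its value; throws if a key is missing from params. */
    static String process(String template, Map<String, ?> params) {
        StringBuffer sb = new StringBuffer();
        Matcher m = TOKEN.matcher(template);
        while (m.find()) {
            String key = m.group(1);
            if (!params.containsKey(key)) {
                throw new IllegalArgumentException("Unknown token: " + m.group());
            }
            // quoteReplacement keeps "$" and "\" in the value from being interpreted.
            String value = String.valueOf(params.get(key));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = Collections.singletonMap("FIRST_NAME", "Jim");
        System.out.println(process(
                "Hello {FIRST_NAME}, this is a personalized message for you.", params));
    }
}
```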
Well, I would rather use String.format(), or better MessageFormat.
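For reference, a minimal MessageFormat sketch is shown below. Note that MessageFormat uses numbered placeholders such as {0} rather than named tokens like {FIRST_NAME}, so the template format itself would have to change; the class and method names are illustrative.

```java
import java.text.MessageFormat;

class MessageFormatExample {
    /** Fills the numbered placeholder {0} with the given first name. */
    static String greet(String firstName) {
        return MessageFormat.format(
                "Hello {0}, this is a personalized message for you.", firstName);
    }

    public static void main(String[] args) {
        System.out.println(greet("Jim"));
    }
}
```

Single quotes are special in MessageFormat patterns, so a literal apostrophe in the template has to be written as two single quotes.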
String result = template.replaceAll("\\{FIRST_NAME\\}", Matcher.quoteReplacement(actualName));
Check out the javadocs for it here.
Try this:
Note: The author's final solution builds upon this sample and is much more concise.
public class TokenReplacer {
    private Pattern tokenPattern;

    public TokenReplacer() {
        tokenPattern = Pattern.compile("\\{([^}]+)\\}");
    }

    public String replaceTokens(String text, Map<String, String> valuesByKey) {
        StringBuilder output = new StringBuilder();
        Matcher tokenMatcher = tokenPattern.matcher(text);
        int cursor = 0;
        while (tokenMatcher.find()) {
            // A token is defined as a sequence of the format "{...}".
            // A key is defined as the content between the brackets.
            int tokenStart = tokenMatcher.start();
            int tokenEnd = tokenMatcher.end();
            int keyStart = tokenMatcher.start(1);
            int keyEnd = tokenMatcher.end(1);
            output.append(text.substring(cursor, tokenStart));
            String token = text.substring(tokenStart, tokenEnd);
            String key = text.substring(keyStart, keyEnd);
            if (valuesByKey.containsKey(key)) {
                String value = valuesByKey.get(key);
                output.append(value);
            } else {
                output.append(token);
            }
            cursor = tokenEnd;
        }
        output.append(text.substring(cursor));
        return output.toString();
    }
}
With import java.util.regex.*:
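Pattern p = Pattern.compile("\\{([^{}]*)\\}"); // braces must be escaped in Java regexes
Matcher m = p.matcher(line); // line being "Hello, {FIRST_NAME}..."
StringBuffer result = new StringBuffer();
while (m.find()) {
    String key = m.group(1);
    // Leave the token untouched when the map has no entry for the key.
    String value = map.containsKey(key) ? map.get(key) : m.group();
    m.appendReplacement(result, Matcher.quoteReplacement(value));
}
m.appendTail(result);
String replaced = result.toString();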
So, the regex is recommended because it can easily identify the places that require substitution in the string, as well as extracting the name of the key for substitution. It's much more efficient than breaking the whole string.
You'll probably want to loop with the Matcher line inside and the Pattern line outside, so you can replace all lines. The pattern never needs to be recompiled, and it's more efficient to avoid doing so unnecessarily.
The most straightforward would seem to be something along the lines of this:
public static void main(String[] args) {
    String tokenString = "Hello {FIRST_NAME}, this is a personalized message for you.";
    Map<String, String> tokenMap = new HashMap<String, String>();
    tokenMap.put("{FIRST_NAME}", "Jim");
    String transformedString = tokenString;
    for (String token : tokenMap.keySet()) {
        transformedString = transformedString.replace(token, tokenMap.get(token));
    }
    System.out.println("New String: " + transformedString);
}
It loops through all your tokens and replaces every token with what you need, and uses the standard String method for replacement, thus skipping the whole RegEx frustrations.
Depending on how ridiculously complex your string is, you could try using a more serious string templating language, like Velocity. In Velocity's case, you'd do something like this:
Velocity.init();
VelocityContext context = new VelocityContext();
context.put( "name", "Bob" );
StringWriter output = new StringWriter();
Velocity.evaluate( context, output, "",
"Hello, $name, this is a personalized message for you.");
System.out.println(output.toString());
But that is likely overkill if you only want to replace one or two values.
import java.util.HashMap;

public class ReplaceTest {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<String, String>();
        map.put("FIRST_NAME", "Jim");
        map.put("LAST_NAME", "Johnson");
        map.put("PHONE", "410-555-1212");
        String s = "Hello {FIRST_NAME} {LAST_NAME}, this is a personalized message for you.";
        for (String key : map.keySet()) {
            s = s.replaceAll("\\{" + key + "\\}", map.get(key));
        }
        System.out.println(s);
    }
}
The docs mean that you should prefer writing a regex-based tokenizer, IIRC. What might work better for you is a standard regex search-replace.
Generally we'd use MessageFormat in a case like this, coupled with loading the actual message text from a ResourceBundle. This gives you the added benefit of being localization (L10N) friendly.
