Regex find optional entries in Java

I have been working with regular expressions lately and am struggling with them.
I want to build a small script engine, and for that I need to load some presets.
Example:
create <Type> [after;before;at;between(2);<Integer>, <DateTime>, <Date>, <Time>, <String>] : Creator
edit <Type> [after;before;at;between(2);<Integer>, <DateTime>, <Date>, <Time>, <String>]
run [<File>, <Command>]
So I want to make sure I can read <Type> [after;before;at;between(2);<Integer>, <DateTime>, <Date>, <Time>, <String>] and [<File>, <Command>].
The general format is:
NAME <IMPORTANT_PARAMETER> [TEXT_PARAMETER(AMOUNT_OF_OPTIONAL_PARAMETERS);<OPTIONAL_PARAMETER(S)>].
In this example I used 'command names' as IMPORTANT_PARAMETER.
For the first rule I wrote this regex: \<(\w+)\>(?:\s+\[(?:(.*;))(.*)\])?(?:\s+\:\s+(\w+))? and it more or less works in my code:
Pattern p = Pattern.compile("\\<(\\w+)\\>(?:\\s+\\[(?:(.*;))(.*)\\])?(?:\\s+\\:\\s+(\\w+))?");
Matcher m = p.matcher(parameters);
if (m.matches()) {
    Command command2 = new Command(command);
    command2.addParameter(new Parameter(m.group(1)));
    String text = m.group(2);
    String[] texts = null;
    if (text != null) {
        texts = text.split(";");
        command2.addTexts(Arrays.asList(texts));
    }
    String type = m.group(3);
    String[] types = null;
    if (type != null) {
        types = type.split(", ");
        for (String string : types) {
            Pattern pTypes = Pattern.compile("\\<(?:(\\w+))\\>");
            Matcher mTypes = pTypes.matcher(string);
            if (mTypes.matches()) {
                command2.addParameter(new Parameter(mTypes.group(1), true));
            }
        }
    }
    String className = m.group(4);
    if (className != null) {
        command2.addClassName(className);
    }
    commandslist.add(command2);
}
I tried \[\<(\w+)\>(?:,\s+\<(\w+)\>)+\], but it only works for two entries, for example run [<File>, <Command>]. I would prefer to get a "list" of those optional elements [<File>, <Command>], so that in the end m.group(1) = File, m.group(2) = Command, m.group(3) = the next entry, and so on.
I hope I have described my problem well enough; please ask if anything needs more explanation.
Thanks for helping :)

My suggestion is to match the stuff between the words you are after:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        final String STR1 = "run [<File>, <Command1>, <Command2>, <Command3>]";
        final String STR2 = "run [<File>, <Command1>, <Command2>, <Command3>, <Command4>]";
        System.out.println(parse(STR1));
        System.out.println(parse(STR2));
    }
    private static List<String> parse(String str) {
        List<String> list = new ArrayList<>();
        // Either continue right after the previous match (\G) or start after "run [<First>, ".
        Pattern p = Pattern.compile("(?:\\G,\\s+|^run\\s+\\[(?:<\\w+>,\\s+)+?)<(\\w+)>");
        Matcher m = p.matcher(str);
        while (m.find()) {
            list.add(m.group(1));
        }
        return list;
    }
}
which results in the output:
[Command1, Command2, Command3]
[Command1, Command2, Command3, Command4]
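Note that the pattern above skips the first entry (File). If you want every entry in the list, including the first one, a variation like the following might work. This is only a sketch: the class name OptionalEntries and the method name parseAll are made up for illustration, and it assumes the line always starts with "run [".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class OptionalEntries {
    // First alternative: anchor on the "run [" prefix and capture the first entry.
    // Second alternative: \G continues exactly where the previous match ended,
    // so only entries separated by ", " directly after the prefix are accepted.
    private static final Pattern ENTRY =
            Pattern.compile("(?:^run\\s+\\[|\\G,\\s+)<(\\w+)>");

    static List<String> parseAll(String str) {
        List<String> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(str);
        while (m.find()) {
            entries.add(m.group(1));
        }
        return entries;
    }

    public static void main(String[] args) {
        // prints [File, Command1, Command2]
        System.out.println(parseAll("run [<File>, <Command1>, <Command2>]"));
    }
}
```

Because each further match has to start at \G, a malformed entry in the middle stops the scan instead of being silently skipped.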

Related

string break for java

I am coding in Java and have a string like this: [A⋈ (B⋈C)]⋈[D⋈ (E⋈F)]. I want to split it so that I get (B⋈C) and (E⋈F) as separate substrings. How can I do that?
I tried to do it with a regex and String.split, but it does not work for me.
String[] items = str.split(Pattern.quote("(?=-)"));
ArrayList<String> itemList = new ArrayList<String>();
for (String item : items)
{
itemList.add(item);
}
System.out.println(itemList);

You can use this regex: "\\([A-Za-z⋈]*\\)"
String mData = "[A⋈ (B⋈C)]⋈[D⋈ (E⋈F)] ";
Pattern pattern = Pattern.compile("\\([A-Za-z⋈]*\\)");
Matcher m = pattern.matcher(mData);
while (m.find()) {
System.out.println(mData.substring(m.start(), m.end()));
}

I think this should work: try the regex "(.{3})"

Uppercase all characters but not those in quoted strings

I have a String and I would like to uppercase everything that is not quoted.
Example:
My name is 'Angela'
Result:
MY NAME IS 'Angela'
Currently, I am matching every quoted string then looping and concatenating to get the result.
Is it possible to achieve this in one regex expression maybe using replace?

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
String format = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());
System.out.println(input);
System.out.println("----------------------");
System.out.println(format);
Input: 's'Hello This is 'Java' Not '.NET'
Output: 's'HELLO THIS IS 'Java' NOT '.NET'

You could use a regular expression like this:
([^'"]+)(['"]+[^'"]+['"]+)(.*)
# match and capture everything up to a single or double quote (but not including)
# match and capture a quoted string
# match and capture any rest which might or might not be there.
This will only work with one quoted string, obviously.
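To handle any number of quoted strings in a single pass, one option is to tokenize the input into "quoted string" and "everything else" and uppercase only the latter. The following is a rough sketch under a few assumptions: only single quotes delimit strings, there are no escaped quotes, and the class and method names (UppercaseOutsideQuotes, convert) are invented for this example.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from any of the answers.
class UppercaseOutsideQuotes {
    // Group 1: a complete single-quoted string.
    // Otherwise: a run of unquoted text, or a stray (unbalanced) quote character.
    private static final Pattern TOKEN = Pattern.compile("('[^']*')|[^']+|'");

    static String convert(String input) {
        StringBuilder out = new StringBuilder(input.length());
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                out.append(m.group(1));                          // quoted: keep as-is
            } else {
                out.append(m.group().toUpperCase(Locale.ROOT));  // unquoted: uppercase
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints MY NAME IS 'Angela'
        System.out.println(convert("My name is 'Angela'"));
    }
}
```

Every character of the input belongs to exactly one token, so trailing unquoted text is preserved. Text after an unbalanced opening quote is treated as unquoted here, which may or may not be what you want.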

OK, this will do it for you. It is not efficient, but it will work for all cases. I don't actually recommend this solution, as it will be too slow.
public static void main(String[] args) {
    String s = "'Peter' said, My name is 'Angela' and I will not change my name to 'Pamela'.";
    Pattern p = Pattern.compile("('\\w+')");
    Matcher m = p.matcher(s);
    List<String> quotedStrings = new ArrayList<>();
    while (m.find()) {
        quotedStrings.add(m.group(1));
    }
    s = s.toUpperCase();
    // Restore the original casing of each quoted string (case-insensitive match).
    for (String str : quotedStrings)
        s = s.replaceAll("(?i)" + str, str);
    System.out.println(s);
}
Output:
'Peter' SAID, MY NAME IS 'Angela' AND I WILL NOT CHANGE MY NAME TO 'Pamela'.

Adding to the answer by @jan_kiran: we also need to call appendTail() after the loop, otherwise the text following the last match is not copied into the buffer. Updated code:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
regexMatcher.appendTail(sb);
String formatted_string = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());

I had no luck with these solutions, as they seemed to remove trailing non-quoted text.
This code works for me and handles both ' and " by remembering the type of the last opening quotation mark. Replace toLowerCase with toUpperCase as needed, of course.
It may be slow; I have not measured it:
private static String toLowercaseExceptInQuotes(String line) {
    StringBuffer sb = new StringBuffer(line);
    boolean nowInQuotes = false;
    char lastQuoteType = 0;
    for (int i = 0; i < sb.length(); ++i) {
        char cchar = sb.charAt(i);
        if (cchar == '"' || cchar == '\'') {
            if (!nowInQuotes) {
                nowInQuotes = true;
                lastQuoteType = cchar;
            }
            else {
                if (lastQuoteType == cchar) {
                    nowInQuotes = false;
                }
            }
        }
        else if (!nowInQuotes) {
            sb.setCharAt(i, Character.toLowerCase(sb.charAt(i)));
        }
    }
    return sb.toString();
}

Why is Java placing the string before the word and not after?

From the String value I want to get the words before and after each <in>:
String ref = "application<in>rid and test<in>efd";
int result = ref.indexOf("<in>");
int result1 = ref.lastIndexOf("<in>");
String firstWord = ref.substring(0, result);
String[] wor = ref.split("<in>");
for (int i = 0; i < wor.length; i++) {
    System.out.println(wor[i]);
}
My expected output:
String[] output = {application, rid, test, efd}
I tried two options. The first uses indexOf, but if the String contains more than two <in> markers I do not get my expected output.
The second uses split, which also does not give my expected output.
Please suggest the best way to get the words before and after each <in>.

You could use an expression like this: \b([^ ]+?)<in>([^ ]+?)\b. It should match the strings before and after the <in> tag and place them in two groups.
Thus, given this:
String ref = "application<in>rid and test<in>efd";
Pattern p = Pattern.compile("\\b([^ ]+?)<in>([^ ]+?)\\b");
Matcher m = p.matcher(ref);
while(m.find())
System.out.println("Prior: " + m.group(1) + " After: " + m.group(2));
Yields:
Prior: application After: rid
Prior: test After: efd
Alternatively using split:
String[] phrases = ref.split("\\s+");
for(String s : phrases)
if(s.contains("<in>"))
{
String[] split = s.split("<in>");
for(String t : split)
System.out.println(t);
}
Yields:
application
rid
test
efd

Regex is your friend :)
public static void main(String args[]) throws Exception {
String ref = "application<in>rid and test<in>efd";
Pattern p = Pattern.compile("\\w+(?=<in>)|(?<=<in>)\\w+");
Matcher m = p.matcher(ref);
while (m.find()) {
System.out.println(m.group());
}
}
Output:
application
rid
test
efd
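For anyone wondering how the lookaround pattern works, here is a small self-contained sketch that collects the matches into a list. The class and method names (InTokens, wordsAroundIn) are only placeholders for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the lookahead/lookbehind pattern.
class InTokens {
    // \w+(?=<in>)  : a word immediately followed by "<in>" (lookahead, "<in>" is not consumed)
    // (?<=<in>)\w+ : a word immediately preceded by "<in>" (lookbehind, "<in>" is not consumed)
    private static final Pattern WORD_NEXT_TO_IN =
            Pattern.compile("\\w+(?=<in>)|(?<=<in>)\\w+");

    static List<String> wordsAroundIn(String ref) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD_NEXT_TO_IN.matcher(ref);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [application, rid, test, efd]
        System.out.println(wordsAroundIn("application<in>rid and test<in>efd"));
    }
}
```

Words that do not touch an <in> marker, such as "and", match neither alternative and are skipped.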

No doubt matching what you need with the Pattern/Matcher API is simpler for this problem.
However if you're looking for a short and quick String#split solution then you can consider:
String ref = "application<in>rid and test<in>efd";
String[] toks = ref.split("<in>|\\s+.*?(?=\\b\\w+<in>)");
Output:
application
rid
test
efd
This regex splits either on <in>, or on whitespace followed by zero or more characters (matched lazily) up to the next word that is followed by <in>.

You can also try the code below; it is quite simple:
class StringReplace1
{
public static void main(String args[])
{
String ref = "application<in>rid and test<in>efd";
System.out.println((ref.replaceAll("<in>", " ")).replaceAll(" and "," "));
}
}

getting all pid's from pstree

I am trying to get all pid's from pstree -pA <PID> output in linux.
I am working in java and thought about doing it with regular expression.
I attached an example output below:
eclipse(45905)---java(45906)-+-{java}(45907)
|-{java}(45908)
|-{java}(45909)
|-{java}(45910)
|-{java}(45911)
I have written the following code:
private static Pattern PATTERN = Pattern.compile("\\d+");
static List<String> getPidsFromOutput(String output) {
List<String> $ = Lists.newArrayList();
List<String> list = Splitter.on(CharMatcher.anyOf("()\n")).splitToList(output);
for (String string : list) {
Matcher matcher = PATTERN.matcher(string);
if (matcher.matches()) {
$.add(string);
}
}
return $ ;
}
The problem is with processes whose name (i.e. the executed file) is a number: the code picks those up as well, which is a bug.
Do you have any suggestion to fix that, or any other solution?

You should make sure the PID is surrounded by parentheses.
In addition, your code catches threads as well; to avoid them, ignore any entry that has {} around its name.
private static Pattern PATTERN = Pattern.compile(".*[^}]\\((\\d+)\\).*");
private Integer pid;
static Set<String> getPidsFromOutput(String output) {
    Set<String> $ = Sets.newHashSet();
    List<String> list = Splitter.on(CharMatcher.anyOf("\n")).splitToList(output);
    for (String line : list) {
        List<String> perProcess = Splitter.on(CharMatcher.anyOf("-")).splitToList(line);
        for (String p : perProcess) {
            Matcher matcher = PATTERN.matcher(p);
            if (matcher.matches()) {
                $.add(matcher.group(1));
            }
        }
    }
    log.info("pids from pstree: " + $);
    return $;
}

Look for numbers that are surrounded by parentheses:
\((\d+)\)
Since thread names are surrounded by curly braces and only PIDs appear in parentheses, this will capture only the PIDs.
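As a rough illustration of that idea in plain Java (no Guava), the sketch below scans the whole pstree output with find(). The class name PstreePids and both method names are invented here, and the second pattern, which skips threads by rejecting a "}" right before the opening parenthesis, is my own addition rather than something from the answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes output in the format of `pstree -pA <PID>`.
class PstreePids {
    // Any number wrapped in parentheses, e.g. "(45905)".
    private static final Pattern ANY_PID = Pattern.compile("\\((\\d+)\\)");
    // Same, but not directly after "}", which excludes thread entries like "{java}(45907)".
    private static final Pattern PROCESS_PID = Pattern.compile("(?<!\\})\\((\\d+)\\)");

    static List<String> allPids(String output) {
        return extract(ANY_PID, output);
    }

    static List<String> processPids(String output) {
        return extract(PROCESS_PID, output);
    }

    private static List<String> extract(Pattern pattern, String output) {
        List<String> pids = new ArrayList<>();
        Matcher m = pattern.matcher(output);
        while (m.find()) {
            pids.add(m.group(1));
        }
        return pids;
    }

    public static void main(String[] args) {
        String sample = "123(45905)---java(45906)-+-{java}(45907)\n"
                + "                             |-{java}(45908)";
        System.out.println(allPids(sample));     // prints [45905, 45906, 45907, 45908]
        System.out.println(processPids(sample)); // prints [45905, 45906]
    }
}
```

A process whose name is a number (the "123" in the sample) is not picked up, because the name itself is never inside parentheses.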

Java. Regular Expression. How Parse?

As input parameters I can have two types of String:
codeName=SomeCodeName&codeValue=SomeCodeValue
or
codeName=SomeCodeName
without codeValue.
codeName and codeValue are the keys.
How can I use regular expression to return the key's values? In this example it would return only SomeCodeName and SomeCodeValue.

I wouldn't bother with a regex for that. String.split with simple tokens ('&', '=') will do the job.
String[] args = inputParams.split("&");
for (String arg: args) {
String[] split = arg.split("=");
String name = split[0];
String value = split[1];
}
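If you want the split approach to cope with a missing value (or a key without "="), something along these lines could work. It is just a sketch: QueryParams and parse are made-up names, and no URL decoding is attempted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper building on the String.split idea above.
class QueryParams {
    static Map<String, String> parse(String input) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : input.split("&")) {
            // Limit 2 keeps any further '=' characters inside the value.
            String[] kv = pair.split("=", 2);
            // A key without '=' is stored with a null value instead of throwing.
            params.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("codeName=SomeCodeName&codeValue=SomeCodeValue");
        System.out.println(p.get("codeName"));   // prints SomeCodeName
        System.out.println(p.get("codeValue"));  // prints SomeCodeValue
    }
}
```

With the shorter input codeName=SomeCodeName, get("codeValue") simply returns null.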

Consider using Guava's Splitter
String myinput = "...";
Map<String, String> mappedValues =
Splitter.on("&")
.withKeyValueSeparator("=")
.split(myinput);

The simple way is to split the source string first and then run two separate regular expressions against the two parts.
Pattern pCodeName = Pattern.compile("codeName=(.*)");
Pattern pCodeValue = Pattern.compile("codeValue=(.*)");
String[] parts = str.split("\\&");
Matcher m = pCodeName.matcher(parts[0]);
String codeName = m.find() ? m.group(1) : null;
String codeValue = null;
if (parts.length > 1) {
    m = pCodeValue.matcher(parts[1]);
    codeValue = m.find() ? m.group(1) : null;
}
But if you want, you can also use a single pattern:
Pattern p = Pattern.compile("codeName=(\\w+)(\\&codeValue=(\\w+))?");
Matcher m = p.matcher(str);
String codeName = null;
String codeValue = null;
if (m.find()) {
    codeName = m.group(1);
    // Group 2 is the whole optional "&codeValue=..." part; the value itself is group 3,
    // which is null when the optional part is absent.
    codeValue = m.group(3);
}
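A slightly tidier variant of the single-pattern idea makes the optional part a non-capturing group, so the value ends up in group 2. This is a minimal sketch; the class name CodeParams and the choice to return a two-element array are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes keys appear in the order codeName, then codeValue.
class CodeParams {
    // (?:...)? makes "&codeValue=..." optional without adding a capture group for it.
    private static final Pattern PARAMS =
            Pattern.compile("codeName=(\\w+)(?:&codeValue=(\\w+))?");

    /** Returns {codeName, codeValue}; codeValue is null if absent. Returns null if the input does not match. */
    static String[] parse(String input) {
        Matcher m = PARAMS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] both = parse("codeName=SomeCodeName&codeValue=SomeCodeValue");
        System.out.println(both[0] + " / " + both[1]); // prints SomeCodeName / SomeCodeValue
        String[] nameOnly = parse("codeName=SomeCodeName");
        System.out.println(nameOnly[0] + " / " + nameOnly[1]); // prints SomeCodeName / null
    }
}
```

Using matches() instead of find() means the whole input has to fit the expected shape, which avoids silently accepting trailing garbage.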
