Java Regular expression with multi variable and arrayList of string

I created this Java method:
public String isInTheList(List<String> listOfStrings)
{
/*
* Iterates through the list, and if the list contains the input of the user,
* it will be returned.
*/
for(String string : listOfStrings)
{
if(this.answer.matches("(?i).*" + string + ".*"))
{
return string;
}
}
return null;
}
I use this method in a while block in order to validate user input. I want to check if that input matches the concatenation of two different predefined ArrayLists of Strings.
The format of the input must be like this:
(elementOfThefirstList + " " + elementOfTheSecondList)
where the Strings elementOfThefirstList and elementOfTheSecondList are both elements from their respective list.
for(int i = 0; i < firstListOfString.size(); i++)
{
if(userInput.contains(firstListOfString.get(i) + " " + userInput.isInTheList(secondListOfString)))
{
isValid = true;//condition for exit from the while block
}
}
It works if the user input is like this:
elementOfThefirstList + " " + elementOfTheSecondList
However, it will also work if the user input is like this:
elementOfThefirstList + " " + elementOfTheSecondList + " " + anotherElementOfTheFirstList
How can I modify my regular expression, as well as my method, in order to have exactly one repetition of elements in both lists concatenated with a space between them?
I tried with another regular expression and I think that I will use this: "{1}". However, I am not able to do that with a variable.

With the information you have provided about how this issue arises, there is little that can be said about how to fix it. I strongly encourage you to look at this quantifiers tutorial before moving forward.
Let's look at some solutions.
For example, let's look at the line: if(this.answer.matches("(?i).*" + string + ".*")). What you are trying to do is to see if this.answer contains string, ignoring case (I doubt you need the last .*). But you are using a greedy quantifier to compare them. If the issue arises from an input error in this comparison, I would consider looking at the linked tutorial for reluctant quantifiers.
Okay, so it wasn't a quantifier issue. The other possible fix may be in this block of code:
for(int i = 0; i < firstListOfString.size(); i++)
{
if(userInput.contains(firstListOfString.get(i) + " " + userInput.isInTheList(secondListOfString)))
{
isValid = true;//condition for exit from the while block
}
}
I don't know how you got userInput to have the contains method, but I assume that you used containment to call the String method. If this is the case, there could be a solution to the issue. You would only have to state that the input is valid if and only if it is equal to an element from the first list, a space, and a matching element from the second list.
The final solution I have for you is simple. If there are no other spaces present within the list elements, you could split the concatenated String on a space and check how many elements the resulting array contains. If it is greater than two, then you have an invalid concatenation.
Hopefully this helps!
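The "valid if and only if it is equal" idea above could be sketched roughly as follows. The class name PairValidator and the method isValidPair are hypothetical (they are not from the question), and the comparison is assumed to be case-insensitive, mirroring the (?i) flag in the question's pattern. Because whole strings are compared instead of being pasted into a regex, list elements containing regex metacharacters are treated literally.

```java
import java.util.Arrays;
import java.util.List;

public class PairValidator {

    // Hypothetical helper: returns true only when the input is exactly one
    // element of the first list, one space, and one element of the second list.
    static boolean isValidPair(String input, List<String> first, List<String> second) {
        for (String a : first) {
            for (String b : second) {
                // equalsIgnoreCase compares the whole string, so trailing words fail.
                if (input.equalsIgnoreCase(a + " " + b)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample lists are made up for illustration.
        List<String> first = Arrays.asList("red", "green");
        List<String> second = Arrays.asList("apple", "pear");
        System.out.println(isValidPair("Red apple", first, second));       // true
        System.out.println(isValidPair("red apple green", first, second)); // false
    }
}
```

The nested loop is quadratic in the list sizes, which should be fine for small predefined lists.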

Related

if statement unintentionally prematurely terminated a for loop(Regex)

I'm trying to make sure that the first letters of the forename and surname strings are capital. I have some Java code as follows, and for the life of me I do not know why it only works on the first character in the StringBuffer and won't carry out the rest of the loop. I believe this is an error in my regex, which I'm not quite clear on.
I'm 90% sure it's because of the space and colon present in the original string.
The original string reads as
StringBuffer output = new StringBuffer(forename + ", " + surname);
Java
int length_of_names = Director.getSurname().length() + Director.getForename().length() + 2;
Pattern pattern = Pattern.compile("\\b([A-Z][a-z]*)\\b");
Matcher matcher = pattern.matcher(output.append(Director));
for(int i = 0; i < length_of_names; i++)
{
if (matcher.find() == true)
{
output.setCharAt(i, Character.toUpperCase(output.charAt(i)) );
continue;
}
}
A nice, quick 101 on regex statements and how to compose them would also be well appreciated

Disclaimer: This answer makes lots of assumptions. Purpose of answer is to show problems with code in question, which is relevant even if assumptions are wrong.
Assumptions:
Value of forename is same as returned by Director.getForename().
Value of surname is same as returned by Director.getSurname().
Value of output at time of matcher(...) call is as shown earlier.
Director.toString() is implemented as return surname + ", " + forename;. Exact implementation doesn't matter, but rest of answer assumes this implementation.
For purpose of illustration, forename = "John" and surname = "Doe".
Now, let's go through the code and see what's going on:
StringBuffer output = new StringBuffer(forename + ", " + surname);
Value of output is now "John, Doe" (9 characters).
int length_of_names = Director.getSurname().length() + Director.getForename().length() + 2;
Value of length_of_names calculates to 9.
This could be done better using int length_of_names = output.length().
Pattern pattern = Pattern.compile("\\b([A-Z][a-z]*)\\b");
Matcher matcher = pattern.matcher(output.append(Director));
The string returned by Director.toString() ("Doe, John") is appended to output, resulting in value being "John, DoeDoe, John". That value is given to the matcher.
With that regex pattern, the matcher will find "John", and "John". It will not find "DoeDoe", since that has an uppercase letter in the middle.
Result is that find() returns true twice, and all subsequent calls will return false.
for(int i = 0; i < length_of_names; i++)
{
if (matcher.find() == true)
{
output.setCharAt(i, Character.toUpperCase(output.charAt(i)) );
continue;
}
}
Loop iterates 9 times, with values of i from 0 to 8 (inclusive).
First two iterations enter the if statement, so code will uppercase first two characters in output, resulting in value "JOhn, DoeDoe, John".
The continue statement has no effect, since loop continues anyway.
OOPS!!
That is not what code should do. So, to fix it:
Don't append Director to output.
Don't iterate 9 times. Instead, iterate until find() returns false.
Use position of the found text to locate the character to uppercase.
That makes code look like this:
StringBuffer output = new StringBuffer(forename + ", " + surname);
Pattern pattern = Pattern.compile("\\b([A-Z][a-z]*)\\b");
Matcher matcher = pattern.matcher(output);
while (matcher.find()) {
int i = matcher.start();
output.setCharAt(i, Character.toUpperCase(output.charAt(i)));
}
Of course, the code is still totally meaningless, since you matched words starting with uppercase letter, so changing the first letter to uppercase will do nothing at all.
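If the real goal is to capitalize the first letter of each name, one possible variation is to match lowercase word starts instead. This is only a sketch: the class name NameCapitalizer and the method capitalizeWords are hypothetical, and names are assumed to contain only ASCII letters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameCapitalizer {

    // \b[a-z] matches a lowercase ASCII letter at the start of a word.
    private static final Pattern WORD_START = Pattern.compile("\\b[a-z]");

    // Hypothetical helper: uppercases the first letter of every word.
    static String capitalizeWords(String text) {
        StringBuilder output = new StringBuilder(text);
        // Match against the immutable input; indexes stay valid because
        // setCharAt never changes the length of the builder.
        Matcher matcher = WORD_START.matcher(text);
        while (matcher.find()) {
            int i = matcher.start();
            output.setCharAt(i, Character.toUpperCase(output.charAt(i)));
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("john, doe")); // John, Doe
    }
}
```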

How to remove all comments from string without affecting URL in java

I need to remove all types of comments from my string without affecting the URL defined in that string. When I tried removing comments from the string using a regular expression, some part of the URL was also removed.
I tried the following regex, but the same issue is happening.
String sourceCode= "/*\n"
+ " * Multi-line comment\n"
+ " * Creates a new Object.\n"
+ " */\n"
+ "public Object someFunction() {\n"
+ " // single line comment\n"
+ " Object obj = new Object();\n"
+ " return obj; /* single-line comment */\n"
+ "}"
+ "\n"
+ "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";
sourceCode=sourceCode.replaceAll("//.*|/\\*((.|\\n)(?!=*/))+\\*/", "");
System.out.println(sourceCode);
The comments are removed, but the output looks like this:
public Object someFunction() {
Object obj = new Object();
return obj;
}
https:
Please help me find a solution for this.

[^:]//.*|/\\*((.|\\n)(?!=*/))+\\*/
The change is in the first few characters: [^:]. This means that the symbol before // must not be :.
I usually use regex101.com to work with regular expressions. Select the Python language for your case (since languages use slightly different escaping).
This is quite a complex regexp for a human to read, so another solution may be to use several simple expressions and process the incoming text in multiple passes. Like
Remove one-line comments
Remove multiline comments
Process some special cases
Note: Regexp processing can take considerable time. So if performance is required, you should look at another solution, such as your own processor or a third-party library.
EDITED
As suggested by @Wiktor, the expression [^:]//.*|/\\*((?!=*/)(?s:.))+\\*/ is a faster solution, at least 2-3 times faster.
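The multi-pass idea mentioned above could look roughly like this. It is a sketch, not the answer's exact expression: the class name CommentStripper and the method stripComments are hypothetical, and a negative lookbehind (?<!:) is swapped in for [^:] so that the character before // is checked without being removed. A // inside a string literal would still be stripped, so this assumes the input has no such literals.

```java
public class CommentStripper {

    // Hypothetical helper: removes block comments first, then line comments.
    static String stripComments(String source) {
        // Pass 1: block comments. (?s) lets '.' match newlines, and the
        // reluctant '.*?' stops at the first closing delimiter.
        String withoutBlocks = source.replaceAll("(?s)/\\*.*?\\*/", "");
        // Pass 2: line comments. The lookbehind skips "//" preceded by ':',
        // as in "https://", without consuming the preceding character.
        return withoutBlocks.replaceAll("(?<!:)//.*", "");
    }

    public static void main(String[] args) {
        String sourceCode = "/*\n * Multi-line comment\n */\n"
                + "Object obj = new Object(); // single line comment\n"
                + "return obj; /* single-line comment */\n"
                + "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";
        System.out.println(stripComments(sourceCode));
    }
}
```

Whitespace and blank lines left behind by the removed comments are not cleaned up here; that could be a third pass.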

You can split your String by "\n" and check each line. Here is the tested code:
String sourceCode= "/*\n"
+ " * Multi-line comment\n"
+ " * Creates a new Object.\n"
+ " */\n"
+ "public Object someFunction() {\n"
+ " // single line comment\n"
+ " Object obj = new Object();\n"
+ " return obj; /* single-line comment */\n"
+ "}"
+ "\n"
+ "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";
String [] parts = sourceCode.split("\n");
System.out.println(getUrlFromText(parts));
Here is the fetching method:
private static String getUrlFromText(String []parts) {
for (String part : parts) {
if(part.startsWith("http")) {
return part;
}
}
return null;
}

To be more specific, this expression should be used:
.*[^:]//.*|/\\*((.|\\n)(?!=*/))*\\*/
Your pattern was not able to remove an empty /**/ comment if one is present. (If that is a special requirement, then it's fine.)
So your expression is like:
And it should be:
For a better understanding, visit and use your expression .*[^:]\/\/.*|\/\*((.|\n)(?!=*\/))*\*\/ and it will show you a graph for it.

How can I check if ArrayMap.keySet() contains a certain variable + Regex?

I have an ArrayMap, of which the keys are something like tag - randomWord. I want to check if the tag part of the key matches a certain variable.
I have tried messing around with Patterns, but to no success. The only way I can get this working at this moment, is iterating through all the keys in a for loop, then splitting the key on ' - ', and getting the first value from that, to compare to my variable.
for (String s : testArray) {
if ((s.split("(\\s)(-)(\\s)(.*)")[0]).equals(variableA)) {
// Do stuff
}
}
This seems very devious to me, especially since I only need to know if the keySet contains the variable, that's all I'm interested in. I was thinking about using the contains() method, and put in (variableA + "(\\s)(-)(\\s)(.*)"), but that doesn't seem to work.
Is there a way to use the .contains() method for this case, or do I have to loop the keys manually?

You should split these tasks into two steps - first extract the tag, then compare it. Your code should look something like this:
for (String s : testArray) {
if (extractTag(s).equals(variableA)) {
// Do stuff
}
}
Notice that we've separated our concerns into two steps, making it easier to verify each step behaves correctly individually. So now the question is "How do we implement extractTag()?"
The ( ) symbols in a regular expression create a group match, which you can retrieve via Matcher.group() - if you only care about tag you could use a Pattern like so:
"(\\S+)\\s-\\s.*"
In which case your extractTag() method would look like:
private static final Pattern TAG_PATTERN = Pattern.compile("(\\S+)\\s-\\s.*");
private static String extractTag(String s) {
Matcher m = TAG_PATTERN.matcher(s);
if (m.matches()) {
return m.group(1);
}
throw new IllegalArgumentException(
"'" + s + "' didn't match " + TAG_PATTERN.pattern());
}
If you'd rather use String.split() you just need to define a regular expression that matches the delimiter, in this case -; you could use the following regular expression in a split() call:
"\\s-\\s"
It's often a good idea to use + after \\s to support one or more spaces, but it depends on what inputs you need to process. If you know it's always exactly one-space-followed-by-one-dash-followed-by-one-space, you could just split on:
" - "
In which case your extractTag() method would look like:
private static String extractTag(String s) {
String[] parts = s.split(" - ");
if (parts.length > 1) {
return parts[0];
}
throw new IllegalArgumentException("Could not extract tag from '" + s + "'");
}
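On the original question of whether contains() can do this: contains() on a key set tests exact equality, so it cannot take a pattern, and some form of loop over the keys is needed. A minimal sketch of such a check, assuming keys always use the exact " - " delimiter; the class name TagLookup and the method containsTag are hypothetical:

```java
import java.util.Arrays;
import java.util.Collection;

public class TagLookup {

    // Hypothetical helper: true if any key has the form "<tag> - <anything>".
    static boolean containsTag(Collection<String> keys, String tag) {
        String prefix = tag + " - ";
        for (String key : keys) {
            // startsWith is a literal comparison, so no regex escaping is needed.
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample keys are made up; with an ArrayMap you would pass arrayMap.keySet().
        Collection<String> keys = Arrays.asList("fruit - apple", "color - red");
        System.out.println(containsTag(keys, "color")); // true
        System.out.println(containsTag(keys, "col"));   // false
    }
}
```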

Remove Punctuation issue

I'm trying to find a word in a string. However, due to a period, it fails to recognize one word. I'm trying to remove punctuation, but it seems to have no effect. Am I missing something here? This is the line of code I am using: s.replaceAll("([a-z] +) [?:!.,;]*","$1");
String test = "This is a line about testing tests. Tests are used to examine stuff";
String key = "tests";
int counter = 0;
String[] testArray = test.toLowerCase().split(" ");
for(String s : testArray)
{
s.replaceAll("([a-z] +) [?:!.,;]*","$1");
System.out.println(s);
if(s.equals(key))
{
System.out.println(key + " FOUND");
counter++;
}
}
System.out.println(key + " has been found " + counter + " times.");
}
I managed to find a solution (though it may not be ideal) by using s = s.replaceAll("\\W",""); Thanks for everyone's guidance on how to solve this problem.

You could also take advantage of the regex in the split operation. Try this:
String[] testArray = test.toLowerCase().split("\\W+");
This will split on apostrophe, so you may need to tweak it a bit with a specific list of characters.
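Put together with the counting loop from the question, the split-based approach might look like this. The class name WordCounter and the method countWord are hypothetical, and the key is assumed to be lowercase already.

```java
public class WordCounter {

    // Hypothetical helper: counts case-insensitive occurrences of a lowercase key.
    static int countWord(String text, String key) {
        int counter = 0;
        // \W+ treats any run of non-word characters (spaces, periods, commas)
        // as a delimiter, so "tests." yields the token "tests".
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.equals(key)) {
                counter++;
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        String test = "This is a line about testing tests. Tests are used to examine stuff";
        System.out.println(countWord(test, "tests")); // 2
    }
}
```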

Strings are immutable. You need to assign the result of replaceAll back to the variable:
s = s.replaceAll("([a-z] +)*[?:!.,;]*", "$1");
^
Also, your regex requires that a space exist between the word and the punctuation. In the case of tests., this isn't true. You can adjust your regex with an optional (zero or more) character to account for this.

Your regex doesn't seem to work as you want.
If you want to find a word that is followed by a period, then this will work:
([a-z]*) [?(:!.,;)*]
It returns "tests." when it's run on your given string.
Also,
[?(:!.,;)*]
just picks out the punctuation, which can then be replaced.
However, I am not sure why you are not using the substring() function.

How do you parse non-standard form function?

A standard form function like A*B+A*B' is easy to parse (splitting by + and then splitting by *). How do you parse a function if it doesn't take a standard form?
Example: a function can take the following forms:
A*B+A(A+B')
A*B+(A+B')A
A*B+A*B(A+B)
Any ideas?
P.S: I would like to parse the function in Java.

A standard form function like A*B+A*B' is easy to parse (splitting by + and then splitting by *).
Good. Now, all that's left is to deal with those pesky parentheses. First, we will remove them with String.split, and then we will add the necessary logic to carry out the multiplications:
Once you have split the string A(A+B')C, you will end up with an array of three strings: A, A+B', and C. Notice that with this method the odd-indexed strings are ALWAYS the ones inside the parentheses. So all we have to do is check whether the last character of the string before a parenthesized part and the first character of the string after it are letters (A, B, C) or operators (*, +).
String firstString = "A*B+A*B(A+B)+A*B+A*B(A+B)";
String leftOfParenthesis;
String insideParenthesis;
String rightOfParenthesis;
String last;
String first;
String[] masterArray;
// Split on both kinds of parenthesis, so odd-indexed entries are the parenthesized parts.
masterArray = firstString.split("[()]");
for(int i=0; i<masterArray.length; i+=2){
leftOfParenthesis = masterArray[i];
insideParenthesis = masterArray[i+1];
rightOfParenthesis = masterArray[i+2];
last = leftOfParenthesis.substring(leftOfParenthesis.length()-1);
first = rightOfParenthesis.substring(0,1);
if(Character.isLetter(last.charAt(0)) && Character.isLetter(first.charAt(0))){
leftOfParenthesis += "*" + insideParenthesis + "*" +
last + "+" + last + "*" + insideParenthesis + "*" + first;
rightOfParenthesis = last + rightOfParenthesis.substring(1);
}
else if(Character.isLetter(last.charAt(0))){
leftOfParenthesis += "*" + insideParenthesis + "*" + last;
}
else if(Character.isLetter(first.charAt(0))){
leftOfParenthesis += "+" + first + "*" +
insideParenthesis + "*";
}
}
That's the basic logic. There will be some issues with rightOfParenthesis = masterArray[i+2]; if you run past the end of your input string and there aren't that many terms left, so you will have to add some if statements to check for that. This also isn't totally general: if you have parentheses inside parentheses, or more than two terms inside a pair of parentheses, you will have to add special logic to deal with that.

Rather than trying to parse with ad hoc methods (which always ends badly), you are better off:
Writing a BNF grammar for your expression forms, in all variants.
Coding a recursive descent parser (see https://stackoverflow.com/a/2336769/120163).
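As a rough illustration of the recursive descent approach, here is a small sketch. The grammar is an assumption made for this example, not something given in the question: expr := term ('+' term)*, term := factor ('*'? factor)*, factor := primary (')*, primary := LETTER | '(' expr ')'. The optional '*' in term is what handles implicit multiplication such as A(A+B'). To keep it short, the sketch evaluates the expression for given variable values instead of building a tree; the class name BoolExprParser is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class BoolExprParser {
    private final String input;
    private final Map<Character, Boolean> values;
    private int pos = 0;

    private BoolExprParser(String input, Map<Character, Boolean> values) {
        this.input = input.replace(" ", "");
        this.values = values;
    }

    // Parses and evaluates the whole input; rejects trailing garbage.
    static boolean evaluate(String input, Map<Character, Boolean> values) {
        BoolExprParser parser = new BoolExprParser(input, values);
        boolean result = parser.expr();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return result;
    }

    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // expr := term ('+' term)*
    private boolean expr() {
        boolean result = term();
        while (peek() == '+') {
            pos++;
            boolean right = term(); // parse the operand even if result is already true
            result = result || right;
        }
        return result;
    }

    // term := factor ('*'? factor)*, so "AB", "A*B" and "A(B)" are all products
    private boolean term() {
        boolean result = factor();
        while (peek() == '*' || peek() == '(' || Character.isLetter(peek())) {
            if (peek() == '*') {
                pos++;
            }
            boolean right = factor();
            result = result && right;
        }
        return result;
    }

    // factor := primary followed by zero or more ' (negation) marks
    private boolean factor() {
        boolean result = primary();
        while (peek() == '\'') {
            pos++;
            result = !result;
        }
        return result;
    }

    // primary := LETTER | '(' expr ')'
    private boolean primary() {
        char c = peek();
        if (c == '(') {
            pos++;
            boolean result = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
            return result;
        }
        if (Character.isLetter(c)) {
            pos++;
            Boolean value = values.get(c);
            if (value == null) {
                throw new IllegalArgumentException("No value for variable " + c);
            }
            return value;
        }
        throw new IllegalArgumentException("Unexpected character at position " + pos);
    }

    public static void main(String[] args) {
        Map<Character, Boolean> values = new HashMap<>();
        values.put('A', true);
        values.put('B', false);
        System.out.println(evaluate("A*B+A(A+B')", values));   // true
        System.out.println(evaluate("A*B+A*B(A+B)", values));  // false
    }
}
```

Each grammar rule maps to one method, so nested parentheses come for free through the recursive call in primary. Returning tree nodes instead of booleans from the same methods would give a parse tree.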
