I'm currently working on something where the code reads in thousands of lines of strings. Each line must follow a specific format like the following:
"Name,#,#,#,#,#,#"
Where 'name' is the name of a movie (we can assume the name won't have any numbers), and # is any number from 0-10. Each value MUST be separated by a comma.
My code is the following:
if (line.matches(".*[a-zA-z].*,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)")) {
    System.out.println("no");
}
else {
    System.out.println(line);
}
The issue is that the title of the film can't have commas in it. If it does, it needs to be printed. However, my 'matches()' doesn't seem to pick up lines that have a comma in the title. It seems to me that my code specifically outlines that if the next entry (separated by a comma) is not an integer, then it does not match, and therefore the 'line' needs to be printed.
Can anyone see where I'm going wrong in this?
You are saying that the rules are:
Lines must be 7 comma-separated values: a name and 6 numbers in range 0-10.
The name must not contain a comma.
We can assume the name won't have any numbers, but it is not a requirement that it cannot.
Since the only invalid character in a name is a comma, the regex would be:
[^,]*,(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10)
If you want to capture the fields, you would use this code:
Pattern p = Pattern.compile("([^,]*),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");
for (String line : lines) {
    Matcher m = p.matcher(line);
    if (! m.matches()) {
        System.out.println("Invalid line: " + line);
    } else {
        System.out.println("Name: " + m.group(1));
        System.out.println(" Values: " + m.group(2)
                                 + " " + m.group(3)
                                 + " " + m.group(4)
                                 + " " + m.group(5)
                                 + " " + m.group(6)
                                 + " " + m.group(7));
    }
}
Test
String[] lines = { "Buffalo Bill and the Indians, or Sitting Bull's History Lesson,0,1,2,3,4,5",
"Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb,6,7,8,9,10,0",
"300,1,2,3,4,5,6"};
Output
Invalid line: Buffalo Bill and the Indians, or Sitting Bull's History Lesson,0,1,2,3,4,5
Name: Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb
Values: 6 7 8 9 10 0
Name: 300
Values: 1 2 3 4 5 6
First movie name has a comma, so it doesn't match.
Second movie name has special characters (. and :), but no comma, so it matches.
Third movie name is "300", which is an actual movie, so it matches.
The problem lies with the .*. This part is able to match the comma as well.
Fri,dayaervsere,6,4,78,7
<--><--------->^
.* [a-zA-Z] ,( [...]
So, basically you only need to get rid of the .*. Instead, apply a quantifier to your first group:
[a-zA-Z]* // to match any number of characters
or
[a-zA-Z]+ // to match at least one character
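To see the difference concretely, here is a small sketch that runs both styles of pattern against the same lines. The class name TitlePatternDemo, the helper names, and the sample lines are made up for illustration; the six score fields are written with a {6} quantifier only to keep the patterns short.

```java
import java.util.regex.Pattern;

public class TitlePatternDemo {
    // Six ",N" fields where N is 0-10 (shared by both patterns).
    private static final String SCORES = "(,([0-9]|10)){6}";

    // Same shape as the pattern in the question: .* may swallow commas.
    private static final Pattern LOOSE = Pattern.compile(".*[a-zA-Z].*" + SCORES);

    // Quantified character class: letters only, so a comma ends the title.
    private static final Pattern STRICT = Pattern.compile("[a-zA-Z]+" + SCORES);

    static boolean looseMatches(String line) {
        return LOOSE.matcher(line).matches();
    }

    static boolean strictMatches(String line) {
        return STRICT.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "Friday,6,4,7,7,1,0", "Fri,day,6,4,7,7,1,0" };
        for (String line : samples) {
            System.out.println(line + " loose=" + looseMatches(line)
                    + " strict=" + strictMatches(line));
        }
    }
}
```

Note that [a-zA-Z]+ as written also rejects titles containing spaces or punctuation, so the class may need widening depending on what a valid title looks like.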
If you do use regex to solve this, I'd recommend allowing commas in the 'Name' part of your regex. Focus on making sure there are 6 numbers, each following a comma. You can check whether the name meets the appropriate criteria later.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
// before your for-loop, create a pattern (Assuming no digits in title)
Pattern p = Pattern.compile("([^0-9]+),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");
// ...
// later on in your actual for-loop for each line.
Matcher m = p.matcher(line);
if (m.matches())
{
    String title = m.group(1);
    // do extra checking for the title if needed
}
else
{
    // print no
}
The following regex should solve your problem:
^([a-zA-Z ]+),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)
Or the shorter version of it, with no code duplication:
^([a-zA-Z ]+)(,([0-9]|10)){6}
Testing
"The Killer,6,7,3,6,8,1" matches the pattern.
"The Kill,er,6,7,3,6,8,1" doesn't match the pattern, as you wanted.
Also, spaces in the title are supported.
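A minimal sketch of how the shorter pattern might be used from Java, checked against the two sample lines above. The class name RatingLineDemo and its helper methods are hypothetical. One thing worth knowing before relying on the {6} form: a capture group inside a repeated group only keeps the value from its last repetition, so it cannot be used to read all six numbers back.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RatingLineDemo {
    // Title of letters and spaces, then exactly six ",N" fields (N is 0-10).
    private static final Pattern LINE =
            Pattern.compile("^([a-zA-Z ]+)(,([0-9]|10)){6}");

    // matches() requires the whole line to fit the pattern.
    static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    // Group 3 sits inside the repeated group, so it holds only the last score.
    static String lastScore(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(isValid("The Killer,6,7,3,6,8,1"));
        System.out.println(isValid("The Kill,er,6,7,3,6,8,1"));
        System.out.println(lastScore("The Killer,6,7,3,6,8,1"));
    }
}
```

If each number is needed separately, the longer pattern with six explicit groups (or a split on commas after validation) is probably the simpler choice.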
I am trying to write a regex that will match urls inside strings of text that may be html-encoded. I am having a considerable amount of trouble with lookaround though. I need something that would correctly match both links in the string below:
some text &quot;http://www.notarealwebsite.com/?q=asdf&searchOrder=1&quot; "http://www.notarealwebsite.com" some other text
A verbose description of what I want would be: "http://" followed by any number of characters that are not spaces, quotes, or the string "&quot;" (I don't care about accepting other non-url-safe characters as delimiters).
I have tried a few regexes using lookahead to check for &'s followed by q's followed by u's and so on, but as soon as I put one into the [^...] negation it just completely breaks down and evaluates more like: "http:// followed by any number of characters that are not spaces, quotes, ampersands, q's, u's, o's, t's, or semicolons" which is obviously not what I am looking for.
This will correctly match the &'s at the beginning of "&quot;":
&(?=q(?=u(?=o(?=t(?=;)))))
But this does not work:
http://[^ "&(?=q(?=u(?=o(?=t(?=;)))))]*
I know just enough about regexes to get into trouble, and that includes not knowing why this won't work the way I want it to. I understand to some extent positive and negative lookaround, but I don't understand why it breaks down inside the [^...]. Is it possible to do this with regexes? Or am I wasting my time trying to make it work?
If your regex implementation supports it, use a positive look ahead and a backreference with a non-greedy expression in the body.
Here is one with your conditions: (["\s]|&quot;)(http://.*?)(?=\1)
For example, in Python:
import re
p = re.compile(r'(["\s]|&quot;)(https?://.*?)(?=\1)', re.IGNORECASE)
url = "http://test.url/here.php?var1=val&var2=val2"
formatstr = 'text "{0}" more text {0} and more &quot;{0}&quot; test greed"'
data = formatstr.format(url)
for m in p.finditer(data):
    print("Found:", m.group(2))
Produces:
Found: http://test.url/here.php?var1=val&var2=val2
Found: http://test.url/here.php?var1=val&var2=val2
Found: http://test.url/here.php?var1=val&var2=val2
Or in Java:
@Test
public void testRegex() {
    Pattern p = Pattern.compile("([\"\\s]|&quot;)(https?://.*?)(?=\\1)",
            Pattern.CASE_INSENSITIVE);
    final String URL = "http://test.url/here.php?var1=val&var2=val2";
    final String INPUT = "some text " + URL + " more text + \"" + URL +
            "\" more then &quot;" + URL + "&quot; testing greed &quot;";
    Matcher m = p.matcher(INPUT);
    while (m.find()) {
        System.out.println("Found: " + m.group(2));
    }
}
Produces the same output.
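On the question of why the lookahead breaks inside [^...]: within a character class, characters such as (, ?, and = are taken literally, so the class just becomes a longer list of excluded characters. An alternative to the backreference approach, shown here only as a sketch, is to keep the lookahead outside the class and test it before each consumed character. The class name EncodedUrlDemo, the method findUrls, and the sample text (including the &amp; in the first URL) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedUrlDemo {
    // For each character: first check that "&quot;" does not start here,
    // then consume one character that is neither whitespace nor a quote.
    private static final Pattern URL =
            Pattern.compile("https?://(?:(?!&quot;)[^\\s\"])*");

    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "some text &quot;http://www.notarealwebsite.com/?q=asdf&amp;searchOrder=1&quot; "
                + "\"http://www.notarealwebsite.com\" some other text";
        for (String url : findUrls(text)) {
            System.out.println("Found: " + url);
        }
    }
}
```

Unlike the backreference version, this does not require the opening and closing delimiters to be the same; it simply stops at whichever delimiter comes first.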
I want to find every instance of a number, followed by a comma (no space), followed by any number of characters in a string. I was able to get a regex to find all the instances of what I was looking for, but I want to print them individually rather than all together. I'm new to regex in general, so maybe my pattern is wrong?
This is my code:
String test = "1 2,A 3,B 4,23";
Pattern p = Pattern.compile("\\d+,.+");
Matcher m = p.matcher(test);
while(m.find()) {
System.out.println("found: " + m.group());
}
This is what it prints:
found: 2,A 3,B 4,23
This is what I want it to print:
found: 2,A
found: 3,B
found: 4,23
Thanks in advance!
Try this regex:
Pattern p = Pattern.compile("\\d+,.+?(?= |$)");
You could take an easier route and split by space, then ignore anything without a comma:
String[] values = test.split(" ");
for (String value : values) {
    if (value.contains(",")) {
        System.out.println("found: " + value);
    }
}
What you apparently left out of your requirements statement is where "any number of characters" is supposed to end. As it stands, it ends at the end of the string; from your sample output, it seems you want it to end at the first space.
Try this pattern: "\\d+,[^\\s]*"
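As a quick check, here is a sketch that runs this kind of pattern over the test string from the question. The class name NumberCommaDemo and the method findAll are made up for the example, and \S is used as shorthand for [^\s].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberCommaDemo {
    // Digits, a comma, then everything up to the next whitespace character.
    private static final Pattern ENTRY = Pattern.compile("\\d+,\\S*");

    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String entry : findAll("1 2,A 3,B 4,23")) {
            System.out.println("found: " + entry);
        }
    }
}
```

The original .+ was greedy and ran to the end of the string, which is why everything came back as a single match.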
I have a string and I want to extract every occurrence of Input! + word + digits and Calc! + word + digits from it. I have also included my attempt.
Input : IF(Input!B34 + Calc!B45)
Output : Input!B34 Calc!B45
My attempt :
Pattern findMyPattern = Pattern.compile("Input!\\w\\d|" + worksheetName+ "!.+?");
Matcher foundAMatch = findMyPattern.matcher(input);
HashSet hashSet = new HashSet();
while (foundAMatch.find()) {
String s = foundAMatch.group(0);
hashSet.add(s);
}
What regular expression should I use ? I tried using a few of them. But I am not expert in them. Some idea will be useful.
You can use this regex:
"(?:Input|Calc)![a-zA-Z]\\d+"
Explanation:
(?:Input|Calc) // Match `Input or Calc`
! // Followed by !
[a-zA-Z] // Followed by an alphabetical character
\\d+ // Then digits.
And use it with Matcher#find and then add matcher.group() to your Set.
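Put together, that could look roughly like the sketch below. The class name CellRefDemo and the method extractRefs are hypothetical, and a LinkedHashSet is an assumption made here so the matches come back in the order they were found; a plain HashSet would work if order does not matter.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellRefDemo {
    // "Input" or "Calc", then "!", one letter, and one or more digits.
    private static final Pattern REF =
            Pattern.compile("(?:Input|Calc)![a-zA-Z]\\d+");

    static Set<String> extractRefs(String formula) {
        Set<String> refs = new LinkedHashSet<>();
        Matcher m = REF.matcher(formula);
        while (m.find()) {
            refs.add(m.group()); // duplicates are dropped by the set
        }
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(extractRefs("IF(Input!B34 + Calc!B45)"));
    }
}
```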
I have a sort of a problem with this code:
String[] paragraph;
if(paragraph[searchKeyword_counter].matches("(.*)(\\b)"+"is"+"(\\b)(.*)")){
If I am not mistaken, to use .matches() to search for a particular character sequence in a string I need .* on both sides. What I want, though, is to find the keyword without matching it inside another word.
For example, if is is the keyword I am searching for, I do not want it to match words that merely contain is, like ship, his, or this. So I used \b for a word boundary, but the code above is not working for me.
Example:
String[] content = {"is,","his","fish","ish","its","is"};
String keyword = "is";
for (int i = 0; i < content.length; i++) {
    if (content[i].matches("(.*)(\\b)" + keyword + "(\\b)(.*)")) {
        System.out.println("There are " + i + " is.");
    }
}
What I want is for it to match only "is," and "is", but not "his" or "fish". In other words, is should match even when it sits next to a non-alphanumeric character or a space.
What is the problem with the code above?
Also, what if one of the entries has uppercase characters, for example IS, and it is compared with is? It will not match. Correct me if I am wrong. How can I match lowercase against uppercase without changing the source content?
String string = "...";
String word = "is";
Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
Matcher m = p.matcher(string);
if (m.find()) {
...
}
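For the uppercase part of the question, one option is to compile the same pattern with the Pattern.CASE_INSENSITIVE flag, which leaves the source strings untouched. The sketch below assumes a hypothetical class WholeWordDemo with a helper containsWord, and reuses the sample array from the question with an extra "IS" entry added.

```java
import java.util.regex.Pattern;

public class WholeWordDemo {
    // True if 'word' occurs in 'text' as a whole word, ignoring case.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // find() looks for the word anywhere, so no leading or trailing .* is needed.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] content = { "is,", "his", "fish", "ish", "its", "is", "IS" };
        for (String entry : content) {
            System.out.println(entry + " -> " + containsWord(entry, "is"));
        }
    }
}
```

Pattern.quote keeps the keyword literal, so a keyword containing regex metacharacters would not change the meaning of the pattern.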
Just add spaces, like this. Suppose message is your content string and pattern is your keyword:
if ((message).matches(".* " + pattern + " .*")||(message).matches("^" + pattern + " .*")
||(message).matches(".* " + pattern + "$")) {