I'm having trouble wiring a method into a bit of code that produces output. Essentially, I created a method called getName which maps a unique number to a person's name. Then, I read an input file containing chat logs. After I filter out the lines I need, I want the output to display the person's name instead of the number. Here is a snippet of my method:
public static String getName(int id) {
// External identifiers specified
switch (id) {
case 5644:
return "Steve Jobs";
case 5640:
return "John Smith";
case 5663:
return "Johnny Appleseed";
And here is the code that takes my input, and displays the output I need:
try
{
// assigns the input file to a filereader object
BufferedReader infile = new BufferedReader(new FileReader(log));
sc = new Scanner(log);
while(sc.hasNext())
{
String line=sc.nextLine();
if(line.contains("LANTALK")){
Pattern pattern = Pattern.compile(
"(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\s\\[D\\].+<MBXID>(\\d+)"
+ "<\\/MBXID><MBXTO>(\\d+)<\\/MBXTO>.+<MSGTEXT>(.+)"
+ "<\\/MSGTEXT>", Pattern.MULTILINE | Pattern.DOTALL);
// Multiline is used to capture the LANMSG more than
// once, and Dotall is used to make the '.' term in regex
// also match the newline in the input
Matcher matcher = pattern.matcher(line);
while (matcher.find())
{
String output = matcher.group(1) + " [" +
matcher.group(2) + "] to [" + matcher.group(3) + "] "
+ matcher.group(4);
System.out.println(output);
System.out.println();
}
} // End of if
} // End of while
Each line of output looks like this:
14:49:28.817 [1095] to [5607] I could poke around with it a bit and see what's available.
I just need the numbers 1095 and 5607 to display the person's name that I've specified in my method. So I'm asking how I implement that into my code? Is there a special way that I need to call upon the method in order for it to recognize the numbers? Do I use some sort of regular expression or XML? Thanks for the help!
You should be using a dictionary (in Java, a Map such as HashMap).
Benefits of using a dictionary:
You would be able to access both the user's names and their ID numbers as key-value pairs.
They are fast.
You would eliminate the need for switch statements to keep track of user's IDs.
Links:
Associative Array (another name for dictionary)
Dictionaries, Hash-Tables and Sets
Java Dictionary example
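As a rough sketch of this suggestion: the lookup table below uses java.util.HashMap, with the IDs and names taken from the question's switch statement. The class name NameLookup and the fallback of returning the raw ID for unknown numbers are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class NameLookup {
    // ID -> name table; entries copied from the question's switch statement
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(5644, "Steve Jobs");
        NAMES.put(5640, "John Smith");
        NAMES.put(5663, "Johnny Appleseed");
    }

    // Takes the digits captured by the regex and returns the registered name,
    // or the digits unchanged when no name is known (assumed fallback).
    static String getName(String idText) {
        String name = NAMES.get(Integer.parseInt(idText));
        return name != null ? name : idText;
    }

    public static void main(String[] args) {
        // Inside the question's loop this would be getName(matcher.group(2)) and so on
        System.out.println("[" + getName("5644") + "] to [" + getName("1095") + "]");
    }
}
```

In the question's output line, matcher.group(2) and matcher.group(3) would each be passed through getName before being concatenated.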
Hi guys, I'm trying to create an interactive constructor that takes the user's name and surname as input from a Scanner. It all works, but now I want to accept only letters in the name and surname. I've tried with Pattern and Matcher, but the constructor still sets a number as the name or surname (it tells me the input is invalid but still sets it in the User).
public User(){
System.out.println("insert name and surname");
System.out.println("Name: ");
Scanner input = new Scanner(System.in);
Pattern p = Pattern.compile("a,z - A,Z");
name = input.nextLine();
Matcher m = p.matcher(name);
if(m.matches()) {
this.setName(name);
}else{
System.out.println("invalid input");
}
System.out.println("SURNAME:");
surname= input.nextLine();
this.setSurname(surname);
p.matcher(surname);
System.out.println(Welcome);
System.out.println("--------------------------");
}
There's a lot of things going on here that aren't quite right. You are on your own for the stuff other than the regex-issue but consider the other points noted below.
the constructor should not be interactive - collect inputs and pass them to the constructor
your regex pattern is wrong so it will not match the inputs you actually want
you are reading the name into the name variable and then testing it - this is why it reports bad input but still stores it
you have no error recovery for handling bad input
write methods to do things like build a user or get user input rather than trying to do everything in one place. Limit responsibilities and it is easier to write, debug, and maintain.
Regex
As written, your pattern will probably only match itself because the pattern is not well-defined. I think what you are trying to do with your regex is "^[a-zA-Z]+$".
The ^ starts the match at the beginning of the String and the $ ends the match at the end of the String. Together it means the input must be an exact match to the pattern (i.e. no extraneous characters).
The [a-zA-Z] defines a character class of alphabet characters.
The + indicates one or more characters of the preceding character class match.
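To make the three parts of the pattern concrete, here is a small sketch that checks a few sample inputs against it. The class name NameCheck, the method isValidName, and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

class NameCheck {
    // One or more ASCII letters, anchored to the whole input
    private static final Pattern LETTERS_ONLY = Pattern.compile("^[a-zA-Z]+$");

    static boolean isValidName(String input) {
        return LETTERS_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("Anna"));  // letters only
        System.out.println(isValidName("Ann4"));  // contains a digit
        System.out.println(isValidName(""));      // empty, '+' needs at least one letter
    }
}
```

Note that this character class covers only ASCII letters, so names with accents, spaces, or hyphens would be rejected.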
Note that String has a convenience method for pattern-matching so you can do something like
String regex = "^[a-zA-Z]+$";
String input = ...
if (input.matches(regex)) { ...
Regarding how to create an instance of the User. Write methods to do things and let the constructor simply construct the object.
// Constructor is simple - just assign parameter arguments to members
public User(String name, String surname) {
this.name = name;
this.surname = surname;
}
// define the regex in a constant so it's not repeated
private static final String NAME_REGEX = "^[a-zA-Z]+$";
// a method for constructing a User - get the user input and invoke constructor
public static User buildUser(Scanner in) {
String name = promptForString("Enter name", NAME_REGEX, in);
String surname = promptForString("Enter surname", NAME_REGEX, in);
return new User(name, surname);
}
// show the prompt, get the input, check against regex, return it if valid
public static String promptForString(String prompt, String regex, Scanner in) {
for (;;) {
System.out.println(prompt + ": ");
// Scanner has no readLine(); nextLine() reads one line of input
String input = in.nextLine();
if (!input.matches(regex)) {
System.out.println("Invalid input - try again");
continue;
}
return input;
}
}
First, I'd say such complex logic shouldn't be in a constructor. Put it in your main method or another dedicated method and construct the object from the processed results:
...
Scanner scanner = new Scanner(System.in);
Pattern p = Pattern.compile(???);
User user = new User();
String name = scanner.nextLine().trim();
if( p.matcher(name).matches() )
{
user.setName(name);
}
...
Now let's talk about why your code is probably not working correctly, and that is the regex. The expression you use probably does not do what you think. With external third-party tools like Regexr you can see what your expressions do.
In your case you want to allow only letters, not digits. The character class [A-Za-z] does that, but on its own it matches a single letter. Names rarely consist of a single character, so we have to add a quantifier. Assuming a name has at least one letter and no upper limit, the correct quantifier is + (one or more), which leads to the final result: Pattern p = Pattern.compile("[A-Za-z]+"). Because matches() tests the whole input, the ^ and $ anchors are optional here.
Code I used for testing:
String val = "";
Pattern pattern = Pattern.compile( "^[A-Za-z]+$" );
try ( Scanner scanner = new Scanner( System.in ) )
{
String input = scanner.nextLine().trim();
Matcher m = pattern.matcher( input );
if(m.matches())
val = input;
}
System.out.println("Value: " + val);
I have an ArrayMap, of which the keys are something like tag - randomWord. I want to check if the tag part of the key matches a certain variable.
I have tried messing around with Patterns, but to no success. The only way I can get this working at this moment, is iterating through all the keys in a for loop, then splitting the key on ' - ', and getting the first value from that, to compare to my variable.
for (String s : testArray) {
if ((s.split("(\\s)(-)(\\s)(.*)")[0]).equals(variableA)) {
// Do stuff
}
}
This seems very devious to me, especially since I only need to know if the keySet contains the variable, that's all I'm interested in. I was thinking about using the contains() method, and put in (variableA + "(\\s)(-)(\\s)(.*)"), but that doesn't seem to work.
Is there a way to use the .contains() method for this case, or do I have to loop the keys manually?
You should split these tasks into two steps - first extract the tag, then compare it. Your code should look something like this:
for (String s : testArray) {
if (extractTag(s).equals(variableA)) {
// Do stuff
}
}
Notice that we've separated our concerns into two steps, making it easier to verify each step behaves correctly individually. So now the question is "How do we implement extractTag()?"
The ( ) symbols in a regular expression create a group match, which you can retrieve via Matcher.group() - if you only care about tag you could use a Pattern like so:
"(\\S+)\\s-\\s.*"
In which case your extractTag() method would look like:
private static final Pattern TAG_PATTERN = Pattern.compile("(\\S+)\\s-\\s.*");
private static String extractTag(String s) {
Matcher m = TAG_PATTERN.matcher(s);
if (m.matches()) {
return m.group(1);
}
throw new IllegalArgumentException(
"'" + s + "' didn't match " + TAG_PATTERN.pattern());
}
If you'd rather use String.split() you just need to define a regular expression that matches the delimiter, in this case -; you could use the following regular expression in a split() call:
"\\s-\\s"
It's often a good idea to use + after \\s to support one or more spaces, but it depends on what inputs you need to process. If you know it's always exactly one-space-followed-by-one-dash-followed-by-one-space, you could just split on:
" - "
In which case your extractTag() method would look like:
private static String extractTag(String s) {
String[] parts = s.split(" - ");
if (parts.length > 1) {
return parts[0];
}
throw new IllegalArgumentException("Could not extract tag from '" + s + "'");
}
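Putting the two steps together, the sketch below extracts the tag from each key and reports whether any key carries a given tag. The class name TagCheck, the helper hasTag, and the sample keys are assumptions for illustration; the split on " - " follows the simple case described above.

```java
import java.util.Arrays;
import java.util.List;

class TagCheck {
    // Step 1: everything before the first " - " is the tag
    static String extractTag(String key) {
        String[] parts = key.split(" - ");
        if (parts.length > 1) {
            return parts[0];
        }
        throw new IllegalArgumentException("Could not extract tag from '" + key + "'");
    }

    // Step 2: compare the extracted tag of every key with the wanted tag
    static boolean hasTag(Iterable<String> keys, String wanted) {
        for (String key : keys) {
            if (extractTag(key).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("fruit - apple", "color - red");
        System.out.println(hasTag(keys, "color"));
        System.out.println(hasTag(keys, "apple"));
    }
}
```

With a map, arrayMap.keySet() could be passed as the keys argument. The loop is still there, it is just hidden behind one method call.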
I'm trying to replace the first occurrence of a String matching my regex, while iterating over those occurrences like this:
(this code is very simplified, so don't try to find some bigger sense of it)
Matcher tagsMatcher = Pattern.compile("\\{[sdf]\\}").matcher(value);
int i = 0;
while (tagsMatcher.find()) {
value = value.replaceFirst("\\{[sdf]\\}", "%" + i + "$s");
i++;
}
I'm getting IllegalArgumentException: Illegal group reference while executing replaceFirst. Why?
replacement part in replaceFirst(regex,replacement) can contain references to groups matched by regex. To do this it is using
$x syntax where x is integer representing group number,
${name} where name is name of named group (?<name>...)
Because of this ability $ is treated as special character in replacement, so if you want to make $ literal you need to
escape it with \ like replaceFirst(regex,"\\$whatever")
or let Matcher escape it for you using the Matcher.quoteReplacement method: replaceFirst(regex,Matcher.quoteReplacement("$whatever"))
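Both ways of making $ literal can be compared side by side. This is a small sketch; the class name DollarEscape and the sample input "Hello {s}" are made up for illustration.

```java
import java.util.regex.Matcher;

class DollarEscape {
    // Escape the dollar sign by hand with a backslash
    static String escapedByHand(String value) {
        return value.replaceFirst("\\{[sdf]\\}", "%0\\$s");
    }

    // Let Matcher.quoteReplacement escape '$' and '\' in the replacement
    static String escapedByQuote(String value) {
        return value.replaceFirst("\\{[sdf]\\}", Matcher.quoteReplacement("%0$s"));
    }

    public static void main(String[] args) {
        System.out.println(escapedByHand("Hello {s}"));   // prints "Hello %0$s"
        System.out.println(escapedByQuote("Hello {s}"));  // prints "Hello %0$s"
    }
}
```

Matcher.quoteReplacement is usually the safer choice when the replacement text comes from a variable, since it also escapes backslashes.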
BUT you shouldn't be using
value = value.replaceFirst("\\{[sdf]\\}", "%" + i + "\\$s");
inside loop because each time you do, you need to traverse entire string to find part you want to replace, so each time you need to start from beginning which is very inefficient.
The regex engine has a solution for this inefficiency in the form of matcher.appendReplacement(StringBuffer, replacement) and matcher.appendTail(StringBuffer).
appendReplacement appends to the StringBuffer all text up to the current match, and lets you specify what should be put in place of the matched part
appendTail appends the text that follows the last match
So your code should look more like
StringBuffer sb = new StringBuffer();
int i = 0;
Matcher tagsMatcher = Pattern.compile("\\{[sdf]\\}").matcher(value);
while (tagsMatcher.find()) {
tagsMatcher.appendReplacement(sb, Matcher.quoteReplacement("%" + (i++) + "$s"));
}
tagsMatcher.appendTail(sb); // copy the text that follows the last match
value = sb.toString();
You need to escape the dollar symbol.
value = value.replaceFirst("\\{[sdf]\\}", "%" + i + "\\$s");
The "Illegal group reference" error occurs mainly when the replacement refers to a group that does not exist.
The special character $ can be handled in a simple way. Check the example below:
public static void main(String args[]){
String test ="Other company in $ city ";
String test2 ="This is test company ";
try{
test2= test2.replaceFirst(java.util.regex.Pattern.quote("test"), Matcher.quoteReplacement(test));
System.out.println(test2);
test2= test2.replaceAll(java.util.regex.Pattern.quote("test"), Matcher.quoteReplacement(test));
System.out.println(test2);
}catch(Exception e){
e.printStackTrace();
}
}
Output:
This is Other company in $ city company
This is Other company in $ city company
I solved it by using apache commons, org.apache.commons.lang3.StringUtils.replaceOnce. This is regex safe.
I have a csv file with the following data format
123,"12.5","0.6","15/9/2012 12:11:19"
These numbers are:
order number
price
discount rate
date and time of sale
I want to extract these data from the line.
I have tried the regular expression:
String line = "123,\"12.5\",\"0.6\",\"15/9/2012 12:11:19\"";
Pattern pattern = Pattern.compile("(\\W?),\"([\\d\\.\\-]?)\",\"([\\d\\.\\-]?)\",\"([\\W\\-\\:]?)\"");
Scanner scanner = new Scanner(line);
if(scanner.hasNext(pattern)) {
...
}else{
// Always ends up here
}
It looks like my pattern is not correct, as it always goes to the else section. What did I do wrong? Can someone suggest a solution for this?
Many thanks.
Seems a bit overcomplicated to specifically split, you should try splitting by the most obvious common delimiter between the elements, which is a comma. Perhaps you should try something like this:
final String info = "123,\"12.5\",\"0.6\",\"15/9/2012 12:11:19\"";
final String[] split = info.split(",");
final int orderNumber = Integer.parseInt(split[0]);
final double price = Double.parseDouble(split[1].replace("\"", ""));
final double discountRate = Double.parseDouble(split[2].replace("\"", ""));
final String date = split[3].replace("\"", "");
Regular expressions are very cumbersome for this type of work.
I suggest using a CSV library such as OpenCSV instead.
The library can parse the String entries into a String array and individual entries can be parsed as required. Here an OpenCSV example for the specific problem:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
int orderNumber = Integer.parseInt(nextLine[0]);
double price = Double.parseDouble(nextLine[1]);
double discountRate = Double.parseDouble(nextLine[2]);
...
}
Full documentation and examples can be found here
? in regex means "zero or one occurrence". You probably wanted to use + instead (one or more) so it could capture all the digits, points, colons, etc.
scanner.hasNext(pattern)
from documentation
Returns true if the next complete token matches the specified pattern.
but the next token is 123,"12.5","0.6","15/9/2012 because by default Scanner splits tokens on whitespace.
Also there are few problems with your regex
you used ? which means zero or one where you should use * - zero or more, or + - one or more,
you used \\W at start but this will also exclude numbers.
If you really want to use scanner and regex then try with
Pattern.compile("(\\d+),\"([^\"]+)\",\"([^\"]+)\",\"([^\"]+)\"");
and change the delimiter to the line separator with
scanner.useDelimiter(System.lineSeparator());
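If a Scanner is not strictly required, the same pattern can also be applied to a whole line with Matcher.matches(). The sketch below assumes every line has exactly the four fields from the question; the class name CsvLineParser and the method parse are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvLineParser {
    // Unquoted order number followed by three double-quoted fields
    private static final Pattern LINE =
            Pattern.compile("(\\d+),\"([^\"]+)\",\"([^\"]+)\",\"([^\"]+)\"");

    // Returns the four fields without their surrounding quotes
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] fields = parse("123,\"12.5\",\"0.6\",\"15/9/2012 12:11:19\"");
        int orderNumber = Integer.parseInt(fields[0]);
        double price = Double.parseDouble(fields[1]);
        double discountRate = Double.parseDouble(fields[2]);
        System.out.println(orderNumber + " " + price + " " + discountRate + " " + fields[3]);
    }
}
```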
This is a possible solution to your situation:
String line = "123,\"12.5\",\"0.6\",\"15/9/2012 12:11:19\"";
Pattern pattern = Pattern.compile("([0-9]+),\\\"([0-9.]+)\\\",\\\"([0-9.]+)\\\",\\\"([0-9/:\\s]+)\\\"");
Scanner scanner = new Scanner(line);
scanner.useDelimiter("\n");
if(scanner.hasNext(pattern)) {
MatchResult result = scanner.match();
System.out.println("1st: " + result.group(1));
System.out.println("2nd: " + result.group(2));
System.out.println("3rd: " + result.group(3));
System.out.println("4th: " + result.group(4));
}else{
System.out.println("There");
}
Note that ? means 0 or 1 occurrences, meanwhile + means 1 or more.
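Observe the use of 0-9 for digits. You can also use \d if you like. Because the date field contains a space, you must change the scanner's delimiter, for example with scanner.useDelimiter("\n"); otherwise the default whitespace delimiter splits the line into two tokens.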
The output of this snippet is:
1st: 123
2nd: 12.5
3rd: 0.6
4th: 15/9/2012 12:11:19
I have a String that I have to parse for different keywords.
For example, I have the String:
"I will come and meet you at the 123woods"
And my keywords are
'123woods'
'woods'
I should report whenever I have a match and where. Multiple occurrences should also be accounted for.
However, for this one, I should get a match only on '123woods', not on 'woods'. This eliminates using String.contains() method. Also, I should be able to have a list/set of keywords and check at the same time for their occurrence. In this example, if I have '123woods' and 'come', I should get two occurrences. Method execution should be somewhat fast on large texts.
My idea is to use StringTokenizer but I am unsure if it will perform well. Any suggestions?
The example below is based on your comments. It uses a List of keywords, which will be searched in a given String using word boundaries. It uses StringUtils from Apache Commons Lang to build the regular expression and print the matched groups.
String text = "I will come and meet you at the woods 123woods and all the woods";
List<String> tokens = new ArrayList<String>();
tokens.add("123woods");
tokens.add("woods");
String patternString = "\\b(" + StringUtils.join(tokens, "|") + ")\\b";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
If you are looking for more performance, you could have a look at StringSearch: high-performance pattern matching algorithms in Java.
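The question also asks where each match occurs. As a sketch that needs no external library, the variant below joins the keywords with String.join, escapes each one with Pattern.quote, and records the start index from Matcher.start(). The class name KeywordFinder and the "keyword@index" result format are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordFinder {
    // Returns one "keyword@startIndex" entry per whole-word occurrence
    static List<String> find(String text, List<String> keywords) {
        List<String> quoted = new ArrayList<>();
        for (String keyword : keywords) {
            quoted.add(Pattern.quote(keyword)); // treat regex metacharacters literally
        }
        Pattern pattern = Pattern.compile("\\b(" + String.join("|", quoted) + ")\\b");
        Matcher matcher = pattern.matcher(text);
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {
            hits.add(matcher.group(1) + "@" + matcher.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "I will come and meet you at the 123woods";
        // prints [come@7, 123woods@32]; "woods" alone is not reported
        System.out.println(find(text, Arrays.asList("123woods", "woods", "come")));
    }
}
```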
Use regex + word boundaries as others answered.
"I will come and meet you at the 123woods".matches(".*\\b123woods\\b.*");
will be true.
"I will come and meet you at the 123woods".matches(".*\\bwoods\\b.*");
will be false.
Hope this works for you:
String string = "I will come and meet you at the 123woods";
String keyword = "123woods";
Boolean found = Arrays.asList(string.split(" ")).contains(keyword);
if(found){
System.out.println("Keyword matched the string");
}
How about something like Arrays.asList(String.split(" ")).contains("xx")?
See String.split() and How can I test if an array contains a certain value.
Got a way to match Exact word from String in Android:
String full = "Hello World. How are you ?";
String one = "Hell";
String two = "Hello";
String three = "are";
String four = "ar";
boolean is1 = isContainExactWord(full, one);
boolean is2 = isContainExactWord(full, two);
boolean is3 = isContainExactWord(full, three);
boolean is4 = isContainExactWord(full, four);
Log.i("Contains Result", is1+"-"+is2+"-"+is3+"-"+is4);
Result: false-true-true-false
Function for match word:
private boolean isContainExactWord(String fullString, String partWord){
String pattern = "\\b"+partWord+"\\b";
Pattern p=Pattern.compile(pattern);
Matcher m=p.matcher(fullString);
return m.find();
}
Done
public class FindTextInLine {
String match = "123woods";
String text = "I will come and meet you at the 123woods";
public void findText () {
if (text.contains(match)) {
System.out.println("Keyword matched the string" );
}
}
}
Try to match using regular expressions. Match for "\b123woods\b"; \b is a word boundary.
The solution seems to be long accepted, but the solution could be improved, so if someone has a similar problem:
This is a classical application for multi-pattern-search-algorithms.
Java pattern search (with Matcher.find) is not well suited to this. Searching for exactly one keyword is optimized in Java, but searching for an or-expression uses the regex non-deterministic automaton, which backtracks on mismatches. In the worst case each character of the text is processed l times (where l is the sum of the pattern lengths).
Single-pattern search is better, but not well suited either. The whole search has to be restarted for every keyword pattern. In the worst case each character of the text is processed p times, where p is the number of patterns.
Multi pattern search will process each character of the text exactly once. Algorithms suitable for such a search would be Aho-Corasick, Wu-Manber, or Set Backwards Oracle Matching. These could be found in libraries like Stringsearchalgorithms or byteseek.
// example with StringSearchAlgorithms
AhoCorasick stringSearch = new AhoCorasick(asList("123woods", "woods"));
CharProvider text = new StringCharProvider("I will come and meet you at the woods 123woods and all the woods", 0);
StringFinder finder = stringSearch.createFinder(text);
List<StringMatch> all = finder.findAll();
A much simpler way to do this is to use split():
String match = "123woods";
String text = "I will come and meet you at the 123woods";
String[] sentence = text.split(" ");
for(String word: sentence)
{
if(word.equals(match))
return true;
}
return false;
This is a simpler, less elegant way to do the same thing without using tokens, etc.
You can use regular expressions.
Use Matcher and Pattern methods to get the desired output
You can also use regex matching with the \b anchor (word boundary).
To match "123woods" instead of "woods", use atomic grouping in the regular expression.
One thing to be noted is that, in a string to match "123woods" alone , it will match the first "123woods" and exits instead of searching the same string further.
\b(?>123woods|woods)\b
It tries 123woods first; once that alternative has matched, the atomic group does not go back and try woods at the same position.
Looking back at the original question, we need to find some given keywords in a given sentence, count the number of occurrences and know something about where. I don't quite understand what "where" means (is it an index in the sentence?), so I'll pass that one... I'm still learning java, one step at a time, so I'll see to that one in due time :-)
Note that ordinary sentences (like the one in the original question) can contain repeated keywords, so the search cannot just ask whether a given keyword "exists or not" and count it as 1 if it does. There can be more than one of the same. For example:
// Base sentence (added punctuation, to make it more interesting):
String sentence = "Say that 123 of us will come by and meet you, "
+ "say, at the woods of 123woods.";
// Split it (punctuation taken in consideration, as well):
java.util.List<String> strings =
java.util.Arrays.asList(sentence.split(" |,|\\."));
// My keywords:
java.util.ArrayList<String> keywords = new java.util.ArrayList<>();
keywords.add("123woods");
keywords.add("come");
keywords.add("you");
keywords.add("say");
By looking at it, the expected result would be 5 for "Say" + "come" + "you" + "say" + "123woods", counting "say" twice if we go lowercase. If we don't, then the count should be 4, "Say" being excluded and "say" included. Fine. My suggestion is:
// Set... ready...?
int counter = 0;
// Go!
for(String s : strings)
{
// Asking if each word of the sentence exists in the keywords, not the
// other way around, to find repeated keywords in the sentence.
Boolean found = keywords.contains(s.toLowerCase());
if(found)
{
counter ++;
System.out.println("Found: " + s);
}
}
// Statistics:
if (counter > 0)
{
System.out.println("In sentence: " + sentence + "\n"
+ "Count: " + counter);
}
And the results are:
Found: Say
Found: come
Found: you
Found: say
Found: 123woods
In sentence: Say that 123 of us will come by and meet you, say, at the woods of 123woods.
Count: 5
If you want to identify a whole word in a string and change that word, you can do it this way. The final string stays the same except for the word you treated. In this case "not" becomes "'not'" in the final string.
StringBuilder sb = new StringBuilder();
String[] splited = value.split("\\s+");
if(ArrayUtils.isNotEmpty(splited)) {
for(String valor : splited) {
sb.append(" ");
if("not".equals(valor.toLowerCase())) {
sb.append("'").append(valor).append("'");
} else {
sb.append(valor);
}
}
}
return sb.toString();