Java: crawl numbers


I created a Java phonebook (desktop app). On my PC there's a program that displays the caller's number in a popup. It's an 8-digit number.
Here's how it should work:
I want to crawl (extract) only the 8-digit numbers from the popup. Say this is the popup text:
My name is someone like you, i am 22 years old, i was born in 19/10/1989,
my phone number is 34544512
my brother is someone like me he is 18 years old, born in 9101993
his number is 07777666
In this example, I want to crawl only 07777666 and 34544512.
I want to check the popup window every 2 seconds for new numbers. If a caller calls me twice, their number will already be stored in my DB; if not, I'll store it.
Note: if that can't be done, then forget about the popup. Say it's just a text being updated every 2 seconds; how would I crawl it?
This is not homework lol :D

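Use Java regular expressions. Create a regex that matches 8 or more digits and test each whitespace-separated token against it. This extracts the two phone numbers from your sample text. If you need exactly 8 digits, use \d{8} instead of \d{8,}; the test string here adds a 10-digit number to show that longer numbers also match.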
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String testString = "My name is someone like you, i am 22 years old, i was born in 19/10/1989,"
                + " my phone number is 34544512 3454451266"
                + " my brother is someone like me he is 18 years old, born in 9101993 "
                + " his number is 07777666";
        // Split on whitespace so each token can be tested as a whole.
        String[] pieces = testString.split("\\s+");
        // \d{8,} accepts 8 or more digits; use \d{8} to accept exactly 8.
        Pattern pattern = Pattern.compile("\\d{8,}");
        for (String piece : pieces) {
            // matches() succeeds only if the entire token consists of digits.
            if (pattern.matcher(piece).matches()) {
                System.out.println(piece);
            }
        }
    }
}

Haha... this is so obviously a homework exercise that you're cheating on!
Your professor probably expects you to use regular expressions. If that's over your head, then just tokenize the strings and check each token with Long.parseLong().
Of course, both of these approaches assume that the data will be exactly like your example above, and not have dashes in the phone numbers. If you need to account for dashes (or dots, spaces, etc), then the regex or manual logic gets pretty complex pretty quickly.
UPDATE: If you do need to account for phone numbers with dashes or other characters in them, I would probably:
tokenize the string,
iterate through all tokens, using regex to remove all non-numeric characters, and finally
use regex (or Long.parseLong() and String.length()) to determine whether what's left is an 8-digit number.
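The steps above can be sketched as follows. This is only an illustration under assumptions: the class name TokenPhoneExtractor and the set of separators it strips (dash, dot, parentheses, comma) are made up for the example. It deliberately strips a fixed set of separators rather than all non-numeric characters, because stripping everything would turn a date such as 19/10/1989 into eight digits and report it as a phone number.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenPhoneExtractor {
    // Returns every whitespace-separated token that is exactly eight digits
    // once the assumed separator characters have been removed.
    static List<String> extract(String text) {
        List<String> numbers = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Assumption: only these characters may appear inside or around a number.
            String digits = token.replaceAll("[-.(),]", "");
            if (digits.matches("\\d{8}")) {
                numbers.add(digits);
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("born in 19/10/1989, call 3454-4512 or 0777.7666"));
    }
}
```

Numbers written with spaces inside them (for example 3454 4512) are split into two tokens by this approach and would need a different strategy.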

If you mean that you want to extract 8-digit numbers from a text String, then you can do that as follows:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex
{
    public static void main(String[] args)
    {
        Matcher m = Pattern.compile("\\b(\\d{8})\\b").matcher(
                "Hello 12345678 world 23456789");
        while (m.find())
        {
            System.out.println(m.group(1));
        }
    }
}
See http://docs.oracle.com/javase/tutorial/essential/regex/
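For the "check every 2 seconds and store only new callers" part of the question, a minimal sketch could look like the following. The class name CallerLog is hypothetical, and the in-memory Set is only a stand-in for the asker's database; how the popup text is actually read is not covered here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CallerLog {
    private static final Pattern EIGHT_DIGITS = Pattern.compile("\\b\\d{8}\\b");

    // Assumption: stands in for the database table of known callers.
    private final Set<String> seen = new LinkedHashSet<>();

    // Scans one snapshot of the text and returns only numbers not seen before.
    List<String> recordNewNumbers(String snapshot) {
        List<String> fresh = new ArrayList<>();
        Matcher m = EIGHT_DIGITS.matcher(snapshot);
        while (m.find()) {
            // Set.add returns false when the number was already stored.
            if (seen.add(m.group())) {
                fresh.add(m.group());
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        CallerLog log = new CallerLog();
        System.out.println(log.recordNewNumbers("my phone number is 34544512"));
        System.out.println(log.recordNewNumbers("34544512 called again, then 07777666"));
    }
}
```

To run this every 2 seconds, one option is to call recordNewNumbers from a task scheduled with ScheduledExecutorService.scheduleAtFixedRate(task, 0, 2, TimeUnit.SECONDS).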

Related

JAVA Regex - How do I filter out entire strings with lengths that are greater than a certain number? [duplicate]

This question already has answers here:
Difference between matches() and find() in Java Regex
(5 answers)
Closed 3 years ago.
My program is used to filter out names starting with a capital letter [A-M] and to filter out any names with a length less than 5 and greater than 9. The code does filter out the names with a length less than 5, but when I input a name with a length greater than 9, it just cuts off the rest of the name.
Ex: Bartholomew would cut off to Bartholom rather than just not using Bartholomew.
I have tried to move the length flag to different spots in the regex field. Other than that, I do not know regex well enough to try much more. As for putting these strings into another function just to test the lengths - I am trying to make it in one regex field.
import java.io.File;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Egor {
public static void main(String args[]) throws Exception{
Scanner s = new Scanner(new File("Names.dat"));
String[] names = new String[30];
int i = 0;
while(s.hasNext()) {
names[i] = s.next();
i++;
}
String store = "";
for (String str: names) {
store = store + str + " ";
}
System.out.println(store);
Pattern checkName = Pattern.compile("([A-M][a-z]{5,9})");
Matcher matcher = checkName.matcher(store);
System.out.println("-----------------------");
while (matcher.find()) {
System.out.println(matcher.group());
}
}
}
Expected: names like Ashley, Brendan, and Henry are printed.
Unexpected: names longer than 9 characters, like Bartholomew, are printed truncated (Bartholom).

You need to add a lookbehind and a lookahead for the characters that separate your names. Based on your code, that would be the start of the string or a space for the lookbehind, and the end of the string or a space for the lookahead. It will look something like this:
(?<=\b)([A-M][a-z]{5,9})(?=\b)
Lookaheads and lookbehinds check what comes after and before the match, but do not include it in the matched result. Since \b is itself zero-width, the simpler \b([A-M][a-z]{5,9})\b behaves the same way.
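A small runnable sketch of the word-boundary approach, with the names as one space-separated string as in the question. The class name NameFilter is made up for the example. One assumption differs from the pattern above: the quantifier is {4,8}, so that the capital letter plus the lowercase letters give a total length of 5 to 9 and a 5-letter name such as Henry is accepted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameFilter {
    // One capital A-M plus 4 to 8 lowercase letters = 5 to 9 characters in total.
    // The \b on each side rejects longer names instead of truncating them.
    private static final Pattern NAME = Pattern.compile("\\b[A-M][a-z]{4,8}\\b");

    static List<String> filter(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(filter("Ashley Brendan Henry Bartholomew Zoe Al"));
    }
}
```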

How do I write a regex pattern for a String to identify numbers that precede a space or hyphen?

I have a free flowing string that has some random text like below:
"Some random text 080 2668215901"
"Some ramdom text 040-1234567890"
"Some random text 0216789101112"
I need to capture the 3-digit numbers and the 10-digit numbers that follow them, in each of these cases:
with a space between them
with a hyphen between them
without any space or hyphen
I am using Java.
This is what I tried to get the numbers from the free flowing text:
"\\w+([0-9]+)\\w+([0-9]+)"
I could do a string length check to see whether a 3-digit number precedes a hyphen or a space and is then followed by a 10-digit number, but I would really like to explore whether regex can give me a better solution.
Also, if there are more occurrences within the String, I'd need to capture them all. I would also need to capture any standalone 10-digit number that is not preceded by a 3-digit number and a hyphen or space.

It is usually (\d{3})[ -]?(\d{10})
With boundary conditions maybe (?<!\d)(\d{3})[ -]?(\d{10})(?!\d)
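In Java the backslashes have to be doubled inside the string literal. A minimal sketch using the boundary-checked pattern on the three sample lines from the question; the class name AreaCodeMatcher and the "/" used to join the two groups in the output are choices made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AreaCodeMatcher {
    // (?<!\d) and (?!\d) reject matches that sit inside a longer run of digits.
    private static final Pattern NUMBER =
            Pattern.compile("(?<!\\d)(\\d{3})[ -]?(\\d{10})(?!\\d)");

    // Returns each match as "threeDigits/tenDigits".
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            found.add(m.group(1) + "/" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Some random text 080 2668215901"));
        System.out.println(findAll("Some ramdom text 040-1234567890"));
        System.out.println(findAll("Some random text 0216789101112"));
    }
}
```

This pattern always requires the 3-digit prefix, so it does not pick up standalone 10-digit numbers.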

Assuming you'll run this regex on individual lines, and ignoring some of the... more expressive regex implementations, this is perhaps the simplest way:
/([0-9]{3})[ -]?([0-9]{10})/
If your text might end in numbers, you'll need to anchor the result to the end of the line like this:
/([0-9]{3})[ -]?([0-9]{10})$/
If you are guaranteed literal double quote characters around your inputs, you could instead use:
/([0-9]{3})[ -]?([0-9]{10})"$/
And if you needed to match the entire line for some input error testing, you could use:
/^"(.+)([0-9]{3})[ -]?([0-9]{10})"$/

Here is a longer demo. From your responses above you're also looking for matches with trailing chars after the match.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Class {
    private static final Pattern p = Pattern.compile("" +
            "((?<threeDigits>\\d{3})[- ]?)?" +
            "(?<tenDigits>\\d{10})");
    public static void main(String... args) {
        final String input =
                "Here is some text to match: Some random text 080 2668215901. " +
                "We're now matching stray sets of ten digit as well: 1234567890. " +
                "Notice how you get the first ten and the second ten, with the preceding three:1234123412-040-1234567890" +
                "A stranger case:111222333444555666777888. Where should matches here begin and end?";
        printAllMatches(p.matcher(input));
    }
    private static void printAllMatches(final Matcher m) {
        while (m.find()) {
            System.out.println("three digits: " + m.group("threeDigits"));
            System.out.println("ten digits: " + m.group("tenDigits"));
        }
    }
}
switched to findall battleplan.

Java : Specifying Case Insensitive Search on String into an ArrayList with Regex

Following are the intended output and the original output I got from using this line of code:
ArrayList<String> nodes = new ArrayList<String>(Arrays.asList(str.split("(?i:"+word+")"+"[.,!?:;]?")));
on the input:
input : "Cow shouts COW! other cows shout COWABUNGA! stupid cow."
The string will be split into an ArrayList at the acceptable "cow" versions.
Original Output(from line above) :
ArrayList nodes = {, shouts , other , s shout ,ABUNGA! stupid }
vs
Intended Output :
ArrayList nodes = {, shouts , other cows shout COWABUNGA! stupid }
What I'm trying to achieve :
1. Case insensitive search. (ACHIEVED)
2. Takes into account the possibilities of these punctuations ".,:;!?" behind the word that is to be split. hence "[.,!?:;]?" (ACHIEVED)
3. Only splits if it finds exact word lengths + "[.,!?:;]?". It will not split at "cows" nor "COWABUNGA!" (NOT ACHIEVED, need help)
4. Find a possible way to add the acceptable splitting-word versions {Cow,COW!,cow.} into another arrayList for future use later in the method. (IN PROGRESS)
As you can see, I have fulfilled 1. and 2. and I am pasting this question first whilst I work on 4.. I know this issue can be solved with more extra lines but I'd like to keep it minimal and efficient.
UPDATE : I found that "{"+input.length+"}" can limit the matches down to letter length but I don't know if it'll work or not.
All help will be appreciated. I apologize if this question is too trivial but alas, I am new. Thanks in advance!

The following code produces the output you specified given your input. I have broken the regular expression down into named components, so each bit should be self-explanatory.
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
public class Moo {
    public static void main(String[] args) {
        String input = "Cow shouts COW! other cows shout COWABUNGA! stupid cow.";
        System.out.println(splitter(input, "cow"));
    }
    public static List<String> splitter(String input, String word) {
        String beginningOfInputOrWordBoundary = "(\\A|\\W)";
        String caseInsensitiveWord = "(?i:"+Pattern.quote(word)+")";
        String optionalPunctuation = "\\p{Punct}?";
        String endOfInputOrWordBoundary = "(\\z|\\W)";
        String regex =
                beginningOfInputOrWordBoundary +
                caseInsensitiveWord +
                optionalPunctuation +
                endOfInputOrWordBoundary;
        return Arrays.asList(input.split(regex));
    }
}
Resulting output:
[, shouts, other cows shout COWABUNGA! stupid]

A word is a sequence of letters. Any character that is not a letter implies the end of a word.
Thus, this should provide the desired result:
(?i:Cow)[^\\p{IsAlphabetic}]
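A sketch that combines this idea with the question's last point (collecting the word versions that were split on). It is a variation, not the pattern above verbatim: it uses lookarounds so the neighbouring characters are not consumed, and it keeps the question's optional punctuation class. The class name WordSplitter and its method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordSplitter {
    // Matches the word only when it is not embedded in a longer run of letters,
    // plus one optional trailing punctuation mark from the question's list.
    private static Pattern wholeWord(String word) {
        return Pattern.compile("(?<!\\p{IsAlphabetic})(?i:" + Pattern.quote(word)
                + ")(?!\\p{IsAlphabetic})[.,!?:;]?");
    }

    // The text between the matched words, as String.split would return it.
    static List<String> split(String input, String word) {
        return Arrays.asList(wholeWord(word).split(input));
    }

    // The matched word versions themselves, e.g. "Cow", "COW!", "cow.".
    static List<String> matchedVersions(String input, String word) {
        List<String> versions = new ArrayList<>();
        Matcher m = wholeWord(word).matcher(input);
        while (m.find()) {
            versions.add(m.group());
        }
        return versions;
    }

    public static void main(String[] args) {
        String input = "Cow shouts COW! other cows shout COWABUNGA! stupid cow.";
        System.out.println(split(input, "cow"));
        System.out.println(matchedVersions(input, "cow"));
    }
}
```

Unlike the (\A|\W) version, the spaces around each matched word stay in the split pieces, which matches the spacing shown in the question's intended output.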

Comparing parts of Arrays against each other?

I'm really really really not sure what is the best way to approach this. I've gotten as far as I can, but I basically want to scan a user response with an array of words and search for matches so that my AI can tell what mood someone is in based off the words they used. However, I've yet to find a clear or helpful answer. My code is pretty cluttered too because of how many different methods I've tried to use. I either need a way to compare sections of arrays to each other or portions of strings. I've found things for finding a part of an array. Like finding eggs in green eggs and ham, but I've found nothing that finds a section of an array in a section of another array.
public class MoodCompare extends Mood1 {
public static void MoodCompare(String inputMood){
int inputMoodLength = inputMood.length();
int HappyLength = Arrays.toString(Happy).length();
boolean itWorks = false;
String[] inputMoodArray = inputMood.split(" ");
if(Arrays.toString(Happy).contains(Arrays.toString(inputMoodArray)) == true)
System.out.println("Success!");
InputMood is the data the user has input that should have keywords lurking in them to their mood. Happy is an array of the class Mood1 that is being extended. This is only a small piece of the class, much less the program, but it should be all I need to make a valid comparison to complete the class.
If anyone can help me with this, you will save me hours of work. So THANK YOU!!!

Manipulating strings is nicer when you do not use relatively primitive arrays, where you have to walk through the elements yourself. As the proverb goes: not seeing the forest for the trees.
In this case it seems you check the words of the input against a set of words for some mood.
Let's use Java collections:
Turning an input string into a list of words:
String input = "...";
// Copy into an ArrayList: the list returned by Arrays.asList is fixed-size and cannot remove elements.
List<String> sentence = new ArrayList<>(Arrays.asList(input.split("\\W+")));
sentence.remove("");
\\W+ is a sequence of one or more non-word characters. Mind that a "word" character here means A-Za-z0-9_.
Now a mood would be a set of unique words:
Set<String> moodWords = new HashSet<>();
Collections.addAll(moodWords, "happy", "wow", "hurray", "great");
Evaluation could be:
int matches = 0;
for (String word : sentence) {
if (moodWords.contains(word)) {
++matches;
}
}
int percent = sentence.isEmpty() ? 0 : matches * 100 / sentence.size();
System.out.printf("Happiness: %d %%%n", percent);
In Java 8 this is even more compact (count() returns a long, hence the cast):
int matches = (int) sentence.stream().filter(moodWords::contains).count();
Explanation:
The for-each loop takes every word in the sentence. For every word it checks whether it is contained in moodWords, the set of all mood words.
The percentage is the share of words in the sentence that are mood words. The boundary condition of an empty sentence is handled by the conditional expression ... ? ... : ..., so an empty sentence is given the arbitrary percentage 0%.
The printf format uses %d for the integer, %% for the percent sign % (self-escaped) and %n for the line break character(s).
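Put together, the pieces above might look like this self-contained sketch. The class name MoodScore and the method percentMatching are made up for the example, and lower-casing the input before splitting is an added assumption so that "Wow" matches the mood word "wow".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MoodScore {
    // Percentage of words in the input that appear in the given mood word set.
    static int percentMatching(String input, Set<String> moodWords) {
        // Assumption: comparison is case-insensitive, so lower-case first.
        List<String> sentence = new ArrayList<>(
                Arrays.asList(input.toLowerCase(Locale.ROOT).split("\\W+")));
        // split can yield an empty string (empty input, or input starting with punctuation).
        sentence.removeIf(String::isEmpty);
        long matches = sentence.stream().filter(moodWords::contains).count();
        return sentence.isEmpty() ? 0 : (int) (matches * 100 / sentence.size());
    }

    public static void main(String[] args) {
        Set<String> happy = new HashSet<>();
        Collections.addAll(happy, "happy", "wow", "hurray", "great");
        System.out.printf("Happiness: %d %%%n", percentMatching("Wow, I feel great today!", happy));
    }
}
```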

If I'm understanding your question correctly, you mean something like this?
String[] words = {"green", "eggs", "and", "ham"};
String response = "eggs or ham";
Mood mood = new Mood();
for (String foo : words)
{
    // True when the response contains this keyword anywhere.
    if (response.contains(foo))
    {
        // Check if happy etc...
        if (foo.equals("green"))
            mood.sad++;
        ...
    }
}
System.out.println("Success");
...
//CheckMood() etc... other methods.

Try to use tokens.
Every time the program needs to compare the contents of a row from one array to the other array, just tokenize the contents of both and compare the tokens.
See the following Javadoc page for further reference: http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html
or the following web page:
http://introcs.cs.princeton.edu/java/72regular/Tokenizer.java.html
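A minimal sketch of the tokenizing idea, assuming both inputs are plain whitespace-separated text. The class name TokenOverlap and the method common are made up for the example. Note that the StringTokenizer Javadoc describes it as a legacy class, so String.split is a common alternative.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

final class TokenOverlap {
    // Returns the tokens that occur in both strings, in sorted order.
    static Set<String> common(String first, String second) {
        Set<String> firstTokens = new HashSet<>();
        StringTokenizer a = new StringTokenizer(first);
        while (a.hasMoreTokens()) {
            firstTokens.add(a.nextToken());
        }
        Set<String> shared = new TreeSet<>();
        StringTokenizer b = new StringTokenizer(second);
        while (b.hasMoreTokens()) {
            String token = b.nextToken();
            if (firstTokens.contains(token)) {
                shared.add(token);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(common("green eggs and ham", "eggs or ham"));
    }
}
```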

Now, how can I screen-scrape such a html line (using java)?

I am trying to screen-scrape a html page so I can extract desired valuable data from it and into a text file. So far it's going well until I came across this within the html page:
<td> <b>In inventory</b>: 0.3 kg<br /><b>Equipped</b>: -4.5 kg
The above line in the HTML code for the page often varies. So I need to figure out a way to scan the line (regardless of what it contains) for the weights (in this case 0.3 and -4.5) and store them in 2 separate doubles, like so:
double inventoryWeight = 0.3;
double equippedWeight = -4.5;
I would like this to be done in pure Java. If a third-party program that can be executed within my Java application would do this better, do not hesitate to tell me (but please explain it clearly).
Thank you a bunch!

RegEx is usually a good solution for scraping text. Parentheses denote "capturing groups", which are stored and can then be accessed using Matcher.group(). [-.\d]+ matches anything consisting of one or more digits (0-9), periods, and hyphens. .* matches anything (but sometimes not newline characters). Here it's just used to essentially "throw away" everything you don't care about.
import java.util.regex.*;
public class Foo {
    public static void main(String[] args) {
        String regex = ".*inventory<\\/b>: ([-.\\d]+).*Equipped<\\/b>: ([-.\\d]+).*";
        String text = "<td> <b>In inventory</b>: 0.3 kg<br /><b>Equipped</b>: -4.5 kg";
        // Look for a match
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        // Get the matched text
        if (matcher.matches()) {
            String inventoryWeight = matcher.group(1);
            String equippedWeight = matcher.group(2);
            System.out.println("Inventory weight: " + inventoryWeight);
            System.out.println("Equipped weight: " + equippedWeight);
        } else {
            System.out.println("No match!");
        }
    }
}

Do you have this piece of HTML as a String? If so, just search for <b>Equipped</b>, take the position just past the end of that tag, and skip the colon and space that follow. Then build a new string by appending character by character for as long as the character is a digit, a dot, or a minus sign.
Once you have those numbers in String variables, convert them to doubles with double aDouble = Double.parseDouble(aString).
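The manual approach can be sketched like this. The class name WeightScanner, the method numberAfter, and the choice to return NaN when the label or number is missing are all assumptions made for the example.

```java
final class WeightScanner {
    // Returns the number that follows the given label, or NaN if none is found.
    static double numberAfter(String html, String label) {
        int start = html.indexOf(label);
        if (start < 0) {
            return Double.NaN;
        }
        int i = start + label.length();
        // Skip the ": " separator between the label and the value.
        while (i < html.length()
                && (html.charAt(i) == ':' || Character.isWhitespace(html.charAt(i)))) {
            i++;
        }
        // Collect characters for as long as they can belong to a number.
        StringBuilder number = new StringBuilder();
        while (i < html.length()) {
            char c = html.charAt(i);
            if (!Character.isDigit(c) && c != '.' && c != '-') {
                break;
            }
            number.append(c);
            i++;
        }
        return number.length() == 0 ? Double.NaN : Double.parseDouble(number.toString());
    }

    public static void main(String[] args) {
        String html = "<td> <b>In inventory</b>: 0.3 kg<br /><b>Equipped</b>: -4.5 kg";
        double inventoryWeight = numberAfter(html, "<b>In inventory</b>");
        double equippedWeight = numberAfter(html, "<b>Equipped</b>");
        System.out.println(inventoryWeight + " " + equippedWeight);
    }
}
```

Double.parseDouble throws NumberFormatException on malformed input such as a lone "-", so real page data may need a try/catch around the conversion.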
