Getting scanner to read text file - java

I am trying to use a scanner to read a text file pulled with JFileChooser. The wordCount is working correctly, so I know it is reading. However, I cannot get it to search for instances of the user inputted word.
public static void main(String[] args) throws FileNotFoundException {
String input = JOptionPane.showInputDialog("Enter a word");
JFileChooser fileChooser = new JFileChooser();
fileChooser.showOpenDialog(null);
File fileSelection = fileChooser.getSelectedFile();
int wordCount = 0;
int inputCount = 0;
Scanner s = new Scanner (fileSelection);
while (s.hasNext()) {
String word = s.next();
if (word.equals(input)) {
inputCount++;
}
wordCount++;
}

You'll have to look for punctuation such as
, ; . ! ? etc.
in each word. The next() method returns everything up to the next whitespace, so punctuation stays attached to the word.
It will consider "hi, how are you?" as the following "hi,", "how", "are", "you?".
You can use the method indexOf(String) to find these characters. You can also use replaceAll(String regex, String replacement) to replace characters. You can remove each character individually, or you can use a single, more general regex, but those are usually harder to understand.
//this replaces a certain character with an empty string, i.e. removes it
//replaceAll takes a regex, so special characters such as . must be escaped
word = word.replaceAll("\\.","");
word = word.replaceAll(",","");
word = word.replaceAll("!","");
//etc.
Read more about this method:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29
Here's a Regex example:
//NOTE: This example will not work for you. It's just a simple example for seeing a Regex.
//Removes whitespace between a word character and . or ,
String pattern = "(\\w)(\\s+)([\\.,])";
word = word.replaceAll(pattern, "$1$3");
Source:
http://www.vogella.com/articles/JavaRegularExpressions/article.html
Here is a good Regex example that may help you:
Regex for special characters in java
Parse and remove special characters in java regex
Remove all non-"word characters" from a String in Java, leaving accented characters?

If the text entered by the user differs in case, you should try using equalsIgnoreCase().
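
As a rough sketch of how the punctuation and case suggestions fit together, the snippet below strips non-alphanumeric characters from each token and then compares case-insensitively. The class and method names (WordMatchSketch, normalize, countOccurrences) are made up for illustration, and it reads from a String rather than a File to stay self-contained. Note that this simple normalization also removes apostrophes inside words, so "don't" becomes "dont".

```java
import java.util.Scanner;

class WordMatchSketch {
    // Remove everything that is not a letter or digit, e.g. "you?" -> "you".
    static String normalize(String token) {
        return token.replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Count tokens equal to the target once punctuation and case are ignored.
    static int countOccurrences(String text, String target) {
        String wanted = target.trim();
        int count = 0;
        try (Scanner s = new Scanner(text)) {
            while (s.hasNext()) {
                if (normalize(s.next()).equalsIgnoreCase(wanted)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Tokens are "Hi,", "how", "are", "you?", "You", "look", "well."
        System.out.println(countOccurrences("Hi, how are you? You look well.", "you")); // 2
    }
}
```

With a file, the same loop should work with new Scanner(fileSelection) in place of new Scanner(text).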

In addition to blackpanther's answer, you should also use trim() to account for whitespace, as
"abc" is not equal to "abc "

You should take a look at matches().
equals will not help you, since next() doesn't return the file word by word,
but rather whitespace (not comma, semicolon, etc.) separated token by token (as others mentioned).
Here is the Javadoc: String#matches(java.lang.String)
...and a little example.
input = ".*" + input + ".*";
...
boolean foundWord = word.matches(input);
. is the regex wildcard and stands for any character. .* stands for zero or more arbitrary characters. So you get a match if input occurs anywhere in word.
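
One caveat with building a pattern from user input: any regex metacharacters in the input are interpreted, and ".*input.*" also matches when the input is only part of a longer word. The sketch below shows one way to handle both, using Pattern.quote and word boundaries. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesSketch {
    // True if the token contains the input anywhere; the input is treated as literal text.
    static boolean containsLiteral(String token, String input) {
        return token.matches(".*" + Pattern.quote(input) + ".*");
    }

    // True only if the input appears as a whole word inside the token.
    static boolean containsWholeWord(String token, String input) {
        return Pattern.compile("\\b" + Pattern.quote(input) + "\\b").matcher(token).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("you?", "you"));       // true
        System.out.println(containsLiteral("yourself", "you"));   // true (substring match)
        System.out.println(containsWholeWord("yourself", "you")); // false
        System.out.println(containsWholeWord("you?", "you"));     // true
    }
}
```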

Related

Char Sequence vs Regex

txt.replaceAll("a","b");
Is "a" a Char Sequence or a Regex (or more specific Literal Search)?
And is my code correct?
I’m coding the Exercise "Normalize Text".
Task:
Only one space between words.
Only one space after comma (,), dot (.) and colon (:). First
character of word after dot is in Uppercase and other words are in
lower case.
Please correct me if I am wrong, including my English.
public class NormalizeText {
static String spacesBetweenWords(String txt){
txt = txt.replaceAll(" +", " ");
return txt;
}
/**
* - There are no spaces between comma or dot and word in front of it.
* - Only one space after comma (,), dot (.) and colon (:).
*/
static String spacesCommaDotColon(String txt) {
txt = txt.replaceAll(" +\\.", ".");
txt = txt.replaceAll(" +,", ",");
txt = txt.replaceAll(" +[:]", ":");
txt = txt.replaceAll("[.]( *)", ". ");
txt = txt.replaceAll("[,]( *)", ", ");
txt = txt.replaceAll("[:]( *)", ": ");
//txt.replaceAll("a","b");
return txt;
}
public static void main(String[] args) {
// TODO code application logic here
String txt = "\" \\\" i want to f\\\"ly\" . B.ut : I , Cant\\";
System.out.println(txt);
txt = spacesBetweenWords(txt);
System.out.println(spacesBetweenWords(txt));
System.out.println(spacesCommaDotColon(txt));
}
}
My teacher said my code is not using regex, but rather a Char Sequence.
I am very confused.
For starters, since you are learning how to use regex, an amazing site for learning it is this.
Now, the first argument of replaceAll is always treated as a regex. The letter "a" on its own is a regex that matches only the literal "a" in the text. So what your teacher probably meant is that you should use a more complicated regex (something that matches multiple cases at once).
As this is an exercise, I prefer not to give a solution, so that you try to figure it out yourself. The tip: try to use replaceAll only once, or as close to once as you can get.
As for whether your code is correct: it seems good, but you are missing the uppercase-after-dot condition.
Also, although I said to try to use only one replaceAll, the uppercase requirement doesn't count, as it requires another approach.
I hope this helps and that you find a solution to the exercise. Again, sorry for not providing a full answer, but in my opinion you need to try to figure it out on your own. You are already on a good road!
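
To make the regex versus literal distinction concrete, here is a small sketch comparing String.replaceAll (regex first argument) with String.replace (literal CharSequence first argument). The wrapper class and method names are invented for the example.

```java
class ReplaceVsReplaceAll {
    // First argument is compiled as a regular expression.
    static String asRegex(String txt, String regex, String replacement) {
        return txt.replaceAll(regex, replacement);
    }

    // First argument is taken literally, character for character.
    static String asLiteral(String txt, String target, String replacement) {
        return txt.replace(target, replacement);
    }

    public static void main(String[] args) {
        System.out.println(asRegex("banana", "a", "b"));   // bbnbnb ("a" is a regex matching the letter a)
        System.out.println(asRegex("a.b.c", ".", "x"));    // xxxxx  ("." matches any character)
        System.out.println(asRegex("a.b.c", "\\.", "x"));  // axbxc  (escaped dot)
        System.out.println(asLiteral("a.b.c", ".", "x"));  // axbxc  (literal dot)
    }
}
```

For a plain letter such as "a" the two methods give the same result, which may be why the difference is easy to miss.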
With regards to replaceAll, the docs say:
Replaces each substring of this string that matches the given regular expression with the given replacement.
An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression
       
Pattern.compile(regex).matcher(str).replaceAll(repl)
Therefore, replaceAll will always use regular expressions for its first parameter. With regards to simplifying your code,
static String spacesCommaDotColon(String txt) {
txt = txt.replaceAll(" +\\.", ".");
txt = txt.replaceAll(" +,", ",");
txt = txt.replaceAll(" +[:]", ":");
txt = txt.replaceAll("[.]( *)", ". ");
txt = txt.replaceAll("[,]( *)", ", ");
txt = txt.replaceAll("[:]( *)", ": ");
//txt.replaceAll("a","b");
return txt;
}
can be simplified to:
static String spacesCommaDotColon(String txt) {
return txt.replaceAll(" *([:,.]) *","$1 ");
}
and
static String spacesBetweenWords(String txt){
txt = txt.replaceAll(" +", " ");
return txt;
}
can be simplified to:
static String spacesBetweenWords(String txt){
return txt.replaceAll(" +", " ");
}
Your code is correct. Also, you could perform dot, comma and colon formatting with one call using capturing groups:
static String spacesCommaDotColon(String txt) {
return txt.replaceAll("\\s*([.,:])\\s*", "$1 ");
}
Explanation:
"\\s*([.,:])\\s*": look for a comma, dot or colon with any surrounding whitespace; capture that character (parentheses capture the matched text)
"$1 ": replace the matched text with the captured character (labelled $1 since it was caught by the first and only capturing group) followed by one space
Another solution given by TEXHIK, using look-behind:
txt.replaceAll("(?<=[,.:])\\s{2,}", "");
This looks for any run of at least two whitespace characters preceded by a comma, a dot or a colon and removes it. It is maybe not something to look at before understanding the regex basics.
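
For reference, here is one possible sketch of the whole exercise, combining the capturing-group call with a separate pass for the uppercase-after-dot rule. The class name and the capitalizeAfterDot helper are assumptions made for this example, not part of any answer above; the trailing trim() is also an added assumption so that text ending in punctuation does not gain a trailing space.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NormalizeTextSketch {
    // Only one space between words.
    static String spacesBetweenWords(String txt) {
        return txt.replaceAll(" +", " ");
    }

    // No space before , . : and exactly one space after.
    static String spacesCommaDotColon(String txt) {
        return txt.replaceAll("\\s*([.,:])\\s*", "$1 ").trim();
    }

    // Lowercase everything, then uppercase the first letter following ". ".
    static String capitalizeAfterDot(String txt) {
        Matcher m = Pattern.compile("(\\. )(\\p{L})").matcher(txt.toLowerCase(Locale.ROOT));
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + m.group(2).toUpperCase(Locale.ROOT);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    static String normalize(String txt) {
        return capitalizeAfterDot(spacesCommaDotColon(spacesBetweenWords(txt)));
    }

    public static void main(String[] args) {
        System.out.println(normalize("i want  to fly .but : I , Cant"));
        // prints: i want to fly. But: i, cant
    }
}
```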

Quote issues when counting individual words in a large text file

I need to create code to count individual words in a .txt file. The format has to be similar to:
the - 10
text - 1
has - 5
etc.
I am experiencing an issue that I can't seem to resolve:
The text uses apostrophes for quotes, so my code parses words like 'don't , and doesn't see 'don't as the same as don't. I don't know how to fix this.
This is the specific part of the code. I have to use regular expressions in a delimiter.
static int findAndCountWords (Scanner scanner, String[] words, int [] freqs)
{
assert (words != null)&&(freqs != null): "findAndCountWords doesn't work.";
int nr=0;
while (scanner.hasNext())
{
String word = scanner.next();
word = word.toLowerCase();
scanner.useDelimiter("[^a-z]");
//|[^a-z]+[\\'][^a-z]+
if (updateWord(word, words, freqs, nr))
nr++;
}
return nr;
}
I would trim any apostrophes from your words first.
You can do this with Apache commons:
str = StringUtils.stripStart(str,"'");
or with your own matcher:
Pattern pattern = Pattern.compile("(?:^')|(?:'$)"); // starts or ends with apostrophe
str = pattern.matcher(str).replaceAll(""); // not anymore
(I did not test this code, so it may contain bugs.)
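
A minimal sketch of how this could look inside a word counter: apostrophes are kept as part of a token by the delimiter regex, and then stripped only at the edges of each token. It uses a Map instead of the words/freqs arrays from the question, and the class and method names are invented for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
import java.util.regex.Pattern;

class WordFrequencySketch {
    // Apostrophes at the start or end of a token (quote marks), not the one inside "don't".
    private static final Pattern EDGE_APOSTROPHES = Pattern.compile("^'+|'+$");

    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        try (Scanner scanner = new Scanner(text.toLowerCase(Locale.ROOT))) {
            // Set the delimiter once, before reading: anything that is not a-z or an apostrophe.
            scanner.useDelimiter("[^a-z']+");
            while (scanner.hasNext()) {
                String word = EDGE_APOSTROPHES.matcher(scanner.next()).replaceAll("");
                if (!word.isEmpty()) { // a lone quote mark becomes an empty string
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        countWords("'Don't go,' she said. 'I don't know.'")
                .forEach((word, n) -> System.out.println(word + " - " + n));
    }
}
```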

Split a sentence ignoring characters in Java

I want to write a program that reads one line of input text and breaks it up into words.
The words should be output one per line. A word is defined to be a sequence of letters.
Any characters in the input that are not letters should be discarded.
For example, if the user inputs the line:
He said, "That’s not a good idea."
then the output of the program should be:
He
said
That
‘s
not
a
good
idea
Simply use a regex
Pattern pattern = Pattern.compile("[\\w'’]+");
Matcher matcher = pattern.matcher("He said, \"That’s not a good idea.\"");
while (matcher.find())
System.out.println(matcher.group());
Try this:
public class Main {
public static void main(String[] args) {
Scanner stdIn = new Scanner(System.in); // user input
String line = stdIn.nextLine(); // read line
String[] words = line.split("[^a-zA-Z]+"); // split by all non-alphabetic characters (a regex)
for (String word : words) { // iterate through the words
System.out.println(word); // print word with a newline
}
}
}
It won't include the apostrophe in the token 's, but I don't know why you included that. It's not a letter, after all, and I read your first bold sentence. I hope the comments help explain how it works. Note that split() drops trailing empty strings, but if the line starts with a non-letter the first element will be an empty string (printed as an empty line); that should be easy for you to fix if you really need to.
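
If empty entries are a concern, one way to handle them is to skip empty strings while iterating over the split result. This is only a sketch; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

class LetterWordsSketch {
    // Split on runs of non-letters and drop any empty entries.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String word : line.split("[^a-zA-Z]+")) {
            if (!word.isEmpty()) { // split yields a leading "" when the line starts with a non-letter
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String word : words("\"That's not a good idea,\" he said.")) {
            System.out.println(word);
        }
    }
}
```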

Java Regex all non word characters except whitespace

This has probably been asked before, but I want to split a string at every non-word character except whitespace in Java. I do not have experience with regex in general, and the wiki doesn't really help.
I've tried it with this: "[\\W][^\\s]" but that did not help.
Edit: how the String is read out of the file
StringBuilder sb = new StringBuilder();
Scanner sc = new Scanner(getResources().openRawResource(R.raw.answers));
try
{
while (sc.hasNext())
{
sb.append(sc.next());
}
} finally
{
sc.close();
}
You can split using this regex:
String[] tok = input.split( "[\\W&&\\S]+" );
This will split on any character that is both a non-word character and a non-space character, so whitespace is left alone by the split.
Check "Character classes" in the Java Pattern reference.
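
As a quick illustration, the sketch below applies that regex to a sample sentence and compares it with the negated-class spelling "[^\\w\\s]+", which should behave the same way. The class and method names are assumptions for this example.

```java
import java.util.Arrays;

class NonWordSplitSketch {
    // Intersection form: non-word characters that are also non-space characters.
    static String[] splitKeepingSpaces(String input) {
        return input.split("[\\W&&\\S]+");
    }

    // Negated-class form: anything that is neither a word character nor whitespace.
    static String[] splitKeepingSpacesAlt(String input) {
        return input.split("[^\\w\\s]+");
    }

    public static void main(String[] args) {
        String input = "Hello, world! How are-you?";
        // Spaces stay inside the pieces; only punctuation acts as a separator.
        System.out.println(Arrays.toString(splitKeepingSpaces(input)));
        System.out.println(Arrays.toString(splitKeepingSpacesAlt(input)));
    }
}
```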

Splitting strings based on a delimiter

I am trying to break apart a very simple collection of strings that come in the forms of
0|0
10|15
30|55
etc. Essentially, numbers that are separated by pipes.
When I use Java's string split function with .split("|"), I get somewhat unpredictable results: white space in the first slot, and sometimes the number itself isn't where I thought it should be.
Can anybody please help and give me advice on how I can use a reg exp to keep ONLY the integers?
I was asked to give the code trying to do the actual split. So allow me to do that in hopes to clarify further my problem :)
String temp = "0|0";
String[] splitString = temp.split("|");
results
\n
0
|
0
I am trying to get
0
0
only. Forever grateful for any help ahead of time :)
I still suggest using split(); by default it discards trailing empty strings. If you get rid of the non-numeric characters in the string and keep only pipes and numbers, you can easily use split() to get what you want. Alternatively, you can pass multiple delimiters to split() (in the form of a regex), and this should work:
String[] splited = yourString.split("[\\|\\s]+");
and the regex:
import java.util.regex.*;
Pattern pattern = Pattern.compile("\\d+(?=([\\|\\s\\r\\n]))");
Matcher matcher = pattern.matcher(yourString);
while (matcher.find()) {
System.out.println(matcher.group());
}
The pipe symbol is special in a regexp (it marks alternatives), you need to escape it. Depending on the java version you are using this could well explain your unpredictable results.
class t {
public static void main(String[] args)
{
String temp = "0|0";
String[] splitString = temp.split("\\|");
for (int i=0; i<splitString.length; i++)
System.out.println("splitString["+i+"] is " + splitString[i]);
}
}
outputs
splitString[0] is 0
splitString[1] is 0
Note that one backslash is the regexp escape character, but because a backslash is also the escape character in java source you need two of them to push the backslash into the regexp.
You can replace the white space with pipes and then split on the escaped pipe.
String test = "0|0 10|15 30|55";
test = test.replace(" ", "|");
String[] result = test.split("\\|");
Hope this helps.
You can use StringTokenizer.
String test = "0|0";
StringTokenizer st = new StringTokenizer(test, "|"); //use the pipe as the delimiter instead of the default whitespace
int firstNumber = Integer.parseInt(st.nextToken()); //will parse out the first number
int secondNumber = Integer.parseInt(st.nextToken()); //will parse out the second number
Of course you can always nest this inside of a while loop if you have multiple strings.
Also, you need to import java.util.* for this to work.
The pipe ('|') is a special character in regular expressions. It needs to be "escaped" with a '\' character if you want to use it as a regular character, unfortunately '\' is a special character in Java so you need to do a kind of double escape maneuver e.g.
String temp = "0|0";
String[] splitStrings = temp.split("\\|");
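
As an alternative to escaping by hand, Pattern.quote can build the literal pattern. The following sketch (class and method names are made up) splits one pair on the pipe and parses both sides as integers.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitSketch {
    // Split "10|15" on the literal pipe and parse each part as an int.
    static int[] parsePair(String s) {
        String[] parts = s.split(Pattern.quote("|")); // same effect as split("\\|")
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i].trim());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parsePair("0|0")));   // [0, 0]
        System.out.println(Arrays.toString(parsePair("10|15"))); // [10, 15]
    }
}
```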
The Guava library has a nice class, Splitter, which is a much more convenient alternative to String.split(). The advantages are that you can choose to split the string on specific characters (like '|'), on specific strings, or with regexps, and you can choose what to do with the resulting parts (trim them, throw away empty parts, etc.).
For example you can call
Iterable<String> parts = Splitter.on('|').trimResults().omitEmptyStrings().split("0|0");
This should work for you:
([0-9]+)
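
A short sketch of how that pattern could be applied with Matcher.find() to pull out every integer, whatever separates them. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractIntegersSketch {
    private static final Pattern DIGITS = Pattern.compile("([0-9]+)");

    // Collect every run of digits in the input, ignoring pipes, spaces and newlines.
    static List<Integer> extract(String s) {
        List<Integer> result = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            result.add(Integer.parseInt(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("0|0 10|15\n30|55")); // [0, 0, 10, 15, 30, 55]
    }
}
```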
Consider a scenario in which we have read a line from a CSV or XLS file as a string and need to separate the columns into an array of strings, depending on the delimiters.
Below is a code snippet that does this.
{ ...
....
String line = new BufferedReader(new FileReader("your file")).readLine();
String[] splittedString = StringSplitToArray(line,"\"");
...
....
}
public static String[] StringSplitToArray(String stringToSplit, String delimiter)
{
StringBuffer token = new StringBuffer();
Vector tokens = new Vector();
char[] chars = stringToSplit.toCharArray();
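for (int i=0; i < chars.length; i++) {
// loop header and conditions reconstructed from the garbled original; assumes any character in delimiter acts as a separator
if (delimiter.indexOf(chars[i]) != -1) {
if (token.length() > 0) {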
tokens.addElement(token.toString());
token.setLength(0);
i++;
}
} else {
token.append(chars[i]);
}
}
if (token.length() > 0) {
tokens.addElement(token.toString());
}
// convert the vector into an array
String[] preparedArray = new String[tokens.size()];
for (int i=0; i < preparedArray.length; i++) {
preparedArray[i] = (String)tokens.elementAt(i);
}
return preparedArray;
}
The snippet above calls StringSplitToArray, which converts the line into a string array by splitting it on the delimiter passed to the method. The delimiter can be a comma (,) or a double quote (").
For more on this, follow this link : http://scrapillars.blogspot.in
