I am writing a program that reads a text file passed as an argument to the main method and extracts all the unique words from the file and prints them in the console one per line. I am having trouble passing the tokens to a string array while each line is being read from the scanner:
There are a couple of things that I think are wrong or could be written more efficiently:
1) tokens is initialized with a size of 100. This is an obvious constraint. I thought about using a dynamic array such as ArrayList or Vector, but ultimately decided to use a simple String array and expand it when needed (i.e. create a new array double the size of the original, using a conditional that checks whether tokens is full while the scanner still has more lines).
2) I am not sure that simply passing input.hasNextLine() as the test condition of the for loop makes sense. I basically want to loop until input has reached EOF.
3) I want the regex in split to catch all punctuation, whitespace, and digits. I'm not 100% sure it's written correctly.
4) The line in question is tokens[index] = token[index]. I'm not sure this is correct. I want the tokens from each line to be added to tokens.
public static void main(String[] arg) throws FileNotFoundException {
    File textFile = new File(arg[0]);
    String[] tokens = new String[100];
    try {
        Scanner input = new Scanner(textFile);
        for (int index = 0; input.hasNextLine(); index++) {
            String[] token = input.nextLine().split("[.,;']+\\d +\\s");
            tokens[index] = token[index];
        }
        for (String token : tokens) {
            System.out.println(token);
        }
        input.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
There are several errors in the code. I'll try to cover all of them:
Change tokens to be an ArrayList; there is no reason not to.
You need two iterations: a) over the lines in the file and b) over the tokens in each line.
The regex is very specific about what must appear between tokens (one or more punctuation characters, then one digit, then one or more spaces, then one more whitespace character).
public static void main(String[] arg) throws FileNotFoundException {
    File textFile = new File(arg[0]);
    ArrayList<String> tokens = new ArrayList<String>();
    try {
        Scanner input = new Scanner(textFile);
        while (input.hasNextLine()) {
            String[] lineTokens = input.nextLine().split("[,;:\"\\.\\s]+");
            for (String token : lineTokens) {
                tokens.add(token);
            }
        }
        for (String token : tokens) {
            System.out.println(token);
        }
        input.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
The regex can be improved but it depends on your data anyway so I can't know all the cases you need to handle.
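Since the question asks for unique words, the list could be swapped for a set. This is only a minimal sketch, assuming that a word is a run of letters and that comparison is case-sensitive; the class name UniqueWords and the helper uniqueWords are made up for this example:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

class UniqueWords {

    // Collects each distinct word once, in order of first appearance.
    static Set<String> uniqueWords(Scanner input) {
        Set<String> words = new LinkedHashSet<>();
        while (input.hasNextLine()) {
            // Split on every run of non-letter characters
            // (punctuation, whitespace and digits).
            for (String token : input.nextLine().split("[^\\p{L}]+")) {
                // split() yields an empty first token when the line
                // starts with a delimiter, and "" for a blank line.
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes the Scanner even if reading fails.
        try (Scanner input = new Scanner(new File(args[0]))) {
            for (String word : uniqueWords(input)) {
                System.out.println(word);
            }
        }
    }
}
```

Lower-casing each token before adding it would make the comparison case-insensitive. Note that splitting on non-letters also breaks words such as "don't" at the apostrophe, so the pattern may need adjusting for your data.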
Related
Hello, I am trying to loop over a file in Java and output only the strings whose year is 2000.
For some reason, when I do .trim().compare(year), it still returns all of the strings. I have no idea why.
Examples of strings in the file are:
20/04/1999-303009
13/04/2000-2799
06/10/1999-123
Out of these three, for example, I want to get only 13/04/2000-2799 (note that the file is huge).
Here is my code I came up with so far:
public static void main(String[] args) throws IOException {
    //Initiating variables
    String filedir =("c://test.txt");
    ArrayList<String> list = new ArrayList<String>();
    String year = "2000";
    try (Scanner scanner = new Scanner(new File(filedir))) {
        while (scanner.hasNextLine()){
            // String[] parts = scanner.next().split("-");
            if (scanner.nextLine().trim().contains(year)) {
                System.out.println(scanner.nextLine());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
You are using scanner.nextLine() two times. That's an error. Call it only once per iteration and assign the result to a String variable for later use.
You're calling scanner.nextLine() twice, which means that once you have found a matching line, you are actually printing the next one.
The problem in your code is in the while block:
while(scanner.hasNextLine()){
    //This first call returns 13/04/2000-2799
    if(scanner.nextLine().trim().contains(year)){//This line finds matching value
        //But this line prints the next line
        System.out.println(scanner.nextLine());//this call returns 06/10/1999-123
    }
}
What you could do is store the value you need in a variable and if it matches the year then you print it:
while(scanner.hasNextLine()){
    //You store the value
    String value = scanner.nextLine().trim();
    //See if it matches the year
    if(value.contains(year)){
        //Print it in case it matches
        System.out.println(value);
    }
}
Hope this helps.
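One thing to watch with contains(year): it would also match a line such as 20/04/1999-2000, where 2000 appears after the dash. A possible stricter check is sketched below, assuming every line follows the dd/MM/yyyy-number layout shown in the question; the class YearFilter, the helper isFromYear, and passing the path as an argument are all made up for this example:

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

class YearFilter {

    // True when the year field of a dd/MM/yyyy-number line equals `year`.
    static boolean isFromYear(String line, String year) {
        String trimmed = line.trim();
        int slash = trimmed.lastIndexOf('/');       // separator before the year
        int dash = trimmed.indexOf('-', slash + 1); // separator after the year
        if (slash < 0 || dash < 0) {
            return false; // not in the expected format
        }
        return trimmed.substring(slash + 1, dash).equals(year);
    }

    public static void main(String[] args) throws IOException {
        // The path is passed as an argument here (an assumption for the sketch).
        try (Scanner scanner = new Scanner(new File(args[0]))) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine(); // read each line exactly once
                if (isFromYear(line, "2000")) {
                    System.out.println(line);
                }
            }
        }
    }
}
```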
I am trying to write code that reads a text file line by line in a while loop, but I need the words of each line to be in their own array so I can carry out some if statements. The problem I am having at the moment is that my array stores all the words in the file, instead of the words of each line separately.
Here is some of my current code:
public static void main(String[] args) throws IOException {
    Scanner in = new Scanner(new File("test.txt"));
    List<String> listwords = new ArrayList<>();
    while (in.hasNext()) {
        listwords.addAll(Arrays.asList(in.nextLine().split(" ")));
    }
    if(listwords.get(4) == null){
        name = listwords.get(2);
    }
    else {
        name = listwords.get(4);
    }
If you want to have an array of strings per line, then write it like this instead:
List<String[]> listwords = new ArrayList<>();
while (in.hasNext()) {
    listwords.add(in.nextLine().split(" "));
}
Btw, you seem to be using the terms "array" and "ArrayList" interchangeably.
That would be a mistake; they are distinct things.
If you want to have a List of strings per line, then write it like this instead:
List<List<String>> listwords = new ArrayList<>();
while (in.hasNext()) {
    listwords.add(Arrays.asList(in.nextLine().split(" ")));
}
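With one array per line, the if/else from the question can be applied to each line in turn. Note that calling get(4) on a list with fewer than five elements throws an IndexOutOfBoundsException rather than returning null, so checking the length first is safer. A rough sketch, where the class WordsPerLine, the helper pickName and the inline sample text are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordsPerLine {

    // One String[] per line, split on runs of whitespace.
    static List<String[]> readLines(Scanner in) {
        List<String[]> lines = new ArrayList<>();
        while (in.hasNextLine()) {
            lines.add(in.nextLine().trim().split("\\s+"));
        }
        return lines;
    }

    // Hypothetical helper mirroring the question's if/else:
    // prefer the fifth word, fall back to the third, else null.
    static String pickName(String[] words) {
        if (words.length > 4) {
            return words[4];
        }
        return words.length > 2 ? words[2] : null;
    }

    public static void main(String[] args) {
        // Inline sample standing in for test.txt.
        Scanner in = new Scanner("a b c\nd e f g h");
        for (String[] words : readLines(in)) {
            System.out.println(pickName(words));
        }
    }
}
```

For the two sample lines this prints c and then h.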
You can use
listwords.addAll(Arrays.asList(in.nextLine().split("\\r?\\n")));
to split on newlines. You can then split these further on whitespace, if you want.
I am trying to see if a string in my array matches a word. If so, the if statement should run.
Sounds simple as pie, but the if statement does not see the string from the array!
The array (temps) contains nothing but "non"s, so I know something is wrong with the if statement.
Here is the snippet of code:
if ("non".equals(temps.get(3))) {
    System.out.println("Don't know.");
}
Where temps is the array containing "non"s on different lines.
Here is the full code in case anyone is wondering:
public class Dump {
    public static void main(String[] args) throws IOException {
        String token1 = "";
        //Reads in presidentsUSA.txt.
        Scanner inFile1 = new Scanner(new File("presidentsUSA.txt"));
        //Splits .txt file via new lines.
        inFile1.useDelimiter("\\n");
        List<String> temps = new ArrayList<String>();
        while (inFile1.hasNext()) {
            token1 = inFile1.next();
            temps.add(token1);
        }
        inFile1.close();
        // Stores each new line into an array called temps.
        String[] tempsArray = temps.toArray(new String[0]);
        if ("non".equals(temps.get(3))) {
            System.out.println("Don't know.");
        }
    }
}
The most probable explanation for why your if statement evaluates to false is that you are on a Windows OS.
If you debug your code and watch temps.get(3), it will show the content as
non\r
and non\r does not equal non.
Solution:
inFile1.useDelimiter("\\r\\n");
Check the accepted answer here (especially point 4): Difference between \n and \r?
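A delimiter-free alternative is sketched below, assuming the goal is simply one string per line: Scanner.nextLine() strips the line terminator whether the file uses \n or \r\n, so no stray \r is left on the strings. The class name LineEndings and the inline sample text are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineEndings {

    // nextLine() strips the line terminator, whether it is \n or \r\n.
    static List<String> readLines(Scanner in) {
        List<String> lines = new ArrayList<>();
        while (in.hasNextLine()) {
            lines.add(in.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Inline sample standing in for a file saved with Windows line endings.
        List<String> temps = readLines(new Scanner("non\r\nnon\r\nnon\r\nnon\r\n"));
        if ("non".equals(temps.get(3))) {
            System.out.println("Don't know.");
        }
    }
}
```

If the custom delimiter is kept instead, the pattern "\\r?\\n" accepts both kinds of line ending.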
I do not know how to take the integer and ignore the strings from the file using scanner. This is what I have so far. I need to know how to read the file token by token. Yes, this is a homework problem. Thank you so much.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class ClientMergeAndSort{
    public static void main(String[] args){
        int length = 13;
        try{
            Scanner input = new Scanner(System.in);
            System.out.print("Enter the file name with extention : ");
            File file = new File(input.nextLine());
            input = new Scanner(file);
            while (!input.hasNextInt()) {
                input.next();
            }
            int[] arraylist = new int[length];
            for(int i =0; i < length; i++){
                length++;
                arraylist[i] = input.nextInt();
                System.out.print(arraylist[i] + " ");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
Take a look at the API for what you're doing.
http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html#hasNextInt()
Specifically, Scanner.hasNextInt().
"Returns true if the next token in this scanner's input can be interpreted as an int value in the default radix using the nextInt() method. The scanner does not advance past any input."
So, your code:
while (!input.hasNextInt()) {
    input.next();
}
That's going to look and see if input hasNextInt().
So if the next token (a whitespace-delimited chunk of input, not a single character) is an int, the condition is false and the loop is skipped.
If the next token isn't an int, it goes into the loop and advances to the next token.
That's going to either:
- find the first number in the input, and stop.
- go to the end of the input without finding any numbers, and throw a NoSuchElementException when input.next() is called with no tokens left.
Write down in words what you want to do here.
Use the API docs to figure out how the hell to tell the computer that. :) Get one bit at a time right; this has several different parts, and the first one doesn't work yet.
Example: just get it to read a file, and display each line first. That lets you do debugging; it lets you build one thing at a time, and once you know that thing works, you build one more part on it.
Read the file first. Then display it as you read it, so you know it works.
Then worry about if it has numbers or not.
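Once reading the file works, one possible shape for the "numbers or not" step is sketched below. It is not the only way to do it; the class SkipNonInts, the helper readInts and the inline sample text are made up for illustration, and a List is used so the number of integers does not need to be known up front:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SkipNonInts {

    static List<Integer> readInts(Scanner input) {
        List<Integer> numbers = new ArrayList<>();
        while (input.hasNext()) {          // stop cleanly at end of input
            if (input.hasNextInt()) {
                numbers.add(input.nextInt());
            } else {
                input.next();              // discard the non-integer token
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Inline sample standing in for the file contents.
        Scanner input = new Scanner("Scores: 12 and 7 then 40");
        System.out.println(readInts(input)); // prints [12, 7, 40]
    }
}
```

Checking hasNext() in the outer loop is what avoids the exception at the end of the input.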
An easy way to do this is to read all the data from the file in whatever way you prefer (line by line, for example). If you need tokens, you can use the split function (see String.split in the Java docs) or StringTokenizer on each line you read in a loop, with a specific delimiter (a space, for example). Once you have the tokens, you can do whatever you need with them. I hope this helps you solve it; if you have questions, just ask.
Happy programming.
import static java.nio.file.Files.readAllBytes;
import static java.nio.file.Paths.get;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String args[]) throws IOException {
        String newStr=new String(readAllBytes(get("data.txt")));
        Pattern p = Pattern.compile("-?\\d+");
        Matcher m = p.matcher(newStr);
        while (m.find()) {
            System.out.println("- "+m.group());
        }
    }
}
This code will read the file and then, using the regular expression, extract only the integer values.
Note: This code works in Java 8
I think this will work for your requirement.
Before reading from the file, first write some content to it (for example with Scanner and FileWriter), then run the code snippet below.
File file = new File(your filepath);
List<Integer> list = new ArrayList<Integer>();
// try-with-resources closes the reader automatically
try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
    String str;
    while ((str = bufferedReader.readLine()) != null) {
        System.out.println(str);
        // Collect every digit on the line into one number
        StringBuilder finalInt = new StringBuilder();
        for (char c : str.toCharArray()) {
            if (Character.isDigit(c)) {
                finalInt.append(c);
            }
        }
        // Skip lines without digits; Integer.parseInt("") would throw
        if (finalInt.length() > 0) {
            list.add(Integer.parseInt(finalInt.toString()));
        }
        System.out.println(list.size());
        System.out.println(list);
    }
} catch (Exception e) {
    e.printStackTrace();
}
The final println statement displays the integers collected so far, once for each line read.
Thanks
I want to read input from a file using a scanner, but I want the scanner to ignore everything inside (* ....... *). How do I do this? I'm taking integers and adding them to an array list, but if there are integers inside the text I want to ignore, it adds those too.
public ArrayList<Integer> readNumbers(Scanner sc)
{
    // TODO Implement readNumbers
    ArrayList<Integer> list = new ArrayList<Integer>();
    while(sc.hasNext())
    {
        try
        {
            String temp = sc.next();
            list.add(Integer.parseInt(temp));
        }
        catch(Exception e)
        {
        }
    }
    return list;
}
Here's an example line of the text file
(* 21 Alabama Population in 2013 *) 4802740
With my current code, both 21 and 4802740 are added to my array list.
I thought about using
sc.usedelimiter("(");
sc.usedelimiter(")");
But I just can't seem to get it to work.
Thanks!
It seems that you may be looking for something like
sc.useDelimiter("\\(\\*[^*]*\\*\\)|\\s+");
The regular expression \\(\\*[^*]*\\*\\) matches a part which
\\(\\* - starts with (*,
\\*\\) - ends with *),
[^*]* - and has zero or more non-* characters inside.
I also added |\\s+ so that one or more whitespace characters also act as a delimiter (whitespace is the delimiter scanners use by default).
BTW, using try-catch as the main part of control flow is generally considered bad practice. Instead, you should change your code to something like
while (sc.hasNext()) {
    if(sc.hasNextInt()) {
        list.add(sc.nextInt());
    } else {
        //consume data you are not interested in
        //so Scanner could move on to next tokens
        sc.next();
    }
}
Skip the "(* string *)" before reading next int:
try
{
    try {
        sc.skip("\\s*\\(\\*[^*]*\\*\\)");
    } catch (NoSuchElementException e) {
    }
    String temp = sc.next();
    list.add(Integer.parseInt(temp));
} catch (Exception e) {
}