Read specific data from a .txt file JAVA - java

I have a problem. I'm trying to read a large .txt file, but I don't need every piece of data that's inside.
My .txt file looks something like this:
8000000 abcdefg hijklmn word word letter
I only need, let's say, the number and the first two text positions: "abcdefg" and "hijklmn" and write it to another file after that. I don't know how to read and write just the data that I need.
Here is my code so far:
BufferedReader br = new BufferedReader(new FileReader("position2.txt"));
BufferedWriter bw = new BufferedWriter(new FileWriter("position.txt"));
String line;
while ((line = br.readLine())!= null){
if(line.isEmpty() || line.trim().equals("") || line.trim().equals("\n")){
continue;
}else{
//bw.write(line + "\n");
String[] data = line.split(" ");
bw.write(data[0] + " " + data[1] + " " + data[2] + "\n");
}
}
br.close();
bw.close();
}
Can you give me some suggestions?
Thanks in advance
UPDATE:
My .txt files are a bit weird. The code above works great when there is exactly one " " between the words. My files can have a \t, several spaces, or a \t and some spaces between the words. How can I proceed now?

Depending on the complexity of your data, you have a few options.
If the lines are simple space-separated values as shown, the simplest approach is to split the text and write the values you want to keep to the new file:
try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
String line;
while ((line = br.readLine()) != null) {
String[] values = line.split(" ");
if (values.length >= 3)
bw.write(values[0] + ' ' + values[1] + ' ' + values[2] + '\n');
}
}
If the values might be more complex, you could use a regular expression:
Pattern p = Pattern.compile("^(\\d+ \\w+ \\w+)");
try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
String line;
while ((line = br.readLine()) != null) {
Matcher m = p.matcher(line);
if (m.find())
bw.write(m.group(1) + '\n');
}
}
This ensures that the first value is digits only, and that the second and third values contain word characters only (a-z, A-Z, 0-9, and _). As written, the pattern expects exactly one space between the values.
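For the updated question, where the fields may be separated by tabs or by several spaces, one possible approach is to trim each line and split on the regex \\s+ (one or more whitespace characters). The following is a minimal sketch, not a definitive implementation: the class name FirstThreeFields and the methods firstThree and copy are made up for this illustration, and the demo runs on an in-memory string rather than a real file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper class; all names here are illustrative only.
final class FirstThreeFields {

    // Returns the first three whitespace-separated fields joined by single
    // spaces, or null when the line has fewer than three fields.
    static String firstThree(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        // Limit 4: the rest of the line stays unsplit in the fourth element.
        String[] fields = trimmed.split("\\s+", 4);
        if (fields.length < 3) {
            return null;
        }
        return fields[0] + " " + fields[1] + " " + fields[2];
    }

    // Copies the first three fields of every usable line from in to out.
    static void copy(Reader in, Writer out) throws IOException {
        try (BufferedReader br = new BufferedReader(in);
             BufferedWriter bw = new BufferedWriter(out)) {
            String line;
            while ((line = br.readLine()) != null) {
                String kept = firstThree(line);
                if (kept != null) {
                    bw.write(kept);
                    bw.newLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory demo; pass new FileReader("text.txt") and
        // new FileWriter("data.txt") instead to work on real files.
        StringWriter out = new StringWriter();
        copy(new StringReader("8000000\tabcdefg   hijklmn word word letter\n\n"), out);
        System.out.print(out);
    }
}
```

Passing a limit to split keeps the work per line small on large files, because the tail of the line is never broken into further tokens.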

Assuming all lines of your text file follow the structure you described then you could do this:
Replace FILE_PATH with your actual file path.
public static void main(String[] args) {
try {
Scanner reader = new Scanner(new File("FILE_PATH/myfile.txt"));
PrintWriter writer = new PrintWriter(new File("FILE_PATH/myfile2.txt"));
while (reader.hasNextLine()) {
String line = reader.nextLine();
String[] tokens = line.split(" ");
writer.println(tokens[0] + ", " + tokens[1] + ", " + tokens[2]);
}
writer.close();
reader.close();
} catch (FileNotFoundException ex) {
System.out.println("Error: " + ex.getMessage());
}
}
You'll get something like:
word0, word1, word2

If your files are really huge (above 50-100 MB, maybe GBs), you are sure that the first word is a number, and you need the two words after it, I would suggest reading one line at a time and iterating through that string, stopping when you find the third space.
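String str = br.readLine(); // br: the BufferedReader from the question
int num_spaces = 0, cnt = 0;
String[] arr = {"", "", ""}; // start from empty strings, not null
// Assumes exactly one space between words.
// Stop at the third space or at the end of the line, whichever comes first.
while (num_spaces < 3 && cnt < str.length()) {
    if (str.charAt(cnt) == ' ') {
        num_spaces++;
    } else {
        arr[num_spaces] += str.charAt(cnt);
    }
    cnt++;
}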
If your data is only a couple of MB, or the lines have a lot of numbers inside, there is no need to worry about iterating char by char. Just read line by line, split each line, and then check the words, as already mentioned.

else {
String[] res = line.split(" ");
bw.write(res[0] + " " + res[1] + " " + res[2] + "\n"); // the first three words...
}

Related

How many words are in each sentence of the file based in this code Java

I need to count how many words are in each sentence of the file, based on this code.
We have the file called archivo:
File archivo = null;
try {
archivo = new File("Text.txt");
String line;
FileReader fr = new FileReader (archivo);
BufferedReader br = new BufferedReader(fr);
int i,a=0;
while((line=br.readLine())!=null) {
for(i=0;i<line.length();i++){
if(i==0){
if(line.charAt(i)!=' ')
a++;
}else{
if(line.charAt(i-1)==' ')
if(line.charAt(i)!=' ')
a++;
}
}
}
Here we print the number of words, but I also need the number of words per sentence:
System.out.println("There are "+a+" words");
fr.close();
}catch(IOException a){
System.out.println(a);
}
}
}
The text.txt says:
hi
I'm Katie
and I have two cats.
Do it as follows:
String line;
int count=0, totalCount=0;
while((line=br.readLine())!=null) {
count = line.split("\\s+").length;
System.out.println("The number of words in '" + line + "' is: " + count);
totalCount += count;
}
System.out.println("The total number of words in the file is " + totalCount);
Explanation: String::split splits a string into an array of strings around matches of the given regex. The regex \\s+ matches one or more whitespace characters (spaces, tabs, and so on). For each line, the program prints count, i.e. the number of words (the length of the array returned by split), and adds it to totalCount. At the end, the program prints totalCount, the total number of words in the file.
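One caveat with line.split("\\s+").length: an empty line yields 1 rather than 0, and leading whitespace adds an empty first element to the array. A minimal sketch that avoids both by trimming first is shown below. The class name WordCounter and the method countWords are assumptions for this illustration, and the sample lines are the ones from the question's text file.

```java
// Hypothetical helper class; the names are illustrative only.
final class WordCounter {

    // Counts whitespace-separated words; blank lines count as zero words.
    static int countWords(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // The three lines of the text file shown in the question.
        String[] lines = {"hi", "I'm Katie", "and I have two cats."};
        int total = 0;
        for (String line : lines) {
            int count = countWords(line);
            System.out.println("The number of words in '" + line + "' is: " + count);
            total += count;
        }
        System.out.println("The total number of words in the file is " + total);
    }
}
```

Like the answer above, this counts words per line, so it matches "words per sentence" only when each sentence sits on its own line.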

Filtering specific team from text file and displaying results

I want my program to let a user enter a team name and, based on that name, print the pertinent team information to the console. So far, the program lets the user enter the name of a text file that contains unformatted team data. It then formats that data, stores it, and prints the information to the console. It is at this point in my program that I want the user to be able to start filtering by team name. I am not necessarily looking for an exact answer, but some helpful tips or suggestions would be appreciated.
public static void main(String[] args) {
Scanner keyboard = new Scanner (System.in);
// Allow the user to enter the name of text file that the data is stored in
System.out.println("This program will try to read data from a text file ");
System.out.print("Enter the file name: ");
String filename = keyboard.nextLine();
System.out.println();
Scanner fileReader = null;
//A list to add results to, so they can be printed out after the parsing has been completed.
ArrayList<LineResult> results = new ArrayList<>();
try {
File Fileobject = new File (filename);
fileReader = new Scanner (Fileobject);
while(fileReader.hasNext()) {
String line = fileReader.nextLine();// Read a line of data from text file
// this if statement helps to skip empty lines
if ("".equals(line)) {
continue;
}
String [] splitArray = line.split(":");
// check to make sure there are 4 parts in splitArray
if(splitArray.length == 4) {
// remove spaces
splitArray[0] = splitArray[0].trim();
splitArray[1] = splitArray[1].trim();
splitArray[2] = splitArray[2].trim();
splitArray[3] = splitArray[3].trim();
//This section checks if each line has any corrupted data
//and then display message to the user.
if("".equals(splitArray[0]))
{
System.out.println(line + " > The home or away team may be missing");
System.out.println();
}else if ("".equals(splitArray[1])) {
System.out.println(line + " > The home or away team may be missing");
System.out.println();
}
try {
// Extract each item into an appropriate variable
LineResult result = new LineResult();
result.homeTeam = splitArray[0];
result.awayTeam = splitArray[1];
result.homeScore = Integer.parseInt(splitArray[2]);
result.awayScore = Integer.parseInt(splitArray[3]);
results.add(result);
} catch(NumberFormatException e) {
System.out.println(line + " > Home team score may not be a valid integer number ");
System.out.println(" or it may be missing");
System.out.println();
}
}else {
System.out.println(line + " > The field delimiter may be missing or ");
System.out.println(" wrong field delimiter is used");
System.out.println();
}
}
System.out.println();
System.out.println();
//Print out results
System.out.println("Home team Score Away team Score");
System.out.println("========= ===== ========= =====");
//Loop through each result printing out the required values.
//TODO: REQ4, filter results based on user requested team
try (BufferedReader br = new BufferedReader(new FileReader(filename));
BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
String line;
while ((line = br.readLine()) != null) {
String[] values = line.split(" ");
if (values.length >= 3)
bw.write(values[0] + ' ' + values[1] + ' ' + values[2] + '\n');
}
}
for (LineResult result : results) {
System.out.println(
String.format("%-15s %1s %-15s %1s",
result.homeTeam,
result.homeScore,
result.awayTeam,
result.awayScore));
}
// end of try block
} catch (FileNotFoundException e) {
System.out.println("Error - File does not exist");
System.out.println();
}
}
//Data object for holding a line result
static class LineResult {
String homeTeam, awayTeam;
int homeScore, awayScore;}
}
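One way to approach the filtering step (the TODO for REQ4) is to keep the parsed list as it is and select the entries where the requested name matches either the home or the away team. The sketch below is only an illustration under assumptions: the class TeamFilter, the method forTeam, the extra constructor, and the sample teams are all made up, and case-insensitive matching is a design choice rather than a stated requirement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names and sample data are illustrative only.
final class TeamFilter {

    // Mirrors the LineResult class from the question, plus a convenience
    // constructor added for this sketch.
    static class LineResult {
        String homeTeam, awayTeam;
        int homeScore, awayScore;

        LineResult(String homeTeam, int homeScore, String awayTeam, int awayScore) {
            this.homeTeam = homeTeam;
            this.homeScore = homeScore;
            this.awayTeam = awayTeam;
            this.awayScore = awayScore;
        }
    }

    // Keeps the results in which the requested team played at home or away.
    static List<LineResult> forTeam(List<LineResult> results, String team) {
        String wanted = team.trim();
        List<LineResult> matches = new ArrayList<>();
        for (LineResult r : results) {
            if (r.homeTeam.equalsIgnoreCase(wanted) || r.awayTeam.equalsIgnoreCase(wanted)) {
                matches.add(r);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the parsed file.
        List<LineResult> results = new ArrayList<>();
        results.add(new LineResult("Lions", 2, "Tigers", 1));
        results.add(new LineResult("Bears", 0, "Lions", 3));
        results.add(new LineResult("Tigers", 1, "Bears", 1));

        for (LineResult r : forTeam(results, "lions")) {
            System.out.println(String.format("%-15s %5d %-15s %5d",
                    r.homeTeam, r.homeScore, r.awayTeam, r.awayScore));
        }
    }
}
```

In the program from the question, the team name could be read with keyboard.nextLine() once parsing has finished, and the final printing loop could then iterate over the filtered list instead of results.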

Formatting string to get just words in a column

I have a text:
c:\MyMP3s\4 Non Blondes\Bigger!\Faster, More!_Train.mp3
I want to remove from this text these characters: :,\!._
and then format the text like this:
c
MyMP3s
4
Non
Blondes
Bigger
Faster
More
Train
mp3
And write all of this in a file.
Here is what I did:
public static void formatText() throws IOException{
Writer writer = null;
BufferedReader br = new BufferedReader(new FileReader(new File("File.txt")));
String line = "";
while(br.readLine()!=null){
System.out.println("Into the loop");
line = br.readLine();
line = line.replaceAll(":", " ");
line = line.replaceAll(".", " ");
line = line.replaceAll("_", " ");
line = System.lineSeparator();
System.out.println(line);
writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("Write.txt")));
writer.write(line);
}
And it doesn't work!
The exception:
Into the loop
Exception in thread "main" java.lang.NullPointerException
at Application.formatText(Application.java:25)
at Application.main(Application.java:41)
At the end of your code, you have:
line = System.lineSeparator();
This overwrites the result of your replacements. Another thing to note is that String#replaceAll takes a regex as its first parameter, so you have to escape regex metacharacters such as "." (an unescaped "." matches any character).
String line = "c:\\MyMP3s\\4 Non Blondes\\Bigger!\\Faster, More!_Train.mp3";
System.out.println("Into the loop");
line = line.replaceAll(":\\\\", " ");
line = line.replaceAll("\\.", " ");
line = line.replaceAll("_", " ");
line = line.replaceAll("\\\\", " ");
line = line.replaceAll(" ", System.lineSeparator());
System.out.println(line);
The output is:
Into the loop
c
MyMP3s
4
Non
Blondes
Bigger!
Faster,
More!
Train
mp3
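Two further points about the loop in the question: it calls br.readLine() twice per iteration, so every other line is skipped and the second call can return null (which is where the NullPointerException comes from), and it opens a new FileOutputStream on every pass, which truncates the output file each time. The sketch below reads each line once and splits on a single character class that also covers "!" and ",". It is a minimal illustration, with the class name PathWords and the methods words and format invented for the example, and it writes to an in-memory buffer in the demo.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
final class PathWords {

    // Splits on runs of : , \ ! . _ and whitespace; empty pieces are dropped.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String piece : line.split("[:,\\\\!._\\s]+")) {
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    // Reads each line exactly once and writes one word per output line.
    static void format(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            for (String word : words(line)) {
                out.println(word);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        String sample = "c:\\MyMP3s\\4 Non Blondes\\Bigger!\\Faster, More!_Train.mp3";
        // In-memory demo; wrap a FileReader and a FileWriter to use real files.
        StringWriter buffer = new StringWriter();
        format(new BufferedReader(new StringReader(sample)), new PrintWriter(buffer));
        System.out.print(buffer);
    }
}
```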

Troubles reading a single character from a text file

I'm writing an "app" that takes in time input from the user and stores the hours and the minutes separately for each day in a text file (giving a result that looks like:
day 1: 8h 45min
day 2: 8h 43min
... )
The idea behind it is to use this data for several things, like calculating the average time or looking up the time for any given day, but I haven't reached that stage yet. I'm having trouble doing the simplest things, like reading the hour and printing it.
Here's the code:
import java.util.Scanner;
import java.io.*;
public class TimeInput {
public static void main(String[] args) {
write();
read();
}
static void write() {
int dayOfMonth = 1;
String fileName = "time.txt";
int[] time = new int[2];
String timeDisplay;
Scanner s = new Scanner(System.in);
try {
FileWriter fileWriter = new FileWriter(fileName);
BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
while (dayOfMonth <=31) {
System.out.println("Day " + dayOfMonth);
System.out.print("Enter hour: " + "__" + "h\r");
System.out.print("Enter hour: ");
time[0] = s.nextInt();
System.out.print("Enter minutes: " + "__" + "min\r");
System.out.print("Enter minutes: ");
time[1] = s.nextInt();
timeDisplay = ("\n"+ "day " + dayOfMonth + ": " + time[0] + "h " + time[1] + "min");
bufferedWriter.write(timeDisplay);
bufferedWriter.newLine();
dayOfMonth++;
if (time[0] == 0 && time[1] == 0) {
bufferedWriter.close();
dayOfMonth = 32; // break
}
}
}
catch (IOException ex) {
System.out.println(
"Error writing to file '" + fileName + "'");
}
}
static void read() {
String fileName = "time.txt";
String line = null;
try {
FileReader fileReader = new FileReader(fileName);
BufferedReader bufferedReader = new BufferedReader(fileReader);
while ((line = bufferedReader.readLine()) != null) {
char h = line.charAt(7);
System.out.println(h);
}
bufferedReader.close();
}
catch (FileNotFoundException ex) {
System.out.println(
"Unable to open file '" + fileName + "'" );
}
catch (IOException ex) {
System.out.println(
"Error reading file '" + fileName +"'");
}
}
}
I keep getting a StringIndexOutOfBoundsException and I don't understand why.
You need to check for an empty string before calling charAt:
while ((line = bufferedReader.readLine()) != null) {
if("".equals(line)){
continue;
}
char h = line.charAt(7);
System.out.println(h);
}
The blank lines come from write(): each entry begins with "\n" and is followed by bufferedWriter.newLine(), so every other line in the file is empty. Another way to get past those blank lines is to add a dummy readLine call around each real read:
line=bufferedReader.readLine();
while ((line = bufferedReader.readLine()) != null) {
System.out.println(line);
char h = line.charAt(7);
System.out.println(h);
line=bufferedReader.readLine();
}
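Since charAt(7) returns only a single character and the position of the hour shifts once the day or the hour has two digits, a regular expression may be a more robust way to pull the numbers out. The following is a sketch under the assumption that every data line has the "day N: Hh Mmin" shape written by write(); the class name TimeLineParser and the method parse are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class TimeLineParser {

    // Matches lines such as "day 12: 8h 45min".
    private static final Pattern LINE =
            Pattern.compile("day\\s+(\\d+):\\s*(\\d+)h\\s+(\\d+)min");

    // Returns {day, hours, minutes}, or null for blank or malformed lines.
    static int[] parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    public static void main(String[] args) {
        // Sample lines, including the blank ones the writer produces.
        String[] lines = {"", "day 1: 8h 45min", "", "day 12: 10h 5min"};
        for (String line : lines) {
            int[] t = parse(line);
            if (t != null) {
                System.out.println("day " + t[0] + " -> hours: " + t[1] + ", minutes: " + t[2]);
            }
        }
    }
}
```

In read(), each line returned by bufferedReader.readLine() could be passed to parse, skipping the null results.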
Take a look at the last loop.
Put a breakpoint there and see what happens. The StringIndexOutOfBoundsException may occur because charAt(7) is called on a line that has fewer than eight characters. Take a look at this URL to see how to read a text file with BufferedReader:
http://www.mkyong.com/java/how-to-read-file-from-java-bufferedreader-example/
Hope this helps.

Counting the number of times the articles "a", "an" are used in a text file

I'm trying to make a program that counts the number of words, lines, and sentences, and also the number of articles 'a', 'and', 'the'.
So far I have the words, lines, and sentences, but I have no idea how to count the articles. How can a program tell the difference between 'a' and 'and'?
This is my code so far:
public static void main(String[]args) throws FileNotFoundException, IOException
{
FileInputStream file= new FileInputStream("C:\\Users\\nlstudent\\Downloads\\text.txt");
Scanner sfile = new Scanner(new File("C:\\Users\\nlstudent\\Downloads\\text.txt"));
int ch,sentence=0,words = 0,chars = 0,lines = 0;
while((ch=file.read())!=-1)
{
if(ch=='?'||ch=='!'|| ch=='.')
sentence++;
}
while(sfile.hasNextLine()) {
lines++;
String line = sfile.nextLine();
chars += line.length();
words += new StringTokenizer(line, " ,").countTokens();
}
System.out.println("Number of words: " + words);
System.out.println("Number of sentence: " + sentence);
System.out.println("Number of lines: " + lines);
System.out.println("Number of characters: " + chars);
}
}
How can a program make the difference between 'a' and 'and'.
You can use regex for this:
String input = "A and Andy then the are a";
Matcher m = Pattern.compile("(?i)\\b((a)|(an)|(and)|(the))\\b").matcher(input);
int count = 0;
while(m.find()){
count++;
}
//count == 4
'\b' is a word boundary, '|' is OR, and '(?i)' is the ignore-case flag. The full list of pattern constructs is in the java.util.regex.Pattern documentation, and it is worth learning the basics of regex.
The tokenizer will split each line into tokens. You can evaluate each token (a whole word) to see if it matches a string you expect. Here is an example that counts a, and, the (plus for as an extra):
int a = 0, and = 0, the = 0, forCount = 0;
while (sfile.hasNextLine()) {
lines++;
String line = sfile.nextLine();
chars += line.length();
StringTokenizer tokenizer = new StringTokenizer(line, " ,");
words += tokenizer.countTokens();
while (tokenizer.hasMoreTokens()) {
String element = (String) tokenizer.nextElement();
if ("a".equals(element)) {
a++;
} else if ("and".equals(element)) {
and++;
} else if ("for".equals(element)) {
forCount++;
} else if ("the".equals(element)) {
the++;
}
}
}
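The equals checks above are case-sensitive and treat a token such as "the." as a different word. One possible variation, sketched below, lower-cases the text, splits on anything that is not a letter, and keeps the counts in a map. The class name ArticleCounter, the method count, and the choice of target words (taken from the regex answer) are assumptions for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
final class ArticleCounter {

    // Words to count; adjust the list as needed.
    private static final String[] TARGETS = {"a", "an", "and", "the"};

    // Counts whole-word, case-insensitive occurrences of each target word.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String target : TARGETS) {
            counts.put(target, 0);
        }
        // Splitting on non-letters separates "the." or "and," from punctuation.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (counts.containsKey(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same sample sentence as in the regex answer above.
        System.out.println(count("A and Andy then the are a"));
        // prints {a=2, an=0, and=1, the=1}
    }
}
```

Because whole tokens are compared, "Andy" and "then" do not count towards "and" or "the", which is the distinction between 'a' and 'and' that the question asks about.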
