Counting the words of a file and storing them in an array? - java

I'm asking again about counting the words in a file and storing them in an array. So far, this is all I have:
Scanner sc = new Scanner(System.in);
int count;
void readFile() {
System.out.println("Gi navnet til filen: ");
String filNavn = sc.next();
try{
File k = new File(filNavn);
Scanner sc2 = new Scanner(k);
count = 0;
while(sc2.hasNext()) {
count++;
sc2.next();
}
Scanner sc3 = new Scanner(k);
String a[] = new String[count];
for(int i = 0;i<count;i++) {
a[i] =sc3.next();
if ( i == count -1 ) {
System.out.print(a[i] + "\n");
}else{
System.out.print(a[i] + " ");
}
}
System.out.println("Number of words: " + count);
}catch(FileNotFoundException e) {
My code works, but is there a simpler way to do this? My other question is: how do I count the unique words out of the total words in a given file without using a HashMap or an ArrayList?

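Here's a simpler way to go about it (ArrayUtils comes from Apache Commons Lang, and the file name is taken from the first command-line argument):
public static void main(String[] args) throws IOException {
    File f = new File(args[0]);
    String[] res = new String[0]; // start empty so addAll has something to append to
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)))) {
        String line;
        while ((line = br.readLine()) != null) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] tokens = line.trim().split("\\s+");
            res = ArrayUtils.addAll(res, tokens); // keep the combined array
        }
    }
    System.out.println("Number of words: " + res.length);
}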
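For the second part of the question (unique words without a HashMap or an ArrayList), one possible approach is to sort a copy of the word array and count the positions where a word differs from the one before it. This is only a sketch: the class name UniqueWords and the method countUnique are made up for illustration, and it assumes words are compared case-sensitively.

```java
import java.util.Arrays;

class UniqueWords {
    // Counts distinct words using only arrays: sort a copy, then count
    // every position whose word differs from the one before it.
    static int countUnique(String[] words) {
        if (words.length == 0) {
            return 0;
        }
        String[] sorted = Arrays.copyOf(words, words.length);
        Arrays.sort(sorted);
        int unique = 1; // the first word is always new
        for (int i = 1; i < sorted.length; i++) {
            if (!sorted[i].equals(sorted[i - 1])) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        String[] a = {"the", "cat", "saw", "the", "other", "cat"};
        System.out.println("Number of words: " + a.length);
        System.out.println("Unique words: " + countUnique(a));
    }
}
```

Sorting a copy leaves the original array in file order, so it can still be printed the way the question's code does.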

Related

Converting ArrayLists in Java

I have the following code which counts and displays the number of times each word occurs in the whole text document.
try {
List<String> list = new ArrayList<String>();
int totalWords = 0;
int uniqueWords = 0;
File fr = new File("filename.txt");
Scanner sc = new Scanner(fr);
while (sc.hasNext()) {
String words = sc.next();
String[] space = words.split(" ");
for (int i = 0; i < space.length; i++) {
list.add(space[i]);
}
totalWords++;
}
System.out.println("Words with their frequency..");
Set<String> uniqueSet = new HashSet<String>(list);
for (String word : uniqueSet) {
System.out.println(word + ": " + Collections.frequency(list,word));
}
} catch (Exception e) {
System.out.println("File not found");
}
Is it possible to modify this code to make it so it only counts each occurrence once per line rather than in the entire document?
One can read the contents per line and then apply logic per line to count the words:
File file = new File("filename.txt");
FileReader fileReader = new FileReader(file);
BufferedReader br = new BufferedReader(fileReader);
// Read the line in the file
String line = null;
while ((line = br.readLine()) != null) {
//Code to count the occurrences of the words
}
Yes. A Set is a collection much like an ArrayList, with the key difference that it holds no duplicates.
So read one line at a time and put that line's words into a set before adding them to the list.
In your while loop:
while (sc.hasNextLine()) {
    String[] space = sc.nextLine().trim().split("\\s+");
    // convert the array of this line's words to a set, which drops duplicates
    Set<String> set = new HashSet<String>(Arrays.asList(space));
    set.remove(""); // a blank line splits into a single empty token
    list.addAll(set);
    totalWords += set.size(); // each word counted once per line
}
The rest of the code should remain the same.
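To make the once-per-line idea concrete, here is a small self-contained sketch. The class name PerLineCount and the method countOncePerLine are hypothetical, and the lines are passed in as a list (in practice they could come from Files.readAllLines) so the example runs without a file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class PerLineCount {
    // For each word, counts the number of lines it appears on.
    static Map<String, Integer> countOncePerLine(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // The set keeps one copy of each word on this line.
            Set<String> seen = new HashSet<>(Arrays.asList(line.trim().split("\\s+")));
            seen.remove(""); // blank lines split into one empty token
            for (String word : seen) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b a", "b c", "", "a a a");
        System.out.println(countOncePerLine(lines)); // prints {a=2, b=2, c=1}
    }
}
```

A TreeMap is used here only so the printed order is predictable; a HashMap would count the same way.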

Java word appearance in a text file

For a given text file (text.txt), compute how many times each word appears in the file. The output of the program should be another text file containing, on each line, a word followed by the number of times it appears in the original file. After you finish, change the program so that the words in the output file are sorted alphabetically. Do not use maps; use only basic arrays. My program only reports the count for the one word I enter from the keyboard. How can I display the counts for all words, not only for one? Thanks
package worddata;
import java.io.IOException;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
class WordData {
public FileReader fr = null;
public BufferedReader br =null;
public String [] stringArray;
public int counLine = 0;
public int arrayLength ;
public String s="";
public String stringLine="";
public String filename ="";
public String wordname ="";
public WordData(){
try{
Scanner scan = new Scanner(System.in);
System.out.println("Please enter the filename: ");
filename = scan.nextLine();
Scanner scan2 = new Scanner(System.in);
System.out.println("Please enter a word: ");
wordname = scan.nextLine();
fr = new FileReader(filename);
br = new BufferedReader(fr);
while((s = br.readLine()) != null){
stringLine = stringLine + s;
//System.out.println(s);
stringLine = stringLine + " ";
counLine ++;
}
stringArray = stringLine.split(" ");
arrayLength = stringArray.length;
for (int i = 0; i < arrayLength; i++) {
int c = 1 ;
for (int j = i+1; j < arrayLength; j++) {
if(stringArray[i].equalsIgnoreCase(stringArray[j])){
c++;
for (int j2 = j; j2 < arrayLength; j2++) {
stringArray[j2] = stringArray[j2+1];
arrayLength = arrayLength - 1;
}
if (stringArray[i].equalsIgnoreCase(wordname)){
System.out.println("The word "+wordname+" is present "+c+" times in the specified file.");
}
}
}
}
System.out.println("Total number of lines: "+counLine);
fr.close();
br.close();
}catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) throws IOException {
Scanner scan = new Scanner(System.in);
OutputStream out = new FileOutputStream("output.txt");
System.out.println("Please enter the filename: ");
String filename = scan.nextLine();
System.out.println("Please enter a word: ");
String wordname = scan.nextLine();
int count = 0;
try (LineNumberReader r = new LineNumberReader(new FileReader(filename))) {
String line;
while ((line = r.readLine()) != null) {
for (String element : line.split(" ")) {
if (element.equalsIgnoreCase(wordname)) {
count++;
System.out.println("Word found at line " + r.getLineNumber());
}
}
}
}
FileReader fileReader = new FileReader(filename);
BufferedReader bufferedReader = new BufferedReader(fileReader);
StringBuffer stringBuffer = new StringBuffer();
String line;
while ((line = bufferedReader.readLine()) != null) {
stringBuffer.append(line);
stringBuffer.append("\n");
}
fileReader.close();
System.out.println("The word " + stringBuffer.toString() + " appears " + count + " times.");
int i;
List<String> ls = new ArrayList<String>();
for (i = 1; i <= 1000; i++) {
String str = null;
str = +i + ":- The word "+wordname+" was found " + count +" times";
ls.add(str);
}
String listString = "";
for (String s : ls) {
listString += s + "\n";
}
FileWriter writer = null;
try {
writer = new FileWriter("final.txt");
writer.write(listString);
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The code below does something like what you want, I think. Note that it uses a HashMap, so it does not follow the "basic arrays only" restriction.
It does the following:
Reads the contents of the input.txt file
Removes punctuation marks from the text
Makes it one string of words by replacing line breaks with spaces
Splits the text into words using a space as the delimiter
The stream maps all the words to lowercase, trims whitespace, and filters out empty entries, then it...
loops over all words and computes their word counts in the HashMap
Then we sort the Map by count in reverse order to get the most frequent words first
Then we write them to a StringBuilder formatted as "word : count\n" and write that to a text file
final String content = new String(Files.readAllBytes(Paths.get("<PATH TO YOUR PLACE>/input.txt")));
final List<String> words = Arrays.asList(content.replaceAll("\\p{Punct}", "").replace("\n", " ").split(" "));
final Map<String, Integer> wordlist = new HashMap<>();
words.stream()
.map(String::toLowerCase)
.map(String::trim)
.filter(s -> !s.isEmpty())
.forEach(s -> {
wordlist.computeIfPresent(s, (s1, integer) -> ++integer);
wordlist.putIfAbsent(s, 1);
});
final StringBuilder sb = new StringBuilder();
wordlist.entrySet()
.stream()
.sorted(Map.Entry.comparingByValue(Collections.reverseOrder()))
.collect(Collectors.toMap(
Map.Entry::getKey,
Map.Entry::getValue,
(e1, e2) -> e1,
LinkedHashMap::new
)).forEach((s, integer) -> sb.append(s).append(" : ").append(integer).append("\n"));
Files.write(Paths.get("<PATH TO YOUR PLACE>/output.txt"), sb.toString().getBytes());
Hope it helps :-)
Note: the <PATH TO YOUR PLACE> needs to be replaced by the fully qualified path to your text file with words.
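Since the assignment asks for basic arrays only and alphabetical output, here is a sketch of an array-based alternative: sort the words so equal ones sit next to each other, then count each run. The class name ArrayWordCount and the method countWords are hypothetical, the text is passed in as a String so the example runs without a file, and replacing punctuation with spaces (which splits a word like "don't" in two) is an assumption.

```java
import java.util.Arrays;

class ArrayWordCount {
    // Returns one "word count" line per distinct word, sorted alphabetically.
    static String[] countWords(String text) {
        String cleaned = text.toLowerCase().replaceAll("\\p{Punct}", " ").trim();
        if (cleaned.isEmpty()) {
            return new String[0];
        }
        String[] words = cleaned.split("\\s+");
        Arrays.sort(words); // equal words become neighbours

        String[] lines = new String[words.length]; // upper bound on distinct words
        int used = 0;
        int i = 0;
        while (i < words.length) {
            int j = i;
            while (j < words.length && words[j].equals(words[i])) {
                j++; // advance past the run of equal words
            }
            lines[used++] = words[i] + " " + (j - i);
            i = j;
        }
        return Arrays.copyOf(lines, used);
    }

    public static void main(String[] args) {
        for (String line : countWords("The cat saw the dog. The dog ran!")) {
            System.out.println(line);
        }
    }
}
```

The resulting array could then be written to the output file in one call, for example with Files.write(Paths.get("output.txt"), Arrays.asList(lines)).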

How to read integers from a file that are separated with semi colon?

In my code, I am trying to read a file that looks like this:
100
22
123;22
123 342;432
but the output includes the ";" (e.g. {100,22,123;22,123,342;432}).
I am trying to turn the file into an array (e.g. {100,22,123,22,123...}).
Is there a way to read the file, but ignore the semicolons?
Thanks!
public static void main(String args [])
{
String[] inFile = readFiles("ElevatorConfig.txt");
for ( int i = 0; i <inFile.length; i = i + 1)
{
System.out.println(inFile[i]);
}
System.out.println(Arrays.toString(inFile));
}
public static String[] readFiles(String file)
{
int ctr = 0;
try{
Scanner s1 = new Scanner(new File(file));
while (s1.hasNextLine()){
ctr = ctr + 1;
s1.next();
}
String[] words = new String[ctr];
Scanner s2 = new Scanner(new File(file));
for ( int i = 0 ; i < ctr ; i = i + 1){
words[i] = s2.next();
}
return words;
}
catch(FileNotFoundException e)
{
return null;
}
}
Replace the readFiles method from the question with:
public static String[] readFiles(String file) {
    List<String> retList = new ArrayList<String>();
    try (Scanner s2 = new Scanner(new File(file))) {
        while (s2.hasNext()) {
            String temp = s2.next();
            // a token such as "123;22" holds several values
            for (String part : temp.split(";")) {
                if (!part.isEmpty()) {
                    retList.add(part);
                }
            }
        }
    } catch (FileNotFoundException e) {
        return null; // same behaviour as the original method
    }
    return retList.toArray(new String[0]);
}
Use a regex. Read the entire file into a String (read each token as a String and append a blank space after each token), then split it at whitespace and semicolons.
String x; // contains all the contents of the file
String[] words = x.split("[\\s;]+");
The contents of words[] are:
"100", "22", "123", "22", "123", "342", "432"
Remember to parse them to int before using as numbers.
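A minimal sketch of the split-then-parse step, using the sample file contents from the question as a string literal. The class name SemicolonInts and the method parse are made up for illustration, and the sketch assumes the input does not begin with a semicolon (that would produce an empty first token).

```java
import java.util.Arrays;

class SemicolonInts {
    // Splits on any run of whitespace or semicolons and parses each piece.
    static int[] parse(String contents) {
        String trimmed = contents.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] words = trimmed.split("[\\s;]+");
        int[] numbers = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            numbers[i] = Integer.parseInt(words[i]);
        }
        return numbers;
    }

    public static void main(String[] args) {
        String x = "100\n22\n123;22\n123 342;432\n";
        System.out.println(Arrays.toString(parse(x))); // prints [100, 22, 123, 22, 123, 342, 432]
    }
}
```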
A simple way is to use a BufferedReader: read line by line, then split on whitespace and semicolons.
public static String[] readFiles(String file) throws IOException
{
    StringBuilder sb = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line = br.readLine();
        while (line != null) {
            sb.append(line);
            sb.append(System.lineSeparator());
            line = br.readLine();
        }
    }
    String allfilestring = sb.toString().trim();
    // split on semicolons and on the whitespace and line breaks between values
    String[] array = allfilestring.split("[\\s;]+");
    return array;
}
You can use split() with a regex to split the string into an array according to your requirements.
String s; // string you have read from the file
String[] s1 = s.split(" |;"); // s1 contains the strings separated by space and ";"
Hope it helps
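Keep the code for counting the size of the array, but set the same delimiter on both scanners so that semicolons are treated like whitespace, and count with hasNext() rather than hasNextLine().
I would then just change the way you read in your values.
s2.useDelimiter("[\\s;]+"); // also call this on s1 before counting
for (int i = 0; i < ctr; i++) {
words[i] = "" + s2.nextInt();
}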
Another option is to replace every run of non-digit characters in the complete file string with a space. That way any non-numeric character is ignored.
BufferedReader br = new BufferedReader(new FileReader(file));
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
    sb.append(line).append(' '); // keep numbers on adjacent lines apart
    line = br.readLine();
}
br.close();
String str = sb.toString();
str = str.replaceAll("\\D+"," ").trim();
Now you have a string of numbers separated by spaces, which can be tokenized into number strings.
String[] numbers = str.split("\\s+"); // "final" is a reserved word, so it cannot be a variable name
Then convert them to int.

Compare files then delete the one with the greater characters

I have a Java program that compares two different files, but I would like it to keep the one with the most characters and delete the other one. I don't think it should go by file size, because a file with just one extra character could have the same size. Correct?
Any help is appreciated.
Here is my code:
import java.io.*;
import java.util.*;
public class FileComp
{
public static void main (String[] args) throws java.io.IOException
{
BufferedReader br2 = new BufferedReader (new
InputStreamReader(System.in));
String str = ("compt1.txt");
String str1 = ("compt2.txt");
String s1="";
String s2="",s3="",s4="";
String y="",z="";
BufferedReader br = new BufferedReader (new FileReader (str));
BufferedReader br1 = new BufferedReader (new FileReader (str1));
while((z=br1.readLine())!=null)
s3+=z;
while((y=br.readLine())!=null)
s1+=y;
System.out.println ();
int numTokens = 0;
StringTokenizer st = new StringTokenizer (s1);
String[] a = new String[10000];
for(int l=0;l<10000;l++)
{a[l]="";}
int i=0;
while (st.hasMoreTokens())
{
s2 = st.nextToken();
a[i]=s2;
i++;
numTokens++;
}
int numTokens1 = 0;
StringTokenizer st1 = new StringTokenizer (s3);
String[] b = new String[10000];
for(int k=0;k<10000;k++)
{b[k]="";}
int j=0;
while (st1.hasMoreTokens())
{
s4 = st1.nextToken();
b[j]=s4;
j++;
numTokens1++;
}
int x=0;
for(int m=0;m<a.length;m++)
{
if(a[m].equals(b[m])){}
else
{
x++;
System.out.println(a[m] + " -- " +b[m]);
System.out.println();}
}
//////////////////////////////Change this:
System.out.println("Number of differences " + x);
if(x>0){System.out.println("Files are not equal");}
else{System.out.println("Files are equal. No difference found");}
////////////////////////////////////
}
}
Use
File file = new File("File.txt");
long l = file.length();
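You can use the length() method on File, which returns the size in bytes.
Every character occupies at least one byte, so adding a character always makes the file larger. Byte count and character count only match exactly in a single-byte encoding such as ASCII, though; in UTF-8 some characters take several bytes.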
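If the comparison really has to be by characters rather than bytes, one option is to decode each file and compare the string lengths. The sketch below assumes the files are UTF-8 and small enough to read into memory; the class name KeepLongerFile and its methods are hypothetical, and main uses temporary files instead of compt1.txt and compt2.txt so it can run anywhere.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class KeepLongerFile {
    // Number of characters (UTF-16 code units) in the file, decoded as UTF-8.
    static int charCount(Path p) throws IOException {
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8).length();
    }

    // Deletes whichever file has fewer characters and returns the survivor.
    // On a tie nothing is deleted and the first path is returned.
    static Path keepLonger(Path a, Path b) throws IOException {
        int ca = charCount(a);
        int cb = charCount(b);
        if (ca == cb) {
            return a;
        }
        Files.delete(ca < cb ? a : b);
        return ca < cb ? b : a;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("compt1", ".txt");
        Path b = Files.createTempFile("compt2", ".txt");
        Files.write(a, "h\u00e9llo".getBytes(StandardCharsets.UTF_8)); // 5 chars, 6 bytes
        Files.write(b, "hello!".getBytes(StandardCharsets.UTF_8));     // 6 chars, 6 bytes
        Path kept = keepLonger(a, b);
        System.out.println("Kept the second file: " + kept.equals(b));
        Files.delete(kept); // clean up the demo file
    }
}
```

The two demo files have the same size in bytes but different character counts, which is the case the question was worried about.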

Reading in information into separate arrays

I've been having some difficulties reading in information from a file into separate arrays. An example of the information in the file is:
14 Barack Obama:United States
17 David Cameron:United Kingdom
27 Vladimir Putin:Russian Federation
19 Angela Merkel:Germany
While I can separate the integers into an array, I am having trouble creating an array for the names and an array for the countries. This is my code thus far:
import java.util.*;
import java.io.*;
public class leadRank {
public static void main(String[] args) throws FileNotFoundException {
int size;
Scanner input = new Scanner(new File("names.txt"));
size = input.nextInt();
int[] rank = new int[size];
for (int i = 0; i < rank.length; i++) {
rank[i] = input.nextInt();
input.nextLine();
}
String[] name = new String[size];
for (int i = 0; i <name.length; i++) {
artist[i] =
I think that I would have to read in the line as a string and use indexOf to find the colon in order to start a new array but I'm unsure as to how to execute that.
I tried to solve your problem in my own way, just to pass the time. I hope this helps.
import java.util.*;
import java.io.*;
public class leadRank {
public static void main(String[] args) throws IOException { // skip() and close() can throw IOException
int size;
File file = new File("names.txt");
FileReader fr = new FileReader(file);
String s;
LineNumberReader lnr = new LineNumberReader(new FileReader(file));
lnr.skip(Long.MAX_VALUE);
size = lnr.getLineNumber()+1;
lnr.close();
int[] rank = new int[size];
String[] name = new String[size];
String[] country = new String[size];
try {
BufferedReader br = new BufferedReader(fr);
int i=0;
while ((s = br.readLine()) != null) {
String temp = s;
if(temp.contains(":")){
String[] splitres = temp.split(":");
String sub = splitres[0];
rank[i] = Integer.parseInt(sub.substring(0,sub.indexOf(" "))); // Adding rank to array rank[]
name[i] = sub.substring(sub.indexOf(" ") + 1); // Adding name to array name[] (skip the space, keep the last letter)
country[i] = splitres[1]; // Adding the countries to array country[]
}
i++;
}
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
This is a bit more efficient because it goes through the file only once.
public static void main(String[] args) throws FileNotFoundException {
// create array lists because the size of the arrays is not yet known
ArrayList<Integer> ranks = new ArrayList<Integer>();
ArrayList<String> names = new ArrayList<String>();
ArrayList<String> countries = new ArrayList<String>();
// read the input file
Scanner input = new Scanner(new File("names.txt"));
// read each line
while (input.hasNext()) {
String wholeLine = input.nextLine();
// get the index of the first space
int spaceIndex = wholeLine.indexOf(" ");
// parse the rank
int rank;
try {
rank = Integer.parseInt(wholeLine.substring(0, spaceIndex));
} catch (NumberFormatException e) {
rank = -1;
}
// parse the name & country
String[] tokens = wholeLine.substring(spaceIndex + 1).split(":");
String name = tokens[0];
String country = tokens[1];
// add to the arrays
ranks.add(rank);
names.add(name);
countries.add(country);
}
// get your name and country arrays if needed
String[] nameArr = names.toArray(new String[]{});
String[] countryArr = countries.toArray(new String[]{});
// the rank array has to be created manually
int[] rankArr = new int[ranks.size()];
for (int i = 0; i < ranks.size(); i++) {
rankArr[i] = ranks.get(i).intValue();
}
}
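The idea from the question, using indexOf to find the colon, also works on its own. Here is a small sketch on two of the sample lines; the class name LeaderLine and the method split are made up for illustration, and it assumes every line has the form "rank name:country".

```java
class LeaderLine {
    // Splits "14 Barack Obama:United States" into {rank, name, country}.
    static String[] split(String line) {
        int space = line.indexOf(' ');   // end of the rank
        int colon = line.indexOf(':');   // boundary between name and country
        String rank = line.substring(0, space);
        String name = line.substring(space + 1, colon);
        String country = line.substring(colon + 1);
        return new String[] {rank, name, country};
    }

    public static void main(String[] args) {
        String[] lines = {
            "14 Barack Obama:United States",
            "17 David Cameron:United Kingdom"
        };
        int[] rank = new int[lines.length];
        String[] name = new String[lines.length];
        String[] country = new String[lines.length];
        for (int i = 0; i < lines.length; i++) {
            String[] parts = split(lines[i]);
            rank[i] = Integer.parseInt(parts[0]);
            name[i] = parts[1];
            country[i] = parts[2];
            System.out.println(rank[i] + " | " + name[i] + " | " + country[i]);
        }
    }
}
```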
