I am trying to find the top k words in a text file, data.txt, but I cannot remove the stopwords listed in stop.txt. Should I add the stopwords manually one by one, or is there a way to read stop.txt and remove those words from data.txt?
try {
System.out.println("Enter value of 'k' words:: ");
Scanner in = new Scanner(System.in);
int n = in.nextInt();
w = new String[n];
r = new int[n];
Set<String> stopWords = new LinkedHashSet<String>();
BufferedReader SW = new BufferedReader(new FileReader("stop.txt"));
for(String line; (line = SW.readLine()) != null;)
stopWords.add(line.trim());
SW.close();
FileReader fr = new FileReader("data.txt");
BufferedReader br = new BufferedReader(fr);
String text = "";
String sz = null;
while((sz=br.readLine())!=null){
text = text.concat(sz);
}
String[] words = text.split(" ");
String[] uniqueLabels;
int count = 0;
uniqueLabels = getUniqLabels(words);
for(int j=0; j<n; j++){
r[j] = 0;
}
for(String l: uniqueLabels)
{
if("".equals(l) || null == l)
{
break;
}
for(String s : words)
{
if(l.equals(s))
{
count++;
}
}
for(int i=0; i<n; i++){
if(count>r[i]){
r[i] = count;
w[i] = l;
break;
}
}
count=0;
}
display(n);
} catch (Exception e) {
System.err.println("ERR "+e.getMessage());
}
Read the stop-word file into a list, one word per line:
List<String> stopwords = Files.readAllLines(Paths.get("english_stopwords.txt"));
Then remove the stop words from the text like this:
ArrayList<String> allWords =
Stream.of(original.toLowerCase().split(" "))
.collect(Collectors.toCollection(ArrayList<String>::new));
allWords.removeAll(stopwords);
String result = allWords.stream().collect(Collectors.joining(" "));
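Putting the two steps together, a minimal sketch of top-k counting with stop words removed might look like the following. The class name TopWords and the method topK are hypothetical names chosen for illustration. The sketch assumes words are separated by whitespace only (no punctuation handling) and breaks ties alphabetically, which is also an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class TopWords {
    // Returns the k most frequent words in text that are not in stopWords.
    // Hypothetical helper: names and tie-breaking rule are illustrative only.
    static List<String> topK(String text, Set<String> stopWords, int k) {
        // Count every lower-cased word that is not a stop word.
        Map<String, Long> counts = Arrays.stream(text.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty() && !stopWords.contains(w))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        // Sort by descending count, then alphabetically, and keep the first k.
        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With the files from the question, the stop words could be loaded with new HashSet<>(Files.readAllLines(Paths.get("stop.txt"))) and the text with new String(Files.readAllBytes(Paths.get("data.txt"))). A HashSet is used for the stop words so that each lookup is a constant-time contains() call.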
Removing Stopwords from a String in Java
I have the following code which counts and displays the number of times each word occurs in the whole text document.
try {
List<String> list = new ArrayList<String>();
int totalWords = 0;
int uniqueWords = 0;
File fr = new File("filename.txt");
Scanner sc = new Scanner(fr);
while (sc.hasNext()) {
String words = sc.next();
String[] space = words.split(" ");
for (int i = 0; i < space.length; i++) {
list.add(space[i]);
}
totalWords++;
}
System.out.println("Words with their frequency..");
Set<String> uniqueSet = new HashSet<String>(list);
for (String word : uniqueSet) {
System.out.println(word + ": " + Collections.frequency(list,word));
}
} catch (Exception e) {
System.out.println("File not found");
}
Is it possible to modify this code to make it so it only counts each occurrence once per line rather than in the entire document?
One can read the contents per line and then apply logic per line to count the words:
File file = new File("filename.txt");
FileReader fileReader = new FileReader(file);
BufferedReader br = new BufferedReader(fileReader);
// Read the line in the file
String line = null;
while ((line = br.readLine()) != null) {
//Code to count the occurrences of the words
}
Yes. A Set is a collection like an ArrayList, with the key difference that it holds no duplicates.
So read the file one line at a time and pass each line's words through a set before adding them to the list.
In your while loop:
while (sc.hasNextLine()) {
    String line = sc.nextLine();
    String[] space = line.split(" ");
    // convert the array to a set, which drops duplicates within this line
    Set<String> set = new HashSet<String>(Arrays.asList(space));
    // a Set has no length field or index access, so add its elements in bulk
    list.addAll(set);
    totalWords += space.length;
}
Rest of the code should remain the same.
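As a self-contained illustration of the set-per-line idea, here is a rough sketch that counts, for each word, the number of lines it appears on. The class name LineWordCount and the method countOncePerLine are hypothetical, and the sketch assumes the lines have already been read (for example with Files.readAllLines) and that words are separated by whitespace.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class LineWordCount {
    // For each word, counts the number of lines it appears on (at most once per line).
    static Map<String, Integer> countOncePerLine(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // A set keeps one copy of each word, so repeats within a line collapse.
            Set<String> wordsOnLine = new HashSet<>(Arrays.asList(line.trim().split("\\s+")));
            for (String word : wordsOnLine) {
                if (!word.isEmpty()) {
                    // Start at 1 for a new word, otherwise add 1 to the existing count.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

A TreeMap is used here only so that the result prints in alphabetical order; a HashMap would work equally well if order does not matter.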
In my code, I am trying to read a file that looks like this:
100
22
123;22
123 342;432
but the output still includes the ";" (e.g. {100,22,123;22,123,342;432}).
I am trying to turn the file into an array (e.g. {100,22,123,22,123,...}).
Is there a way to read the file but ignore the semicolons?
Thanks!
public static void main(String args [])
{
String[] inFile = readFiles("ElevatorConfig.txt");
for ( int i = 0; i <inFile.length; i = i + 1)
{
System.out.println(inFile[i]);
}
System.out.println(Arrays.toString(inFile));
}
public static String[] readFiles(String file)
{
int ctr = 0;
try{
Scanner s1 = new Scanner(new File(file));
while (s1.hasNextLine()){
ctr = ctr + 1;
s1.next();
}
String[] words = new String[ctr];
Scanner s2 = new Scanner(new File(file));
for ( int i = 0 ; i < ctr ; i = i + 1){
words[i] = s2.next();
}
return words;
}
catch(FileNotFoundException e)
{
return null;
}
}
public static String[] readFiles(String file)
{
int ctr = 0;
try{
Scanner s1 = new Scanner(new File(file));
while (s1.hasNextLine()){
ctr = ctr + 1;
s1.next();
}
String[] words = new String[ctr];
Scanner s2 = new Scanner(new File(file));
for ( int i = 0 ; i < ctr ; i = i + 1){
words[i] = s2.next();
}
return words;
}
catch(FileNotFoundException e)
{
return null;
}
}
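Replace this with:
public static String[] readFiles(String file) {
    List<String> retList = new ArrayList<String>();
    try {
        Scanner s2 = new Scanner(new File(file));
        // no need to count first: the list grows as tokens are read
        while (s2.hasNext()) {
            String temp = s2.next();
            // a token such as "123;22" is split into "123" and "22"
            String[] tempArr = temp.split(";");
            for (int k = 0; k < tempArr.length; k++) {
                retList.add(tempArr[k]);
            }
        }
        s2.close();
    } catch (FileNotFoundException e) {
        return null;
    }
    // toArray() without an argument returns Object[], which cannot be cast to String[]
    return retList.toArray(new String[0]);
}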
Use a regex. Read the entire file into a String (read each token and append a blank space after it), then split the String at whitespace and semicolons.
// x is a String holding the entire contents of the file
String[] words = x.split("[\\s;]+");
The contents of words[] are:
"100", "22", "123", "22", "123", "342", "432"
Remember to parse them to int before using as numbers.
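The split-then-parse step could be sketched as follows. The class name SemicolonInts and the method parse are hypothetical, and the sketch assumes every token in the file is a valid integer; Integer.parseInt throws a NumberFormatException otherwise.

```java
import java.util.ArrayList;
import java.util.List;

class SemicolonInts {
    // Splits the contents on runs of whitespace and ';' and parses each token as an int.
    static int[] parse(String contents) {
        List<Integer> values = new ArrayList<>();
        for (String token : contents.split("[\\s;]+")) {
            // split() can yield an empty first token when the input starts with a separator
            if (!token.isEmpty()) {
                values.add(Integer.parseInt(token));
            }
        }
        int[] result = new int[values.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = values.get(i);
        }
        return result;
    }
}
```

Called on the sample file contents from the question, parse returns the seven numbers 100, 22, 123, 22, 123, 342, 432 in that order.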
A simple way is to use a BufferedReader: read line by line, then split on ";" and whitespace.
public static String[] readFiles(String file) throws IOException
{
    StringBuilder sb = new StringBuilder();
    // try-with-resources closes the reader automatically
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line = br.readLine();
        while (line != null) {
            sb.append(line);
            sb.append(System.lineSeparator());
            line = br.readLine();
        }
    }
    String allfilestring = sb.toString().trim();
    // splitting on ";" alone would leave spaces and line breaks inside the tokens
    String[] array = allfilestring.split("[\\s;]+");
    return array;
}
You can use split() with a regex to split the string into an array according to your requirement.
String s; // string you have read from the file
String[] s1 = s.split(" |;"); // s1 contains the strings separated by space and ";"
Hope it helps
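Keep the code for counting the size of the array, but set the delimiter on both scanners so that ";" separates tokens as well as whitespace; with the default delimiter, nextInt() fails on a token such as "123;22".
I would just change the way you input your values.
s2.useDelimiter("[\\s;]+"); // treat whitespace and ';' as separators
for (int i = 0; i < ctr; i++) {
    words[i] = "" + s2.nextInt();
}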
Another option is to replace every run of non-digit characters in the complete file string with a space. That way any non-number character is ignored.
BufferedReader br = new BufferedReader(new FileReader(file));
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
    // keep a separator between lines so the last number on one line does not fuse with the first number on the next
    sb.append(line).append(' ');
    line = br.readLine();
}
br.close();
String str = sb.toString();
str = str.replaceAll("\\D+"," ");
Now you have a string of numbers separated by spaces, which can be tokenized into number strings.
// "final" is a reserved word in Java, so the array needs another name
String[] numbers = str.trim().split("\\s+");
Then convert them to int.
I'm having a small problem with my code and I'm not sure how to fix it. Basically, I'm trying to separate the file into different lines (frames), store those lines in an array, and then print them. The first line of my file never prints.
public class Main {
public static void main(String[] args) throws IOException
{
/*Switch switcherino = new Switch();*/
Frame frame = new Frame();
Scanner input = new Scanner(System.in);
System.out.println("Enter the name of the file to process: ");
String fileName = input.nextLine();
FileInputStream inputStream =
new FileInputStream(fileName);
InputStreamReader inputStreamReader =
new InputStreamReader(inputStream,Charset.forName("UTF-8"));
BufferedReader bufferedReader =
new BufferedReader(inputStreamReader);
try{
String str = " ";
while((str = bufferedReader.readLine())!= null){
String words[] = str.split(" ");
for (int i = 0; i < words.length; i++){
words[i] = bufferedReader.readLine();
System.out.println(words[i]);
}
}
}
catch (IOException e){
e.printStackTrace();
} finally {
try {
if (inputStream != null)
inputStream.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
}
I don't want to use an ArrayList, as much as it would probably be easier.
Thanks in advance!
File: (switch.txt)
fa00 123123123abc 111111222222 data1
fa01 111111222222 123123123abc data2
fa03 444444444444 123123123abc data3
fa01 123123123abc 4353434234ab data4
fa99 a11b22c33d44 444444444444 data5
Output: (from System.out.println(words[i]);)
fa01 111111222222 123123123abc data2
fa03 444444444444 123123123abc data3
fa01 123123123abc 4353434234ab data4
fa99 a11b22c33d44 444444444444 data5
The logic here is wrong: you read a line and split it into words, so just go ahead and print them. There is no need to read any more lines inside the inner loop.
while((str = bufferedReader.readLine())!= null){
String words[] = str.split(" ");
for (int i = 0; i < words.length; i++){
words[i] = bufferedReader.readLine();
System.out.println(words[i]);
}
}
Use this instead:
while((str = bufferedReader.readLine())!= null){
String words[] = str.split(" ");
for (int i = 0; i < words.length; i++){
System.out.println(words[i]);
}
}
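// First pass: count the lines in the file
int length = 0;
BufferedReader counter = new BufferedReader(new InputStreamReader(
        new FileInputStream(fileName), Charset.forName("UTF-8")));
while (counter.readLine() != null) {
    length++;
}
counter.close();
// Second pass: reopen the file, because the first reader has consumed the stream
String words[] = new String[length];
BufferedReader reader = new BufferedReader(new InputStreamReader(
        new FileInputStream(fileName), Charset.forName("UTF-8")));
int j = 0;
String line;
while ((line = reader.readLine()) != null) {
    words[j++] = line;        // store the whole line
    System.out.println(line); // every line prints, including the first
}
reader.close();
// words[] now holds all the lines individually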
Your code doesn't work because readLine() is called twice: once in the while condition and again in the inner loop. The first line is consumed by the while condition and never printed. Try this and let me know.
You don't need to use split() since you want the entire line :)
while((str = bufferedReader.readLine())!= null){
String words[] = str.split(" ");
for (int i = 0; i < words.length; i++){
words[i] = bufferedReader.readLine();
System.out.println(words[i]);
}
}
While iterating over the file, you split the first line into a String array;
words[] then contains the elements fa00, 123123123abc, 111111222222 and data1.
The inner for loop then reads further lines from the bufferedReader, overwrites each element of words with one of those lines, and prints it, so the first line is never printed.
You are not supposed to invoke bufferedReader.readLine() in the inner for loop; it breaks your logic.
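To make the point concrete, here is a small sketch in which readLine() is called exactly once per line and each line is then split into its fields. The class name FrameReader and the method printFrames are hypothetical; the sketch prints the fields joined by " | " purely to show that the split worked, and it takes the reader and output stream as parameters so it can be tried without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;

class FrameReader {
    // Reads every line from the reader, splits it on spaces, and prints the fields.
    // Returns the number of lines read.
    static int printFrames(BufferedReader reader, PrintStream out) {
        int lines = 0;
        try {
            String line;
            // readLine() appears only here, so no line is skipped
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(" ");
                out.println(String.join(" | ", fields));
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

With the reader from the question, this could be called as FrameReader.printFrames(bufferedReader, System.out).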
I am trying to make this input.txt into a 2D array. I tried a few different methods. This is my latest attempt, and I seem to be stuck here... Any help is much appreciated.
input.txt structure: SCI2000/Science/1200/10/C --> There are 23 rows and 5 columns. I'd also like to have a title made for each column.
FileReader fr = new FileReader("input.txt");
BufferedReader br = new BufferedReader(fr);
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
String everything = sb.toString();
String[][] input = new String[23][5];
String[] tokens = everything.split("/");
for(String str : tokens)
System.out.print(str);
Just the main processing part (not tested):
int columns = 5;
String[] row = new String[columns];
String line;
int j = 0;
while ((line = br.readLine()) != null) {
    // each line looks like SCI2000/Science/1200/10/C
    row = line.split("/");
    for (int i = 0; i < row.length; ++i) {
        input[j][i] = row[i];
    }
    ++j;
}
FileReader fr = new FileReader("input.txt");
BufferedReader br = new BufferedReader(fr);
String[][] input = new String[24][5]; // 1 row for title, 23 rows for data
// add title
input[0] = new String[]{"title1", "title1", "title1", "title1", "title1"};
String line; // do not call readLine() here, or the first data row is skipped
int row = 1; // row 0 holds the titles, so data starts at row 1
while ( (line = br.readLine())!= null ) {
input[row++] = line.split("/");
}
// print all data
for ( int i = 0; i < input.length; i++) {
for ( int j = 0; j < input[i].length; j++ )
System.out.print(input[i][j] + " ");
//new line
System.out.println();
}
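The same approach can be sketched without hard-coding the row count, by sizing the array from the number of lines read. The class name TableParser and the method toTable are hypothetical, and the column titles passed in are whatever the caller chooses; the question does not say what the five columns mean.

```java
import java.util.List;

class TableParser {
    // Builds a 2D array whose first row is the header and whose remaining rows
    // are the '/'-separated fields of each input line.
    static String[][] toTable(String[] header, List<String> lines) {
        String[][] table = new String[lines.size() + 1][];
        table[0] = header.clone();
        for (int i = 0; i < lines.size(); i++) {
            // e.g. "SCI2000/Science/1200/10/C" becomes a row of five strings
            table[i + 1] = lines.get(i).split("/");
        }
        return table;
    }
}
```

The lines could come from Files.readAllLines(Paths.get("input.txt")), in which case the 23 data rows from the question would give a table of 24 rows.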
For example, I have these rows of numbers in my file (created in Notepad):
2 3 4 5 7 2 6 2
2 4 6 8 9 4 8 1
How do I read the next row? I can only read the first row using this code.
String path = "/path/notepad.txt";
String stringOfNumbers[];
BufferedReader br = new BufferedReader (new InputStreamReader(System.in));
BufferedReader br2 = new BufferedReader (new FileReader(path));
String lineOfNumbers = br2.readLine();
stringOfNumbers = lineOfNumbers.split(" ");
//stringOfNumbers = lineOfNumbers.split("\n");
String str = lineOfNumbers.replace(","," ");
System.out.println(str);
System.out.print("");
int numbers[][] = new int [stringOfNumbers.length][stringOfNumbers.length];
for (int i = 0; i < numbers.length; i++)
{
numbers[i][i] = Integer.parseInt(stringOfNumbers[i]);
}
System.out.print("Enter the number to search: ");
int searchNumber = Integer.parseInt(br.readLine());
int location = 0;
for (int i = 0; i < numbers.length; i++)
{
if (numbers[i][i] == searchNumber)
{
location = i+ 1;
}
}
thank you in advance.
I would do something like this:
FileReader fr = new FileReader("myFileName");
BufferedReader br = new BufferedReader(fr);
String line;
while((line=br.readLine())!=null) // as long as there are lines in the file
{
    stringOfNumbers = line.split(" ");
    // other code
}
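Filling in the "other code" part, a rough sketch of reading every row into a 2D array and searching it might look like this. The class name GridSearch and the methods parseGrid and find are hypothetical; the sketch assumes the numbers on each line are separated by whitespace and reports positions as 1-based {row, column} pairs.

```java
import java.util.ArrayList;
import java.util.List;

class GridSearch {
    // Converts each non-blank line of whitespace-separated numbers into one row.
    static int[][] parseGrid(List<String> lines) {
        List<int[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            int[] row = new int[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                row[i] = Integer.parseInt(tokens[i]);
            }
            rows.add(row);
        }
        return rows.toArray(new int[0][]);
    }

    // Returns every 1-based {row, column} position where target occurs.
    static List<int[]> find(int[][] grid, int target) {
        List<int[]> positions = new ArrayList<>();
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] == target) {
                    positions.add(new int[]{r + 1, c + 1});
                }
            }
        }
        return positions;
    }
}
```

Unlike the numbers[i][i] indexing in the question, which fills only the diagonal of a square array, each row here holds one complete line of the file.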
The following code reads all the numbers in the file into memory:
Scanner scanner = new Scanner(new File(path)); // new Scanner(path) would scan the path string itself, not the file
List<Integer[]> integerArList = new ArrayList<Integer[]>();
while(scanner.hasNextLine()){
String lineOfNumbers = scanner.nextLine();
stringOfNumbers = lineOfNumbers.split(" ");
//stringOfNumbers = lineOfNumbers.split("\n");
String str = lineOfNumbers.replace(","," ");
System.out.println(str);
System.out.print("");
Integer[] numbers = new Integer[stringOfNumbers.length];
for (int i = 0; i < numbers.length; i++)
{
numbers[i] = Integer.parseInt(stringOfNumbers[i]);
}
integerArList.add(numbers);
}
After this you can search any Integer by traversing each array in the List like this:
int searchMe = Integer.parseInt(br.readLine()); // the number to search for, read from the user as in the question
int location=0;
boolean found=false;
for(Integer[] intAr: integerArList){
for(int i=0;i<intAr.length;i++){
if(intAr[i]==searchMe){
location+=(i+1);
found=true;
break;
}
}
if(found) break;
location+=intAr.length;
}
System.out.println("Location of " + searchMe +" : " +(found?location:"Not Found in Data"));
Hope this helps.
To read all lines of a text file you can do something like this:
String path = "/path/notepad.txt";
String stringOfNumbers[];
BufferedReader br = new BufferedReader (new InputStreamReader(System.in));
BufferedReader br2 = new BufferedReader (new FileReader(path));
ArrayList<String> listOfLines = new ArrayList<String>();
//String lineOfNumbers = "";
String line = "";
String allIndexes = "";
while ((line = br2.readLine()) != null) {
if(!line.isEmpty()){
listOfLines.add(line);
}
}
// allocate the grid once, before the loop, so that earlier lines are not discarded
// (assumes at least one line, and the same count of numbers on every line)
int columns = listOfLines.get(0).split(" ").length;
int numbers[][] = new int [columns][listOfLines.size()];
for (int j = 0; j < listOfLines.size(); j++) {
    String lineOfNumbers = listOfLines.get(j);
    stringOfNumbers = lineOfNumbers.split(" ");
    System.out.println(lineOfNumbers);
    for (int i = 0; i < columns; i++)
    {
        numbers[i][j] = Integer.parseInt(stringOfNumbers[i]);
    }
}
// ask for the number once, after the whole file has been loaded
System.out.print("Enter the number to search: ");
int searchNumber = Integer.parseInt(br.readLine());
for (int i = 0; i < numbers.length; i++)
{
    for(int j = 0; j < listOfLines.size(); j++)
        if (numbers[i][j] == searchNumber)
        {
            // record the 1-based column:row position of every match
            allIndexes += (i + 1) + ":" + (j + 1) + " ";
        }
}
System.out.println(allIndexes);