Code to read the dataset

Code to read the dataset - java

Here I read the dataset, extract the data lines (not the attributes), and print them. Next I need to sort the dataset, which is stored in an ArrayList. How do I sort it?
public static void main(String args[]) throws Exception
{
String filen, jsnfl;
Customiseddata data = new Customiseddata();
data.setAlgorithm("C4.5");
data.setUserName("Dahlia");
System.out.println("Enter the file name");
Scanner sc = new Scanner(System.in);
filen = sc.nextLine();
data.setFileName("input_files/" + filen);
Mainclass main = new Mainclass();
main.build(data);
}
public void build(Customiseddata data) throws Exception
{
int extension;
String filename;
filename = data.getFileName();
extension = filename.lastIndexOf('.');
String extensionType = filename.substring(extension + 1,
filename.length());
if (extensionType.equalsIgnoreCase("csv"))
{
readcsv(filename);
}
else if (extensionType.equalsIgnoreCase("arff"))
{
readarff(filename);
}
}
public void readarff(String filename) throws Exception
{
@SuppressWarnings("unused")
int filesize, attributesize, c = 0, i;
@SuppressWarnings("unused")
float v = 0;
String s, line1;
ArrayList<String> filelines;
ArrayList<String> attributes;
Customiseddata data = new Customiseddata();
Arfffilereader arfffile = new Arfffilereader();
Extractdata exdata = new Extractdata();
exdata = arfffile.extractInputArff(filename);
filelines = exdata.getFileLines();
attributes = exdata.getAttributes();
filesize = filelines.size();
attributesize = attributes.size();
data.setFilesize(filesize);
System.out.println("Print the attributes");
System.out.println("--------------------");
for (i = 0; i < attributesize; i++)
{
System.out.println(attributes.get(i));
}
System.out.println("\t");
System.out.println("Print the filelines");
System.out.println("--------------------");
for (int j = 0; j < filesize; j++)
{
System.out.println(filelines.get(j));
}
}
But after this I need to sort the dataset.

Since the elements of the list are Strings and since String implements Comparable, sorting a list is as simple as:
Collections.sort(theList);
Note, however, that this sorts the list in place. If you don't want that, make a copy of the list and sort the copy.
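Collections.sort compares whole lines as text, so for numeric ARFF data a line starting with "10.2" sorts before one starting with "5.1". The sketch below shows both sorting a copy (leaving the original list untouched) and sorting by the numeric value of one column. It assumes the data lines are comma-separated like typical ARFF rows; the sample lines and the helper names sortedCopy and sortedByNumericColumn are hypothetical, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortDataLinesSketch {

    // Sorts a copy with String's natural (lexicographic) order; the input is untouched.
    static List<String> sortedCopy(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        Collections.sort(copy);
        return copy;
    }

    // Hypothetical helper: sorts a copy of comma-separated data lines by the
    // numeric value in column `col` (0-based) instead of comparing whole lines as text.
    static List<String> sortedByNumericColumn(List<String> lines, int col) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort(Comparator.comparingDouble(
                (String line) -> Double.parseDouble(line.split(",")[col].trim())));
        return copy;
    }

    public static void main(String[] args) {
        // Made-up ARFF-style data lines standing in for the question's filelines.
        List<String> filelines = Arrays.asList(
                "5.1,3.5,Iris-setosa",
                "10.2,2.9,Iris-virginica",
                "6.0,3.0,Iris-versicolor");
        System.out.println(sortedCopy(filelines));               // text order: "10.2,..." comes first
        System.out.println(sortedByNumericColumn(filelines, 0)); // numeric order: 5.1, 6.0, 10.2
        System.out.println(filelines);                           // original order is preserved
    }
}
```

In the question's readarff, the same calls would be applied to the filelines list returned by exdata.getFileLines().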

Related

Remove all java keywords from a file

I am working on a project where I need to remove all Java keywords from a Java file. First I created a keyword.java file and stored all the Java keywords in it (abstract continue for new switch assert default goto package etc.). I have another file named newFile.java, whose contents I read as a String, and I have to remove all Java keywords from it. So far I have tried:
public void processFile() throws IOException {
String data = "";
data = new String(Files.readAllBytes(Paths.get("H:\\java\\Clone\\newFile.java"))).trim();
String rmvPunctuation = removePunctuation(data);
String newLineRemove = rmvPunctuation.replace("\n", "").replace("\r", "");
String spaceRemove = newLineRemove.replaceAll("( ){2,}", " ");
removeKeyword(spaceRemove);}
public void removeKeyword(String fileAsString) throws FileNotFoundException, IOException {
ArrayList<String> keyWordList = new ArrayList<>();
ArrayList<String> methodContentList = new ArrayList<>();
FileInputStream fis = new FileInputStream("H:\\java\\keyword.java");
byte[] b = new byte[fis.available()];
fis.read(b);
String[] keyword = new String(b).trim().split(" ");
String newString = "";
for (int i = 0; i < keyword.length; i++) {
keyWordList.add(keyword[i].trim());
}
String[] p = fileAsString.split(" ");
for (int i = 0; i < p.length; i++) {
if (!(keyWordList.contains(p[i].trim()))) {
newString = newString + p[i] + " ";
}
}
System.out.println("" + newString);
}
But I don't get the output I want: not all of the Java keywords are removed from newFile.java. I hope the Stack Overflow community can help me solve this; I am a beginner.
I also tried:
public void removeKeyword(String fileAsString) throws IOException {
String keyWord = new String(Files.readAllBytes(Paths.get("H:\\java\\keyword.java"))).trim();
String text = fileAsString.trim();
ArrayList<String> wordList = new ArrayList<>();
ArrayList<String> keyWordList = new ArrayList<>();
wordList.addAll(Arrays.asList(text.split(" ")));
keyWordList.addAll(Arrays.asList(keyWord.split(" ")));
wordList.removeAll(keyWordList);
System.out.println("" + wordList.toString());
}
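Two details in the code above can leave keywords in place. processFile deletes "\n" and "\r" without putting a space in their place, so a word at the start of a line is glued to the last word of the previous line (for example "x\npublic" becomes "xpublic"). And split(" ") does not split on tabs, nor on line breaks in keyword.java if the keywords are stored one per line. A minimal sketch that splits on any whitespace instead, assuming punctuation has already been handled (the question's removePunctuation is not shown); the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeywordRemoverSketch {

    // Parses a keyword list separated by any whitespace (spaces, tabs or newlines).
    static Set<String> parseKeywords(String keywordFileContent) {
        return new HashSet<>(Arrays.asList(keywordFileContent.trim().split("\\s+")));
    }

    // Keeps every whitespace-separated token that is not a keyword.
    // Splitting on \\s+ treats spaces, tabs and line breaks alike,
    // so words on adjacent lines are never glued together.
    static String removeKeywords(String source, Set<String> keywords) {
        List<String> kept = new ArrayList<>();
        for (String token : source.trim().split("\\s+")) {
            if (!token.isEmpty() && !keywords.contains(token)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // In the real program both strings would come from Files.readAllBytes(...).
        Set<String> keywords = parseKeywords("abstract continue for new\nswitch public class static void");
        String code = "public class Foo\n{\n\tpublic static void main ( String args )\n}";
        System.out.println(removeKeywords(code, keywords)); // Foo { main ( String args ) }
    }
}
```

Note that a keyword attached to punctuation, such as "if(", is still a single token and will not match unless punctuation is separated or removed first.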

TreeMap with (String,ArrayList<String,Int>)

I am trying to read an input file and insert each value into a TreeMap as follows:
If the word does not exist yet: insert the word into the TreeMap and associate it with an ArrayList of (docId, count).
If the word is already in the TreeMap: check whether the current docId matches an entry in the ArrayList and, if so, increase the count.
For the ArrayList, I created another class as below:
public class CountPerDocument
{
private final String documentId;
private final int count;
CountPerDocument(String documentId, int count)
{
this.documentId = documentId;
this.count = count;
}
public String getDocumentId()
{
return this.documentId;
}
public int getCount()
{
return this.count;
}
}
After that, I am trying to print the TreeMap into a text file as <DocID - Count>.
I'm not sure what I am doing wrong, but the output I get is:
The Stem is todai:[CountPerDocument@5caf905d, CountPerDocument@27716f4, CountPerDocument@8efb846, CountPerDocument@2a84aee7, CountPerDocument@a09ee92, CountPerDocument@30f39991]
Can anyone tell me what I am doing wrong and, if my approach isn't correct, what I should do instead?
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.regex.*;
public class StemTreeMap
{
private static final String r1 = "\\$DOC";
private static final String r2 = "\\$TITLE";
private static final String r3 = "\\$TEXT";
private static Pattern p1,p2,p3;
private static Matcher m1,m2,m3;
public static void main(String[] args)
{
BufferedReader rd,rd1;
String docid = null;
String id;
int tf = 0;
//CountPerDocument cp = new CountPerDocument(docid, count);
List<CountPerDocument> ls = new ArrayList<>();
Map<String,List<CountPerDocument>> mp = new TreeMap<>();
try
{
rd = new BufferedReader(new FileReader(args[0]));
rd1= new BufferedReader(new FileReader(args[0]));
int docCount = 0;
String line = rd.readLine();
p1 = Pattern.compile(r1);
p2 = Pattern.compile(r2);
p3 = Pattern.compile(r3);
while(line != null)
{
m1 = p1.matcher(line);
m2 = p2.matcher(line);
m3 = p3.matcher(line);
if(m1.find())
{
docid = line.substring(5, line.length());
docCount++;
//System.out.println("The Document ID is :");
//System.out.println(docid);
line = rd.readLine();
}
if(m2.find()||m3.find())
{
line = rd.readLine();
}
else
{
if(!(mp.containsKey(line))) // if the stem is not on the TreeMap
{
//System.out.println("The stem is not present in the tree");
tf = 1;
ls.add(new CountPerDocument(docid,tf));
mp.put(line, ls);
line = rd.readLine();
}
else
{
if(ls.indexOf(docid) > 0) //if its last entry matches the current document number
{
//System.out.println("The Stem is present for the same docid so incrementing docid");
tf = tf+1;
ls.add(new CountPerDocument(docid,tf));
line = rd.readLine();
}
else
{
//System.out.println("Stem is present but not the same docid so inserting new docid");
tf = 1;
ls.add(new CountPerDocument(docid,tf)); //set did to the current document number and tf to 1
line = rd.readLine();
}
}
}
}
rd.close();
System.out.println("The Number of Documents in the file is:"+ docCount);
//Write to an output file
String l = rd1.readLine();
File f = new File("dictionary.txt");
if (f.createNewFile())
{
System.out.println("File created: " + f.getName());
}
else
{
System.out.println("File already exists.");
Path path = Paths.get("dictionary.txt");
Files.deleteIfExists(path);
System.out.println("Deleted Existing File:: Creating New File");
f.createNewFile();
}
FileWriter fw = new FileWriter("dictionary.txt");
fw.write("The Total Number of Stems: " + mp.size() +"\n");
fw.close();
System.out.println("The Stem is todai:" + mp.get("todai"));
}catch(IOException e)
{
e.printStackTrace();
}
}
}

You didn't override toString() in your CountPerDocument class. So when you print a CountPerDocument object, the default Object.toString() is used, which prints CountPerDocument@ followed by the object's hash code in hexadecimal.
To control how a CountPerDocument is printed, add the following method to your class:
@Override
public String toString() {
return "<" + this.getDocumentId() + ", " + this.getCount() + ">";
}

Try overriding the toString method in CountPerDocument, something like this:
public class CountPerDocument
{
private final String documentId;
private final int count;
CountPerDocument(String documentId, int count)
{
this.documentId = documentId;
this.count = count;
}
public String getDocumentId()
{
return this.documentId;
}
public int getCount()
{
return this.count;
}
@Override
public String toString() {
return documentId + "-" + count;
}
}
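Besides the missing toString(), two things in StemTreeMap keep the counts from being right: ls is created once and the same list is put into the map for every new stem, so all stems share one list; and ls.indexOf(docid) searches a List<CountPerDocument> for a String, which never matches, so it always returns -1. A minimal sketch of per-stem counting, assuming (as the comment in the question says) that documents are read one after another, so only the last entry of a stem's list can belong to the current document; addOccurrence and the sample stems are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class StemIndexSketch {

    // Same fields as the question's class, plus the toString() the answers suggest.
    static final class CountPerDocument {
        private final String documentId;
        private final int count;

        CountPerDocument(String documentId, int count) {
            this.documentId = documentId;
            this.count = count;
        }

        String getDocumentId() { return documentId; }
        int getCount() { return count; }

        @Override
        public String toString() { return documentId + "-" + count; }
    }

    // Hypothetical helper: records one occurrence of `stem` in document `docId`.
    static void addOccurrence(Map<String, List<CountPerDocument>> index, String stem, String docId) {
        // Each stem gets its own list, created the first time the stem is seen.
        List<CountPerDocument> postings = index.computeIfAbsent(stem, k -> new ArrayList<>());
        int last = postings.size() - 1;
        if (last >= 0 && Objects.equals(postings.get(last).getDocumentId(), docId)) {
            // Same document as the last entry: CountPerDocument is immutable,
            // so replace that entry with one whose count is one higher.
            postings.set(last, new CountPerDocument(docId, postings.get(last).getCount() + 1));
        } else {
            // First occurrence of this stem in the current document.
            postings.add(new CountPerDocument(docId, 1));
        }
    }

    public static void main(String[] args) {
        Map<String, List<CountPerDocument>> index = new TreeMap<>();
        addOccurrence(index, "todai", "D1");
        addOccurrence(index, "todai", "D1");
        addOccurrence(index, "todai", "D2");
        addOccurrence(index, "news", "D2");
        System.out.println(index); // {news=[D2-1], todai=[D1-2, D2-1]}
    }
}
```

In StemTreeMap, the whole if/else on mp.containsKey(line) could then become a single addOccurrence(mp, line, docid) call followed by line = rd.readLine().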

Java ArrayIndexOutOfBoundsException keeps appearing while trying to find most occurring word in file

I am currently building a program that reads a file and prints the most frequently occurring words and how many times each word appears, like so:
package WordLookUp;
import java.util.*;
import java.io.*;
import java.lang.*;
public class WordLookUp {
private String[] mostWords;
private Scanner reader;
private String line;
private FileReader fr;
private BufferedReader br;
private List<String> original;
private String token = " ";
public WordLookUp(String file) throws Exception {
this.reader = new Scanner(new File(file));
this.original = new ArrayList<String>();
while (this.reader.hasNext()) { //reads file and stores it in string
this.token = this.reader.next();
this.original.add(token); //adds it to my arrayList
}
}
public void findMostOccurringWords() {
List<String> mostOccur = new ArrayList<String>();
List<Integer> count = new ArrayList<Integer>();
int counter = 0;
this.mostWords = this.token.split(" "); //storing read lines in mostWords arrayList
try {
for (int i = 0; i < original.size(); i++) {
if (this.original.equals(this.mostWords[i])) {
counter++; //increase counter
mostOccur.add(this.mostWords[i]);
count.add(counter);
}
}
for (int i = 0; i < mostOccur.size(); i++) {
System.out.println("Word: " + mostOccur.get(i) + " count: " + count.get(i));
}
} catch (ArrayIndexOutOfBoundsException ae) {
System.out.println("Illegal index");
}
}
}
package WordLookUp;
import java.util.*;
import java.io.*;
public class Main {
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
WordLookUp wL = new WordLookUp("tiny1.txt");
wL.findMostOccurringWords();
}
}
So when I run my program, it keeps printing the message from my catch block: "Illegal index". I think the problem is in my findMostOccurringWords method. The logic seems correct to me, but I don't know why it throws an ArrayIndexOutOfBoundsException. I tried changing the for loops to go from int i = 0 to i < mostOccur.size() - 1, but that doesn't work either. Is my logic wrong? I am not allowed to use a HashMap; our professor hinted that we can do this assignment easily with arrays and ArrayLists (no other built-in functions, although regexes are highly recommended for the rest of the assignment). I declared a private FileReader and BufferedReader to see whether they would work better. Thanks for the advice!

Can you try the following code? I think your current algorithm is wrong.
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class WordLookUp {
private List<String> original;
private List<String> mostOccur = new ArrayList<String>();
private List<Integer> count = new ArrayList<Integer>();
public WordLookUp(String file) throws Exception {
try(Scanner reader = new Scanner(new File(file));){
this.original = new ArrayList<String>();
String token = " ";
while (reader.hasNext()) { //reads file and stores it in string
token = reader.next();
this.original.add(token); //adds it to my arrayList
findMostOccurringWords(token);
}
}
}
public void findMostOccurringWords(String token) {
int counter = 0;
String[] mostWords = token.split(" "); //storing read lines in mostWords arrayList
try {
for (int i = 0; i < mostWords.length; i++) {
for(int j = 0; j < this.original.size(); j++) {
if (original.get(j).equals(mostWords[i])) {
counter++; //increase counter
}
}
if (mostOccur.contains(mostWords[i])) {
count.set(mostOccur.indexOf(mostWords[i]),counter);
}else {
mostOccur.add(mostWords[i]);
count.add(counter);
}
}
} catch (ArrayIndexOutOfBoundsException ae) {
System.out.println("Illegal index");
}
}
public void count() {
for (int i = 0; i < mostOccur.size(); i++) {
System.out.println("Word: " + mostOccur.get(i) + " count: " + count.get(i));
}
}
}
public class Main {
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
WordLookUp wL = new WordLookUp("F:\\gc.log");
wL.count();
}
}

Here in this loop:
for (int i = 0; i < mostOccur.size(); i++) {
System.out.println("Word: " + mostOccur.get(i) + " count: " + count.get(i));
}
You check that i is within bounds for mostOccur but not for count. I would add a condition to make sure it is in bounds for both, such as:
for (int i = 0; i < mostOccur.size() && i < count.size(); i++) {
System.out.println("Word: " + mostOccur.get(i) + " count: " + count.get(i));
}
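In the question's code the exception comes from mostWords[i]: this.token only holds the last word read by the constructor, so token.split(" ") gives a one-element mostWords array, and mostWords[1] fails while i is still below original.size(). In addition, this.original.equals(this.mostWords[i]) compares the whole List with a single String, so it is never true. A minimal sketch that stays within the assignment's constraint (ArrayLists, no HashMap), using two parallel lists; the class and method names are hypothetical, and the sample sentence stands in for the words read from the file:

```java
import java.util.ArrayList;
import java.util.List;

public class WordFrequencySketch {

    // Parallel lists: words.get(k) has been seen counts.get(k) times.
    private final List<String> words = new ArrayList<>();
    private final List<Integer> counts = new ArrayList<>();

    // Adds one occurrence of word, creating an entry if it is new.
    public void add(String word) {
        int idx = words.indexOf(word);
        if (idx >= 0) {
            counts.set(idx, counts.get(idx) + 1);
        } else {
            words.add(word);
            counts.add(1);
        }
    }

    public int countOf(String word) {
        int idx = words.indexOf(word);
        return idx >= 0 ? counts.get(idx) : 0;
    }

    // Returns every word whose count equals the maximum (ties are all kept).
    public List<String> mostOccurring() {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        List<String> result = new ArrayList<>();
        for (int k = 0; k < words.size(); k++) {
            if (counts.get(k) == max) {
                result.add(words.get(k));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        WordFrequencySketch f = new WordFrequencySketch();
        // In the assignment these tokens would come from reader.next() in the constructor.
        for (String w : "the cat and the dog and the bird".split("\\s+")) {
            f.add(w);
        }
        System.out.println(f.mostOccurring() + " " + f.countOf("the")); // [the] 3
    }
}
```

Because both lists only ever grow together inside add, the same index is always valid for both, which is the invariant the bounds check above is trying to protect.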

Needing to update my outfile after items are changed

In my program I can edit and change the values of items that are stored in my outfile. However, the changed numbers only update inside the program itself. For example, if I sell 10 ketchups, my program shows 0 but my outfile still says I have 10. I need my outfile to be updated along with the program. I wrote an overwrite method, but currently all it does is add content on a new line in the outfile. I am not sure how to actually update the information stored in the outfile; any help would be great.
Code:
public class Driver {
public static ArrayList<Item> list = new ArrayList<Item>();
static double myBalance = 100;
/*static ArrayList<Item> list = new ArrayList<Item>();*/
/**
* @param args
* @throws IOException
* @throws FileNotFoundException
*/
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
ArrayList<String> inventoryList = new ArrayList<String>();
BufferedReader readIn = null;
try {
readIn = new BufferedReader(new FileReader("inventory.out"));
readIn.lines().forEach(inventoryList::add);
} catch (Exception e) {
e.printStackTrace();
} finally {
if(readIn != null) {
readIn.close();
}
}
for (int i = 0; i < 4; i++) {
String item = inventoryList.get(i);// input String like the one you would read from a file
String delims = "[,]"; //delimiter - a comma is used to separate your tokens (name, cost, qty, price)
String[] tokens = item.split(delims); // split it into tokens and place them in an array
String name = tokens[0]; System.out.println(name);
double cost = Double.parseDouble(tokens[1]);System.out.println(cost);
int qty = Integer.parseInt(tokens[2]);System.out.println(qty);
double price = Double.parseDouble(tokens[3]);System.out.println(price);
list.add(new Item(name, cost, qty, price));
}
sell("Mayo", 10);
buy("Ketchup", 20);
remove_item("Ketchup");
add_item("Tums", 20, 10, 5);
overwrite("New line");
PrintAll();
}
// Method to sell items from the arraylist
public static void sell(String itemName, int amount) {
for (int i = 0; i < list.size(); i++) {
if (list.get(i).getName().equals(itemName)) {
int number = i;
list.get(number).qty -= amount;
myBalance += list.get(number).getPrice() * amount;
}
}
}
// Method to buy more of the items in our array list
public static void buy(String itemName, int amount) {
for (int i = 0; i < list.size(); i++) {
if (list.get(i).getName().equals(itemName)) {
int number = i;
list.get(number).qty += amount;
myBalance -= list.get(number).getPrice() * amount;
}
}
}
// Method to remove an item completely from our inventory
public static void remove_item(String itemName) {
for (int i = 0; i < list.size(); i++) {
if (list.get(i).getName().equals(itemName)) {
int number = i;
list.remove(number);
}
}
}
public static void add_item(String itemName, double itemCost, int qty, double itemPrice) {
list.add(new Item(itemName, itemCost, qty, itemPrice));
}
public static void PrintAll() {
String output = "";
for(Item i : list) {
int everything = i.getQty();
String everything2 = i.getName().toString();
output += everything +" "+ everything2 + "\n";
}
JOptionPane.showMessageDialog(null, "Your current balance is: $" + myBalance + "\n" + "Current stock:" + "\n" + output);
}
public static void overwrite(String update) {
try
{
String filename= "inventory.out";
FileWriter fw = new FileWriter(filename,true); //the true will append the new data
fw.write("\n"+"add a line");//appends the string to the file
fw.close();
}
catch(IOException ioe)
{
System.err.println("IOException: " + ioe.getMessage());
}
}
}
Outfile contents:
Ketchup,1,10,2
Mayo,2,20,3
Bleach,3,30,4
Lysol,4,40,5

If you know the name of your outfile, clear it whenever it needs to be updated and then write the new contents to it again. You can use the code below to erase the contents of a file:
PrintWriter writer = new PrintWriter(file);
writer.print("");
writer.close();
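Creating the PrintWriter on an existing file already truncates it to zero length, so in practice the fix is to rewrite the whole file from the in-memory list after each change, instead of appending as the question's overwrite method does with FileWriter(filename, true). A minimal sketch, assuming the file keeps the name,cost,qty,price format shown above; Item here is a hypothetical stand-in for the question's class, and toLine and saveInventory are illustrative helper names:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class InventoryWriterSketch {

    // Hypothetical stand-in for the question's Item class (same field order as the outfile).
    static final class Item {
        final String name;
        final double cost;
        int qty;
        final double price;

        Item(String name, double cost, int qty, double price) {
            this.name = name;
            this.cost = cost;
            this.qty = qty;
            this.price = price;
        }
    }

    // One outfile line in the "name,cost,qty,price" format.
    static String toLine(Item item) {
        return item.name + "," + item.cost + "," + item.qty + "," + item.price;
    }

    // Rewrites the whole file from the in-memory list. With no OpenOptions,
    // Files.write creates the file or truncates the existing one before writing.
    static void saveInventory(Path file, List<Item> items) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Item item : items) {
            lines.add(toLine(item));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("inventory", ".out"); // temp file instead of inventory.out
        List<Item> items = new ArrayList<>();
        items.add(new Item("Ketchup", 1, 10, 2));
        items.add(new Item("Mayo", 2, 20, 3));
        items.get(1).qty -= 10; // the change that sell("Mayo", 10) makes in memory
        saveInventory(file, items);
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

The doubles are written back as 1.0, 2.0 and so on; Double.parseDouble in the question's reading loop accepts that form as well as the original whole numbers.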

java.lang.NullPointerException output term frequency-inverse document frequency (tfidf) matrix java

I have this code that outputs the tf-idf for every word in each file in a directory. I'm trying to transfer this into a matrix where each row corresponds to a file in the directory and each column to a word across all the files, and I'm having some difficulty doing it, so I need some help.
What I get is a java.lang.NullPointerException when I try to output the matrix.
The values start to appear, but for some reason they stop and the NullPointerException is thrown.
This is the code:
public class TestTF_IDF {
public static void main(String[] args) throws UnsupportedEncodingException, FileNotFoundException{
//Test code for TfIdf
TfIdf tf = new TfIdf("E:/Thesis/ThesisWork/data1");
//Contains file name being processed
//String file;
tf.buildAllDocuments();
int numDocuments = tf.documents.size();
Double matrix[][] = new Double[numDocuments][];
int documentIndex = 0;
for (String file : tf.documents.keySet())
{
// System.out.println("File \t" + file);
Map<String, Double[]> myMap =
tf.documents.get(file).getF_TF_TFIDF();
int numWords = myMap.size();
matrix[documentIndex] = new Double[numWords];
int wordIndex = 0;
for (String key : myMap.keySet())
{
Double[] values = myMap.get(key);
matrix[documentIndex][wordIndex] = values[2];
wordIndex++;
//System.out.print("file="+ file+ "term=" +key + values[2]+" ");
}
documentIndex++;
for(int i=0; i<numDocuments;i++){
for(int j=0; j<numWords;j++){
System.out.print("file="+ file+ matrix[i][j]+ " "); //error here
}
}
}
}//public static void main(String[] args)
}//public class TestTF_IDF
Any ideas? Thanks.

Although the question is remarkably unclear, here is my guess at what you need, based on the question and the comments.
import java.util.Map;
public class TestTF_IDF
{
public static void main(String[] args) throws Exception
{
TfIdf tf = new TfIdf("E:/Thesis/ThesisWork/data1");
tf.buildAllDocuments();
int numDocuments = tf.documents.size();
Double[] matrix[][] = new Double[numDocuments][][];
int documentIndex = 0;
for (String file : tf.documents.keySet())
{
System.out.println("File \t" + file);
Map<String, Double[]> myMap =
tf.documents.get(file).getF_TF_TFIDF();
int numWords = myMap.size();
matrix[documentIndex] = new Double[numWords][];
int wordIndex = 0;
for (String key : myMap.keySet())
{
Double[] values = myMap.get(key);
matrix[documentIndex][wordIndex] = values;
wordIndex++;
}
documentIndex++;
}
}
}
class Document
{
public Map<String, Double[]> getF_TF_TFIDF()
{
return null;
}
}
class TfIdf
{
public Map<String, Document> documents;
TfIdf(String s)
{
}
public void buildAllDocuments()
{
}
}
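In the question's code, the printing loop runs inside the per-file loop, so while the first file is being processed it visits rows i = 1 .. numDocuments-1 that have not been allocated yet; matrix[i] is still null, and matrix[i][j] throws the NullPointerException right after the first row is printed. Also, each row is sized by that file's own word count and filled in that file's key order, so column j does not refer to the same word across rows. A minimal sketch that builds one shared vocabulary, allocates every row up front, fills missing words with 0.0, and prints only after the matrix is complete. It assumes each document's tf-idf values have been copied into a Map<String, Double> (for example value values[2] from getF_TF_TFIDF()); the class, methods and sample data are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TfIdfMatrixSketch {

    // Sorted union of all words, so column j means the same word in every row.
    static List<String> vocabulary(Map<String, Map<String, Double>> docs) {
        TreeSet<String> words = new TreeSet<>();
        for (Map<String, Double> doc : docs.values()) {
            words.addAll(doc.keySet());
        }
        return new ArrayList<>(words);
    }

    // documents x vocabulary matrix; every row is allocated up front, missing words get 0.0.
    static double[][] buildMatrix(Map<String, Map<String, Double>> docs, List<String> vocab) {
        double[][] matrix = new double[docs.size()][vocab.size()];
        int row = 0;
        for (Map<String, Double> doc : docs.values()) {
            for (int col = 0; col < vocab.size(); col++) {
                matrix[row][col] = doc.getOrDefault(vocab.get(col), 0.0);
            }
            row++;
        }
        return matrix;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> docs = new LinkedHashMap<>();
        Map<String, Double> a = new LinkedHashMap<>();
        a.put("cat", 0.5);
        a.put("dog", 0.25);
        Map<String, Double> b = new LinkedHashMap<>();
        b.put("dog", 0.75);
        b.put("fish", 1.0);
        docs.put("a.txt", a);
        docs.put("b.txt", b);

        List<String> vocab = vocabulary(docs);
        double[][] m = buildMatrix(docs, vocab);

        // Print only after the whole matrix is built: a header row, then one line per file.
        System.out.println("file\t" + String.join("\t", vocab));
        int row = 0;
        for (String file : docs.keySet()) {
            StringBuilder sb = new StringBuilder(file);
            for (double v : m[row]) {
                sb.append('\t').append(v);
            }
            System.out.println(sb);
            row++;
        }
    }
}
```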
