Reading Unique Values - java

I wrote a piece of code that reads values from columns in two text files. To output the number of values I used 'length', which works fine, but I need to count only the number of unique values.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class REading_Two_Files {
public static void main(String[] args) {
try {
readFile(new File("C:\\Users\\teiteie\\Desktop\\RECSYS\\yoochoose-test.csv"), 0, "C:\\Users\\teiteie\\Desktop\\RECSYS\\yoochoose-buys.csv", 3);
} catch (IOException e) {
e.printStackTrace();
}
}
// whichcolumnFirstFile (0) and whichcolumnSecondFile (3) are passed in but not used yet;
// the loop below prints column 0 of each file.
private static void readFile(File fin1, int whichcolumnFirstFile, String string, int whichcolumnSecondFile) throws IOException {
int noSessions = 0;
int noItems = 0;
// Open the two files and wrap each one in a BufferedReader.
FileInputStream fis = new FileInputStream(fin1); // first file
FileInputStream sec = new FileInputStream(string); // second file
BufferedReader br1 = new BufferedReader(new InputStreamReader(fis));
BufferedReader br2 = new BufferedReader(new InputStreamReader(sec));
String lineFirst = null, first_file[];
String lineSec = null, second_file[];
// The loop stops as soon as either file runs out of lines.
while ((lineFirst = br1.readLine()) != null && (lineSec = br2.readLine()) != null) {
first_file = lineFirst.split(",");
second_file = lineSec.split(",");
System.out.println(first_file[0] + " , " + second_file[0]);
// These counters count every line that has at least one field, not distinct values.
if (first_file.length != 0) {
noSessions++;
}
if (second_file.length != 0) {
noItems++;
}
}
br1.close();
br2.close();
System.out.println("no of sessions " + noSessions + "\nno of items " + noItems);
}
}

To count only unique values we usually use a Set, because a Set is specified to contain only unique values. From the java.util.Set Javadoc:
"A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction."
Essentially: put all of your values in a Set (a HashSet is generally the most efficient choice; if several threads add values at the same time, a concurrent set such as ConcurrentHashMap.newKeySet() is safer) and then take Set.size() as the number of unique values you put in.
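Applied to the question, a minimal sketch might look like this. It assumes plain comma-separated lines without quoted commas, and the names UniqueColumnCount and countUniqueInColumn are hypothetical. It reads from any BufferedReader, so it works on a file or, as in main, on an in-memory string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class UniqueColumnCount {

    // Counts the distinct values in one 0-based column of comma-separated lines.
    // Assumes plain CSV with no quoted commas; rows too short for the column are skipped.
    static int countUniqueInColumn(BufferedReader reader, int column) throws IOException {
        Set<String> unique = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",");
            if (column < fields.length) {
                // trim() so values differing only by surrounding spaces count once;
                // the Set silently ignores values it already contains
                unique.add(fields[column].trim());
            }
        }
        return unique.size();
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample standing in for a file; for a real file use
        // new BufferedReader(new FileReader(path)) instead.
        String data = "1,a\n1,b\n2,a\n3,c\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            System.out.println(countUniqueInColumn(reader, 0)); // prints 3
        }
    }
}
```

In the question's readFile you could call this once per file (with column 0 for the first file and 3 for the second), or keep two HashSets inside the existing loop and print their size() instead of noSessions and noItems. Reading each file separately also avoids stopping early when the shorter file runs out.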

Just to give you some inspiration, here is how to count how often each value occurs using a Map:
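import java.util.HashMap;
import java.util.Map;

public class WordCountExample {
public static void main(String[] args) {
Map<String,Integer> lAllWordsWithCount = new HashMap<String, Integer>();
String[] lAllMyStringToCount = {"Hello", "I", "am", "what", "I", "am"};
// Start each word at 1, or increment its count if the word was seen before.
for (String lMyString : lAllMyStringToCount) {
int lCount = 1;
if (lAllWordsWithCount.containsKey(lMyString)){
lCount = lAllWordsWithCount.get(lMyString) +1;
}
lAllWordsWithCount.put(lMyString, lCount);
}
// The map's keys are the unique values; lAllWordsWithCount.size() is how many there are.
for(String lStringKey : lAllWordsWithCount.keySet()){
System.out.println(lStringKey+" count="+lAllWordsWithCount.get(lStringKey));
}
}
}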
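This prints the following (HashMap does not guarantee an iteration order, so the lines may come out in a different order):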
what count=1
am count=2
I count=2
Hello count=1
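The same counting can be written more compactly with Map.merge (Java 8+). This is an alternative sketch rather than the answer's original code, and the name WordCount is hypothetical. Using a TreeMap instead of a HashMap makes the output order predictable (sorted by key).

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCount {

    // Returns each distinct word mapped to how many times it occurs.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // put 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new String[] {"Hello", "I", "am", "what", "I", "am"});
        System.out.println(counts);        // {Hello=1, I=2, am=2, what=1}
        System.out.println(counts.size()); // 4 unique words
    }
}
```

Uppercase letters sort before lowercase ones in String's natural order, which is why "Hello" and "I" come first.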

Related

Reading CSV file into an array in Java [incompatible types: Integer cannot be converted to int[].]

I have a CSV file that looks like:
and I have read this into an ArrayList and would like to index into it and print individual values (the values vary in type, e.g. String, double and int). The code I have tried is:
public static void main(String[] args) throws IOException
{
System.out.println("Data from CSV file to be analysed:"+"\n");
String file = "jrc-covid-19-all-days-of-world_ASSIGNMENT-FIXED.csv";
ArrayList<Integer> lines = new ArrayList<Integer>();
String line = null;
try(BufferedReader bufferedReader = new BufferedReader(new FileReader(file)))
{
int i = 0;
while(((line = bufferedReader.readLine()) != null) && i<27)
{
String[] values = line.split(",");
i++;
System.out.println(Arrays.toString(values));
}
}
catch (IOException e)
{
e.printStackTrace();
}
int thirdCountry[] = lines.get(3);
int cp3 = thirdCountry[6];
System.out.println(cp3); //should return the value 107122
}
But I got the error message: incompatible types: Integer cannot be converted to int[].
1. Dealing with this error: Integer cannot be converted to int[]
We can only assign an int[] to an int[]. lines is a List of Integer, so getting any element from lines always returns an Integer.
So int thirdCountry[] = lines.get(3); effectively means int[] = Integer, and that is what causes the error above. To fix it, I declared lines as ArrayList<String[]>, so that each element is one row already split into fields.
2. Why is the List of type String[]?
The data can be a String, an int, or a double, and split() always produces strings anyway, so a String[] can hold all three. Convert individual values to numbers only when you need them (see point 4).
3. Always check bounds before getting an element from an array or collection
Whenever you get an element from an array or collection, first check that the index is less than the array's length (or the collection's size).
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;

public class CsvAnalysis {
public static void main(String[] args) throws IOException {
System.out.println("Data from CSV file to be analysed:"+"\n");
String file = "jrc-covid-19-all-days-of-world_ASSIGNMENT-FIXED.csv";
// Each element of lines is one row, already split into its fields.
ArrayList<String[]> lines = new ArrayList<String[]>();
String line = null;
try(BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
int i = 0;
while(((line = bufferedReader.readLine()) != null) && i<27) {
lines.add(line.split(","));
System.out.println(Arrays.toString(lines.get(i)));
i++;
}
}
catch (IOException e) {
e.printStackTrace();
}
// Check both bounds before indexing (rule 3 above).
if(lines.size() > 3) {
String thirdCountry[] = lines.get(3);
if(thirdCountry.length > 6) {
String cp3 = thirdCountry[6];
System.out.println(cp3);
}
}
}
}
4. Adding numbers
To add values we need to convert the String values to numeric values (int, long, or double). Say we convert to int: a field might be "123", "abc", "abc123", or "" (an empty string), and only the first of these parses. So you can try something like this:
String s1 = "";
int total = 0;
try {
total += Integer.parseInt(s1);
} catch (NumberFormatException e) {
System.out.println("Not a number!");
}
System.out.println(total);
The same pattern works for long and double with Long.parseLong and Double.parseDouble.
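Putting points 1 to 4 together, here is a hedged sketch that sums one numeric column across rows stored as String[], skipping short rows and non-numeric cells such as a header. It uses Double.parseDouble so both integer and decimal cells are accepted; the names ColumnSum and sumColumn and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnSum {

    // Sums column col (0-based) over all rows, ignoring short rows and non-numeric cells.
    static double sumColumn(List<String[]> rows, int col) {
        double total = 0;
        for (String[] row : rows) {
            if (col >= row.length) {
                continue; // row too short: nothing to add
            }
            try {
                total += Double.parseDouble(row[col].trim());
            } catch (NumberFormatException e) {
                // header text, "" or "abc123": skip this cell
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add("country,cases".split(","));
        rows.add("A,100".split(","));
        rows.add("B,".split(","));     // split drops the trailing empty field, so this row is short
        rows.add("C,23.5".split(","));
        System.out.println(sumColumn(rows, 1)); // 123.5
    }
}
```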

How to store fields in a text file into a list or map and then compare with fields in other text file in Java?

Text file 1:
2348384#Test 123####3983#Data 22 ....
etc. File 1 has many fields separated by "#" characters, and there are many rows in the file.
Text file 2:
23,809,Test 88, Dat 33
File 2 has fields separated by commas, and it has many rows.
I need to check whether fields from file 1 match fields in file 2. Many fields are the same in the two files, so I need to write code that checks whether they match.
Should I store all fields of file 1 in a String[]?
Say I want to compare row 4, field 9, String[3].
Or should I store individual fields in String variables?
How can I compare fields in two text files in Java? Should I store all lines of a file in a List or a HashMap?
Thanks.
Do you mean something like this? Sorry if the code is messy.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CompareFields {
public final static void main(final String[] args) {
// file1 content:
// a###b###c#d
// e###f###g#h
List<String> list1 = getListFromFile(new File("\\file1.txt"));
// file2 content:
// a,b,c,d
// e,f,g,h
final List<String> list2 = getListFromFile(new File("\\file2.txt"));
// Turn every run of '#' into a single ',' so both lists use the same separator.
list1 = format(list1);
// Compare row 2, field 3 of file 1 with row 2, field 4 of file 2 (both 1-based).
if (get(list1, 2, 3).equals(get(list2, 2, 4))) {
System.out.println("Equal!");
} else {
System.out.println("Not Equal!");
}
}
// Returns the given field of the given row; row and field are 1-based.
// Out-of-range values throw an IndexOutOfBoundsException.
final static String get(final List<String> list, final int row, final int field) {
final String[] strs = list.get(row - 1).split(",");
return strs[field - 1].trim();
}
final static List<String> format(final List<String> list1) {
final List<String> newList = new ArrayList<>();
for (int i = 0; i < list1.size(); i++) {
newList.add(list1.get(i).replaceAll("[#]+", ","));
}
return newList;
}
final static List<String> getListFromFile(final File file) {
List<String> listOfFile = new ArrayList<>();
try (final BufferedReader reader = new BufferedReader(new FileReader(file))) {
String line;
while ((line = reader.readLine()) != null) {
listOfFile.add(line);
}
} catch (final IOException e) {
// Report the problem instead of returning null, which would cause a NullPointerException in format().
e.printStackTrace();
}
return listOfFile;
}
}
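If the goal is to find which field values the two files have in common (rather than comparing one specific row and field), a Set of fields per file is usually simpler than a List or HashMap. A minimal sketch, assuming file 1 separates fields with one or more '#' characters and file 2 with commas, as in the question; the names FieldMatch, fieldsOf and commonFields and the sample lines are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FieldMatch {

    // Collects every non-empty, trimmed field from the lines, split on the given regex.
    static Set<String> fieldsOf(List<String> lines, String separatorRegex) {
        Set<String> fields = new LinkedHashSet<>(); // keeps first-seen order for readable output
        for (String line : lines) {
            for (String field : line.split(separatorRegex)) {
                String f = field.trim();
                if (!f.isEmpty()) {
                    fields.add(f);
                }
            }
        }
        return fields;
    }

    // Returns the field values that appear in both files.
    static Set<String> commonFields(List<String> file1, List<String> file2) {
        Set<String> common = fieldsOf(file1, "#+");
        common.retainAll(new HashSet<>(fieldsOf(file2, ","))); // keep only shared values
        return common;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("2348384#Test 123####3983#Data 22");
        List<String> file2 = Arrays.asList("23,809,Test 123, Data 22");
        System.out.println(commonFields(file1, file2)); // [Test 123, Data 22]
    }
}
```

With real files you would pass in the lists returned by getListFromFile. If you do need to know where a match occurred, a HashMap from field value to row number is the natural next step.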

Splitting up a text file into two files (java)

I need some help figuring out how to split a text file into two files in Java.
I have a text file in which each line contains a word, a space, and its index, with the lines in alphabetical order, e.g.:
...
stand 345
stand 498
stare 894
...
What I would like to do is to read in this file and then write two separate files. One file should contain only one instance of the word and the other the positions of the word in the document.
The file is really big and I was wondering if I can use an array or a list to store the word and index before creating the file or if there is a better way.
I don't really know how to approach this.
I would suggest creating a HashMap that uses the word as the key and a list of indexes as the value, i.e. HashMap<String, ArrayList<String>>. This way you can easily check whether a word is already in the map and update its index list.
List<String> list = map.get(word);
if (list == null)
{
list = new ArrayList<String>();
map.put(word, list);
}
list.add(index);
After reading and storing all values, you just need to iterate through the map and write its keys in one file and values in another.
for (Map.Entry<String, ArrayList<String>> entry : map.entrySet()) {
String key = entry.getKey();
ArrayList<String> value = entry.getValue();
// writing code here
}
If your file is really big, consider using a database; if it is not too big, a HashMap is fine. Alternatively, you can use a class like the one below. It requires the input file to be sorted, and it writes the words to one file and the indices to another. Because the input is sorted, it only keeps the indices of the current word in memory:
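import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Split {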
private String fileName;
private PrintWriter fileWords;
private PrintWriter fileIndices;
public Split(String fname) {
fileName = fname;
if (initFiles()) {
writeList();
}
closeFiles();
}
private boolean initFiles() {
boolean retval = false;
try {
fileWords = new PrintWriter("words-" + fileName, "UTF-8");
fileIndices = new PrintWriter("indices-" + fileName, "UTF-8");
retval = true;
} catch (Exception e) {
System.err.println(e.getMessage());
}
return retval;
}
private void closeFiles() {
if (null != fileWords) {
fileWords.close();
}
if (null != fileIndices) {
fileIndices.close();
}
}
private void writeList() {
String lastWord = null;
List<String> wordIndices = new ArrayList<String>();
Path file = Paths.get(fileName);
Charset charset = Charset.forName("UTF-8");
try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
String line = null;
while ((line = reader.readLine()) != null) {
int len = line.length();
if (len > 0) {
int ind = line.indexOf(' ');
if (ind > 0 && ind < (len - 1)) {
String word = line.substring(0, ind);
String indice = line.substring(ind + 1, len);
if (!word.equals(lastWord)) {
if (null != lastWord) {
writeToFiles(lastWord, wordIndices);
}
lastWord = word;
wordIndices = new ArrayList<String>();
wordIndices.add(indice);
} else {
wordIndices.add(indice);
}
}
}
}
if (null != lastWord) {
writeToFiles(lastWord, wordIndices);
}
} catch (IOException x) {
System.err.format("IOException: %s%n", x);
}
}
private void writeToFiles(String word, List<String> list) {
boolean first = true;
fileWords.println(word);
for (String elem : list) {
if (first) {
first = false;
}
else {
fileIndices.print(" ");
}
fileIndices.print(elem);
}
fileIndices.println();
}
}
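Be aware that the file-name handling is not very robust: the output names are built by prefixing "words-" and "indices-" to the input name, so an input path that includes a directory will not work as expected. You can use it like this:
Split split = new Split("data.txt");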
You can use this to save the words and the indices. You just need to call addLine for each line of your file.
Map<String, Set<Integer>> entries = new LinkedHashMap<>();
public void addLine(String word, Integer index) {
Set<Integer> indicesOfWord = entries.get(word);
if (indicesOfWord == null) {
entries.put(word, indicesOfWord = new TreeSet<>());
}
indicesOfWord.add(index);
}
To store them in separate files you can use this method:
public void storeInSeparateFiles(){
for (Entry<String, Set<Integer>> entry : entries.entrySet()) {
String word = entry.getKey();
Set<Integer> indices = entry.getValue();
// TODO: Save in separate files.
}
}
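One way to fill in the TODO, sketched under the assumption that the two outputs are written line for line (the Nth line of the indices output belongs to the Nth word). This version swaps computeIfAbsent in for the null check and takes Writer arguments so it can be tried in memory; the class name WordIndexSplitter is hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeSet;

public class WordIndexSplitter {

    // LinkedHashMap keeps words in the order they were first added.
    private final Map<String, Set<Integer>> entries = new LinkedHashMap<>();

    // Records that word occurs at index; the TreeSet keeps indices sorted and unique.
    public void addLine(String word, Integer index) {
        entries.computeIfAbsent(word, w -> new TreeSet<>()).add(index);
    }

    // Writes one word per line to words and the matching space-separated indices to indices.
    public void storeInSeparateFiles(Writer words, Writer indices) {
        PrintWriter wordOut = new PrintWriter(words);
        PrintWriter indexOut = new PrintWriter(indices);
        for (Map.Entry<String, Set<Integer>> entry : entries.entrySet()) {
            wordOut.println(entry.getKey());
            StringJoiner joined = new StringJoiner(" ");
            for (Integer i : entry.getValue()) {
                joined.add(String.valueOf(i));
            }
            indexOut.println(joined);
        }
        wordOut.flush();
        indexOut.flush();
    }

    public static void main(String[] args) {
        WordIndexSplitter splitter = new WordIndexSplitter();
        splitter.addLine("stand", 345);
        splitter.addLine("stand", 498);
        splitter.addLine("stare", 894);
        StringWriter words = new StringWriter();
        StringWriter indices = new StringWriter();
        splitter.storeInSeparateFiles(words, indices);
        System.out.print(words);   // "stand" and "stare" on separate lines
        System.out.print(indices); // "345 498" and "894" on separate lines
    }
}
```

For real files you could pass writers such as Files.newBufferedWriter(Paths.get("words.txt")) and close them afterwards, for example with try-with-resources.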

Loading elements of Array into a collection

I have a text file of names (last and first). I have successfully used the RandomAccessFile class to load all the names into an array of strings. What is left to do is to put each first name into an array of first names and each last name into an array of last names. Here is what I did, but I'm not getting the result I want:
public static void main(String[] args) {
String fname = "src\\workshop7\\customers.txt";
String s;
String[] Name;
String[] lastName, firstName;
String last, first;
RandomAccessFile f;
try {
f = new RandomAccessFile(fname, "r");
while ((s = f.readLine()) != null) {
Name = s.split("\\s");
System.out.println(Arrays.toString(Name));
for (int i = 0; i < Name.length; i++) {
first = Name[0];
last = Name[1];
System.out.println("last Name: " + last + "First Name: "+ first);
}
}
f.close();
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
Please help me out. I'm confused about what kind of collection to use and how to go about it. Thanks.
You could create a method that reads a file and puts the data in an array, but if you are determined to use an array you will have to create it with a fixed size, because a Java array's length cannot change once it is created:
import java.io.FileNotFoundException;
import java.util.Scanner;

public class tmp {
public static void main(String[] args) throws FileNotFoundException {
//problem you have to create an array of fixed size
String[] array = new String[4];
readLines(array);
}
public static String[] readLines(String[] lines) throws FileNotFoundException {
//this counter can be printed to check the size of your array
int count = 0; // number of array elements with data
// Create a File class object linked to the name of the file to read
java.io.File myFile = new java.io.File("path/to/file.txt");
// Create a Scanner named infile to read the input stream from the file
Scanner infile = new Scanner(myFile);
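/* This while loop reads lines of text into the array. It uses the Scanner
* method hasNextLine() to see if there is another line in the file, and it stops
* early when the fixed-size array is full, avoiding an ArrayIndexOutOfBoundsException.
*/
while (count < lines.length && infile.hasNextLine()) {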
// read a line and put it in an array element
lines[count] = infile.nextLine();
count++; // increment the number of array elements with data
} // end while
infile.close();
return lines;
}
}
However, the preferred approach is to use an ArrayList, which is backed by an array that is resized automatically as elements are added. In other words, you don't need to worry about text files of different sizes.
public static void main(String[] args) throws IOException {
// try-with-resources closes the reader automatically
try (BufferedReader in = new BufferedReader(new FileReader("path/of/file.txt"))) {
String str;
ArrayList<String> list = new ArrayList<String>();
while ((str = in.readLine()) != null) {
list.add(str);
}
// Convert to an array once the final size is known
String[] stringArr = list.toArray(new String[0]);
}
}
A little about random access:
Classes like BufferedReader and FileInputStream read or write data sequentially. RandomAccessFile, on the other hand, does exactly what its name implies: it permits non-sequential, random access to the contents of a file (you can seek() to any byte offset). That is useful for formats where you need to jump to a known position, such as ZIP archives. For reading a text file line by line from start to finish, however, I would recommend the sequential classes: RandomAccessFile.readLine() is unbuffered and turns each byte into one character, so it is slower and does not handle multi-byte characters.
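To illustrate what random access means, here is a small sketch that jumps straight to a byte offset with RandomAccessFile.seek() and reads one line from there. The class name SeekDemo, the file contents, and the offset are made up for the example; it writes ASCII text because readLine() treats each byte as one character.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeekDemo {

    // Reads the line that starts at the given byte offset without reading the bytes before it.
    static String readLineAt(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset); // jump directly to the offset
            return raf.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seek-demo", ".txt");
        Files.write(tmp, "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.US_ASCII));
        // "alpha\n" is 6 bytes, so offset 6 is the start of the second line.
        System.out.println(readLineAt(tmp, 6)); // bravo
        Files.delete(tmp);
    }
}
```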
public static void main(String[] args) {
List<String> firstName = new ArrayList<>();
List<String> lastName = new ArrayList<>();
// try-with-resources closes the reader; FileNotFoundException is a subclass of IOException
try (BufferedReader in = new BufferedReader(new FileReader("src\\workshop7\\customers.txt"))) {
String str;
while ((str = in.readLine()) != null) {
String[] names = str.trim().split("\\s+"); // one or more whitespace characters
// Take the names in pairs; the bound check avoids an exception on blank lines or an odd word count
for (int count = 0; count + 1 < names.length; count += 2) {
firstName.add(names[count]);
lastName.add(names[count + 1]);
}
}
} catch (IOException e) {
e.printStackTrace();
}
// do whatever with the firstName list here
System.out.println(firstName);
// do whatever with the lastName list here
System.out.println(lastName);
}

how to get specifics rows of 2d array returned by reading CSV file in java

This is my data.csv file. I want to take the rows that have classtype x (any number) and store the extracted rows in a new array, so if I have n classtypes I will have n new arrays.
age sex zipcode classtype
21 m 23423 1
12 f 23133 2
23 m 32323 2
23 f 23211 1
Example: If I want to retrieve rows which have classtype 1 and store this values in a new 2d array. Then output should come like this:
array1={{21,m,23423,1},{23,f,23211,1}}
I have written the code below, which gives me a map of column lists as output:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

import org.apache.commons.lang3.StringUtils; // Apache Commons Lang

public class CsvParser {
public static void main(String[] args) {
try {
FileReader fr = new FileReader((args.length > 0) ? args[0] : "data.csv");
// "[\\s,]+" splits on any run of spaces and/or commas
// ("\\s," would only split where whitespace is directly followed by a comma).
Map<String, List<String>> values = parseCsv(fr, "[\\s,]+", true);
System.out.println(values);
} catch (IOException e) {
e.printStackTrace();
}
}
public static Map<String, List<String>> parseCsv(Reader reader, String separator, boolean hasHeader) throws IOException {
Map<String, List<String>> values = new LinkedHashMap<String, List<String>>();
List<String> columnNames = new LinkedList<String>();
BufferedReader br = new BufferedReader(reader);
String line;
int numLines = 0;
while ((line = br.readLine()) != null) {
// Skip blank lines and lines starting with '#'.
if (StringUtils.isNotBlank(line)) {
if (!line.startsWith("#")) {
String[] tokens = line.split(separator);
for (int i = 0; i < tokens.length; ++i) {
if (numLines == 0) {
// First line: column names (or generated names if there is no header)
columnNames.add(hasHeader ? tokens[i] : ("row_"+i));
} else {
List<String> column = values.get(columnNames.get(i));
if (column == null) {
column = new LinkedList<String>();
}
column.add(tokens[i]);
values.put(columnNames.get(i), column);
}
}
++numLines;
}
}
}
br.close();
return values;
}
}
The output of this code is:
{age=[21,12,23,23],sex=[m,f,m,f],zipcode=[23423,23133,32323,23211],classtype=[1,2,2,1]}
I found a few links about grouping elements with the Java Collectors class, but I don't know whether that is useful:
http://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#groupingBy-java.util.function.Function-
Your help will be very useful.
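You can try something like this:
String[][] allArrays = new String[50][]; // set it to however many rows you need
String classType = "1";
int counter = 0;
Scanner s = new Scanner(new File(fileName));
while (s.hasNextLine()) {
String[] fields = s.nextLine().split(","); // use "\\s+" if the file is space-separated
// Compare the last field exactly; row.endsWith("1") would also match a classtype of "11".
if (fields.length > 0 && fields[fields.length - 1].trim().equals(classType) && counter < allArrays.length) {
allArrays[counter++] = fields; // adds the row, with each element split at the comma
}
}
s.close();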
Do not reinvent the wheel: you can use an existing library to load the content of a CSV file into a Java collection. I usually use OpenCSV to read a CSV file into a List<String[]>; it has a one-liner to read everything:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
List<String[]> lines= reader.readAll();
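Then iterate over the list like this to do the grouping:
Map<String, List<String[]>> values = new LinkedHashMap<String, List<String[]>>();
for (String[] line : lines) {
String key = line[3]; // classtype is the 4th column (index 3); index 4 would be out of bounds
if (values.get(key) == null) {
values.put(key, new ArrayList<String[]>());
}
values.get(key).add(line);
}
// The header row ends up in its own "classtype" group; skip lines.get(0) if you don't want it.
// Printing a String[] directly only shows its reference, so use Arrays.deepToString.
for (Map.Entry<String, List<String[]>> e : values.entrySet()) {
System.out.println(e.getKey() + " -> " + Arrays.deepToString(e.getValue().toArray()));
}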
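Since the question mentions Collectors.groupingBy: with Java 8 streams the grouping step can be a single collector call, and each group can then be converted into the String[][] the question asks for. A minimal sketch, assuming the rows are already split (for example by OpenCSV or String.split), the header row has been removed, and classtype is at index 3; the names GroupByClass and groupByClassType are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByClass {

    // Groups rows by the value in classCol and turns each group into a 2D array.
    static Map<String, String[][]> groupByClassType(List<String[]> rows, int classCol) {
        Map<String, List<String[]>> grouped = rows.stream()
                .collect(Collectors.groupingBy(row -> row[classCol], LinkedHashMap::new, Collectors.toList()));
        Map<String, String[][]> result = new LinkedHashMap<>();
        grouped.forEach((key, group) -> result.put(key, group.toArray(new String[0][])));
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"21", "m", "23423", "1"},
                new String[] {"12", "f", "23133", "2"},
                new String[] {"23", "m", "32323", "2"},
                new String[] {"23", "f", "23211", "1"});
        Map<String, String[][]> byClass = groupByClassType(rows, 3);
        System.out.println(Arrays.deepToString(byClass.get("1")));
        // [[21, m, 23423, 1], [23, f, 23211, 1]]
    }
}
```

Passing LinkedHashMap::new keeps the groups in the order their classtype first appears; the plain groupingBy(classifier) overload gives no ordering guarantee.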
