Show duplicates in a String Array from csv File (Java)

My problem is that I created an array from a csv file and I now have to output any values with duplicates.
The file has a layout of 5x9952. It consists of the data:
id,birthday,name,sex, first name
I'd now like the program to show me, for each column (e.g. name), which duplicates there are, for example if two people have the same name. But whatever I try from what I found on the Internet only shows me duplicates within a row (for example, if name and first name are the same).
Here's what I got so far:
package javacvs;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
/**
*
* @author Tobias
*/
public class main {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
String csvFile = "/Users/Tobias/Desktop/PatDaten/123.csv";
String line = "";
String cvsSplitBy = ",";
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
while ((line = br.readLine()) != null) {
// use comma as separator
String[] patDaten = line.split(cvsSplitBy);
for (int i = 0; i < patDaten.length-1; i++)
{
for (int j = i+1; j < patDaten.length; j++)
{
if( (patDaten[i].equals(patDaten[j])) && (i != j) )
{
System.out.println("Duplicate Element is : "+patDaten[j]);
}
}
}
}
}catch (IOException e) {
e.printStackTrace();
}
}
}
(I changed the name of the csv as it contains confidential data)

The real thing here: stop thinking "low level". Good OOP is about creating helpful abstractions.
In other words: your first stop should be to create a meaningful class definition that represents the content of one row; let's call it the Person class for now. Then you separate your further concerns:
you create one class/method that does nothing else but reading that CSV file - and creating one Person object per row
you create a meaningful data structure that tells you about duplicates
The latter could (for example) be some kind of reverse index. Meaning: you have a Map<String, List<Person>>. And after you have read all your Person objects (maybe into a simple list), you can do this:
Map<String, List<Person>> personsByName = new HashMap<>();
for (Person p : persons) {
List<Person> personsForName = personsByName.get(p.getName());
if (personsForName == null) {
personsForName = new ArrayList<>();
personsByName.put(p.getName(), personsForName);
}
personsForName.add(p);
}
After that loop that map contains all names used in your table - and for each name you have a list of the corresponding persons.
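To make the idea above concrete, here is a minimal sketch of the whole pipeline. The Person class, its fromCsvLine helper, and the DuplicateNames class are hypothetical names made up for this illustration, and the parsing assumes plain comma-separated lines in the order id,birthday,name,sex,first name with no quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value class for one CSV row (only the fields used here are kept).
class Person {
    private final String id;
    private final String name;

    Person(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }

    // Assumes the layout id,birthday,name,sex,first name and no quoted fields.
    static Person fromCsvLine(String line) {
        String[] fields = line.split(",");
        return new Person(fields[0].trim(), fields[2].trim());
    }
}

class DuplicateNames {
    // Reverse index: name -> every person with that name.
    static Map<String, List<Person>> groupByName(List<Person> persons) {
        Map<String, List<Person>> byName = new LinkedHashMap<>();
        for (Person p : persons) {
            byName.computeIfAbsent(p.getName(), k -> new ArrayList<>()).add(p);
        }
        return byName;
    }

    // Names that belong to more than one person, in first-seen order.
    static List<String> duplicateNames(List<Person> persons) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Person>> e : groupByName(persons).entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the real file.
        List<Person> persons = Arrays.asList(
                Person.fromCsvLine("1,1980-01-01,Smith,m,John"),
                Person.fromCsvLine("2,1975-05-05,Jones,f,Ann"),
                Person.fromCsvLine("3,1990-09-09,Smith,f,Eve"));
        System.out.println(duplicateNames(persons));
    }
}
```

The same grouping works for any other column: add the field to Person and group by that getter instead.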

You are iterating over the rows instead of iterating over the column. What you need is the same loop, but over the column.
What you can do is accumulate the names in a separate array and then iterate over it. I am sure you know the index of the column you want to compare. So you will need one extra loop to accumulate the column you want to check for duplicates.

It's a bit unclear what you want presented, the whole record, or only that there are duplicate names.
For the name only:
String csvFile = "test.csv";
List<String> readAllLines = Files.readAllLines(Paths.get(csvFile));
Set<String> names = new HashSet<>();
readAllLines.stream().map(s -> s.split(",")[2]).forEach(name -> {
if (!names.add(name)) {
System.out.println("Duplicate name: " + name);
}
});
For the whole record:
String csvFile = "test.csv";
List<String> readAllLines = Files.readAllLines(Paths.get(csvFile));
Set<String> names = new HashSet<>();
readAllLines.stream().forEach(record -> {
String name = record.split(",")[2];
if (!names.add(name)) {
System.out.println("Duplicate name: " + name + " with record " + record);
}
});

Your problem is the nesting of your loops. You read one line, split it up, and then compare the fields of that one row with each other. You never compare one line with the other lines!
So first you need an array for all lines so you can compare these lines. As GhostCat recommended in his answer you should use your own class Person which has the five fields as attributes. But you could use a second array, so you can work with the indexes as Alexander Petrov said in his answer. In the latter case, you get a two-dimensional array:
String[][] patDaten;
After that you read all lines of your csv-file and for each line you create a new Person or a new inner array.
After reading the entire file, you compare the fields as you want. Here you use your double loop. So you compare patDaten[i].getName() with patDaten[j].getName() or, with the array, patDaten[i][2] with patDaten[j][2] (name is the third column in your layout).
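A rough sketch of the two-dimensional array variant could look like the following. ColumnDuplicates and duplicatesInColumn are hypothetical names, the sample rows are made up, and the code assumes every row has at least column + 1 fields.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnDuplicates {
    // Returns each value that occurs more than once in the given column,
    // reported once, in the order the duplicates are first detected.
    static List<String> duplicatesInColumn(String[][] rows, int column) {
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < rows.length - 1; i++) {
            for (int j = i + 1; j < rows.length; j++) {
                String value = rows[i][column];
                if (value.equals(rows[j][column]) && !duplicates.contains(value)) {
                    duplicates.add(value);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Made-up rows in the layout id,birthday,name,sex,first name.
        String[][] patDaten = {
                {"1", "1980-01-01", "Smith", "m", "John"},
                {"2", "1975-05-05", "Jones", "f", "Ann"},
                {"3", "1990-09-09", "Smith", "f", "Eve"}
        };
        System.out.println("Duplicate names: " + duplicatesInColumn(patDaten, 2));
    }
}
```

The double loop compares every pair of rows, so the work grows quadratically with the number of rows; for much larger files a Set or Map based approach scales better.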

Related

Modifying an individual element in an array inside of an ArrayList

I have to write a piece of code for a class that counts the occurrences of characters within an input file and then sorts them by that, and I chose to do that by creating an ArrayList where each object[] has two elements, the character and the number of occurrences.
I was trying to increment the integer representing the number of occurrences and I just couldn't get that to work
My current attempt looks like this:
for(int i=0;i<=text.length();i++) {
if(freqlist.contains(text.charAt(i))) {
freqlist.indexOf(text.charAt(i))[1]=freqlist.get(freqlist.indexOf(text.charAt(i)))[1]+1;
}
}
text is just a string containing all of the input file
freqlist is declared earlier as
List<Object[]> freqlist=new ArrayList<Object[]>();
So, I was wondering how one could increment or modify an element of an array that is inside of an arraylist
In general, there are three mistakes in your program that prevent it from working. First, the for loop uses i<=text.length() where it should use i < text.length(); otherwise charAt throws a StringIndexOutOfBoundsException on the last iteration. Second, freqlist.contains(...) is passed a character, but the list holds Object[] elements, so it never matches (and arrays are only equal to themselves anyway, not to other arrays with the same content). Third, freqlist.indexOf(...) relies on the same equality check and has the same problem. I made the example work, although the data structure List<Object[]> is inefficient for this task. It is best to use a Map<Character,Integer>.
Here it is:
import java.util.ArrayList;
import java.util.List;
class Scratch {
public static void main(String[] args) {
String text = "abcdacd";
List<Object[]> freqlist= new ArrayList<>();
for(int i=0;i < text.length();i++) {
Object [] objects = find(freqlist, text.charAt(i));
if(objects != null) {
objects[1] = (Integer)objects[1] +1;
} else {
freqlist.add(new Object[]{text.charAt(i), 1});
}
}
for (Object[] objects : freqlist) {
System.out.println(String.format(" %s => %d", objects[0], objects[1]));
}
}
private static Object[] find(List<Object[]> freqlist, Character charAt) {
for (Object[] objects : freqlist) {
if (charAt.equals(objects[0])) {
return objects;
}
}
return null;
}
}
The way I would do this is first parse the file and convert it to an array of characters. This would then be sent to the charCounter() method which would count the number of times a letter occurs in the file.
/**
* Calculate the number of times a character is present in a character array
*
* @param myChars An array of characters from an input file, this should be parsed and formatted properly
* before sending to method
* @return A hashmap of all characters with their number of occurrences; if a
* letter is not in myChars it is not added to the HashMap
*/
public HashMap<Character, Integer> charCounter(char[] myChars) {
HashMap<Character, Integer> myCharCount = new HashMap<>();
if (myChars.length == 0) System.exit(1);
for (char c : myChars) {
if (myCharCount.containsKey(c)) {
//get the current number for the letter
int currentNum = myCharCount.get(c);
//Place the new number plus one to the HashMap
myCharCount.put(c, (currentNum + 1));
} else {
//Place the character in the HashMap with 1 occurrence
myCharCount.put(c, 1);
}
}
return myCharCount;
}
You could use some Stream magic, if you are using Java 8 for the grouping:
Map<String, Long> map = dummyString.chars() // Turn the String into an IntStream
.mapToObj(c -> String.valueOf((char) c)) // Turn each int into a one-character String (compiles on Java 8)
.collect(Collectors.groupingBy(
s -> s, // Use the character as the key for the map
Collectors.counting())); // Count the occurrences
Now you could sort the result.
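Sorting the counts could be sketched roughly like this. CharFrequency and sortedCounts are hypothetical names, and the sketch uses Character keys rather than String keys, which is my own choice for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CharFrequency {
    // Counts every character of the text, then returns the entries sorted
    // by count, highest first.
    static List<Map.Entry<Character, Long>> sortedCounts(String text) {
        Map<Character, Long> counts = text.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(c -> c, Collectors.counting()));
        List<Map.Entry<Character, Long>> entries = new ArrayList<>(counts.entrySet());
        // Ties are broken by the character itself so the order is deterministic.
        entries.sort(Map.Entry.<Character, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<Character, Long>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        for (Map.Entry<Character, Long> e : sortedCounts("abcdacd")) {
            System.out.println(e.getKey() + " => " + e.getValue());
        }
    }
}
```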

I'm trying to iterate through two arrays in Java, while also checking to see if the values are equal

I am trying to iterate through many arrays, two at a time. They contain upwards of ten thousand entries each, including the source array. I am trying to assign each word in the source to one of noun, verb, adjective, or adverb.
I can't seem to figure a way to compare two arrays without writing an if else statement thousands of times.
I searched on Google and SO for similar issues. I couldn't find anything to move me forward.
package wcs;
import dictionaryReader.dicReader;
import sourceReader.sourceReader;
public class Assigner {
private static String source[], snArray[], svArray[], sadvArray[], sadjArray[];
private static String nArray[], vArray[], advArray[], adjArray[];
private static boolean finished = false;
public static void sourceAssign() {
sourceReader srcRead = new sourceReader();
//dicReader dic = new dicReader();
String[] nArray = dicReader.getnArray(), vArray = dicReader.getvArray(), advArray = dicReader.getAdvArray(),
adjArray = dicReader.getAdjArray();
String source[] = srcRead.getSource();
// Noun Store
for (int i = 0; i < source.length; i++) {
if (source[i] == dicReader.getnArray()[i]) {
source[i] = dicReader.getnArray()[i];
}else{
}
}
// Verb Store
// Adverb Store
// Adjective Store
}
}
Basically, this is a simpler way to get a list of the items that are in both lists:
// construct a list from the first array (replace new String[0] with your own array)
List<String> firstList = new ArrayList<>(Arrays.asList(new String[0]));
// retainAll keeps an item in `firstList` only if the value is also in the second list
firstList.retainAll(Arrays.asList(new String[0]));
// iterate to do your work
for(String val:firstList) {
}
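If the goal is to sort each source word into a category, a set lookup per dictionary may be easier to follow than index-by-index comparison. The sketch below is only a guess at that goal; WordSorter, wordsInDictionary, and the sample words are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WordSorter {
    // Returns the words of source that also appear in dictionary, in source order.
    static List<String> wordsInDictionary(String[] source, String[] dictionary) {
        Set<String> lookup = new HashSet<>(Arrays.asList(dictionary));
        List<String> matches = new ArrayList<>();
        for (String word : source) {
            // contains() compares with equals(), unlike == on String references
            if (lookup.contains(word)) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] source = {"the", "dog", "runs", "quickly"};
        String[] nouns = {"dog", "cat"};
        String[] verbs = {"runs", "eats"};
        System.out.println("Nouns: " + wordsInDictionary(source, nouns));
        System.out.println("Verbs: " + wordsInDictionary(source, verbs));
    }
}
```

Building the HashSet once per dictionary keeps each lookup cheap, which matters when both arrays hold thousands of entries.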

save multidimensional arrayList

I have a Multidimensional ArrayList, composed of multiple rows of different length. I would like to save the ArrayList as a single tab-delimited file with multiple columns, in which each column corresponds to a specific row of the ArrayList. I have tried to come up with a solution, but the only thing I could think of is to save each ArrayList row in separate files.
The ArrayList is called "array" and it contains several rows of different length. Here is the code:
try {
PrintWriter output = new PrintWriter(new FileWriter("try"));
output.print(array.get(0));
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
In this case, I can save the first row of the ArrayList as a single file. The other solution I have been thinking about is to loop through the rows, to get as many separate files as the row numbers. However, I would like to get a single file with multiple tab-delimited columns.
Excuse me, my old answer was based on a wrong interpretation of your needs.
I made some example code for you:
package com.test;
import java.util.ArrayList;
import java.util.List;
public class MulitArrayList {
public static void main(String[] args) {
List<String> innerArrayList1;
List<String> innerArrayList2;
List<String> innerArrayList3;
List<List> outerArrayList;
innerArrayList1 = new ArrayList<String>();
innerArrayList2 = new ArrayList<String>();
innerArrayList3 = new ArrayList<String>();
//comic heros
innerArrayList1.add("superman");
innerArrayList1.add("batman");
innerArrayList1.add("catwoman");
innerArrayList1.add("spiderman");
//historical persons
innerArrayList2.add("Stalin");
innerArrayList2.add("Gandy");
innerArrayList2.add("Lincoln");
innerArrayList2.add("Churchill");
//fast food
innerArrayList3.add("mc donalds");
innerArrayList3.add("burger king");
innerArrayList3.add("subway");
innerArrayList3.add("KFC");
//fill outerArrayList
outerArrayList = new ArrayList<List>();
outerArrayList.add(innerArrayList1);
outerArrayList.add(innerArrayList2);
outerArrayList.add(innerArrayList3);
//print
for(List<String> innerList : outerArrayList) {
for(String s : innerList) {
System.out.print(s + "\t");
}
System.out.println("\n");
}
}
}
In the result, the values are separated by tabs, but the visible space between them varies because the words have different lengths. You can write them to your file the same way, although this is not the best way to save data. I hope this helps a bit. Greetings.
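If each inner list should become a column rather than a line, the rows have to be transposed before writing. Here is a minimal sketch under that assumption; TabColumns, toTabDelimitedLines, and save are hypothetical names, and shorter lists simply leave empty cells.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TabColumns {
    // Turns each inner list into one column; shorter lists leave empty cells.
    static List<String> toTabDelimitedLines(List<List<String>> rows) {
        int longest = 0;
        for (List<String> row : rows) {
            longest = Math.max(longest, row.size());
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < longest; i++) {
            StringBuilder line = new StringBuilder();
            for (int col = 0; col < rows.size(); col++) {
                if (col > 0) {
                    line.append('\t');
                }
                List<String> row = rows.get(col);
                if (i < row.size()) {
                    line.append(row.get(i));
                }
            }
            lines.add(line.toString());
        }
        return lines;
    }

    // Writes all lines to a single file (UTF-8).
    static void save(List<List<String>> rows, Path target) throws IOException {
        Files.write(target, toTabDelimitedLines(rows));
    }

    public static void main(String[] args) {
        List<List<String>> array = Arrays.asList(
                Arrays.asList("superman", "batman", "catwoman"),
                Arrays.asList("Stalin", "Lincoln"),
                Arrays.asList("subway"));
        for (String line : toTabDelimitedLines(array)) {
            System.out.println(line);
        }
    }
}
```

Calling save(array, Paths.get("columns.tsv")) would write the same lines to a file; the file name is only a placeholder.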

Writing a method with ArrayList of strings as parameters

I am trying to write a method that takes an ArrayList of Strings as a parameter and that places a string of four asterisks in front of every string of length 4.
However, in my code, I am getting an error in the way I constructed my method.
Here is my mark length class
import java.util.ArrayList;
public class Marklength {
void marklength4(ArrayList <String> themarklength){
for(String n : themarklength){
if(n.length() ==4){
themarklength.add("****");
}
}
System.out.println(themarklength);
}
}
And the following is my main class:
import java.util.ArrayList;
public class MarklengthTestDrive {
public static void main(String[] args){
ArrayList <String> words = new ArrayList<String>();
words.add("Kane");
words.add("Cane");
words.add("Fame");
words.add("Dame");
words.add("Lame");
words.add("Same");
Marklength ish = new Marklength();
ish.marklength4(words);
}
}
Essentially, in this case it should produce an ArrayList with a string of "****" placed before every original element, because the lengths of the strings are all 4.
By the way, this means adding another element to the list, not changing the existing strings.
I am not sure where I went wrong. Possibly in my for loop?
I got the following error:
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at Marklength.marklength4(Marklength.java:7)
at MarklengthTestDrive.main(MarklengthTestDrive.java:18)
Thank you very much. Help is appreciated.
Let's think about this piece of code, and pretend like you don't get that exception:
import java.util.ArrayList;
public class Marklength {
void marklength4(ArrayList <String> themarklength){
for(String n : themarklength){
if(n.length() ==4){
themarklength.add("****");
}
}
System.out.println(themarklength);
}
}
OK, so what happens if your list contains just the single string "item"?
You hit the line if(n.length() ==4){, which is true because you are looking at "item", so you go execute its block.
Next you hit the line themarklength.add("****");. Your list now has the element **** at the end of it.
The loop continues, and you get the next item in the list, which happens to be the one you just added, ****.
The next line you hit is if(n.length() ==4){. This is true, so you execute its block.
You go to the line themarklength.add("****");, and add **** to the end of the list.
Do we see a bad pattern here? Yes, yes we do.
The Java runtime environment also knows that this is bad, which is why it prevents something called Concurrent Modification. In your case, this means you cannot modify a list while you are iterating over it, which is what that for loop does.
My best guess as to what you are trying to do is something like this:
import java.util.ArrayList;
public class Marklength {
ArrayList<String> marklength4(ArrayList <String> themarklength){
ArrayList<String> markedStrings = new ArrayList<String>(themarklength.size());
for(String n : themarklength){
if(n.length() ==4){
markedStrings.add("****");
}
markedStrings.add(n);
}
System.out.println(markedStrings);
return markedStrings;
}
}
And then:
import java.util.ArrayList;
public class MarklengthTestDrive {
public static void main(String[] args){
ArrayList <String> words = new ArrayList<String>();
words.add("Kane");
words.add("Cane");
words.add("Fame");
words.add("Dame");
words.add("Lame");
words.add("Same");
Marklength ish = new Marklength();
words = ish.marklength4(words);
}
}
This...
if(n.length() ==4){
themarklength.add("****");
}
Is simply trying to add "****" to the end of the list. This fails because the Iterator used by the for-each loop won't allow changes to the underlying List while it's being iterated.
You could create a copy of the List first...
List<String> values = new ArrayList<String>(themarklength);
Or convert it to an array of String
String[] values = themarklength.toArray(new String[themarklength.size()]);
And use these as your iteration points...
for (String value : values) {
Next, you need to be able to insert a new element into the ArrayList at a specific point. To do this, you will need to know the original index of the value you are working with...
if (value.length() == 4) {
int index = themarklength.indexOf(value);
And then add a new value at the required location...
themarklength.add(index, "****");
This will add the "****" at the index point, pushing all the other entries down
Updated
As has, correctly, been pointed out to me, the use of themarklength.indexOf(value) doesn't handle the case where the themarklength list contains two elements with the same value; indexOf always returns the first match, so the second insert would land at the wrong index.
I also wasn't focusing on performance as a major requirement when providing a possible solution.
Updated...
As pointed out by JohnGarnder and AnthonyAccioly, you could use a for-loop instead of a for-each, which would allow you to dispense with themarklength.indexOf(value)
This will remove the risk of duplicate values messing up the index location and improve the overall performance, as you don't need to create a second iterator...
// This assumes you're using the ArrayList as the copy...
for (int index = 0; index < themarklength.size(); index++) {
String value = themarklength.get(index);
if (value.length() == 4) {
themarklength.add(index, "****");
index++; // skip over the value that was just shifted one position to the right
}
}
But which you use is up to you...
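For reference, a self-contained version of the index-based loop might look like this. MarkLengthByIndex and markLength4 are names chosen for this sketch, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MarkLengthByIndex {
    // Inserts "****" in front of every string of length 4, modifying the list in place.
    static void markLength4(List<String> words) {
        for (int index = 0; index < words.size(); index++) {
            if (words.get(index).length() == 4) {
                words.add(index, "****");
                // The original word moved to index + 1; step past it so the
                // loop neither revisits it nor inspects the marker just added.
                index++;
            }
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("Kane", "Hi", "Same"));
        markLength4(words);
        System.out.println(words);
    }
}
```

Because the loop works with plain indexes, no Iterator is involved and no ConcurrentModificationException can occur.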
The problem is that your method doesn't modify each string in the ArrayList; it only adds four stars to the end of the list. If you want the stars to be part of each string, you need to replace the old string with the new one:
void marklength4(ArrayList<String> themarklength){
int index = 0;
for(String n : themarklength){
if(n.length() ==4){
n = "****" + n;
}
themarklength.set(index++, n);
}
System.out.println(themarklength);
}
If this is not what you want, and you instead want to add a new string "****" before each element in the ArrayList, then you can use the listIterator method of ArrayList to add the additional element before each string whose length is 4.
ListIterator<String> it = themarklength.listIterator();
while(it.hasNext()) {
String name = it.next();
if(name.length() == 4) {
it.previous();
it.add("****");
it.next();
}
}
The difference is: ListIterator allows you to modify the list when iterating through it and also allows you to go backward in the list.
I would use a ListIterator instead of a for-each; ListIterator.add likely does exactly what you want.
public void marklength4(List<String> themarklength){
final ListIterator<String> lit =
themarklength.listIterator(themarklength.size());
boolean shouldInsert = false;
while(lit.hasPrevious()) {
if (shouldInsert) {
lit.add("****");
lit.previous();
shouldInsert = false;
}
final String n = lit.previous();
shouldInsert = (n.length() == 4);
}
if (shouldInsert) {
lit.add("****");
}
}
Oh I remember this lovely error from the good old days. The problem is that your ArrayList isn't completely populated by the time the array element is to be accessed. Think of it, you create the object and then immediately start looping it. The object hence, has to populate itself with the values as the loop is going to be running.
The simple way to solve this is to pre-populate your ArrayList.
public class MarklengthTestDrive {
public static void main(String[] args){
ArrayList <String> words = new ArrayList<String>() {{
// inside the initializer block, call add(...) directly; `words` is not assigned yet
add("Kane");
add("Cane");
add("Fame");
add("Dame");
add("Lame");
add("Same");
}};
}
}
Do tell me if that fixes it. You can also use a static initializer.
Make a temporary ArrayList, build the result in it, and copy its content back to the original list at the end:
import java.util.ArrayList;
public class MarkLength {
void marklength4(ArrayList <String> themarklength){
ArrayList<String> temp = new ArrayList<String>();
for(String n : themarklength){
if(n.length() ==4){
temp.add("****"); // the marker goes in front of the 4-letter word
}
temp.add(n); // always keep the original element
}
themarklength.clear();
themarklength.addAll(temp);
System.out.println(themarklength);
}
}

How to find all error messages and display them in descending order

Hi, I am trying to sort the error messages in a user-supplied input file in descending order of occurrence.
input_file.txt
23545 debug code_to_debug
43535 error check your code
34243 error check values
32442 run program execute
24525 error check your code
I want to get output as
error check your code
error check values
My code currently:
import java.io.*;
import java.util.*;
public class Sort {
public static void main(String[] args) throws Exception {
BufferedReader reader = new BufferedReader(new FileReader("fileToRead"));
Map<String, String> map=new TreeMap<String, String>();
String line="";
while((line=reader.readLine())!=null){
map.put(getField(line),line);
}
reader.close();
FileWriter writer = new FileWriter("fileToWrite");
for(String val : map.values()){
writer.write(val);
writer.write('\n');
}
writer.close();
}
private static String getField(String line) {
return line.split(" ")[0];//extract value you want to sort on
}
}
Change your mapping from <String, String> to <Integer, String>. Then, use a custom Comparator to compare the Integers from least to greatest.
It appears that your error messages are ranked by an integer value from most severe to least severe. This should allow you to use that fact.
Rather than having a Map<String,String> where the key is the integer value, you could use the error message as the key and let the value hold a list of the integer values. You can also pass a comparator to the map to order the keys. Reading the file would then become something like:
Map<String, List<String>> map = new TreeMap<String, List<String>>(new Comparator<String>()
{
@Override
public int compare(String s1, String s2)
{
// Implement a compare to get the order of strings you want;
// natural ordering is used here only as a placeholder
return s1.compareTo(s2);
}
}
);
String line = "";
while((line = reader.readLine()) != null)
{
String lineStr = line.split(" ", 2)[1]; // get the message (everything after the number)
List<String> vals = map.get(lineStr); // get the existing list
if( vals == null)
vals = new ArrayList<String>(); // create a new list if there isn't one
vals.add(getField(line)); // add the int value to the list
map.put(lineStr,vals); // add to map
}
You could then sort the list into numeric order if you wanted. Also this would then require a bit more work to print out the map - but this depends on the format
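Going one step further, ranking the messages by how often they occur could be sketched as follows. ErrorRanking and errorsByFrequency are hypothetical names, and the sketch assumes each line is a number, a space, and then the message, with error messages starting with the word "error".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ErrorRanking {
    // Returns the distinct error messages, most frequent first.
    static List<String> errorsByFrequency(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ", 2); // [number, rest of the line]
            if (parts.length == 2 && parts[1].startsWith("error")) {
                counts.merge(parts[1], 1, Integer::sum);
            }
        }
        List<String> messages = new ArrayList<>(counts.keySet());
        // List.sort is stable, so messages with equal counts keep first-seen order.
        messages.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return messages;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "23545 debug code_to_debug",
                "43535 error check your code",
                "34243 error check values",
                "32442 run program execute",
                "24525 error check your code");
        for (String message : errorsByFrequency(lines)) {
            System.out.println(message);
        }
    }
}
```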
If all you want to do is reorder the input so all the error messages appear at the top, a very simple way to do it is like the following:
static String[] errorsToTop(String[] input) {
String[] output = new String[input.length];
int i = 0;
for(String line : input) {
if(line.contains("error"))
output[i++] = line;
}
for(String line : input) {
if(!line.contains("error"))
output[i++] = line;
}
return output;
}
That just copies the array, starting with all the error messages and then continuing with all the non-error messages.
It's also possible to fold those two loops into one nested loop, though the logic is less obvious.
static String[] errorsToTop(String[] input) {
String[] output = new String[input.length];
int i = 0;
boolean not = false;
do {
for(String line : input) {
if(line.contains("error") ^ not)
output[i++] = line;
}
} while(not = !not);
return output;
}
It's unclear to me whether the numbers appear in your input text file or not. If they don't, you can use startsWith instead of contains:
if(line.startsWith("error"))
You could also use matches with a regex like:
if(line.matches("^\\d+ error[\\s\\S]*"))
which says "starts with any integer followed by a space followed by error followed by anything or nothing".
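As a quick check of that pattern, here is a small sketch; ErrorLineMatcher and isErrorLine are made-up names for the illustration.

```java
class ErrorLineMatcher {
    // True when the line is digits, one space, the word "error", then anything.
    static boolean isErrorLine(String line) {
        return line.matches("^\\d+ error[\\s\\S]*");
    }

    public static void main(String[] args) {
        System.out.println(isErrorLine("43535 error check your code"));
        System.out.println(isErrorLine("23545 debug code_to_debug"));
    }
}
```

Note that String.matches has to match the whole line, which is why the pattern ends with [\\s\\S]* to allow any trailing text.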
Since no answer has been marked I'll add 2 cents.
This code below works for exactly what you posted (and maybe nothing else), it assumes that errors have higher numbers than non errors, and that you are grabbing top N of lines based on a time slice or something.
import java.util.NavigableMap;
import java.util.TreeMap;
public class SortDesc {
public static void main(String[] args) {
NavigableMap<Integer, String> descendingMap = new TreeMap<Integer, String>().descendingMap();
descendingMap.put(23545, "debug code_to_debug");
descendingMap.put(43535, "error check your code");
descendingMap.put(34243, "error check values");
descendingMap.put(32442, "run program execute");
descendingMap.put(24525, "error check your code");
System.out.println(descendingMap);
}
}
results look like this
{43535=error check your code, 34243=error check values, 32442=run program execute, 24525=error check your code, 23545=debug code_to_debug}
