Read CSV records while ignoring trailing spaces - java

I'm iterating through a CSV file and pulling out data by the headers, but I want to allow for trailing spaces and still recognize the header.
For example, a header row of "Header1 ,Header2,Header3" (note the trailing space) should still match a lookup for "Header1".
My code...
final Reader in = new BufferedReader(new InputStreamReader(csv));
for (CSVRecord record : CSVFormat.EXCEL.withHeader().parse(in)) {
    try {
        final MyObject mo = new MyObject();
        mo.setHeader1(record.get("Header1"));
        mo.setHeader2(record.get("Header2"));
        mo.setHeader3(record.get("Header3"));
        ....
    } catch (Exception e) {
        ....
    }
}
But this of course will only find Header1 if it matches exactly (no trailing spaces).
I couldn't find any method like record.getIgnoreSpace() or anything similar.

If you store the CSVParser object constructed using CSVFormat.EXCEL.withHeader().parse(in) into a variable, then you can use the method getHeaderMap() to find the indices of the desired headers. These indices can then be used instead of the header names to look up the fields (which is actually also a more efficient way to perform the lookups).
One way to do it is like this:
CSVParser parser = CSVFormat.EXCEL.withHeader().parse(in);
Map<String, Integer> headerMap = parser.getHeaderMap();
int header1Index = -1;
int header2Index = -1;
for (Map.Entry<String, Integer> entry : headerMap.entrySet()) {
    String name = entry.getKey();
    int index = entry.getValue();
    switch (name.trim()) {
        case "Header1":
            header1Index = index;
            break;
        case "Header2":
            header2Index = index;
            break;
    }
}
for (CSVRecord record : parser) {
    ...
    mo.setHeader1(record.get(header1Index));
    ...
}

The function below can serve as the missing record.getIgnoreSpace():
getRecordTrimmedLookup(headerMap, csvRecord, "Header1");
try (Reader reader = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile)));
     CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT
         .withIgnoreEmptyLines()           // not mandatory
         .withIgnoreHeaderCase()           // not mandatory
         .withFirstRecordAsHeader()        // not mandatory
         .withIgnoreSurroundingSpaces()))  // not mandatory
{
    Map<String, Integer> headerMap = csvParser.getHeaderMap();
    for (CSVRecord csvRecord : csvParser) {
        System.out.println(getRecordTrimmedLookup(headerMap, csvRecord, "Header1"));
    }
}
getRecordTrimmedLookup can be defined as:
private String getRecordTrimmedLookup(Map<String, Integer> headerMap, CSVRecord csvRecord, String columnName) {
    for (Map.Entry<String, Integer> entry : headerMap.entrySet()) {
        String name = entry.getKey();
        int index = entry.getValue();
        if (StringUtils.equalsIgnoreCase(StringUtils.trimToEmpty(name), StringUtils.trimToEmpty(columnName))) {
            return csvRecord.get(index);
        }
    }
    return csvRecord.get(columnName);
}
Note: StringUtils comes from the org.apache.commons:commons-lang3 library.
Hope this answer helps someone!

I managed to ignore spaces in the header names (including embedded ones) by using get(index) instead of get("header_name"), and also to stop reading the CSV when a blank value/row is detected:
CSVParser csvParser = CSVFormat.EXCEL.withFirstRecordAsHeader().parse(br);
for (CSVRecord record : csvParser) {
    String number = record.get(0);
    String date = record.get("date");
    String location = record.get("Location");
    String lsFile = record.get(3);
    String docName = record.get(4);
    if (StringUtils.isEmpty(lsFile)) {
        break;
    }
}

Related

Printing matching information from 2 files in Java

I am trying to write a program that checks two files and prints the common contents from both the files.
Example of the file 1 content would be:
James 1
Cody 2
John 3
Example of the file 2 content would be:
1 Computer Science
2 Chemistry
3 Physics
So the final output printed on the console would be:
James Computer Science
Cody Chemistry
John Physics
Here is what I have so far in my code:
public class Filereader {
    public static void main(String[] args) throws Exception {
        File file = new File("file.txt");
        File file2 = new File("file2.txt");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        BufferedReader reader2 = new BufferedReader(new FileReader(file2));
        String st, st2;
        while ((st = reader.readLine()) != null) {
            System.out.println(st);
        }
        while ((st2 = reader2.readLine()) != null) {
            System.out.println(st2);
        }
        reader.close();
        reader2.close();
    }
}
I am having trouble figuring out how to match the file contents and print only the student name and their major by matching the student id in each file. Thanks for all the help.
You can build on the other answers and create an object for each file, like tables in a database.
public class Person {
    Long id;
    String name;
    // getters and setters
}

public class Course {
    Long id;
    String name;
    // getters and setters
}
Then you have more control over your columns, and it is simpler to use.
Furthermore, you can use an ArrayList<Person> and an ArrayList<Course>, and the relation can be a field inside your objects, like courseId in the Person class or something else:
if (person.getCourseId().equals(course.getId())) {
    ...
}
Then, if the match is on the first number in the files, use person.getId().equals(course.getId()).
PS: Be careful with split(" ") in your case, because some values contain a space themselves, e.g. "1 Computer Science".
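A minimal sketch of the join under this design (assuming the lists were already filled from the two files; the variable names are illustrative):
// Illustrative names; assumes Person and Course are defined as above
List<Person> persons = new ArrayList<>();  // fill from file 1
List<Course> courses = new ArrayList<>();  // fill from file 2
for (Person person : persons) {
    for (Course course : courses) {
        // compare ids with equals(), not ==, since Long is an object type
        if (person.getId().equals(course.getId())) {
            System.out.println(person.getName() + " " + course.getName());
        }
    }
}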
What you want is to organize your text file data into maps, then merge their data. This will work even if your data are mixed, not in order.
public class Filereader {
    public static void main(String[] args) throws Exception {
        File file = new File("file.txt");
        File file2 = new File("file2.txt");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        BufferedReader reader2 = new BufferedReader(new FileReader(file2));
        String st, st2;
        Map<Integer, String> nameMap = new LinkedHashMap<>();
        Map<Integer, String> majorMap = new LinkedHashMap<>();
        while ((st = reader.readLine()) != null) {
            System.out.println(st);
            String[] parts = st.split(" "); // here you get ["James", "1"]
            String name = parts[0];
            Integer id = Integer.parseInt(parts[1]);
            nameMap.put(id, name);
        }
        while ((st2 = reader2.readLine()) != null) {
            System.out.println(st2);
            String[] parts = st2.split(" ", 2); // limit 2 keeps "Computer Science" in one piece
            String name = parts[1];
            Integer id = Integer.parseInt(parts[0]);
            majorMap.put(id, name);
        }
        reader.close();
        reader2.close();
        // Combine and print
        nameMap.keySet().stream().forEach(id -> {
            System.out.println(nameMap.get(id) + " " + majorMap.get(id));
        });
    }
}
You should read these files at the same time in sequence. This is easy to accomplish with a single while statement.
while ((st = reader.readLine()) != null && (st2 = reader2.readLine()) != null) {
    // print both st and st2
}
The way your code is written now, it reads one file at a time and prints each file's data to the console separately. If you want to meld the results together, you have to combine the output of the files in a single loop.
Given that one file may have extra entries, or that the ids may come in nonsequential order, you may want to store the results in a data structure such as a List instead, since the id tells you the specific index where each value should fit.
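A rough sketch of that List-based idea, reusing reader and reader2 from the question (assumptions: ids are small non-negative integers, the fixed capacity of 100 is arbitrary, and java.util.Collections is imported):
List<String> names = new ArrayList<>(Collections.nCopies(100, null));
List<String> majors = new ArrayList<>(Collections.nCopies(100, null));
String line;
while ((line = reader.readLine()) != null) {
    String[] parts = line.split(" ", 2);             // e.g. ["James", "1"]
    names.set(Integer.parseInt(parts[1]), parts[0]); // the id decides the slot
}
while ((line = reader2.readLine()) != null) {
    String[] parts = line.split(" ", 2);             // e.g. ["1", "Computer Science"]
    majors.set(Integer.parseInt(parts[0]), parts[1]);
}
for (int i = 0; i < names.size(); i++) {
    if (names.get(i) != null && majors.get(i) != null) {
        System.out.println(names.get(i) + " " + majors.get(i));
    }
}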
Combining the NIO Files and Stream API, it's a little simpler:
public static void main(String[] args) throws Exception {
    Map<String, List<String[]>> f1 = Files
            .lines(Paths.get("file1"))
            .map(line -> line.split(" "))
            .collect(Collectors.groupingBy(arr -> arr[1]));
    Map<String, List<String[]>> f2 = Files
            .lines(Paths.get("file2"))
            .map(line -> line.split(" ", 2)) // limit 2 keeps multi-word majors intact
            .collect(Collectors.groupingBy(arr -> arr[0]));
    Stream.concat(f1.keySet().stream(), f2.keySet().stream())
            .distinct()
            .map(key -> f1.get(key).get(0)[0] + " " + f2.get(key).get(0)[1])
            .forEach(System.out::println);
}
As can easily be noticed in the code, there are assumptions of valid data and of consistency between the two files. If this doesn't hold, you may need to first run a filter to exclude entries missing in either file:
Stream.concat(f1.keySet().stream(), f2.keySet().stream())
        .filter(key -> f1.containsKey(key) && f2.containsKey(key))
        .distinct()
        ...
If you change the order such that the number comes first in both files, you can read both files into a HashMap, then create a Set of common keys. Then loop through the set of common keys and grab the associated value from each HashMap to print.
My solution is verbose, but I wrote it that way so that you can see exactly what's happening.
import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;
import java.io.File;
import java.util.Scanner;

class J {
    public static Map<String, String> fileToMap(File file) throws Exception {
        // TODO - Make sure the file exists before opening it
        // Scans the input file
        Scanner scanner = new Scanner(file);
        // Create the map
        Map<String, String> map = new HashMap<>();
        String line;
        String name;
        String code;
        String[] parts;
        // Scan line by line
        while (scanner.hasNextLine()) {
            // Get next line
            line = scanner.nextLine();
            // TODO - Make sure the string has at least 1 space
            // Split at the first space only, so multi-word values
            // like "Computer Science" stay in one piece
            parts = line.split(" ", 2);
            // Get the class code and string val
            code = parts[0];
            name = parts[1];
            // Insert into map
            map.put(code, name);
        }
        // Close input stream
        scanner.close();
        // Give the map back
        return map;
    }

    public static Set<String> commonKeys(Map<String, String> nameMap,
                                         Map<String, String> classMap) {
        Set<String> commonSet = new HashSet<>();
        // Get a set of keys for both maps
        Set<String> nameSet = nameMap.keySet();
        Set<String> classSet = classMap.keySet();
        // Loop through one set
        for (String key : nameSet) {
            // Make sure the other set has it
            if (classSet.contains(key)) {
                commonSet.add(key);
            }
        }
        return commonSet;
    }

    public static Map<String, String> joinByKey(Map<String, String> namesMap,
                                                Map<String, String> classMap,
                                                Set<String> commonKeys) {
        Map<String, String> map = new HashMap<String, String>();
        // Loop through common keys
        for (String key : commonKeys) {
            // TODO - check for nulls if get() returns nothing
            // Fetch the associated value from each map
            map.put(namesMap.get(key), classMap.get(key));
        }
        return map;
    }

    public static void main(String[] args) throws Exception {
        // Surround in try catch
        File names = new File("names.txt");
        File classes = new File("classes.txt");
        Map<String, String> nameMap = fileToMap(names);
        Map<String, String> classMap = fileToMap(classes);
        Set<String> commonKeys = commonKeys(nameMap, classMap);
        Map<String, String> nameToClass = joinByKey(nameMap, classMap, commonKeys);
        System.out.println(nameToClass);
    }
}
names.txt
1 James
2 Cody
3 John
5 Max
classes.txt
1 Computer Science
2 Chemistry
3 Physics
4 Biology
Output:
{Cody=Chemistry, James=Computer Science, John=Physics}
Notes:
I added keys in classes.txt and names.txt that purposely did not match, so you can see that they do not come up in the output. That is because those keys never make it into the commonKeys set, so they never get inserted into the joined map.
You can loop through the joined map if you want by calling map.entrySet().
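For instance, using the nameToClass map from the example above:
for (Map.Entry<String, String> entry : nameToClass.entrySet()) {
    System.out.println(entry.getKey() + " " + entry.getValue());
}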

Java how to remove duplicates from ArrayList

I have a CSV file which contains rules and ruleversions. The CSV file looks like this:
CSV FILE:
#RULENAME, RULEVERSION
RULE,01-02-01
RULE,01-02-02
RULE,01-02-34
OTHER_RULE,01-02-04
THIRDRULE, 01-02-04
THIRDRULE, 01-02-04
As you can see, 1 rule can have 1 or more rule versions. What I need to do is read this CSV file and put them in an array. I am currently doing that with the following script:
private static List<String[]> getRulesFromFile() {
    String csvFile = "rulesets.csv";
    BufferedReader br = null;
    String line = "";
    String delimiter = ",";
    List<String[]> input = new ArrayList<String[]>();
    try {
        br = new BufferedReader(new FileReader(csvFile));
        while ((line = br.readLine()) != null) {
            if (!line.startsWith("#")) {
                String[] rulesetEntry = line.split(delimiter);
                input.add(rulesetEntry);
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (br != null) {
            try {
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return input;
}
But I need to adapt the script so that it saves the information in the following format:
ARRAY (
=> RULE => 01-02-01, 01-02-02, 01-02-04
=> OTHER_RULE => 01-02-34
=> THIRDRULE => 01-02-01, 01-02-02
)
What is the best way to do this? Multidimensional array? And how do I make sure it doesn't save the rulename more than once?
You should use a different data structure, for example a HashMap, like this:
HashMap<String, List<String>> myMap = new HashMap<>();
try {
    br = new BufferedReader(new FileReader(csvFile));
    while ((line = br.readLine()) != null) {
        if (!line.startsWith("#")) {
            String[] parts = line.split(delimiter);
            String key = parts[0];
            String value = parts[1];
            if (myMap.containsKey(key)) {
                myMap.get(key).add(value);
            } else {
                List<String> values = new ArrayList<String>();
                values.add(value);
                myMap.put(key, values);
            }
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
This should work!
An ArrayList is not a good data structure of choice here.
I would personally suggest a HashMap<String, List<String>> for this particular purpose.
The rules will be your keys and the rule versions will be your values, each value being a list of strings.
While traversing your original file, just check whether the rule (key) is already present: if so, add the value to the list of rule versions (values) already there; otherwise add a new key and add the value to it.
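A compact sketch of that traversal (assuming Java 8+ so computeIfAbsent is available; rulesets.csv and the # comment convention are taken from the question):
Map<String, List<String>> rules = new HashMap<>();
try (BufferedReader br = new BufferedReader(new FileReader("rulesets.csv"))) {
    String line;
    while ((line = br.readLine()) != null) {
        if (!line.startsWith("#")) {
            String[] parts = line.split(",");
            // create the version list on the first occurrence of a rule, then append
            rules.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                 .add(parts[1].trim());
        }
    }
}
(IOException handling is left to the caller.)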
Separately, if you just need to strip duplicates out of a plain list, a helper like this works:
public List<String> removeDuplicates(List<String> myList) {
    Hashtable<String, String> hashtable = new Hashtable<String, String>();
    for (String s : myList) {
        hashtable.put(s, s);
    }
    return new ArrayList<String>(hashtable.values());
}
This is exactly what key-value pairs can be used for. Just take a look at the Map interface. There you can define a unique key containing various elements as its value, perfect for your issue.
Code:
// This collection will take String type as a key
// and prevent duplicates in its associated values
Map<String, HashSet<String>> map = new HashMap<String, HashSet<String>>();

// Create the value set the first time the key is seen
// !REPLACE! -> "rule" with the key you want to enter into your collection
// !REPLACE! -> "whatever" with the value you want to associate with the key
if (!map.containsKey("rule")) {
    map.put("rule", new HashSet<String>());
}
map.get("rule").add("whatever");
Reference:
Set
Map

Java - Write hashmap to a csv file

I have a hashmap with a String key and String value. It contains a large number of keys and their respective values.
For example:
key | value
abc | aabbcc
def | ddeeff
I would like to write this hashmap to a csv file such that my csv file contains rows as below:
abc,aabbcc
def,ddeeff
I tried the following example here using the supercsv library: http://javafascination.blogspot.com/2009/07/csv-write-using-java.html. However, in this example, you have to create a hashmap for each row that you want to add to your csv file. I have a large number of key value pairs which means that several hashmaps, with each containing data for one row need to be created. I would like to know if there is a more optimized approach that can be used for this use case.
Using the Jackson API (jackson-dataformat-csv), a Map or a List of Maps can be written to a CSV file:
/**
 * @param listOfMap
 * @param writer
 * @throws IOException
 */
public static void csvWriter(List<HashMap<String, String>> listOfMap, Writer writer) throws IOException {
    CsvSchema schema = null;
    CsvSchema.Builder schemaBuilder = CsvSchema.builder();
    if (listOfMap != null && !listOfMap.isEmpty()) {
        for (String col : listOfMap.get(0).keySet()) {
            schemaBuilder.addColumn(col);
        }
        schema = schemaBuilder.build().withLineSeparator(System.lineSeparator()).withHeader();
    }
    CsvMapper mapper = new CsvMapper();
    mapper.writer(schema).writeValues(writer).writeAll(listOfMap);
    writer.flush();
}
Something like this should do the trick:
String eol = System.getProperty("line.separator");
try (Writer writer = new FileWriter("somefile.csv")) {
    for (Map.Entry<String, String> entry : myHashMap.entrySet()) {
        writer.append(entry.getKey())
              .append(',')
              .append(entry.getValue())
              .append(eol);
    }
} catch (IOException ex) {
    ex.printStackTrace(System.err);
}
As your question is asking how to do this using Super CSV, I thought I'd chime in (as a maintainer of the project).
I initially thought you could just iterate over the map's entry set using CsvBeanWriter and a name mapping array of "key", "value", but this doesn't work because HashMap's internal implementation doesn't allow reflection to get the key/value.
So your only option is to use CsvListWriter as follows. At least this way you don't have to worry about escaping CSV (every other example here just joins with commas...aaarrggh!):
@Test
public void writeHashMapToCsv() throws Exception {
    Map<String, String> map = new HashMap<>();
    map.put("abc", "aabbcc");
    map.put("def", "ddeeff");
    StringWriter output = new StringWriter();
    try (ICsvListWriter listWriter = new CsvListWriter(output,
            CsvPreference.STANDARD_PREFERENCE)) {
        for (Map.Entry<String, String> entry : map.entrySet()) {
            listWriter.write(entry.getKey(), entry.getValue());
        }
    }
    System.out.println(output);
}
Output:
abc,aabbcc
def,ddeeff
Map<String, String> csvMap = new TreeMap<>();
csvMap.put("Hotel Name", hotelDetails.getHotelName());
csvMap.put("Hotel Classification", hotelDetails.getClassOfHotel());
csvMap.put("Number of Rooms", hotelDetails.getNumberOfRooms());
csvMap.put("Hotel Address", hotelDetails.getAddress());

try {
    // specified by filepath
    File file = new File(fileLocation + hotelDetails.getHotelName() + ".csv");
    // create FileWriter object with file as parameter
    FileWriter outputfile = new FileWriter(file);
    String[] header = csvMap.keySet().toArray(new String[csvMap.size()]);
    String[] dataSet = csvMap.values().toArray(new String[csvMap.size()]);
    // create CSVWriter object with the filewriter object as parameter
    CSVWriter writer = new CSVWriter(outputfile);
    // adding data to csv
    writer.writeNext(header);
    writer.writeNext(dataSet);
    // closing writer connection
    writer.close();
} catch (IOException e) {
    e.printStackTrace();
}
If you have a single hashmap it is just a few lines of code. Something like this:
Map<String, String> myMap = new HashMap<>();
myMap.put("foo", "bar");
myMap.put("baz", "foobar");

StringBuilder builder = new StringBuilder();
for (Map.Entry<String, String> kvp : myMap.entrySet()) {
    builder.append(kvp.getKey());
    builder.append(",");
    builder.append(kvp.getValue());
    builder.append("\r\n");
}

String content = builder.toString().trim();
System.out.println(content);
// use your preferred method to write content to a file,
// e.g. Apache FileUtils.writeStringToFile(...), instead of the println
result would be
foo,bar
baz,foobar
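To actually write content out, one option (assuming Apache Commons IO is on the classpath; the file name is an example) is:
// hypothetical file name; requires org.apache.commons.io.FileUtils
FileUtils.writeStringToFile(new File("map.csv"), content, StandardCharsets.UTF_8);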
My Java is a little limited but couldn't you just loop over the HashMap and add each entry to a string?
// m = your HashMap
StringBuilder builder = new StringBuilder();
for (Entry<String, String> e : m.entrySet()) {
    String key = e.getKey();
    String value = e.getValue();
    builder.append(key);
    builder.append(',');
    builder.append(value);
    builder.append(System.getProperty("line.separator"));
}
String result = builder.toString();

Writing back to a csv with Java using super csv

I've been working on this code for quite some time and just want the simple heads-up if I'm heading down a dead end. The point I'm at now is matching identical cells from different .csv files and copying one row into another CSV file. The question really is: would it be possible to write at specific lines? For example, if the two cells match at row 50, I wish to write back onto row 50. I'm assuming I would maybe extract everything to a HashMap, write it in there, then write back to the .csv file? Is there an easier way?
For example, I have one CSV that has person details, and the other has property details of where the actual person lives. I wish to copy the property details to the person CSV, as well as match them up with the correct person details. Hope this makes sense.
public class Old {
    public static void main(String[] args) throws IOException {
        List<String[]> cols;
        List<String[]> cols1;
        int row = 0;
        int count = 0;
        boolean b;
        CsvMapReader Reader = new CsvMapReader(new FileReader("file1.csv"), CsvPreference.EXCEL_PREFERENCE);
        CsvMapReader Reader2 = new CsvMapReader(new FileReader("file2.csv"), CsvPreference.EXCEL_PREFERENCE);
        try {
            cols = readFile("file1.csv");
            cols1 = readFile("file2.csv");
            String[] headers = Reader.getCSVHeader(true);
            headers = header(cols1, headers);
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }
        for (int j = 1; j < cols.size(); j++) { // 1
            for (int i = 1; i < cols1.size(); i++) {
                if (cols.get(j)[0].equals(cols1.get(i)[0])) {
                }
            }
        }
    }
    private static List<String[]> readFile(String fileName) throws IOException {
        List<String[]> values = new ArrayList<String[]>();
        Scanner s = new Scanner(new File(fileName));
        while (s.hasNextLine()) {
            String line = s.nextLine();
            values.add(line.split(","));
        }
        return values;
    }
    public static void csvWriter(String fileName, String[] nameMapping) throws FileNotFoundException {
        ICsvListWriter writer = new CsvListWriter(new PrintWriter(fileName), CsvPreference.STANDARD_PREFERENCE);
        try {
            writer.writeHeader(nameMapping);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    public static String[] header(List<String[]> cols1, String[] headers) {
        List<String> list = new ArrayList<String>();
        String[] add;
        int count = 0;
        for (int i = 0; i < headers.length; i++) {
            list.add(headers[i]);
        }
        boolean c;
        c = true;
        while (c) {
            add = cols1.get(0);
            list.add(add[count]);
            if (cols1.get(0)[count].equals(null)) { // this line is never reached (error)
                c = false;
                break;
            } else
                count++;
        }
        String[] array = new String[list.size()];
        list.toArray(array);
        return array;
    }
}
Just be careful if you read all of the addresses and person details into memory first (as Thomas has suggested) - if you're only dealing with small CSV files then it's fine, but you may run out of memory if you're dealing with larger files.
As an alternative, I've put together an example that reads the addresses in first, then writes the combined person/address details while it reads in the person details.
Just a few things to note:
I've used CsvMapReader and CsvMapWriter because you were - this meant I've had to use a Map containing a Map for storing the addresses. Using CsvBeanReader/CsvBeanWriter would make this a bit more elegant.
The code from your question doesn't actually use Super CSV to read the CSV (you're using Scanner and String.split()). You'll run into issues if your CSV contains commas in the data (which is quite possible with addresses), so it's a lot safer to use Super CSV, which will handle escaped commas for you.
Example:
package example;

import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import org.supercsv.io.CsvMapReader;
import org.supercsv.io.CsvMapWriter;
import org.supercsv.io.ICsvMapReader;
import org.supercsv.io.ICsvMapWriter;
import org.supercsv.prefs.CsvPreference;

public class CombiningPersonAndAddress {

    private static final String PERSON_CSV = "id,firstName,lastName\n"
            + "1,philip,fry\n2,amy,wong\n3,hubert,farnsworth";

    private static final String ADDRESS_CSV = "personId,address,country\n"
            + "1,address 1,USA\n2,address 2,UK\n3,address 3,AUS";

    private static final String[] COMBINED_HEADER = new String[] { "id",
            "firstName", "lastName", "address", "country" };

    public static void main(String[] args) throws Exception {
        ICsvMapReader personReader = null;
        ICsvMapReader addressReader = null;
        ICsvMapWriter combinedWriter = null;
        final StringWriter output = new StringWriter();
        try {
            // set up the readers/writer
            personReader = new CsvMapReader(new StringReader(PERSON_CSV),
                    CsvPreference.STANDARD_PREFERENCE);
            addressReader = new CsvMapReader(new StringReader(ADDRESS_CSV),
                    CsvPreference.STANDARD_PREFERENCE);
            combinedWriter = new CsvMapWriter(output,
                    CsvPreference.STANDARD_PREFERENCE);
            // map of personId -> address (inner map is address details)
            final Map<String, Map<String, String>> addresses =
                    new HashMap<String, Map<String, String>>();
            // read in all of the addresses
            Map<String, String> address;
            final String[] addressHeader = addressReader.getCSVHeader(true);
            while ((address = addressReader.read(addressHeader)) != null) {
                final String personId = address.get("personId");
                addresses.put(personId, address);
            }
            // write the header
            combinedWriter.writeHeader(COMBINED_HEADER);
            // read each person
            Map<String, String> person;
            final String[] personHeader = personReader.getCSVHeader(true);
            while ((person = personReader.read(personHeader)) != null) {
                // copy address details to person if they exist
                final String personId = person.get("id");
                final Map<String, String> personAddress = addresses.get(personId);
                if (personAddress != null) {
                    person.putAll(personAddress);
                }
                // write the combined details
                combinedWriter.write(person, COMBINED_HEADER);
            }
        } finally {
            personReader.close();
            addressReader.close();
            combinedWriter.close();
        }
        // print the output
        System.out.println(output);
    }
}
Output:
id,firstName,lastName,address,country
1,philip,fry,address 1,USA
2,amy,wong,address 2,UK
3,hubert,farnsworth,address 3,AUS
From your comment, it seems like you have the following situation:
File 1 contains persons
File 2 contains addresses
You then want to match persons and addresses by some key (one or more fields) and write the combination back to a CSV file.
Thus the simplest approach might be something like this:
// use a LinkedHashMap to preserve the order of the persons as found in file 1
Map<PersonKey, String[]> persons = new LinkedHashMap<>();
// fill in the persons from file 1 here

Map<PersonKey, String[]> addresses = new HashMap<>();
// fill in the addresses from file 2 here

List<String[]> outputLines = new ArrayList<>(persons.size());
for (Map.Entry<PersonKey, String[]> personEntry : persons.entrySet()) {
    String[] person = personEntry.getValue();
    String[] address = addresses.get(personEntry.getKey());
    // merge the two arrays and put them into outputLines
}
// write outputLines to a file
Note that PersonKey might just be a String or a wrapper object (Integer etc.) if you can match persons and addresses by one field. If you have more fields, you might need a custom PersonKey object with equals() and hashCode() properly overridden.
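A sketch of such a key class (the field names are hypothetical; the essential part is the equals()/hashCode() pair):
public final class PersonKey {
    private final String lastName;  // example field
    private final String postcode;  // example field

    public PersonKey(String lastName, String postcode) {
        this.lastName = lastName;
        this.postcode = postcode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PersonKey)) return false;
        PersonKey other = (PersonKey) o;
        return lastName.equals(other.lastName) && postcode.equals(other.postcode);
    }

    @Override
    public int hashCode() {
        return 31 * lastName.hashCode() + postcode.hashCode();
    }
}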

How to check number of instances of a domain in a text file

I have a text file containing domains like
ABC.COM
ABC.COM
DEF.COM
DEF.COM
XYZ.COM
I want to read the domains from the text file and check how many instances of each domain there are.
Reading from a text file is easy, but I am confused about how to count the instances of each domain.
Please help.
Split each line by spaces (String instances have a split method), iterate through the resulting array, and use a Map<String, Integer> keyed by domain name with the count as the value: when the domain is already in the map, increase its count by 1; when not, put the domain name in the map with 1 as the value.
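A minimal sketch of that approach (the file name is an example; assumes Java 8+ for merge()):
Map<String, Integer> counts = new HashMap<>();
try (BufferedReader in = new BufferedReader(new FileReader("domains.txt"))) {
    String line;
    while ((line = in.readLine()) != null) {
        if (line.trim().isEmpty()) continue; // skip blank lines
        for (String domain : line.trim().split("\\s+")) {
            // merge() stores 1 on first sight, otherwise adds 1 to the existing count
            counts.merge(domain, 1, Integer::sum);
        }
    }
}
System.out.println(counts); // e.g. {ABC.COM=2, DEF.COM=2, XYZ.COM=1}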
A better solution is to use a Map to track the frequency of each word:
Map<String, Integer> frequency = new LinkedHashMap<String, Integer>();
Read the file:
BufferedReader in = new BufferedReader(new FileReader("infilename"));
String str;
while ((str = in.readLine()) != null) {
    buildMap(str);
}
in.close();
Build map method: you can split the URLs in your file by reading them line by line and splitting with a delimiter (in your case, a space).
String[] words = line.split(" ");
for (String word : words) {
    Integer f = frequency.get(word);
    if (f == null) f = 0;
    frequency.put(word, f + 1);
}
Find the count for a particular domain with:
frequency.get(domainName)
Ref: Counting frequency of a string
List<String> domains = new ArrayList<String>(); // values from your file
domains.add("abc.com");
domains.add("abc.com");
domains.add("xyz.com");
// added for example

Map<String, Integer> domainCount = new HashMap<String, Integer>();
for (String domain : domains) {
    if (domainCount.containsKey(domain)) {
        domainCount.put(domain, domainCount.get(domain) + 1);
    } else {
        domainCount.put(domain, Integer.valueOf(1));
    }
}

Set<Entry<String, Integer>> entrySet = domainCount.entrySet();
for (Entry<String, Integer> entry : entrySet) {
    System.out.println(entry.getKey() + " : " + entry.getValue());
}
If the domains are unknown you can do something like:
// Field declaration
private Map<String, Integer> mappedDomain = new LinkedHashMap<String, Integer>();
private static final List<String> domainList = new ArrayList<String>();

// Add all that you want to track
domainList.add("com");
domainList.add("net");
domainList.add("org");
...

// Inside the loop where you do a readLine
String[] words = line.split(" ");
for (String word : words) {
    // split takes a regex, so the dot has to be escaped
    String[] wordSplit = word.split("\\.");
    if (wordSplit.length == 2) {
        for (String domainCheck : domainList) {
            if (domainCheck.equals(wordSplit[1])) {
                if (mappedDomain.containsKey(word)) {
                    mappedDomain.put(word, mappedDomain.get(word) + 1);
                } else {
                    mappedDomain.put(word, 1);
                }
            }
        }
    }
}
Note: This will work for something like xxx.xxx; if you need to handle more complex formats, you need to modify the logic around wordSplit!
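For instance, a variant that keys off the last dot instead of assuming exactly one (a sketch reusing word, domainList and mappedDomain from the code above; assumes Java 8+ for merge()):
int lastDot = word.lastIndexOf('.');
if (lastDot > 0 && lastDot < word.length() - 1) {
    String tld = word.substring(lastDot + 1); // e.g. "com" from "mail.abc.com"
    if (domainList.contains(tld)) {
        mappedDomain.merge(word, 1, Integer::sum); // count the whole domain
    }
}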
