Combine multiple TreeMaps into a CSV table with Java

I am new to Java and trying to figure out how to combine several TreeMaps into a table.
I have a Java program that reads a text file and creates a TreeMap indexing the words in the file. The output has individual words as the key and the list of pages each word appears on as the value. An example looks like this:
a 1:4:7
b 1:7
d 2
My program currently creates a thread for each of several text files and builds a TreeMap for each file. I would like to combine these TreeMaps into one output. So say we have a second text file that looks like this:
a 1:2:4
b 3
c 7
The final output I am trying to create is a csv table that looks like this:
key,file1,file2
a,1:4:7,1:2:4
b,1:7,3
c,,7
d,2,
Is there a method to combine maps like this? I am primarily a SQL developer, so my idea was to print each map to a txt file along with the file name and then pivot that list on the file name. That didn't seem like a very Java-like way to approach the problem, though.

I think you need to do it manually.
I didn't compile my solution and it doesn't write to a CSV file, but it should give you a hint:
public void writeCsv(List<TreeMap<String, String>> list) {
    // ids will store every unique key across all maps; in your example: a, b, c, d
    Set<String> ids = new TreeSet<String>();
    for (TreeMap<String, String> m : list) {
        ids.addAll(m.keySet());
    }
    // iterate ids [a, b, c, d] and build one CSV line per key
    for (String id : ids) {
        StringBuilder line = new StringBuilder();
        line.append(id);
        for (TreeMap<String, String> m : list) {
            // pages will contain "1:4:7" as in your example, or null if this map has no entry for the key
            String pages = m.get(id);
            line.append(",");
            if (pages != null) {
                line.append(pages);
            }
        }
        System.out.println(line);
    }
}
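If you also want the header row and an actual .csv file instead of console output, a rough sketch along the same lines (the file name index.csv, the two example maps, and the PrintWriter usage are just illustrative, not part of the answer above) could be:
// hypothetical usage: two TreeMaps built from file1 and file2
// (assumes the surrounding method declares throws IOException)
List<TreeMap<String, String>> maps = Arrays.asList(mapFromFile1, mapFromFile2);

try (PrintWriter out = new PrintWriter(new FileWriter("index.csv"))) {
    out.println("key,file1,file2"); // header row matching the two input files
    Set<String> ids = new TreeSet<>();
    for (TreeMap<String, String> m : maps) {
        ids.addAll(m.keySet());
    }
    for (String id : ids) {
        StringBuilder line = new StringBuilder(id);
        for (TreeMap<String, String> m : maps) {
            String pages = m.get(id);
            line.append(",").append(pages == null ? "" : pages);
        }
        out.println(line);
    }
}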

Related

Easiest way to read a file in Java - is there a simpler alternative to JSON?

I am writing a small Java method that needs to read test data from a file on my Windows 10 laptop.
The test data has not been defined yet, but it will be text based.
I need to write a method that reads the data and analyses it character by character.
My questions are:
What is the simplest format to create and read the file? I was looking at JSON, which does not look particularly complex, but is it the best choice for a very simple application?
My second question (and I am a novice): if the data is in a text file on my laptop, how do I tell my Java code where to find it, i.e. how do I ask Java to navigate the Windows 10 file system?
You can also map the text file into Java objects (it depends on your text file).
For example, say we have a text file that contains a person's name and family name on each line, like:
Foo,bar
John,doe
To parse the above text file and map it into a Java object we can:
1- Create a Person class
2- Read and parse the file (line by line)
Create the Person class
public class Person {
    private String name;
    private String family;
    // getters and setters
}
Read the file and parse it line by line
public static void main(String[] args) throws IOException {
    // Read the file, parse it line by line and map each line into a Person object.
    // Note: Splitter comes from Google Guava (com.google.common.base.Splitter);
    // with plain Java you could use line.split(",") instead.
    List<Person> personList = Files
            .lines(Paths.get("D:\\Project\\Code\\src\\main\\resources\\person.txt"))
            .map(line -> {
                // Split the line on "," -> e.g. "John,Doe" becomes [John, Doe]
                List<String> nameAndFamily = Splitter.on(",").trimResults().omitEmptyStrings().splitToList(line);
                // Create a new Person from the two values
                Person person = new Person();
                person.setName(nameAndFamily.get(0));
                person.setFamily(nameAndFamily.get(1));
                return person;
            })
            .collect(Collectors.toList());

    // Process the person list - do whatever you want with each person, e.g. print it
    personList.forEach(person -> {
        System.out.println(person.getName());
        System.out.println(person.getFamily());
    });
}
Regarding your first question, I can't say much without knowing anything about the data you would like to write/read.
For your second question, you would normally do something like this:
String pathToFile = "C:/Users/SomeUser/Documents/testdata.txt";
InputStream in = new FileInputStream(pathToFile);
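Since you said you need to analyse the data character by character, here is a minimal sketch of that (assuming a plain text file; the path is just an example, and the surrounding method would need to handle IOException):
// read the file one character at a time
String pathToFile = "C:/Users/SomeUser/Documents/testdata.txt";
try (BufferedReader reader = new BufferedReader(new FileReader(pathToFile))) {
    int c;
    while ((c = reader.read()) != -1) { // read() returns -1 at end of file
        char ch = (char) c;
        // analyse ch here; for now just echo it
        System.out.print(ch);
    }
}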
As your data gains complexity, you should probably think about using a defined format if possible, for example JSON, YAML or similar.
Hope this helps a bit. Good luck with your project.
As for the format the text file needs to take, you should elaborate a bit on the kind of data, so I can't say much there.
But to navigate the file system, you just need to write the path a bit differently:
Keep the drive letter and its colon ":" at the beginning of the path.
Replace each backslash with a forward slash (or escape it as "\\"), because a single backslash starts an escape sequence in a Java string literal.
Then you should be set.
So for example
C:\users\johndoe\documents\projectfiles\mydatafile.txt
becomes
C:/users/johndoe/documents/projectfiles/mydatafile.txt
With this path, you can use all the IO classes for file manipulation.
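As a quick sanity check, a small sketch (the path is only an example; it uses java.nio.file and assumes the surrounding method handles IOException):
// verify the path resolves to a real file before reading it
Path path = Paths.get("C:/users/johndoe/documents/projectfiles/mydatafile.txt");
// equivalently, with escaped backslashes:
// Path path = Paths.get("C:\\users\\johndoe\\documents\\projectfiles\\mydatafile.txt");
if (Files.exists(path)) {
    List<String> lines = Files.readAllLines(path);
    System.out.println("Read " + lines.size() + " lines");
} else {
    System.out.println("File not found: " + path);
}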

How to create a JSON file from Java without duplicating elements?

I'm trying to make a simple dictionary program based on server-client socket communication. I'm trying to save the user's word and meaning input as a JSON file (which is the dictionary data to search later on), but when I run an add query it ends up with duplicated JSON objects.
For example, if I add happy, then weather and hello, the result written to the JSON file looks like this:
{"hello":"greeting"}{"happy":"joy","hello":"greeting"}
{"happy":"joy","weather":"cold","hello":"greeting"}
instead of
{"hello":"greeting"}{"happy":"joy"}{"weather":"cold"}
like I wanted.
How can I fix this problem?
My code for that function is:
case "add":{
FileWriter dictionaryWriter = new FileWriter("dictionary.json",true);
//split command again into 2 part now using delimiter ","
String break2[] = msgBreak[1].split(",");
String word = break2[0];
String meaning = break2[1];
dictionary.put(word, meaning);
System.out.println("Writing... " + word+":"+meaning);
dictionaryWriter.write(dictionary.toString());
//flush remain byte
dictionaryWriter.flush();
//close writer
dictionaryWriter.close();
break;}
This function is inside a while(true) loop together with the other dictionary functions.
I tried removing the append part; when I remove the (,true) the duplication stops, but then whenever I get a new connection a new dictionary file is created instead of keeping all the saved data.
If anyone can help me solve this problem, I would appreciate it a lot!
Thank you in advance.
You can try to create a new dictionary for each entry instead of reusing the existing one:
Map<String, String> dictionary = new HashMap<>();
dictionary.put(word, meaning);
...
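Applied to your add case, a rough sketch might look like the following; it assumes your dictionary is something like org.json.JSONObject (your question doesn't show its type) and keeps the append-mode FileWriter so each entry is written as its own object:
case "add": {
    // split the command into word and meaning, as in your original code
    String[] break2 = msgBreak[1].split(",");
    String word = break2[0];
    String meaning = break2[1];

    // a fresh object per entry, so only the new pair is appended
    JSONObject entry = new JSONObject(); // assumed type, matching the {"key":"value"} output you showed
    entry.put(word, meaning);

    try (FileWriter dictionaryWriter = new FileWriter("dictionary.json", true)) {
        dictionaryWriter.write(entry.toString());
    }
    break;
}
Note that a file of concatenated objects like {"hello":"greeting"}{"happy":"joy"} is not a single valid JSON document, so when you read it back you will have to parse it object by object.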

Java - method to look up a specific ID from a txt file and return that line's details

I am writing a method to look up a specific ID that is stored within a txt file.
These details are assigned to an ArrayList titled list. If the lookup string matches the data stored in list, it reads the id, firstname, surname (i.e. the whole line of the txt file) and then creates an instance of another class, Profile.
I then want to add this lookup data to a new ArrayList titled lookup and then output it. I have the method below; however, it does not work and just jumps to my else clause.
Could anyone tell me where I'm going wrong? Any help on how to fix it would be appreciated. Thanks.
Could you instead use a map of tuples for the same effect?
// create our map
Map<String, Tuple2<Person, Person>> peopleByForename = new HashMap<>();
// populate it
peopleByForename.put("Bob",
        new Tuple2<>(new Person("Bob Smith"), new Person("Bob Jones")));
// read from it
Tuple2<Person, Person> bobs = peopleByForename.get("Bob");
Person bob1 = bobs.getItem1();
Person bob2 = bobs.getItem2();
Then an example of reading the key:value from the txt file can be found here : Java read txt file to hashmap, split by ":" using Buffered Reader.
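Java has no built-in Tuple2 type, so the snippet above assumes a small helper class along these lines (the name and accessors are just an illustration):
// minimal immutable pair helper
public class Tuple2<A, B> {
    private final A item1;
    private final B item2;

    public Tuple2(A item1, B item2) {
        this.item1 = item1;
        this.item2 = item2;
    }

    public A getItem1() { return item1; }
    public B getItem2() { return item2; }
}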
If you are using Java 8, you can use a lambda to help you. Just replace this line:
if(list.contains(IDlookup))
with this:
boolean containsId = list.stream().anyMatch((Profile p) -> p.getId().equals(IDlookup));
if (containsId)
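Going a step further, a sketch of the whole lookup with streams (assuming a Profile class with a getId() accessor, since the original method isn't shown; uses java.util.stream.Collectors) might be:
// find all profiles whose id matches, collect them, then output them
List<Profile> lookup = list.stream()
        .filter(p -> p.getId().equals(IDlookup))
        .collect(Collectors.toList());

if (!lookup.isEmpty()) {
    lookup.forEach(System.out::println); // relies on Profile having a useful toString()
} else {
    System.out.println("No profile found for ID " + IDlookup);
}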

Merge tab delimited files by key

I have three MapReduce jobs that operate on the same input files and produce tab-delimited output. The first value in each line is the key; this is the case for every output of these three MR jobs.
What I want to do now is use MapReduce to "stitch" these files together by key. What would be the best Mapper output and Reducer input? I tried using ArrayWritable, but because of the shuffle, for some records the ArrayWritable from one file ends up in the third position instead of the second.
I want this:
Key \t Values-from-first-MR-job \t Values-from-second-MR-job \t Values-from-third-MR-job
And this should be the same for all records. But, as I said, because of the shuffle, sometimes this happens for a few records:
Key \t Values-from-third-MR-job \t Values-from-first-MR-job \t Values-from-second-MR-job
How should I set up my Mapper and Reducer to fix this?
It's possible with simple tagging of the emitted value, since only three types of files are involved. In the map, extract the path of the split, identify which job it came from, and add a suitable prefix to the value. For clarity, say the outputs are in 3 directories:
path1/mr_out_1
path2/mr_out_2
path3/mr_out_3
Using TextInputFormat for all these paths, in the map you will do:
String[] keyVal = value.toString().split("\t", 2);
Path filePath = ((FileSplit) context.getInputSplit()).getPath();
String dirName = filePath.getParent().getName();
Text outValue = new Text();
if (dirName.equals("mr_out_1")) {
    outValue.set("1_" + keyVal[1]);
} else if (dirName.equals("mr_out_2")) {
    outValue.set("2_" + keyVal[1]);
} else {
    outValue.set("3_" + keyVal[1]);
}
context.write(new Text(keyVal[0]), outValue);
If you have all the files in the same directory, use the file name instead of dirName, and identify the flag based on the name (a regex match may be suitable):
String fileName = filePath.getName();
if (fileName.matches("regex")) { ... }
In the reduce, just put the incoming values into a List and sort it. The rest is simple enough.
List<String> list = new ArrayList<String>(3);
for (Text v : values) {
    list.add(v.toString());
}
Collections.sort(list);
StringBuilder builder = new StringBuilder();
for (String s : list) {
    builder.append(s.substring(2)).append("\t");
}
context.write(key, new Text(builder.toString().trim()));
I think this will serve the purpose. Keep in mind that the Collections.sort strategy will fail if there are more than 9 files (because the tags would then sort alphabetically rather than numerically). In that case you can extract the tag separately, parse it to an Integer and use a TreeMap<tag, actualString> for sorting.
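A rough sketch of that TreeMap variant (just an illustration of the idea, not tested):
// sort by the numeric tag instead of relying on alphabetical order
TreeMap<Integer, String> sorted = new TreeMap<Integer, String>();
for (Text v : values) {
    String s = v.toString();
    int sep = s.indexOf('_');
    int tag = Integer.parseInt(s.substring(0, sep)); // "12_foo" -> 12
    sorted.put(tag, s.substring(sep + 1));           // "12_foo" -> "foo"
}
StringBuilder builder = new StringBuilder();
for (String s : sorted.values()) {
    builder.append(s).append("\t");
}
context.write(key, new Text(builder.toString().trim()));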
NB: All the above snippets use the new API. I didn't use an IDE to write them, so a few syntax errors may exist, and I didn't follow proper conventions either. For example, the map's outKey could be a class member, and using outKey.set(keyVal[0]) would avoid the overhead of creating a new Text object per record.

How to deal with CSV files having an unknown number of columns using Super CSV

For a project I need to deal with CSV files where I do not know the columns before runtime. The CSV files are perfectly valid; I only need to perform a simple task on several different files over and over again. I do need to analyse the values of the columns, which is why I need a library for working with CSV files. For simplicity, let's assume I need to do something simple like appending a date column to all files, regardless of how many columns they have. I want to do that with Super CSV, because I use the library for other tasks as well.
What I am struggling with is more of a conceptual issue. I am not sure how to deal with the files if I do not know in advance how many columns there are. I am not sure how I should define POJOs that map arbitrary CSV files, or how I should define the cell processors if I do not know which and how many columns will be in the file. How can I dynamically create cell processors that match the number of columns? How would I define POJOs based on, for instance, the header of the CSV file?
Consider the case where I have two CSV files: products.csv and address.csv. Let's assume I want to append a date column with today's date to both files, without having to write two different methods (e.g. addDateColumnToProduct() and addDateColumnToAddress()) which do the same thing.
product.csv:
name, description, price
"Apple", "red apple from Italy","2.5€"
"Orange", "orange from Spain","3€"
address.csv:
firstname, lastname
"John", "Doe"
"Coole", "Piet"
Based on the header information of the CSV files, how could I define a POJO that maps the product CSV? The same question applies to the cell processors: how could I define even a very simple cell processor array that simply has the right number of entries, e.g. for product.csv:
CellProcessor[] processor = new CellProcessor[] {
    null,
    null,
    null
};
and for the address.csv:
CellProcessor[] processor = new CellProcessor[] {
    null,
    null
};
Is this even possible? Am I on the wrong track to achieve this?
Edit 1:
I am not looking for a solution that can deal with CSV files having a variable number of columns within one file. I am trying to figure out whether it is possible to deal with arbitrary CSV files at runtime, i.e. whether I can create POJOs based only on the header information contained in the CSV file at runtime, without knowing in advance how many columns a CSV file will have.
Solution
Based on the answer and comments from #baba
private static void readWithCsvListReader() throws Exception {
    ICsvListReader listReader = null;
    try {
        listReader = new CsvListReader(new FileReader(fileName), CsvPreference.TAB_PREFERENCE);
        listReader.getHeader(true); // skip the header (can't be used with CsvListReader)
        int amountOfColumns = listReader.length();
        CellProcessor[] processor = new CellProcessor[amountOfColumns];
        List<Object> customerList;
        while ((customerList = listReader.read(processor)) != null) {
            System.out.println(String.format("lineNo=%s, rowNo=%s, customerList=%s",
                    listReader.getLineNumber(), listReader.getRowNumber(), customerList));
        }
    } finally {
        if (listReader != null) {
            listReader.close();
        }
    }
}
Maybe a little bit late, but it could be helpful...
CellProcessor[] processors = new CellProcessor[properties.size()];
for (int i = 0; i < properties.size(); i++) {
    processors[i] = new Optional();
}
return processors;
This is a very common issue and there are multiple tutorials on the internet, including the Super CSV page:
http://supercsv.sourceforge.net/examples_reading_variable_cols.html
As that page says:
As shown below you can execute the cell processors after calling read() by calling the executeProcessors() method. Because it's done after reading the line of CSV, you have an opportunity to check how many columns there are (using listReader.length()) and supplying the correct number of processors.
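Putting the two answers together, a sketch in the spirit of that example (assuming the Super CSV 2.x ICsvListReader API with read() followed by executeProcessors(), and a surrounding method that declares throws Exception; not tested here) could be:
ICsvListReader listReader = null;
try {
    listReader = new CsvListReader(new FileReader("product.csv"), CsvPreference.STANDARD_PREFERENCE);
    listReader.getHeader(true); // skip the header
    while (listReader.read() != null) {
        // the row has been read, so listReader.length() now tells us the column count
        CellProcessor[] processors = new CellProcessor[listReader.length()];
        Arrays.fill(processors, new Optional());
        List<Object> row = listReader.executeProcessors(processors);
        System.out.println(row);
    }
} finally {
    if (listReader != null) {
        listReader.close();
    }
}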
