Sorting data in a file - Java

I am having trouble sorting data read from a file. The data has this format:
b4 S0_c5 t 0.426544
b6 S1_c5 t 1.51049
b13 S0_c5 t 0.594502
b13 S1_c5 t 0.537496
b15 S1_c5 t 0.884126
b18 S0_c5 t 0.500933
b19 S1_c5 t 0.628472
b22 S0_c5 t 0.437718
and the required result is:
S0_c5 b13 0.594502 b18 0.500933 b22 0.437718 b4 0.426544
S1_c5 b6 1.51049 b15 0.884126 b19 0.628472 b13 0.537496
Within each group, the values must also be in descending order. Thanks in advance.

Put the data in a TreeMap<String, List<String[]>> (a TreeMap keeps its keys sorted), where the second word of each line is the key and the value is a list of the remaining fields, then sort each list you obtain this way:
Map<String, List<String[]>> map = new TreeMap<String, List<String[]>>();
for (String s : strings) {
    String[] tokens = s.split(" ");
    // Group by the second column
    List<String[]> values = map.get(tokens[1]);
    if (values == null) {
        values = new ArrayList<String[]>();
        map.put(tokens[1], values);
    }
    values.add(new String[]{tokens[0], tokens[3]});
}
for (String key : map.keySet()) {
    List<String[]> list = map.get(key);
    Collections.sort(list, new Comparator<String[]>() {
        @Override
        public int compare(String[] o1, String[] o2) {
            // Compare numerically, largest first; comparing the strings would sort lexicographically
            return Double.compare(Double.parseDouble(o2[1]), Double.parseDouble(o1[1]));
        }
    });
    System.out.print(key + " ");
    for (String[] s : list) {
        System.out.print(s[0] + " " + s[1] + " ");
    }
    System.out.println();
}
Update:
For example, to read the lines from a file:
Map<String, List<String[]>> map = new TreeMap<String, List<String[]>>();
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader("d:/temp/r.res"))) {
    String s;
    // readLine() returns null at end of file
    while ((s = br.readLine()) != null) {
        if (!s.trim().isEmpty()) {
            String[] tokens = s.trim().split("\\s+");
            List<String[]> values = map.get(tokens[1]);
            if (values == null) {
                values = new ArrayList<String[]>();
                map.put(tokens[1], values);
            }
            values.add(new String[]{tokens[0], tokens[3]});
        }
    }
}

Split each line on ' '.
Create a HashMap<String, List<String[]>>.
For each row:
Check whether the map contains the key (split[1]).
If there is no list at that key, create one.
Add split[0] and split[3] to that list.
Iterate through the map and sort each list.
Output the data.
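The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name GroupAndSort and the method groupAndSort are made up for this example, a TreeMap is used instead of a HashMap so the group keys come out sorted, and the fourth column is assumed to always parse as a number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, named only for this sketch.
class GroupAndSort {

    /** Groups rows by their second column and sorts each group by the fourth column, descending. */
    static List<String> groupAndSort(List<String> lines) {
        // TreeMap keeps the group keys (S0_c5, S1_c5, ...) in sorted order.
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] split = line.trim().split("\\s+");
            // Create the list for this key on first use, then add the row.
            groups.computeIfAbsent(split[1], k -> new ArrayList<>()).add(split);
        }

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : groups.entrySet()) {
            // Compare the fourth column numerically, largest value first.
            e.getValue().sort(
                    Comparator.comparingDouble((String[] row) -> Double.parseDouble(row[3])).reversed());
            StringBuilder sb = new StringBuilder(e.getKey());
            for (String[] row : e.getValue()) {
                sb.append(' ').append(row[0]).append(' ').append(row[3]);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "b4 S0_c5 t 0.426544", "b6 S1_c5 t 1.51049",
                "b13 S0_c5 t 0.594502", "b13 S1_c5 t 0.537496",
                "b15 S1_c5 t 0.884126", "b18 S0_c5 t 0.500933",
                "b19 S1_c5 t 0.628472", "b22 S0_c5 t 0.437718");
        for (String row : groupAndSort(lines)) {
            System.out.println(row);
        }
    }
}
```

Run on the sample data from the question, this prints one line per group with the values in descending order, which is the required result.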

Put the data into a List and use Collections.sort() to sort it.

There is a class in the JDK for keeping elements in sorted order: java.util.PriorityQueue (the name is somewhat out of step with the other Sorted* interfaces). It can order either Comparables or, given a Comparator, arbitrary elements.
The difference from a List sorted using Collections.sort(...) is that the queue maintains its ordering at all times and has good insertion performance, O(log n), because it uses a heap data structure, whereas inserting into a sorted ArrayList is O(n) (binary search followed by shifting elements).
However, unlike a List, PriorityQueue does not support indexed access (get(5)), and its iterator does not return the elements in sorted order. The only way to read the items of the heap in order is to take them out one at a time (hence the name PriorityQueue).
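As a small illustration of the point about taking items out one at a time, here is a sketch that drains a PriorityQueue in descending order. The class and method names (PriorityQueueDemo, drainDescending) are invented for this example; Comparator.reverseOrder() is used to get largest-first ordering, as the question requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo class, named only for this sketch.
class PriorityQueueDemo {

    /** Drains a largest-first queue, returning the values in descending order. */
    static List<Double> drainDescending(List<Double> values) {
        // reverseOrder() makes the head of the queue the largest element
        // instead of the smallest.
        PriorityQueue<Double> queue = new PriorityQueue<>(Comparator.reverseOrder());
        queue.addAll(values);

        List<Double> sorted = new ArrayList<>();
        while (!queue.isEmpty()) {
            // poll() removes and returns the current head (largest remaining value)
            sorted.add(queue.poll());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(drainDescending(Arrays.asList(0.426544, 0.594502, 0.500933, 0.437718)));
    }
}
```

Note that the queue is empty afterwards; if the data has to be read more than once, sorting a List is probably the simpler choice.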

Try this. It will work.
private void readTextFile(String filename) throws IOException {
    // Read all non-empty lines; try-with-resources closes the reader.
    List<String> lines = new ArrayList<String>();
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filename)))) {
        String line;
        while ((line = br.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                lines.add(line.trim());
            }
        }
    }
    // Sort by the numeric value in the fourth column, largest first.
    // (Using the value as a map key would silently drop rows with equal values.)
    Collections.sort(lines, new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Double.compare(Double.parseDouble(b.split("\\s+")[3]),
                    Double.parseDouble(a.split("\\s+")[3]));
        }
    });
    // Group by the second column; each list keeps the descending order established above.
    Map<String, List<String>> map = new TreeMap<String, List<String>>();
    for (String line : lines) {
        String[] tokens = line.split("\\s+");
        List<String> values = map.get(tokens[1]);
        if (values == null) {
            values = new ArrayList<String>();
            map.put(tokens[1], values);
        }
        values.add(tokens[0] + " " + tokens[3]);
    }
    for (String key : map.keySet()) {
        System.out.println(key + " " + map.get(key));
    }
}

Related

Number of element occurrences

I'm trying to count the occurrences of each element using a TreeSet and a HashMap.
When I run the program, the value in the HashMap does not increase.
I've tried map.put(data, map.get(data) + 1), but it causes a NullPointerException.
public class ReadData {
public static void main(String[] args) {
File f = new File("E:\\new1.txt");
try {
BufferedReader br = new BufferedReader(new FileReader(f));
String data = "";
int count =1;
HashMap<String,Integer> map = null;
TreeSet<String> set = new TreeSet<String>();
set.add("");
while((data=br.readLine())!=null) {
map = new HashMap<String,Integer>();
if(set.contains(data)) {
map.put(data,map.get(data)+1);
System.out.println("correct");
System.out.println(count+1);
}else
{
map.put(data,count);
set.add(data);
System.out.println("Not correct");
}
//System.out.println(map);
Set sets = map.entrySet();
Iterator iterator = sets.iterator();
while(iterator.hasNext()) {
Map.Entry mentry = (Map.Entry)iterator.next();
System.out.print("key is: "+ mentry.getKey() + " & Value is: ");
System.out.println(mentry.getValue());
}
}
}catch(Exception e) {
System.out.println(e);
}
}
}
Input:
orange
apple
orange
orange
Expected output:
key is orange & value is 3
key is apple & value is 1
Actual output:
key is: orange & Value is: 1
key is: apple & Value is: 1
java.lang.NullPointerException
You can do this more cleanly using streams, with Collectors.groupingBy() and Collectors.counting(). You should also use the try-with-resources construct and the newer Files class:
String delimiter = " ";
Path p = Paths.get("E:", "file.txt");
try (BufferedReader br = Files.newBufferedReader(p)) {
Map<String, Long> result = br.lines()
.flatMap(l -> Arrays.stream(l.split(delimiter)))
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
System.out.println(result);
}
For the input orange apple orange orange, this code prints {orange=3, apple=1}.
Please note that
HashMap<String,Integer> map = null;
is not the same as an empty map. First of all, you must create a map before you use it.
In this case use, for example,
HashMap<String,Integer> map = new HashMap<String,Integer>();
You are also creating a new map on every iteration of the loop, which discards the counts collected so far and is why map.get(data) returns null. I would suggest instantiating the map once, together with the set, and removing
map = new HashMap<String,Integer>();
from inside the while loop.
Your code should look like
HashMap<String, Integer> map = new HashMap<String, Integer>();
TreeSet<String> set = new TreeSet<String>();
set.add("");
while ((data = br.readLine()) != null) {
You can also use a TreeMap instead of a HashMap plus a TreeSet.
public class ReadData {
    public static void main(String[] args) {
        File f = new File("E:\\new1.txt");
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            TreeMap<String, Integer> map = new TreeMap<String, Integer>();
            String data;
            while ((data = br.readLine()) != null) {
                String[] fruitNames = data.split(" "); // the regex "\\s+" can also be used
                for (String fruitName : fruitNames) {
                    Integer count = map.get(fruitName);
                    Integer newVal = count == null ? 1 : count + 1;
                    map.put(fruitName, newVal);
                }
            }
            // iterate over the entries in the TreeMap (keys are in sorted order)
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                System.out.println("key is: " + entry.getKey() + " & Value is: " + entry.getValue());
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
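The null check and put in the loop above can also be collapsed into a single call to Map.merge (available since Java 8). The following is a minimal sketch under that assumption; the class name MergeCount and the method countWords are made up for the example, and the input is passed in as a list of lines rather than read from a file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, named only for this sketch.
class MergeCount {

    /** Counts how often each whitespace-separated word occurs in the given lines. */
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line yields one empty token
                }
                // Inserts 1 if the word is absent, otherwise adds 1 to the stored count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {apple=1, orange=3}
        System.out.println(countWords(Arrays.asList("orange", "apple", "orange", "orange")));
    }
}
```

Because merge handles the missing-key case itself, there is no unboxing of a null count and therefore no NullPointerException.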
If you want to count the occurrences of a string, you can use StringUtils.countMatches from
Apache Commons Lang. Note that it counts substring matches, so "apple" is also counted inside "pineapple".
// First get all the words from your line
String[] allWords = data.split("\\s");
// Retrieve the unique strings
String[] uniqueStrings = Arrays.stream(allWords).distinct().toArray(String[]::new);
// Print the number of occurrences of each string in data
for (String word : uniqueStrings) {
    System.out.println("Count of occurrences for the word " + word + " is: " + StringUtils.countMatches(data, word));
}

How to fill a Map<String, List<String>> from a text file? Difficulties dynamically naming each List

Here is the text file. K: marks a key and V: marks a value that I want to add to that key's List; i.e., Strawberry, Apricot and Peach make up the List at key Class A of the Map.
K: Class A//
V: Strawberry//
V: Apricot//
V: Peach//
K: Class B//
V: Chocolate//
K: Class C//
V: Creme de menthe//
V: Irish coffee//
The program here assigns the Keys correctly but adds every Value in the file to the List instead of just the ones I want.
//FillHM.java
import java.io.*;
import java.util.Scanner;
import java.util.*;
public class FillHM {
public static void main (String[] args) {
Map<String, List<String>> map = new HashMap<String, List<String>>();
Scanner sc1 = null;
try {
sc1 = new Scanner(new File("/home/craig/Desktop/mytext.txt"));
}catch (FileNotFoundException e) {e.printStackTrace();}
List<String>values = new ArrayList<>();
String s = " ";
String key = " ";
while (sc1.hasNextLine()) {
Scanner sc2 = new Scanner(sc1.nextLine());
sc2.useDelimiter("//");
while(sc2.hasNext()) {
s = sc2.next();
if (s.startsWith("K:")) {
key = s;
}
if (s.startsWith("V:")) {
values.add(s);
}
map.put(key, values);
} //end while
} //end while
System.out.println(map);
}
}
When you detect a new key, create a new list:
if (s.startsWith("K:")) {
key = s;
values = new ArrayList<>();
}
If you don't do this, every key will be mapped to the same list. You want each key mapped to its own list.
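To make the shared-list problem concrete, here is a minimal sketch of the fixed loop, working on already-split tokens instead of a file. The class name SharedListDemo and the method fill are invented for this example, and a LinkedHashMap is used only so that the printed order matches the input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, named only for this sketch.
class SharedListDemo {

    /** Builds the key -> values map, giving every key its own list. */
    static Map<String, List<String>> fill(List<String> tokens) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        List<String> values = new ArrayList<>();
        for (String s : tokens) {
            if (s.startsWith("K:")) {
                // A new list per key: the map stores a reference to the list,
                // so reusing one list object would make all keys share the same values.
                values = new ArrayList<>();
                map.put(s, values);
            } else if (s.startsWith("V:")) {
                values.add(s);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill(Arrays.asList(
                "K: Class A", "V: Strawberry", "V: Apricot", "V: Peach",
                "K: Class B", "V: Chocolate")));
    }
}
```

The same reasoning explains why calling clear() on a single shared list does not help: the map holds references to that one list, so clearing it empties the values seen through every key.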
The problem is that you can't use values.clear(), because every key refers to the same list, so clearing it clears the values of all keys.
The solution is below.
if (s.startsWith("K:")) {
values = new ArrayList<>();
key = s;
}
Just change this if.
The problem is that your values list never gets cleared when the key changes.
So make sure to clear the list when you read a new key:
if (s.startsWith("K:")) {
key = s;
values.clear();
}
I think this code will work for you. values.clear() will not resolve the issue, because every key would still share the same list.
Map<String, List<String>> map = new HashMap<String, List<String>>();
Scanner sc1 = null;
try {
sc1 = new Scanner(new File("/home/craig/Desktop/mytext.txt"));
}catch (FileNotFoundException e) {e.printStackTrace();}
String s = "";
String key = "";
List<String>values=new ArrayList<>();
while (sc1.hasNextLine()) {
Scanner sc2 = new Scanner(sc1.nextLine());
sc2.useDelimiter("//");
s = sc2.next();
if (s.startsWith("K:")) {
if(values.size()!=0){
map.put(key, values);
//System.out.println(map);
}
key = s;
values=new ArrayList<>();
}
else if (s.startsWith("V:")) {
values.add(s);
}
}
map.put(key, values);
System.out.println(map);

How to retrieve specific rows from a matrix returned by reading a CSV file in Java

I want to retrieve some rows of a 2D array.
Example: I have a file named "data.csv", which contains
age sex zipcode classtype
21 m 23423 1
12 f 23133 2
23 m 32323 2
23 f 23211 1
The code below gives output like this:
{age=[21,12,23,23],sex=[m,f,m,f],zipcode=[23423,23133,32323,23211],classtype=[1,2,2,1]}
Now I want to retrieve the rows which have classtype 1 and store these values in a new 2D array,
like partition1={{21,m,23423,1},{23,f,23211,1}}
public class CsvParser {
public static void main(String[] args) {
try {
FileReader fr = new FileReader((args.length > 0) ? args[0] : "data.csv");
Map<String, List<String>> values = parseCsv(fr, " ", true);
System.out.println(values);
List<List<String>> partition1 = new ArrayList<>(25);
List<String> classTypes = values.get("classtype");
for (int row = 0; row < classTypes.size(); row++) {
String classType = classTypes.get(row);
if ("1".equals(classType)) {
List<String> data = new ArrayList<>(25);
data.add(values.get("age").get(row));
data.add(values.get("sex").get(row));
data.add(values.get("zipcode").get(row));
data.add(values.get("classtype").get(row));
partition1.add(data);
}
}
System.out.println(partition1);
} catch (IOException e) {
e.printStackTrace();
}
}
public static Map<String, List<String>> parseCsv(Reader reader, String separator, boolean hasHeader) throws IOException {
Map<String, List<String>> values = new LinkedHashMap<String, List<String>>();
List<String> columnNames = new LinkedList<String>();
BufferedReader br = null;
br = new BufferedReader(reader);
String line;
int numLines = 0;
while ((line = br.readLine()) != null) {
if (StringUtils.isNotBlank(line)) {
if (!line.startsWith("#")) {
String[] tokens = line.split(separator);
if (tokens != null) {
for (int i = 0; i < tokens.length; ++i) {
if (numLines == 0) {
columnNames.add(hasHeader ? tokens[i] : ("row_"+i));
} else {
List<String> column = values.get(columnNames.get(i));
if (column == null) {
column = new LinkedList<String>();
}
column.add(tokens[i]);
values.put(columnNames.get(i), column);
}
}
}
++numLines;
}
}
}
return values;
}
}
FileReader file1 = new FileReader(file);
BufferedReader buffer = new BufferedReader(file1);
String line = "";
while ((line = buffer.readLine()) != null) {
    StringBuilder sb = new StringBuilder();
    String[] str = line.split(",");
    // split() never returns null elements, so check the column count instead
    if (str.length >= 3) {
        sb.append("'" + str[0] + "',");
        sb.append("'" + str[1] + "',");
        sb.append("'" + str[2] + "'");
    }
}
A CSV file must be split on commas; then it should work.
Once I changed Map<String, List<String>> values = parseCsv(fr, "\\s,", true); to Map<String, List<String>> values = parseCsv(fr, " ", true); I was able to get the data in the right format...
From there it was just a matter of reading through each row of classtype. When I found a value that matched 1, I pulled out each property for the given row and added it to a List, forming a single row. This was then added to another List which maintains all the matching rows, for example...
List<List<String>> partition1 = new ArrayList<>(25);
List<String> classTypes = values.get("classtype");
for (int row = 0; row < classTypes.size(); row++) {
String classType = classTypes.get(row);
if ("1".equals(classType)) {
List<String> data = new ArrayList<>(25);
data.add(values.get("age").get(row));
data.add(values.get("sex").get(row));
data.add(values.get("zipcode").get(row));
data.add(values.get("classtype").get(row));
partition1.add(data);
}
}
System.out.println(partition1);
Which outputs...
[[21, m, 23423, 1], [23, f, 23211, 1]]
If you're looking for a more automated method, then I'm afraid you're out of luck, as Map makes no guarantee about the order in which its keys are stored or iterated.
Of course, instead of using a List<List>, you could use a List<Map> which would maintain the keys for each value, for example...
List<Map<String, String>> partition1 = new ArrayList<>(25);
List<String> classTypes = values.get("classtype");
for (int row = 0; row < classTypes.size(); row++) {
String classType = classTypes.get(row);
if ("1".equals(classType)) {
Map<String, String> data = new HashMap<>(25);
for (String key : values.keySet()) {
data.put(key, values.get(key).get(row));
}
partition1.add(data);
}
}
System.out.println(partition1);
Which outputs...
[{sex=m, classtype=1, zipcode=23423, age=21}, {sex=f, classtype=1, zipcode=23211, age=23}]

OutOfMemoryError: Java heap space-ArrayLists Java

for(int i=0; i<words.size(); i++){
for(int j=0; j<Final.size(); j++){
if(words.get(i)==Final.get(j)){
temp=times.get(j);
temp=temp+1;
times.set(j, temp);
}
else{
Final.add(words.get(i));
times.add(1);
}
}
}
I want to create two ArrayLists: times (Integers) and Final (Strings). The ArrayList "words" contains the words of a string, and some words appear multiple times. What I'm trying to do is add every word of "words" to "Final" (but just once), and add the number of times that word appears in "words" to "times". Is something wrong?
I get OutOfMemoryError: Java heap space.
I also think using a HashMap is the best solution.
There is an error in your code, which may be the cause of your problem.
Replace the following:
if(words.get(i)==Final.get(j)){
with:
if(words.get(i).equals(Final.get(j))){
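Beyond the == comparison, the else branch in the question's loop appends to Final for every non-matching element while Final is still being scanned, so (assuming Final is not empty when the loop starts) the list can keep growing, which would be consistent with the heap running out. A minimal sketch of the two-list approach that looks up each word once, via List.indexOf, might look like this; the class and method names (TwoListCount, count) and the parameter name finalWords are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, named only for this sketch.
class TwoListCount {

    /** Fills finalWords with each distinct word and times with its number of occurrences. */
    static void count(List<String> words, List<String> finalWords, List<Integer> times) {
        for (String word : words) {
            // indexOf compares with equals(), not ==, and returns -1 if the word is absent.
            int index = finalWords.indexOf(word);
            if (index >= 0) {
                times.set(index, times.get(index) + 1);
            } else {
                // Add the word exactly once, after the whole list has been searched.
                finalWords.add(word);
                times.add(1);
            }
        }
    }

    public static void main(String[] args) {
        List<String> finalWords = new ArrayList<>();
        List<Integer> times = new ArrayList<>();
        count(Arrays.asList("a", "b", "a"), finalWords, times);
        System.out.println(finalWords + " " + times); // prints [a, b] [2, 1]
    }
}
```

This is still quadratic in the number of distinct words, so a Map remains the better fit for large inputs.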
You don't need two lists to find each word and its count. You can get this from a HashMap whose keys are the words and whose values are the counts.
For example, declare a HashMap
Map<String, Integer> words = new HashMap<String, Integer>();
and then use it in the following way
try {
//getting content from file.
Scanner inputFile = new Scanner(new File("d:\\test.txt"));
//reading line by line
while (inputFile.hasNextLine()) {
// StringTokenizer splits the line on whitespace by default.
StringTokenizer tokenizer = new StringTokenizer(
inputFile.nextLine());
while (tokenizer.hasMoreTokens()) {
String word = tokenizer.nextToken();
// If the HashMap already contains the key, increment the
// value
if (words.containsKey(word)) {
words.put(word, words.get(word) + 1);
}
// Otherwise, set the value to 1
else {
words.put(word, 1);
}
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// Loop through the HashMap and print the results
for (Entry<String, Integer> entry : words.entrySet()) {
String key = entry.getKey();
Integer value = entry.getValue();
System.out.println("Word " + key + ": its occurrence count is " + value);
}
If you are going to run out of memory, it will be while trying to read all the words into a collection. I suggest you not do this and instead count the words as you read them.
For example:
Map<String, Integer> freq = new HashMap<>();
try(BufferedReader br = new BufferedReader(new FileReader(filename))) {
for(String line; (line = br.readLine()) != null; ) {
for(String word : line.trim().split("\\s+")) {
Integer count = freq.get(word);
freq.put(word, count == null ? 1 : 1 + count);
}
}
}
Try this example.
String[] words = {"asdf","zvcxc", "asdf", "zxc","zxc", "zxc"};
Map<String, Integer> result = new HashMap<String, Integer>();
for (String word : words) {
if (!result.containsKey(word)) {
result.put(word, 1);
} else {
result.put(word, result.get(word) + 1);
}
}
//print result
for (Map.Entry<String, Integer> entry : result.entrySet()) {
System.out.println(String.format("%s -- %s times", entry.getKey(), entry.getValue()));
}
Output:
zvcxc -- 1 times
zxc -- 3 times
asdf -- 2 times

How to get the frequently occurring words from text extracted using Tika

I have extracted text from multiple file formats (PDF, HTML, DOC) using the code below (using Tika):
File file1 = new File("c://sample.pdf");
InputStream input = new FileInputStream(file1);
BodyContentHandler handler = new BodyContentHandler(10*1024*1024);
JSONObject obj = new JSONObject();
obj.put("Content",handler.toString());
Now my requirement is to get the frequently occurring words from the extracted content. Can you please suggest how to do this?
Thanks
Here's a function to find the most frequent word.
Pass the content to the function, and it returns the most frequently occurring word.
String getMostFrequentWord(String input) {
String[] words = input.split(" ");
// Create a dictionary using word as key, and frequency as value
Map<String, Integer> dictionary = new HashMap<String, Integer>();
for (String word : words) {
if (dictionary.containsKey(word)) {
int frequency = dictionary.get(word);
dictionary.put(word, frequency + 1);
} else {
dictionary.put(word, 1);
}
}
int max = 0;
String mostFrequentWord = "";
Set<Entry<String, Integer>> set = dictionary.entrySet();
for (Entry<String, Integer> entry : set) {
if (entry.getValue() > max) {
max = entry.getValue();
mostFrequentWord = entry.getKey();
}
}
return mostFrequentWord;
}
The algorithm is O(n) in the number of words, so the performance should be okay.
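Since the question asks for the frequently occurring words (plural), the same counting idea can be extended to return the top n words instead of a single one. The sketch below is one possible way to do that, not the original answer's code: the class name TopWords and the method topWords are made up, words are split on whitespace, and ties are broken alphabetically as an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, named only for this sketch.
class TopWords {

    /** Returns up to n words, most frequent first; ties are ordered alphabetically. */
    static List<String> topWords(String text, int n) {
        Map<String, Integer> freq = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                freq.merge(word, 1, Integer::sum);
            }
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            // Higher count first; fall back to the word itself for a stable result.
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topWords("a b a c b a", 2)); // prints [a, b]
    }
}
```

Sorting the distinct words makes this O(n + k log k) for n words and k distinct words. Real extracted text would probably also need punctuation stripped and case normalized before counting, which this sketch leaves out.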
