OutOfMemoryError: Java heap space - ArrayLists in Java

for (int i = 0; i < words.size(); i++) {
    for (int j = 0; j < Final.size(); j++) {
        if (words.get(i) == Final.get(j)) {
            temp = times.get(j);
            temp = temp + 1;
            times.set(j, temp);
        }
        else {
            Final.add(words.get(i));
            times.add(1);
        }
    }
}
I want to create two ArrayLists: times (integers) and Final (Strings). The ArrayList "words" contains the words of a string, and some words appear multiple times. What I'm trying to do is add every word (but just once) from "words" to "Final", and add the number of times each word appears in "words" to "times". Is something wrong?
Because I get an OutOfMemoryError: Java heap space.

I also think using a HashMap is the best solution.
There is also an error in your code, which may be your problem.
Replace the following:
if(words.get(i)==Final.get(j)){
By :
if(words.get(i).equals(Final.get(j))){
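A quick aside on why this matters (my illustration, not from the original post): == compares object references, while equals() compares contents, so words read at runtime almost never satisfy ==. If Final already has any elements, the else branch then adds a new element on every non-matching comparison, so the inner loop's bound Final.size() keeps growing and the loop never ends, which would explain the OutOfMemoryError.
String a = new String("word");
String b = new String("word");
System.out.println(a == b);      // false: two different objects
System.out.println(a.equals(b)); // true: same characters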

You don't need two lists to find each word and its count. You can get both from a single HashMap, where the key is the word and the value is its count:
Map<String, Integer> words = new HashMap<String, Integer>();
You can then fill and use this map in the following way:
try {
    // Read the contents of the file.
    Scanner inputFile = new Scanner(new File("d:\\test.txt"));
    // Read line by line.
    while (inputFile.hasNextLine()) {
        // StringTokenizer splits the line on whitespace by default.
        StringTokenizer tokenizer = new StringTokenizer(inputFile.nextLine());
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            // If the HashMap already contains the word, increment its count
            if (words.containsKey(word)) {
                words.put(word, words.get(word) + 1);
            }
            // Otherwise, set the count to 1
            else {
                words.put(word, 1);
            }
        }
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
// Loop through the HashMap and print the results
for (Entry<String, Integer> entry : words.entrySet()) {
    String key = entry.getKey();
    Integer value = entry.getValue();
    System.out.println("Word " + key + ": occurs " + value + " times");
}

If you are going to run out of memory, it will be from trying to read all the words into a collection first. I suggest you don't do this and instead count the words as you read them, e.g.
Map<String, Integer> freq = new HashMap<>();
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    for (String line; (line = br.readLine()) != null; ) {
        for (String word : line.trim().split("\\s+")) {
            Integer count = freq.get(word);
            freq.put(word, count == null ? 1 : 1 + count);
        }
    }
}
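As a small aside (not part of the original answer), on Java 8 and later the null check can be collapsed with Map.merge; the inner loop above could be written as:
for (String word : line.trim().split("\\s+")) {
    freq.merge(word, 1, Integer::sum); // stores 1 for a new word, otherwise adds 1 to the existing count
}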

Try this example:
String[] words = {"asdf", "zvcxc", "asdf", "zxc", "zxc", "zxc"};
Map<String, Integer> result = new HashMap<String, Integer>();
for (String word : words) {
    if (!result.containsKey(word)) {
        result.put(word, 1);
    } else {
        result.put(word, result.get(word) + 1);
    }
}
// print the result
for (Map.Entry<String, Integer> entry : result.entrySet()) {
    System.out.println(String.format("%s -- %s times", entry.getKey(), entry.getValue()));
}
Output:
zvcxc -- 1 times
zxc -- 3 times
asdf -- 2 times

Related

number of element occurrence

I'm trying to find the number of element occurrences using a TreeSet and a HashMap.
When I run the program, the value does not increase in the HashMap.
I've tried map.put(data, map.get(data)+1) but it causes a NullPointerException.
public class ReadData {
    public static void main(String[] args) {
        File f = new File("E:\\new1.txt");
        try {
            BufferedReader br = new BufferedReader(new FileReader(f));
            String data = "";
            int count = 1;
            HashMap<String,Integer> map = null;
            TreeSet<String> set = new TreeSet<String>();
            set.add("");
            while ((data = br.readLine()) != null) {
                map = new HashMap<String,Integer>();
                if (set.contains(data)) {
                    map.put(data, map.get(data)+1);
                    System.out.println("correct");
                    System.out.println(count+1);
                } else {
                    map.put(data, count);
                    set.add(data);
                    System.out.println("Not correct");
                }
                //System.out.println(map);
                Set sets = map.entrySet();
                Iterator iterator = sets.iterator();
                while (iterator.hasNext()) {
                    Map.Entry mentry = (Map.Entry) iterator.next();
                    System.out.print("key is: " + mentry.getKey() + " & Value is: ");
                    System.out.println(mentry.getValue());
                }
            }
        } catch(Exception e) {
            System.out.println(e);
        }
    }
}
Input:
orange
apple
orange
orange
Expected output:
key is: orange & value is: 3
key is: apple & value is: 1
The actual output is:
key is: orange & Value is: 1
key is: apple & Value is: 1
java.lang.NullPointerException
You can do it more cleanly using streams, with Collectors.groupingBy() and Collectors.counting(). You should also use the try-with-resources construct and the newer Files class:
String delimiter = " ";
Path p = Paths.get("E:", "file.txt");
try (BufferedReader br = Files.newBufferedReader(p)) {
    Map<String, Long> result = br.lines()
            .flatMap(l -> Arrays.stream(l.split(delimiter)))
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    System.out.println(result);
}
For the input orange apple orange orange, this code will print {orange=3, apple=1}.
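If you also want the words ordered by frequency (highest first), the same result map can be sorted afterwards; a small sketch assuming Java 8+:
result.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .forEach(e -> System.out.println(e.getKey() + " = " + e.getValue()));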
Please notice that
HashMap<String,Integer> map = null;
is not the same as an empty map. First of all, you must create a new map before using it, for example
HashMap<String,Integer> map = new HashMap<String,Integer>();
Then you are creating a new map inside the loop, which works against your purpose. I would suggest instantiating your map together with the set and removing
map = new HashMap<String,Integer>();
from inside the while loop.
Your code should look like
HashMap<String, Integer> map = new HashMap<String, Integer>();
TreeSet<String> set = new TreeSet<String>();
set.add("");
while ((data = br.readLine()) != null) {
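As a further simplification (a sketch, not strictly needed for the fix): once the map is created a single time, the TreeSet check inside the loop can be dropped entirely by using Map.getOrDefault, which is available from Java 8:
HashMap<String, Integer> map = new HashMap<String, Integer>();
while ((data = br.readLine()) != null) {
    // getOrDefault returns 0 the first time a line is seen
    map.put(data, map.getOrDefault(data, 0) + 1);
}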
You can also use TreeMap instead of using HashMap + TreeSet.
public class ReadData {
    public static void main(String[] args) {
        try {
            File f = new File("E:\\new1.txt");
            BufferedReader br = new BufferedReader(new FileReader(f));
            TreeMap<String,Integer> map = new TreeMap<String,Integer>();
            String data;
            while ((data = br.readLine()) != null) {
                String[] fruitNames = data.split(" "); // or the regex \s+ can also be used
                for (String fruitName : fruitNames) {
                    Integer count = map.get(fruitName);
                    Integer newVal = count == null ? 1 : count + 1;
                    map.put(fruitName, newVal);
                }
            }
            // iterate over the entries in the TreeMap
        } catch(Exception e) {
            System.out.println(e);
        }
    }
}
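The iteration hinted at by the comment could look like this (a sketch; since TreeMap keeps its keys sorted, the fruit names print in alphabetical order):
for (Map.Entry<String, Integer> entry : map.entrySet()) {
    System.out.println("key is: " + entry.getKey() + " & Value is: " + entry.getValue());
}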
If you want to count the occurrences of a string, you can simply use StringUtils.countMatches from Apache Commons Lang.
// First get all the words from your line
String[] allWords = data.split("\\s");
// Retrieve the unique strings
String[] uniqueStrings = Arrays.stream(allWords).distinct().toArray(String[]::new);
// Print the occurrence count of each string in data
for (String word : uniqueStrings) {
    System.out.println("Count of occurrences for the word " + word + " is: " + StringUtils.countMatches(data, word));
}
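One caveat, in case whole-word counts are the goal: StringUtils.countMatches counts substring occurrences, so a word like "is" would also be counted inside "this". A sketch of a whole-word variant (the helper name is mine, not from Commons Lang):
// Hypothetical helper: counts whole-word occurrences rather than substrings
static int countWholeWord(String data, String word) {
    int count = 0;
    for (String token : data.split("\\s+")) {
        if (token.equals(word)) {
            count++;
        }
    }
    return count;
}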

Calculating Word Frequency Using StreamTokenizer(), HashMap(), and HashSet() in Core Java

import java.io.*;
import java.util.*;

class A {
    public static void main(String args[]) throws Exception {
        Console con = System.console();
        String str;
        int i = 0;
        HashMap map = new HashMap();
        HashSet set = new HashSet();
        System.out.println("Enter File Name : ");
        str = con.readLine();
        File f = new File(str);
        f.createNewFile();
        FileInputStream fis = new FileInputStream(str);
        StreamTokenizer st = new StreamTokenizer(fis);
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            String s;
            switch (st.ttype) {
                case StreamTokenizer.TT_NUMBER: s = st.nval + "";
                    break;
                case StreamTokenizer.TT_WORD: s = st.sval;
                    break;
                default: s = "" + ((char) st.ttype);
            }
            map.put(i + "", s);
            set.add(s);
            i++;
        }
        Iterator iter = set.iterator();
        System.out.println("Frequency Of Words :");
        while (iter.hasNext()) {
            String word;
            int count = 0;
            word = (String) iter.next();
            for (int j = 0; j < i; j++) {
                String word2;
                word2 = (String) map.get(j + "");
                if (word.equals(word2))
                    count++;
            }
            System.out.println(" WORD : " + word + " = " + count);
        }
        System.out.println("Total Words In Files: " + i);
    }
}
For this code I first created a text file which contains the following data:
# Hello Hii World # * c++ java salesforce
And the output of this code is:
Frequency Of Words :
WORD : # = 1
WORD : # = 1
WORD : c = 1
WORD : salesforce = 1
WORD : * = 1
WORD : Hii = 1
WORD : + = 2
WORD : java = 1
WORD : World = 1
WORD : Hello = 1
Total Words In Files: 11
I am unable to find out why this shows c++ as separate words. I want c++ to be counted as a single word in the output.
You can do it this way:
// Create the file at path specified in the String str
// ...
HashMap<String, Integer> map = new HashMap<>();
InputStream fis = new FileInputStream(str);
Reader bufferedReader = new BufferedReader(new InputStreamReader(fis));
StreamTokenizer st = new StreamTokenizer(bufferedReader);
st.wordChars('+', '+');
while (st.nextToken() != StreamTokenizer.TT_EOF) {
    String s;
    switch (st.ttype) {
        case StreamTokenizer.TT_NUMBER:
            s = String.valueOf(st.nval);
            break;
        case StreamTokenizer.TT_WORD:
            s = st.sval;
            break;
        default:
            s = String.valueOf((char) st.ttype);
    }
    Integer val = map.get(s);
    if (val == null)
        val = 1;
    else
        val++;
    map.put(s, val);
}
Set<String> keySet = map.keySet();
Iterator<String> iter = keySet.iterator();
System.out.println("Frequency Of Words :");
int sum = 0;
while (iter.hasNext()) {
    String word = iter.next();
    int count = map.get(word);
    sum += count;
    System.out.println(" WORD : " + word + " = " + count);
}
System.out.println("Total Words In Files: " + sum);
Note that I've updated your code to use generics instead of the raw versions of HashMap and Iterator. Moreover, the StreamTokenizer constructor you used (taking an InputStream) is deprecated. Using both a map and a set was unnecessary, because you can iterate over the key set of the map with the keySet() method. The map now goes from String (the word) to Integer (the word count).
Anyway, for the example you gave, I think a simple split would have been more appropriate.
For further information about the wordChars method of StreamTokenizer, take a look at StreamTokenizer#wordChars(int, int).
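For completeness, a rough sketch of that split-based alternative (my own snippet, assuming the file is read line by line and words are separated by whitespace, so a token like "c++" stays intact):
HashMap<String, Integer> counts = new HashMap<>();
BufferedReader reader = new BufferedReader(new FileReader(str));
String line;
while ((line = reader.readLine()) != null) {
    for (String token : line.trim().split("\\s+")) {
        if (token.isEmpty()) continue; // skip blank lines
        Integer old = counts.get(token);
        counts.put(token, old == null ? 1 : old + 1);
    }
}
reader.close();
System.out.println(counts); // note: iteration order of a HashMap is unspecified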

Assign a unique key to repeated ArrayList items and keep track of ordering in Java

I have data like this:
In an ArrayList of Strings I am collecting names, for example:
souring.add(some word);
Later I have something like souring = {a,b,c,d,d,e,e,e,f}
I want to assign each element a key like:
0=a
1=b
2=c
3=d
3=d
4=e
4=e
4=e
5=f
and then I store all the ordering keys in an array, like:
array = [0,1,2,3,3,4,4,4,5]
Here's the code I am working on:
public void parseFile(String path) {
    String myData = "";
    try {
        BufferedReader br = new BufferedReader(new FileReader(path)); {
            int remainingLines = 0;
            String stringYouAreLookingFor = "";
            for (String line1; (line1 = br.readLine()) != null; ) {
                myData = myData + line1;
                if (line1.contains("relation ") && line1.endsWith(";")) {
                    remainingLines = 4; // <Number of Lines you want to read after keyword>;
                    stringYouAreLookingFor += line1;
                    String everyThingInsideParentheses = stringYouAreLookingFor.replaceFirst(".*\\((.*?)\\).*", "$1");
                    String[] splitItems = everyThingInsideParentheses.split("\\s*,\\s*");
                    String[] sourceNode = new String[10];
                    String[] destNode = new String[15];
                    int i = 0;
                    int size = splitItems.length;
                    int no_of_sd = size;
                    tv.setText(tv.getText() + "size " + size + "\n" + "\n" + "\n");
                    sourceNode[0] = splitItems[i];
                    // here I want to check and assign keys and track order...
                    souring.add(names);
                    if (size == 2) {
                        destNode[0] = splitItems[i+1];
                        tv.setText(tv.getText() + "dest node = " + destNode[0] + "\n" + "\n" + "\n");
                        destination.add(destNode[0]);
                    } else {
                        tv.setText(tv.getText() + "dest node = No destination found" + "\n" + "\n" + "\n");
                    }
                } else if (remainingLines > 0) {
                    remainingLines--;
                    stringYouAreLookingFor += line1;
                }
            }
            br.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
How can I do this?
Can anyone help me with this?
I would advise you to use an ArrayList instead of a String[].
So, if you want to add an element you just write:
ArrayList<String> list = new ArrayList<String>();
list.add("whatever you want");
Then, if you want to avoid repetitions just use the following concept:
if (!list.contains(someString)) {
    list.add(someString);
}
And if you want to access some element you just type:
list.get(index);
Or you can easily find the index of an element:
int index = list.indexOf(someString);
Hope it helps!
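Building on that, a minimal sketch of the keying itself (my own snippet, assuming java.util.ArrayList and java.util.Arrays are imported and souring already holds {a,b,c,d,d,e,e,e,f}): keep a second list of distinct values, and the key for each word is simply its index in that list.
ArrayList<String> souring = new ArrayList<String>(Arrays.asList("a","b","c","d","d","e","e","e","f"));
ArrayList<String> distinct = new ArrayList<String>();
int[] keys = new int[souring.size()];
for (int i = 0; i < souring.size(); i++) {
    String word = souring.get(i);
    if (!distinct.contains(word)) {
        distinct.add(word); // first time this word is seen
    }
    keys[i] = distinct.indexOf(word); // key = index of the word's first occurrence
}
System.out.println(Arrays.toString(keys)); // [0, 1, 2, 3, 3, 4, 4, 4, 5]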
Why don't you give it a try? It takes time to understand what you actually want.
HashMap<Integer, String> storeValueWithKey = new HashMap<>();
// let x = 4 be an existing key and y = "x" be the new value you want to insert
if (storeValueWithKey.containsKey(x))
    storeValueWithKey.put(x, storeValueWithKey.get(x) + "," + y);
else
    storeValueWithKey.put(z, y); // here z is a new key
// Then, for searching, let key = 4 and searchValue = "a"
ArrayList<String> searchIn = new ArrayList<>(Arrays.asList(storeValueWithKey.get(key).split(",")));
if (searchIn.contains(searchValue)) {
    // the value is present for that key
}
If the problem still persists, comment.

counting unique occurrences of string in document

I am reading a logfile into Java. For each line in the logfile, I check whether the line contains an IP address. If it does, I want to add 1 to the count of the number of times that IP address showed up in the logfile. How can I accomplish this in Java?
The code below successfully extracts the IP address from each line that contains one, but the process for counting occurrences of IP addresses does not work.
void read(String fileName) throws IOException {
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)));
    int counter = 0;
    ArrayList<IPHolder> ips = new ArrayList<IPHolder>();
    try {
        String line;
        while ((line = br.readLine()) != null) {
            if (!getIP(line).equals("0.0.0.0")) {
                if (ips.size() == 0) {
                    IPHolder newIP = new IPHolder();
                    newIP.setIp(getIP(line));
                    newIP.setCount(0);
                    ips.add(newIP);
                }
                for (int j = 0; j < ips.size(); j++) {
                    if (ips.get(j).getIp().equals(getIP(line))) {
                        ips.get(j).setCount(ips.get(j).getCount() + 1);
                    } else {
                        IPHolder newIP = new IPHolder();
                        newIP.setIp(getIP(line));
                        newIP.setCount(0);
                        ips.add(newIP);
                    }
                }
                if (counter % 1000 == 0) { System.out.println(counter + ", " + ips.size()); }
                counter += 1;
            }
        }
    } finally { br.close(); }
    for (int k = 0; k < ips.size(); k++) {
        System.out.println("ip, count: " + ips.get(k).getIp() + " , " + ips.get(k).getCount());
    }
}
public String getIP(String ipString) { // extracts an ip from a string if the string contains an ip
    String IPADDRESS_PATTERN =
        "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);
    Matcher matcher = pattern.matcher(ipString);
    if (matcher.find()) {
        return matcher.group();
    } else {
        return "0.0.0.0";
    }
}
The holder class is:
public class IPHolder {
    private String ip;
    private int count;
    public String getIp() { return ip; }
    public void setIp(String i) { ip = i; }
    public int getCount() { return count; }
    public void setCount(int ct) { count = ct; }
}
The keyword to search for in this case is HashMap.
A HashMap stores key-value pairs (in this case, IP addresses and their counts):
"192.168.1.12" - 12
"192.168.1.13" - 17
"192.168.1.14" - 9
and so on.
It is much easier to use and access than iterating over your array of container objects every time to find out whether there already is a container for that IP.
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(/* Your file */)));
HashMap<String, Integer> occurrences = new HashMap<String, Integer>();
String line = null;
while ((line = br.readLine()) != null) {
    // Iterate over lines and search for ip address patterns
    String[] addressesFoundInLine = ...;
    for (String ip : addressesFoundInLine) {
        // Did you already have that address in your file earlier? If yes, increase its counter by one
        if (occurrences.containsKey(ip))
            occurrences.put(ip, occurrences.get(ip) + 1);
        // If not, create a new entry for this address
        else
            occurrences.put(ip, 1);
    }
}
// TreeMaps are automatically ordered if their keys implement Comparable, which is the case for Strings and Integers
TreeMap<Integer, ArrayList<String>> turnedAround = new TreeMap<Integer, ArrayList<String>>();
Set<Entry<String, Integer>> es = occurrences.entrySet();
// Switch keys and values of the HashMap and create a new TreeMap (in case two ips have the same count, add them to a list)
for (Entry<String, Integer> en : es) {
    if (turnedAround.containsKey(en.getValue()))
        turnedAround.get(en.getValue()).add(en.getKey());
    else {
        ArrayList<String> ips = new ArrayList<String>();
        ips.add(en.getKey());
        turnedAround.put(en.getValue(), ips);
    }
}
// Print out the values (if two ips have the same count they are printed without any special order; that would require another sorting step)
for (Entry<Integer, ArrayList<String>> entry : turnedAround.entrySet()) {
    for (String s : entry.getValue())
        System.out.println(s + " - " + entry.getKey());
}
In my case the output was the following:
192.168.1.19 - 4
192.168.1.18 - 7
192.168.1.27 - 19
192.168.1.13 - 19
192.168.1.12 - 28
I answered this question about half an hour ago and I guess that is exactly what you are searching for, so if you need some example code, take a look at it.
Here is some code that uses a HashMap to store the IPs and a regex to match them in each line. It uses try-with-resources to automatically close the file.
EDIT: I added code to print in descending order like you asked in the other answer.
void read(String fileName) throws IOException {
    // Step 1: find IPs and store their occurrence counts
    HashMap<String, Integer> ipAddressCounts = new HashMap<>();
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)))) {
        Pattern findIPAddrPattern = Pattern.compile("((\\d+\\.){3}\\d+)");
        String line;
        while ((line = br.readLine()) != null) {
            Matcher matcher = findIPAddrPattern.matcher(line);
            while (matcher.find()) {
                String ipAddr = matcher.group(0);
                if (ipAddressCounts.get(ipAddr) == null) {
                    ipAddressCounts.put(ipAddr, 1);
                } else {
                    ipAddressCounts.put(ipAddr, ipAddressCounts.get(ipAddr) + 1);
                }
            }
        }
    }
    // Step 2: reverse the map to group IPs by their frequency
    HashMap<Integer, HashSet<String>> countToAddrs = new HashMap<>();
    for (Map.Entry<String, Integer> entry : ipAddressCounts.entrySet()) {
        Integer count = entry.getValue();
        if (countToAddrs.get(count) == null)
            countToAddrs.put(count, new HashSet<String>());
        countToAddrs.get(count).add(entry.getKey());
    }
    // Step 3: sort and print the ip addresses, most frequent first
    ArrayList<Integer> allCounts = new ArrayList<>(countToAddrs.keySet());
    Collections.sort(allCounts, Collections.reverseOrder());
    for (Integer count : allCounts) {
        for (String ip : countToAddrs.get(count)) {
            System.out.println("ip, count: " + ip + " , " + count);
        }
    }
}

How to get the frequently occurring words from the text extracted using Tika

I have extracted text from multiple file formats (pdf, html, doc) using the code below (using Tika):
File file1 = new File("c://sample.pdf");
InputStream input = new FileInputStream(file1);
BodyContentHandler handler = new BodyContentHandler(10*1024*1024);
JSONObject obj = new JSONObject();
obj.put("Content",handler.toString());
Now my requirement is to get the frequently occurring words from the extracted content. Can you please suggest how to do this?
Thanks
Here's a function that returns the most frequent word.
You pass the content to the function, and it returns the most frequently occurring word.
String getMostFrequentWord(String input) {
    String[] words = input.split(" ");
    // Create a dictionary using the word as key and its frequency as value
    Map<String, Integer> dictionary = new HashMap<String, Integer>();
    for (String word : words) {
        if (dictionary.containsKey(word)) {
            int frequency = dictionary.get(word);
            dictionary.put(word, frequency + 1);
        } else {
            dictionary.put(word, 1);
        }
    }
    int max = 0;
    String mostFrequentWord = "";
    Set<Entry<String, Integer>> set = dictionary.entrySet();
    for (Entry<String, Integer> entry : set) {
        if (entry.getValue() > max) {
            max = entry.getValue();
            mostFrequentWord = entry.getKey();
        }
    }
    return mostFrequentWord;
}
The algorithm is O(n) so the performance should be okay.
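If you need the top few words rather than a single one, a small variation (a sketch, assuming Java 8+, java.util imports, and java.util.stream.Collectors) sorts the same dictionary by value:
// Sketch: return the n most frequent words from the dictionary built above
List<String> topWords(Map<String, Integer> dictionary, int n) {
    return dictionary.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
}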
