Input/Output blank space - java

In this code, I read all the words from a file and count them. After that, I write the words and their frequencies to a file.
This code does exactly what I want, but it also counts blank strings and writes them to the file. How can I exclude them?
String line;
BigDecimal count = new BigDecimal(0);
ArrayList<String> words = new ArrayList<String>();
Pattern pattern = Pattern.compile("[^a-zA-Z]", Pattern.CASE_INSENSITIVE);
while ((line = reader.readLine()) != null) {
String string1 = line.toLowerCase();
String string[] = pattern.split(string1);
for (String s : string) {
words.add(s);
}
}
Map<String, BigDecimal> map = new HashMap<String, BigDecimal>();
for (String s : words) {
BigDecimal x = new BigDecimal(1);
if (map.containsKey(s)) {
count = map.get(s);
map.put(s, count.add(x));
} else if (!map.containsKey(s)) {
map.put(s, x);
}
}
Map<String, BigDecimal> wordHistogram = map;
List<Entry<String, BigDecimal>> sortedWordHistogram = new LinkedList<Entry<String, BigDecimal>>(
wordHistogram.entrySet());
Collections.sort(sortedWordHistogram, (o1, o2) -> o2.getValue().compareTo(o1.getValue()));
Map<String, BigDecimal> inTxt = map;
for (Entry<String, BigDecimal> entry : sortedWordHistogram) {
inTxt.put(entry.getKey(), entry.getValue());
writer.write(entry.getKey() + " : " + entry.getValue() + "\n");
}
I believe it is efficient enough, but any suggestion to make it better or more efficient is welcome.

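Simply replace your regex ([^a-zA-Z]) with \\s+.
Your pattern splits on every single non-letter character, so two separators in a row produce an empty string between them; \\s+ treats a whole run of whitespace as one separator. Note that \\s+ splits on whitespace only, so punctuation stays attached to the words.
Also, you can simplify your code further by replacing the following lines: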
Pattern pattern = Pattern.compile("[^a-zA-Z]", Pattern.CASE_INSENSITIVE);
while ((line = reader.readLine()) != null) {
String string1 = line.toLowerCase();
String string[] = pattern.split(string1);
for (String s : string) {
words.add(s);
}
}
with
while ((line = reader.readLine()) != null) {
String string[] = line.trim().toLowerCase().split("\\s+");
for (String s : string) {
words.add(s);
}
}
Note that I have also used trim() to remove leading and trailing whitespace from the line before splitting it.
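One case the snippets above may still let through is a blank line: splitting an empty string returns a single empty token. Here is a minimal sketch that keeps the original letters-only idea but drops empty tokens explicitly. The class name TokenizeSketch and the helper wordsOf are made up for illustration, and the pattern [^a-zA-Z]+ (with the added +) is an assumption about what counts as a separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TokenizeSketch {
    // One or more non-letters are treated as a single separator (assumption).
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    // Returns the lower-cased words of a line, skipping empty tokens.
    static List<String> wordsOf(String line) {
        List<String> words = new ArrayList<>();
        for (String token : NON_LETTERS.split(line.toLowerCase())) {
            if (!token.isEmpty()) { // a leading separator or a blank line yields ""
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsOf("  Hello,  world! hello")); // [hello, world, hello]
        System.out.println(wordsOf(""));                        // []
    }
}
```

With this helper, the loop in the question could call words.addAll(wordsOf(line)) for each line it reads.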

Related

number of element occurrence

I'm trying to find the number of occurrences of each element using a TreeSet and a HashMap.
When I run the program, the value in the HashMap does not increase.
I've tried map.put(data,map.get(data)+1), but it causes a NullPointerException.
public class ReadData {
public static void main(String[] args) {
File f = new File("E:\\new1.txt");
try {
BufferedReader br = new BufferedReader(new FileReader(f));
String data = "";
int count =1;
HashMap<String,Integer> map = null;
TreeSet<String> set = new TreeSet<String>();
set.add("");
while((data=br.readLine())!=null) {
map = new HashMap<String,Integer>();
if(set.contains(data)) {
map.put(data,map.get(data)+1);
System.out.println("correct");
System.out.println(count+1);
}else
{
map.put(data,count);
set.add(data);
System.out.println("Not correct");
}
//System.out.println(map);
Set sets = map.entrySet();
Iterator iterator = sets.iterator();
while(iterator.hasNext()) {
Map.Entry mentry = (Map.Entry)iterator.next();
System.out.print("key is: "+ mentry.getKey() + " & Value is: ");
System.out.println(mentry.getValue());
}
}
}catch(Exception e) {
System.out.println(e);
}
}
}
Input:
orange
apple
orange
orange
Expected output:
key is orange & value is 3
key is apple & value is 1
Actual output:
key is: orange & Value is: 1
key is: apple & Value is: 1
java.lang.NullPointerException
You can do it more cleanly using streams, with Collectors.groupingBy() and Collectors.counting(). You should also use the try-with-resources construct and the newer Files class:
String delimiter = " ";
Path p = Paths.get("E:", "file.txt");
try (BufferedReader br = Files.newBufferedReader(p)) {
Map<String, Long> result = br.lines()
.flatMap(l -> Arrays.stream(l.split(delimiter)))
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
System.out.println(result);
}
For orange apple orange orange input this code will print {orange=3, apple=1}.
Please note that
HashMap<String,Integer> map = null;
is not the same as an empty map. First of all, you must create a map before you use it.
In this case use, for example,
HashMap<String,Integer> map = new HashMap<String,Integer>();
You are also creating a new map inside the loop on every iteration. That discards the counts collected so far, so map.get(data) returns null and the +1 throws the NullPointerException. I would suggest instantiating your map together with the set and removing
map = new HashMap<String,Integer>();
from inside the while loop.
Your code should look like
HashMap<String, Integer> map = new HashMap<String, Integer>();
TreeSet<String> set = new TreeSet<String>();
set.add("");
while ((data = br.readLine()) != null) {
You can also use TreeMap instead of using HashMap + TreeSet.
public class ReadData {
public static void main(String[] args) {
try {
File f = new File("E:\\new1.txt");
BufferedReader br = new BufferedReader(new FileReader(f));
TreeMap<String,Integer> map = new TreeMap<>();
String data;
while((data=br.readLine()) != null) {
String[] fruitNames = data.split(" "); // or the regex \\s+ can also be used
for(String fruitName : fruitNames){
Integer count = map.get(fruitName);
Integer newVal = count == null ? 1 : count+1 ;
map.put(fruitName, newVal);
}
// iterate over keys in TreeMap
}
}catch(Exception e) {
System.out.println(e);
}
}
}
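The null check and put in the TreeMap answer can be collapsed into one call with Map.merge (available since Java 8). The sketch below assumes the lines have already been read into a list; the class name MergeCountSketch and the method count are illustrative only.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class MergeCountSketch {
    // Counts whitespace-separated words across all lines; keys come back sorted.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // Inserts 1 for a new key, otherwise adds 1 to the stored value.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {apple=1, orange=3}
        System.out.println(count(Arrays.asList("orange", "apple", "orange", "orange")));
    }
}
```

Because merge never reads a missing key as null, the NullPointerException from the question cannot occur here.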
If you want to count the occurrences of a string, you can simply use StringUtils.countMatches from Apache Commons Lang.
//First get all the words from your line -
String[] allWords = data.split("\\s");
//Retrieve unique strings
String[] uniqueStrings = Arrays.stream(allWords).distinct().toArray(String[]::new);
// Print the occurrence of each string in data
for (String word: uniqueStrings){
System.out.println("Count of occurrences for the word " + word + " is: " + StringUtils.countMatches(data, word));
}
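One caveat: StringUtils.countMatches counts substring matches, so a short word such as "an" is also counted inside "orange" and "and". If whole-word counts are wanted, a sketch using only the standard library could look like this. The class and method names are made up, and splitting on whitespace is an assumption about what a word is.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class WholeWordCountSketch {
    // Counts how many whitespace-separated tokens are exactly equal to 'word'.
    static int countWord(String data, String word) {
        List<String> tokens = Arrays.asList(data.trim().split("\\s+"));
        return Collections.frequency(tokens, word);
    }

    public static void main(String[] args) {
        // prints 2: only the two standalone "an" tokens are counted
        System.out.println(countWord("an orange and an apple", "an"));
    }
}
```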

get count number of HashMap value

Using the code from this link loading text file contents to GUI:
Map<String, String> sections = new HashMap<>();
Map<String, String> sections2 = new HashMap<>();
String s = "", lastKey="";
try (BufferedReader br = new BufferedReader(new FileReader("input.txt"))) {
while ((s = br.readLine()) != null) {
String k = s.substring(0, 10).trim();
String v = s.substring(10, s.length() - 50).trim();
if (k.equals(""))
k = lastKey;
if(sections.containsKey(k))
v = sections.get(k) + v;
sections.put(k,v);
lastKey = k;
}
} catch (IOException e) {
}
System.out.println(sections.get("AUTHOR"));
System.out.println(sections2.get("TITLE"));
Suppose input.txt contains:
AUTHOR authors name
authors name
authors name
authors name
TITLE Sound, mobility and landscapes of exhibition: radio-guided
tours at the Science Museum
Now I want to count the values in the HashMap, but sections.size() counts all the data lines stored in the text file.
I would like to ask how I can count the items, i.e. the values v in sections. How can I get the number 4, the number of author names?
Since AUTHOR has a one-to-many relationship, you should map it to a List instead of a String.
For example:
Map<String, ArrayList<String>> sections = new HashMap<>();
Map<String, String> sections2 = new HashMap<>();
String s = "", lastKey="";
try (BufferedReader br = new BufferedReader(new FileReader("input.txt"))) {
while ((s = br.readLine()) != null) {
String k = s.substring(0, 10).trim();
String v = s.substring(10, s.length() - 50).trim();
if (k.equals(""))
k = lastKey;
ArrayList<String> authors = null;
if(sections.containsKey(k))
{
authors = sections.get(k);
}
else
{
authors = new ArrayList<String>();
sections.put(k, authors);
}
authors.add(v);
lastKey = k;
}
} catch (IOException e) {
}
// to get the number of authors
int numOfAuthors = sections.get("AUTHOR").size();
// convert the list to a string to load it in a GUI
String authors = "";
for (String a : sections.get("AUTHOR"))
{
authors += a;
}
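The containsKey/else branch above can be shortened with Map.computeIfAbsent, and String.join can build the display string. The following is only a sketch: the rows are passed in as already-split {key, value} pairs (a simplification of the fixed-width parsing in the question), and the names SectionGroupingSketch and group are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SectionGroupingSketch {
    // Each row is {key, value}; an empty key means "continue the previous key".
    static Map<String, List<String>> group(String[][] rows) {
        Map<String, List<String>> sections = new HashMap<>();
        String lastKey = "";
        for (String[] row : rows) {
            String key = row[0].isEmpty() ? lastKey : row[0];
            // Creates the list on first use, then appends the value to it.
            sections.computeIfAbsent(key, k -> new ArrayList<>()).add(row[1]);
            lastKey = key;
        }
        return sections;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"AUTHOR", "name one"}, {"", "name two"}, {"", "name three"},
            {"", "name four"}, {"TITLE", "some title"}
        };
        Map<String, List<String>> sections = group(rows);
        System.out.println(sections.get("AUTHOR").size());              // 4
        System.out.println(String.join(", ", sections.get("AUTHOR"))); // for display
    }
}
```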

counting unique occurrences of string in document

I am reading a logfile into java. For each line in the logfile, I am checking to see if the line contains an ip address. If the line contains an ip address, I want to then +1 to the count of the number of times that ip address showed up in the log file. How can I accomplish this in Java?
The code below successfully extracts the ip address from each line that contains an ip address, but the process for counting occurrences of ip addresses does not work.
void read(String fileName) throws IOException {
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)));
int counter = 0;
ArrayList<IPHolder> ips = new ArrayList<IPHolder>();
try {
String line;
while ((line = br.readLine()) != null) {
if(!getIP(line).equals("0.0.0.0")){
if(ips.size()==0){
IPHolder newIP = new IPHolder();
newIP.setIp(getIP(line));
newIP.setCount(0);
ips.add(newIP);
}
for(int j=0;j<ips.size();j++){
if(ips.get(j).getIp().equals(getIP(line))){
ips.get(j).setCount(ips.get(j).getCount()+1);
}else{
IPHolder newIP = new IPHolder();
newIP.setIp(getIP(line));
newIP.setCount(0);
ips.add(newIP);
}
}
if(counter % 1000 == 0){System.out.println(counter+", "+ips.size());}
counter+=1;
}
}
} finally {br.close();}
for(int k=0;k<ips.size();k++){
System.out.println("ip, count: "+ips.get(k).getIp()+" , "+ips.get(k).getCount());
}
}
public String getIP(String ipString){//extracts an ip from a string if the string contains an ip
String IPADDRESS_PATTERN =
"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);
Matcher matcher = pattern.matcher(ipString);
if (matcher.find()) {
return matcher.group();
}
else{
return "0.0.0.0";
}
}
The holder class is:
public class IPHolder {
private String ip;
private int count;
public String getIp(){return ip;}
public void setIp(String i){ip=i;}
public int getCount(){return count;}
public void setCount(int ct){count=ct;}
}
The key word to search for is HashMap in this case.
A HashMap stores key-value pairs (in this case, pairs of IPs and their counts).
"192.168.1.12" - 12
"192.168.1.13" - 17
"192.168.1.14" - 9
and so on.
It is much easier to use and access than to always iterate over your array of container objects to find out whether there already is a container for that ip or not.
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(/*Your file */)));
HashMap<String, Integer> occurrences = new HashMap<String, Integer>();
String line = null;
while( (line = br.readLine()) != null) {
// Iterate over lines and search for ip address patterns
String[] addressesFoundInLine = ...;
for(String ip: addressesFoundInLine ) {
// Did you already have that address in your file earlier? If yes, increase its counter by 1
if(occurrences.containsKey(ip))
occurrences.put(ip, occurrences.get(ip)+1);
// If not, create a new entry for this address
else
occurrences.put(ip, 1);
}
}
// TreeMaps are automatically ordered by key if the keys implement 'Comparable', which is the case for strings and integers
TreeMap<Integer, ArrayList<String>> turnedAround = new TreeMap<Integer, ArrayList<String>>();
Set<Entry<String, Integer>> es = occurrences.entrySet();
// Switch keys and values of HashMap and create a new TreeMap (in case there are two ips with the same count, add them to a list)
for(Entry<String, Integer> en: es) {
if(turnedAround.containsKey(en.getValue()))
turnedAround.get(en.getValue()).add((String) en.getKey());
else {
ArrayList<String> ips = new ArrayList<String>();
ips.add(en.getKey());
turnedAround.put(en.getValue(), ips);
}
}
// Print out the values (if two ips have the same count they are printed in no particular order; that would require another sorting step)
for(Entry<Integer, ArrayList<String>> entry: turnedAround.entrySet()) {
for(String s: entry.getValue())
System.out.println(s + " - " + entry.getKey());
}
In my case the output was the following:
192.168.1.19 - 4
192.168.1.18 - 7
192.168.1.27 - 19
192.168.1.13 - 19
192.168.1.12 - 28
I answered this question about half an hour ago and I guess that is exactly what you are searching for, so if you need some example code, take a look at it.
Here is some code that uses a HashMap to store the IPs and a regex to match them in each line. It uses try-with-resources to automatically close the file.
EDIT: I added code to print in descending order like you asked in the other answer.
void read(String fileName) throws IOException {
//Step 1 find and register IPs and store their occurrence counts
HashMap<String, Integer> ipAddressCounts = new HashMap<>();
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)))) {
Pattern findIPAddrPattern = Pattern.compile("((\\d+\\.){3}\\d+)");
String line;
while ((line = br.readLine()) != null) {
Matcher matcher = findIPAddrPattern.matcher(line);
while (matcher.find()) {
String ipAddr = matcher.group(0);
if ( ipAddressCounts.get(ipAddr) == null ) {
ipAddressCounts.put(ipAddr, 1);
}
else {
ipAddressCounts.put(ipAddr, ipAddressCounts.get(ipAddr) + 1);
}
}
}
}
//Step 2 reverse the map to store IPs by their frequency
HashMap<Integer, HashSet<String>> countToAddrs = new HashMap<>();
for (Map.Entry<String, Integer> entry : ipAddressCounts.entrySet()) {
Integer count = entry.getValue();
if ( countToAddrs.get(count) == null )
countToAddrs.put(count, new HashSet<String>());
countToAddrs.get(count).add(entry.getKey());
}
//Step 3 sort and print the ip addresses, most frequent first
ArrayList<Integer> allCounts = new ArrayList<>(countToAddrs.keySet());
Collections.sort(allCounts, Collections.reverseOrder());
for (Integer count : allCounts) {
for (String ip : countToAddrs.get(count)) {
System.out.println("ip, count: " + ip + " , " + count);
}
}
}
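Steps 2 and 3 above can also be written as a single stream that sorts the entries by count. This is a sketch rather than a drop-in replacement: the class name SortByCountSketch is made up, and breaking ties alphabetically by IP is an extra assumption that the answer above does not make.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class SortByCountSketch {
    // Returns a map whose iteration order is: highest count first, ties by key.
    static Map<String, Integer> sortedByCountDesc(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a,          // keys are unique, so this is never used
                        LinkedHashMap::new)); // keeps the sorted order
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("192.168.1.12", 28);
        counts.put("192.168.1.13", 19);
        counts.put("192.168.1.27", 19);
        sortedByCountDesc(counts)
                .forEach((ip, n) -> System.out.println("ip, count: " + ip + " , " + n));
    }
}
```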

OutOfMemoryError: Java heap space-ArrayLists Java

for(int i=0; i<words.size(); i++){
for(int j=0; j<Final.size(); j++){
if(words.get(i)==Final.get(j)){
temp=times.get(j);
temp=temp+1;
times.set(j, temp);
}
else{
Final.add(words.get(i));
times.add(1);
}
}
}
I want to create two ArrayLists: times (Integers) and Final (Strings). The ArrayList "words" holds the words of a string, and some words appear multiple times. What I'm trying to do is add every word of "words" to "Final" (but just once), and add the number of times that word appears in "words" to "times". Is something wrong?
I ask because I get OutOfMemoryError: Java heap space
I also think using a HashMap is the best solution.
There is an error in your code; maybe your problem is here.
Replace the following:
if(words.get(i)==Final.get(j)){
with:
if(words.get(i).equals(Final.get(j))){
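As far as I can tell, the else branch contributes to the memory problem as well: it appends the word once for every non-matching entry of Final, so (assuming Final is not empty when the loops start) the list keeps growing while it is being scanned. If you want to keep the two lists, one possible sketch is to look the word up first and append only when it is missing. The class name TwoListCountSketch is made up, and finalWords stands in for Final.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoListCountSketch {
    // Fills finalWords with each distinct word and times with its count,
    // keeping both lists aligned by index.
    static void count(List<String> words, List<String> finalWords, List<Integer> times) {
        for (String word : words) {
            int idx = finalWords.indexOf(word); // indexOf compares with equals()
            if (idx >= 0) {
                times.set(idx, times.get(idx) + 1);
            } else {
                finalWords.add(word); // added once, after the whole list was checked
                times.add(1);
            }
        }
    }

    public static void main(String[] args) {
        List<String> finalWords = new ArrayList<>();
        List<Integer> times = new ArrayList<>();
        count(Arrays.asList("a", "b", "a"), finalWords, times);
        System.out.println(finalWords + " " + times); // [a, b] [2, 1]
    }
}
```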
You don't need two lists to find each word and its count. You can get this from a HashMap whose keys are your words and whose values are their counts.
For example, declare one HashMap
Map<String, Integer> words = new HashMap<String, Integer>();
and then use the map in the following way
try {
//getting content from file.
Scanner inputFile = new Scanner(new File("d:\\test.txt"));
//reading line by line
while (inputFile.hasNextLine()) {
// StringTokenizer splits the string on whitespace by default.
StringTokenizer tokenizer = new StringTokenizer(
inputFile.nextLine());
while (tokenizer.hasMoreTokens()) {
String word = tokenizer.nextToken();
// If the HashMap already contains the key, increment the
// value
if (words.containsKey(word)) {
words.put(word, words.get(word) + 1);
}
// Otherwise, set the value to 1
else {
words.put(word, 1);
}
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// Loop through the HashMap and print the results
for (Entry<String, Integer> entry : words.entrySet()) {
String key = entry.getKey();
Integer value = entry.getValue();
System.out.println("Word " + key + ": its occurrence " + value);
}
If you are running out of memory, the likely cause is reading all the words into a collection first. I suggest you not do this and instead count the words as you read them.
e.g.
Map<String, Integer> freq = new HashMap<>();
try(BufferedReader br = new BufferedReader(new FileReader(filename))) {
for(String line; (line = br.readLine()) != null; ) {
for(String word : line.trim().split("\\s+")) {
Integer count = freq.get(word);
freq.put(word, count == null ? 1 : 1 + count);
}
}
}
Try this example.
String[] words = {"asdf","zvcxc", "asdf", "zxc","zxc", "zxc"};
Map<String, Integer> result = new HashMap<String, Integer>();
for (String word : words) {
if (!result.containsKey(word)) {
result.put(word, 1);
} else {
result.put(word, result.get(word) + 1);
}
}
//print result
for (Map.Entry<String, Integer> entry : result.entrySet()) {
System.out.println(String.format("%s -- %s times", entry.getKey(), entry.getValue()));
}
Output:
zvcxc -- 1 times
zxc -- 3 times
asdf -- 2 times

How to find size of ArrayList<String> in my map?

I want to find the size of each value from the key-value pair in Map<Integer, ArrayList<String>>. Simply writing list.size() does not work.
Here's my code:
public void getF() throws Exception {
BufferedReader br2 =
new BufferedReader(
new FileReader("/home/abc/NetBeansProjects/network1.txt"));
System.out.println("hello" +r.usr);
while ((s= br2.readLine()) != null) {
String F[]= s.split(":");
for (String uid : F) {
if (uid == F[0]) {
user.add(uid);
} else {
li = followee.get(Integer.valueOf(F[0]));
if (li == null) {
followee.put(Integer.valueOf(F[0]), li= new ArrayList<String>());
}
li.add(uid);
}
System.out.println(followee);
int g = li.size();
System.out.println("g:" +g);
[...]
}
}
}
Why am I not getting correct size on last line?
Try to follow the data structures by keeping variables as close as possible to where they are used.
(I know that in other languages the convention is to declare them at the top.)
Here li should be declared at the beginning of each while iteration. And it's more natural to handle F[0] outside the loop, instead of for+if. I think the latter put you on the wrong foot.
Set<String> user = new HashSet<>();
Map<Integer, List<String>> followee = new HashMap<>();
String s;
while ((s = br2.readLine()) != null) {
// s has the format "key:value value value"
String keyAndValues[] = s.split(":", 2);
if (keyAndValues.length != 2) {
continue;
}
Integer key = Integer.valueOf(keyAndValues[0]);
String values = keyAndValues[1];
user.add(keyAndValues[0]);
List<String> li = followee.get(key);
if (li == null) {
li = new ArrayList<>();
followee.put(key, li);
}
Collections.addAll(li, values.split(" +"));
System.out.println(followee);
int g = li.size();
System.out.println("g:" + g);
//[...]
}
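For reference, here is a self-contained sketch of the same idea that can be run without the file. It assumes the "key:value value value" line format mentioned in the comment above; the class name FolloweeSketch and the method parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FolloweeSketch {
    // Parses lines such as "1:a b c" into key -> list of values.
    static Map<Integer, List<String>> parse(List<String> lines) {
        Map<Integer, List<String>> followee = new LinkedHashMap<>();
        for (String line : lines) {
            String[] keyAndValues = line.split(":", 2);
            if (keyAndValues.length != 2) {
                continue; // skip lines without a "key:" prefix
            }
            Integer key = Integer.valueOf(keyAndValues[0].trim());
            List<String> li = followee.computeIfAbsent(key, k -> new ArrayList<>());
            for (String value : keyAndValues[1].trim().split(" +")) {
                if (!value.isEmpty()) {
                    li.add(value);
                }
            }
        }
        return followee;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> followee = parse(Arrays.asList("1:a b c", "2:d", "1:e"));
        for (Map.Entry<Integer, List<String>> e : followee.entrySet()) {
            // prints "1 -> 4" and then "2 -> 1"
            System.out.println(e.getKey() + " -> " + e.getValue().size());
        }
    }
}
```

Reading the size after the whole file has been processed, rather than inside the loop, gives the final count per key.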
