Java read txt file to hashmap, split by ":"

I have a txt file with the form:
Key:value
Key:value
Key:value
...
I want to put all the keys with their values into a HashMap that I've created. How do I get a FileReader(file) or Scanner(file) to split the keys and values at the colon (:)? :-)
I've tried:
Scanner scanner = new Scanner(file).useDelimiter(":");
HashMap<String, String> map = new HashMap<>();
while(scanner.hasNext()){
map.put(scanner.next(), scanner.next());
}

Read your file line by line using a BufferedReader, and split each line on the first occurrence of : within the line (a line that contains no : is ignored).
Here is some example code. It avoids Scanner, which has some subtle behaviors and, IMHO, is more trouble than it's worth.
public static void main( String[] args ) throws IOException
{
String filePath = "test.txt";
HashMap<String, String> map = new HashMap<String, String>();
String line;
BufferedReader reader = new BufferedReader(new FileReader(filePath));
while ((line = reader.readLine()) != null)
{
String[] parts = line.split(":", 2);
if (parts.length >= 2)
{
String key = parts[0];
String value = parts[1];
map.put(key, value);
} else {
System.out.println("ignoring line: " + line);
}
}
for (String key : map.keySet())
{
System.out.println(key + ":" + map.get(key));
}
reader.close();
}
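If you would rather keep the Scanner from the question: the attempt misbehaves because useDelimiter(":") replaces the default whitespace delimiter, so line breaks no longer end a token and something like "value\nKey" ends up in the map. A minimal sketch of a workaround, assuming no key or value contains a colon and the file has no blank lines, is a delimiter that matches either a colon or a line break. The class name ScannerSplit and the method parse are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class ScannerSplit {
    // Hypothetical helper: parses "key:value" lines from a block of text.
    static Map<String, String> parse(String text) {
        Map<String, String> map = new HashMap<>();
        // A token ends at a colon OR at a line break (\R matches \n, \r\n, ...).
        try (Scanner scanner = new Scanner(text).useDelimiter(":|\\R")) {
            while (scanner.hasNext()) {
                String key = scanner.next();
                if (!scanner.hasNext()) {
                    break; // dangling key without a value: ignore it
                }
                map.put(key, scanner.next());
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("Key1:value1\nKey2:value2\n");
        System.out.println(map.get("Key1")); // prints value1
    }
}
```

With a file, construct the Scanner from the File instead of a String; the delimiter logic stays the same.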

The below will work in java 8.
The .filter(s -> s.matches("^\\w+:\\w+$")) means it only processes lines in the file that consist of two strings separated by :. Obviously, fiddling with this regex will change what it lets through.
The .collect(Collectors.toMap(k -> k.split(":")[0], v -> v.split(":")[1])) works on the lines that pass the previous filter: it splits them on :, then uses the first part of the split as the key of a map entry and the second part as the value.
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Collectors;
public class Foo {
public static void main(String[] args) throws IOException {
String filePath = "src/main/resources/somefile.txt";
Path path = FileSystems.getDefault().getPath(filePath);
Map<String, String> mapFromFile = Files.lines(path)
.filter(s -> s.matches("^\\w+:\\w+"))
.collect(Collectors.toMap(k -> k.split(":")[0], v -> v.split(":")[1]));
}
}

One more JDK 1.8 implementation.
I suggest using try-with-resources and a forEach loop with the putIfAbsent() method, which avoids the java.lang.IllegalStateException: Duplicate key that Collectors.toMap throws when the file contains duplicate keys.
FileToHashMap.java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.HashMap;
import java.util.stream.Stream;
public class FileToHashMap {
public static void main(String[] args) throws IOException {
String delimiter = ":";
Map<String, String> map = new HashMap<>();
try(Stream<String> lines = Files.lines(Paths.get("in.txt"))){
lines.filter(line -> line.contains(delimiter)).forEach(
line -> map.putIfAbsent(line.split(delimiter)[0], line.split(delimiter)[1])
);
}
System.out.println(map);
}
}
in.txt
Key1:value 1
Key1:duplicate key
Key2:value 2
Key3:value 3
The output is:
{Key1=value 1, Key2=value 2, Key3=value 3}
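The same first-value-wins behaviour can also be expressed with the four-argument Collectors.toMap, whose merge function decides what happens on a duplicate key. The sketch below is one possible variant, not a replacement for the code above: it reads from an in-memory list instead of in.txt so it runs on its own, and it splits with a limit of 2 so a value that itself contains the delimiter stays intact. The class name LinesToMap and the method toMap are made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LinesToMap {
    // Note: split() treats the delimiter as a regex; ":" is safe as-is.
    static Map<String, String> toMap(List<String> lines, String delimiter) {
        return lines.stream()
                .filter(line -> line.contains(delimiter))
                // limit 2: split only at the first delimiter
                .map(line -> line.split(delimiter, 2))
                .collect(Collectors.toMap(
                        parts -> parts[0],
                        parts -> parts[1],
                        (first, second) -> first, // duplicate key: keep the first value
                        LinkedHashMap::new));     // keep the order of the input
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Key1:value 1", "Key1:duplicate key", "Key2:value 2", "Key3:value 3");
        System.out.println(toMap(lines, ":"));
        // prints {Key1=value 1, Key2=value 2, Key3=value 3}
    }
}
```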

I would do it like this
Properties properties = new Properties();
// filePath is a placeholder for the path of your file
properties.load(new FileInputStream(filePath));
// myMap is the HashMap<String, String> you created beforehand
for (Map.Entry<Object, Object> entry : properties.entrySet()) {
myMap.put((String) entry.getKey(), (String) entry.getValue());
}
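This works because java.util.Properties accepts a colon as well as = as the key/value separator. A self-contained sketch of the same idea follows; it loads from a StringReader so it can run without a file, and the class name PropertiesToMap and the method toMap are invented for the example. Be aware that the properties format also treats lines starting with # or ! as comments, treats backslashes as escapes, and ends the key at the first whitespace, so it only fits files whose keys and values are that simple.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PropertiesToMap {
    // Hypothetical helper: copies every property into a plain HashMap.
    static Map<String, String> toMap(Reader source) {
        Properties properties = new Properties();
        try {
            properties.load(source); // ':' and '=' both separate key and value
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> map = new HashMap<>();
        for (String key : properties.stringPropertyNames()) {
            map.put(key, properties.getProperty(key));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = toMap(new StringReader("Key1:value 1\nKey2:value 2\n"));
        System.out.println(map.get("Key1")); // prints value 1
    }
}
```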

Related

Java. Extracting character from array that isn't ASCII

I'm trying to extract a certain character from a buffer that isn't ASCII. I'm reading in a file that contains movie names that have some non ASCII character sprinkled in it like so.
1|Tóy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Gét Shorty (1995)
I was able to pick out the lines that contain non-ASCII characters, but I'm trying to figure out how to get the particular non-ASCII character from those lines and replace it with an ASCII character from the map I've made.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class Main {
public static void main(String[] args) {
HashMap<Character, Character>Char_Map = new HashMap<>();
Char_Map.put('o','ó');
Char_Map.put('e','é');
Char_Map.put('i','ï');
for(Map.Entry<Character,Character> entry: Char_Map.entrySet())
{
System.out.println(entry.getKey() + " -> "+ entry.getValue());
}
try
{
BufferedReader br = new BufferedReader(new FileReader("movie-names.txt"));
String contentLine= br.readLine();
while(contentLine != null)
{
String[] contents = contentLine.split("\\|");
boolean result = contents[1].matches("\\A\\p{ASCII}*\\z");
if(!result)
{
System.out.println(contentLine);
//System.out.println();
}
contentLine= br.readLine();
}
}
catch (IOException ioe)
{
System.out.println("Cannot open file as it doesn't exist");
}
}
}
I tried using something along the lines of:
if((contentLine.charAt(i) == something
But I'm not sure.

You can just use replaceAll. Put this in the while loop, so that it works on each line you read from the file. Strings are immutable, so assign the result back to contentLine. With this change, you won't need the split and if (... matches) anymore.
contentLine = contentLine.replaceAll("ó", "o");
contentLine = contentLine.replaceAll("é", "e");
contentLine = contentLine.replaceAll("ï", "i");
If you want to keep a map, just iterate over its keys and replace with the values you want to map to:
Map<String, String> map = new HashMap<>();
map.put("ó", "o");
// ... and all the others
Later, in your loop reading the contents, you replace all the characters:
for (Map.Entry<String, String> entry : map.entrySet())
{
String oldChar = entry.getKey();
String newChar = entry.getValue();
contentLine = contentLine.replaceAll(oldChar, newChar);
}
Here is a complete example:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;
public class Main {
public static void main(String[] args) throws Exception {
HashMap<String, String> nonAsciiToAscii = new HashMap<>();
nonAsciiToAscii.put("ó", "o");
nonAsciiToAscii.put("é", "e");
nonAsciiToAscii.put("ï", "i");
BufferedReader br = new BufferedReader(new FileReader("movie-names.txt"));
String contentLine = br.readLine();
while (contentLine != null)
{
for (Map.Entry<String, String> entry : nonAsciiToAscii.entrySet())
{
String oldChar = entry.getKey();
String newChar = entry.getValue();
contentLine = contentLine.replaceAll(oldChar, newChar);
}
System.out.println(contentLine); // or whatever else you want to do with the cleaned lines
contentLine = br.readLine();
}
}
}
This prints:
robert:~$ javac Main.java && java Main
1|Toy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Get Shorty (1995)
robert:~$

You want to flip your keys and values:
Map<Character, Character> charMap = new HashMap<>();
charMap.put('ó','o');
charMap.put('é','e');
charMap.put('ï','i');
and then get the mapped character:
char mappedChar = charMap.getOrDefault(inputChar, inputChar);
To get the chars for a string, call String#toCharArray()
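Put together, the lookup over toCharArray() might look like the sketch below. The class name AsciiMapper and the method toAscii are made up for illustration, and the non-ASCII characters are written as Unicode escapes only so the example does not depend on the encoding of the source file.

```java
import java.util.HashMap;
import java.util.Map;

public class AsciiMapper {
    private static final Map<Character, Character> CHAR_MAP = new HashMap<>();
    static {
        CHAR_MAP.put('\u00f3', 'o'); // o with acute accent
        CHAR_MAP.put('\u00e9', 'e'); // e with acute accent
        CHAR_MAP.put('\u00ef', 'i'); // i with diaeresis
    }

    // Replaces every mapped character; unmapped characters are kept as they are.
    static String toAscii(String input) {
        char[] chars = input.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = CHAR_MAP.getOrDefault(chars[i], chars[i]);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(toAscii("4|G\u00e9t Shorty (1995)")); // prints 4|Get Shorty (1995)
    }
}
```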

Duplicate word frequencies issues in Java [duplicate]

[I am new to Java and Stack Overflow. My last question was closed, so I have added the complete code this time. Thanks.] I have a large 4 GB text file (vocab.txt). It contains plain Bangla (Unicode) words. Each line holds one word and its frequency, separated by an equals sign. For example:
আমার=5
তুমি=3
সে=4
আমার=3 //duplicate of the 1st word with a different frequency
করিম=8
সে=7 //duplicate of the 3rd word with a different frequency
As you can see, the same word appears multiple times with different frequencies. How can I keep only a single entry per word, with the sum of the frequencies of all its duplicates? For example, the file above would become (output.txt):
আমার=8 //5+3
তুমি=3
সে=11 //4+7
করিম=8
I have used a HashMap to solve the problem, but I think I made a mistake somewhere. The program runs, but it writes exactly the same data to the output file without changing anything.
package data_correction;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.OutputStreamWriter;
import java.util.*;
import java.awt.Toolkit;
public class Main {
public static void main(String args[]) throws Exception {
FileInputStream inputStream = null;
Scanner sc = null;
String path="C:\\DATA\\vocab.txt";
FileOutputStream fos = new FileOutputStream("C:\\DATA\\output.txt",true);
BufferedWriter bufferedWriter = new BufferedWriter(
new OutputStreamWriter(fos,"UTF-8"));
try {
System.out.println("Started!!");
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
line = line.trim();
String [] arr = line.split("=");
Map<String, Integer> map = new HashMap<>();
if (!map.containsKey(arr[0])){
map.put(arr[0],Integer.parseInt(arr[1]));
}
else{
map.put(arr[0], map.get(arr[0]) + Integer.parseInt(arr[1]));
}
for(Map.Entry<String, Integer> each : map.entrySet()){
bufferedWriter.write(each.getKey()+"="+each.getValue()+"\n");
}
}
bufferedWriter.close();
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
System.out.print("FINISH");
Toolkit.getDefaultToolkit().beep();
}
}
Thanks for your time.

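This should do what you want with some more Java magic:
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;
public static void main(String[] args) throws Exception {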
String separator = "=";
Map<String, Integer> map = new HashMap<>();
try (Stream<String> vocabs = Files.lines(new File("test.txt").toPath(), StandardCharsets.UTF_8)) {
vocabs.forEach(
vocab -> {
String[] pair = vocab.split(separator);
int value = Integer.valueOf(pair[1]);
String key = pair[0];
if (map.containsKey(key)) {
map.put(key, map.get(key) + value);
} else {
map.put(key, value);
}
}
);
}
System.out.println(map);
}
Replace test.txt with the correct file path. Note that the whole map is kept in memory, so for a 4 GB file this may not be the best approach. If necessary, replace the map with, for example, a database-backed store.
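Map.merge can shorten the if/else, and since the question also asks for an output file, the sketch below formats the totals back into key=value lines. It is only an illustration under a few assumptions: every useful line has the form word=number, the whole map fits in memory, and the names FrequencyMerger, sumFrequencies and format are invented here. A LinkedHashMap keeps the words in order of first appearance, as in the expected output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FrequencyMerger {
    // Sums the frequencies of repeated words, keeping first-seen order.
    static Map<String, Integer> sumFrequencies(Iterable<String> lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String[] pair = line.trim().split("=", 2);
            if (pair.length < 2) {
                continue; // skip lines without '='
            }
            totals.merge(pair[0], Integer.parseInt(pair[1].trim()), Integer::sum);
        }
        return totals;
    }

    // Renders the totals as one "word=frequency" line per entry.
    static String format(Map<String, Integer> totals) {
        StringBuilder out = new StringBuilder();
        totals.forEach((word, count) -> out.append(word).append('=').append(count).append('\n'));
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a=5", "b=3", "c=4", "a=3", "d=8", "c=7");
        System.out.print(format(sumFrequencies(lines)));
        // prints a=8, b=3, c=11 and d=8, one per line
    }
}
```

For the real file, the lines can come from the Files.lines stream shown above, and the formatted text can be written with a writer obtained from Files.newBufferedWriter using StandardCharsets.UTF_8.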

Convert ArrayList to TreeMap

I have a small problem of understanding. I will post the code here and try to explain my problem.
I have a first class, ReadSymptomDataFromFile:
package com.hemebiotech.analytics;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
/**
* Simple brute force implementation
*
*/
public class ReadSymptomDataFromFile implements ISymptomReader {
private final String filepath;
/**
*
* @param filepath a full or partial path to file with symptom strings in it, one per line
*/
public ReadSymptomDataFromFile (String filepath) {
this.filepath = filepath;
}
@Override
public List<String> getSymptoms () {
ArrayList<String> result = new ArrayList<>();
if (filepath != null) {
try {
BufferedReader reader = new BufferedReader (new FileReader(filepath));
String line = reader.readLine();
while (line != null) {
result.add(line);
line = reader.readLine();
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
return result;
}
}
This class reads a text file containing a list of symptoms, one per line, in which the same symptom can appear several times. Hence the TreeMap: each symptom (the key) is associated with the number of times it appears (the value).
So far so good.
Then I have this code that I wrote myself, but it does without the class ReadSymptomDataFromFile:
package com.hemebiotech.analytics.Test;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.*;
public class MainAppTest2 {
public static void main (String[] args) {
try {
File file = new File ("Project02Eclipse\\symptoms.txt");
Scanner scan = new Scanner (file);
Map<String, Integer> wordCount = new TreeMap<> ();
while (scan.hasNext ()) {
String word = scan.next ();
if (!wordCount.containsKey (word)) {
wordCount.put (word, 1);
} else {
wordCount.put (word, wordCount.get (word) + 1);
}
}
// Result in console & Write file output
FileWriter writer = new FileWriter ("resultat2.out");
BufferedWriter out = new BufferedWriter (writer);
for (Map.Entry<String, Integer> entry : wordCount.entrySet ()) {
System.out.println ("Valeur: " + entry.getKey () + "| Occurence: " + entry.getValue ());
out.write (entry.getKey () + " = " + entry.getValue () + " \n");
out.flush (); // Force write
}
} catch (IOException e) {
System.out.println ("Fichier introuvable");
}
}
}
This code does much the same thing: it reads a text file, stores the contents in a TreeMap, displays it on the console and saves it to a result file.
Now my problem is that I am trying to split my code into several classes while reusing the existing class ReadSymptomDataFromFile: one class to read the text file, another to convert it all to a TreeMap, another to write the results to an output file, and a final one for exception handling.
I started with this FileToTreeMap class, but it's ugly, it's not clean, and I'm sure there is a better way to convert the result of my ReadSymptomDataFromFile object to a TreeMap:
package com.hemebiotech.analytics.Test.read;
import com.hemebiotech.analytics.ReadSymptomDataFromFile;
import java.util.*;
public class FileToTreeMap {
// Read file
public Map<String, Integer> readFile () {
ReadSymptomDataFromFile list = new ReadSymptomDataFromFile ("Project02Eclipse\\symptoms.txt");
Map<String, Integer> listSort = new TreeMap<> ();
ArrayList<String> test = new ArrayList<> (list.getSymptoms ());
Scanner scan = new Scanner (String.valueOf (test));
while (scan.hasNext ()) {
String word = scan.next ();
if (!listSort.containsKey (word)) {
listSort.put (word, 1);
} else {
listSort.put (word, listSort.get (word) + 1);
}
}
for (Map.Entry<String, Integer> entry : listSort.entrySet ()) {
System.out.println ("Valeur: " + entry.getKey () + " Occurence: " + entry.getValue ());
}
return listSort;
}
}
Here I am a little lost in how to split up my code, and my main problem is converting my ArrayList to a TreeMap.
Sorry for the length of the post, but I would appreciate any help I get, thanks in advance.

To convert a list into a map, what you can do is use a for loop, which goes through the list and adds each item one by one:
ReadSymptomDataFromFile list = new ReadSymptomDataFromFile("Project02Eclipse\\symptoms.txt");
Map<String, Integer> listSort = new TreeMap<>();
List<String> test = list.getSymptoms();
for (String word : test) {
if (!listSort.containsKey(word)) {
listSort.put(word, 1);
} else {
listSort.put(word, listSort.get(word) + 1);
}
}

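In Java 8 you can try:
final List<String> symptomList = list.getSymptoms(); // list is the ReadSymptomDataFromFile from the question
final Map<String, Integer> countedSymptoms = new TreeMap<>();
symptomList.forEach(symptom ->
countedSymptoms.put(symptom, Collections.frequency(symptomList, symptom)));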

Iterate through the list and put the elements into the map. Since the source is an ArrayList, duplicate elements are preserved, so they can be counted. That said, I don't see why you read the data into a list in one place and into a TreeMap in another, since your first program already loads the symptoms straight into a TreeMap. The code would look something like this:
for(String el : test) {
if(!listSort.containsKey(el))
listSort.put(el,1);
else
listSort.put(el, listSort.get(el)+1);
}

You could use the merge method of Map:
List<String> symptoms = List.of("a", "b", "c", "b", "c", "a", "b", "b", "c");
Map<String, Integer> counts = new TreeMap<>();
symptoms.forEach(s -> counts.merge(s, 1, Integer::sum));
counts.forEach((k, v) -> System.out.println(k + ": " + v));
This prints:
a: 2
b: 4
c: 3
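Applied to the question's goal of one class per responsibility, the counting step could live in a small class of its own that takes the List<String> returned by getSymptoms() and knows nothing about files. This is only a sketch of one possible split; SymptomCounter, count and write are hypothetical names, not part of the original project.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SymptomCounter {
    // Counts occurrences; the TreeMap keeps the symptoms sorted alphabetically.
    static Map<String, Integer> count(List<String> symptoms) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String symptom : symptoms) {
            counts.merge(symptom, 1, Integer::sum);
        }
        return counts;
    }

    // Writes one "symptom = count" line per entry to any Writer (file, console, ...).
    static void write(Map<String, Integer> counts, Writer out) {
        try {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                out.write(entry.getKey() + " = " + entry.getValue() + "\n");
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        write(count(Arrays.asList("fever", "cough", "fever")), out);
        System.out.print(out); // prints "cough = 1" and then "fever = 2"
    }
}
```

Because it counts whole list entries, a multi-word symptom stays a single key, unlike scan.next(), which splits on whitespace.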

Java CSV Formatting Issues

I am reading a HashMap and creating a CSV file.
The code below accepts a HashMap and produces a CSV file. However, the formatting is a problem. For this HashMap,
HashMap<String, Integer> hmap = new HashMap<String, Integer>();
hmap.put("Feature1", 1);
hmap.put("Feature2", 2);
It produces the following, which is wrongly formatted:
Feature2,Feature2,
2,1,Feature1,Feature1,
2,1,
Expected output (without a comma at the end of each line):
Feature2,Feature1
2,1
This is the code I use. How can I fix it?
public String appendCSV(Map featureset) throws IOException{
StringBuilder csvReport = new StringBuilder();
Map<String, Integer> map =featureset;
Set<String> keys = map.keySet();
String[] lsitofkeys = {};
for(String elements:keys){
for(int i =0 ; i< keys.size(); i++){
csvReport.append(elements+",");
}
csvReport.append("\n");
for(String key: keys){
csvReport.append(map.get(key).toString()+",");
}
}
return csvReport.toString();
}

Java 8 has String.join(). Having collected the keys and values into lists of strings:
csvReport.append(String.join(",", keys)).append("\n");
csvReport.append(String.join(",", values)).append("\n");
The Streams API has Collectors.joining() which helps even more:
List<Map.Entry<String, Integer>> entries = new ArrayList<>(map.entrySet());
csvReport.append(entries.stream()
.map(e -> e.getKey())
.collect(Collectors.joining(",")));
csvReport.append("\n");
csvReport.append(entries.stream()
.map(e -> String.valueOf(e.getValue())) // joining() needs strings, not Integers
.collect(Collectors.joining(",")));
csvReport.append("\n");
Both of these ultimately use StringJoiner. If you have an academic interest in how to build a joined string without a delimiter at the end, it's worth looking at the code for StringJoiner for an elegant example.
However - There are subtleties to writing CSV and it's a good idea to use a library unless there are reasons (legal, academic) not to. Apache Commons CSV is one.
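For completeness, StringJoiner can also be used directly. The following is a small sketch, not a full CSV writer: it does no quoting or escaping, and the class name CsvJoin and the method toCsv are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CsvJoin {
    // Builds a header line with the keys and a second line with the values.
    static String toCsv(Map<String, Integer> map) {
        StringJoiner header = new StringJoiner(",");
        StringJoiner row = new StringJoiner(",");
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            header.add(entry.getKey());
            row.add(String.valueOf(entry.getValue()));
        }
        // The joiner only puts the delimiter between elements, never at the end.
        return header + "\n" + row + "\n";
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("Feature2", 2);
        map.put("Feature1", 1);
        System.out.print(toCsv(map));
        // prints "Feature2,Feature1" and then "2,1"
    }
}
```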

It seems you have an issue with your loops: you need two separate loops, not nested ones.
Also, to get rid of the comma at the end, you can use a simple check with an isFirst variable, like below :)
public String appendCSV(Map featureset) throws IOException{
StringBuilder csvReport = new StringBuilder();
Map<String, Integer> map =featureset;
Set<String> keys = map.keySet();
String[] lsitofkeys = {};
boolean isFirst=true;
for(String elements : keys){
if(!isFirst){
csvReport.append(",");
}
csvReport.append(elements);
isFirst=false;
}
csvReport.append("\n");
isFirst=true;
for(String elements : keys){
if(!isFirst){
csvReport.append(",");
}
csvReport.append(map.get(elements));
isFirst=false;
}
return csvReport.toString();
}

Just remove the last character of your string if it is longer than 1 character. Here is how to do it: str.substring(0, str.length() - 1);

You should have a separate StringBuilder for the keys and another one for the values. Then as you go through your keys you add them to your key StringBuilder and then take the map given and grab the value associated with that key and add it to your value StringBuilder.
Lastly you just keep track of how many keys you have seen so far. If the number of keys seen is not equal to the size of the map then you append a comma. But if you are on the last element according to the numOfKeys counter, then you append nothing to the end of the StringBuilders.
StringBuilder csvKeyReport = new StringBuilder();
StringBuilder csvValueReport = new StringBuilder();
Map<String, Integer> map = hmap;
Set<String> keys = map.keySet();
int numOfKeys = 0;
for(String key : keys)
{
numOfKeys++;
String comma = numOfKeys == map.size() ? "" : ",";
csvKeyReport.append(key + comma);
csvValueReport.append(map.get(key) + comma);
}
csvKeyReport.append("\n");
csvKeyReport.append(csvValueReport.toString() + "\n");
System.out.print(csvKeyReport.toString());
Output
Feature2,Feature1
2,1

One way to achieve that is doing something like this:
import java.io.IOException;
import java.util.HashMap;
public class HashMapToCSV {
public static void main(String[] args) {
HashMap<String, Integer> hmap = new HashMap<String, Integer>();
hmap.put("Feature1", 1);
hmap.put("Feature2", 2);
try {
System.out.println(appendCSV(hmap));
} catch (IOException e){
e.printStackTrace();
}
}
public static String appendCSV(HashMap<String,Integer> featureset) throws IOException{
StringBuilder csvReport = new StringBuilder();
// loop through the keySet and append the keys
for(String key: featureset.keySet()){
csvReport.append(key+",");
}
// to remove the comma at the end
csvReport.replace(csvReport.length()-1, csvReport.length(), "");
csvReport.append("\n"); // append new line
// then loop through the keySet and append the values
for(String key: featureset.keySet()){
csvReport.append(featureset.get(key)+",");
}
csvReport.replace(csvReport.length()-1, csvReport.length(), "");
return csvReport.toString();
}
}
Output
Feature2,Feature1
2,1

You should create a second StringBuilder for the values and fill both inside the same loop:
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
public class NewClass {
public NewClass() throws IOException {
HashMap<String, Integer> hmap = new HashMap<String, Integer>();
hmap.put("Feature1", 1);
hmap.put("Feature2", 2);
System.out.print(appendCSV(hmap));
}
public String appendCSV(Map featureset) throws IOException {
StringBuilder csvReport = new StringBuilder();
StringBuilder csvReportVal = new StringBuilder();
Set<String> keys = featureset.keySet();
for (String elements : keys) {
csvReport.append(elements + ",");
csvReportVal.append(featureset.get(elements).toString() + ",");
}
// Excluding the latest ","
csvReport.setLength(csvReport.length() - 1);
csvReportVal.setLength(csvReportVal.length() - 1);
csvReport.append("\n" + csvReportVal.toString());
return csvReport.toString();
}
public static void main(String[] args) throws IOException {
new NewClass();
}
}
OUTPUT:
Feature2,Feature1
2,1

Printing from MAP in JAVA

I want to print some data that I've loaded into a map from a text file. However, when I print the data, the program prints each line twice. Is there any way to fix it? I just want to print the data exactly as it is, with no duplicates.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
public class ReadToHashmap {
public static void main(String[] args) throws Exception
{
Map<String, String> map = new HashMap<>();
final BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\Documents and Settings\\stajn\\Desktop\\Cache_Son\\Cache\\Testing.txt"));
if (bufferedReader != null) {
String line;
while ((line = bufferedReader.readLine()) != null) {
String parts[] = line.split("\n");
map.put(parts[0],parts[0]);
}
bufferedReader.close();
Iterator iterator = map.keySet().iterator();
while (iterator.hasNext())
{
String key = iterator.next().toString();
String value = map.get(key).toString();
System.out.println(key + " " + value);
}
}
}
}

You are putting entries into the map like this:
map.put(parts[0],parts[0]);
So here the key and the value are the same. When you print
System.out.println(key + " " + value);
both print the same text.
Is this what you need?
System.out.println("value=" + value);

You are putting the line into the map both as the key and as the value.
In your code you print the key and the value, so each line is printed twice.
You can print only the values of a map with:
Iterator iterator = map.values().iterator();
while (iterator.hasNext()) {
System.out.println(iterator.next());
}

Your code has some other serious problems:
When using bufferedReader.readLine() your String will never contain a newline-character so your split() will do nothing and parts[] will always be a length-1-array.
As for printing the same value twice see the other answers...
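A short demonstration of the first point follows. It reads from a StringReader so it runs without a file, and the class name ReadLineDemo and the method countParts are invented for the illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Returns, for each line read, how many parts split("\n") produced.
    static List<Integer> countParts(String text) {
        List<Integer> partCounts = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() has already stripped the line terminator
                partCounts.add(line.split("\n").length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return partCounts;
    }

    public static void main(String[] args) {
        System.out.println(countParts("first\nsecond\nthird\n")); // prints [1, 1, 1]
    }
}
```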
