I'm trying to extract certain non-ASCII characters from a buffer. I'm reading in a file that contains movie names with some non-ASCII characters sprinkled through it, like so:
1|Tóy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Gét Shorty (1995)
I was able to pick out the lines that contain non-ASCII characters, but I'm trying to figure out how to get each such character from those lines and replace it with an ASCII character from the map I've made.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class Main {
public static void main(String[] args) {
HashMap<Character, Character> Char_Map = new HashMap<>();
Char_Map.put('o','ó');
Char_Map.put('e','é');
Char_Map.put('i','ï');
for(Map.Entry<Character,Character> entry: Char_Map.entrySet())
{
System.out.println(entry.getKey() + " -> "+ entry.getValue());
}
try
{
BufferedReader br = new BufferedReader(new FileReader("movie-names.txt"));
String contentLine= br.readLine();
while(contentLine != null)
{
String[] contents = contentLine.split("\\|");
boolean result = contents[1].matches("\\A\\p{ASCII}*\\z");
if(!result)
{
System.out.println(contentLine);
//System.out.println();
}
contentLine= br.readLine();
}
}
catch (IOException ioe)
{
System.out.println("Cannot open file as it doesn't exist");
}
}
}
I tried using something along the lines of:
if((contentLine.charAt(i) == something
But I'm not sure.
You can just use replaceAll. Put this in the while loop, so that it works on each line you read from the file. With this change, you won't need the split and if (... matches) anymore. Note that strings are immutable in Java, so you have to assign the result back:
contentLine = contentLine.replaceAll("ó", "o");
contentLine = contentLine.replaceAll("é", "e");
contentLine = contentLine.replaceAll("ï", "i");
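Since these are literal characters rather than regex patterns, String.replace works here too and the calls can be chained. A minimal sketch (the sample line is taken from the file above; the class name is made up):

```java
public class ReplaceSketch {
    public static void main(String[] args) {
        String contentLine = "1|Tóy Story (1995)";
        // Strings are immutable: replace returns a new String,
        // so the result has to be assigned back to the variable.
        contentLine = contentLine.replace('ó', 'o')
                                 .replace('é', 'e')
                                 .replace('ï', 'i');
        System.out.println(contentLine); // 1|Toy Story (1995)
    }
}
```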
If you want to keep a map, just iterate over its keys and replace with the values you want to map to:
Map<String, String> map = new HashMap<>();
map.put("ó", "o");
// ... and all the others
Later, in your loop reading the contents, you replace all the characters:
for (Map.Entry<String, String> entry : map.entrySet())
{
String oldChar = entry.getKey();
String newChar = entry.getValue();
contentLine = contentLine.replaceAll(oldChar, newChar);
}
Here is a complete example:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;
public class Main {
public static void main(String[] args) throws Exception {
HashMap<String, String> nonAsciiToAscii = new HashMap<>();
nonAsciiToAscii.put("ó", "o");
nonAsciiToAscii.put("é", "e");
nonAsciiToAscii.put("ï", "i");
BufferedReader br = new BufferedReader(new FileReader("movie-names.txt"));
String contentLine = br.readLine();
while (contentLine != null)
{
for (Map.Entry<String, String> entry : nonAsciiToAscii.entrySet())
{
String oldChar = entry.getKey();
String newChar = entry.getValue();
contentLine = contentLine.replaceAll(oldChar, newChar);
}
System.out.println(contentLine); // or whatever else you want to do with the cleaned lines
contentLine = br.readLine();
}
}
}
This prints:
robert:~$ javac Main.java && java Main
1|Toy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Get Shorty (1995)
robert:~$
You want to flip your keys and values:
Map<Character, Character> charMap = new HashMap<>();
charMap.put('ó','o');
charMap.put('é','e');
charMap.put('ï','i');
and then get the mapped character:
char mappedChar = charMap.getOrDefault(inputChar, inputChar);
To get the chars for a string, call String#toCharArray()
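Putting those two pieces together, a minimal sketch (the class name and sample line are just for illustration) that rebuilds a line character by character:

```java
import java.util.HashMap;
import java.util.Map;

public class CharMapSketch {
    public static void main(String[] args) {
        Map<Character, Character> charMap = new HashMap<>();
        charMap.put('ó', 'o');
        charMap.put('é', 'e');
        charMap.put('ï', 'i');

        String line = "4|Gét Shorty (1995)";
        StringBuilder sb = new StringBuilder();
        for (char c : line.toCharArray()) {
            // fall back to the original character when it has no mapping
            sb.append(charMap.getOrDefault(c, c));
        }
        System.out.println(sb); // 4|Get Shorty (1995)
    }
}
```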
[I am new to Java and Stack Overflow. My last question was closed; I have added the complete code this time. Thanks.] I have a large 4 GB txt file (vocab.txt) containing plain Bangla (Unicode) words. Each word is on its own line with its frequency after an equals sign, such as:
আমার=5
তুমি=3
সে=4
আমার=3 //duplicate of the 1st word, with a different frequency
করিম=8
সে=7 //duplicate of the 3rd word, with a different frequency
As you can see, it has the same words multiple times with different frequencies. How can I keep only a single copy of each word, with the frequencies of its duplicates summed? The file above would then become (output.txt):
আমার=8 //5+3
তুমি=3
সে=11 //4+7
করিম=8
I have used a HashMap to solve the problem, but I think I made a mistake somewhere: it runs and writes the input data to the output file without changing anything.
package data_correction;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.OutputStreamWriter;
import java.util.*;
import java.awt.Toolkit;
public class Main {
public static void main(String args[]) throws Exception {
FileInputStream inputStream = null;
Scanner sc = null;
String path="C:\\DATA\\vocab.txt";
FileOutputStream fos = new FileOutputStream("C:\\DATA\\output.txt",true);
BufferedWriter bufferedWriter = new BufferedWriter(
new OutputStreamWriter(fos,"UTF-8"));
try {
System.out.println("Started!!");
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
line = line.trim();
String [] arr = line.split("=");
Map<String, Integer> map = new HashMap<>();
if (!map.containsKey(arr[0])){
map.put(arr[0],Integer.parseInt(arr[1]));
}
else{
map.put(arr[0], map.get(arr[0]) + Integer.parseInt(arr[1]));
}
for(Map.Entry<String, Integer> each : map.entrySet()){
bufferedWriter.write(each.getKey()+"="+each.getValue()+"\n");
}
}
bufferedWriter.close();
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
System.out.print("FINISH");
Toolkit.getDefaultToolkit().beep();
}
}
Thanks for your time.
This should do what you want, with some more Java magic:
public static void main(String[] args) throws Exception {
String separator = "=";
Map<String, Integer> map = new HashMap<>();
try (Stream<String> vocabs = Files.lines(new File("test.txt").toPath(), StandardCharsets.UTF_8)) {
vocabs.forEach(
vocab -> {
String[] pair = vocab.split(separator);
int value = Integer.valueOf(pair[1]);
String key = pair[0];
if (map.containsKey(key)) {
map.put(key, map.get(key) + value);
} else {
map.put(key, value);
}
}
);
}
System.out.println(map);
}
For test.txt, substitute the correct file path. Note that the map is kept in memory, so for a 4 GB input this may not be the best approach; if necessary, replace the map with e.g. a database-backed store.
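The containsKey branch can also be collapsed with Map.merge, which inserts the value if the key is absent and otherwise combines it with the existing one. A sketch over the sample data from the question (held in memory here rather than read from a file):

```java
import java.util.HashMap;
import java.util.Map;

public class MergeSketch {
    public static void main(String[] args) {
        String[] lines = { "আমার=5", "তুমি=3", "সে=4", "আমার=3", "করিম=8", "সে=7" };
        Map<String, Integer> map = new HashMap<>();
        for (String line : lines) {
            String[] pair = line.split("=");
            // insert if absent, otherwise add to the existing frequency
            map.merge(pair[0], Integer.parseInt(pair[1]), Integer::sum);
        }
        System.out.println(map.get("আমার")); // 8
        System.out.println(map.get("সে"));  // 11
    }
}
```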
I have a CSV file with this content:
2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894
2017-10-29 00:00:00.0,"1010",-125529,0,0,0,420743,0,0,256,420743,256,420743
2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894
2017-10-29 00:00:00.0,"1013",-10625,0,0,-687,599098,0,0,379,599098,379,599098
2017-10-29 00:00:00.0,"1604",-1794.9,0,0,-3.99,4081.07,0,0,361,4081.07,361,4081.07
So lines 1 and 3 are duplicates.
Now I want to read the file in and print out duplicate lines in the console.
I set up this Java code to read the file in and add it line by line to an ArrayList. Then I create an immutable copy, loop through the ArrayList, and pass the immutable copy to binarySearch:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class ReadValidationFile {
public static void main(String[] args) {
List<String> validationFile = new ArrayList<>();
try(BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"));){
String line;
while((line = br.readLine())!= null){
validationFile.add(line);
}
} catch (FileNotFoundException e) {
//e.printStackTrace();
System.out.println("file not found " + e.getMessage());
} catch (IOException e) {
e.printStackTrace();
}
List<String> validationFileCopy = Collections.unmodifiableList(validationFile);
for(String line : validationFile){
int comp = Collections.binarySearch(validationFileCopy,line,new ComparatorLine());
if (comp <= 0){
System.out.println(line);
}
}
}
}
Comparator Class:
import java.util.Comparator;
public class ComparatorLine implements Comparator<String> {
@Override
public int compare(String s1, String s2) {
return s1.compareToIgnoreCase(s2);
}
}
I expect this line to be printed:
2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894
But the output I get is this:
2017-10-29 00:00:00.0,"1010",-125529,0,0,0,420743,0,0,256,420743,256,420743
Can you please help me see what I am doing wrong? I think my comparator is okay. What is wrong with my ArrayLists?
The other answer(s) correctly state that you should be using Set instead of List. But for the sake of learning, let's have a look at your code and see where you went wrong.
public class ReadValidationFile {
public static void main(String[] args) {
List<String> validationFile = new ArrayList<>();
try(BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"));){
Semicolon is unnecessary.
String line;
while((line = br.readLine())!= null){
validationFile.add(line);
}
This can all be achieved in just one line: List<String> validationFile = Files.readAllLines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8);
} catch (FileNotFoundException e) {
//e.printStackTrace();
System.out.println("file not found " + e.getMessage());
} catch (IOException e) {
e.printStackTrace();
}
List<String> validationFileCopy = Collections.unmodifiableList(validationFile);
Actually, this is not a copy. It is just an unmodifiable view of the same list.
for(String line : validationFile){
int comp = Collections.binarySearch(validationFileCopy,line,new ComparatorLine());
You might as well just search validationFile itself. However, you are calling binarySearch which only works on sorted lists, but your list is not sorted. See documentation.
if (comp <= 0){
System.out.println(line);
}
You are printing when comp <= 0, but binarySearch returns a negative number when the element is not found and a non-negative index (comp >= 0) when it is found, so your condition is inverted (and also matches a hit at index 0). Another problem is that you are searching the whole list for each of its own elements, so the search would obviously always succeed (that is, if your list were sorted).
Save yourself all the trouble and use a Set instead. And, using Java 8 streams, the whole program can be reduced to the following:
public static void main(String[] args) throws Exception {
Set<String> uniqueLines = new HashSet<>();
Files.lines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8)
.filter(line -> !uniqueLines.add(line))
.forEach(System.out::println);
}
If you really need to ignore case when comparing strings (from your given data, it looks like it doesn't make any difference since it's just numbers), then store each unique line by first uppercasing and then lowercasing it. This apparently cumbersome technique is necessary because just lowercasing is not enough if dealing with non-English language text. The equalsIgnoreCase method also does this.
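To see why the double conversion matters, consider the German ß, which upper-cases to SS: a single toLowerCase leaves the two spellings unequal. This sketch only illustrates that point and is not part of the deduplication code:

```java
public class CaseFold {
    public static void main(String[] args) {
        String a = "STRASSE";
        String b = "straße";
        // lowercasing alone does not make these equal,
        // because ß is already lowercase and stays ß
        System.out.println(a.toLowerCase().equals(b.toLowerCase())); // false
        // upper-then-lower first folds ß to SS, then to ss
        System.out.println(a.toUpperCase().toLowerCase()
                .equals(b.toUpperCase().toLowerCase())); // true
    }
}
```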
public static void main(String[] args) throws Exception {
Set<String> uniqueLines = new HashSet<>();
Files.lines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8)
.filter(line -> !uniqueLines.add(line.toUpperCase().toLowerCase()))
.forEach(System.out::println);
}
Create a Set while reading lines from the input CSV file; any time add() returns false, print the line, as it is a duplicate.
If you want a list of all duplicate lines, collect into a List the lines for which add() to the Set returned false.
NOTE:
I have simulated your file reading by using static data.
A small note: if your data contains only numbers and no letters, you do not need case-insensitive comparison at all.
Even if your data does contain letters, you do not need a special Comparator: you can insert data into the Set using add(line.toLowerCase()), which ensures all lines are lower-cased before being compared and added to the Set.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
public class ReadValidationFile {
static List<String> validationFile = new ArrayList<>();
static {
validationFile.add("2017-10-29 00:00:00.0,\"1005\",-10227,0,0,0,332894,0,0,222,332894,222,332894");
validationFile.add("2017-10-29 00:00:00.0,\"1010\",-125529,0,0,0,420743,0,0,256,420743,256,420743");
validationFile.add("2017-10-29 00:00:00.0,\"1005\",-10227,0,0,0,332894,0,0,222,332894,222,332894");
validationFile.add("2017-10-29 00:00:00.0,\"1013\",-10625,0,0,-687,599098,0,0,379,599098,379,599098");
validationFile.add("2017-10-29 00:00:00.0,\"1604\",-1794.9,0,0,-3.99,4081.07,0,0,361,4081.07,361,4081.07");
}
public static void main(String[] args) {
// Option 1 : unique lines only
Set<String> uniqueLinesOnly = new HashSet<>(validationFile);
// Option 2 : unique lines and duplicate lines
Set<String> uniqueLines = new HashSet<>();
Set<String> duplicateLines = new HashSet<>();
for (String line : validationFile) {
if (!uniqueLines.add(line.toLowerCase())) {
duplicateLines.add(line.toLowerCase());
}
}
// Option 3 : unique lines and duplicate lines by Java Streams
Set<String> uniquesJava8 = new HashSet<>();
List<String> duplicatesJava8 = validationFile
.stream()
.filter(element -> !uniquesJava8.add(element.toLowerCase()))
.map(element -> element.toLowerCase())
.collect(Collectors.toList());
}
}
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
public class ReadValidationFile {
public static void main(String[] args){
List<String> validationFile = new ArrayList<>();
try(BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"));){
String line;
while((line = br.readLine())!= null){
validationFile.add(line);
}
} catch (FileNotFoundException e) {
//e.printStackTrace();
System.out.println("file not found " + e.getMessage());
} catch (IOException e) {
e.printStackTrace();
}
Set<String> uniques = new HashSet<>();
List<String> duplicates = validationFile.stream().filter(i->!uniques.add(i)).collect(Collectors.toList());
System.out.println(duplicates);
}
}
I have a text file with 792 lines, with content like:
der 17788648
und 14355959
die 10939606
Die 10480597
Now I want to check whether, for example, "Die" and "die" are equal in lowercase.
If two strings are equal in lowercase, I want to copy the word into a new text file in lowercase and sum their values.
Expected output:
der 17788648
und 14355959
die 21420203
I have that so far:
try {
BufferedReader bk = null;
BufferedWriter bw = null;
bk = new BufferedReader(new FileReader("outagain.txt"));
bw = new BufferedWriter(new FileWriter("outagain5.txt"));
List<String> list = new ArrayList<>();
String s = "";
while (s != null) {
s = bk.readLine();
list.add(s);
}
for (int k = 0; k < 793; k++) {
String u = bk.readLine();
if (list.contains(u.toLowerCase())) {
//sum values?
} else {
bw.write(u + "\n");
}
}
System.out.println(list.size());
} catch (Exception e) {
System.out.println("Exception caught : " + e);
}
Instead of list.add(s);, use list.add(s.toLowerCase());. Right now your code is comparing lines of indeterminate case to lower-cased lines.
With Java 8, the best approach to standard problems like reading files, comparing, grouping, and collecting is the Streams API, since it is much more concise. At least while the file is only a few KB, there will be no problems with that.
Something like:
Map<String, Integer> nameSumMap = Files.lines(Paths.get("test.txt"))
.map(x -> x.split(" "))
.collect(Collectors.groupingBy(x -> x[0].toLowerCase(),
Collectors.summingInt(x -> Integer.parseInt(x[1]))
));
First, you can read the file with Files.lines(), which returns a Stream<String>; then you can split the strings into a Stream<String[]>;
finally you can use the groupingBy() and summingInt() collectors to group by the first element of each array and sum over the second one.
If you don't want to use the Stream API, you can also create a HashMap and do your summing manually in the loop.
Use a HashMap to keep track of the unique words. Before you do a put, do a get to see if the value is already there. If it is, sum the old value with the new one and put it again (this replaces the old entry having the same key).
package com.foundations.framework.concurrency;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
public class FileSummarizer {
public static void main(String[] args) {
HashMap<String, Long> rows = new HashMap<String, Long>();
String line = "";
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader("data.txt"));
while ((line = reader.readLine()) != null) {
String[] tokens = line.split(" ");
String key = tokens[0].toLowerCase();
Long current = Long.parseLong(tokens[1]);
Long previous = rows.get(key);
if(previous != null){
current += previous;
}
rows.put(key, current);
}
}
catch (IOException e) {
e.printStackTrace();
}
finally {
try {
reader.close();
Iterator<String> iterator = rows.keySet().iterator();
while (iterator.hasNext()) {
String key = iterator.next().toString();
String value = rows.get(key).toString();
System.out.println(key + " " + value);
}
}
catch (IOException e) {
e.printStackTrace();
}
}
}
}
The String class has an equalsIgnoreCase method which you can use to compare two strings irrespective of case, so:
String var1 = "Die";
String var2 = "die";
System.out.println(var1.equalsIgnoreCase(var2));
would print true.
If I got your question right, you want to read the prefix from the file, compare it, get the value behind it, and sum the values up for each prefix. Is that about right?
You could use regular expressions to get the prefixes and values separately. Then you can sum up all values with the same prefix and write the result to the file for each one.
If you are not familiar with regular expressions, these links could help you:
Regex on tutorialpoint.com
Regex on vogella.com
For additional tutorials just scan google for "java regex" or similar tags.
If you do not want to distinguish between upper- and lowercase strings, just convert them all to lower/upper case before comparing them, as @spork explained already.
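As a sketch of the regex idea (the pattern and sample line are assumptions based on the file format shown in the question), two capture groups separate the word from its count:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSketch {
    public static void main(String[] args) {
        // group 1 = the word (prefix), group 2 = the count
        Pattern p = Pattern.compile("^(\\S+)\\s+(\\d+)$");
        Matcher m = p.matcher("die 10939606");
        if (m.matches()) {
            System.out.println(m.group(1)); // die
            System.out.println(m.group(2)); // 10939606
        }
    }
}
```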
I have a txt file with the form:
Key:value
Key:value
Key:value
...
I want to put all the keys with their values into a HashMap that I've created. How do I get a FileReader(file) or Scanner(file) to split the keys and values at the colon (:)? :-)
I've tried:
Scanner scanner = new Scanner(file).useDelimiter(":");
HashMap<String, String> map = new HashMap<>();
while(scanner.hasNext()){
map.put(scanner.next(), scanner.next());
}
Read your file line-by-line using a BufferedReader, and for each line perform a split on the first occurrence of : within the line (and if there is no : then we ignore that line).
Here is some example code; it avoids the use of Scanner (which has some subtle behaviors and, in my opinion, is more trouble than it's worth).
public static void main( String[] args ) throws IOException
{
String filePath = "test.txt";
HashMap<String, String> map = new HashMap<String, String>();
String line;
BufferedReader reader = new BufferedReader(new FileReader(filePath));
while ((line = reader.readLine()) != null)
{
String[] parts = line.split(":", 2);
if (parts.length >= 2)
{
String key = parts[0];
String value = parts[1];
map.put(key, value);
} else {
System.out.println("ignoring line: " + line);
}
}
for (String key : map.keySet())
{
System.out.println(key + ":" + map.get(key));
}
reader.close();
}
The code below works in Java 8.
The .filter(s -> s.matches("^\\w+:\\w+$")) means it only operates on lines in the file consisting of two word strings separated by :; obviously, adjusting this regex will change what it lets through.
The .collect(Collectors.toMap(k -> k.split(":")[0], v -> v.split(":")[1])) works on every line that passed the filter, splits it on :, then uses the first part of the split as the key of a map entry and the second part as the value.
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Collectors;
public class Foo {
public static void main(String[] args) throws IOException {
String filePath = "src/main/resources/somefile.txt";
Path path = FileSystems.getDefault().getPath(filePath);
Map<String, String> mapFromFile = Files.lines(path)
.filter(s -> s.matches("^\\w+:\\w+$"))
.collect(Collectors.toMap(k -> k.split(":")[0], v -> v.split(":")[1]));
}
}
One more JDK 1.8 implementation.
I suggest using try-with-resources, and forEach with the putIfAbsent() method, to avoid the java.lang.IllegalStateException: Duplicate key that Collectors.toMap throws when there are duplicate keys in the file.
FileToHashMap.java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.HashMap;
import java.util.stream.Stream;
public class FileToHashMap {
public static void main(String[] args) throws IOException {
String delimiter = ":";
Map<String, String> map = new HashMap<>();
try(Stream<String> lines = Files.lines(Paths.get("in.txt"))){
lines.filter(line -> line.contains(delimiter)).forEach(
line -> map.putIfAbsent(line.split(delimiter)[0], line.split(delimiter)[1])
);
}
System.out.println(map);
}
}
in.txt
Key1:value 1
Key1:duplicate key
Key2:value 2
Key3:value 3
The output is:
{Key1=value 1, Key2=value 2, Key3=value 3}
I would do it like this
Properties properties = new Properties();
properties.load(new FileInputStream("path/to/your/file.txt")); // put your file path here
Map<String, String> myMap = new HashMap<>();
for (Map.Entry<Object, Object> entry : properties.entrySet()) {
    myMap.put((String) entry.getKey(), (String) entry.getValue());
}
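One thing worth knowing here: Properties.load already accepts either = or : as the key/value separator, so a Key:value file parses without any extra splitting. A self-contained sketch using an in-memory reader in place of the file:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PropsSketch {
    public static void main(String[] args) throws IOException {
        Properties properties = new Properties();
        // Properties treats ':' the same as '=' between key and value
        properties.load(new StringReader("Key1:value1\nKey2:value2"));
        Map<String, String> myMap = new HashMap<>();
        for (Map.Entry<Object, Object> entry : properties.entrySet()) {
            myMap.put((String) entry.getKey(), (String) entry.getValue());
        }
        System.out.println(myMap.get("Key1")); // value1
    }
}
```

Note that Properties also applies backslash escaping rules, which may or may not be what you want for arbitrary data.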