Comparing two different text files and replacing similar words


I started learning Java recently and I need to compare 1000 words in a text file against a thesaurus text file. The 1000 words are listed one per line. Each line in the thesaurus file holds a group of similar words separated by commas, and contains one of the 1000 words. I think I nearly have it. What I need to do next is check whether a word is contained in the thesaurus and, if it is, map that line of thesaurus words to the word from the 1000-word file, and I'm not sure how to do that.
package ie.gmit.sw;
import java.io.*;
import java.util.*;
public class Parser {
private Map<String, String>map = new TreeMap<>();
private Collection<String>google = new TreeSet<>();
public void parseGoogle(String file) throws IOException
{
BufferedReader brGoogle = new BufferedReader(new FileReader("google-1000.txt"));
String word = null;
while((word = brGoogle.readLine())!= null)
{
google.add(word);
}
brGoogle.close();
}//parseGoogle
public void parse(String file)throws IOException
{
BufferedReader brMoby = new BufferedReader(new FileReader("MobyThesaurus2.txt"));
String line = null;
while((line = brMoby.readLine())!= null)
{
String[] words = line.split(",");
}
}
public String[] getGoogleWord(String[] words) {
if(google.contains(words))
{
}
return words;
}
}//class Parser

Example implementation of the mapper:
import java.util.*;
import java.util.stream.Collectors;
public Map<String, List<String>> mapWordsToThesaurus(Set<String> words, Set<String> thesaurus) {
Map<String, List<String>> result = new HashMap<>();
words.forEach(
word ->
result.put(
word,
thesaurus.stream()
.filter(line -> line.contains(word))
.collect(Collectors.toList())));
return result;
}
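One caveat with the filter above: line.contains(word) is a substring test, so a word such as "cat" would also match a line that only contains "category". The following is a minimal sketch, assuming each thesaurus line is a comma-separated list of terms as the question describes, that matches whole terms instead. The class name ThesaurusMapper and the sample data are made up for illustration, and the thesaurus is taken as a List of lines rather than a Set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the original answer.
class ThesaurusMapper {
    // Maps each word to the thesaurus lines that contain it as a whole comma-separated term.
    static Map<String, List<String>> mapWordsToThesaurus(Set<String> words, List<String> thesaurusLines) {
        Map<String, List<String>> result = new HashMap<>();
        for (String line : thesaurusLines) {
            // Collect the distinct terms of this line so a repeated term adds the line only once
            Set<String> terms = new HashSet<>();
            for (String term : line.split(",")) {
                terms.add(term.trim());
            }
            for (String term : terms) {
                if (words.contains(term)) {
                    result.computeIfAbsent(term, k -> new ArrayList<>()).add(line);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("cat", "happy"));
        List<String> thesaurus = Arrays.asList(
                "cat,feline,kitty",
                "category,class,group",
                "glad,happy,joyful");
        System.out.println(mapWordsToThesaurus(words, thesaurus));
    }
}
```

Unlike the version above, words with no matching line are simply absent from the result instead of being mapped to an empty list. Each thesaurus line is also split only once, which may matter for a large thesaurus file.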

Related

Java. Extracting character from array that isn't ASCII

I'm trying to extract certain non-ASCII characters from a buffer. I'm reading in a file of movie names that have some non-ASCII characters sprinkled in, like so:
1|Tóy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Gét Shorty (1995)
I was able to pick out the lines that contain non-ASCII characters, but I'm trying to figure out how to get the particular character from those lines and replace it with an ASCII character from the map I've made.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class Main {
public static void main(String[] args) {
HashMap<Character, Character>Char_Map = new HashMap<>();
Char_Map.put('o','ó');
Char_Map.put('e','é');
Char_Map.put('i','ï');
for(Map.Entry<Character,Character> entry: Char_Map.entrySet())
{
System.out.println(entry.getKey() + " -> "+ entry.getValue());
}
try
{
BufferedReader br = new BufferedReader(new FileReader("movie-names.txt"));
String contentLine= br.readLine();
while(contentLine != null)
{
String[] contents = contentLine.split("\\|");
boolean result = contents[1].matches("\\A\\p{ASCII}*\\z");
if(!result)
{
System.out.println(contentLine);
//System.out.println();
}
contentLine= br.readLine();
}
}
catch (IOException ioe)
{
System.out.println("Cannot open file as it doesn't exist");
}
}
}
I tried using something along the lines of:
if((contentLine.charAt(i) == something
But I'm not sure.

You can just use replaceAll. Put this in the while loop, so that it works on each line you read from the file. With this change, you won't need the split and if (... matches) anymore. Strings are immutable, so assign the result back to the variable:
contentLine = contentLine.replaceAll("ó", "o");
contentLine = contentLine.replaceAll("é", "e");
contentLine = contentLine.replaceAll("ï", "i");
If you want to keep a map, just iterate over its keys and replace with the values you want to map to:
Map<String, String> map = new HashMap<>();
map.put("ó", "o");
// ... and all the others
Later, in your loop reading the contents, you replace all the characters:
for (Map.Entry<String, String> entry : map.entrySet())
{
String oldChar = entry.getKey();
String newChar = entry.getValue();
contentLine = contentLine.replaceAll(oldChar, newChar);
}
Here is a complete example:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;
public class Main {
public static void main(String[] args) throws Exception {
HashMap<String, String> nonAsciiToAscii = new HashMap<>();
nonAsciiToAscii.put("ó", "o");
nonAsciiToAscii.put("é", "e");
nonAsciiToAscii.put("ï", "i");
BufferedReader br = new BufferedReader(new FileReader("movie-names.txt"));
String contentLine = br.readLine();
while (contentLine != null)
{
for (Map.Entry<String, String> entry : nonAsciiToAscii.entrySet())
{
String oldChar = entry.getKey();
String newChar = entry.getValue();
contentLine = contentLine.replaceAll(oldChar, newChar);
}
System.out.println(contentLine); // or whatever else you want to do with the cleaned lines
contentLine = br.readLine();
}
}
}
This prints:
robert:~$ javac Main.java && java Main
1|Toy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Get Shorty (1995)
robert:~$

You want to flip your keys and values:
Map<Character, Character> charMap = new HashMap<>();
charMap.put('ó','o');
charMap.put('é','e');
charMap.put('ï','i');
and then get the mapped character:
char mappedChar = charMap.getOrDefault(inputChar, inputChar);
To get the chars for a string, call String#toCharArray()
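Putting those pieces together, here is a minimal sketch of the per-character lookup. The class name AsciiFolder and the method toAscii are made up for illustration. The accented characters are written as Unicode escapes so that the sketch does not depend on the encoding of the source file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original answer.
class AsciiFolder {
    private static final Map<Character, Character> CHAR_MAP = new HashMap<>();
    static {
        CHAR_MAP.put('\u00f3', 'o'); // o with acute accent
        CHAR_MAP.put('\u00e9', 'e'); // e with acute accent
        CHAR_MAP.put('\u00ef', 'i'); // i with diaeresis
    }

    // Replaces every mapped character and leaves all other characters untouched.
    static String toAscii(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            char mapped = CHAR_MAP.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("4|G\u00e9t Shorty (1995)")); // prints 4|Get Shorty (1995)
    }
}
```

This walks each line once regardless of how many entries the map holds, whereas the replaceAll approach scans the line once per entry.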

Duplicate word frequencies issues in Java [duplicate]

[I am new to Java and Stack Overflow. My last question was closed. I have added the complete code this time. Thanks.] I have a large 4 GB text file (vocab.txt). It contains plain Bangla (Unicode) words. Each word is on its own line, followed by an equals sign and its frequency, such as:
আমার=5
তুমি=3
সে=4
আমার=3 //duplicate of the 1st word with a different frequency
করিম=8
সে=7 //duplicate of the 3rd word with a different frequency
As you can see, the same word appears multiple times with different frequencies. How can I keep a single entry per word, with the sum of the frequencies of all its duplicates? For example, the file above would become (output.txt):
আমার=8 //5+3
তুমি=3
সে=11 //4+7
করিম=8
I have used a HashMap to solve the problem, but I think I made a mistake somewhere. The program runs, but it writes the same data to the output file without changing anything.
package data_correction;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.OutputStreamWriter;
import java.util.*;
import java.awt.Toolkit;
public class Main {
public static void main(String args[]) throws Exception {
FileInputStream inputStream = null;
Scanner sc = null;
String path="C:\\DATA\\vocab.txt";
FileOutputStream fos = new FileOutputStream("C:\\DATA\\output.txt",true);
BufferedWriter bufferedWriter = new BufferedWriter(
new OutputStreamWriter(fos,"UTF-8"));
try {
System.out.println("Started!!");
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
line = line.trim();
String [] arr = line.split("=");
Map<String, Integer> map = new HashMap<>();
if (!map.containsKey(arr[0])){
map.put(arr[0],Integer.parseInt(arr[1]));
}
else{
map.put(arr[0], map.get(arr[0]) + Integer.parseInt(arr[1]));
}
for(Map.Entry<String, Integer> each : map.entrySet()){
bufferedWriter.write(each.getKey()+"="+each.getValue()+"\n");
}
}
bufferedWriter.close();
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
System.out.print("FINISH");
Toolkit.getDefaultToolkit().beep();
}
}
Thanks for your time.

This should do what you want with some more Java magic:
public static void main(String[] args) throws Exception {
String separator = "=";
Map<String, Integer> map = new HashMap<>();
try (Stream<String> vocabs = Files.lines(new File("test.txt").toPath(), StandardCharsets.UTF_8)) {
vocabs.forEach(
vocab -> {
String[] pair = vocab.split(separator);
int value = Integer.valueOf(pair[1]);
String key = pair[0];
if (map.containsKey(key)) {
map.put(key, map.get(key) + value);
} else {
map.put(key, value);
}
}
);
}
System.out.println(map);
}
Replace test.txt with the correct file path. Note that the whole map is kept in memory, so this may not be the best approach for a very large file. If necessary, replace the map with, for example, a database-backed approach.
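The containsKey check followed by put can be collapsed into a single Map.merge call. The following is a minimal sketch, assuming every non-blank line is well formed as word=count. The class name FrequencyMerger, the method sumFrequencies, and the romanized sample words are placeholders for illustration. A LinkedHashMap is used so that the totals come out in first-seen order, as in the expected output.txt from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answer.
class FrequencyMerger {
    // Sums the counts of repeated words; assumes each non-blank line looks like word=count.
    static Map<String, Integer> sumFrequencies(Stream<String> lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        lines.map(String::trim)
             .filter(line -> !line.isEmpty())
             .forEach(line -> {
                 int idx = line.lastIndexOf('=');
                 String word = line.substring(0, idx);
                 int count = Integer.parseInt(line.substring(idx + 1));
                 // Inserts the count for a new word, or adds it to the existing total
                 totals.merge(word, count, Integer::sum);
             });
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = sumFrequencies(
                Stream.of("amar=5", "tumi=3", "se=4", "amar=3", "korim=8", "se=7"));
        totals.forEach((word, count) -> System.out.println(word + "=" + count));
    }
}
```

For the real file, the stream could come from Files.lines(path, StandardCharsets.UTF_8) inside a try-with-resources block. Only the distinct words and their totals are held in memory, not every line of the file, which may or may not fit depending on the size of the vocabulary.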

Printing matching information from 2 files in Java

I am trying to write a program that checks two files and prints the common contents from both the files.
Example of the file 1 content would be:
James 1
Cody 2
John 3
Example of the file 2 content would be:
1 Computer Science
2 Chemistry
3 Physics
So the final output printed on the console would be:
James Computer Science
Cody Chemistry
John Physics
Here is what I have so far in my code:
public class Filereader {
public static void main(String[] args) throws Exception {
File file = new File("file.txt");
File file2 = new File("file2.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
String st, st2;
while ((st = reader.readLine()) != null) {
System.out.println(st);
}
while ((st2 = reader2.readLine()) != null) {
System.out.println(st2);
}
reader.close();
reader2.close();
}
}
I am having trouble in figuring out how to match the file contents, and print only the student name and their major by matching the student id in each of the file. Thanks for all the help.

You can build on the other answers and create a class for each file, like tables in a database.
public class Person{
Long id;
String name;
//getters and setters
}
public class Course{
Long id;
String name;
//getters and setters
}
Then you have more control over your columns, and it is simple to use.
Next, keep an ArrayList<Person> and an ArrayList<Course>; the relation can be a field inside your objects, such as a courseId in the Person class.
if (person.getCourseId().equals(course.getId())) {
...
}
If instead the match is on the id number in each file, compare person.getId().equals(course.getId()). Use equals rather than == here, because the ids are Long objects and == compares references.
PS: Do not rely on a plain split(" ") in your case, because a value can itself contain spaces, e.g. 1 Computer Science.

What you want is to organize the data from each text file into a map keyed by id, then merge the two maps. This works even if the lines in the two files are not in the same order.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.LinkedHashMap;
import java.util.Map;
public class Filereader {
    public static void main(String[] args) throws Exception {
        File file = new File("file.txt");
        File file2 = new File("file2.txt");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        BufferedReader reader2 = new BufferedReader(new FileReader(file2));
        String st, st2;
        Map<Integer, String> nameMap = new LinkedHashMap<>();
        Map<Integer, String> majorMap = new LinkedHashMap<>();
        while ((st = reader.readLine()) != null) {
            String[] parts = st.split(" "); // "James 1" -> ["James", "1"]
            String name = parts[0];
            Integer id = Integer.parseInt(parts[1]);
            nameMap.put(id, name);
        }
        while ((st2 = reader2.readLine()) != null) {
            // A limit of 2 splits on the first space only, so "Computer Science" stays intact
            String[] parts = st2.split(" ", 2);
            String major = parts[1];
            Integer id = Integer.parseInt(parts[0]);
            majorMap.put(id, major);
        }
        reader.close();
        reader2.close();
        // Combine and print: for each id in the first file, look up the matching major
        nameMap.keySet().forEach(id -> {
            System.out.println(nameMap.get(id) + " " + majorMap.get(id));
        });
    }
}

You should read these files at the same time in sequence. This is easy to accomplish with a single while statement.
while ((st = reader.readLine()) != null && (st2 = reader2.readLine()) != null) {
// print both st and st2
}
The way your code is written now, it reads one file at a time, printing data to the console from each individual file. If you want to meld the results together, you have to combine the output of the files in a single loop.
If the two files can differ in length, or the ids can appear out of order, reading in lockstep will not be enough. In that case, store the values in a data structure instead, such as a List indexed by id, so that each value can be placed where it belongs and matched up afterwards.

Combining the NIO Files and Stream API, it's a little simpler:
public static void main(String[] args) throws Exception {
Map<String, List<String[]>> f1 = Files
.lines(Paths.get("file1"))
.map(line -> line.split(" "))
.collect(Collectors.groupingBy(arr -> arr[1]));
Map<String, List<String[]>> f2 = Files
.lines(Paths.get("file2"))
.map(line -> line.split(" ", 2)) // split on the first space only, so "Computer Science" stays intact
.collect(Collectors.groupingBy(arr -> arr[0]));
Stream.concat(f1.keySet().stream(), f2.keySet().stream())
.distinct()
.map(key -> f1.get(key).get(0)[0] + " " + f2.get(key).get(0)[1])
.forEach(System.out::println);
}
As can easily be noticed in the code, it assumes valid data and consistency between the two files. If this doesn't hold, you may need to first run a filter to exclude entries missing in either file:
Stream.concat(f1.keySet().stream(), f2.keySet().stream())
.filter(key -> f1.containsKey(key) && f2.containsKey(key))
.distinct()
...

If you change the order such that the number comes first in both files, you can read both files into a HashMap, then create a Set of common keys. Then loop through the set of common keys and grab the associated value from each HashMap to print.
My solution is verbose, but I wrote it that way so that you can see exactly what's happening.
import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;
import java.io.File;
import java.util.Scanner;
class J {
public static Map<String, String> fileToMap(File file) throws Exception {
// TODO - Make sure the file exists before opening it
// Scans the input file
Scanner scanner = new Scanner(file);
// Create the map
Map<String, String> map = new HashMap<>();
String line;
String name;
String code;
String[] parts = new String[2];
// Scan line by line
while (scanner.hasNextLine()) {
// Get next line
line = scanner.nextLine();
// TODO - Make sure the string has at least 1 space
// Split line by index of first space found
parts = line.split(" ", line.indexOf(' ') - 1);
// Get the class code and string val
code = parts[0];
name = parts[1];
// Insert into map
map.put(code, name);
}
// Close input stream
scanner.close();
// Give the map back
return map;
}
public static Set<String> commonKeys(Map<String, String> nameMap,
Map<String, String> classMap) {
Set<String> commonSet = new HashSet<>();
// Get a set of keys for both maps
Set<String> nameSet = nameMap.keySet();
Set<String> classSet = classMap.keySet();
// Loop through one set
for (String key : nameSet) {
// Make sure the other set has it
if (classSet.contains(key)) {
commonSet.add(key);
}
}
return commonSet;
}
public static Map<String, String> joinByKey(Map<String, String> namesMap,
Map<String, String> classMap,
Set<String> commonKeys) {
Map<String, String> map = new HashMap<String, String>();
// Loop through common keys
for (String key : commonKeys) {
// TODO - check for nulls if get() returns nothing
// Fetch the associated value from each map
map.put(namesMap.get(key), classMap.get(key));
}
return map;
}
public static void main(String[] args) throws Exception {
// Surround in try catch
File names = new File("names.txt");
File classes = new File("classes.txt");
Map<String, String> nameMap = fileToMap(names);
Map<String, String> classMap = fileToMap(classes);
Set<String> commonKeys = commonKeys(nameMap, classMap);
Map<String, String> nameToClass = joinByKey(nameMap, classMap, commonKeys);
System.out.println(nameToClass);
}
}
names.txt
1 James
2 Cody
3 John
5 Max
classes.txt
1 Computer Science
2 Chemistry
3 Physics
4 Biology
Output:
{Cody=Chemistry, James=Computer, John=Physics}
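Notes:
I purposely added keys to classes.txt and names.txt that do not match, so you can see that they do not show up in the output. Those keys never make it into the commonKeys set, so they are never inserted into the joined map.
The output shows James=Computer rather than James=Computer Science because, for that line, the limit passed to split works out to 0, so the line is split at every space. Passing a limit of 2, as in line.split(" ", 2), splits on the first space only.
You can loop through the HashMap, if you want, by calling map.entrySet().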

How can I unscramble a list of words using a HashMap?

I will be given two files which I need to read into my program. One file will be a list of real words, while the other will be a list of those same words out of order. I need to output the scrambled words in alphabetical order with the real words printed next to them, and I need to do this using a Hashmap. My issue is that I can print out the scrambled word and 1 real word next to it, but in some cases there may be more than one real word for each jumbled word.
For example, my program can do this:
cta cat
stpo post
But I need it to be able to do this:
cta cat
stpo post stop
What changes do I need to make to my code to be able to have more than one dictionary word for each scrambled word? Thank you for your help. My code is below:
import java.io.*;
import java.util.*;
public class Project5
{
public static void main (String[] args) throws Exception
{
BufferedReader dictionaryList = new BufferedReader( new FileReader( args[0] ) );
BufferedReader scrambleList = new BufferedReader( new FileReader( args[1] ) );
HashMap<String, String> dWordMap = new HashMap<String, String>();
while (dictionaryList.ready())
{
String word = dictionaryList.readLine();
dWordMap.put(createKey(word), word);
}
dictionaryList.close();
ArrayList<String> scrambledList = new ArrayList<String>();
while (scrambleList.ready())
{
String scrambledWord = scrambleList.readLine();
scrambledList.add(scrambledWord);
}
scrambleList.close();
Collections.sort(scrambledList);
for (String words : scrambledList)
{
String dictionaryWord = dWordMap.get(createKey(words));
System.out.println(words + " " + dictionaryWord);
}
}
private static String createKey(String word)
{
char[] characterWord = word.toCharArray();
Arrays.sort(characterWord);
return new String(characterWord);
}
}

You need to make several changes. The biggest one is that dWordMap can't hold just one String per key - it needs to hold the list of all the dictionary words that share that key.
The next change is being able to manipulate that list. I've added a sample solution which is untested but should give you a good place to start from.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;
public class Projects {
    public static void main(String[] args) throws Exception {
        BufferedReader dictionaryList = new BufferedReader(new FileReader(args[0]));
        BufferedReader scrambleList = new BufferedReader(new FileReader(args[1]));
        // Key: the letters of a word in sorted order. Value: every dictionary word with those letters.
        Map<String, List<String>> dWordMap = new HashMap<>();
        String word;
        while ((word = dictionaryList.readLine()) != null) {
            // Create the list the first time a key is seen, then add the word to it
            dWordMap.computeIfAbsent(createKey(word), k -> new ArrayList<>()).add(word);
        }
        dictionaryList.close();
        // Sort each list so the dictionary words print in alphabetical order
        dWordMap.values().forEach(Collections::sort);
        List<String> scrambledList = new ArrayList<>();
        String scrambledWord;
        while ((scrambledWord = scrambleList.readLine()) != null) {
            scrambledList.add(scrambledWord);
        }
        scrambleList.close();
        Collections.sort(scrambledList);
        for (String scrambled : scrambledList) {
            // An empty list covers scrambled words that have no dictionary match
            List<String> words = dWordMap.getOrDefault(createKey(scrambled), Collections.emptyList());
            System.out.println(scrambled + " " + String.join(" ", words));
        }
    }
    private static String createKey(String word) {
        char[] characterWord = word.toCharArray();
        Arrays.sort(characterWord);
        return new String(characterWord);
    }
}
There are a few other changes I would make. The first is to put the calls to dictionaryList.close() and scrambleList.close() in the finally part of a try...finally block (or to use try-with-resources), to make sure that the resources are freed in the end no matter what happens. You can also consider using Java 8's Streams to make the code more up to date. I'll be happy to give some more tips if this doesn't fit your needs or you have any more questions. Good luck!

If you want to record the list of dictionary words that are anagrams of each scrambled word then you will need to have a map to a list:
Map<String, List<String>> anagrams = new HashMap<>();
Then, for each scrambled word, you add a list of dictionary words to the map:
anagrams.put(scrambled, allAnagrams(scrambled));
Where allAnagrams would look like:
private List<String> allAnagrams(String scrambled) {
List<String> anagrams = new ArrayList<>();
for (String word: dictionary) {
if (isAnagram(word, scrambled))
anagrams.add(word);
}
Collections.sort(anagrams);
return anagrams;
}
Note that if you have Java 8 and are familiar with streams, then this could be:
private List<String> allAnagrams(String scrambled) {
return dictionary.stream()
.filter(word -> isAnagram(scrambled, word))
.sorted()
.collect(Collectors.toList());
}
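The snippets above rely on a dictionary collection and an isAnagram helper that are not shown. Here is a minimal sketch of one way they could look, assuming two words are anagrams when their sorted letters are equal. The class name AnagramFinder and its constructor are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper class, not part of the original answer.
class AnagramFinder {
    private final List<String> dictionary;

    AnagramFinder(List<String> dictionary) {
        this.dictionary = dictionary;
    }

    // Two words are treated as anagrams when their letters, once sorted, are identical.
    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    // Returns every dictionary word that is an anagram of the scrambled word, sorted alphabetically.
    List<String> allAnagrams(String scrambled) {
        List<String> anagrams = new ArrayList<>();
        for (String word : dictionary) {
            if (isAnagram(word, scrambled)) {
                anagrams.add(word);
            }
        }
        Collections.sort(anagrams);
        return anagrams;
    }

    public static void main(String[] args) {
        AnagramFinder finder = new AnagramFinder(Arrays.asList("stop", "cat", "post", "dog"));
        System.out.println("stpo " + String.join(" ", finder.allAnagrams("stpo"))); // prints stpo post stop
    }
}
```

This scans the whole dictionary for every scrambled word, which is fine for small inputs. For a large dictionary, a map keyed by the sorted letters avoids the repeated scan.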

To improve upon @sprinter's Map<String, List<String>> example:
private final Map<String, List<String>> lookup = new HashMap<>();
public List<String> getList(String word) {
//can also make #computeIfAbsent use an "initializer" for the key
return lookup.computeIfAbsent(word, k -> new ArrayList<>());
}
Then it's simple to interact with:
List<String> words = getList("tspo"); //spot, post, stop, etc...
You can do the unscrambling from there, and could go even further if you wanted to save space and find a way to index the key as a specific list of characters (so that sotp and tpos would do only one lookup).
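One way to index the key as a specific list of characters is to sort the letters of every word before it touches the map. The sketch below assumes that approach. The class name AnagramIndex and the add and key methods are made up for illustration, and computeIfAbsent is used when adding words rather than when looking them up, so that lookups do not insert empty lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index class, not part of the original answer.
class AnagramIndex {
    private final Map<String, List<String>> lookup = new HashMap<>();

    // Normalizes a word to its sorted letters, so "sotp" and "tpos" share one key.
    static String key(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Registers a dictionary word under its normalized key.
    void add(String dictionaryWord) {
        lookup.computeIfAbsent(key(dictionaryWord), k -> new ArrayList<>()).add(dictionaryWord);
    }

    // Returns the dictionary words matching a scrambled word, or an empty list if there are none.
    List<String> getList(String scrambled) {
        return lookup.getOrDefault(key(scrambled), Collections.emptyList());
    }

    public static void main(String[] args) {
        AnagramIndex index = new AnagramIndex();
        for (String word : Arrays.asList("stop", "post", "spot", "cat")) {
            index.add(word);
        }
        System.out.println(index.getList("tspo")); // prints [stop, post, spot]
    }
}
```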

Read file, replace a word and write to a new file in Java

I am trying to read a text file into an ArrayList, then go through that ArrayList and replace any occurrences of the word "this" with "****". I then want to write this modified ArrayList to a new file with the newly edited text.
The application currently reads the lines into the ArrayList correctly, and writing to the new file works. However, the replaceWords method doesn't appear to be functioning as expected, i.e. "this" is not being replaced by "****". Any help would be greatly appreciated!
package com.assignment2s162305.answers;
import java.io.FileWriter;
import java.io.IOException;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Collections;
public class Question27 {
private List<String> lines = new ArrayList<String>();
// read original file to an ArrayList
public String[] readOriginalFile(String filename) throws IOException {
FileReader fileReader = new FileReader(filename);
BufferedReader bufferedReader = new BufferedReader(fileReader);
String line = null;
while ((line = bufferedReader.readLine()) != null) {
lines.add(line);
}
bufferedReader.close();
return lines.toArray(new String[lines.size()]);
}
// replace words with ****
public void replaceWords() {
Collections.replaceAll(lines, "this", "****");
System.out.println(lines);
}
// write modified ArrayList to a new file
public void writeToNewFile() throws IOException {
FileWriter writer = new FileWriter("output.txt");
for (String str : lines) {
writer.write(str);
}
writer.close();
}
}
package com.assignment2s162305.answers;
import java.io.IOException;
public class Question27Test {
public static void main(String[] args) {
Question27 question27object = new Question27();
String filename = "Hamlet2.txt";
try {
String[] lines = question27object.readOriginalFile(filename);
System.out.println("______ORIGINAL DOCUMENT______\n");
for (String line : lines) {
System.out.println(line);
}
System.out.println("\n\n");
question27object.replaceWords();
question27object.writeToNewFile();
} catch(IOException e) {
// Print out the exception that occurred
System.out.println("Unable to create "+filename+": "+e.getMessage());
}
}
}

Your replaceWords method has a bug. To fix it, you need to loop through the lines and do the replacement within each line. What you have implemented replaces every line that is exactly equal to "this" with "****". So it works as written, but it is not what you wanted.
Try this code; it should fix it.
public void replaceWords() {
ArrayList<String> lns = new ArrayList<String>();
for (String ln : lines){
lns.add(ln.replaceAll("this", "****"));
}
lines.clear();
lines = lns;
System.out.println(lines);
}

You are reading lines of text into your list lines. However, Collections.replaceAll doesn't search inside the string to replace the word "this"; it will just test if the entire line is equal to "this", which it certainly isn't.
From the Javadoc for Collections.replaceAll:
Replaces all occurrences of one specified value in a list with another. More formally, replaces with newVal each element e in list such that (oldVal==null ? e==null : oldVal.equals(e)).
Example:
List<String> lines = new ArrayList<String>();
lines.add("This");
lines.add("this");
lines.add("THIS");
lines.add("this won't work.");
Collections.replaceAll(lines, "this", "****");
System.out.println(lines);
Output:
[This, ****, THIS, this won't work.]
You can split your lines into words, and attempt Collections.replaceAll on the List of words.
Or, you can use String's replace method, which will match the word within the line:
for (int i = 0; i < lines.size(); i++)
{
lines.set(i, lines.get(i).replace("this", "****"));
}

From replaceAll method javadoc:
Replaces all occurrences of one specified value in a list with
another. More formally, replaces with newVal each element e in list
such that (oldVal==null ? e==null : oldVal.equals(e)). (This method
has no effect on the size of the list.)
This means that you are replacing any String element that is exactly equal to "this" with the String "****", rather than replacing text inside each String. Note that String is immutable, so the replacement methods return a new String.
You have to iterate over the list of Strings and replace the text using the String#replace or String#replaceAll methods.
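A plain replace of "this" also rewrites the letters inside longer words such as "thistle". If only the standalone word should be masked, a regular expression with word boundaries is one option. The following is a minimal sketch under that assumption; the class name WordReplacer and the method maskWord are made up for illustration, and the match is case-sensitive, so "This" is left alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answers.
class WordReplacer {
    // \b marks a word boundary, so "this" matches only as a whole word
    private static final Pattern THIS_WORD = Pattern.compile("\\bthis\\b");

    // Returns a new list in which every standalone "this" is replaced by "****".
    static List<String> maskWord(List<String> lines) {
        List<String> masked = new ArrayList<>(lines.size());
        for (String line : lines) {
            masked.add(THIS_WORD.matcher(line).replaceAll("****"));
        }
        return masked;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("this is it", "thistle and this", "This stays");
        System.out.println(maskWord(lines)); // prints [**** is it, thistle and ****, This stays]
    }
}
```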
