Remove stop words in Java

I'm using a method that removes the stop words defined in a file from the query string passed to it. The code works fine.
Now I need one more thing: if the query string contains only stop words, nothing should be removed.
E.g. if the stop words file has "is", "was" and "and":
if the query is "I was a student", the output should be "I a student",
but if the query is "and is", the output should stay "and is".
Below is the method I wrote to remove stop words.
public static String removeStopWords(String query) throws UnsupportedEncodingException
{
String []queryTerms = query.split("&");
String queryString="";
StringBuffer sb =new StringBuffer();
for (int i=0;i<queryTerms.length;i++){
if(queryTerms[i].startsWith("q=") && !queryTerms[i].startsWith("q.orig")){
queryString = queryTerms[i].replaceAll("q=","").trim().replace("+"," ").replaceAll("\\s+"," ").trim();
}
}
if(!queryString.equalsIgnoreCase("")) {
String [] tokens=queryString.split("\\s+");
List lStopWords=StopWordDataLoad.getlQueryStringStopword();
List<String> lTokens=new ArrayList<String>();
boolean noStopWord=false;
for(String s: tokens)
if(!lStopWords.contains(s)) {
if(sb.length()==0) sb.append(s);
else sb.append(" ").append(s);
} else noStopWord=true;
queryString=sb.toString().replaceAll("\\s+", " ");
if(queryString.equalsIgnoreCase("") || noStopWord ==false) return query;
}
else return query;
String fque="";
String finQue = "";
ArrayList<String> list = new ArrayList<String>();
for (int i=0;i<queryTerms.length;i++){
if(queryTerms[i].startsWith("q=") && !queryTerms[i].startsWith("q.orig")){
fque = "q="+URLEncoder.encode(queryString,PropertyLoader.getHttpEncoding());
list.add(fque);
} else if (!queryTerms[i].equalsIgnoreCase("")) list.add(queryTerms[i]);
}
ListIterator<String> iter = list.listIterator();
while(iter.hasNext()){
String str = iter.next();
finQue=finQue+"&"+str;
}
return finQue.trim();
}

Just change the last line to this:
String result = finQue.trim();
if (result.equals("")) {
return query;
} else {
return result;
}
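The rule itself (keep the query when nothing but stop words would be removed) can also be isolated from the URL handling. The following is only a minimal sketch: it assumes the query has already been decoded to plain text and that the stop words are available as a Set. StopWordFilter and its method signature are hypothetical names, not part of the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not the asker's class): filters an already-decoded query string.
final class StopWordFilter {

    // Returns the query without stop words, or the original query
    // when removing them would leave nothing behind.
    static String removeStopWords(String query, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            // Case-sensitive match, as in the question's lStopWords.contains(s)
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        // Every token was a stop word: hand back the query untouched.
        return kept.isEmpty() ? query : String.join(" ", kept);
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("is", "was", "and"));
        System.out.println(removeStopWords("I was a student", stopWords)); // I a student
        System.out.println(removeStopWords("and is", stopWords));          // and is
    }
}
```

Checking kept.isEmpty() before joining keeps the "only stop words" case in a single place, so the rest of the method does not need a separate flag.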

optimising the search time in hashmap

I have a CSV file that is loaded into a HashMap; when the user enters a city name (the key), the program displays all the details of that city. I need to optimize the search time: currently the file is read on every search instead of only once.
The CSV files contains data like this :
city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
Malishevë,Malisheve,42.4822,20.7458,Kosovo,XK,XKS,Malishevë,admin,,1901597212
Prizren,Prizren,42.2139,20.7397,Kosovo,XK,XKS,Prizren,admin,,1901360309
Zubin Potok,Zubin Potok,42.9144,20.6897,Kosovo,XK,XKS,Zubin Potok,admin,,1901608808
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Scanner;
import java.io.IOException;
public class CSVFileReaders{
public static void main(String[] args) {
String filePath = "C:\\worldcities1.csv";
Scanner in = new Scanner(System.in);
System.out.println(" \n Enter the City name to be Searched : \n _> ");
long start = System.currentTimeMillis();
String searchTerm = in.nextLine();
readAndFindRecordFromCSV(filePath, searchTerm);
long end = System.currentTimeMillis();
System.out.println(" \n It took " + (end - start) + " Milli Seconds to search the result \n");
in.close();
}
public static void readAndFindRecordFromCSV( String filePath, String searchTerm) {
try{
HashMap<String,ArrayList<String>> cityMap = new HashMap<String,ArrayList<String>>();
Scanner x = new Scanner (new File(filePath),"UTF-8");
String city= "";
while(x.hasNextLine()) {
ArrayList<String> values = new ArrayList<String>();
String name = x.nextLine();
//break each line of the csv file to its elements
String[] line = name.split(",");
city = line[1];
for(int i=0;i<line.length;i++){
values.add(line[i]);
}
cityMap.put(city,values);
}
x.close();
//Search the city
if(cityMap.containsKey(searchTerm)) {
System.out.println("City name is : "+searchTerm+"\nCity details are accordingly in the order :"
+ "\n[city , city_ascii , lat , lng , country , iso2 , iso3 , admin_name , capital , population , id] \n"
+cityMap.get(searchTerm)+"");
}
else {
System.out.println("Enter the correct City name");
}
}
catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
The search time should be optimized: every time I search, the entire file is read again (which should not happen).
Currently the map initialization is mixed into the search function.
You don't want that.
First initialize the map, then use it in the search function.
To do that, extract a method from the statements that instantiate and populate the map, then refactor the readAndFindRecordFromCSV() method so that it accepts a Map as an additional parameter:
public static void readAndFindRecordFromCSV( String filePath, String searchTerm, HashMap<String,ArrayList<String>> dataByCity) {...}
With the IDE's refactoring features this should be simple enough: "extract method", then "change signature".
Here is some code (compiled but not run) that splits the logic into separate tasks and also relies on instance methods:
public class CSVFileReaders {
private final String csvFile;
private HashMap<String, ArrayList<String>> cityMap;
private final Scanner in = new Scanner(System.in);
public static void main(String[] args) {
String filePath = "C:\\worldcities1.csv";
CSVFileReaders csvFileReaders = new CSVFileReaders(filePath);
csvFileReaders.createCitiesMap();
csvFileReaders.processUserFindRequest(); // First search
csvFileReaders.processUserFindRequest(); // Second search
}
public CSVFileReaders(String csvFile) {
this.csvFile = csvFile;
}
public void createCitiesMap() {
cityMap = new HashMap<>();
try (Scanner x = new Scanner(new File(csvFile), "UTF-8")) {
String city = "";
while (x.hasNextLine()) {
ArrayList<String> values = new ArrayList<String>();
String name = x.nextLine();
//break each line of the csv file to its elements
String[] line = name.split(",");
city = line[1];
for (int i = 0; i < line.length; i++) {
values.add(line[i]);
}
cityMap.put(city, values);
}
// try-with-resources closes the scanner, so no explicit close() is needed
} catch (FileNotFoundException e) {
throw new RuntimeException(e);
}
}
public void processUserFindRequest() {
System.out.println(" \n Enter the City name to be Searched : \n _> ");
String searchTerm = in.nextLine();
//Search the city; time only the map lookup, not the wait for user input
long start = System.currentTimeMillis();
ArrayList<String> details = cityMap.get(searchTerm);
long end = System.currentTimeMillis();
System.out.println(" \n It took " + (end - start) + " Milli Seconds to search the result \n");
if (details != null) {
System.out.println("City name is : " + searchTerm + "\nCity details are accordingly in the order :"
+ "\n[city , city_ascii , lat , lng , country , iso2 , iso3 , admin_name , capital , population , id] \n"
+ details + "");
} else {
System.out.println("Enter the correct City name");
}
}
}
The interesting part is here :
String filePath = "C:\\worldcities1.csv";
CSVFileReaders csvFileReaders = new CSVFileReaders(filePath);
csvFileReaders.createCitiesMap();
csvFileReaders.processUserFindRequest(); // First search
csvFileReaders.processUserFindRequest(); // Second search
The logic is clearer now.
Why do you load the CSV into a HashMap on every search?
Create the HashMap only once at the beginning, and then on every search just check whether the key exists in it, e.g. move the read part into a separate method:
// static, because the static methods below use it
static HashMap<String,ArrayList<String>> cityMap = new HashMap<String,ArrayList<String>>();
public static void readCSVIntoHashMap( String filePath) {
try{
Scanner x = new Scanner (new File(filePath),"UTF-8");
String city= "";
while(x.hasNextLine()) {
ArrayList<String> values = new ArrayList<String>();
String name = x.nextLine();
//break each line of the csv file to its elements
String[] line = name.split(",");
city = line[1];
for(int i=0;i<line.length;i++){
values.add(line[i]);
}
cityMap.put(city,values);
}
x.close();
...
}
Then have a separate method for searching :
public static void search(String searchTerm) {
if(cityMap.containsKey(searchTerm)) {
...
}
}
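Put together, the load-once idea can be sketched as below. This is an illustration under assumptions, not the answerer's code: CityIndex is a hypothetical class name, the lines are passed in as a List so the example runs without a file, and the key is column 1 (city_ascii) as in the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: builds the city index once, then serves every lookup from memory.
final class CityIndex {
    private final Map<String, List<String>> byCity = new HashMap<>();

    // Parses each CSV line a single time; column 1 (city_ascii) is the key.
    CityIndex(List<String> csvLines) {
        for (String line : csvLines) {
            String[] fields = line.split(",");
            if (fields.length > 1) { // skip blank or malformed rows
                byCity.put(fields[1], Arrays.asList(fields));
            }
        }
    }

    // Average constant-time lookup; returns null when the city is unknown.
    List<String> find(String city) {
        return byCity.get(city);
    }

    public static void main(String[] args) {
        CityIndex index = new CityIndex(Arrays.asList(
                "Prizren,Prizren,42.2139,20.7397,Kosovo,XK,XKS,Prizren,admin,,1901360309"));
        long start = System.nanoTime();
        List<String> hit = index.find("Prizren");
        long elapsedNanos = System.nanoTime() - start;
        System.out.println(hit + " found in " + elapsedNanos + " ns");
    }
}
```

In a real program the lines could come from Files.readAllLines(Paths.get(filePath), StandardCharsets.UTF_8), called once at startup; only find() then runs per search.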

Replace word in Java

There is a line, for example "1 qqq 4 aaa 2", and a list {aaa, qqq}. I must replace every word (a token consisting only of letters) with the words from the list, in order. The expected answer for this example is "1 aaa 4 qqq 2". I tried:
StringTokenizer tokenizer = new StringTokenizer(str, " ");
while (tokenizer.hasMoreTokens()){
tmp = tokenizer.nextToken();
if(tmp.matches("^[a-z]+$"))
newStr = newStr.replaceFirst(tmp, words.get(l++));
}
But it's not working. As a result I get the same line.
All my code:
String space = " ", tmp, newStr;
Scanner stdin = new Scanner(System.in);
while (stdin.hasNextLine()) {
int k = 0, j = 0, l = 0;
String str = stdin.nextLine();
newStr = str;
List<String> words = new ArrayList<>(Arrays.asList(str.split(" ")));
words.removeIf(new Predicate<String>() {
@Override
public boolean test(String s) {
return !s.matches("^[a-z]+$");
}
});
Collections.sort(words);
StringTokenizer tokenizer = new StringTokenizer(str, " ");
while (tokenizer.hasMoreTokens()){
tmp = tokenizer.nextToken();
if(tmp.matches("^[a-z]+$"))
newStr = newStr.replaceFirst(tmp, words.get(l++));
}
System.out.printf(newStr);
}
I think the problem might be that replaceFirst() expects a regular expression as first parameter and you are giving it a String.
Maybe try
newStr = newStr.replaceFirst("^[a-z]+$", words.get(l++));
instead?
Update:
Would that be a possibility for you:
StringBuilder _b = new StringBuilder();
while (_tokenizer.hasMoreTokens()){
String _tmp = _tokenizer.nextToken();
if(_tmp.matches("^[a-z]+$")){
_b.append(words.get(l++));
}
else{
_b.append(_tmp);
}
_b.append(" ");
}
String newStr = _b.toString().trim();
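Building the result token by token, as above, also sidesteps a pitfall in the original loop: every replaceFirst call rescans newStr from the start, so a later call can overwrite an earlier replacement. The sketch below traces the question's loop on its sample input; ReplaceFirstTrace is a hypothetical demo class, and str.split(" ") stands in for the StringTokenizer.

```java
// Hypothetical demo class: traces the question's replaceFirst loop on the sample input.
final class ReplaceFirstTrace {

    static String trace(String str, String[] sortedWords) {
        String newStr = str;
        int l = 0;
        for (String tmp : str.split(" ")) {
            if (tmp.matches("^[a-z]+$")) {
                // replaceFirst searches newStr from index 0, not from the current token
                newStr = newStr.replaceFirst(tmp, sortedWords[l++]);
                System.out.println(newStr);
            }
        }
        return newStr;
    }

    public static void main(String[] args) {
        // Prints "1 aaa 4 aaa 2", then "1 qqq 4 aaa 2": the second call undoes the first.
        trace("1 qqq 4 aaa 2", new String[] {"aaa", "qqq"});
    }
}
```

For this input the final string equals the original line, which matches the behaviour described in the question.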
Update 2:
Change the StringTokenizer like this:
StringTokenizer _tokenizer = new StringTokenizer(str, " ", true);
That will also return the delimiters (all the spaces).
And then concatenate the String like this:
StringBuilder _b = new StringBuilder();
while (_tokenizer.hasMoreTokens()){
String _tmp = _tokenizer.nextToken();
if(_tmp.matches("^[a-z]+$")){
_b.append(words.get(l++));
}
else{
_b.append(_tmp);
}
}
String newStr = _b.toString().trim();
That should work.
Update 3:
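As @DavidConrad mentioned, StringTokenizer should not be used anymore. Here is another solution with String.split():
// split before and after every whitespace character, so the whitespace is kept as separate elements
final String[] _elements = str.split("(?<=\\s)|(?=\\s)");
final StringBuilder _b = new StringBuilder();
int l = 0;
for (int i = 0; i < _elements.length; i++){
if(_elements[i].matches("^[a-z]+$")){
_b.append(words.get(l++));
}
else{
_b.append(_elements[i]);
}
}
String newStr = _b.toString();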
Just out of curiosity, another solution (the others really don't answer the question), which takes the input line and sorts the words alphabetically in the result, as you commented in your question.
public class Replacer {
public static void main(String[] args) {
Replacer r = new Replacer();
Scanner in = new Scanner(System.in);
while (in.hasNextLine()) {
System.out.println(r.replace(in.nextLine()));
}
}
public String replace(String input) {
Matcher m = Pattern.compile("([a-z]+)").matcher(input);
StringBuffer sb = new StringBuffer();
List<String> replacements = new ArrayList<>();
while (m.find()) {
replacements.add(m.group());
}
Collections.sort(replacements);
m.reset();
for (int i = 0; m.find(); i++) {
m.appendReplacement(sb, replacements.get(i));
}
m.appendTail(sb);
return sb.toString();
}
}

How do I sort jumbled words from the dictionary using Hashmap

I cannot get my program to compile. What I am trying to do is get the program to print out all the jumbled words with the dictionary words that can be made from it printed next to it. I believe it is an error in the way I nested my loops but I can't figure it out. Is anyone able to give me a hand?
public static void main(String[] args) throws Exception
{
if (args.length < 2) die("Must give name of two input files on cmd line.");
BufferedReader dictionaryFile = new BufferedReader( new FileReader( args[0] ));
BufferedReader jumbleFile = new BufferedReader( new FileReader(args[0] ));
HashMap<String, ArrayList<String>> map = new HashMap<String, ArrayList<String>>();
ArrayList<String> jumbleWords = new ArrayList<String>();
ArrayList<String> dictionaryWords = new ArrayList<String>();
ArrayList<String> keysList = new ArrayList<String>();
while(jumbleFile.ready())
{
String jWord=jumbleFile.readLine();
jumbleWords.add(jWord);
}
jumbleFile.close();
Collections.sort(jumbleWords);
while(dictionaryFile.ready())
{
String dWord= dictionaryFile.readLine();
String dictWord= toCanonical(dWord);
if(map.containsKey(dictWord))
{
map.get(dictWord);
map.put(dWord, map.get(dictWord));
}
else
{
ArrayList<String> dictionaryWords2 = new ArrayList<String>();
dictionaryWords2.add(dWord);
map.put(dictWord, dictionaryWords2);
}
for( String i : map.keySet())
{
keysList.add(i);
}
Collections.sort(keysList);
for (String key : keysList)
{
System.out.print(key);
String toCanJWord= toCanonical(key);
if(map.containsKey(toCanJWord))
{
map.get(toCanJWord);
Collections.sort(map.get(toCanJWord));
for(map.get(toCanJWord))
{
System.out.print(toCanJWord);
}
}
System.out.println();
}
private static String toCanonical( String word )
{
char[] letters = word.toCharArray();
Arrays.sort(letters);
return new String(letters);
}
private static void die( String errmsg )
{
System.out.println( "\nFATAL ERROR: " + errmsg + "\n" );
System.exit(0);
}
}
You have a couple issues. First you are missing a } at the end of the for loop here:
for (String key : keysList)
{
System.out.print(key);
String toCanJWord = toCanonical(key);
if(map.containsKey(toCanJWord))
{
map.get(toCanJWord);
Collections.sort(map.get(toCanJWord));
//this isn't correct. Not sure what you are trying to do here
//but this is why it won't compile
for(map.get(toCanJWord))
{
System.out.print(toCanJWord);
}
}
}//missing this closing bracket
There is also an issue with your for loop, see the comments.
Your for loop is wrong:
for(map.get(toCanJWord))
{
System.out.print(toCanJWord);
}
It needs to be of the format:
for(String wordToPrint : map.get(toCanJWord))
{
System.out.print(wordToPrint );
}
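Beyond the compile errors, note that in the question's dictionary loop a word whose key already exists is never added to the existing list. A minimal sketch of the grouping step, assuming both word lists are already in memory; JumbleSolver, index and solve are hypothetical names, and toCanonical is copied from the question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: groups dictionary words by their sorted letters.
final class JumbleSolver {

    static String toCanonical(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Maps each sorted-letter key to every dictionary word that shares those letters.
    static Map<String, List<String>> index(List<String> dictionary) {
        Map<String, List<String>> map = new HashMap<>();
        for (String word : dictionary) {
            // Creates the list on first use, then always appends the word to it
            map.computeIfAbsent(toCanonical(word), k -> new ArrayList<>()).add(word);
        }
        return map;
    }

    // Returns the matching dictionary words in sorted order; empty if there are none.
    static List<String> solve(String jumble, Map<String, List<String>> index) {
        List<String> matches = new ArrayList<>(
                index.getOrDefault(toCanonical(jumble), Collections.emptyList()));
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = index(Arrays.asList("stop", "pots", "tops", "cat"));
        for (String jumble : Arrays.asList("opst", "tac", "xyz")) {
            System.out.println(jumble + " " + String.join(" ", solve(jumble, index)));
        }
    }
}
```

The jumbled word is printed first and its dictionary matches follow on the same line, which is the output format the question describes.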

How can I count the number of cities per country from the data file?

How can I count the number of cities per country from the data file? I would also like to display the value as percentage of the total.
import java.util.StringTokenizer;
import java.io.*;
public class city
{
public static void main(String[] args)
{
String[] city = new String[120];
String country = null;
String[] latDegree =new String[120];
String lonDegree =null;
String latMinute =null;
String lonMinute =null;
String latDir = null;
String lonDir = null;
String time = null;
String amORpm = null;
try
{
File myFile = new File("CityLongandLat.txt");
FileReader fr = new FileReader(myFile);
BufferedReader br = new BufferedReader(fr);
String line = null;
int position =0;
int latitude=0;
while( (line = br.readLine()) != null)
{
// System.out.println(line);
StringTokenizer st = new StringTokenizer(line,",");
while(st.hasMoreTokens())
{
city[position] = st.nextToken();
country = st.nextToken();
latDegree[latitude] =st.nextToken();
latMinute =st.nextToken();
latDir = st.nextToken();
lonDegree =st.nextToken();
lonMinute =st.nextToken();
lonDir = st.nextToken();
time = st.nextToken();
amORpm = st.nextToken();
}
if(city.length<8)
{
System.out.print(city[position] + "\t\t");
}
else
{
System.out.print(city[position] + "\t");
}
if(country.length()<16)
{
System.out.print(country +"\t\t");
}
else
{
System.out.print(country);
}
System.out.print(latDegree + "\t");
System.out.print(latMinute + "\t");
System.out.print(latDir + "\t");
System.out.print(lonDegree + "\t");
System.out.print(lonMinute + "\t");
System.out.print(lonDir + "\t");
System.out.print(time + "\t");
System.out.println(amORpm + "\t");
position++;
}
br.close();
}
catch(Exception ex)
{
System.out.println("Error !!!");
}
}
}
One easy way that comes to mind is as follows.
Create a HashMap where the key is a String (the country) and the value is an Integer (the number of cities found for that country), something like:
Map<String,Integer> countryResultsFoundMap = new HashMap<>();
In short, for each row take the country (I recommend calling .trim() and .toLowerCase() on the value first) and check whether it already exists in the map. If not, add an entry with countryResultsFoundMap.put(country,1), since this row is the first city found for it; otherwise, read the current value from the map and store it back incremented by 1.
In the end all the counts are stored in the map and you can use them for your calculations.
Hope that helps.
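The counting approach described above can be sketched as follows. This is a minimal illustration, not the asker's program: CityCounter is a hypothetical class, the sample rows are made up and shortened to two fields, and the country is assumed to be the second comma-separated field, as in the question's tokenizer order. It only trims the country name (no toLowerCase) so the printed names keep their original case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class: counts cities per country and reports each share of the total.
final class CityCounter {

    // One count per line; the country is assumed to be the second field.
    static Map<String, Integer> countByCountry(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps countries sorted
        for (String line : lines) {
            String[] tokens = line.split(",");
            if (tokens.length < 2) {
                continue; // skip malformed rows
            }
            // merge() stores 1 for a new country, otherwise adds 1 to the stored count
            counts.merge(tokens[1].trim(), 1, Integer::sum);
        }
        return counts;
    }

    static double percentage(int count, int total) {
        return 100.0 * count / total;
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration only
        List<String> lines = Arrays.asList(
                "Aberdeen,Scotland", "Adelaide,Australia", "Sydney,Australia", "Perth,Australia");
        Map<String, Integer> counts = countByCountry(lines);
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.printf("%s: %d (%.1f%%)%n",
                    e.getKey(), e.getValue(), percentage(e.getValue(), total));
        }
    }
}
```

Map.merge replaces the explicit containsKey check; the if/else form described in the answer behaves the same way.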
"here are some of the output from the data file from my programme"
Aberdeen Scotland 57 2 [Ljava.lang.String;@33906773 9 N [Ljava.lang.String;@4d77c977 9 W 05:00 p.m. Adelaide Australia 34 138 [Ljava.lang.String;@33906773 55 S [Ljava.lang.String;@4d77c977 36 E 02:30 a.m...
The reason you're getting that output is that you're printing the array object latDegree itself.
String[] latDegree
...
System.out.print(latDegree + "\t");
Also, you have latitude = 0; but you never increment it, so it will always use index 0 of the array. You need to increment it, as you did with position++.
So in the print statement, print the value at index latitude, not the entire array.
Try this:
System.out.print(latDegree[latitude] + "\t");
...
latitude++;
If for some reason you do want to print the whole array, use Arrays.toString(array) or iterate over it.
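The difference between the three ways of printing can be seen in a small sketch; PrintArrayDemo and the sample values are hypothetical and only mirror the names used in the question.

```java
import java.util.Arrays;

// Hypothetical demo: printing an array reference vs. one element vs. Arrays.toString.
final class PrintArrayDemo {

    // Readable form of the whole array, e.g. [57, 34]
    static String format(String[] values) {
        return Arrays.toString(values);
    }

    public static void main(String[] args) {
        String[] latDegree = {"57", "34"};
        // Concatenating the array itself prints its type and hash code ([Ljava.lang.String;@...)
        System.out.println(latDegree + "\t");
        // Indexing prints the stored value
        System.out.println(latDegree[0]);
        // Arrays.toString prints every element
        System.out.println(format(latDegree));
    }
}
```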
I would also start with a map and group the cities by country:
Map<String,List<String>>
where the key is the country and the value is the list of cities in that country. With the size() method of each list you can compute the cities per country and the percentage of the total.
When you read a line, check whether the key (country) already exists. If not, create a new list and add the city; otherwise just add the city to the existing list.
As a starter you could use the following snippet. This sample assumes that the content of the file has already been read and is passed to the method as an argument.
Map<String,List<String>> groupByCountry(List<String> lines){
Map<String,List<String>> group = new HashMap<>();
for (String line : lines) {
String[] tokens = line.split(",");
String city = tokens[0];
String country = tokens[1];
...
if(group.containsKey(country)){
group.get(country).add(city);
}else{
List<String> cities = new ArrayList<>();
cities.add(city);
group.put(country, cities);
}
}
return group;
}

array in array list

In the input file there are two columns: 1) stem, 2) affixes. In my code I treat each column as a token, i.e. tokens[1] and tokens[2]. However, tokens[2] contains several values: ng ny nge
stem affixes
---- -------
nyak ng ny nge
My problem is: how can I declare the contents of tokens[2]? Below is a snippet of my code:
try {
FileInputStream fstream2 = new FileInputStream(file2);
DataInputStream in2 = new DataInputStream(fstream2);
BufferedReader br2 = new BufferedReader(new InputStreamReader(in2));
String str2 = "";
String affixes = " ";
while ((str2 = br2.readLine()) != null) {
System.out.println("Original:" + str2);
tokens = str2.split("\\s");
if (tokens.length < 4) {
continue;
}
String stem = tokens[1];
System.out.println("stem is: " + stem);
// here is my point
affixes = tokens[3].split(" ");
for (int x=0; x < tokens.length; x++)
System.out.println("affix is: " + affixes);
}
in2.close();
} catch (Exception e) {
System.err.println(e);
} //end of try2
You index tokens as an array (tokens[1]) and assign the result of String.split() to it, so tokens must be a String[] array.
Next, you try to set affixes to the result of splitting tokens[3]. tokens[3] is a String, so calling split on it yields another String[] array.
So the following is wrong, because it declares a String where you need a String[]:
String affixes = " ";
The correct declaration is:
String[] affixes = null;
Then you can assign it an array:
affixes = tokens[3].split(" ");
Are you looking for something like this?
public static void main(String[] args) {
String line = "nyak ng ny nge";
MyObject object = new MyObject(line);
System.out.println("Stem: " + object.stem);
System.out.println("Affixes: ");
for (String affix : object.affixes) {
System.out.println(" " + affix);
}
}
static class MyObject {
public final String stem;
public final String[] affixes;
public MyObject(String line) {
String[] stemSplit = line.split(" +", 2);
stem = stemSplit[0];
affixes = stemSplit[1].split(" +");
}
}
Output:
Stem: nyak
Affixes:
ng
ny
nge
