How to speed up multiple searches on a HashTable - Java

I have two log files (with almost 5000 lines each). Each line has a set of rules associated with an email, like this:
Y#12#EMAIL_1#RULE_1,RULE_2,RULE_3,RULE_4#time=993470174
Y#12#EMAIL_2#RULE_1,RULE_2,RULE_3,RULE_4#time=993470175
Y#12#EMAIL_3#RULE_1,RULE_2,RULE_3#time=9934701778
I use the following function to read the file, and get the rules for each email:
private void processFile()
{
    ArrayList<String[]> lSplitRules = new ArrayList<>();
    try {
        FileInputStream fileStream = new FileInputStream("log.log");
        DataInputStream fileIn = new DataInputStream(fileStream);
        BufferedReader fileBr = new BufferedReader(new InputStreamReader(fileIn));
        String strLine;
        while ((strLine = fileBr.readLine()) != null)
        {
            String[] lTokens = strLine.split("#");
            String lRawRules = lTokens[3];
            lSplitRules.add(lRawRules.split(","));
        }
    } catch (FileNotFoundException e) {
        System.out.println("File: log.log, not found. Error: " + e.getMessage());
    } catch (IOException e) {
        System.out.println("Couldn't open log.log. Error: " + e.getMessage());
    }
}
So far, so good. In each slot of the ArrayList I'll have a String[] containing the rules for one email. On the other hand, I also have a HashMap containing a unique list of rules and their values, like this:
RULE_NAME - VALUE
RULE_1 - 0.1
RULE_2 - 0.5
RULE_3 - 0.6
...
I need to compare every rule of every email to see if it exists in the HashMap. If it exists, I return the value of the rule for some calculations.
I use this function for that:
private Double eval(String rule, Map<String, Double> scores)
{
    for (Entry<String, Double> entry : scores.entrySet()) {
        if (entry.getKey().equalsIgnoreCase(rule))
        {
            return entry.getValue();
        }
    }
    return 0.0;
}
The problem is that I need to compare every email and its rules multiple times (more than 10,000), since I'm using a genetic algorithm to try to optimize the VALUE of each RULE. Is there any way to optimize the comparison of the rules of each email through the HashMap? I need speed; right now I'm only doing 100 verifications in 8 minutes.
Sorry for my english.
Regards

The whole point of having a hash table is so you can do a single hash lookup. If you are just going to loop through the keys, you may as well use a List.
I don't know where you are building your scores map, but you can normalise the case when you insert:
scores.put(key.toLowerCase(), value);
and then do a case-insensitive lookup with:
Double d = scores.get(key.toLowerCase());
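Putting that together, here is a minimal sketch (the RuleScores class name and addRule method are my own; the question doesn't show how the map is built). Normalising once at insertion time turns eval into a single O(1) hash lookup instead of a scan over the entire entry set:
import java.util.HashMap;
import java.util.Map;

public class RuleScores {
    private final Map<String, Double> scores = new HashMap<>();

    // Normalise the case once, when the rule is inserted.
    public void addRule(String rule, double value) {
        scores.put(rule.toLowerCase(), value);
    }

    // One hash lookup instead of iterating over every entry.
    public double eval(String rule) {
        Double value = scores.get(rule.toLowerCase());
        return value != null ? value : 0.0;
    }
}
With ~4 rules per email this drops each evaluation from O(rules × map size) to O(rules), which matters when the genetic algorithm re-evaluates the same 5000 emails thousands of times.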

Move from string to array but after that select by the first character (Record Type 1, 2, 5)

I need your help; I am new to Java.
I need to read a flat file with 5 different types of records.
The way to differentiate each record is by the first character; after that, my idea is to move each record into one of 5 different arrays to play with the data inside.
example
120220502Name Last Name1298843984 $1.50
120220501other client 8989899889 $23.89
2Toronto372 Yorkland drive 1 year Ontario
512345678Transfer Stove Pay
522457839Pending Microwave Interactive
Any help will be quite appreciated.
Break the problem into chunks. The first problem is reading the file:
try (BufferedReader reader = new BufferedReader(new FileReader("path/to/file"))) {
    parseData(reader); // method to do the work
} catch (IOException e) {
    e.printStackTrace();
}
Then you need to decide what kind of record it is:
public void parseData(BufferedReader input) throws IOException {
    for (String line = input.readLine(); line != null; line = input.readLine()) {
        if (line.startsWith("1")) {
            parseType1(line);
        } else if (line.startsWith("2")) {
            parseType2(line);
        } else if (line.startsWith("5")) {
            parseType5(line);
        } else {
            throw new IllegalArgumentException("Unknown record type: " + line.charAt(0));
        }
    }
}
Then you'll need to create the various parseTypeX methods to handle turning the text into usable chunks and then into classes.
public Type1Record parseType1(String data) {
    // create a Type1Record
    Type1Record record = new Type1Record();
    // split the string, something like
    String[] fields = data.split("\\s+");
    // assign those chunks to the record
    record.setId(fields[0]);
    record.setFirstName(fields[1]);
    record.setLastName(fields[2]);
    record.setTotal(fields[3]); // if you want this to be a real number, you'll need to remove the $
    return record;
}
Repeat the process with the other record types. You'll likely need to group records together, but that should be easy enough.
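For completeness, a minimal Type1Record might look like the sketch below. The field names are assumptions inferred from the setters used above, and since the sample data looks fixed-width rather than space-delimited, you may end up wanting substring() offsets instead of split():
public class Type1Record {
    private String id;
    private String firstName;
    private String lastName;
    private String total; // keep as a String, or strip the $ and parse a number

    public void setId(String id) { this.id = id; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
    public void setTotal(String total) { this.total = total; }
}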

Read and compare two large Files

I would like to read and compare all the lines of both files. To explain: I would like to find, for each hashed password (from my test.txt file), the hashes that are the same (from the password.txt file). The problem is that it should be fast enough (I would say 45 min max for 10M lines in password.txt and 1M lines in test.txt).
For the moment I have this code:
private static void bufferedReaderFilePasswordFirst() {
    Path path = Paths.get("C:\\Users\\basil\\OneDrive - Haute Ecole Bruxelles Brabant (HE2B)\\Documents\\NetBeansProjects\\sha256\\passwords.txt");
    Path pathUser = Paths.get("C:\\Users\\basil\\OneDrive - Haute Ecole Bruxelles Brabant (HE2B)\\Documents\\NetBeansProjects\\sha256\\test.txt");
    int nbOfLine = 0;
    StringBuffer oui = new StringBuffer();
    try (BufferedReader readerPasswordGenerate = Files.newBufferedReader(path, Charset.forName("UTF-8"))) {
        String currentLineUser = null;
        String currentLinePassword = null;
        long start = System.nanoTime();
        while ((currentLinePassword = readerPasswordGenerate.readLine()) != null) {
            BufferedReader readerPasswordUser = Files.newBufferedReader(pathUser, Charset.forName("UTF-8"));
            while ((currentLineUser = readerPasswordUser.readLine()) != null) {
                String firstWord = currentLinePassword.substring(0, currentLinePassword.indexOf(":"));
                if ((firstWord.charAt(0) == currentLineUser.charAt(0))
                        && (firstWord.charAt(14) == currentLineUser.charAt(14))
                        && (firstWord.charAt(31) == currentLineUser.charAt(31))
                        && (firstWord.charAt(63) == currentLineUser.charAt(63))) {
                    if (firstWord.equals(currentLineUser)) {
                        String secondWord = currentLinePassword.substring(currentLinePassword.lastIndexOf(":") + 1);
                        oui.append(secondWord).append(System.lineSeparator());
                    }
                }
            }
            if (nbOfLine % 300 == 0) {
                System.out.println("We are at the " + nbOfLine);
                final long consumed = System.nanoTime() - start;
                final long totConsumed = TimeUnit.NANOSECONDS.toMillis(consumed);
                final double tot = (double) totConsumed;
                System.out.printf("Not done. Took %s seconds", (tot / 1000));
                System.out.println(oui + " oui");
            }
            nbOfLine++;
        }
        System.out.println(oui);
        final long consumed = System.nanoTime() - start;
        final long totConsumed = TimeUnit.NANOSECONDS.toMillis(consumed);
        final double tot = (double) totConsumed;
        System.out.printf("Done. Took %s seconds", (tot / 1000));
    } catch (IOException ex) {
        ex.printStackTrace(); // handle an exception here
    }
}
In this code, I just compare, for each element in my test.txt, whether the corresponding element in password.txt has the same hash.
password.txt contains, on each line: hash:password
and test.txt contains only: hash
Thanks
In this code, I just compare, for each element in my test.txt, whether the corresponding element in password.txt has the same hash.
If you are familiar with Big-O notation, you might recognize that this means your algorithm runs in O(n^2) time. In your specific case, for each of the 1,000,000 lines in test.txt you are doing 10,000,000 comparisons for a total of 10,000,000,000,000 total comparisons. To achieve your goal of running it within 45 minutes you would need to do 3.7 billion comparisons per second. For comparison, the i7 in my laptop runs at a max of 3.9GHz (billion cycles per second) and it will take much more than a single cpu cycle to execute one of these comparisons.
You can reduce the time complexity down to O(n) by first reading the password.txt into a HashMap (10,000,000 operations). From there, any individual check from test.txt only takes a single operation (1,000,000 total), resulting in 11,000,000 operations total. That means you only have to do ~4,000 operations a second (a 99.99989% reduction) to finish in 45 minutes which is much more doable.
Here's some pseudo-code to illustrate what that could look like:
// I like Scanner over BufferedReader for reading files. Use whatever you like.
Scanner readPassword = new Scanner(new File("password.txt"));
// Load all password/hash pairings from password.txt into a HashMap for quick lookups
Map<String, List<String>> passwords = new HashMap<>();
while (readPassword.hasNextLine()) {
    String line = readPassword.nextLine();
    String[] lineParts = line.split(":");
    String hash = lineParts[0];
    String password = lineParts[1];
    // If we haven't seen the hash before, create a new list to store its associated passwords
    if (passwords.get(hash) == null) {
        passwords.put(hash, new LinkedList<>());
    }
    // Add the password to the list of all passwords that have this hash
    passwords.get(hash).add(password);
}
// Perform all the lookups from test.txt
Scanner readTest = new Scanner(new File("test.txt"));
while (readTest.hasNextLine()) {
    String testHash = readTest.nextLine();
    List<String> matchingPasswords = passwords.get(testHash);
    // Now do whatever you want with the list of associated passwords...
}
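As a side note, the get/put dance used to initialize each list above can be collapsed into one line with computeIfAbsent (Java 8+); a sketch under the same assumptions:
// Creates the list the first time a hash is seen, then appends to it.
passwords.computeIfAbsent(hash, k -> new LinkedList<>()).add(password);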
Side Notes:
Looking at your code, it looks like you have a few extra requirements (e.g. timing) that I didn't consider in this code snippet. I trust you can figure out how to integrate those additional requirements.
Some of the more academic people on here might take issue with a few parts of my Big-O description/analysis. I'm sure their comments on this post will expound that topic in greater detail if that interests you.

Java comparing csv file values

I'm trying to create a csv file where only one team name is shown per row, so when you click the button twice it will only add the team name if it's not already there. Currently it adds the team "UWE" every single time you press the button. The code for this is below:
public void showStats(ActionEvent event) {
    try {
        File matchFile = new File("src/sample/matchData.csv");
        File scoreFile = new File("src/sample/scoreData.csv");
        Scanner matchReader = new Scanner(matchFile);
        Scanner scoreReader = new Scanner(scoreFile);
        while (matchReader.hasNextLine()) {
            String data = matchReader.nextLine();
            List<String> matchList = Arrays.asList(data.split(","));
            while (scoreReader.hasNextLine()) {
                String dataScore = scoreReader.nextLine();
                List<String> dataScoreList = Arrays.asList(dataScore.split(","));
                if (dataScoreList.get(0).equals(matchList.get(0))) {
                    //
                } else {
                    writeExcel("scoreData", matchList.get(0));
                }
                System.out.println(dataScoreList);
            }
            System.out.println(matchList);
        }
        matchReader.close();
        scoreReader.close();
    } catch (FileNotFoundException e) {
        System.out.println("An error occurred.");
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The csv file "matchData" contains:
UWE,KCC,Jin,Julia,Chris,Ryan,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,5,0
The csv file "scoreData" has one empty line in it
You can first go through your source CSV file and put into a map only the lines that contain a unique team key:
Map<String, String> matchList = new TreeMap<>();
while (matchReader.hasNextLine()) {
    String data = matchReader.nextLine();
    String[] record = data.split(",", 2);
    matchList.putIfAbsent(record[0], record[1]); // only unique keys are entered
}
// TODO write to Excel each entry in the map (you don't need to check for unique keys)
Notice that writing to Excel is done after the map is complete. This is the best approach; or at least better than what you showed in your original post. With this approach, you are letting the data structure simplify your process (and no nested loops).
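The TODO might then be filled in with something like this sketch (writeExcel is the helper from the original post; here I only write the team name, as the original code did):
for (Map.Entry<String, String> entry : matchList.entrySet()) {
    writeExcel("scoreData", entry.getKey());
}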
UPDATE:
I forgot to mention that matchList.putIfAbsent(K, V) works with Java 8 and later. If you are using Java 7 or older (you should upgrade Java ASAP), then you will have to do the following:
String value = matchList.get(record[0]);
if (value == null) {
    matchList.put(record[0], record[1]);
}
This is because Map#get(K) returns null if no entry is found, or if the map allowed a null value to be entered for the given key. Otherwise, it returns the previously stored value. The new method introduced in Java 8 does this check automatically.

How to deal with duplicate keys in HashMaps?

Hello fellow soldiers.
Obviously, keys in HashMaps are unique. However, I've been trying to write code that reads a csv file and then puts the keys and values in the map. But there are duplicate keys (every key appears about 15 times in the csv file). In that case, it should sum the values and just return the key once.
How to do that? My code right now is as follows.
BufferedReader br = null;
String line;
try {
    br = new BufferedReader(new FileReader(filepath));
} catch (FileNotFoundException fnfex) {
    System.out.println(fnfex.getMessage() + "Bestand niet gevonden!");
    System.exit(0);
}
// this is where we read lines
try {
    while ((line = br.readLine()) != null) {
        String[] splitter = line.split(cvsSplitBy);
        if (splitter[0] != "Voertuig") {
            alldataMap.put(splitter[0], splitter[8]);
        }
        // MIGHT BE JUNK, DONT KNOW YET
        /*if ((splitter[0].toLowerCase()).contains("1")) {
            double valuekm = Double.parseDouble(splitter[8]);
            license1 += valuekm;
            System.out.println(license1);
        } else {
            System.out.println("not found");
        }*/
    }
    System.out.println(alldataMap);
    TextOutput();
} catch (IOException ioex) {
    System.out.println(ioex.getMessage() + " Error 1");
} finally {
    System.exit(0);
}
So if I have the following info (in this case it's the 0th and 8th field read from every line of the csv file):
Apples; 299,9
Bananas; 300,23
Apples; 3912,1
Bananas;342
Bananas;343
It should return
Apples;Total
Bananas;Total
Try the following:
if (alldataMap.containsKey(splitter[0])) {
    Double sum = alldataMap.remove(splitter[0]) + Double.parseDouble(splitter[8]);
    alldataMap.put(splitter[0], sum);
} else {
    alldataMap.put(splitter[0], Double.valueOf(splitter[8]));
}
You can use putIfAbsent and compute since Java 8:
Map<String, Integer> myMap = new HashMap<>();
//...
String fruitName = /*whatever*/;
int qty = /*whatever*/;
myMap.putIfAbsent(fruitName, 0);
myMap.compute(fruitName, (k, oldQty) -> oldQty + qty);
You can use Map#containsKey() to check for an existing mapping, then if there is one use Map#get() to retrieve the value and add the new one, and finally Map#put() to store the sum:
if (map.containsKey(key))
    map.put(key, map.get(key) + value);
else
    map.put(key, value);
See the java.util.Map documentation for those methods.
I would use merge for a map:
alldataMap.merge(splitter[0], Double.valueOf(splitter[8]), (oldVal, newVal) -> oldVal + newVal);
From the doc:
If the specified key is not already associated with a value or is associated with null, associates it with the given non-null value. Otherwise, replaces the associated value with the results of the given remapping function, or removes if the result is null. This method may be of use when combining multiple mapped values for a key. For example, to either create or append a String msg to a value mapping:
I won't suggest a way to do it in a loop because that's already been done, but here's a Streams solution, in a single expression:
Map<String, Double> alldataMap = new HashMap<>();
try {
    alldataMap =
        Files.lines(Paths.get("", filepath))
             .map(str -> str.split(cvsSplitBy))
             .filter(split -> !split[0].equals("Voertuig"))
             .collect(Collectors.toMap(sp -> sp[0],
                                       sp -> Double.parseDouble(sp[8].replaceAll(",", ".")),
                                       (i1, i2) -> i1 + i2));
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println(alldataMap); // {Apples=4212.0, Bananas=985.23}
The steps are the same:
Iterate over the lines
Split on the cvsSplitBy
Remove lines which start with Voertuig (note: use .equals() and not !=)
Build the map following 3 rules:
the key is the first String
the value is the second String parsed as a Double
if a merge is required: sum both values
Edit: since nobody proposed the use of .getOrDefault(), here it is:
while ((line = br.readLine()) != null) {
    String[] splitter = line.split(cvsSplitBy);
    if (!splitter[0].equals("Voertuig")) {
        alldataMap.put(splitter[0],
                alldataMap.getOrDefault(splitter[0], 0.0)
                        + Double.parseDouble(splitter[8].replaceAll(",", ".")));
    }
}
If the key already exists, it'll sum with the stored value; if the key does not exist, it'll sum the value with 0.

Why is website crawling taking forever?

public class Parser {
    public static void main(String[] args) {
        Parser p = new Parser();
        p.matchString();
    }
    parserObject courseObject = new parserObject();
    ArrayList<parserObject> courseObjects = new ArrayList<parserObject>();
    ArrayList<String> courseNames = new ArrayList<String>();
    String theWebPage = " ";
    {
        try {
            URL theUrl = new URL("http://ocw.mit.edu/courses/");
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(theUrl.openStream()));
            String str = null;
            while ((str = reader.readLine()) != null) {
                theWebPage = theWebPage + " " + str;
            }
            reader.close();
        } catch (MalformedURLException e) {
            // do nothing
        } catch (IOException e) {
            // do nothing
        }
    }
    public void matchString() {
        // this is my regex that I am using to compare strings on input page
        String matchRegex = "#\\w+(-\\w+)+";
        Pattern p = Pattern.compile(matchRegex);
        Matcher m = p.matcher(theWebPage);
        int i = 0;
        while (!m.hitEnd()) {
            try {
                System.out.println(m.group());
                courseNames.add(i, m.group());
                i++;
            } catch (IllegalStateException e) {
                // do nothing
            }
        }
    }
}
What I am trying to achieve with the above code is to get the list of departments on the MIT OpenCourseWare website. I am using a regular expression that matches the pattern of the department names in the page source, and I am using a Pattern object and a Matcher object to find() and print the department names that match the regular expression. But the code is taking forever to run, and I don't think reading in a webpage using a BufferedReader takes that long. So I think I am either doing something horribly wrong or parsing websites takes a ridiculously long time. I would appreciate any input on how to improve performance or correct a mistake in my code, if there is one. I apologize for the badly written code.
The problem is with the code
while ((str = reader.readLine()) != null)
    theWebPage = theWebPage + " " + str;
The variable theWebPage is a String, which is immutable. For each line read, this code creates a new String with a copy of everything that's been read so far, with a space and the just-read line appended. This is an extraordinary amount of unnecessary copying, which is why the program is running so slow.
I downloaded the web page in question. It has 55,000 lines and is about 3.25MB in size. Not too big. But because of the copying in the loop, the first line ends up being copied about 1.5 billion times (1/2 of 55,000 squared). The program is spending all its time copying and garbage collecting. I ran this on my laptop (2.66GHz Core2Duo, 1GB heap) and it took 15 minutes to run when reading from a local file (no network latency or web crawling countermeasures).
To fix this, make theWebPage into a StringBuilder instead, and change the line in the loop to be
theWebPage.append(" ").append(str);
You can convert theWebPage to a String using toString() after the loop if you wish. When I ran the modified version, it took a fraction of a second.
BTW your code is using a bare code block within { } inside a class. This is an instance initializer (as opposed to a static initializer). It gets run at object construction time. This is legal, but it's quite unusual. Notice that it misled other commenters. I'd suggest converting this code block into a named method.
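Putting both suggestions together, a sketch of what that named method might look like (the method name downloadPage is my own; assumes the usual java.io and java.net imports):
private String downloadPage(String urlString) {
    StringBuilder sb = new StringBuilder();
    try {
        URL url = new URL(urlString);
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(url.openStream()));
        String str;
        while ((str = reader.readLine()) != null) {
            sb.append(" ").append(str); // append into the builder; no quadratic copying
        }
        reader.close();
    } catch (IOException e) {
        e.printStackTrace(); // at least report the failure instead of swallowing it
    }
    return sb.toString();
}
(MalformedURLException is a subclass of IOException, so one catch clause covers both.)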
Is this your whole program? Where is the declaration of parserObject?
Also, shouldn't all of this code be in your main() prior to calling matchString()?
parserObject courseObject = new parserObject();
ArrayList<parserObject> courseObjects = new ArrayList<parserObject>();
ArrayList<String> courseNames = new ArrayList<String>();
String theWebPage = " ";
{
    try {
        URL theUrl = new URL("http://ocw.mit.edu/courses/");
        BufferedReader reader = new BufferedReader(new InputStreamReader(theUrl.openStream()));
        String str = null;
        while ((str = reader.readLine()) != null) {
            theWebPage = theWebPage + " " + str;
        }
        reader.close();
    } catch (MalformedURLException e) {
    } catch (IOException e) {
    }
}
You are also catching exceptions and not displaying any error messages. You should always display an error message and do something when you encounter an exception. For example, if you can't download the page, there is no reason to try to parse an empty string.
From your comment I learned about static blocks in classes (thank you, I didn't know about them). However, from what I've read, you need to put the keyword static before the start of the block { to make it a static initializer. Also, it might just be better to put the code into your main(); that way you can exit if you get a MalformedURLException or IOException.
You can, of course, solve this assignment with the limited JDK 1.0 API, and run into the issue that Stuart Marks helped you solve in his excellent answer.
Or you just use a popular de-facto standard library like, for instance, Apache Commons IO, and read your website into a String using a no-brainer like this:
// using this...
import org.apache.commons.io.IOUtils;
import java.nio.charset.StandardCharsets;
// run this...
try (InputStream is = new URL("http://ocw.mit.edu/courses/").openStream()) {
    theWebPage = IOUtils.toString(is, StandardCharsets.UTF_8);
}
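As a side note, on Java 9 or later the JDK alone can do this without any extra library; a minimal sketch (again assuming java.nio.charset.StandardCharsets is imported):
try (InputStream is = new URL("http://ocw.mit.edu/courses/").openStream()) {
    theWebPage = new String(is.readAllBytes(), StandardCharsets.UTF_8);
}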
