url harvester concurrency issue, ConcurrentModificationException


Hi, I'm trying to harvest .pdf URLs recursively and I'm getting a ConcurrentModificationException. I don't understand how this is happening, and I don't know much about concurrency. I would greatly appreciate some insight into how this occurs and how it can be fixed.
public class urlHarvester {
private URL rootURL;
private String fileExt;
private int depth;
private HashSet<String> targets;
private HashMap<Integer, LinkedList<String>> toVisit;
public urlHarvester(URL rootURL, String fileExt, int depth) {
this.rootURL = rootURL;
this.fileExt = fileExt;
this.depth = depth;
targets = new HashSet<String>();
toVisit = new HashMap<Integer, LinkedList<String>>();
for (int i = 1; i < depth + 1; i++) {
toVisit.put(i, new LinkedList<String>());
}
doHarvest();
}
private void doHarvest() {
try {
harvest(rootURL, depth);
while (depth > 0) {
for (String s : toVisit.get(depth)) {
toVisit.get(depth).remove(s);
harvest(new URL(s),depth-1);
}
depth--;
}
} catch (Exception e) {
System.err.println(e);
e.printStackTrace();
}
for (String s : targets) {
System.out.println(s);
}
}
private void harvest(URL url, int depth) {
try {
URLConnection urlConnection = url.openConnection();
InputStream inputStream = urlConnection.getInputStream();
Scanner scanner = new Scanner(new BufferedInputStream(inputStream));
java.lang.String source = "";
while (scanner.hasNext()) {
source = source + scanner.next();
}
inputStream.close();
scanner.close();
Matcher matcher = Pattern.compile("ahref=\"(.+?)\"").matcher(source);
while(matcher.find()) {
java.lang.String matched = matcher.group(1);
if (!matched.startsWith("http")) {
if (matched.startsWith("/") && url.toString().endsWith("/")) {
matched = url.toString() + matched.substring(1);
} else if ((matched.startsWith("/") && !url.toString().endsWith("/"))
|| (!matched.startsWith("/") && url.toString().endsWith("/"))) {
matched = url.toString() + matched;
} else if (!matched.startsWith("/") && !url.toString().endsWith("/")) {
matched = url.toString() + "/" + matched;
}
}
if (matched.endsWith(".pdf") && !targets.contains(matched)) {
targets.add(matched);System.out.println("ADDED");
}
if (!toVisit.get(depth).contains(matched)) {
toVisit.get(depth).add(matched);
}
}
} catch (Exception e) {
System.err.println(e);
}
}
The class with main calls:
urlHarvester harvester = new urlHarvester(new URL("http://anyasdf.com"), ".pdf", 5);

The error probably has nothing to do with concurrency, but is caused by this loop:
for (String s : toVisit.get(depth)) {
toVisit.get(depth).remove(s);
harvest(new URL(s),depth-1);
}
To remove items from a collection while iterating, you need to use the remove method from an iterator:
List<String> list = toVisit.get(depth); //I assume list is not null
for (Iterator<String> it = list.iterator(); it.hasNext();) {
String s = it.next();
it.remove();
harvest(new URL(s),depth-1);
}
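Since every element is removed as soon as it is read, the loop is effectively draining a work queue. Here is a minimal sketch of that alternative, assuming the list is a LinkedList (which implements Deque) and holds no null elements. The class name DrainDemo and the method drain are hypothetical, and the processing step stands in for the harvest call:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

class DrainDemo {
    // Takes elements from the head of the queue until it is empty and returns
    // them in the order they were processed. No iterator is open while the
    // queue is modified, so the fail-fast check never comes into play.
    static List<String> drain(Deque<String> queue) {
        List<String> processed = new ArrayList<>();
        String s;
        while ((s = queue.poll()) != null) { // poll() returns null once the queue is empty
            processed.add(s);                // stand-in for harvest(new URL(s), depth - 1)
        }
        return processed;
    }

    public static void main(String[] args) {
        Deque<String> queue = new LinkedList<>(Arrays.asList("a", "b", "c"));
        System.out.println(drain(queue));
        System.out.println(queue.isEmpty());
    }
}
```

This form also tolerates new elements being appended to the same queue while it is being drained, which an open iterator would not.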

A ConcurrentModificationException is thrown when you remove an object directly from a collection while iterating over it.
This happens when you remove an entry from the LinkedList stored in the toVisit HashMap:
for (String s : toVisit.get(depth)) {
toVisit.get(depth).remove(s); <----
...
You can use an iterator instead of attempting to remove directly from your collection:
Iterator<String> iterator = toVisit.get(depth).iterator();
while (iterator.hasNext()) {
String s = iterator.next();
iterator.remove();
harvest(new URL(s),depth-1);
}
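To see the difference in isolation, here is a small self-contained sketch. The class CmeDemo and its method names are made up for illustration, and the three-element list is an arbitrary choice. The fail-fast check is best effort, so very short lists can finish a for-each loop without throwing and silently skip elements instead.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class CmeDemo {
    // Removes through the list itself inside a for-each loop.
    // Returns true if the loop failed with ConcurrentModificationException.
    static boolean removeDirectly(List<String> list) {
        try {
            for (String s : list) {
                list.remove(s); // structural change the hidden iterator does not know about
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes every element through the iterator, which keeps its own
    // bookkeeping in sync with the list.
    static void removeViaIterator(List<String> list) {
        for (Iterator<String> it = list.iterator(); it.hasNext();) {
            it.next();
            it.remove();
        }
    }

    public static void main(String[] args) {
        List<String> first = new LinkedList<>(Arrays.asList("x", "y", "z"));
        System.out.println("direct removal threw: " + removeDirectly(first));

        List<String> second = new LinkedList<>(Arrays.asList("x", "y", "z"));
        removeViaIterator(second);
        System.out.println("left after iterator removal: " + second.size());
    }
}
```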

Related

When reading from a file, an item is added twice to the combobox

I'm reading from a file and adding a specific String to a JComboBox, but I need each value only once. I tried something like this (even with contains), but the element still appears twice:
public void beolvas() {
gyarto_cmb.removeAllItems();
try {
BufferedReader be = null;
be = new BufferedReader(new FileReader("F:\\telefonok.txt"));
String sor = null;
while ((sor = be.readLine()) != null) {
StringTokenizer felbont = new StringTokenizer(sor, ";");
String gyarto_meg = felbont.nextToken();
String tel_tip = felbont.nextToken();
double kijel_meret = (double) Double.parseDouble(felbont.nextToken());
String kijel_felbontas = felbont.nextToken();
int tarhely_merete = (int) Integer.parseInt(felbont.nextToken());
int akkumulator_kap = (int) Integer.parseInt(felbont.nextToken());
int telefon_ara = (int) Integer.parseInt(felbont.nextToken());
Gyarto gyart_1 = new Gyarto();
gyart_1.megnevezes = gyarto_meg;
Tipus tipus1 = new Tipus(tel_tip, kijel_meret, kijel_felbontas, tarhely_merete, akkumulator_kap,
telefon_ara);
gyart_1.tipuska.add(tipus1);
telefonok.add(gyart_1);
if (telefonok.indexOf(gyarto_meg) == -1) {
gyarto_cmb.addItem(gyarto_meg);
}
}
} catch (Exception ex) {
System.out.println("Error:" + ex.toString());
}
}

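I don't see the declaration for gyarto_cmb, so I don't know what methods are available on it. Note that telefonok holds Gyarto objects, so, assuming it is a standard List, telefonok.indexOf(gyarto_meg) with a String argument always returns -1 and the item is added every time. You can track the names you have already added in a separate Set: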
Set<String> added = new HashSet<String>();
while((sor=be.readLine())!=null){
...
if(!added.contains(gyarto_meg)) {
added.add(gyarto_meg);
gyarto_cmb.addItem(gyarto_meg);
}
}
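The contains-then-add pair can be collapsed, because Set.add reports whether the element was new. A minimal sketch of that idea follows. The class UniqueNames, the method firstOccurrences, and the sample names are hypothetical, and the result list stands in for the combobox:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueNames {
    // Returns the names in first-seen order with later duplicates dropped.
    static List<String> firstOccurrences(List<String> names) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String name : names) {
            if (seen.add(name)) {   // add() returns false if the name was already present
                unique.add(name);   // stand-in for gyarto_cmb.addItem(name)
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(firstOccurrences(Arrays.asList("Nokia", "Samsung", "Nokia")));
    }
}
```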

The public static HashMap instance returns null when accessed from another class, even though I populate the values in constructor

I am trying to create a dictionary, as a way to learn Java, that reads the words predefined in wordlist.txt in this form:
subterfuge:something intended to misrepresent the true nature of an activity
stymie:thwarting and distressing situation;
But when I try to access the map instance of the ReadToHashmap class, which is declared public static, the access is allowed but every lookup returns null.
How can I access the map instance with the HashMap populated from wordlist.txt?
public class ReadToHashmap {
public static Map<String, String> map = new HashMap<String, String>();
public ReadToHashmap() {
// TODO Auto-generated constructor stub
}
public static Map getHasMap()
{
return map;
}
public static void main(String[] args) throws Exception {
try{
BufferedReader in = new BufferedReader(new FileReader("C:\\Users\\Maxs\\workspace\\Dictionary\\src\\wordlist.txt"));
String line = "";
while ((line = in.readLine()) != null) {
String parts[] = line.split(":");
map.put(parts[0], parts[1]);
}
in.close();
}
catch(Exception e)
{
System.out.println("Erro " +e.getMessage());
}
Iterator it = map.entrySet().iterator();
while (it.hasNext()) {
Map.Entry pair = (Map.Entry)it.next();
System.out.println(pair.getKey() + " = " + pair.getValue());
it.remove(); // avoids a ConcurrentModificationException
}
findMeaning word = new findMeaning();
String inputWord;
String outputWord;
System.out.println("Enter rge word to be searched " );
inputWord = word.getTheWord();
outputWord =word.getFromDictionary(inputWord);
System.out.println("Thge meaning is " +outputWord);
}
}
Another class
public class findMeaning {
public String inputWord;
public String description;
findMeaning()
{
inputWord = "";
}
public String getTheWord()
{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
try{
inputWord = br.readLine();
}
catch(IOException e)
{
System.out.println("Error while reading " +e.getMessage() );
return "";
}
return this.inputWord;
}
public String getFromDictionary(String key){
System.out.println("The output " +ReadToHashmap.getHasMap().toString());
if(inputWord.isEmpty())
{
return "No lattest input from user ";
}
description = (String) ReadToHashmap.getHasMap().get(inputWord);
if(description == null)
{
return "Word Doesnot exsist";
}
return description;
}
}

Your main method populates the map, then iterates through all its entries and removes each of them. So obviously, after this loop, the map is empty.

You appear to be removing every entry within the HashMap before you use it in your other class.
Iterator it = map.entrySet().iterator();
while (it.hasNext()) {
Map.Entry pair = (Map.Entry)it.next();
System.out.println(pair.getKey() + " = " + pair.getValue());
it.remove(); // ***** here *****
}
Don't do this.
Other issues: using a static field in this way is not a good idea and can lead to hard-to-debug errors. It is much better to create well-encapsulated classes that interact with each other in a clean object-oriented fashion.
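If the goal of that loop was only to print the entries, a for-each over entrySet() does so without touching the map. A minimal sketch, where the class PrintWithoutRemoving and the method printAll are made-up names and the two entries are shortened from the sample word list:

```java
import java.util.HashMap;
import java.util.Map;

class PrintWithoutRemoving {
    // Prints every entry and returns how many were printed.
    // The map is only read, so it is unchanged afterwards.
    static int printAll(Map<String, String> map) {
        int printed = 0;
        for (Map.Entry<String, String> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
            printed++;
        }
        return printed;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("subterfuge", "something intended to misrepresent");
        map.put("stymie", "thwarting and distressing situation");
        printAll(map);
        System.out.println(map.get("stymie")); // still present after printing
    }
}
```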

How to find size of ArrayList<String> in my map?

I want to find the size of each value from the key-value pair in Map<Integer, ArrayList<String>>. Simply writing list.size() does not work.
Here's my code:
public void getF() throws Exception {
BufferedReader br2 =
new BufferedReader(
new FileReader("/home/abc/NetBeansProjects/network1.txt"));
System.out.println("hello" +r.usr);
while ((s= br2.readLine()) != null) {
String F[]= s.split(":");
for (String uid : F) {
if (uid == F[0]) {
user.add(uid);
} else {
li = followee.get(Integer.valueOf(F[0]));
if (li == null) {
followee.put(Integer.valueOf(F[0]), li= new ArrayList<String>());
}
li.add(uid);
}
System.out.println(followee);
int g = li.size();
System.out.println("g:" +g);
[...]
}
}
}
Why am I not getting the correct size on the last line?

Try to follow the data structures by keeping each variable declaration as close to its usage as possible.
(I know that in other languages the convention is to declare them at the top.)
Here li should be declared at the beginning of each while iteration. It is also more natural to handle F[0] outside the loop instead of using for plus if; I think the latter got you off on the wrong foot.
Set<String> user = new HashSet<>();
Map<Integer, List<String>> followee = new HashMap<>();
String s;
while ((s = br2.readLine()) != null) {
// s has the format "key:value value value"
String keyAndValues[] = s.split(":", 2);
if (keyAndValues.length != 2) {
continue;
}
Integer key = Integer.valueOf(keyAndValues[0]);
String values = keyAndValues[1];
user.add(keyAndValues[0]);
List<String> li = followee.get(key);
if (li == null) {
li = new ArrayList<>();
followee.put(key, li);
}
Collections.addAll(li, values.split(" +"));
System.out.println(followee);
int g = li.size();
System.out.println("g:" + g);
//[...]
}
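On Java 8 or later, the get / null check / put sequence can be replaced by Map.computeIfAbsent. The sketch below assumes the same "key:value value value" line format as the code above. The class FolloweeSizes, the method parse, and the sample lines are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FolloweeSizes {
    // Builds key -> list of values from lines of the assumed form "key:v1 v2 v3".
    // Lines without a ':' are skipped.
    static Map<Integer, List<String>> parse(List<String> lines) {
        Map<Integer, List<String>> followee = new HashMap<>();
        for (String line : lines) {
            String[] keyAndValues = line.split(":", 2);
            if (keyAndValues.length != 2) {
                continue;
            }
            Integer key = Integer.valueOf(keyAndValues[0].trim());
            // Creates and stores the list on first sight of the key, otherwise returns the stored one.
            List<String> li = followee.computeIfAbsent(key, k -> new ArrayList<>());
            for (String value : keyAndValues[1].trim().split(" +")) {
                if (!value.isEmpty()) {
                    li.add(value);
                }
            }
        }
        return followee;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> followee = parse(Arrays.asList("1:2 3 4", "2:5", "1:6"));
        for (Map.Entry<Integer, List<String>> entry : followee.entrySet()) {
            System.out.println(entry.getKey() + " has " + entry.getValue().size() + " value(s)");
        }
    }
}
```

Reading the size from the map entry, rather than from a variable declared outside the loop, means each reported size belongs to the key it is printed with.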

Creating hashmap from json data

I am working on a very simple application for a website, just a basic desktop application.
So I've figured out how to grab all of the JSON Data I need, and if possible, I am trying to avoid the use of external libraries to parse the JSON.
Here is what I am doing right now:
package me.thegreengamerhd.TTVPortable;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import me.thegreengamerhd.TTVPortable.Utils.Messenger;
public class Channel
{
URL url;
String data;
String[] dataArray;
String name;
boolean online;
int viewers;
int followers;
public Channel(String name)
{
this.name = name;
}
public void update() throws IOException
{
// grab all of the JSON data from selected channel, if channel exists
try
{
url = new URL("https://api.twitch.tv/kraken/channels/" + name);
URLConnection connection = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
data = new String(in.readLine());
in.close();
// clean up data a little, into an array
dataArray = data.split(",");
}
// channel does not exist, throw exception and close client
catch (Exception e)
{
Messenger.sendErrorMessage("The channel you have specified is invalid or corrupted.", true);
e.printStackTrace();
return;
}
StringBuilder sb = new StringBuilder();
for (int i = 0; i < dataArray.length; i++)
{
sb.append(dataArray[i] + "\n");
}
System.out.println(sb.toString());
}
}
So here is what is printed when I enter an example channel (which grabs data correctly)
{"updated_at":"2013-05-24T11:00:26Z"
"created_at":"2011-06-28T07:50:25Z"
"status":"HD [XBOX] Call of Duty Black Ops 2 OPEN LOBBY"
"url":"http://www.twitch.tv/zetaspartan21"
"_id":23170407
"game":"Call of Duty: Black Ops II"
"logo":"http://static-cdn.jtvnw.net/jtv_user_pictures/zetaspartan21-profile_image-121d2cb317e8a91c-300x300.jpeg"
"banner":"http://static-cdn.jtvnw.net/jtv_user_pictures/zetaspartan21-channel_header_image-7c894f59f77ae0c1-640x125.png"
"_links":{"subscriptions":"https://api.twitch.tv/kraken/channels/zetaspartan21/subscriptions"
"editors":"https://api.twitch.tv/kraken/channels/zetaspartan21/editors"
"commercial":"https://api.twitch.tv/kraken/channels/zetaspartan21/commercial"
"teams":"https://api.twitch.tv/kraken/channels/zetaspartan21/teams"
"features":"https://api.twitch.tv/kraken/channels/zetaspartan21/features"
"videos":"https://api.twitch.tv/kraken/channels/zetaspartan21/videos"
"self":"https://api.twitch.tv/kraken/channels/zetaspartan21"
"follows":"https://api.twitch.tv/kraken/channels/zetaspartan21/follows"
"chat":"https://api.twitch.tv/kraken/chat/zetaspartan21"
"stream_key":"https://api.twitch.tv/kraken/channels/zetaspartan21/stream_key"}
"name":"zetaspartan21"
"delay":0
"display_name":"ZetaSpartan21"
"video_banner":"http://static-cdn.jtvnw.net/jtv_user_pictures/zetaspartan21-channel_offline_image-b20322d22543539a-640x360.jpeg"
"background":"http://static-cdn.jtvnw.net/jtv_user_pictures/zetaspartan21-channel_background_image-587bde3d4f90b293.jpeg"
"mature":true}
Initializing User Interface - JOIN
All of this is correct. Now what I want to do is grab, for example, the 'mature' tag and its value. When I grab it, it would be as simple as:
// pseudo code
if(mature /*this is a boolean */ == true){ // do stuff}
In other words, I need to strip the quotes and split on the colon between each key and its value to retrieve a key-value pair.

It's doable with the following code:
public static Map<String, Object> parseJSON (String data) throws ParseException {
if (data==null)
return null;
final Map<String, Object> ret = new HashMap<String, Object>();
data = data.trim();
if (!data.startsWith("{") || !data.endsWith("}"))
throw new ParseException("Missing '{' or '}'.", 0);
data = data.substring(1, data.length()-1);
final String [] lines = data.split("[\r\n]");
for (int i=0; i<lines.length; i++) {
String line = lines[i];
if (line.isEmpty())
continue;
line = line.trim();
if (line.indexOf(":")<0)
throw new ParseException("Missing ':'.", 0);
String key = line.substring(0, line.indexOf(":"));
String value = line.substring(line.indexOf(":")+1);
if (key.startsWith("\"") && key.endsWith("\"") && key.length()>2)
key = key.substring(1, key.length()-1);
if (value.startsWith("{"))
while (i+1<lines.length && !value.endsWith("}")) // bound by the number of lines, not the length of the current line
value = value + "\n" + lines[++i].trim();
if (value.startsWith("\"") && value.endsWith("\"") && value.length()>2)
value = value.substring(1, value.length()-1);
Object mapValue = value;
if (value.startsWith("{") && value.endsWith("}"))
mapValue = parseJSON(value);
else if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false"))
mapValue = Boolean.valueOf(value); // the Boolean(String) constructor is deprecated
else {
try {
mapValue = Integer.parseInt(value);
} catch (NumberFormatException nfe) {
try {
mapValue = Long.parseLong(value);
} catch (NumberFormatException nfe2) {}
}
}
ret.put(key, mapValue);
}
return ret;
}
You can call it like this:
try {
Map<String, Object> ret = parseJSON(sb.toString());
if(((Boolean)ret.get("mature")) == true){
System.out.println("mature is true !");
}
} catch (ParseException e) {
}
But really, you shouldn't do this. Use an existing JSON parser instead, because this code will break on any complex or invalid JSON data (for example, a ":" in a key). If you want to build a true JSON parser by hand, it will take a lot more code and debugging.
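One caveat about the call shown above: the cast followed by == true unboxes the result of ret.get("mature"), so it throws a NullPointerException when the key is missing and a ClassCastException when the value was stored as a String. A minimal sketch of a more forgiving check, where the class FlagLookup and the method isTrue are made-up names:

```java
import java.util.HashMap;
import java.util.Map;

class FlagLookup {
    // True only if the key maps to Boolean.TRUE. A missing key or a value of
    // another type gives false instead of an exception.
    static boolean isTrue(Map<String, Object> map, String key) {
        return Boolean.TRUE.equals(map.get(key));
    }

    public static void main(String[] args) {
        Map<String, Object> ret = new HashMap<>();
        ret.put("mature", Boolean.TRUE);
        ret.put("name", "zetaspartan21");
        System.out.println(isTrue(ret, "mature"));
        System.out.println(isTrue(ret, "name"));
        System.out.println(isTrue(ret, "no_such_key"));
    }
}
```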

Here is a parser for a simple JSON string:
public static HashMap<String, String> parseEasyJson(String json) {
final String regex = "([^{}: ]*?):(\\{.*?\\}|\".*?\"|[^:{}\" ]*)";
json = json.replaceAll("\n", "");
Matcher m = Pattern.compile(regex).matcher(json);
HashMap<String, String> map = new HashMap<>();
while (m.find())
map.put(m.group(1), m.group(2));
return map;
}

How to find latest jar version of jars by java program?

My project has 40 to 50 jar files, and it takes a lot of time to find the latest version of each jar every time. Can anyone help me write a Java program for this?

You may want to just use Maven: http://maven.apache.org/
Or another dependency manager, like Ivy.

Call this method at Ant build time:
public void ExpungeDuplicates(String filePath) {
Map<String,Integer> replaceJarsMap = null;
File folder = null;
File[] listOfFiles = null;
List<String> jarList = new ArrayList<String>();
String files = "";
File deleteFile = null;
Iterator<String> mapItr = null;
//String extension ="jar";
try {
folder = new File(filePath);
listOfFiles = folder.listFiles();
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isFile()) {
files = listOfFiles[i].getName();
jarList.add(files);
}
}
if (jarList.size() > 0) {
replaceJarsMap = PatternClassifier.findDuplicatesOrLowerVersion(jarList);
System.err.println("Duplicate / Lower Version - Total Count : "+replaceJarsMap.size());
mapItr = replaceJarsMap.keySet().iterator();
while (mapItr.hasNext()) {
String key = mapItr.next();
int repeat = replaceJarsMap.get(key);
System.out.println( key +" : "+repeat);
for (int i = 0; i <repeat; i++) {
deleteFile = new File(filePath + System.getProperty ("file.separator")+key);
try{
if (deleteFile != null && deleteFile.exists()){
if(deleteFile.delete()){
System.err.println(key +" deleted");
}
}
}catch (Exception e) {
}
}
}
}
} catch (Exception e) {
// TODO: handle exception
}
}
You only need to pass the path of your lib folder to this method. It finds and deletes every duplicate or lower-version file.
The crucial function is given below; it finds the duplicates in the list of file names you provide.
public static Map<String,Integer> findDuplicatesOrLowerVersion(List<String> fileNameList) {
List<String> oldJarList = new ArrayList<String>();
String cmprTemp[] = null;
boolean match = false;
String regex = "",regexFileType = "",verInfo1 = "",verInfo2 = "",compareName = "",tempCompareName = "",tempJarName ="";
Map<String,Integer> duplicateEntryMap = new HashMap<String, Integer>();
int count = 0;
Collections.sort(fileNameList, Collections.reverseOrder());
try{
int size = fileNameList.size();
for(int i = 0;i<size;i++){
cmprTemp = fileNameList.get(i).split("[0-9\\._]*");
for(String s : cmprTemp){
compareName += s;
}
regex = "^"+compareName+"[ajr0-9_\\-\\.]*";
regexFileType = "[0-9a-zA-Z\\-\\._]*\\.jar$";
if( fileNameList.get(i).matches(regexFileType) && !oldJarList.contains(fileNameList.get(i))){
for(int j = i+1 ;j<size;j++){
cmprTemp = fileNameList.get(j).split("[0-9\\._]*");
for(String s : cmprTemp){
tempCompareName += s;
}
match = (fileNameList.get(j).matches(regexFileType) && tempCompareName.matches(regex));
if(match){
cmprTemp = fileNameList.get(i).split("[a-zA-Z\\-\\._]*");
for(String s : cmprTemp){
verInfo1 += s;
}
verInfo1 += "000";
cmprTemp = fileNameList.get(j).split("[a-zA-Z\\-\\._]*");
for(String s : cmprTemp){
verInfo2 += s;
}
verInfo2 += "000";
int length = 0;
if(verInfo1.length()>verInfo2.length()){
length = verInfo2.length();
}else{
length = verInfo1.length();
}
if(Long.parseLong(verInfo1.substring(0,length))>=Long.parseLong(verInfo2.substring(0,length))){
count = 0;
if(!oldJarList.contains(fileNameList.get(j))){
oldJarList.add(fileNameList.get(j));
duplicateEntryMap.put(fileNameList.get(j),++count);
}else{
count = duplicateEntryMap.get(fileNameList.get(j));
duplicateEntryMap.put(fileNameList.get(j),++count);
}
}else{
tempJarName = fileNameList.get(i);
}
match = false;verInfo1 = "";verInfo2 = "";
}
tempCompareName = "";
}
if(tempJarName!=null && !tempJarName.equals("")){
count = 0;
if(!oldJarList.contains(fileNameList.get(i))){
oldJarList.add(fileNameList.get(i));
duplicateEntryMap.put(fileNameList.get(i),++count);
}else{
count = duplicateEntryMap.get(fileNameList.get(i));
duplicateEntryMap.put(fileNameList.get(i),++count);
}
tempJarName = "";
}
}
compareName = "";
}
}catch (Exception e) {
e.printStackTrace();
}
return duplicateEntryMap;
}
What findDuplicatesOrLowerVersion(List fileNameList) does: it finds the duplicates and returns a map containing each lower-version file name and the number of times it repeats.
Try this. The files remaining in the folder should be the latest versions or files without duplicates. I am using this to find the oldest files; on that basis it finds the old ones and deletes them.
It only checks the file names; you can make further improvements.
PatternClassifier is the class that contains the second method given here.
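One possible improvement: the method above appears to concatenate the digits of each version and compare the result as a single number, which may misorder versions such as 1.10 and 1.9. Below is a minimal sketch of a component-wise comparison, assuming purely numeric dotted versions (a suffix such as "-SNAPSHOT" would make it throw NumberFormatException). The class VersionCompare and the method compareVersions are hypothetical and not part of the code above:

```java
class VersionCompare {
    // Compares dotted numeric versions such as "1.9" and "1.10" component by
    // component. Missing components count as 0, so "1.2" equals "1.2.0".
    // Returns a negative number, zero, or a positive number, like Comparator.compare.
    static int compareVersions(String a, String b) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        int n = Math.max(partsA.length, partsB.length);
        for (int i = 0; i < n; i++) {
            long x = i < partsA.length ? Long.parseLong(partsA[i]) : 0;
            long y = i < partsB.length ? Long.parseLong(partsB[i]) : 0;
            if (x != y) {
                return Long.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareVersions("1.10", "1.9") > 0);  // 1.10 is newer than 1.9
        System.out.println(compareVersions("1.2", "1.2.0") == 0);
    }
}
```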
