Crawling Web and Stored Links - java

I want to create a thread that crawls all the links of a website and stores them in a LinkedHashSet, but when I print the size of this LinkedHashSet, nothing is printed. I've just started learning about crawling, using The Art of Java as a reference. Here is my code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.logging.Level;
import java.util.logging.Logger;
public class TestThread {
public void crawl(URL url) {
try {
BufferedReader reader = new BufferedReader(
new InputStreamReader(url.openConnection().getInputStream()));
String line = reader.readLine();
LinkedHashSet toCrawlList = new LinkedHashSet();
while (line != null) {
toCrawlList.add(line);
System.out.println(toCrawlList.size());
}
} catch (IOException ex) {
Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
}
}
public static void main(String[] args) {
final TestThread test1 = new TestThread();
Thread thread = new Thread(new Runnable() {
public void run(){
try {
test1.crawl(new URL("http://stackoverflow.com/"));
} catch (MalformedURLException ex) {
Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
}
}
});
}
}

You should fill your list like this:
while ((line = reader.readLine()) != null) {
toCrawlList.add(line);
}
System.out.println(toCrawlList.size());
If that doesn't work, set a breakpoint in your code and check whether your reader returns anything at all. Also note that main creates the thread but never calls thread.start(), so crawl is never executed.
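A minimal sketch of that loop, run on a worker thread that is actually started and joined. The class name LineCollector and the readLines helper are hypothetical, and a StringReader stands in for the network stream so the example runs offline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

public class LineCollector {
    // Hypothetical helper: reads every line from the reader into an insertion-ordered set.
    static Set<String> readLines(Reader source) {
        Set<String> lines = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // Assign inside the condition so each iteration advances to the next line.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return lines;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread thread = new Thread(() -> {
            // A StringReader stands in for url.openConnection().getInputStream().
            Set<String> lines = readLines(new StringReader("a\nb\na\n"));
            System.out.println(lines.size());
        });
        thread.start(); // without this call the Runnable never executes
        thread.join();  // wait for the worker before main returns
    }
}
```

Because a set drops duplicates, the three input lines here produce a size of 2.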

Related

How to print contents of a text to console as objects

Let's say I have these words in a text file
Dictionary.txt
artificial
intelligence
abbreviation
hybrid
hysteresis
illuminance
identity
inaccuracy
impedance
impenetrable
imperfection
impossible
independent
How can I make each word a different object and print them on the console?
You can simply use the Scanner.nextLine() method.
Here is some code that may help.
First import the required classes:
import java.util.Scanner;
import java.io.File;
import java.util.Arrays;
Then use the following code:
String[] words = new String[0];
// try-with-resources closes the Scanner (and the file) automatically
try (Scanner scan = new Scanner(new File("/path/to/Dictionary.txt"))) {
    while (scan.hasNextLine()) {
        // Grow the array by one slot, then store the next line in it
        words = Arrays.copyOf(words, words.length + 1);
        words[words.length - 1] = scan.nextLine();
    }
} catch (Exception e) {
    System.out.println(e);
}
It is worth reading up on the Scanner class.
This is a very simple solution using Files:
package org.kodejava.io;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Objects;
public class ReadFileAsListDemo {
public static void main(String[] args) {
ReadFileAsListDemo demo = new ReadFileAsListDemo();
demo.readFileAsList();
}
private void readFileAsList() {
String fileName = "Dictionary.txt";
try {
URI uri = Objects.requireNonNull(this.getClass().getResource(fileName)).toURI();
List<String> lines = Files.readAllLines(Paths.get(uri),
Charset.defaultCharset());
for (String line : lines) {
System.out.println(line);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Source: https://kodejava.org/how-do-i-read-all-lines-from-a-file/
This is another neat solution using buffered reader:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.logging.Level;
import java.util.logging.Logger;
/**
 * BufferedReader and Scanner can be used to read line by line from any File or
 * console in Java.
 * This Java program demonstrates line by line reading using BufferedReader in Java
 *
 * @author Javin Paul
 */
public class BufferedReaderExample {
public static void main(String args[]) {
//reading file line by line in Java using BufferedReader
FileInputStream fis = null;
BufferedReader reader = null;
try {
fis = new FileInputStream("C:/sample.txt");
reader = new BufferedReader(new InputStreamReader(fis));
System.out.println("Reading File line by line using BufferedReader");
String line = reader.readLine();
while(line != null){
System.out.println(line);
line = reader.readLine();
}
} catch (FileNotFoundException ex) {
Logger.getLogger(BufferedReaderExample.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(BufferedReaderExample.class.getName()).log(Level.SEVERE, null, ex);
} finally {
try {
reader.close();
fis.close();
} catch (IOException ex) {
Logger.getLogger(BufferedReaderExample.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
}
Source: https://javarevisited.blogspot.com/2012/07/read-file-line-by-line-java-example-scanner.html#axzz7lrQcYlyy
These are all good answers. The OP didn't state what release of Java they require, but in modern Java I'd just use:
import java.nio.file.*;
public class x {
public static void main(String[] args) throws java.io.IOException {
Files.lines(Path.of("/path/to/Dictionary.txt")).forEach(System.out::println);
}
}
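One caveat with Files.lines is that the returned stream keeps the file open until it is closed, so it is usually wrapped in try-with-resources. A minimal sketch, assuming Java 11 or later; the class name WordLoader and the loadWords helper are hypothetical, and a temporary file stands in for Dictionary.txt so the example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordLoader {
    // Hypothetical helper: returns every line of the file as its own String object.
    static List<String> loadWords(Path file) {
        // try-with-resources closes the underlying file once the stream is consumed.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for Dictionary.txt (an assumption of this sketch).
        Path file = Files.createTempFile("Dictionary", ".txt");
        Files.write(file, List.of("artificial", "intelligence", "hybrid"));
        try {
            loadWords(file).forEach(System.out::println);
        } finally {
            Files.delete(file);
        }
    }
}
```

Each element of the returned list is a separate String, which may already be all the question needs by "a different object".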

reader.readLine() of BufferedReader causes this: Exception in thread "main" java.io.IOException: Stream closed

Here is a code snippet from my main Java function:
try (MultiFileReader multiReader = new MultiFileReader(inputs)) {
PriorityQueue<WordEntry> words = new PriorityQueue<>();
for (BufferedReader reader : multiReader.getReaders()) {
String word = reader.readLine();
if (word != null) {
words.add(new WordEntry(word, reader));
}
}
}
Here is how I get my BufferedReader readers from another Java file:
public List<BufferedReader> getReaders() {
return Collections.unmodifiableList(readers);
}
But for some reason, when I run my code I get the exception shown in the title (java.io.IOException: Stream closed).
The error happens exactly at the line String word = reader.readLine();. What's weird is that the reader is not null; in fact multiReader.getReaders() returns a list of 100 objects (they are files read from a directory). I would like some help solving this issue.
I posted where the issue is, now let me provide a broader view of my code. To run it, it suffices to compile it under the src/ directory doing javac *.java and java MergeShards shards/ sorted.txt provided that shards/ is present under src/ and contains .txt files in my scenario.
This is MergeShards.java where I have my main function:
import java.io.BufferedReader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Objects;
import java.util.PriorityQueue;
import java.util.stream.Collectors;
public final class MergeShards {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("Usage: MergeShards [input folder] [output file]");
return;
}
List<Path> inputs = Files.walk(Path.of(args[0]), 1).skip(1).collect(Collectors.toList());
Path outputPath = Path.of(args[1]);
try (MultiFileReader multiReader = new MultiFileReader(inputs)) {
PriorityQueue<WordEntry> words = new PriorityQueue<>();
for (BufferedReader reader : multiReader.getReaders()) {
String word = reader.readLine();
if (word != null) {
words.add(new WordEntry(word, reader));
}
}
try (Writer writer = Files.newBufferedWriter(outputPath)) {
while (!words.isEmpty()) {
WordEntry entry = words.poll();
writer.write(entry.word);
writer.write(System.lineSeparator());
String word = entry.reader.readLine();
if (word != null) {
words.add(new WordEntry(word, entry.reader));
}
}
}
}
}
private static final class WordEntry implements Comparable<WordEntry> {
private final String word;
private final BufferedReader reader;
private WordEntry(String word, BufferedReader reader) {
this.word = Objects.requireNonNull(word);
this.reader = Objects.requireNonNull(reader);
}
@Override
public int compareTo(WordEntry other) {
return word.compareTo(other.word);
}
}
}
This is my MultiFileReader.java file:
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public final class MultiFileReader implements Closeable {
private final List<BufferedReader> readers;
public MultiFileReader(List<Path> paths) {
readers = new ArrayList<>(paths.size());
try {
for (Path path : paths) {
readers.add(Files.newBufferedReader(path));
}
} catch (IOException e) {
e.printStackTrace();
} finally {
close();
}
}
public List<BufferedReader> getReaders() {
return Collections.unmodifiableList(readers);
}
@Override
public void close() {
for (BufferedReader reader : readers) {
try {
reader.close();
} catch (Exception ignored) {
}
}
}
}
A finally block runs whether or not an exception is thrown, so the one in your constructor closes all of your readers right after opening them. Remove it.
public MultiFileReader(List<Path> paths) {
readers = new ArrayList<>(paths.size());
try {
for (Path path : paths) {
readers.add(Files.newBufferedReader(path));
}
} catch (IOException e) {
e.printStackTrace();
} /* Not this. finally {
close();
} */
}
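With the finally block gone, the constructor still swallows an IOException and leaves the readers it already opened unclosed if a later file fails. One possible variant, sketched under the assumption that callers can handle an IOException from the constructor, closes the readers only on the failure path and rethrows. The class name SafeMultiFileReader is hypothetical.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class SafeMultiFileReader implements Closeable {
    private final List<BufferedReader> readers;

    public SafeMultiFileReader(List<Path> paths) throws IOException {
        readers = new ArrayList<>(paths.size());
        try {
            for (Path path : paths) {
                readers.add(Files.newBufferedReader(path));
            }
        } catch (IOException e) {
            // Only the failure path closes what was opened so far; success leaves readers open.
            close();
            throw e;
        }
    }

    public List<BufferedReader> getReaders() {
        return Collections.unmodifiableList(readers);
    }

    @Override
    public void close() {
        for (BufferedReader reader : readers) {
            try {
                reader.close();
            } catch (IOException ignored) {
                // Best effort: keep closing the remaining readers.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for one shard (an assumption of this sketch).
        Path shard = Files.createTempFile("shard", ".txt");
        Files.write(shard, List.of("apple", "pear"));
        try (SafeMultiFileReader multi = new SafeMultiFileReader(List.of(shard))) {
            // The reader is still open here, so readLine() does not throw "Stream closed".
            System.out.println(multi.getReaders().get(0).readLine());
        } finally {
            Files.delete(shard);
        }
    }
}
```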

Why doesn't onGuildMemberRoleAdd work? Discord bot in Java

The "onGuildMemberRoleAdd" listener just doesn't work. It is as if the listener were ignored, and I don't know why. Can someone help me, please?
I have also registered the listener in the main class.
Listener:
package listener.rollen;
import net.dv8tion.jda.api.events.guild.member.GuildMemberRoleAddEvent;
import net.dv8tion.jda.api.hooks.ListenerAdapter;
public class RollenAdd extends ListenerAdapter {
@Override
public void onGuildMemberRoleAdd(GuildMemberRoleAddEvent event) {
System.out.println("test");
}
}
Main
package main;
import listener.*;
import listener.punktesystem.SprachchatConnect;
import listener.punktesystem.SprachchatDisconnect;
import listener.rollen.*;
import net.dv8tion.jda.api.OnlineStatus;
import net.dv8tion.jda.api.entities.Activity;
import net.dv8tion.jda.api.sharding.DefaultShardManagerBuilder;
import net.dv8tion.jda.api.sharding.ShardManager;
import javax.security.auth.login.LoginException;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
public class Start {
public static Start INSTANCE;
public ShardManager shardMan;
public static void main(String[] args) {
try {
new Start();
} catch (LoginException e) {
e.printStackTrace();
}
}
public Start() throws LoginException, IllegalArgumentException {
INSTANCE = this;
DefaultShardManagerBuilder builder = DefaultShardManagerBuilder.createDefault("Token");
builder.setActivity(Activity.watching("ZZZZs Zaubertrick"));
builder.setStatus(OnlineStatus.ONLINE);
listeners(builder);
shardMan = builder.build();
shutdown();
}
public void shutdown(){
new Thread(() -> {
String line = "";
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
try{
while ((line = reader.readLine()) != null){
if(line.equalsIgnoreCase("exit")){
if(shardMan != null){
shardMan.setStatus(OnlineStatus.OFFLINE);
shardMan.shutdown();
System.out.println("Bot ist offline!");
}
}
}
}catch (IOException e){
e.printStackTrace();
}
}).start();
}
public void listeners(DefaultShardManagerBuilder builder){
builder.addEventListeners(new RollenAdd());
}
}
I think that's everything relevant.
At first I thought it was because the Privileged Gateway Intents were not activated in the Discord Developer Portal, but after activating them it still didn't work.

Converting serially reading multiple files to reading them in parallel? [closed]

The following Java code reads multiple files serially, one after another, and it works well so far. (The files are JSON, and at this step they are stored in a String/buffer without parsing.)
for (int fileIndex = 0; fileIndex < numberOfFiles; fileIndex++) {
BufferedReader br = new BufferedReader(new FileReader("Files/file" + fileIndex + ".json"));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
String contentJSON = sb.toString();
} finally {
br.close();
}
}
How can I read those files in parallel using threads?
I could not fit multithreading into the code above and got errors every time.
I've not tested this code directly (as I don't have a bunch of files to read), but the basic idea would be to do something like...
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;
public class Test {
public static void main(String[] args) {
new Test();
}
public Test() {
try {
int numberOfFiles = 10;
ExecutorService service = Executors.newFixedThreadPool(10);
List<ReadWorker> workers = new ArrayList<>(numberOfFiles);
for (int fileIndex = 0; fileIndex < numberOfFiles; fileIndex++) {
workers.add(new ReadWorker(fileIndex));
}
List<Future<String>> results = service.invokeAll(workers);
for (Future<String> result : results) {
try {
String value = result.get();
} catch (ExecutionException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
}
} catch (InterruptedException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
}
public class ReadWorker implements Callable<String> {
private int fileIndex;
public ReadWorker(int fileIndex) {
this.fileIndex = fileIndex;
}
@Override
public String call() throws Exception {
try (BufferedReader br = new BufferedReader(new FileReader("Files/file" + fileIndex + ".json"))) {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
return sb.toString();
}
}
}
}
This will basically execute a series of Callables and wait for them all to complete, at which point you can read the results (or errors).
See the Executors trail for more details.
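A compact sketch of the same invokeAll pattern, assuming Java 11 or later so that Files.readString can replace the manual BufferedReader loop. The names ParallelReader and readAll are hypothetical. invokeAll returns its futures in the same order as the submitted tasks, so the contents line up with the input paths, and the pool is shut down in a finally block so its worker threads do not keep the JVM alive.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReader {
    // Hypothetical helper: reads all files concurrently, returning contents in input order.
    static List<String> readAll(List<Path> paths) throws InterruptedException, ExecutionException {
        ExecutorService service = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (Path path : paths) {
                tasks.add(() -> Files.readString(path));
            }
            List<String> contents = new ArrayList<>();
            // invokeAll blocks until every task has finished and keeps the task order.
            for (Future<String> result : service.invokeAll(tasks)) {
                contents.add(result.get());
            }
            return contents;
        } finally {
            service.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        // Temporary files stand in for Files/file0.json and friends (an assumption of this sketch).
        List<Path> paths = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Path file = Files.createTempFile("file" + i, ".json");
            Files.writeString(file, "{\"index\": " + i + "}");
            paths.add(file);
        }
        try {
            readAll(paths).forEach(System.out::println);
        } finally {
            for (Path path : paths) {
                Files.delete(path);
            }
        }
    }
}
```

If any file cannot be read, result.get() throws an ExecutionException whose cause is the original IOException.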
Tested and verified version...
So, I dumped a series of files into the Files folder at the root of my working directory and modified the above example to list all the files in that directory and read them...
import java.io.BufferedReader;
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;
public class Test {
public static void main(String[] args) {
new Test();
}
public Test() {
File files[] = new File("Files").listFiles(new FileFilter() {
@Override
public boolean accept(File pathname) {
return pathname.getName().toLowerCase().endsWith(".svg");
}
});
try {
int numberOfFiles = files.length;
ExecutorService service = Executors.newFixedThreadPool(20);
List<ReadWorker> workers = new ArrayList<>(numberOfFiles);
for (File file : files) {
workers.add(new ReadWorker(file));
}
System.out.println("Execute...");
List<Future<String>> results = service.invokeAll(workers);
System.out.println("Results...");
for (Future<String> result : results) {
try {
String value = result.get();
System.out.println(value);
} catch (ExecutionException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
}
service.shutdownNow();
} catch (InterruptedException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
}
public class ReadWorker implements Callable<String> {
private File file;
public ReadWorker(File file) {
this.file = file;
}
@Override
public String call() throws Exception {
System.out.println("Reading " + file);
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
return sb.toString();
}
}
}
}
And this works just fine and I have no issue.
java.io.FileNotFoundException: Files\file0.json is a localised issue you are going to have to solve. Does file0.json actually exist? Does it exist in the Files directory? Is the Files directory in the root of the working directory when the program is executed?
None of these issues can be solved by us, as we don't have access to your environment.
Test #3
I then renamed all the files in my Files directory to file{x}.json using...
File files[] = new File("Files").listFiles(new FileFilter() {
@Override
public boolean accept(File pathname) {
return pathname.getName().toLowerCase().endsWith(".svg");
}
});
for (int index = 0; index < files.length; index++) {
File source = files[index];
File target = new File(source.getParent(), "file" + index + ".json");
source.renameTo(target);
}
And then modified the example slightly to include a File#exists report...
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;
public class Test {
public static void main(String[] args) {
new Test();
}
public Test() {
try {
int numberOfFiles = 10;
ExecutorService service = Executors.newFixedThreadPool(20);
List<ReadWorker> workers = new ArrayList<>(numberOfFiles);
for (int index = 0; index < numberOfFiles; index++) {
workers.add(new ReadWorker(index));
}
System.out.println("Execute...");
List<Future<String>> results = service.invokeAll(workers);
System.out.println("Results...");
for (Future<String> result : results) {
try {
String value = result.get();
System.out.println(value);
} catch (ExecutionException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
}
service.shutdownNow();
} catch (InterruptedException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
}
public class ReadWorker implements Callable<String> {
private int fileIndex;
public ReadWorker(int fileIndex) {
this.fileIndex = fileIndex;
}
@Override
public String call() throws Exception {
System.out.println("Reading " + fileIndex);
File file = new File("Files/file" + fileIndex + ".json");
System.out.println("File " + fileIndex + " exists = " + file.exists());
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
return sb.toString();
} finally {
System.out.println("All done here");
}
}
}
}
Which prints
Execute...
Reading 8
Reading 1
Reading 2
Reading 4
Reading 6
Reading 9
Reading 3
Reading 7
Reading 0
Reading 5
File 8 exists = true
File 1 exists = true
File 5 exists = true
File 4 exists = true
File 9 exists = true
File 2 exists = true
File 0 exists = true
File 3 exists = true
File 7 exists = true
File 6 exists = true
All done here
All done here
All done here
All done here
All done here
All done here
All done here
All done here
All done here
All done here
Results...
// I won't bore you with the results, as it's a lot of pointless text
which all worked without issues

I need to tokenize a string of Java code into its blocks

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
public class Client {
public static void main(String[] args) throws IOException {
DatagramSocket socket = new DatagramSocket();
socket.connect(new InetSocketAddress(5000));
byte[] message = "Oh Hai!".getBytes();
DatagramPacket packet = new DatagramPacket(message, message.length);
socket.send(packet);
}
}
I have this code as a string, and I need to extract its methods, statements, and loops into separate arrays.
Can anybody suggest a solution?
You can use the StreamTokenizer to analyze a stream (e.g. StringReader).
Here is an example:
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
public class Test {
public static void main(String[] args){
StreamTokenizer tokenizer = new StreamTokenizer(new StringReader("public static void main(String[] args){"));
tokenizer.parseNumbers();
tokenizer.wordChars('_', '_');
tokenizer.eolIsSignificant(true);
tokenizer.ordinaryChars(0, ' ');
tokenizer.slashSlashComments(true);
tokenizer.slashStarComments(true);
int token;
try {
while( (token = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
if(token == StreamTokenizer.TT_WORD) {
System.out.println(tokenizer.sval);
}
}
} catch (IOException e) {
e.printStackTrace();
// Please handle this exception
}
}
}
This generates the following output:
public
static
void
main
String
args
Please have a look at this for further details:
https://docs.oracle.com/javase/7/docs/api/java/io/StreamTokenizer.html
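Since the question asks about blocks rather than words, here is a rough sketch of how the same tokenizer could track brace nesting. The class name BlockDepth and the maxBraceDepth helper are hypothetical. It only measures how deeply { } blocks are nested; reliably splitting Java source into methods, statements, and loops would need a real parser.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class BlockDepth {
    // Hypothetical helper: returns the deepest level of nested { } blocks in the source.
    static int maxBraceDepth(String source) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(source));
        tokenizer.ordinaryChar('/');        // '/' is a comment character by default
        tokenizer.slashSlashComments(true); // skip // comments
        tokenizer.slashStarComments(true);  // skip /* */ comments
        int depth = 0;
        int max = 0;
        try {
            int token;
            while ((token = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
                // Ordinary characters are returned as their own character code.
                if (token == '{') {
                    depth++;
                    max = Math.max(max, depth);
                } else if (token == '}') {
                    depth--;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return max;
    }

    public static void main(String[] args) {
        String code = "public class Client { void run() { while (true) { send(\"{\"); } } }";
        System.out.println(maxBraceDepth(code));
    }
}
```

The brace inside the string literal is returned as part of a quoted-string token, so it is not counted; the sample input above has three nested blocks and prints 3.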
