I would like my web crawler to save every URL it browses to a local file. At the moment it downloads every page it comes to, but each new page overwrites the local file written for the previous one. The crawler starts at www.bbc.co.uk, downloads that page, and when it reaches another URL it overwrites that file with the next page. How can I make it save each page to its own file so that I have a collection at the end? I have the code below but I don't know where to go from here. Any advice would be great. The URL inside the parentheses (URL) is a String holding the address of each page as it is browsed.
URL url = new URL(URL);
// try-with-resources closes both streams, even if reading fails part-way
try (BufferedReader reader = new BufferedReader(
         new InputStreamReader(url.openStream()));
     BufferedWriter writer = new BufferedWriter(
         new FileWriter("c:/temp/data.html", true))) {
    String line;
    while ((line = reader.readLine()) != null) {
        //System.out.println(line);
        writer.write(line);
        writer.newLine();
    }
}
You need to give each file a unique name.
You can save them in separate folders (one root directory per web site).
Or you can give each file a unique name, for example by using a counter.
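Both suggestions can be combined. The following is only a sketch, not the crawler's actual code: the class name PageFileNamer, the file name pattern page-N.html and the fallback folder unknown-host are made up for illustration, and it assumes every browsed URL is well formed (URI.create throws IllegalArgumentException otherwise).

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: builds one target file per downloaded page.
class PageFileNamer {
    private final Path root;
    private int counter = 0;

    PageFileNamer(Path root) {
        this.root = root;
    }

    /** Returns root/<host>/page-<n>.html, with n increasing on every call. */
    Path nextFileFor(String pageUrl) {
        String host = URI.create(pageUrl).getHost();
        if (host == null) {
            host = "unknown-host"; // e.g. a URL without a scheme
        }
        counter++;
        return root.resolve(host).resolve("page-" + counter + ".html");
    }

    public static void main(String[] args) {
        PageFileNamer namer = new PageFileNamer(Paths.get("c:/temp"));
        // Two different pages of the same site get two different files.
        System.out.println(namer.nextFileFor("http://www.bbc.co.uk/"));
        System.out.println(namer.nextFileFor("http://www.bbc.co.uk/news"));
    }
}
```

In the crawler, the fixed "c:/temp/data.html" would then be replaced by the returned path. The host folder has to exist before writing, which Files.createDirectories(target.getParent()) takes care of.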
I'm new to coding and have decided to start my learning on Java. I've got NetBeans and have started to create a very basic web application. I'd like to be able to display values from a .txt file onto the webpage, and I've got this code to do so.
<%
BufferedReader in = new BufferedReader(new FileReader("Cats.txt"));
String line;
while((line = in.readLine()) != null)
{
out.println(line);
}
in.close();
%>
My text file is in the same folder as my src folder (from what I've seen, that is where the file needs to go).
However, whenever I navigate to the web page I get a FileNotFoundException. I've tried putting the file's full path in the FileReader, but that gives an error because of the backslashes.
If anyone could help, I'd greatly appreciate it.
A relative name like "Cats.txt" is resolved against the working directory of the process running your application, which for a web application is usually the server's directory rather than your project folder, so the file is not found there. If you would like to point at a specific path instead, note that '\' is an escape character in a Java string literal. To get a literal backslash you have to escape it by writing two '\'s instead of one, e.g.:
<%
BufferedReader in = new BufferedReader(new
FileReader("C:\\MYPATH\\MYPATH2\\Cats.txt"));
String line;
while((line = in.readLine()) != null)
{
out.println(line);
}
in.close();
%>
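As a plain Java sketch of the same idea (the class name CatsFile and the method readLines are made up for illustration, and the path is the placeholder from the snippet above), the two constants below spell the same location: one with escaped backslashes, one with forward slashes, which Java's file APIs also accept on Windows. The reader is opened in a try-with-resources block so it is closed even when reading fails.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original JSP.
class CatsFile {
    // "\\" in a string literal is a single backslash character at run time.
    static final String ESCAPED = "C:\\MYPATH\\MYPATH2\\Cats.txt";
    // Forward slashes need no escaping.
    static final String FORWARD = "C:/MYPATH/MYPATH2/Cats.txt";

    /** Reads all lines of the file at the given path. */
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```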
I have an Android project that displays data from a JSON file. The file is read from the assets directory, following the approach below:
src/main/assets/my_file.json
InputStream is = getResources().getAssets().open(filename);
BufferedReader br = new BufferedReader(new InputStreamReader(is));
// use StringBuilder to get file contents as String
I also have local unit tests which require the same JSON file. In order to avoid the slowness of Instrumentation tests, I am currently loading a duplicate of the file as below:
src/main/resources/my_copy.json
File testData = new File(getClass().getResource(filename).getPath());
InputStream is = new FileInputStream(testData);
// read File as before
Is there an approach that would allow the JSON file to be stored in one location, whilst running the unit tests on a local JVM?
If you're using Robolectric, see this answer.
Note that the "assets" directory is an Android concept; it is different from Java resources. That said, you could also move your JSON file from assets to resources and use it from both Android and the JVM from there, as you would in any Java application.
Files in the resource directory can be accessed within Android applications by using ClassLoader#getResourceAsStream(). This returns an InputStream, which can then be used to read the file. This avoids having to duplicate files between resources and the assets directory.
InputStream is = getClass().getClassLoader().getResourceAsStream("my_file.json");
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
StringBuilder sb = new StringBuilder();
while ((line = br.readLine()) != null) {
sb.append(line);
}
br.close();
String json = sb.toString();
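getResourceAsStream() returns null when the resource is not on the classpath, which otherwise shows up later as a NullPointerException. A slightly more defensive variant might look like the sketch below. The class name ResourceText and its methods are made up for illustration, and it assumes the file is UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading a classpath resource into a String.
class ResourceText {
    /** Looks the resource up from the classpath root and reads it fully. */
    static String read(String name) throws IOException {
        InputStream is = ResourceText.class.getClassLoader().getResourceAsStream(name);
        if (is == null) {
            throw new FileNotFoundException("Resource not on classpath: " + name);
        }
        return readAll(is);
    }

    /** Reads the stream line by line (assumed UTF-8) and closes it. */
    static String readAll(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

A call such as ResourceText.read("my_file.json") would then work the same way in a local JVM test and in the app, provided the file is packaged as a Java resource.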
I have a url of a text file and I want to read it:
URL url = new URL("http://example.com/textfile.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String inputLine = null;
while ((inputLine = in.readLine()) != null) {
    System.out.println(inputLine);
}
in.close();
The problem is that when I change the content of textfile.txt, my program does not pick up the changes the next time it runs.
After you change the text file, first verify that the server has picked up the change and returns the latest version of the file. Use a browser to check this. If you don't get the latest version, something is wrong on the server. If you need to press Ctrl+F5 to see it, a proxy or your browser has probably cached the old file.
After that, the following workarounds may help:
try {
URL url = new URL("http://example.com/textfile.txt");
Scanner s = new Scanner(url.openStream());
// read from your scanner
}
catch(IOException ex) {
ex.printStackTrace(); // for now, simply output it.
}
If you still get the cached version of the file, try using HttpURLConnection to download the file to a temporary file, read from that temporary file, and delete it afterwards. Downloading the file this way may force the server to return the newest version. To avoid getting a cached version of the file, try this:
// Create a URLConnection object
URLConnection connection = myURL.openConnection();
// Disable caching
connection.setUseCaches(false);
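Put together, a complete read with caching disabled could look like the sketch below. The class name FreshReader and the method readWithoutCache are made up for illustration, and the content is assumed to be UTF-8 text. Note that setUseCaches(false) only affects this client's connection; it cannot repair a server or proxy that keeps serving a stale copy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a URL as text without using cached data.
class FreshReader {
    static String readWithoutCache(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        // Must be set before the connection is actually opened.
        connection.setUseCaches(false);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```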
Good Luck.
I have a scheduled task (using cron) inside my Spring MVC application. Inside the programmed task I have to get a CSV from an external server in the following link:
http://www.aemet.es/es/eltiempo/observacion/ultimosdatos_6172O_datos-horarios.csv?k=and&l=6172O&datos=det&w=0&f=temperatura&x=h24
And once I get it I have to parse it.
The problem is getting the file: when I click on the link above I can download it to my computer, but I don't know how to do that using Spring. Can you give me a hint?
UPDATE: I don't have any code yet, but I guess it must be something similar to the following:
URL stockURL = new URL("http://example.com/stock.csv");
BufferedReader in = new BufferedReader(new InputStreamReader(stockURL.openStream()));
CSVReader reader = new CSVReader(in);
But the problem is that my URL is not exactly a .csv file. When I put the URL in a browser, it looks like a redirect.
Thank you very much indeed.
Thank you all for your comments. Even though the URL doesn't have a CSV extension, I tried the following code (in plain Java, not Spring) and it works!
URL stockURL = new URL("http://www.aemet.es/es/eltiempo/observacion/ultimosdatos_6172O_datos-horarios.csv?k=and&l=6172O&datos=det&w=0&f=temperatura&x=h24");
BufferedReader in = new BufferedReader(new InputStreamReader(stockURL.openStream()));
//CSVReader reader = new CSVReader(in);
String line;
while((line = in.readLine()) != null){
System.out.println(line);
}
So I guess that getting the file with Spring is going to be very similar. Thank you very much, everybody!
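For the parsing step, a very small sketch is shown below. It is not what a CSV library does: the class name SimpleCsv is made up, the delimiter of the real file is not known here and is therefore a parameter, and quoted fields containing the delimiter are not handled. For those, a library such as the CSVReader mentioned in the question is the better choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, naive CSV splitter: one String[] per non-empty line.
class SimpleCsv {
    static List<String[]> parse(Reader source, String delimiter) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // Pattern.quote treats the delimiter literally; -1 keeps trailing empty fields.
                rows.add(line.split(Pattern.quote(delimiter), -1));
            }
        }
        return rows;
    }
}
```

The InputStreamReader built from stockURL.openStream() above can be passed in directly as the source.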
I can read text files and write them to the console. However, when I install this application on another computer, I don't want to have to change the path of the txt file, wherever it is installed. I want to write it like this:
BufferedReader in = new BufferedReader(new FileReader("xxx.txt"));
I don't want to write:
BufferedReader in = new BufferedReader(new FileReader("C:\\Users\\abcde\\Desktop\\xxx.txt"));
Is there any way to locate this txt file? By the way, I put the txt file inside the sources folder, but the program can't read it!
First get the default application path, then check whether the file exists. If it exists, continue; if not, close the application.
String path = System.getProperty("user.dir");
// File(parent, child) inserts the platform's own separator
File file = new File(path, "disSoruCevap.txt");
System.out.println(file.getPath());
if (!file.exists()) {
    System.out.println("System could not find source file!");
    System.out.println("Application will exit");
    System.exit(1);
}
EDIT: Please prefer one of the answers that use resource streams; as you will see from the comments, using user.dir is not safe in every case.
You are looking for:
BufferedReader in = new BufferedReader(new InputStreamReader(getClass().getResourceAsStream("/xxx.txt")));
This will load xxx.txt from your jar file (or from any jar file on your class path that has that file in its root directory).
URL fileURL= yourClassName.class.getResource("yourFileName.extension");
String myURL= fileURL.toString();
Now you don't need a long path name, and the lookup is dynamic: you can move your project to any PC or any drive. This works because the URL is resolved relative to your class's location rather than a fixed location. (With a hard-coded path such as c:\folder\ab.mp3, moving the file to the D drive means changing the path to D:/folder/ab.mp3 by hand.) NOTE: just keep the file with your project.
You can use fileURL as: File file=new File(fileURL.toURI());
You can use myURL as: Media musicFile=new Media(myURL); //in javaFX which need string not url of file
InputStream input = Class_name.class.getResourceAsStream("/xxx.txt");
InputStreamReader inputReader = new InputStreamReader(input);
BufferedReader br = new BufferedReader(inputReader);
String line = null;
try {
while((line = br.readLine())!=null){
System.out.println(line);
}
} catch (IOException ex) {
ex.printStackTrace();
}
You don't need to write out a long path. With Class_name.class.getResourceAsStream("/xxx.txt") you can easily get your file.
BufferedReader in = new BufferedReader(new FileReader("xxx.txt")); works fine in the IDE because, when you run your application there, xxx.txt apparently lies in Java's working directory.
The working directory is an operating system feature, and it cannot reliably be changed from within a running Java program.
There are a few ways to deal with this.
1 - use the File constructor new File(parent, filename) and load parent from a public static final constant or a property (passed on the command line or otherwise)
2 - or use InputStream in = YourClass.class.getClassLoader().getResourceAsStream("xxx.txt"); provided xxx.txt is packaged at the root of the classpath
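Option 1 could be sketched as follows. The class name DataFiles and the property name app.data.dir are made up for illustration; the folder would be passed at start-up with -Dapp.data.dir=..., and the working directory is used as a fallback when the property is not set.

```java
import java.io.File;

// Hypothetical helper: resolves data files against a configurable folder.
class DataFiles {
    // Assumed property name, set with -Dapp.data.dir=<folder> on the command line.
    static final String DIR_PROPERTY = "app.data.dir";

    static File locate(String fileName) {
        // Fall back to the working directory when the property is absent.
        String parent = System.getProperty(DIR_PROPERTY, System.getProperty("user.dir"));
        return new File(parent, fileName);
    }

    public static void main(String[] args) {
        File file = locate("xxx.txt");
        System.out.println(file.getAbsolutePath() + " exists: " + file.exists());
    }
}
```

This keeps the path out of the source code, so the same build runs unchanged on another computer as long as the property points at the right folder.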
Try:
InputStream is = ClassLoader.getSystemResourceAsStream("xxx.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(is));
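Depending on where exactly your file is relative to the root of your classpath, you may have to replace xxx.txt with a path from that root, for example dir/xxx.txt. Note that ClassLoader lookups take no leading slash; the /xxx.txt form is for Class.getResourceAsStream.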
My file paths are like this:
public final static String COURSE_FILE_LOCATION = "src/main/resources/courses.csv";
public final static String PREREQUISITE_FILE_LOCATION = "src/main/resources/prerequisites.csv";
This doesn't work. So I deleted the .iml file and the .idea and target folders from the project and reloaded it.
Read the correct path like this:
This would work then.