I'm writing a program that gets information from a web page and puts it in an Excel file.
The problem is that I can't find a way to search for the tag that holds the specific info.
Here is my code so far:
private void getAll() throws IOException {
    for (int i = 0; i < 250; i++) {
        URL vurl = new URL("http://www.bamart.be/nl/artists/detail/" + i);
        BufferedReader reader = new BufferedReader(new InputStreamReader(vurl.openStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.equalsIgnoreCase("<div class=\"subcontent\">")) {
                System.out.println("Found info!");
            }
            printInfo(line, i);
        }
        reader.close();
    }
}
private void printInfo(String info, int i) {
    System.out.println("/***********************************************/");
    System.out.println("************\t" + info + "**********************/");
    System.out.println("/************" + " Artist page:" + i + " of 999 **********************/");
}
The "Found info!" println never shows up, even though the tag is in the HTML file.
if (line.equalsIgnoreCase("<div class=\"subcontent\">")) { }
This if statement checks for exact equality (ignoring case). However, there could be other content on that line, including leading whitespace, in which case the comparison fails.
What you might want instead is something like
if (line.toLowerCase().contains("<div class=\"subcontent\">")) { }
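To make the difference concrete, here is a minimal sketch. The sample line is made up (I have not looked at the real page), and containsTag is just a hypothetical helper name:

```java
import java.util.Locale;

class TagMatchDemo {
    // Returns true when the tag appears anywhere in the line, ignoring case,
    // so leading whitespace or trailing markup no longer matters.
    static boolean containsTag(String line, String tag) {
        return line.toLowerCase(Locale.ROOT).contains(tag.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String tag = "<div class=\"subcontent\">";
        // Hypothetical line: indented, different case, with content after the tag.
        String line = "    <DIV class=\"subcontent\"><p>Some artist</p>";

        System.out.println(line.equalsIgnoreCase(tag)); // false: the whole line must match
        System.out.println(containsTag(line, tag));     // true: a substring match is enough
    }
}
```

This still depends on the attribute being written exactly as class="subcontent", so it is a quick fix rather than a robust way to read HTML.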
Alternatively, try an HTML parser such as Jsoup instead of matching raw lines.
I have a huge text file. I'd like to search for specific words and print the three (or more) words that follow each match. So far I have this:
public static void main(String[] args) {
String fileName = "C:\\Users\\Mishari\\Desktop\\Mesh.txt";
String line = null;
try {
FileReader fileReader =
new FileReader(fileName);
BufferedReader bufferedReader =
new BufferedReader(fileReader);
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
bufferedReader.close();
} catch(FileNotFoundException ex) {
System.out.println(
"Unable to open file '" +
fileName + "'");
} catch(IOException ex) {
System.out.println(
"Error reading file '"
+ fileName + "'");
}
}
It only prints the file. Can you advise me on the best way of doing this?
You can look for the index of word in line using this method.
int index = line.indexOf(word);
If the index is -1, the word does not exist in the line.
If it exists, take the substring of the line starting from that index to the end of the line.
String nextWords = line.substring(index);
Now use String[] temp = nextWords.split(" ") to get all the words in that substring. Note that temp[0] is the search word itself, so the words after it start at temp[1].
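Put together, the steps above could look like the sketch below. The method name wordsAfter and the sample sentence are made up for illustration, and the substring here starts after the search word so the word itself is not part of the result:

```java
import java.util.ArrayList;
import java.util.List;

class WordsAfter {
    // Returns up to `count` words that follow the first occurrence of `word` in `line`.
    // Returns an empty list when the word is not present or nothing follows it.
    static List<String> wordsAfter(String line, String word, int count) {
        List<String> result = new ArrayList<>();
        int index = line.indexOf(word);
        if (index == -1) {
            return result;
        }
        // Skip past the word itself, then split the remainder on whitespace.
        String rest = line.substring(index + word.length()).trim();
        if (rest.isEmpty()) {
            return result;
        }
        String[] temp = rest.split("\\s+");
        for (int i = 0; i < temp.length && i < count; i++) {
            result.add(temp[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "the quick brown fox jumps over the lazy dog";
        System.out.println(wordsAfter(line, "brown", 3)); // [fox, jumps, over]
    }
}
```

Keep in mind that indexOf matches substrings, so searching for "the" would also match inside "other". It also only looks within one line; words that continue on the next line are not picked up.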
while ((line = bufferedReader.readLine()) != null) {
    System.out.println(line);
    if (line.contains("YOUR_SPECIFIC_WORDS")) {
        // do what you need here
    }
}
By the sound of it, you are looking for a basic Find & Replace All mechanism applied to each line read from the file. In other words, if the current line contains the word or phrase you want to add words after, replace that word with the very same word plus the words you want to add. It would look something like this:
String line = "This is a file line.";
String find = "file"; // word to find in line
String replaceWith = "file (plus this stuff)"; // the phrase to change the found word to.
line = line.replace(find, replaceWith); // Replace any found words
System.out.println(line);
The console output would be:
This is a file (plus this stuff) line.
The main thing here, though, is that you only want to deal with actual words and not the same characters inside another word, for example the word "and" versus the word "sand". The characters that make up "and" also occur in "sand", so "sand" would be changed by the example code above as well. The String.contains() method locates strings the same way. This is usually undesirable if you want to deal with whole words only, so a simple solution is to use a Regular Expression (RegEx) with the String.replaceAll() method. Using your own code, it would look something like this:
String fileName = "C:\\Users\\Mishari\\Desktop\\Mesh.txt";
String findPhrase = "and"; //Word or phrase to find and replace
String replaceWith = findPhrase + " (adding this)"; // The text used for the replacement.
boolean ignoreLetterCase = false; // Change to true to ignore letter case
String line = "";
try {
FileReader fileReader = new FileReader(fileName);
BufferedReader bufferedReader = new BufferedReader(fileReader);
while ((line = bufferedReader.readLine()) != null) {
if (ignoreLetterCase) {
line = line.toLowerCase();
findPhrase = findPhrase.toLowerCase();
}
if (line.contains(findPhrase)) {
line = line.replaceAll("\\b(" + findPhrase + ")\\b", replaceWith);
}
System.out.println(line);
}
bufferedReader.close();
} catch (FileNotFoundException ex) {
System.out.println("Unable to open file: '" + fileName + "'");
} catch (IOException ex) {
System.out.println("Error reading file: '" + fileName + "'");
}
Note the escaped \b word-boundary metacharacters in the regular expression passed to String.replaceAll(), specifically in this line:
line = line.replaceAll("\\b(" + findPhrase + ")\\b", replaceWith);
This allows us to deal with whole words only.
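One caveat with the loop above: when ignoreLetterCase is true it lowercases the whole line, so the printed text changes too, and a search phrase containing regex characters (such as "." or "(") would be interpreted as a pattern. A possible variation, sketched here with a hypothetical helper named replaceWholeWord, uses Pattern.quote and the CASE_INSENSITIVE flag instead:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordReplace {
    // Replaces whole-word occurrences of `find` in `line` with `replaceWith`.
    // Pattern.quote treats the search phrase literally, and quoteReplacement
    // does the same for "$" and "\" in the replacement text.
    static String replaceWholeWord(String line, String find, String replaceWith, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(find) + "\\b", flags);
        return pattern.matcher(line).replaceAll(Matcher.quoteReplacement(replaceWith));
    }

    public static void main(String[] args) {
        String line = "Sand and gravel AND stone";
        // Prints: Sand and (adding this) gravel AND stone
        System.out.println(replaceWholeWord(line, "and", "and (adding this)", false));
        // Prints: Sand and (adding this) gravel and (adding this) stone
        System.out.println(replaceWholeWord(line, "and", "and (adding this)", true));
    }
}
```

The rest of the line keeps its original letter case either way; only the matched words are replaced.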
I have this code, which searches a document, saves the sentences to an ArrayList<StringBuffer>, and saves that object in a file:
public static void save(String doc_path) {
StringBuffer text = new StringBuffer(new Corpus().createDocument(doc_path + ".txt").getDocStr());
ArrayList<StringBuffer> lines = new ArrayList<>();
Matcher matcher = compile("(?<=\n).*").matcher(text);
while (matcher.find()) {
String line_str = matcher.group();
if (checkSentenceLine(line_str)){
lines.add(new StringBuffer(line_str));
}
}
FilePersistence.save (lines, doc_path + ".lin");
FilePersistence.save (lines.toString(), doc_path + "_extracoes.txt");
}
Corpus
public Document createDocument(String file_path) {
File file = new File(file_path);
if (file.isFile()) {
return new Document(file);
} else {
Message.displayError("file path is not OK");
return null;
}
}
FilePersistence
public static void save(Object object_root, String file_path) {
    if (object_root == null) return;
    // try-with-resources closes the stream even if writeObject throws
    try (ObjectOutputStream output = new ObjectOutputStream(new FileOutputStream(file_path))) {
        output.writeObject(object_root);
    } catch (Exception exception) {
        System.out.println("Fail to save file: " + file_path + " --- " + exception);
    }
}
public static Object load(String file_path) {
    // try-with-resources also closes the input stream, which was never closed before
    try (ObjectInputStream input = new ObjectInputStream(new FileInputStream(file_path))) {
        return input.readObject();
    } catch (Exception exception) {
        System.out.println("Fail to load file: " + file_path + " --- " + exception);
        return null;
    }
}
The problem is that the document uses right single quotation marks as apostrophes. When I load it and print it on screen, I get odd squares instead of apostrophes in NetBeans, and Â' if I open the file in Notepad. This prevents me from handling the extracted sentences properly, or at least from displaying them properly. At first I thought it was due to an encoding incompatibility.
Then I tried changing the encoding in the project properties to CP1252, but that only changes the blank squares to question marks, and Notepad still shows the same Â'.
I also tried using
String line_str = matcher.group().replace("’", "'");
and
String line_str = matcher.group().replace('\u2019', '\'');
but neither has any effect.
Update:
if (checkSentenceLine(line_str)){
System.out.println(line_str);
lines.add(new StringBuffer(line_str));
}
This is before saving to a binary file, and the single quotes are already messed up: they show as blank squares in UTF-8 and as ? in CP1252. That makes me think the problem is in reading from the .txt file.
The weird thing is that if I do this:
System.out.println('\u2019');
it shows a perfect right single quote. The problem only occurs when reading from a .txt file, which makes me think it's a problem with the method I'm using to read from the file. It also happens with bullet point symbols.
Maybe the problem is in converting the StringBuffer to a String? If so, how could I prevent this from happening?
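One thing worth ruling out, since the code behind Document.getDocStr() is not shown: if the file's bytes are decoded with a charset other than the one the file was saved in, characters outside ASCII such as U+2019 come out wrong, and no later replace() can find them. The sketch below is only an illustration of that effect. It assumes the file is UTF-8, and readText is a hypothetical helper, not part of the code above:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CharsetReadDemo {
    // Reads the whole file, decoding its bytes with the given charset
    // instead of relying on the platform default.
    static String readText(Path path, Charset charset) throws IOException {
        return new String(Files.readAllBytes(path), charset);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("quote-demo", ".txt");
        try {
            // Write a right single quotation mark (U+2019) as UTF-8 bytes.
            Files.write(path, "It\u2019s a test".getBytes(StandardCharsets.UTF_8));

            String right = readText(path, StandardCharsets.UTF_8);
            String wrong = readText(path, StandardCharsets.ISO_8859_1);

            System.out.println(right.indexOf('\u2019') >= 0); // true: quote survives
            System.out.println(wrong.indexOf('\u2019') >= 0); // false: decoded as other characters
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

If the real file turns out to be saved in a different encoding, the same idea applies with that charset passed in instead.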
I'm having a logic issue updating a text file via user input.
I have a text file containing product information (ID;Name;Cost;Stock):
001;Hand Soap;2.00;500
To add a product, the user calls a function addProduct. It should update the product if its name already exists in the file, or append the product to the text file if it does not exist yet. I'm unsure of two things: how to append only once (at the moment it appends for every line it reads) and how to deal with an empty text file.
This is how addProduct looks:
public void addProduct(Product product, int amountReceived) throws FileNotFoundException, IOException {
newProduct = product;
String productParams = newProduct.getProduct();
String productID = newProduct.getProductID();
int productStock = newProduct.getProductStock();
String productName = newProduct.getProductName();
String tempFileName = "tempFile.txt";
System.out.println("Attempting to Add Product : " + newProduct.getProduct());
BufferedReader br = null;
BufferedWriter bw = null;
try {
FileInputStream fstream = new FileInputStream(ProductMap.productFile);
br = new BufferedReader(new InputStreamReader(fstream));
String line;
StringBuilder fileContent = new StringBuilder();
while ((line = br.readLine()) != null) {
System.out.println("Line : " + line);
String [] productInfo = line.split(";");
System.out.println("Added Product Info length : " + productInfo.length);
if (productInfo.length > 0) {
if (productInfo[1].equals(productName))
{
System.out.println("Adding existing product");
System.out.println("Product Info : " + String.valueOf(productInfo[3]));
//line = line.replace(String.valueOf(productInfo), String.valueOf(productStock - amountSold));
productInfo[3] = String.valueOf(Integer.parseInt(productInfo[3]) + amountReceived);
String newLine = productInfo[0] + ";" + productInfo[1] + ";" + productInfo[2] + ";" + productInfo[3];
fileContent.append(newLine);
fileContent.append("\n");
System.out.println("Updated Product Info : " + String.valueOf(Integer.parseInt(productInfo[3]) + amountReceived));
System.out.println("Line :" + newLine);
} else {
fileContent.append(line);
fileContent.append("\n");
fileContent.append(productParams);
fileContent.append("\n");
//fileContent.append(productParams + "\n");
//System.out.println("Product Name : " + productInfo[1]);
//System.out.println("The full product info : " +productParams);
}
}
br.readLine();
}
if (br.readLine() == null) {
fileContent.append(productParams);
}
System.out.println("Product Updated File Contents : " + fileContent);
FileWriter fstreamWrite = new FileWriter(ProductMap.productFile);
BufferedWriter out = new BufferedWriter(fstreamWrite);
System.out.println("File Content : " + fileContent);
out.write(fileContent.toString());
out.close();
br.close();
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
}
}
At a high level, a simple text file may not be the best choice for this use. This implementation requires enough memory to hold the entire file.
If you only had additions and could just append to the file directly, things would be easier. A database would seem to be the best choice. Somewhere between a database and a simple text file, a RandomAccessFile could help if the data could be written with standard lengths for each field. Then you could overwrite a particular row rather than having to rewrite the whole file.
Given the constraints of the current setup, I can't think of a way around writing all the data each time the file is updated.
To get around the empty file problem, you could stop appending the new product in the else branch of the loop, so the new data is not added to the fileContent StringBuilder on every line. Then, when writing the data back out, you can write the new data either before or after the other information from the file.
Also, the readLine at the bottom of the loop is not needed. Any row read there is discarded without being processed, because the read at the top of the loop then fetches the next line.
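As a rough sketch of that approach: track whether the product was found while copying the lines, and append the new product once after the loop only if it was not. This works on a list of lines rather than the file itself to keep it short; addOrUpdate and its parameters are hypothetical names, and the line format is assumed to be the ID;Name;Cost;Stock layout from the question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class StockUpdate {
    // Returns the new file lines: the matching product's stock is increased,
    // or newProductLine is appended exactly once if no line matched.
    // An empty input list simply yields a list holding the new product.
    static List<String> addOrUpdate(List<String> lines, String productName,
                                    String newProductLine, int amountReceived) {
        List<String> result = new ArrayList<>();
        boolean found = false;
        for (String line : lines) {
            String[] info = line.split(";");
            if (info.length == 4 && info[1].equals(productName)) {
                info[3] = String.valueOf(Integer.parseInt(info[3]) + amountReceived);
                result.add(String.join(";", info));
                found = true;
            } else {
                result.add(line); // keep every other line unchanged
            }
        }
        if (!found) {
            result.add(newProductLine); // appended once, after the loop
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("001;Hand Soap;2.00;500");
        // Prints: [001;Hand Soap;2.00;525]
        System.out.println(addOrUpdate(lines, "Hand Soap", "001;Hand Soap;2.00;25", 25));
        // Prints: [001;Hand Soap;2.00;500, 002;Shampoo;3.50;40]
        System.out.println(addOrUpdate(lines, "Shampoo", "002;Shampoo;3.50;40", 40));
        // Prints: [002;Shampoo;3.50;40]
        System.out.println(addOrUpdate(Collections.<String>emptyList(), "Shampoo", "002;Shampoo;3.50;40", 40));
    }
}
```

The resulting list can then be written back to the file in one go, as the existing code already does with fileContent.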
I have a problem with my Bukkit plugin.
I'm trying to search through a file, reading it line by line (that part works). If a line contains some text, the method has to return that line, along with every other line in the file that contains the same text. Once I have these lines, I have to send them to the Player in a message. Sending is not the problem, but in the lines I get now the "\n" doesn't work. Here is the code I use now:
public String searchText(String text, String file, Player p)
{
String data = null;
try {
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
while((line = br.readLine()) != null)
{
if(line.indexOf(text) >= 0)
{
data += System.getProperty("line.separator") + line + System.getProperty("line.separator");
}
p.sendMessage("+++++++++++GriefLog+++++++++++");
p.sendMessage(data);
p.sendMessage("++++++++++GriefLogEnd+++++++++");
}
br.close();
} catch (Exception e) {
e.printStackTrace();
}
return "";
}
The return value is meant to be empty, because the info is sent to the player a bit higher up. :P
The problem now is: how do I add a "\n" to the data variable? When I use this function in the rest of my code, it outputs a lot of lines, but without the "\n". How do I put that in?
Since your method isn't supposed to return anything, remove your return statement and set the return type to void.
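It looks like your code sends the data string once for every line it reads, because the sendMessage calls sit inside the loop. Also, data starts out as null, so the first concatenation puts the text "null" in front. Try: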
data = "";
while ((line = br.readLine()) != null)
{
    if (line.indexOf(text) >= 0)
    {
        // remove the first System.getProperty("line.separator") if
        // you don't want a leading empty line
        data += System.getProperty("line.separator") + line +
                System.getProperty("line.separator");
    }
}
if (data.length() > 0) {
    p.sendMessage("+++++++++++GriefLog+++++++++++");
    p.sendMessage(data);
    p.sendMessage("++++++++++GriefLogEnd+++++++++");
}
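If the line separators still do not show up as line breaks in chat (I am assuming here that the client may not render them inside a single message), another option is to collect the matching lines in a list and send one message per line. A sketch with a hypothetical findMatchingLines helper, using System.out.println as a stand-in for p.sendMessage so it runs without Bukkit:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class MatchingLines {
    // Collects every line from the reader that contains the given text.
    static List<String> findMatchingLines(BufferedReader reader, String text) throws IOException {
        List<String> matches = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains(text)) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Made-up log content standing in for the real file.
        String log = "alice broke stone\nbob placed dirt\nalice placed torch\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(log))) {
            for (String match : findMatchingLines(reader, "alice")) {
                System.out.println(match); // in the plugin: p.sendMessage(match)
            }
        }
    }
}
```

In the plugin the reader would wrap a FileReader for the log file, as in the original method.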
I was writing a program in Java to search for a piece of text.
I took these 3 as inputs:
The directory from which the search should start
The text to be searched for
Whether the search should be recursive (whether or not to include the directories inside a directory)
Here is my code:
public void theRealSearch(String dirToSearch, String txtToSearch, boolean isRecursive) throws Exception
{
File file = new File(dirToSearch);
String[] fileNames = file.list();
for(int j=0; j<fileNames.length; j++)
{
File anotherFile = new File(fileNames[j]);
if(anotherFile.isDirectory())
{
if(isRecursive)
theRealSearch(anotherFile.getAbsolutePath(), txtToSearch, isRecursive);
}
else
{
BufferedReader bufReader = new BufferedReader(new FileReader(anotherFile));
String line = "";
int lineCount = 0;
while((line = bufReader.readLine()) != null)
{
lineCount++;
if(line.toLowerCase().contains(txtToSearch.toLowerCase()))
System.out.println("File found. " + anotherFile.getAbsolutePath() + " at line number " + lineCount);
}
}
}
}
When recursion is set to true, the program throws a FileNotFoundException.
So I went back to the site where I got the idea for this program and edited my program a bit. This is how it goes:
public void theRealSearch(String dirToSearch, String txtToSearch, boolean isRecursive) throws Exception
{
File[] files = new File(dirToSearch).listFiles();
for(int j=0; j<files.length; j++)
{
File anotherFile = files[j];
if(anotherFile.isDirectory())
{
if(isRecursive)
theRealSearch(anotherFile.getAbsolutePath(), txtToSearch, isRecursive);
}
else
{
BufferedReader bufReader = new BufferedReader(new FileReader(anotherFile));
String line = "";
int lineCount = 0;
while((line = bufReader.readLine()) != null)
{
lineCount++;
if(line.toLowerCase().contains(txtToSearch.toLowerCase()))
System.out.println("File found. " + anotherFile.getAbsolutePath() + " at line number " + lineCount);
}
}
}
}
It worked perfectly then. The only difference between the two snippets is the way the File objects are created, but they look the same to me!
Can anyone point out where I messed up?
The second example uses listFiles(), which returns File objects. Your first example uses list(), which returns only the names of the files, and that is where the error comes from.
The problem in the first example is in the fact that file.list() returns an array of file NAMES, not paths. If you want to fix it, simply pass file as an argument when creating the file, so that it's used as the parent file:
File anotherFile = new File(file, fileNames[j]);
Now it assumes that anotherFile is in the directory represented by file, which should work.
You need to include the base directory when you build the File object, as @fivedigit points out.
File dir = new File(dirToSearch);
for (String fileName : dir.list()) {
    File anotherDirAndFile = new File(dir, fileName);
    // ... process anotherDirAndFile as before ...
}
I would also close your files when you are finished with them, and I would avoid declaring throws Exception.
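Putting those suggestions together, a sketch of the search could look like this. It is not a drop-in replacement: the search method returns the hits as "path:lineNumber" strings instead of printing them (my own choice, to keep it easy to check), declares IOException rather than Exception, and closes each file with try-with-resources:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class TextSearch {
    // Searches every file in dir (and in subdirectories when recursive is true)
    // for text, ignoring case. Each hit is reported as "path:lineNumber".
    static List<String> search(File dir, String text, boolean recursive) throws IOException {
        List<String> hits = new ArrayList<>();
        String[] names = dir.list();
        if (names == null) {
            return hits; // dir is not a directory, or it cannot be read
        }
        String needle = text.toLowerCase();
        for (String name : names) {
            // Build the File relative to its parent directory, not the working directory.
            File entry = new File(dir, name);
            if (entry.isDirectory()) {
                if (recursive) {
                    hits.addAll(search(entry, text, true));
                }
            } else {
                // try-with-resources closes the reader even if reading fails
                try (BufferedReader reader = new BufferedReader(new FileReader(entry))) {
                    String line;
                    int lineCount = 0;
                    while ((line = reader.readLine()) != null) {
                        lineCount++;
                        if (line.toLowerCase().contains(needle)) {
                            hits.add(entry.getPath() + ":" + lineCount);
                        }
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        File start = new File(args.length > 0 ? args[0] : ".");
        String text = args.length > 1 ? args[1] : "TODO";
        for (String hit : search(start, text, true)) {
            System.out.println("Found at " + hit);
        }
    }
}
```

The null check on list() matters because it returns null rather than an empty array when the path is not a readable directory.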