Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
So I am trying to write a program that collects certain information from different articles and combines it. The step I am having trouble with is extracting the article from the web page.
Can you suggest any Java libraries or methods for extracting text from a web page?
I have also found this product:
http://www.diffbot.com/products/automatic/article/
Do you think this is the way to go? If so, can someone point me to a Java implementation? I cannot seem to find one, although apparently it exists.
Many thanks
Clarification: I am looking more for an algorithm, library, or method for detecting where in an HTML DOM tree a block of text that could be an article is located, like Safari's Reader function.
PS: If you think this is much easier to do in something like Python, just say so. My program has to run in Java, as it should eventually run on a server (using a Java framework), but I could try having it make use of Python scripts. I would only do this if you advise that Python is the way to go.
Have a look at Apache Tika. It's meant to be used together with a crawler and can extract both text and metadata for you. You can also select various output types.
I have found an open source solution which was extremely highly rated.
https://code.google.com/p/boilerpipe/
A review on different text extraction algorithms:
http://tomazkovacic.com/blog/122/evaluating-text-extraction-algorithms/
It appears that Diffbot performs very well but is not open source. So among open-source options, boilerpipe could be the way to go.
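To give a feel for what extractors like boilerpipe do, here is a much-simplified sketch of the general idea. It is not boilerpipe's actual algorithm or API: it splits the page into blocks and keeps blocks that have enough words and a low link density (the share of a block's words that sit inside `<a>` tags). The class name, the regex-based block splitting, and the thresholds (5 words, 0.33 link density) are illustrative assumptions; regexes are not a real HTML parser, so treat this as a toy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DensityExtractor {

    // Rough block boundaries: opening or closing block-level tags (simplified assumption).
    private static final Pattern BLOCK_SPLIT = Pattern.compile(
            "(?i)</?(?:p|div|li|td|h[1-6]|section|article|header|footer|nav)\\b[^>]*>");
    // Captures the inner HTML of each link so its words can be counted.
    private static final Pattern LINK = Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a>");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    static int countWords(String text) {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    // Keeps blocks that are long enough and not dominated by link text.
    static List<String> extract(String html, int minWords, double maxLinkDensity) {
        List<String> kept = new ArrayList<>();
        for (String rawBlock : BLOCK_SPLIT.split(html)) {
            String text = stripTags(rawBlock);
            int words = countWords(text);
            if (words < minWords) {
                continue; // too short to be article text
            }
            int linkWords = 0;
            Matcher m = LINK.matcher(rawBlock);
            while (m.find()) {
                linkWords += countWords(stripTags(m.group(1)));
            }
            double linkDensity = (double) linkWords / words;
            if (linkDensity <= maxLinkDensity) {
                kept.add(text);
            }
        }
        return kept;
    }
}
```

On a page whose navigation bar consists only of links, the nav block has a link density of 1.0 and is dropped, while ordinary paragraphs survive. Real extractors use more features, learned thresholds, and a proper HTML parser.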
This is not the answer to every malformed HTML page you can get, but most of the time JTidy does a good job of cleaning up the HTML and giving you an interface for accessing the various DOM nodes, and with that, the text inside those nodes.
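Once the markup is well-formed, the text can be reached through the standard `org.w3c.dom` API. The sketch below assumes the page has already been cleaned into well-formed XHTML (for example by JTidy) and uses only the JDK's own XML parser. The scoring rule, picking the element whose direct `<p>` children hold the most text, is a hypothetical heuristic aimed at the "where in the DOM is the article" part of the question; it is far simpler than a real reader mode.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomArticleFinder {

    // Parses well-formed XHTML (e.g. output already cleaned by a tidier) into a DOM.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Avoid fetching the XHTML DTD over the network if a DOCTYPE is present.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    // Total text length of the element's direct <p> children.
    static int paragraphScore(Element e) {
        int score = 0;
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "p".equalsIgnoreCase(child.getNodeName())) {
                score += child.getTextContent().trim().length();
            }
        }
        return score;
    }

    // Returns the element whose direct paragraphs contain the most text, or null if none.
    static Element findArticleContainer(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        Element best = null;
        int bestScore = 0;
        for (int i = 0; i < all.getLength(); i++) {
            Element candidate = (Element) all.item(i);
            int score = paragraphScore(candidate);
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Calling `findArticleContainer(parse(xhtml)).getTextContent()` would then give the candidate article text.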
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I want to call a Java class from index.html through an href link, without any JavaScript between the HTML and the Java code.
I am using Spring Boot; my Java class is inside src/main/java/controller and the HTML is inside src/main/resources/templates.
Can anyone tell me whether this is possible and, if so, how to achieve it?
Thanks in advance.
Updated from OP comment:
Thank you very much for your reply, and let me correct the question: can I call a function written in Java directly from HTML? – JAVA Coder Nov 5 at 9:08
Still the same answer. Methods or functions, terminology aside. What you're proposing is like trying to power a bicycle by attaching an engine piston to it without the engine. Does not compute.
You cannot directly "call" a java process from html/js.
---- END UPDATE
No.
Longer answer:
Java files aren't called; Java classes are. The Java runtime has to live somewhere for Java to be used, typically in your app server or web server. It's entirely possible to use Java to generate the HTML, but the way you're using the term "call" in this case doesn't make sense. HTML doesn't "call" anything, as it's really just a rich-text description of a page (like a painting). Modern browsers implement JavaScript interpreters (JavaScript has nothing to do with Java) to run JavaScript code.
So, can you write HTML that "calls" Java (without JavaScript)? No. Can you use Java to generate HTML? Yes. Can you "call" Java from JavaScript? Only if it is exposed as a web service (e.g. classes in the app server are configured to serve HTTP content).
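To make the "exposed over HTTP" point concrete: HTML never invokes a Java method itself, but a plain link makes the browser request a URL, and Java code mapped to that URL runs on the server and returns a response. In Spring Boot this mapping would normally be a controller method annotated with `@GetMapping`; to keep the sketch dependency-free it uses the JDK's built-in `com.sun.net.httpserver` package instead. The class name, the `greet` method, and the `/greet` path are hypothetical.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class HrefToJavaDemo {

    // Ordinary Java method that we want a link to "trigger" (hypothetical).
    static String greet() {
        return "Hello from Java on the server";
    }

    // Writes a complete 200 response with the given body.
    private static void send(HttpExchange exchange, String contentType, String text) throws IOException {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", contentType);
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    // Starts the server; port 0 lets the OS pick a free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        // The page itself: just HTML with a link, no JavaScript.
        server.createContext("/", exchange ->
                send(exchange, "text/html; charset=utf-8", "<a href=\"/greet\">Call Java</a>"));
        // Clicking the link sends GET /greet, which runs greet() on the server.
        server.createContext("/greet", exchange ->
                send(exchange, "text/plain; charset=utf-8", greet()));
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("Open http://localhost:8080/ and click the link");
    }
}
```

The browser only ever sees HTML and the text returned by `/greet`; the Java method runs on the server because it is reachable at a URL, which is exactly what a web service (or a Spring controller) provides.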
A programming class and web services overview would probably be helpful.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
This may be too broad of a question, but how does one access a Bible document in an Android application? Is it a text file that has an index to find specific verses or is it something much more complicated? I hope that is enough to answer from.
The first step would be to actually find a structured bible dataset from somewhere.
You can search and try to see if there's an xml version of your favourite translation somewhere, and maybe download that.
Once you've got it (either as xml, json, or whatever format), you can write some Java code to parse the file, and load it into some appropriate data structure that allows you to do what you want with it efficiently (eg. search it by verse).
You could even put it into a database (eg. MySQL or MongoDB), which would allow you to search it efficiently.
But really, how you want to structure the data depends on how you're going to use it, and what formats it's already available in (as it could be a pain to clean up the XML).
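A minimal sketch of the "parse it and index it by verse" step, assuming a simplified OSIS-style file in which each verse is a container element such as `<verse osisID="Gen.1.1">...</verse>`. Many real OSIS files use milestone verses (`<verse sID="..."/>` ... `<verse eID="..."/>`) or prefixed element names, which this does not handle. The class name is hypothetical; `javax.xml.parsers` is available on Android as well as desktop Java.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class VerseIndex {

    // Reference such as "Gen.1.1" -> verse text, kept in document order.
    private final Map<String, String> verses = new LinkedHashMap<>();

    // Builds the index from XML whose verses are <verse osisID="...">text</verse> elements.
    static VerseIndex fromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        VerseIndex index = new VerseIndex();
        NodeList nodes = doc.getElementsByTagName("verse");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element verse = (Element) nodes.item(i);
            index.verses.put(verse.getAttribute("osisID"), verse.getTextContent().trim());
        }
        return index;
    }

    // Hash lookup by reference; returns null if the verse is not in the file.
    String lookup(String reference) {
        return verses.get(reference);
    }

    int size() {
        return verses.size();
    }
}
```

For full-text search rather than lookup by reference, the same parsing loop could instead insert rows into a database, as suggested above.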
You might find the following resources useful:
Web Service APIs to directly get verses: http://www.4-14.org.uk/xml-bible-web-service-api
These would mean avoiding a lot of the headaches of dealing with file formats, indexing, and all kinds of other stuff.
Web service APIs generally work by your program submitting a query to a website (eg. including the biblical reference), and you get back some structured data (eg. xml/json) containing the verse(s) you requested.
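Roughly, such a query looks like this from Java, using `HttpURLConnection` (which is also available on Android). The endpoint `https://example.com/api/verses` and the `ref` parameter are placeholders, not the API of the service linked above; check that service's documentation for its real URL and parameters. On Android the call also needs the INTERNET permission and has to run off the main thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class VerseApiClient {

    // Hypothetical endpoint: replace with the real service's URL and parameter name.
    static final String BASE_URL = "https://example.com/api/verses";

    // Encodes a human-readable reference such as "John 3:16" into the query string.
    static URI buildQuery(String reference) throws UnsupportedEncodingException {
        return URI.create(BASE_URL + "?ref=" + URLEncoder.encode(reference, "UTF-8"));
    }

    // Sends the query and returns the raw response body (XML or JSON) for the caller to parse.
    static String fetch(String reference) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) buildQuery(reference).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (InputStream in = conn.getInputStream()) {
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned body would then be parsed with an XML or JSON parser, much like a locally stored file.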
Download a structured offline copy: http://www.bibletechnologies.net./osistext/
This would mean you have to find, download, parse, and index your own data structure for dealing with the text, but it would be much faster (if done right) than using a web service to do it.
The link I posted here has only some example books from the bible, but if you look you'll find more around the web.
It completely depends on the format of the file.
Any book or text document can be stored and distributed in multiple ways. It could simply be a .pdf file, or it could be stored as XML or .epub.
The question is beyond broad: there are so many ways to do it that it's impossible to guess without more information.
This link has some information about the e-book formats:
http://en.wikipedia.org/wiki/Comparison_of_e-book_formats
And that's just one small subsection of ways text can be stored.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
For example, say I had the string
hunger > 80 then findFood();
or
distanceTo sun < 30 then moveAwayFrom(sun);
That's not the exact syntax I want, but does anyone know a simple way to pass a rule like that to an entity in a game so that it basically follows it? The only thing that springs to mind at the moment is a huge block of if statements that parse the given string, but that feels really, really inefficient.
I'd like a second opinion, just to see if I'm overlooking something really simple here :/
Not an easy task! What you are basically saying is that you need to create a language.
This language will describe all possible commands which you will then parse and generate commands from.
Now you might be thinking to yourself "gee, I've never written a language before!". That's where ANTLR comes in. It allows you to write the grammar for your language and then generates the parser/lexer you need to decode the commands. There is an IDE for working with ANTLR called ANTLRWorks, and you should check out the getting-started tutorial. You really will have to get over your "curse" of trouble with wiki pages and dive in here.
Along the way you will probably realize easier or more efficient ways to encode your commands so that you can later decode them. Some possible alternatives are embedding a scripting language, such as Python, JavaScript, or Lua, which you would use to encode/decode the commands. I have seen Lua used in games before; you can read their statement on why it is popular in games here. Good luck!
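A lighter first step, before reaching for ANTLR or an embedded interpreter, is to parse each rule string once into a small object and then evaluate those objects every tick, so strings are never re-parsed inside a big if-chain. The sketch below is a hand-written regex parser (not ANTLR) covering only the two example rules in the question; the grammar, the `Entity` class, and the action names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleDemo {

    // Hypothetical entity: named numeric stats plus a log of actions taken.
    static class Entity {
        final Map<String, Double> stats = new HashMap<>();
        final List<String> log = new ArrayList<>();
    }

    // One compiled rule, e.g. "hunger > 80 then findFood();".
    static final class Rule {
        final String stat;
        final char op;
        final double threshold;
        final String action;
        final String argument; // empty when the action takes no argument

        Rule(String stat, char op, double threshold, String action, String argument) {
            this.stat = stat;
            this.op = op;
            this.threshold = threshold;
            this.action = action;
            this.argument = argument;
        }

        // Missing stats are treated as 0.0 (an arbitrary choice for this sketch).
        boolean matches(Entity e) {
            double value = e.stats.getOrDefault(stat, 0.0);
            return op == '>' ? value > threshold : value < threshold;
        }
    }

    // Assumed syntax: <stat, one or two words> (< or >) <number> then <action>(<optional arg>);
    private static final Pattern RULE = Pattern.compile(
            "\\s*(\\w+(?:\\s+\\w+)?)\\s*([<>])\\s*(-?\\d+(?:\\.\\d+)?)\\s+then\\s+(\\w+)\\((\\w*)\\)\\s*;?\\s*");

    // Parse once, up front; each game tick then only evaluates compiled rules.
    static Rule parse(String text) {
        Matcher m = RULE.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse rule: " + text);
        }
        return new Rule(m.group(1), m.group(2).charAt(0),
                Double.parseDouble(m.group(3)), m.group(4), m.group(5));
    }

    // Runs every rule whose condition holds; actions are looked up by name.
    static void tick(Entity e, List<Rule> rules, Map<String, BiConsumer<Entity, String>> actions) {
        for (Rule r : rules) {
            if (!r.matches(e)) {
                continue;
            }
            BiConsumer<Entity, String> action = actions.get(r.action);
            if (action == null) {
                throw new IllegalStateException("Unknown action: " + r.action);
            }
            action.accept(e, r.argument);
        }
    }
}
```

The action map plays the role of the "huge block of if statements", but it is built once and dispatches in a single lookup. If the rule language grows (and/or, nesting, more operators), that is the point where a grammar tool like ANTLR starts to pay off.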
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I want to develop a simple way to generate a flowchart for websites.
Here is why I would need something like that:
First, someone draws a flowchart in < insert-program-here > and exports it as an XML file.
This XML file is then imported and unmarshalled by another program: a Java application that is nothing more than a graphical interface the user can use to find his/her way to a specific solution by following the flowchart.
I hope that is clear so far.
Does anybody know a simple program that can export a flowchart as an XML file so I can use it as described above? Or is there a more convenient way to accomplish this?
Thanks for any help!
Best regards.
Personally I would recommend yEd because I have had good experiences with it. It uses the XML format GraphML by default, so your users will not have to use an export function but can use the files produced when saving their work.
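On the Java side, a GraphML file can be read with the JDK's DOM parser alone. GraphML puts `node` and `edge` elements (with `source`/`target` attributes) in the `http://graphml.graphdrawing.org/xmlns` namespace. The visible label text is not part of core GraphML: yEd typically stores it in a yFiles-specific `y:NodeLabel` element, so this sketch assumes that, looks for any element with local name `NodeLabel`, and falls back to the node id. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FlowchartLoader {

    static final String GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns";

    // node id -> ids of the nodes its outgoing edges point to
    final Map<String, List<String>> next = new LinkedHashMap<>();
    // node id -> human-readable label
    final Map<String, String> labels = new HashMap<>();

    static FlowchartLoader fromGraphMl(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // GraphML elements live in the GraphML namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        FlowchartLoader chart = new FlowchartLoader();

        NodeList nodes = doc.getElementsByTagNameNS(GRAPHML_NS, "node");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element node = (Element) nodes.item(i);
            String id = node.getAttribute("id");
            chart.next.put(id, new ArrayList<>());
            // Assumption: yEd-style label element; any namespace, local name "NodeLabel".
            NodeList label = node.getElementsByTagNameNS("*", "NodeLabel");
            chart.labels.put(id, label.getLength() > 0 ? label.item(0).getTextContent().trim() : id);
        }

        NodeList edges = doc.getElementsByTagNameNS(GRAPHML_NS, "edge");
        for (int i = 0; i < edges.getLength(); i++) {
            Element edge = (Element) edges.item(i);
            chart.next.computeIfAbsent(edge.getAttribute("source"), k -> new ArrayList<>())
                    .add(edge.getAttribute("target"));
        }
        return chart;
    }

    // Options the GUI would offer when the user is at nodeId (empty = end of the flowchart).
    List<String> choicesFrom(String nodeId) {
        return next.getOrDefault(nodeId, Collections.emptyList());
    }

    String label(String nodeId) {
        return labels.getOrDefault(nodeId, nodeId);
    }
}
```

The GUI would start at the node with no incoming edges, show `label(current)`, and offer `choicesFrom(current)` as buttons until it reaches a node with no outgoing edges.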
But I don't think one can answer your question with a simple "I recommend program X." because nobody knows what you really need. Maybe one program does not give you all features you want to have for your flowcharts. Maybe others are not cheap enough.
I think you will have to find the right one yourself. To determine which program suits you, you could try several drawing applications. A list of flowcharting programs can be found in this question. If that is not enough, you can use Google to find more.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am trying to print an Excel file and a Word document to a printer, but I don't know how to do it in Java.
Can anyone please help me and provide a code example?
Desktop.getDesktop().print(new File("resume.doc"));
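A slightly more defensive version of the one-liner above. According to its Javadoc, `Desktop.print` uses the native desktop printing facility via the application associated with the file type, so Word or Excel (or another program registered for .doc/.xls) has to be installed, and it is unavailable on headless servers. The class name and the `Result` enum are illustrative.

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;
import java.io.File;
import java.io.IOException;

public class OfficePrint {

    enum Result { PRINTED, FILE_NOT_FOUND, NOT_SUPPORTED }

    // Asks the OS to print the file with the application registered for its type.
    static Result print(File file) throws IOException {
        if (!file.isFile()) {
            return Result.FILE_NOT_FOUND;
        }
        // Check headless first: Desktop.getDesktop() would throw HeadlessException there.
        if (GraphicsEnvironment.isHeadless() || !Desktop.isDesktopSupported()
                || !Desktop.getDesktop().isSupported(Desktop.Action.PRINT)) {
            return Result.NOT_SUPPORTED;
        }
        Desktop.getDesktop().print(file);
        return Result.PRINTED;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(print(new File(args.length > 0 ? args[0] : "resume.doc")));
    }
}
```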
I have found that if you ask only for source code, someone will probably vote you down. What you need is an idea, or a hint toward the solution.
For printing Office documents from Java, one way is to call .NET from Java, but this is very slow, and if you want to integrate the source code you need a bridge. For serious projects, this approach seems too weak.
Another way is to use Apache POI, as Nicholas suggested. But POI also has some problems rendering Office documents. On the positive side, it can be more stable than calling .NET.
As for the Java Desktop approach, judging from the Java API, it seems to leave the actual printing to your OS. I am not sure about it, so you will have to try it.
Anyway, printing Microsoft formats from Java is not a good approach, and the same applies to printing PDF documents from .NET. Sigh!
Apache POI is one of the more useful libraries for working with MS Word files in Java.
And Java already has a printing library.
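A sketch using the built-in Java Print Service API (`javax.print`). Note that it does not render Office formats: it sends the file's bytes to the printer unchanged, so .doc or .xls files would first need converting into something the printer understands, or printing through the associated application as with `Desktop.print`. The class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.print.Doc;
import javax.print.DocFlavor;
import javax.print.PrintException;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;
import javax.print.SimpleDoc;
import javax.print.attribute.HashPrintRequestAttributeSet;

public class PrintServices {

    // Names of the printers the Java Print Service API can see for this flavor.
    static List<String> printerNames(DocFlavor flavor) {
        List<String> names = new ArrayList<>();
        for (PrintService service : PrintServiceLookup.lookupPrintServices(flavor, null)) {
            names.add(service.getName());
        }
        return names;
    }

    // Sends the file's raw bytes to the default printer. No conversion happens, so the
    // output is only sensible if the printer itself understands the format
    // (for example PostScript); raw .doc/.xls bytes would most likely print as garbage.
    static void printRaw(Path file) throws IOException, PrintException {
        PrintService service = PrintServiceLookup.lookupDefaultPrintService();
        if (service == null) {
            throw new PrintException("No default printer configured");
        }
        byte[] data = Files.readAllBytes(file);
        Doc doc = new SimpleDoc(data, DocFlavor.BYTE_ARRAY.AUTOSENSE, null);
        service.createPrintJob().print(doc, new HashPrintRequestAttributeSet());
    }
}
```

`printerNames(DocFlavor.BYTE_ARRAY.AUTOSENSE)` returns an empty list on a machine with no configured printers.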