HTML to TXT library that mimics the output of "lynx -dump"? - java

The problem is really that specific.
I need a library in Java that can take HTML content and generate text in the same format that is generated by the Linux lynx program.
I need to expose data provided by third-party servers to end users on Android. The data comes as ancient, badly formatted HTML, so bad that my attempts to parse it in Java fail occasionally (unacceptable). It also grows every month (so preinstalling it is ruled out), and I can't convince the providers to switch to something "modern" (life would be great with XML, etc.).
Shortest route: I wrote a class that uses the online W3 html2txt service (search for it on Google). It worked fine in the app until I got complaints and noticed that the W3 service fails occasionally. That's not a big deal in itself, but the black-box logic expects the output to be in this "lynx-like" text format.
So I would like a library that does the conversion (HTML->TXT) in "lynx style" inside the app, to avoid the outages of the W3 service. Besides, the lynx output is probably the best I've seen: the most organized and neat.
Are you guys aware of any?

Not sure what you mean by lynx style, so I might be completely off in submitting this (if so, please excuse me).
I used this piece of code a while back to check HTML/XML files (at the time I was just printing the content out in the logs):
// Read a raw resource line by line into a single string.
InputStream in = context.getResources().openRawResource(id);
StringBuilder inLine = new StringBuilder();
BufferedReader inRd = new BufferedReader(new InputStreamReader(in));
String text;
while ((text = inRd.readLine()) != null) {
    inLine.append(text);
    inLine.append("\n");
}
inRd.close(); // also closes the underlying stream
return inLine.toString();
I hope it helps but I got the feeling you need something more complex :P

After a year, I give up. Answer is: no way to handle that, no library in Java. At least for now.
I'm closing this. Thank you for your attention.

Related

How to receive a dynamic variable from a website (android)

I'm not exactly sure how to state this but here's the basic idea of what I'm trying to do:
I'm making a radio player application in Android Java. The feature I'm looking at including is a dynamic TextView that gets the title of the song that's currently playing from either the website or ShoutCast.
My thinking so far is that my layout XML can stay as it is (a TextView with wrap_content showing an @string value). I just have no idea whether there's a way to change that text from Java, or how to get the HTML (I'm not even sure I need HTML) from the website.
Thank you in advance. You are all great people for even reading this :)
To answer my own question (regarding what I was looking for): I believe what I was starting with was JavaScript (something like a web player that constantly updated, so the title wasn't actually in the code or the redirection).
What I did was start from a permanent page (a list of played songs in my case): connect to that URL with a URLConnection (java.net.URL and java.net.URLConnection), then read it with a BufferedReader (java.io.BufferedReader). This part could probably be optimized using BufferedReader's mark and reset methods, but for this one I just kept looping through the lines until I found a permanent marker, then trimmed the line until I had what I needed.
e.g.
String line;
// Each readLine() call advances one line; stop at the marker or at end of input
while ((line = br.readLine()) != null
        && !line.contains("PERMANENT MARK BEFORE MY DESIRED INFO")) {
    // keep skipping
}
String s = br.readLine(); // Repeat this until you get your info into s
// Trim functions (I used the </td> to my advantage)
s = s.split("THING NEXT TO INFO")[0];
display.setText(s);
Thank you to everyone who read this. Good luck on your endeavors!
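For anyone who wants to reuse the approach above, here is a minimal sketch of the same "skip to a marker, then trim" idea as a standalone method. The class name, method name, and parameters are made up for illustration, and it assumes the wanted text sits on the line right after the marker, which may not hold for your page.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper, not part of the original answer.
class MarkerExtractor {

    /**
     * Skips lines until one contains startMarker, then returns the part of the
     * NEXT line that precedes endMarker (or the whole line if endMarker is absent).
     * Returns null if the marker, or a line after it, is never found.
     */
    static String extractAfterMarker(BufferedReader reader, String startMarker, String endMarker)
            throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains(startMarker)) {
                String next = reader.readLine();
                if (next == null) {
                    return null; // marker was on the last line
                }
                int end = next.indexOf(endMarker);
                String value = (end >= 0) ? next.substring(0, end) : next;
                return value.trim();
            }
        }
        return null; // marker never appeared
    }
}
```

Unlike split(), indexOf() treats the end marker as plain text rather than a regular expression, so markers containing characters such as "(" or "|" need no escaping.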

How can I use http://translate.google.com/ to translate the string in Java program?

I want to use http://translate.google.com/ to translate a string.
I want to send a string from a Java program to http://translate.google.com/ to translate it from English to Bangla,
and I want to get the translated string back as the program's output.
Can anyone tell me how I can do this?
The wrong way to do that: use an HTTP client to emulate, in Java, a browser request.
It's a bad way of using a website, as you'll be doing dirty things over HTTP, and your program will have to be modified each time Google modifies the HTML pages on translate.google.com (even if that should be pretty rare).
The right way to do that: use the Google Translate API provided by Google for that purpose. It's just a REST service, so it works quite easily from Java.
Beware that the number of translations you can do each day is, as far as I remember, limited (check the terms on the API website). At first glance, after just checking, it seems that the v2 API is not free anymore; I don't know if you can stick with v1.
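As a rough sketch of what calling a REST service like this involves, the snippet below only builds the request URL and sends nothing. The endpoint and the parameter names (key, source, target, q) are what I believe the v2 API uses, but treat them as assumptions and check the current API documentation; the class and method names are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; it only builds the URL and performs no network I/O.
class TranslateRequest {

    // Assumed v2 endpoint; verify against the current API documentation.
    static final String ENDPOINT = "https://translation.googleapis.com/language/translate/v2";

    /** Builds a GET URL asking to translate text from source to target. */
    static String buildUrl(String apiKey, String text, String source, String target) {
        return ENDPOINT
                + "?key=" + encode(apiKey)
                + "&source=" + encode(source)
                + "&target=" + encode(target)
                + "&q=" + encode(text);
    }

    // Percent-encodes a query parameter value so spaces, '&' etc. are safe.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For English to Bangla you would pass "en" and "bn" as the language codes, then fetch the resulting URL with URLConnection and parse the JSON response.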
If Google is not a must, you can consider Bing Translator. Here is a link on how to use the free APIs as well (the example uses C#, but you can easily write the same in Java). We are using this in our project and it works quite well.
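I used this code on my button to translate:
String translate = "translate this string";
String locale = Locale.getDefault().getLanguage();
// Encode the text so spaces and special characters survive in the URL fragment.
Uri uri = Uri.parse("https://translate.google.com/#auto/" + locale + "/" + Uri.encode(translate));
Intent intent = new Intent(Intent.ACTION_VIEW, uri);
// Required when starting an activity from a non-Activity context.
intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
getApplicationContext().startActivity(intent);
I used #auto so that the source language of the string is detected automatically, and locale to pick up the phone's language.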
Hope this helps :)
Easy task.
Use this: http://translate.google.com/#{fromLanguage}|{toLanguage}|{your_string_here}
Just replace the languages with yours (you can look up the short names in the translator)
and add the string that you want to translate.
Then you can make a request to this URL.

download with java code is really slow

I wrote a bit of code that reads download links from a text file and downloads the videos using the copyURLToFile method from Apache's commons-io library, and the download is really slow when I'm on my WLAN.
When I plug in an internet stick, it is about 6 times faster, although the stick has 4 Mbit and my WLAN has 8 Mbit.
I also tried to do it without the commons-io library, but the problem is the same.
Normally I download at 600-700 KB/s on my WLAN, but with Java it only reaches about 50 KB/s. With the internet stick it's about 300 KB/s.
Do you know what the problem could be?
Thanks in advance.
Edit: Here is the code, but I don't think it has anything to do with this. And what do you mean by network policies?
// Each entry in the links file is two lines: the URL, then the video name.
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(linksFile)));
String link;
String name;
while ((link = br.readLine()) != null) {
    name = br.readLine();
    FileUtils.copyURLToFile(new URL(link), new File("videos/" + name + ".flv"));
    System.out.println(link);
}
br.close();
This isn't likely to be a Java problem.
The code you've posted actually doesn't do any IO over the network - it just determines a URL and passes it to (presumably Apache Commons') FileUtils.copyURLToFile. As usual with popular third-party libraries, if this method had a bug in it that caused slow throughput in all but the most unusual situations, it would already have been identified (and hopefully fixed).
Thus the issue is going to lie elsewhere. Do you get the expected speeds when accessing the resource through normal HTTP methods (e.g. in a browser)? If not, then there's a universal problem at the OS level. Otherwise, I'd have a look at the policies on your network.
Two possible causes spring to mind:
The obvious one is some sort of traffic shaping: your network deprioritises the packets that come from your Java app (for a potentially arbitrary reason). You'd need to see how this is configured and look at its logs to see if this is the case.
The problem resides with DNS. If Java's using a primary server that's either blocked or incredibly slow, then it could take up to a few seconds to convert that URL to an IP address and begin the actual transfer. I had a similar problem once when a firewall was silently dropping packets to one server and it took three seconds (per lookup!) for the Java process to switch to the secondary server.
In any case, it's almost certainly not the Java code that's at fault.
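To check the DNS theory above, one option is to time a lookup from inside the JVM. This is only a rough sketch with made-up names, and since the JVM caches lookups, only the first call for a given host in a fresh process tells you much.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper.
class DnsTimer {

    /** Returns how many milliseconds resolving the given host name took. */
    static long timeLookupMillis(String host) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress.getByName(host); // performs the lookup (or hits the JVM cache)
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

If this reports seconds rather than milliseconds for the video host, the delay comes before the transfer even starts.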
FileUtils.copyURLToFile reads through an internal buffer.
A larger buffer could speed up the download, but the method does not seem to offer a way to change its size.
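If you want to experiment with the buffer size yourself, a plain copy loop is short enough to write by hand. This is a minimal sketch with made-up names, not what commons-io does internally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for experimenting with different buffer sizes.
class StreamCopier {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 at end of stream
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

You could call it with new URL(link).openStream() as the input, a FileOutputStream as the output, and sizes such as 8 * 1024 or 64 * 1024, timing each run. If the throughput barely changes, the buffer is not the bottleneck.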

Building an irc client in Java

I'm trying to write an ircBot in Java for some practice. I am using this sample code as the base. I'm trying to figure out how to get it to read in text from my console so I can actually talk to people with the bot.
There's the one while loop that takes in the input from the ircserver and spits it out to console and responds to PINGs. I'm assuming I have to have another thread that takes the input from the user and then uses the same BufferedWriter to spit it out to the ircserver again but I can't get that figured out.
Any help would be awesome!
In the code you have linked to, the 'reader' and 'writer' instances are indeed connected to the incoming and outgoing ends, respectively, of the two-way socket you have established with the IRC server.
So in order to get input from the user, you do indeed need another thread that takes commands from the user in some fashion and acts upon them. The most basic approach would naturally be to use System.in for this, preferably wrapped so that you can read whole lines from the user and parse each one as a command.
To read whole lines from System.in you could do something like this:
BufferedReader bin = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = bin.readLine()) != null) {
// Do stuff
}
You could also consider using one of the CLI libraries that are out there for Java, like JLine.
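Putting the pieces above together, here is a minimal sketch of a second thread that forwards console lines to the server. The class and method names are made up, and it assumes every line should simply be sent as a PRIVMSG to one channel; a real bot would probably parse commands first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

// Hypothetical helper; the names are not from the linked sample code.
class ConsoleForwarder {

    /** Reads lines until end of input and sends each one to the channel. */
    static void forward(BufferedReader console, Writer ircWriter, String channel)
            throws IOException {
        String line;
        while ((line = console.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // nothing to send
            }
            // Lock the writer so a message never interleaves with a PONG
            // written by the server-reading loop (which must lock it too).
            synchronized (ircWriter) {
                ircWriter.write("PRIVMSG " + channel + " :" + line + "\r\n");
                ircWriter.flush();
            }
        }
    }

    /** Runs forward() on a daemon thread so the main loop keeps reading the server. */
    static Thread startInBackground(BufferedReader console, Writer ircWriter, String channel) {
        Thread thread = new Thread(() -> {
            try {
                forward(console, ircWriter, channel);
            } catch (IOException e) {
                System.err.println("Console forwarding stopped: " + e.getMessage());
            }
        });
        thread.setDaemon(true);
        thread.start();
        return thread;
    }
}
```

You would call startInBackground with the System.in reader from above and the existing BufferedWriter just before entering the loop that reads from the server.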
If you really want to do yourself a favour, I recommend (after having used it extensively) switching to pircbot. Pircbot really is a wonderful library and will let you get an IRC bot up and running in just a few minutes. Check out some of the examples on the site, it's super easy to use.

Is it essential that I use libraries to manipulate XML?

I am using Java back end for creating an XML string which is passed to the browser. Currently I am using simple string manipulation to produce this XML. Is it essential that I use some XML library in Java to produce the XML string?
I find the libraries very difficult to use compared to what I need.
It's not essential, but advisable. However, if string manipulation works for you, then go for it! There are plenty of cases where small or simple XML text can be safely built by hand.
Just be aware that creating XML text is harder than it looks. Here are some criteria I would consider:
First: how much control do you have on the information that goes into the xml?
The less control you have on the source data, the more likely you will have trouble, and the more advantageous the library becomes. For example: (a) Can you guarantee that the element names will never have a character that is illegal in a name? (b) How about quotes in an attribute's content? Can they happen, and are you handling them? (c) Does the data ever contain anything that might need to be encoded as an entity (like the less-than sign, which often needs to be output as &lt;); are you doing it correctly?
Second, maintainability: is the code that builds the XML easy to understand by someone else?
You probably don't want to be stuck with the code for life. I've worked with second-hand C++ code that hand-builds XML and it can be surprisingly obscure. Of course, if this is a personal project of yours, then you don't need to worry about "others": substitute "in a year" for "others" above.
I wouldn't worry about performance. If your XML is simple enough that you can hand-write it, any overhead from the library is probably meaningless. Of course, your case might be different, but you should measure to prove it first.
Finally: yes, you can build XML text by hand if it's simple enough, but not knowing the available libraries is probably not the right reason to do so.
A modern XML library is a quite powerful tool, but it can also be daunting. However, learning the essentials of your XML library is not that hard, and it can be quite handy; among other things, it's almost a requisite in today's job marketplace. Just don't get bogged down by namespaces, schemas and other fancier features until you get the essentials.
Good luck.
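To make points (b) and (c) above concrete, this is roughly the minimum escaping a hand-rolled generator needs. It is a sketch with made-up names that covers only the five predefined entities; it does not validate element names or filter control characters that are illegal in XML.

```java
// Hypothetical helper for hand-built XML; covers the five predefined entities only.
class XmlEscaper {

    /** Escapes text so it can be placed in element content or a quoted attribute value. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Every value that comes from outside your code has to pass through something like this before being concatenated into the XML string; forgetting it in one place is the typical way hand-built XML breaks.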
XML is hard. Parsing it yourself is a bad idea, and generating it yourself is an even worse one. Have a look at the XML 1.1 spec.
You have to deal with such things as proper encoding, attribute encoding (e.g., an unescaped < or & in an attribute value produces invalid XML), proper CDATA escaping, UTF encoding, and custom DTD entities, and that's without throwing XML namespaces into the mix, with the default/empty namespace, namespace attributes, etc.
Learn a toolkit, there's plenty available.
I think that custom string manipulation is fine, but you have to keep two things in mind:
Your code isn't as mature as the library. Allocate time in your plan to handle the bugs that pop-up.
Your approach will probably not scale as well as a 3rd party library when the xml starts to grow (both in terms of performance and ease of use).
I know a code base that uses custom string manipulation for xml output (and a 3rd party library for input). It was fine to begin with but became a real hassle after a while.
Yes, use the library.
Somebody took the time and effort to create something that is usually better than what you could come up with. String manipulation is for sending back a single node, but once you start needing to manipulate the DOM, or use an XPath query, the library will save you.
By not using a library, you risk generating or parsing data that isn't well-formed, which sooner or later will happen. For the same reason document.write isn't allowed in XHTML, you shouldn't write your XML markup as a string.
Yes.
It makes no sense to skip an essential tool: even writing XML is non-trivial, what with having to escape those ampersands and less-than signs, not to mention namespace bindings (if needed).
And in the end, libraries can generally read and write XML not only more reliably but also more efficiently (especially so for Java).
But you may have been looking at the wrong tools, if they seem overcomplicated. Data binding using JAXB or XStream is simple; but for simple, straightforward XML output, I go with StaxMate. It can actually simplify the task in many ways (it automatically closes start tags, writes namespace declarations if needed, etc.).
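As an illustration of how little code simple output takes with a streaming writer, here is a sketch using the plain StAX API that ships with the JDK (javax.xml.stream), not StaxMate itself. The class, element, and attribute names are made up.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example; uses the JDK's StAX writer, not StaxMate.
class StaxExample {

    /** Builds a single <message lang="...">text</message> element as a string. */
    static String buildMessage(String lang, String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("message");
        xml.writeAttribute("lang", lang); // attribute value is escaped by the writer
        xml.writeCharacters(text);        // '<' and '&' in text are escaped by the writer
        xml.writeEndElement();
        xml.flush();
        xml.close();
        return out.toString();
    }
}
```

The escaping that string concatenation makes you do by hand happens inside writeCharacters and writeAttribute here.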
No - If you can parse it yourself (as you are doing), and it will scale for your needs, you do not need any library.
Just ensure that your future needs are going to be met - complex xml creation is better done using libraries - some of which come in very simple flavors too.
The only time I've done something like this in production code was when a colleague and I built a pre-processor so that we could embed XML fragments from other files into a larger XML document. On load, we would first parse these embeds (file references in XML comment strings) and replace them with the actual fragments they referenced. Then we would pass the combined result on to the XML parser.
You don't have to use a library to parse XML, but check out the question "What considerations should be made before reinventing the wheel?" before you start writing your own code for parsing/generating XML.
No, especially for generating (for parsing I would be less inclined to, as input text can always surprise you). I think it's fine, but be prepared to shift to a library should you find yourself spending more than a few minutes maintaining your own code.
I don't think that using the DOM XML API which comes with the JDK is difficult: it's easy to create Element nodes, attributes, etc., and later it's easy to convert strings to DOM documents or DOM documents to strings.
From the first page Google finds from Spain (a Spanish XML example):
// Note: coderror and msgerror are assumed to be fields of the enclosing class.
public String DOM2String(Document doc)
{
TransformerFactory transformerFactory =TransformerFactory.newInstance();
Transformer transformer = null;
try{
transformer = transformerFactory.newTransformer();
}catch (javax.xml.transform.TransformerConfigurationException error){
coderror=123;
msgerror=error.getMessage();
return null;
}
Source source = new DOMSource(doc);
StringWriter writer = new StringWriter();
Result result = new StreamResult(writer);
try{
transformer.transform(source,result);
}catch (javax.xml.transform.TransformerException error){
coderror=123;
msgerror=error.getMessage();
return null;
}
String s = writer.toString();
return s;
}
public Document string2DOM(String s)
{
Document tmpX=null;
DocumentBuilder builder = null;
try{
builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
}catch(javax.xml.parsers.ParserConfigurationException error){
coderror=10;
msgerror="Error creating factory String2DOM "+error.getMessage();
return null;
}
try{
tmpX=builder.parse(new ByteArrayInputStream(s.getBytes("UTF-8"))); // explicit charset instead of the platform default
}catch(org.xml.sax.SAXException error){
coderror=10;
msgerror="SAX parse error String2DOM "+error.getMessage();
return null;
}catch(IOException error){
coderror=10;
msgerror="Error generating bytes String2DOM "+error.getMessage();
return null;
}
return tmpX;
}
