HTML to Textile Java library

I need to convert a String from HTML to Textile.
I've been looking at Textile4J, Textile-J, JTextile and PLextile.
But so far, none of them provides the functionality I'm looking for.
They only provide the reverse functionality (Textile to HTML).
In the worst case, I could use another programming language, but I haven't really looked into that.

For now, I don't believe the functionality I want is available in any Java Textile library.
I'll try to update this post if and when that changes.
Based on the libraries mentioned above, I have created my own (limited) functionality.
There are also several solutions available in Python and Ruby.
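A limited converter of the kind described above can be sketched with a few regular expressions that map common HTML tags onto Textile syntax: `*bold*`, `_italic_`, `h1.` headings and `"text":url` links. This is a minimal sketch, not code from Textile-J, JTextile or the other libraries named above; the class name `HtmlToTextile` is hypothetical, and it assumes simple, well-formed HTML without nested tags of the same kind. Anything more complex (lists, tables, nested markup) would need a real HTML parser rather than regexes.

```java
// Hypothetical minimal HTML-to-Textile converter (not part of any library above).
// Assumes simple, well-formed HTML; unsupported tags are simply dropped.
final class HtmlToTextile {

    private HtmlToTextile() {}

    static String convert(String html) {
        String s = html;
        // Headings: <h2>Title</h2> -> "h2. Title" as its own block
        s = s.replaceAll("(?is)<h([1-6])(?:\\s[^>]*)?>(.*?)</h\\1>", "h$1. $2\n\n");
        // Bold: <strong>/<b> -> *text*
        s = s.replaceAll("(?is)<(strong|b)(?:\\s[^>]*)?>(.*?)</\\1>", "*$2*");
        // Italic: <em>/<i> -> _text_
        s = s.replaceAll("(?is)<(em|i)(?:\\s[^>]*)?>(.*?)</\\1>", "_$2_");
        // Links: <a href="url">text</a> -> "text":url
        s = s.replaceAll("(?is)<a\\s[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>", "\"$2\":$1");
        // Paragraphs become blocks separated by a blank line
        s = s.replaceAll("(?is)<p(?:\\s[^>]*)?>(.*?)</p>", "$1\n\n");
        // Drop any remaining (unsupported) tags
        s = s.replaceAll("(?s)<[^>]+>", "");
        // Decode a few common entities; &amp; last so "&amp;lt;" stays "&lt;"
        s = s.replace("&lt;", "<").replace("&gt;", ">")
             .replace("&quot;", "\"").replace("&amp;", "&");
        return s.trim();
    }
}
```

For example, `HtmlToTextile.convert("<p>Hello <strong>world</strong></p>")` yields `Hello *world*`.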

Related

How to use XMLHttpRequest in GWT?

XMLHttpRequest is an alternative way to make HTTP calls from the GWT client side, and it allows control over all aspects of requests and responses. But how do I use it?
Javadoc: http://www.gwtproject.org/javadoc/latest/com/google/gwt/xhr/client/class-use/XMLHttpRequest.html
You haven't mentioned which GWT version you use, so I assume the latest one, meaning 2.8.2 or newer.
Elemental2 is the way to go
As mentioned in the comments above, Elemental2 is the right way. Let me explain a bit.
If you want a future-proof implementation (keeping in mind the new GWT 3/J2CL approach), please do not use legacy GWT APIs. That means using elemental2.dom.XMLHttpRequest instead of com.google.gwt.xhr.client.XMLHttpRequest (the one you mentioned). Avoid the gwt-user dependency if possible, as it will be deprecated (if it is not already).
Elemental2 is an open-source project available here: https://github.com/google/elemental2. It is a kind of base library for the "new GWT". To ease migration of existing GWT 2.x projects to GWT 3.x, part of the "old" gwt-user is currently being ported to the new approach using JsInterop and Elemental2. So Elemental2 is definitely the way to go.
Elemental2 and JsInterop in general
Documentation for the new JsInterop approach is not very extensive yet, but you will find at least an introduction here: http://www.gwtproject.org/doc/latest/DevGuideCodingBasicsJsInterop.html
Examples
You can find an XMLHttpRequest example in this article:
http://www.g-widgets.com/2016/09/09/gwt-http-requests-alternatives/
Another good way to find examples is to search GitHub: https://github.com/search?q=elemental2.dom.XMLHttpRequest&type=Code.
(You need to be logged in to use GitHub code search; otherwise you will get a "Whoa there! You have triggered an abuse..." message.)
One of the results leads to a very interesting project (a preview of the future GWT):
https://github.com/gwtproject/gwt-http. It is a future-proof port of the legacy com.google.gwt.http.HTTP GWT module, and it will help migrate GWT 2.x projects to GWT 3.x.
In its test package you will find some examples: https://github.com/gwtproject/gwt-http/tree/master/src/test/java/org/gwtproject/http/client
So this is finally the answer to your question: "how to use it?" :-)
Another source of XMLHttpRequest examples (using Elemental2) is Gist: https://gist.github.com/search?utf8=%E2%9C%93&q=elemental2.dom.XMLHttpRequest. These are probably even better for getting started, as they are short and clear.
What is Elemental2?
Elemental2 gives you type-checked access to the native browser APIs. If you are familiar with the browser APIs, you should be able to implement what you need, even starting from a plain JavaScript example. Think of the new GWT as type-safe JavaScript (that is also highly performant and well optimized). With JsInterop you create bindings, similar to bindings for TypeScript. In effect, you work directly with the browser APIs, without anything GWT-specific.
Libraries? More examples...?
Dealing with XMLHttpRequest directly is fairly low level.
You can also use a library. One of the GitHub search results leads to this repository: https://github.com/ibaca/autorest-streaming-example, which is an example for an interesting REST library: https://github.com/intendia-oss/autorest. It is modern and reactive, and works with Observables, RxJava and so on.
This library uses JsInterop and has also been migrated to Elemental2, which makes it GWT 3/J2CL ready; see the change: https://github.com/intendia-oss/autorest/commit/58516802cd42134544e6e3787207b5431fae94b5
With the GitHub search query above, you can find even more XMLHttpRequest code examples, so have a look and pick the one that best fits your needs.
An alternative approach would be to use a framework, for instance Errai from Red Hat: http://erraiframework.org/. It helps you deal with many problems at a different level of abstraction.
I think you now have some references to study.
On the other hand, it's 2018, so why not the Fetch API?
For a modern web application, I would rather consider the Fetch API instead of XMLHttpRequest. All modern browsers now implement the fetch() function natively. Isn't that the best way to solve your problem, then?
fetch() is a Promise-based mechanism that lets you make network requests much like XMLHttpRequest. Elemental2 covers both Promises and Fetch, so you can use them from your Java code in much the same way as in Mozilla's examples.
Read more about the Fetch API here:
https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch
https://developers.google.com/web/updates/2015/03/introduction-to-fetch
https://codepen.io/aderaaij/post/fetching-data-with-fetch
https://fetch.spec.whatwg.org/
What's more, as you can see, this is nothing new.
For older browsers, a polyfill will emulate the missing function: https://github.com/github/fetch.
As for examples, there are not many on GitHub:
https://github.com/search?utf8=%E2%9C%93&q=elemental2.dom.DomGlobal+fetch&type=Code, but there is at least something.
The Fetch API seems to be the most current solution to the problem.
Here is a very simple fetch() example using Elemental2.
The imports section:
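import static elemental2.dom.DomGlobal.console;
import static elemental2.dom.DomGlobal.fetch;
import elemental2.core.Global;
import elemental2.dom.Response;
Then use it in your code:
// Request one random female user and parse the response body as JSON
fetch("https://randomuser.me/api/?gender=female&results=1")
    .then(Response::json)
    .then(data -> {
        // Log the parsed result, serialized back to a JSON string
        console.log(Global.JSON.stringify(data));
        return null;
    })
    .catch_(error -> {
        // Network failures and JSON parsing errors are reported here
        console.log(error);
        return null;
    });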
As a result you should be able to see something like this:
{"results":[{"gender":"female","name":{"title":"mrs","first":"caroline","last":"coleman"},"location":{"street":"3703 new road","city":"swansea","state":"leicestershire","postcode":"ZH67 0YS","coordinates":{"latitude":"14.7870","longitude":"-107.8990"},"timezone":{"offset":"-6:00","description":"Central Time (US & Canada), Mexico City"}},"email":"caroline.coleman@example.com","login":{"uuid":"25357d90-cce4-4fe6-a3db-8ab77c0272ba","username":"smallpeacock582","password":"citizen","salt":"VX3s05Ah","md5":"84649cce1db8c6f2cbe33098221aa570","sha1":"005abf7d2ca0ff5b1a0bfd6dcee6d4860ef6e75d","sha256":"caadff0a16e27b0d9893aea483aedc7cf7c4707096c33a58acf44336bb2b54be"},"dob":{"date":"1978-03-14T15:47:16Z","age":40},"registered":{"date":"2013-08-10T19:09:41Z","age":5},"phone":"015396 74385","cell":"0726-723-103","id":{"name":"NINO","value":"JA 32 24 22 P"},"picture":{"large":"https://randomuser.me/api/portraits/women/45.jpg","medium":"https://randomuser.me/api/portraits/med/women/45.jpg","thumbnail":"https://randomuser.me/api/portraits/thumb/women/45.jpg"},"nat":"GB"}],"info":{"seed":"98f4f4a344470fbd","results":1,"page":1,"version":"1.2"}}
You can further convert the result into a Java object using a technique called JsInterop DTOs. If you are interested, you can find some information here: https://stackoverflow.com/a/50565283/5394086
Not recommended approach
If you unfortunately have to use the old GWT (2.7 or earlier), you can search for examples on GitHub with a similar query, but for the legacy com.google.gwt.xhr.client.XMLHttpRequest. In this case I would also suggest not working at such a low level, but using a library like https://github.com/reinert/requestor (which is unfortunately discontinued, with development stopped at GWT 2.7, but for that GWT version it is probably the best choice). But again, please do not go this way; use GWT 2.8.2 or newer with the Elemental2/JsInterop approach instead.

Java Internationalization (i18n) Libraries/Frameworks

My organisation is about to embark on the long process of internationalizing (i16g?) its corporate website. The website is a mix of Java EE (JSP/Servlets, no EJB) and static content pushed from the (Documentum) WCM.
I have experience with the "built-in" mechanism of ResourceBundles with an associated properties file for each language/locale (containing "KEY=Translated value" entries), where we simply reference the KEY wherever we want the translated text to appear.
My director mentioned that at a previous organisation he used a different approach, with a third-party library (he does not recall its name) in which the actual [English] text was included in the web page (to aid developers) and replaced at run time with the translated content from an XML config file. (Does anyone know which library this would be?)
I am interested in what other approaches/libraries/frameworks there might be out there to facilitate this.
Thanks
Your boss probably meant gettext, just as @Pawel Dyda mentioned, but cosmopolitan may also be of interest to you.
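The core idea behind the approach your director describes (and behind gettext) is that the English source text itself is the lookup key, so templates stay readable for developers and any untranslated string falls back to English. A minimal sketch of that lookup in plain Java, assuming translations have already been loaded into memory (real gettext tooling extracts them into .po/.mo files); the `Catalog` class and its `add`/`tr` methods are hypothetical, not the API of gettext or any library mentioned here:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical gettext-style catalog: the English text is the key (the "msgid"),
// and a missing translation falls back to the English text itself.
final class Catalog {
    private final Map<Locale, Map<String, String>> translations = new HashMap<>();

    // Registers one translation of an English source string for a locale.
    void add(Locale locale, String english, String translated) {
        translations.computeIfAbsent(locale, l -> new HashMap<>()).put(english, translated);
    }

    // Tries the exact locale, then its language alone (e.g. fr_CA -> fr),
    // and finally returns the English text unchanged.
    String tr(Locale locale, String english) {
        Locale[] candidates = {locale, Locale.forLanguageTag(locale.getLanguage())};
        for (Locale candidate : candidates) {
            Map<String, String> table = translations.get(candidate);
            if (table != null && table.containsKey(english)) {
                return table.get(english);
            }
        }
        return english;
    }
}
```

With `add(Locale.FRENCH, "Welcome", "Bienvenue")`, a lookup for `Locale.CANADA_FRENCH` returns "Bienvenue", while a lookup for `Locale.GERMAN` returns "Welcome".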
My company also maintains a GNU gettext-related library for Java (and very soon with extensions aimed at Scala).
Not only does it support all of the goodness of GNU gettext, it also simplifies output AND input of dates/timestamps and currency, and includes facilities for using "wiki" formatting in translations (so you can output HTML bold on a word, for example), Java message formatting, generalized "escape" support (so the output can be auto-escaped for inclusion, say, in HTML), and currency rounding.
It is open-source, and currently on github at https://github.com/awkay/easy-i18n/
When I hear you are using ResourceBundles, I see something like this:
ResourceBundle rb = ResourceBundle.getBundle("messages", locale);
String someString = rb.getString("some.key");
If this is your approach for JavaServer Pages (using such snippets in scriptlets), it is wrong. Instead, you should use JSTL or Spring message tags.
As for your inquiry, I believe they used gettext (sorry, no link, as I am running out of time).
This is not necessarily the best approach. The JSTL approach is the most common for JSP, and you should stick to it unless you have very good reasons not to.
It is worth looking at http://alexsexton.com/blog/2012/03/the-ux-of-language/, which has a good explanation of the idea behind gettext and the limitations of its design. A better approach than gettext is the ICU message format (http://site.icu-project.org/home), which is what the JDK MessageFormat class is based on. There is also a JavaScript library based on the ICU message format: https://github.com/SlexAxton/messageformat.js
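As a small illustration of the JDK side, `java.text.MessageFormat` with a `choice` sub-format lets a single translated pattern select the right wording for a count. The pattern below is adapted from the `MessageFormat` Javadoc; in a real application it would come from the resource bundle for the user's locale rather than being hard-coded, and the class name `FileCountMessage` is hypothetical. Note that the JDK's `choice` format only matches numeric ranges; locale-aware plural categories of the full ICU kind need ICU4J.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class FileCountMessage {
    // In practice this pattern would be looked up per locale (e.g. from a ResourceBundle).
    private static final String PATTERN =
        "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.";

    private FileCountMessage() {}

    static String format(int count) {
        // An explicit locale keeps the number formatting (grouping separators) predictable.
        MessageFormat format = new MessageFormat(PATTERN, Locale.ENGLISH);
        return format.format(new Object[] {count});
    }
}
```

For instance, `format(0)`, `format(1)` and `format(1234)` produce "There are no files.", "There is one file." and "There are 1,234 files." respectively.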
I hope it's not too late to suggest one more solution: https://github.com/resource4j/resource4j
This library has integration with Thymeleaf web page renderer, which solves the problem you've mentioned: you include in page template the English text and then substitute it with localized version in runtime.

Is there a Standard Java SE HTML Parser? If so, why use non-standard ones?

I need to parse a simple HTML page with a simple form in it. The answers to similar questions on StackOverflow suggest using one of a large variety of non-standard Java libraries such as TagSoup, JSoup, HTMLParser and many others.
However, a web search revealed that there exists some standard functionality in Java SE via this class: http://docs.oracle.com/javase/7/docs/api/javax/swing/text/html/parser/ParserDelegator.html
My sub-questions are:
Is it really true that the standard ParserDelegator class can parse a use case like mine?
What are the limitations of the standard library that create the need for so many non-standard libraries?
Does the fact that ParserDelegator is within Swing preclude using it on a regular EC2 cloud server for a web application? Would I have to jump through a lot of hoops to get around the headless aspect, or would it be just a small tweak to the configuration?
If the standard one is not recommended, which non-standard one should I use, given: (a) my desire not to stray far from the standard; (b) my simple use case; (c) my desire for a mature, reliable implementation; and (d) no size or weight limitations, since this is a server application as opposed to an embedded client? The API is a far lower priority, so while I do appreciate JSoup's CSS-selector-like API, concerns (a) through (d) override it.
Thank you.
The JDK has a built-in HTML parser that supports roughly HTML 3.2. It should handle basic text formatting tags and forms.
The reason to use other, third-party parsers is the need to support "real-world" HTML pages with DHTML, JavaScript, etc.
JSoup is one of the popular parsers that can do the job. For more information about other implementations, please take a look at the following discussion:
Pure Java HTML viewer/renderer for use in a Scrollable pane
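For the simple-form use case in the question, a minimal sketch using only the JDK's `ParserDelegator` and `HTMLEditorKit.ParserCallback` might look like the following. It records the form's `action` and the `name` of each `<input>`; the class name `FormFields` is hypothetical. This parsing path does not create any windows, so in principle it should not need a display, but treat that as an assumption to verify on the actual server.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Collects the form action and input names from a simple HTML page
// using the JDK's built-in (HTML 3.2-era) parser.
final class FormFields extends HTMLEditorKit.ParserCallback {
    String action;
    final List<String> inputNames = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.FORM) {
            Object value = attrs.getAttribute(HTML.Attribute.ACTION);
            action = value == null ? null : value.toString();
        }
    }

    // <input> has no end tag, so the parser reports it as a "simple" tag.
    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.INPUT) {
            Object name = attrs.getAttribute(HTML.Attribute.NAME);
            if (name != null) {
                inputNames.add(name.toString());
            }
        }
    }

    static FormFields parse(String html) {
        FormFields fields = new FormFields();
        try {
            // true = ignore any charset declared in a <meta> tag
            new ParserDelegator().parse(new StringReader(html), fields, true);
        } catch (IOException e) {
            // Not expected for a StringReader, but parse() declares it.
            throw new UncheckedIOException(e);
        }
        return fields;
    }
}
```

Unnamed inputs (such as a plain submit button) are skipped, so only the named fields end up in `inputNames`.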

How do I learn to use Java commons-collections?

Weird title, I know, let me explain.
I am a developer most familiar with C# and Javascript. I am completely sunk into those semi-functional worlds to the point that most of my code is about mapping/reducing/filtering collections. In C# that means I use LINQ just about everywhere, in Javascript it's Underscore.js and jQuery.
I have currently been assigned to an ongoing Java project and am feeling rather stifled. I simply do not think in terms of "create an array, shuffle stuff from one to another". I can (and did) create my own versions of the main map/reduce functions using anonymous types implementing interfaces but why re-invent the wheel? The project I am currently on already has commons-collections-3.1.jar and looking through the classes contained it seems like it likely can do everything that I want and more.
For the life of me, I can't find how to actually use it. Looking through the dozens of classes therein is not very helpful and the only thing I can google up is the api doc which is equally as helpful.
How do you use it to Map/Select, Filter/Where, Reduce/Aggregate? Is there anywhere that gives an actual tutorial on this library?
(Comment as answer for formatting purposes.)
Not so much, other than the limited user guide.
That said, I'm not sure where specifically you're having problems--filtering and selecting is mostly wrapped up in the functors package, and utilized by the CollectionUtils class.
While you're not looking for a replacement, you might find things like Guava or Lambda4J a bit more similar to what you're used to (within Java's constraints), and they're a bit less verbose.
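For comparison, if the project can move to Java 8 or later (which postdates commons-collections 3.1 and this answer), the standard `java.util.stream` API covers Map/Select, Filter/Where and Reduce/Aggregate directly, without a third-party library. A minimal sketch; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

final class LinqStyle {
    private LinqStyle() {}

    // Filter/Where, then Map/Select: squares of the even numbers, in order.
    static List<Integer> evenSquares(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .collect(Collectors.toList());
    }

    // Reduce/Aggregate: a sum with an explicit identity value and accumulator.
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }
}
```

For the list 1, 2, 3, 4, `evenSquares` gives [4, 16] and `sum` gives 10.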
Try these links:
http://commons.apache.org/collections/userguide.html (basic tutorial)
http://larvalabs.com/collections/tutorial.html (advanced tutorial with generics)
@george-mauer, you might have to rely on articles like this or a book like Jakarta Commons Cookbook. I have also found it rather useful to learn by writing small samples of my own.

Generic Article Extraction from web pages

I am about to begin work on article extraction.
The task is to extract the hotel reviews posted on different web pages (e.g. 1. http://www.tripadvisor.ca/Hotel_Review-g32643-d1097955-Reviews-San_Mateo_County_Memorial_Park_Campground-Loma_Mar_California.html, 2. http://www.travelpod.com/hotel/Comfort_Suites_Sfo_Airport-San_Mateo.html).
I need to do this in Java, and I have only been working with Java for the past couple of months.
Here are my questions:
Is it possible to extract only the reviews from different web pages in a generic way?
Please let me know if there are any APIs that support this task in Java.
Also, let me know your thoughts or any sources that would help me accomplish the task described above.
UPDATE
If any related examples are available online, please post them, since they could be of great use.
You probably need a screen scraping utility for Java like TagSoup or NekoHTML. JSoup is also popular.
However, you also have a bigger legal consideration here when extracting data from a 3rd party website like tripadvisor. Does their policy allow it?
