How to parse XHTML in Velocity manager? - java

I want to parse some text similar to the below example:
<p class="MsoPlainText"><em><strong>
This is Test Data ,What? To devlop this functionality.
Read more information&nbsp;&nbsp;<a href="http://www.wikipedia.com">
more information</a>&nbsp;
Something I am doing wrong here.</strong></em></p>
This content contains some tags. I hold all of it in a String variable, xmlToPlainText, and access it in the .vm file as $xmlToPlainText.
Can anyone help me parse content like this in Java?
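One possible approach, sketched here under assumptions rather than offered as the definitive answer, is to reduce the markup to plain text in Java before the value reaches the template. The class and method names below are hypothetical, only a handful of entities are decoded, and &nbsp; is mapped to an ordinary space:

```java
import java.util.regex.Pattern;

public class XhtmlToText {
    // Matches anything that looks like a tag, including tags broken across lines.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Strips tags, decodes a few common entities, and collapses whitespace. */
    public static String toPlainText(String xhtml) {
        // Replacing each tag with a space keeps words on either side of a tag apart,
        // at the cost of splitting a word that has inline markup in the middle of it.
        String noTags = TAG.matcher(xhtml).replaceAll(" ");
        String decoded = noTags
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"); // done last so "&amp;lt;" becomes "&lt;", not "<"
        return WHITESPACE.matcher(decoded).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String xmlToPlainText = "<p class=\"MsoPlainText\"><em><strong>Read more information&nbsp;&nbsp;"
                + "<a href=\"http://www.wikipedia.com\">more information</a>&nbsp;</strong></em></p>";
        System.out.println(toPlainText(xmlToPlainText));
    }
}
```

The cleaned string could then be put into the Velocity context under the same key, so the template keeps using $xmlToPlainText unchanged. For input that is not under your control, a real HTML parser is likely the safer choice than a regular expression.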

Related

Get all legal Text from HTML File without Library

We have to extract all the text from an HTML file without using Jsoup or similar libraries. What's the best (or only) way to do that? Our example looks like this:
<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>
<h2>An Ordered HTML List</h2>
<ol><li>Coffee</li><li>Tea</li><li>Milk</li></ol>
We need to get all the text out of these HTML tags without using any libraries and, if a tag is not formed correctly, print an error message. Any help is appreciated.
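A minimal sketch of one library-free way to do this: scan the string once, collect the text between tags, and use a stack to check that every closing tag matches the most recently opened one. The class name is hypothetical, and the sketch assumes strictly nested tags as in the example (void elements such as <br>, comments, and a > inside attribute values are not handled):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TagTextExtractor {

    /** Returns the text chunks between tags; throws if tags are not properly nested. */
    public static List<String> extractText(String html) {
        List<String> texts = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // names of currently open tags
        int i = 0;
        while (i < html.length()) {
            if (html.charAt(i) == '<') {
                int end = html.indexOf('>', i);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated tag at index " + i);
                }
                String tag = html.substring(i + 1, end).trim();
                if (tag.startsWith("/")) {
                    // Closing tag: it must match the innermost open tag.
                    String name = tag.substring(1).trim();
                    if (open.isEmpty() || !open.peek().equals(name)) {
                        throw new IllegalArgumentException("Unexpected closing tag </" + name + ">");
                    }
                    open.pop();
                } else if (!tag.endsWith("/")) {
                    // Opening tag: remember only the name, dropping any attributes.
                    open.push(tag.split("\\s+")[0]);
                }
                i = end + 1;
            } else {
                int next = html.indexOf('<', i);
                if (next < 0) {
                    next = html.length();
                }
                String text = html.substring(i, next).trim();
                if (!text.isEmpty()) {
                    texts.add(text);
                }
                i = next;
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unclosed tag <" + open.peek() + ">");
        }
        return texts;
    }
}
```

A caller can catch the IllegalArgumentException and print its message to report a malformed tag.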

Extract some data using Regex

I've been struggling for some time to extract JSON data from an HTML tag. More specifically, it's a script tag, and with the JSOUP library I can get the data between the script tags. But inside it there is some JSON data that I can't extract. Here is the tag:
<script type="text/javascript">jwplayer.key="WbtWzGvcRNi6Tk+gtKldIbx+nn6lXZFvKiaO2g==";jwplayer("tvplayer").setup({playlist:[{image: "http://img.canlitvlive.io/yayin/trt1_480.jpg?1509735585",title:"TRT 1 Canlı Yayın - CanliTVLive.io",file : "http://yayin.canlitvlive.io/trt1/live.m3u8?tkn=8JD95lXv9dOUXwtgOTBYfw&tms=1509749985"}],...</script>
I need the URL from the file property inside the jwplayer setup. I tried a regular expression, something like this:
"playlist[\":\\s\\{]+file[\":\\s\\{]+\"([^\"]+)\""
But I don't have much experience with regex and can't figure out the right pattern. Can someone help with this? Thanks.
I'm guessing you just need to allow for some whitespace:
file\s*:\s*"(.*?)"
https://regex101.com/r/4HldaP/3
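For completeness, a small sketch of how that pattern could be applied from Java, where the backslashes and quotes have to be escaped inside the string literal. The class and method names are hypothetical. Note that the pattern would also match a key that merely ends in "file", so it relies on the script looking like the one in the question:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileUrlExtractor {
    // Same pattern as above, escaped for a Java string literal: file\s*:\s*"(.*?)"
    private static final Pattern FILE = Pattern.compile("file\\s*:\\s*\"(.*?)\"");

    /** Returns the first quoted value that follows a "file :" key, if any. */
    public static Optional<String> firstFileUrl(String script) {
        Matcher m = FILE.matcher(script);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

The script text itself would come from whatever JSOUP returns for the contents of the script element.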

Using JAVA, how can I parse .cshtml file and add parameters for the existing C# code in that file

I have some .CSHTML files that were incorrectly generated by a tool. I would like to modify the C# code in them to append additional parameters and remove incorrect parameters from method calls.
I've used JSoup to parse the HTML and JSP files, and I am able to add or remove attributes in them via JSoup DOM iteration.
But the .CSHTML files contain C# code (I'm new to C#), and JSoup gives me no control over that code, so I am not able to append parameters to it. For example,
<td>
#Html.Label(Resource.Get("Label_Name"), new Dictionary<string,object>{{ "Class","label"},{ "name","Name"},{ "id","Name"}})
</td>
<td>
#Html.TextBoxFor(m=> m.TextBox1,new Dictionary<string,object>{{ "Class","txtfield controlWidth"},{ "name","TextBox1"},{ "id","TextBox1"}})
</td>
As shown above, the "#Html.xxxx" calls are treated as the value of the 'tr' tag in JSoup DOM iteration. The only thing I could think of is adding if..else logic to add or remove parameters, as in the snippet below. I don't know what the standard way of parsing such a .cshtml file is.
if (str.contains("#Html.")) {
    // control type is the method name between the first '.' and the first '('
    ctrlType = str.substring(str.indexOf('.') + 1, str.indexOf('('));
    if (ctrlType.equalsIgnoreCase("Label")) {
        // logic to add parameters.
    }
}
Using Java, is there a way to parse the .cshtml file and add or remove parameters in the C# code? Can you suggest how to solve this with an open, standard API?

Output string as html in freemarker

So we are storing HTML in our data model. I need to output this in a FreeMarker template:
example:
[#assign value = model.value!]
${value}
value = '<p>This is <a href='somelink'>Some link</a></p>'
I have tried [#noescape], but it throws an error saying there is no escape block. See "FREEMARKER: avoid escaping HTML chars"; that solution did not work for me.
[#noescape] or <#noescape> is only valid when used inside an [#escape] tag. Your data is probably stored with the HTML encoded. You need to get the backend to un-encode the HTML.
Otherwise you'll need to do something like...
${value?replace("&gt;", ">")?replace("&lt;", "<")}
But that isn't a good approach because it won't catch all the encoded values and shouldn't be done in the view layer.
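As a rough sketch of what un-encoding in the backend could look like, assuming the value reaches Java as a string before it is put into the model. The class and method names are hypothetical and only a few entities are covered; a library routine such as Apache Commons Text's StringEscapeUtils.unescapeHtml4 handles the full set:

```java
public class HtmlUnescape {

    /** Decodes a handful of common HTML entities; not a complete implementation. */
    public static String unescapeBasic(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&"); // last, so "&amp;lt;" ends up as "&lt;" rather than "<"
    }
}
```

With the value decoded before it reaches the template, ${value} can be output directly, provided no escaping is applied in the template itself.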

Need to handle special characters in URL

My input HTML is
<p>
<span>first
</span>
<span>Google Cloud Connect for Microsoft Office</span>
</p>
I am using XSLT 1.0 to convert the HTML to XML. My output XML is
<Relationship Id="rId12700703801" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="http://tools.google.com/dlpage/cloudconnect#utm_campaign=launch&utm_source=en-na-us-gdb-GCC-Appsperience_02242011&utm_medium=blog" TargetMode="External"/></Relationships>
This fails with the error "XML Parsing Error: not well-formed", located at the = after launch&utm_source in the Target attribute.
I want to escape the special characters in the URL through XSLT so that the XML is well-formed.
Please help me. Thanks in advance.
Are you generating the input HTML? If so, you can use URLEncoder.encode to properly encode the string so the transformer doesn't complain about the syntax.
If this is just a random HTML page and you have no control over it, then you probably need to use an HTML parser, such as TagSoup, to pre-correct it, as most HTML files are not well-formed.
XSLT expects XML as input, not HTML. You need to turn your HTML into XML if you want to transform it with XSLT.
I think it might be possible to do it with HTML Tidy.
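Whichever route is used to clean the input, the reported error comes from the bare & in the Target attribute, which well-formed XML requires to be written as &amp;. A minimal sketch of escaping an attribute value in Java, assuming the URL is available as a plain string before the XML is written (the class and method names are hypothetical, and input that is already escaped would be escaped twice):

```java
public class XmlAttr {

    /** Escapes the characters that are not allowed verbatim in a double-quoted XML attribute. */
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

An XML parser reading the attribute back turns &amp; into & again, so the link target itself is unchanged.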
