Get all text from a valid HTML file without a library - Java

We have to extract all the text from an HTML file without using Jsoup or a similar library. What's the best (or only) way to do that? Our example looks like this:
<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>
<h2>An Ordered HTML List</h2>
<ol><li>Coffee</li><li>Tea</li><li>Milk</li></ol>
We need to get all the text out of these HTML tags without using any libraries and, if a tag is not closed correctly, print an error message. Any help is appreciated.
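A minimal sketch in plain Java, assuming the input is simple markup like the example above (no comments, scripts, or attribute values containing '>'). It walks the string once, collects the text between tags, and keeps a stack of open tag names so that unterminated, mismatched, or unclosed tags can be reported. The class name HtmlTextExtractor and the small set of void elements are my own choices, and entities such as &amp; are not decoded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

public class HtmlTextExtractor {
    // Elements that never get a closing tag in HTML (a partial, illustrative list).
    private static final Set<String> VOID_TAGS =
            Set.of("br", "hr", "img", "input", "meta", "link");

    /** Returns the text between tags; throws if a tag is unterminated, mismatched or unclosed. */
    static List<String> extract(String html) {
        List<String> texts = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open tag names
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c != '<') {
                text.append(c);
                i++;
                continue;
            }
            addText(texts, text); // flush text collected before this tag
            int close = html.indexOf('>', i);
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated tag at index " + i);
            }
            String tag = html.substring(i + 1, close).trim();
            i = close + 1;
            if (tag.startsWith("/")) {
                // Closing tag: must match the most recently opened tag.
                String name = tagName(tag.substring(1));
                if (open.isEmpty() || !open.peek().equals(name)) {
                    throw new IllegalArgumentException("Unexpected closing tag </" + name + ">");
                }
                open.pop();
            } else if (!tag.endsWith("/") && !tag.startsWith("!")) {
                // Opening tag (self-closing tags and <!DOCTYPE ...> are skipped).
                String name = tagName(tag);
                if (!VOID_TAGS.contains(name)) {
                    open.push(name);
                }
            }
        }
        addText(texts, text);
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unclosed tag <" + open.peek() + ">");
        }
        return texts;
    }

    // Lower-cased element name: the part of the tag before the first whitespace.
    private static String tagName(String tag) {
        return tag.trim().split("\\s+", 2)[0].toLowerCase();
    }

    // Adds the trimmed text (if any) to the result and resets the buffer.
    private static void addText(List<String> texts, StringBuilder text) {
        String t = text.toString().trim();
        if (!t.isEmpty()) {
            texts.add(t);
        }
        text.setLength(0);
    }

    public static void main(String[] args) {
        String html = "<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>\n"
                + "<h2>An Ordered HTML List</h2>\n"
                + "<ol><li>Coffee</li><li>Tea</li><li>Milk</li></ol>";
        try {
            extract(html).forEach(System.out::println);
        } catch (IllegalArgumentException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
```

Throwing an exception and printing it in main keeps the extraction logic separate from the error reporting; a real HTML parser is far more lenient than this stack check, which treats any unclosed non-void tag as an error.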

Related

Extract some data using Regex

I've been struggling for a while to extract JSON data from an HTML tag. More specifically, it's a script tag, and with the Jsoup library I can get the data between the script tags. But inside it there is some JSON data that I can't extract. Here is the tag:
<script type="text/javascript">jwplayer.key="WbtWzGvcRNi6Tk+gtKldIbx+nn6lXZFvKiaO2g==";jwplayer("tvplayer").setup({playlist:[{image: "http://img.canlitvlive.io/yayin/trt1_480.jpg?1509735585",title:"TRT 1 Canlı Yayın - CanliTVLive.io",file : "http://yayin.canlitvlive.io/trt1/live.m3u8?tkn=8JD95lXv9dOUXwtgOTBYfw&tms=1509749985"}],...</script>
I need the URL from the "file" property inside the jwplayer setup. I tried using a regular expression, for example something like this:
"playlist[\":\\s\\{]+file[\":\\s\\{]+\"([^\"]+)\""
But I don't have much experience with regex and can't figure out the right pattern. Can someone help with this? Thanks
I'm guessing you just need to allow for some whitespace:
file\s*:\s*"(.*?)"
https://regex101.com/r/4HldaP/3
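A sketch of how that pattern can be used from Java with java.util.regex; the class and method names are illustrative, and the script string is a shortened copy of the one in the question (title omitted). Because the pattern only looks for a quoted value after "file :", it would also pick up any further "file" entries in the playlist, which is why all matches are collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JwPlayerFileUrl {
    // The answer's pattern: "file", optional whitespace, ':', optional whitespace,
    // then a lazily matched quoted value captured in group 1.
    private static final Pattern FILE = Pattern.compile("file\\s*:\\s*\"(.*?)\"");

    /** Returns every quoted value that follows a "file:" key in the script text. */
    static List<String> fileUrls(String script) {
        List<String> urls = new ArrayList<>();
        Matcher m = FILE.matcher(script);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Shortened copy of the script body obtained via Jsoup in the question.
        String script = "jwplayer(\"tvplayer\").setup({playlist:[{image: "
                + "\"http://img.canlitvlive.io/yayin/trt1_480.jpg?1509735585\","
                + "file : \"http://yayin.canlitvlive.io/trt1/live.m3u8"
                + "?tkn=8JD95lXv9dOUXwtgOTBYfw&tms=1509749985\"}]})";
        fileUrls(script).forEach(System.out::println);
    }
}
```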

How to parse XHTML in Velocity manager?

I want to parse some text similar to the below example:
<p class="MsoPlainText"><em><strong>
This is Test Data ,What? To devlop this functionality.
Read more information&nbsp;&nbsp;<a href="http://www.wikipedia.com">
more information</a>&nbsp;
Something I am doing wrong here.</strong></em></p>
This content contains some tags. I am holding all of it in the string variable xmlToPlainText and accessing it in a .vm file as $xmlToPlainText.
Can anyone help me parse content like this into plain text in Java?
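One library-free option, sketched under the assumption that a rough conversion is good enough: convert the string in Java before it reaches the template by stripping the tags with a regular expression, decoding a handful of common entities, and collapsing whitespace. The class and method names are hypothetical; this is not a real HTML parser and will mishandle things like a '>' inside an attribute value or script blocks.

```java
public class HtmlToPlainText {
    /**
     * Rough HTML-to-text conversion: drops every tag, decodes a few common
     * entities and collapses runs of whitespace into single spaces.
     */
    static String toPlainText(String html) {
        String text = html
                .replaceAll("<[^>]*>", " ")  // a tag may span lines: [^>] also matches '\n'
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");      // last, so "&amp;lt;" becomes "&lt;", not "<"
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String xmlToPlainText = "<p class=\"MsoPlainText\"><em><strong>\n"
                + "Read more information&nbsp;&nbsp;<a href=\"http://www.wikipedia.com\">\n"
                + "more information</a>&nbsp;\n"
                + "Something I am doing wrong here.</strong></em></p>";
        System.out.println(toPlainText(xmlToPlainText));
    }
}
```

Tags are replaced by a space rather than removed so that words from adjacent block elements don't run together; the trade-off is an extra space inside a word split by an inline tag, such as Hel<b>lo</b>.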

Using Java, how can I parse a .cshtml file and add parameters to the existing C# code in that file?

I have some .CSHTML files that were incorrectly generated by a tool. I would like to modify the C# code in them to append additional parameters and remove incorrect parameters from method calls.
I've used Jsoup to parse HTML and JSP files, and I am able to add or remove attributes in them via Jsoup DOM iteration.
But the .cshtml files contain C# code (I'm new to C#), and I couldn't get control over that code, so I am not able to append parameters to it using the Jsoup library. For example:
<td>
#Html.Label(Resource.Get("Label_Name"), new Dictionary<string,object>{{ "Class","label"},{ "name","Name"},{ "id","Name"}})
</td>
<td>
#Html.TextBoxFor(m=> m.TextBox1,new Dictionary<string,object>{{ "Class","txtfield controlWidth"},{ "name","TextBox1"},{ "id","TextBox1"}})
</td>
As shown above, the "#Html.xxxx" code is treated as the text value of the 'tr' tag during Jsoup DOM iteration. The only approach I could think of is adding if..else logic to add or remove parameters, as in the snippet below. I don't know what the standard way of parsing such a .cshtml file is.
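// Only lines that contain a "#Html." helper call are of interest
if (str.contains("#Html.")) {
    // Helper name between "#Html." and the next "(", e.g. "Label" or "TextBoxFor"
    int start = str.indexOf("#Html.") + "#Html.".length();
    String ctrlType = str.substring(start, str.indexOf('(', start));
    if (ctrlType.equalsIgnoreCase("Label")) {
        // logic to add parameters.
    }
}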
Using Java, is there a way to parse the .cshtml file and add or remove parameters in the C# code? Can you suggest a way to solve this with an open, standard API?
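The following is not a C# parser; it is only a string-based sketch that extends the if..else idea from the question. It assumes every helper call sits on one line and ends its dictionary initializer with "}})", exactly as in the two examples, and the class and method names are hypothetical. The "#Html." prefix is copied from the question; Razor markup normally uses "@Html.", so adjust the pattern if that is what your files contain.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CshtmlHelperEdit {
    // "#Html.<Name>(" as written in the question; group 1 is the helper name.
    private static final Pattern HELPER = Pattern.compile("#Html\\.(\\w+)\\(");

    /** Returns the helper name (e.g. "Label", "TextBoxFor") or null if the line has none. */
    static String helperName(String line) {
        Matcher m = HELPER.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Appends { "key","value"} to the dictionary initializer that closes the call with "}})". */
    static String addEntry(String line, String key, String value) {
        int end = line.lastIndexOf("}})");
        if (end < 0) {
            return line; // not the shape seen in the question; leave it alone
        }
        String entry = ",{ \"" + key + "\",\"" + value + "\"}";
        return line.substring(0, end + 1) + entry + line.substring(end + 1);
    }

    /** Removes the { "key", "..."} entry together with one adjacent comma. */
    static String removeEntry(String line, String key) {
        String entry = "\\{\\s*\"" + Pattern.quote(key) + "\"\\s*,\\s*\"[^\"]*\"\\s*\\}";
        // Prefer removing the comma in front of the entry (middle or last entry)...
        Matcher withLeadingComma = Pattern.compile(",\\s*" + entry).matcher(line);
        if (withLeadingComma.find()) {
            return withLeadingComma.replaceFirst("");
        }
        // ...otherwise it is the first entry, so remove the comma after it.
        return Pattern.compile(entry + "\\s*,?").matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        String line = "#Html.Label(Resource.Get(\"Label_Name\"), new Dictionary<string,object>"
                + "{{ \"Class\",\"label\"},{ \"name\",\"Name\"},{ \"id\",\"Name\"}})";
        // Hypothetical rule: for Label helpers, drop "name" and add a "title" entry.
        if ("Label".equalsIgnoreCase(helperName(line))) {
            line = addEntry(removeEntry(line, "name"), "title", "Name");
        }
        System.out.println(line);
    }
}
```

Lines that do not match the expected shape are returned unchanged, so the rewrite can be run over a whole file line by line; anything more irregular (calls split across lines, nested initializers) would need a real C# parser instead.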

Is there a way to put tag code into servlet code?

I have this custom tag that was created by my boss.
<lib:Menu user_id=\"<%= user.getId()%>\" />\n
I'm currently trying to move the HTML code from a JSP file into a servlet. The problem is that I don't know how to call the custom tag library to get the HTML that the tag generates.
Is there a way to call the tag from the servlet and get the HTML it produces?

Need to handle special characters in URL

My input HTML is:
<p>
<span>first
</span>
<span>Google Cloud Connect for Microsoft Office</span>
</p>
I am using XSLT 1.0 to convert the HTML to XML. My output XML is:
<Relationship Id="rId12700703801" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="http://tools.google.com/dlpage/cloudconnect#utm_campaign=launch&utm_source=en-na-us-gdb-GCC-Appsperience_02242011&utm_medium=blog" TargetMode="External"/></Relationships>
and I get the error "XML Parsing Error: not well-formed" at the position after launch&utm_source in the Target attribute.
I want to escape the special characters in the URL through XSLT so that the output is well-formed XML.
Please help me. Thanks in advance.
Are you generating the input HTML? If so, you can use URLEncoder.encode to properly encode the string so the transformer doesn't complain about the syntax.
If this is just an arbitrary HTML page that you have no control over, then you probably need an HTML parser such as TagSoup (et al.) to pre-correct it, since most HTML files are not well-formed XML.
XSLT expects XML as input, not HTML. You need to turn your HTML into XML if you want to transform it with XSLT.
I think it might be possible to do it with HTML Tidy.
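Whatever produces the output, an & inside an XML attribute value must be written as &amp; for the document to be well-formed. For illustration, here is a minimal sketch that builds the Relationship element from the question with the JDK's built-in DOM and javax.xml.transform classes, which escape the & automatically when serializing. The class and method names are hypothetical, and the namespace declaration that a real Office relationships part carries is omitted.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RelationshipXml {
    /** Builds a Relationships/Relationship fragment like the one in the question. */
    static String relationshipXml(String id, String targetUrl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("Relationships");
            Element rel = doc.createElement("Relationship");
            rel.setAttribute("Id", id);
            rel.setAttribute("Type",
                    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink");
            rel.setAttribute("Target", targetUrl); // raw URL; '&' is escaped on output
            rel.setAttribute("TargetMode", "External");
            root.appendChild(rel);
            doc.appendChild(root);

            // Identity transform: serializes the DOM as XML text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "http://tools.google.com/dlpage/cloudconnect#utm_campaign=launch"
                + "&utm_source=en-na-us-gdb-GCC-Appsperience_02242011&utm_medium=blog";
        System.out.println(relationshipXml("rId12700703801", url));
    }
}
```

The point of going through an XML API (or letting the XSLT processor write attributes itself) is that escaping is handled by the serializer, so the URL never has to be pre-escaped by hand.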
