I am struggling to comment out an existing line with JDOM; ideally, I would like to comment out an entire node.
SAXBuilder jdomBuild = new SAXBuilder();
Document jdomDoc = jdomBuild.build(fileLocation);
Element root = jdomDoc.getRootElement();
Element header = root.getChild("header_info");
// how can I comment the lines now?
And the document:
<xml_report>
<header_info>
<bla1></bla1>
<bla2></bla2>
<bla3></bla3>
<bla4></bla4>
<bla5></bla5>
<bla6></bla6>
</header_info>
<random_things>
<random></random>
</random_things>
</xml_report>
I would like to comment out the whole header, but I can't find a solution anywhere.
Could I have some advice and explanations, please?
Replace the header element with a Comment containing the element's serialized XML. Example:
XMLOutputter outputter = new XMLOutputter();
outputter.getFormat().setExpandEmptyElements(true);
SAXBuilder jdomBuild = new SAXBuilder();
Document jdomDoc = jdomBuild.build(new ByteArrayInputStream(("<xml_report>\n"
+ "\n"
+ " <header_info>\n"
+ " <bla1></bla1>\n"
+ " <bla2></bla2>\n"
+ " <bla3></bla3>\n"
+ " <bla4></bla4>\n"
+ " <bla5></bla5>\n"
+ " <bla6></bla6>\n"
+ " </header_info>\n"
+ "\n"
+ " <random_things>\n"
+ " <random/>\n"
+ " </random_things>\n"
+ "\n"
+ "</xml_report>").getBytes(StandardCharsets.UTF_8)));
Element root = jdomDoc.getRootElement();
Element header = root.getChild("header_info");
Comment comment = new Comment(outputter.outputString(header));
root.setContent(root.indexOf(header), comment);
outputter.output(jdomDoc, System.out);
Output:
<?xml version="1.0" encoding="UTF-8"?>
<xml_report>
<!--<header_info>
<bla1></bla1>
<bla2></bla2>
<bla3></bla3>
<bla4></bla4>
<bla5></bla5>
<bla6></bla6>
</header_info>-->
<random_things>
<random></random>
</random_things>
</xml_report>
Note that you cannot preserve the formatting with JDOM for every possible input, since the formatting information is removed during parsing.
Also note that you cannot put a comment around a block that itself contains a comment, since the inner comment's end marker would end the outer comment early. In fact, JDOM does not allow -- to occur as a substring in a comment. You could simply break up those substrings inside comments using
private static final Pattern PATTERN = Pattern.compile("-{2,}");
private static String fix(String string) {
StringBuilder sb = new StringBuilder();
Matcher m = PATTERN.matcher(string);
int lastEnd = 0;
while (m.find()) {
sb.append(string.subSequence(lastEnd, m.start())).append('-');
lastEnd = m.end();
for (int i = lastEnd - m.start(); i > 1; i--) {
sb.append(" -");
}
}
if (lastEnd < string.length()) {
sb.append(string.subSequence(lastEnd, string.length()));
}
return sb.toString();
}
Comment comment = new Comment(fix(outputter.outputString(header)));
Anything else would get complicated, since you'd need to take <![CDATA[]]> into account too.
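For comparison, the same dash-breaking idea can be written more compactly. This is only a sketch, assuming Java 11 or later (it relies on Matcher.replaceAll with a function and on String.repeat); the class and method names are made up for illustration and are not part of JDOM.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JDOM.
final class CommentText {
    private static final Pattern DASH_RUN = Pattern.compile("-{2,}");

    // Rewrites every run of n >= 2 dashes as n dashes separated by single spaces,
    // so the result no longer contains "--" anywhere.
    static String breakDashRuns(String s) {
        return DASH_RUN.matcher(s)
                .replaceAll(m -> "-" + " -".repeat(m.group().length() - 1));
    }

    public static void main(String[] args) {
        System.out.println(breakDashRuns("<a><!--note--></a>"));
    }
}
```

The result would then be passed to new Comment(...) in the same way as the output of fix above.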
I have a big text file and I want to remove everything that is between
double curly brackets.
So given the text below:
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
String cleanedText = Pattern.compile("(?<=\\{\\{).*?\\}\\}", Pattern.DOTALL).matcher(text).replaceAll("");
System.out.println(cleanedText);
I want the output to be:
This is what I want.
I have googled around and tried many different things, but I couldn't find anything close to my case, and as soon as I change the pattern a little, everything gets worse.
Thanks in advance.
You can use this:
public static void main(String[] args) {
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
String cleanedText = text.replaceAll("\\n", "");
while (cleanedText.contains("{{") && cleanedText.contains("}}")) {
cleanedText = cleanedText.replaceAll("\\{\\{[a-zA-Z\\s]*\\}\\}", "");
}
System.out.println(cleanedText);
}
A regular expression cannot express arbitrarily nested structures; i.e. any syntax that requires a recursive grammar to describe.
If you want to solve this using Java Pattern, you need to do it by repeated pattern matching. Here is one solution:
String res = input;
while (true) {
String tmp = res.replaceAll("\\{\\{[^}]*\\}\\}", "");
if (tmp.equals(res)) {
break;
}
res = tmp;
}
This is not very efficient ...
That can be transformed into an equivalent, but more concise form:
String res = input;
String tmp;
while (!(tmp = res.replaceAll("\\{\\{[^}]*\\}\\}", "")).equals(res)) {
res = tmp;
}
... but I prefer the first version because it is (IMO) a lot more readable.
I am not an expert in regular expressions, so I just wrote a loop that does this for you. If you can't or don't want to use a regex, it could be helpful for you ;)
public static void main(String args[]) {
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
int openBrackets = 0;
String output = "";
char[] input = text.toCharArray();
for(int i=0;i<input.length;i++){
if(input[i] == '{'){
openBrackets++;
continue;
}
if(input[i] == '}'){
openBrackets--;
continue;
}
if(openBrackets==0){
output += input[i];
}
}
System.out.println(output);
}
My suggestion is to remove anything between curly brackets, starting at the innermost pair:
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
Pattern p = Pattern.compile("\\{\\{[^{}]+?}}", Pattern.MULTILINE);
while (p.matcher(text).find()) {
text = p.matcher(text).replaceAll("");
}
resulting in the output
This is
what I
want.
This might fail if the text contains single curly brackets or unmatched pairs of brackets, but it could be good enough for your case.
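As an alternative to looping over a regex, a single pass with a depth counter can treat {{ and }} as two-character tokens. The following is a minimal sketch under the assumption that single braces are ordinary text and that an unmatched }} outside any block should be kept; the class and method names are invented for this example.

```java
// Hypothetical helper for illustration only.
final class BraceStripper {
    // Removes every {{ ... }} block, including nested ones, in one pass.
    static String stripDoubleBraces(String text) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // number of currently open {{ blocks
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (depth > 0 && text.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                if (depth == 0) {
                    out.append(text.charAt(i)); // keep only text outside all blocks
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripDoubleBraces("This is {{a {{nested}} block}}what I want."));
    }
}
```

Like the regex versions, this keeps the line breaks that sit outside the braces, so the sample text from the question still comes out on three lines.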
I'm trying to write a chat client in Swing. I need the messages in the conversation history to be formatted as HTML.
I tried to solve this with a JTextPane, since it supports HTML formatting. As long as I displayed only text, everything basically worked.
But when I added emoticons using the HTML <img> tag, all the text in the conversation window started flickering each time a new message arrived.
How I did it:
jTextPane.setText ("message");
When a new message arrived, I did this:
jTextPane.setText ("message" + "new message"); etc.
The result behaved like the "snake" in Tetris, and I did not like how it works.
So please tell me whether it is possible to display new messages using JLabels added to a JScrollPane. How can I make every new message a separate element?
String[] split = text.split("\t\t");
String time = split[0].split("\t")[2].split(", ")[1];
String sender = split[0].split("\t")[3];
String message = split[1];
if (!jTextPane.getText().equals("Please log in!")) {
oldMsg = jTextPane.getText().substring(jTextPane.getText().indexOf("<body>") + 6, jTextPane.getText().lastIndexOf("</body>"));
if(sender.equalsIgnoreCase(login.getText())) {
msg = "<div style=\"text-align:right\">" + checkMsgOnSmile(message) + " " + "<b>" + " :" + checkSenderOnColor(sender) + "</b>" + "<span style=\"font-size:10pt\">[" + time + "]</span></div>";
} else {
msg = "<div style=\"\"><span style=\"font-size:10pt\">[" + time + "]</span> " + "<b>" + checkSenderOnColor(sender) + ":" + "</b>" + " " + checkMsgOnSmile(message);
}
String[] check = (oldMsg + msg).split("<br>");
if (check.length > 99) {
ArrayList<String> arrayList = new ArrayList<String>(Arrays.asList(check));
arrayList.remove(0);
String str = "";
for (int i = 0; i < arrayList.size(); i++) {
str = str + arrayList.get(i).toString() + "<br>";
}
jTextPane.setText(str);
} else {
oldMsg = oldMsg.replaceAll("<span><font size=\"10pt\">", "<span style=\"font-size:10pt\">");
oldMsg = oldMsg.replaceAll("</font></span>", "</span>");
jTextPane.setText(oldMsg + msg + "<br>");
}
}
Can it be replaced by a JLabel and JScrollPane?
If you just need to display text with different colors, fonts, etc., then I find that working with text and attributes is easier than using HTML. A simple example would be code like:
JTextPane textPane = new JTextPane();
textPane.setText( "Hello:" );
textPane.setEditable(false);
StyledDocument doc = textPane.getStyledDocument();
// Define a keyword attribute
SimpleAttributeSet keyWord = new SimpleAttributeSet();
StyleConstants.setForeground(keyWord, Color.RED);
StyleConstants.setBackground(keyWord, Color.YELLOW);
StyleConstants.setBold(keyWord, true);
// Add some text
try
{
doc.insertString(doc.getLength(), "\nAnother line of text", keyWord );
}
catch(Exception e) { e.printStackTrace(); } // don't silently swallow a BadLocationException
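Applied to the chat window, the same idea means appending each new message to the document instead of calling setText with the whole history again. The sketch below is one possible shape under that assumption; ChatLog, appendMessage and plainText are hypothetical names, and whether this removes the flicker in your setup is something you would have to try.

```java
import java.awt.Color;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

// Hypothetical helper for illustration only.
final class ChatLog {
    // Appends "sender: message" as a new line at the end of the document.
    static void appendMessage(StyledDocument doc, String sender, String message) {
        SimpleAttributeSet senderStyle = new SimpleAttributeSet();
        StyleConstants.setBold(senderStyle, true);
        StyleConstants.setForeground(senderStyle, Color.BLUE);
        SimpleAttributeSet plain = new SimpleAttributeSet();
        try {
            doc.insertString(doc.getLength(), sender + ": ", senderStyle);
            doc.insertString(doc.getLength(), message + "\n", plain);
        } catch (BadLocationException e) {
            // Should not happen when inserting at doc.getLength().
            throw new IllegalStateException(e);
        }
    }

    // Returns the document's text without any attributes.
    static String plainText(StyledDocument doc) {
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StyledDocument doc = new DefaultStyledDocument();
        appendMessage(doc, "alice", "hello");
        appendMessage(doc, "bob", "hi");
        System.out.print(plainText(doc));
    }
}
```

In the real client you would pass textPane.getStyledDocument() instead of a fresh DefaultStyledDocument. Emoticons could probably be handled with StyleConstants.setIcon on an attribute set rather than an <img> tag.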
I have a very long HTML string that contains multiple
<dl id="divmap"> .... </dl>
elements. I want to remove all content between these tags.
I wrote this code in Java:
String triphtml= htmlString;
System.out.println("triphtml is "+triphtml);
System.out.println("test1 ");
final Pattern pattern = Pattern.compile("(<dl id=\""+selectedArray[i]+"\">)(.+?)(</dl>)",
Pattern.DOTALL);
final Matcher matcher = pattern.matcher(triphtml);
// matcher.find();
System.out.println("pattern of test1 is : "
+ pattern); // Prints
System.out.println("MATCHER of test1 is : "
+ matcher); // Prints
System.out.println("MATCH COUNT of test1 a: "
+ matcher.groupCount()); // Prints
System.out.println("MATCH COUNT of test1 a: "
+ matcher.find()); // Prints
while (matcher.find()) {
// System.out.println("MATCH GP 3: "+matcher.group(3).substring(1,10));
for (int z = 0; z <= matcher.groupCount(); z++) {
String extstr = matcher.group(z);
System.out.println("matcher group of "+z+" test1 is " + extstr);
System.out.println("ext a of test1 is " + extstr);
triphtml = triphtml.replaceAll(extstr, "");
System.out.println("Group found of test1 is :\n" + extstr);
}
}
But this code removes some dl elements while others remain in triphtml.
I don't know why this is happening.
Here triphtml is an HTML string that contains multiple dl elements. Please help me remove the content of all
<dl id="divmap"> elements.
Thanks in advance.
I suggest NOT using regex for HTML. Just use a library made for traversing XML/HTML,
for example JSoup.
Try using JSoup.
It uses selectors and syntax like jQuery, and it is very easy to use.
You can try this:
String triphtml = htmlString;
Document doc = Jsoup.parse(htmlString);
Elements divmaps = doc.select("#divmap");
Then you can remove (or alter) the elements in the DOM:
divmaps.remove();
triphtml = doc.html();
By using regex you can do as follows:
String orgString = "<dl id=\"divmap\"> .... </dl>";
orgString = orgString.replaceAll("<[^>]*>", "");
//for removing html tag
orgString = orgString.replaceAll(orgString.replaceAll("<[^>]*>", ""),"");
//for removing content inside html tag
But it is better to use an HTML parser.
Edit:
String htmlString = "<dl id=\"divmap\"> Content </dl>";
Pattern p = Pattern.compile("<[^>]*>");
Matcher m = p.matcher(htmlString);
while(m.find()){
htmlString = htmlString.replaceAll(m.group(), "");
}
System.out.println("Ans"+htmlString);
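Two details in the question's code explain why some dl elements survive. The matcher.find() call inside the println already consumes the first match before the while loop starts, and triphtml.replaceAll(extstr, "") treats the matched HTML as a regular expression, so any regex metacharacters in it change what gets replaced. If a regex has to be used anyway, a single replaceAll on the compiled pattern avoids both problems. This is a sketch only, assuming the dl elements are not nested and that id is the first attribute; DlRemover and removeDlById are made-up names.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; a real HTML parser is more robust.
final class DlRemover {
    // Removes every <dl id="..."> ... </dl> element with the given id, tags included.
    static String removeDlById(String html, String id) {
        Pattern p = Pattern.compile(
                "<dl\\s+id=\"" + Pattern.quote(id) + "\"[^>]*>.*?</dl>",
                Pattern.DOTALL); // DOTALL lets .*? span line breaks
        return p.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "a<dl id=\"divmap\">x\ny</dl>b<dl id=\"divmap\">(z)</dl>c";
        System.out.println(removeDlById(html, "divmap"));
    }
}
```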
I have a number of XML files that come in haphazardly; each contains an Ocount and an Lnumber, as well as other data. I have created a class to get that data.
My problem is: how can I group the XMLs that have the same Lnumber (a string) until their number reaches the Ocount (an int)? (XMLs that have the same Lnumber also have the same Ocount.) Eventually I want to send out an email saying which XMLs have been processed.
String readLine = FileHandler.checkListFile(sh.getShipmentHeader().getBillToCustomer());
if (!readLine.isEmpty())
{
int orderCount = 0;
int index = readLine.indexOf(";") + 1;
String customerName = readLine.substring(index, readLine.indexOf(";", index)).trim();
index = readLine.indexOf(";", index) + 1;
String to = readLine.substring(index, readLine.length()).trim();
if (!billMap.containsKey(sh.getShipmentHeader().getBillToCustomer()))
{
billMap.put(sh.getShipmentHeader().getBillToCustomer(), 1);
orderCount = 1;
}
else
{
billMap.put(sh.getShipmentHeader().getBillToCustomer(), ((int) billMap.get(sh.getShipmentHeader().getBillToCustomer())) + 1);
orderCount = (int) billMap.get(sh.getShipmentHeader().getBillToCustomer());
}
outboundMessage += sh.getShipmentHeader().getOrderNumber() + li ;
logger.info("On-Demand Outbound Export Info: " + orderCount + " processed out of " + sh.getShipmentHeader().getOrderCount() +
" for " + customerName);
if (orderCount == sh.getShipmentHeader().getOrderCount())
{
Email email = new Email();
billMap.remove(sh.getShipmentHeader().getBillToCustomer());
outboundMessage += li + "Total of #"+ sh.getShipmentHeader().getOrderCount() + " orders processed for "+ customerName + li ;
logger.info("On-Demand Email sent for " + customerName);
System.out.println(outboundMessage);
email.outboundEmail("TEST: Orders for " + customerName + " complete", outboundMessage, to);
outboundMessage = "";
email = null;
}}
I have been working on this for days. Where am I going wrong?
It seems like you are having difficulty obtaining information from the XML files. I suggest using XStream [1]. It can serialise objects to XML and back. By using XStream, you can get an object from the XML and compare its fields (Lnumber and Ocount) easily.
If you insist on using this code, I suggest adding comments to tell us what you are doing, but if you want an easier way to work with XML files in Java, I highly recommend XStream.
[1] http://x-stream.github.io/
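Independently of how the XML is parsed, the grouping itself can be kept in a map from Lnumber to the orders seen so far. In the question's code the message text appears to live in one shared outboundMessage string, so orders from different groups that arrive interleaved would end up in the same email. The sketch below keeps one list per Lnumber instead; OrderBatcher and its add method are hypothetical names, and the email step is left to the caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only.
final class OrderBatcher {
    // Orders received so far, grouped by Lnumber.
    private final Map<String, List<String>> pending = new HashMap<>();

    // Records one order. Returns the complete batch once expectedCount orders
    // with this Lnumber have arrived, otherwise an empty Optional.
    Optional<List<String>> add(String lNumber, int expectedCount, String orderNumber) {
        List<String> batch = pending.computeIfAbsent(lNumber, k -> new ArrayList<>());
        batch.add(orderNumber);
        if (batch.size() < expectedCount) {
            return Optional.empty();
        }
        pending.remove(lNumber); // batch is complete, stop tracking it
        return Optional.of(batch);
    }

    public static void main(String[] args) {
        OrderBatcher batcher = new OrderBatcher();
        batcher.add("L1", 2, "A");
        batcher.add("L2", 1, "B").ifPresent(b -> System.out.println("L2 complete: " + b));
        batcher.add("L1", 2, "C").ifPresent(b -> System.out.println("L1 complete: " + b));
    }
}
```

When add returns a batch, that is the point at which the email for this Lnumber could be built and sent.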
I am trying to find a certain tag in an HTML page with Java. All I know is what kind of tag it is (div, span ...) and the id. I don't know how it looks, how much whitespace there is and where, or what else is in the tag. So I thought about using pattern matching, and I have the following code:
// <tag[any character may be there or not]id="myid"[any character may be there or not]>
String str1 = "<" + Tag + "[.*]" + "id=\"" + search + "\"[.*]>";
// <tag[any character may be there or not]id="myid"[any character may be there or not]/>
String str2 = "<" + Tag + "[.*]" + "id=\"" + search + "\"[.*]/>";
Pattern p1 = Pattern.compile( str1 );
Pattern p2 = Pattern.compile( str2 );
Matcher m1 = p1.matcher( content );
Matcher m2 = p2.matcher( content );
int start = -1;
int stop = -1;
String Anfangsmarkierung = null;
int whichMatch = -1;
while( m1.find() == true || m2.find() == true ){
if( m1.find() ){
//System.out.println( " ... " + m1.group() );
start = m1.start();
//ende = m1.end();
stop = content.indexOf( "<", start );
whichMatch = 1;
}
else{
//System.out.println( " ... " + m2.group() );
start = m2.start();
stop = m2.end();
whichMatch = 2;
}
}
But I get an exception from m1.start() (or m2.start()) when I enter the actual tag without the [.*], and I don't get anything when I enter the regular expression :( I really haven't found an explanation for this. I haven't worked with Pattern or Matcher at all yet, so I am a little lost. It would be awesome if anyone could explain what I am doing wrong or how I can do it better.
Thanks in advance :)
... dg
I know that I am broadening your question, but I think that using a dedicated library for parsing HTML documents (such as http://htmlparser.sourceforge.net/) will be much easier and more accurate than regexps.
Here is an example for what you're trying to do adapted from one of my notes:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String tag = "thetag";
String id = "foo";
String content = "<tag1>\n"+
"<thetag name=\"Tag Name\" id=\"foo\">Some text</thetag>\n" +
"<thetag name=\"AnotherTag\" id=\"foo\">Some more text</thetag>\n" +
"</tag1>";
String patternString = "<" + tag + ".*?name=\"(.*?)\".*?id=\"" + id + "\".*?>";
System.out.println("Content:\n" + content);
System.out.println("Pattern: " + patternString);
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(content);
boolean found = false;
while (matcher.find()) {
System.out.format("I found the text \"%s\" starting at " +
"index %d and ending at index %d.%n",
matcher.group(), matcher.start(), matcher.end());
System.out.println("Name: " + matcher.group(1));
found = true;
}
if (!found) {
System.out.println("No match found.");
}
}
}
You'll notice that the pattern string becomes something like <thetag.*?name="(.*?)".*?id="foo".*?> which will search for tags named thetag where the id attribute is set to "foo".
Note the following:
It uses .*? to reluctantly (non-greedily) match zero or more of any character (if you don't understand, try removing the ? to see what I mean).
It uses a capturing group in parentheses (the name="(.*?)" part) to extract the contents of the name attribute (as an example).
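To see the difference the ? makes, here is a tiny sketch that runs both forms on the same input. The class and method names are made up for the demonstration.

```java
// Hypothetical demo class for illustration only.
final class QuantifierDemo {
    // Greedy: .* grabs as much as possible, so one match spans both tags.
    static String greedy(String s) {
        return s.replaceAll("<.*>", "X");
    }

    // Reluctant: .*? stops at the first '>', so each tag is matched on its own.
    static String reluctant(String s) {
        return s.replaceAll("<.*?>", "X");
    }

    public static void main(String[] args) {
        String s = "<a id=\"1\"><b id=\"2\">";
        System.out.println(greedy(s));    // prints X
        System.out.println(reluctant(s)); // prints XX
    }
}
```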
Each call to find() advances the matcher to the next match. Calling m1.find() again inside the loop body moves the matcher past the match that the loop condition just found, possibly to a state where there is no current match, and m1.start() then throws an IllegalStateException. Calling find() once per iteration and keeping the result in a flag avoids this problem.
boolean m1Matched = m1.find();
boolean m2Matched = m2.find();
while( m1Matched || m2Matched ) {
if( m1Matched ){
...
}
m1Matched = m1.find();
m2Matched = m2.find();
}
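There is a second problem in the question's pattern: [.*] is a character class that matches exactly one literal '.' or '*', not "any characters", which is why the regular expression finds nothing. A possible corrected version, as a sketch, uses [^>]* to stay inside one tag and a single find() loop. TagFinder and findOpeningTags are invented names, and the pattern assumes the id value is written in double quotes; it would also accept attributes such as data-id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; an HTML parser is more reliable.
final class TagFinder {
    // Returns every opening (or self-closing) tag of the given name and id.
    static List<String> findOpeningTags(String content, String tag, String id) {
        Pattern p = Pattern.compile(
                "<" + Pattern.quote(tag) + "\\b[^>]*\\bid=\"" + Pattern.quote(id) + "\"[^>]*>");
        Matcher m = p.matcher(content);
        List<String> found = new ArrayList<>();
        while (m.find()) {        // exactly one find() per iteration
            found.add(m.group()); // m.start() and m.end() are valid here as well
        }
        return found;
    }

    public static void main(String[] args) {
        String content = "<p><span id=\"x\">a</span><div  class=\"c\" id=\"x\"/></p>";
        System.out.println(findOpeningTags(content, "div", "x"));
    }
}
```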