Replace BibTeX or LaTeX {\dg} string command in Java

I want to write a parser for a BibTeX file. My BibTeX file contains the string booktitle = {Wohnen - Pflege - Teilhabe {\dq}Besser leben durch Technik{\dq}}. I am using JUnit tests and a method that should detect {\dg} and replace it with ' or ". Unfortunately, I cannot get the Java code to work; for example, the following code does not detect the substring:
inproceedingsCitation1 += " booktitle = {Wohnen - Pflege - Teilhabe {\\dq}Besser leben durch Technik{\\dq}},\n";
Corresponding part of my replace method:
String afterDg = "";
CharSequence targetDg = "{\\dg}";
CharSequence replacementDg = "\"";
afterDg = afterAe.replace(targetDg, replacementDg);

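The target in your code is spelled {\dg}, but the string contains {\dq}, so nothing is matched. With the correct spelling, this regular expression should solve your problem: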
String afterDg = afterAe.replaceAll("\\{\\\\dq\\}", "\"");
For more details on regular expressions, have a look at Vogella's Java Regex Tutorial.
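Since {\dq} is a fixed string, a literal String.replace should work as well and avoids the regex escaping. A minimal sketch, assuming the input looks like the booktitle line from the question; the class and method names (BibtexQuotes, replaceDq, replaceDqRegex) are invented for illustration:

```java
final class BibtexQuotes {

    // Replaces every literal {\dq} with a double quote.
    // String.replace treats both arguments as plain text, so no regex escaping is needed.
    static String replaceDq(String input) {
        return input.replace("{\\dq}", "\"");
    }

    // Same result via a regex: the braces and the backslash must be escaped.
    static String replaceDqRegex(String input) {
        return input.replaceAll("\\{\\\\dq\\}", "\"");
    }

    public static void main(String[] args) {
        String booktitle = "Wohnen - Pflege - Teilhabe {\\dq}Besser leben durch Technik{\\dq}";
        System.out.println(replaceDq(booktitle));
        System.out.println(replaceDqRegex(booktitle));
    }
}
```

Both calls should print the title with ordinary double quotes around "Besser leben durch Technik".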

Related

Python Regex to Java

I am trying to convert a Python regex to Java. It finds a match in Python but fails on the same string in Java.
Python regex : "(CommandLineEventConsumer)(\x00\x00)(.*?)(\x00)(.*?)({})(\x00\x00)?([^\x00]*)?".format(event_consumer_name)
Java regex : "(CommandLineEventConsumer)(\\u0000\\u0000)(.*?)(\\u0000)(.*?)(" + event_consumer_name + ")(\\u0000\\u0000)?([^\\u0000]*)?"
I also tried this : "(CommandLineEventConsumer)(\\x00\\x00)(.*?)(\\x00)(.*?)(" + event_consumer_name + ")(\\x00\\x00)?([^\\x00]*)?"
What am I missing, please?
I have attached a piece of the code:
String sampleStr = "\u0000\u0000�\u0003\b\u0000\u0000\u0000�\u0005\u0000\u0000\u0003\u0000\u0000�\u0000\u000B\u0000\u0000\u0000���\u0005\u0000\u0000\u0000\u0003\u0000\u0000\u0000 \u0000\u0000\u0000\u0000string\u0000\u0000WMIDataID\u0000\u0000SystemVersion\u0000\b\u0000\u0000\u0000\f\u0000.\u0000\u0000\u0000\u0000\u0000\u0000\u0000)\u0000\u0000\u0000 \u0000\u0000�\u0003\b\u0000\u0000\u0000'\u0006\u0000\u0000\u0003\u0000\u0000�\u0000\u000B\u0000\u0000\u0000��/\u0006\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u000B\u0000\u0000\u0000\u0000string\u0000\u0000WMIDataID\u0000\f\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000�\u0016\u0000\u0000\u0000R\u0000O\u0000O\u0000T\u0000\\\u0000M\u0000i\u0000c\u0000r\u0000o\u0000s\u0000o\u0000f\u0000t\u0000\\\u0000H\u0000o\u0000m\u0000e\u0000N\u0000e\u0000t\u0000\u0019\u0000\u0000\u0000H\u0000N\u0000e\u0000t\u0000_\u0000C\u0000o\u0000n\u0000n\u0000e\u0000c\u0000t\u0000i\u0000o\u0000n\u0000P\u0000r\u0000o\u0000p\u0000e\u0000r\u0000t\u0000i\u0000e\u0000s\u0000 
\u0000\u0000\u0000C\u0000o\u0000n\u0000n\u0000e\u0000c\u0000t\u0000i\u0000o\u0000n\u0000�\u0000\u0000\u0000N\u0000S\u0000_\u00005\u00001\u00001\u00006\u00002\u00006\u0000F\u0000A\u0000E\u00004\u0000F\u00005\u00007\u0000D\u0000B\u0000D\u00002\u00000\u0000D\u0000F\u00005\u0000C\u0000D\u00004\u00004\u0000A\u00004\u00001\u0000D\u0000A\u0000E\u0000C\u0000E\u0000D\u00002\u00008\u0000C\u0000F\u00007\u0000B\u00003\u0000F\u0000D\u00008\u0000B\u00001\u00002\u00000\u00001\u00002\u0000C\u00007\u0000F\u00004\u0000B\u00005\u00008\u0000F\u00004\u00004\u0000E\u00006\u00006\u00005\u0000\\\u0000K\u0000I\u0000_\u0000A\u00000\u00001\u00000\u00008\u0000C\u0000E\u00002\u00006\u00001\u0000D\u00006\u0000C\u0000D\u00007\u00000\u0000D\u00003\u00005\u00000\u0000F\u00005\u0000B\u00007\u00002\u0000F\u00002\u0000E\u00009\u00008\u00007\u00004\u0000A\u0000E\u00006\u0000E\u00000\u00000\u00004\u0000D\u00003\u00000\u00002\u00009\u00000\u00001\u00005\u0000B\u00000\u00009\u00001\u00009\u0000B\u00001\u0000B\u0000D\u00003\u00002\u00006\u0000B\u0000B\u00006\u00004\u00009\u0000\\\u0000I\u0000_\u0000E\u0000D\u0000C\u0000E\u0000A\u00001\u00004\u0000E\u0000C\u00006\u00003\u0000A\u00005\u00007\u00004\u00001\u0000F\u0000A\u0000A\u00006\u00003\u00000\u00001\u0000C\u00007\u00007\u0000C\u0000A\u00002\u00006\u00000\u0000A\u0000B\u0000E\u0000C\u00000\u0000E\u00007\u00007\u00000\u00009\u00005\u00001\u00004\u0000F\u00006\u0000A\u00003\u00002\u0000C\u00000\u00003\u00004\u00007\u0000E\u00000\u00002\u00006\u00008\u00001\u00007\u0000C\u00008\u00008\u0000\u0000\u0000WQL:Re4\u00007\u0000C\u00007\u00009\u0000E\u00006\u00002\u0000C\u00002\u00002\u00002\u00007\u0000E\u0000D\u0000D\u00000\u0000F\u0000F\u00002\u00009\u0000B\u0000F\u00004\u00004\u0000D\u00008\u00007\u0000F\u00002\u0000F\u0000A\u0000F\u00009\u0000F\u0000E\u0000D\u0000F\u00006\u00000\u0000A\u00001\u00008\u0000D\u00009\u0000F\u00008\u00002\u00005\u00009\u00007\u00006\u00000\u00002\u0000B\u0000D\u00009\u00005\u0000E\u00002\u00000\u0000B\u0000D\u00003\u0000�3u�&��\u00
01����+\u0004�\u0001�\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\f;\u0000\u0000\u0000\u000F\u0000\u0000\u0000�\u0000\u0000\u0000F\u0000\u0000\u0000/\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0004\u0000\u0000\u0000\u0001�\u0000\u0000�\u0000__EventFilter\u0000\u001C\u0000\u0000\u0000\u0001\u0005\u0000\u0000\u0000\u0000\u0000\u0005\u0015\u0000\u0000\u0000�tw�}\n" +
"z�p�)��\u0001\u0000\u0000\u0000root\\cimv2\u0000\u0000BVTFilter\u0000\u0000SELECT * FROM __InstanceModificationEvent WITHIN 60 WHERE TargetInstance ISA \"Win32_Processor\" AND TargetInstance.LoadPercentage > 99\u0000\u0000WQL\u0000B\u0000B\u0000F\u0000C\u0000C\u0000B\u00004\u00004\u00004\u0000C\u0000F\u00006\u00006\u0000A\u0000A\u00000\u00009\u0000A\u0000E\u00006\u0000F\u00001\u00005\u00009\u00006\u00007\u0000A\u00006\u00008\u00006\u00005\u00001\u00007\u00005\u0000B\u0000B\u00000\u0000E\u0000D\u00002\u00001\u00006\u0000D\u00001\u00009\u00009\u00007\u00000\u0000A\u00007\u00009\u00008\u00008\u0000B\u00007\u00002\u0000C\u0000D\u0000F\u00000\u0000A\u00003\u0000A\u00004\u0000�3u�&��\u0001Ԏ��+\u0004�\u0001�\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u000F�����\"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000/\u0000\u0000\u0000O\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u001A\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\\\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0004\u0000\u0000\u0000\u0001q\u0000\u0000�\u0000CommandLineEventConsumer\u0000\u0000cscript KernCap.vbs\u0000\u001C\u0000\u0000\u0000\u0001\u0005\u0000\u0000\u0000\u0000\u0000\u0005\u0015\u0000\u0000\u0000�tw�}\n" +
"z�p�)��\u0001\u0000\u0000\u0000BVTConsumer\u0000\u0000C:\\\\tools\\\\kernrate\u00000\u0000A\u00007\u0000A\u0000B\u0000E\u00006\u00003\u0000F\u00003\u00006\u0000E\u00002\u0000B\u00002\u00009\u00002\u00000\u0000F\u0000E\u0000D\u0000A\u0000F\u0000A\u0000E\u00008\u00004\u00009\u00008\u00002\u00003\u0000A\u0000F\u00009\u00004\u00002\u00009\u0000C\u0000C\u00000\u0000E\u0000A\u00003\u00007\u00003\u0000F\u0000F\u0000E\u0000E\u00001\u00005\u00000\u00007\u0000E\u0000D\u0000B\u00002\u00001\u0000F\u0000D\u00009\u00001\u00007\u00000\u0000�3u�&��\u0001����+\u0004�\u0001�\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000�";
String event_consumer_name = "BVTConsumer";
String cPattern = "(CommandLineEventConsumer)(\\u0000\\u0000)(.*?)(\\u0000)(.*?)(" + event_consumer_name + ")(\\u0000\\u0000)?([^\\u0000]*)?";
Pattern consumer_mo = Pattern.compile(cPattern, Pattern.CASE_INSENSITIVE);
Matcher consumer_match = consumer_mo.matcher(sampleStr);
if(consumer_match.find()){
System.out.println(consumer_match.group(6));
}
UPDATE
In Python the groups return:
[screenshot of the Python result]
From what I posted as comments:
The (CommandLineEventConsumer)(\u0000\u0000)(.*?)(\u0000)(.*?) part matches fine.
group(3) gets cscript KernCap.vbs
group(4) gets a null character
but group(5) gets nothing.
I tried it in Python and got exactly the same lack of a match once (BVTConsumer) is included. So the difference is probably in the Python code doing the matching, not in the regex itself.
The reason is that your string contains a \n, and by default . does not match line terminators, so the match stops there. If you compile the pattern with the DOTALL flag,
Pattern consumer_mo = Pattern.compile(cPattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
it does match in your example.
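To illustrate the DOTALL point on a smaller scale, here is a sketch with a made-up sample string. The class name, method name, and sample are invented for illustration and are not taken from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotallDemo {

    // Returns the text captured between the NUL after "start" and "end",
    // or null if the pattern does not match.
    static String between(String input, int flags) {
        Pattern p = Pattern.compile("start\\u0000(.*?)end", flags);
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a NUL separator and a line feed inside the payload.
        String sample = "start\u0000first\nsecond end";
        System.out.println(between(sample, 0));              // no match: '.' stops at the line feed
        System.out.println(between(sample, Pattern.DOTALL)); // matches across the line feed
    }
}
```

The embedded flag (?s) at the start of the pattern has the same effect as Pattern.DOTALL.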

Java String replace All Chars?

I am not familiar with regular expressions; maybe one of you can help me. I have a string such as "46,50 EUR" or "-4.785,20 €" or something similar. I would like to remove all characters that are not in "0123456789.,-". I tried:
String betrag = "-4.785,20 €";
betrag = betrag.replaceAll("/[^0-9.,-]/", "");
but it does not work. The euro sign is not removed. Maybe it has something to do with the encoding (UTF-8 vs. Latin-1)? Or is my regular expression wrong?
Java regex patterns are plain strings and do not use delimiters, so remove the /:
String betrag = "-4.785,20 €";
betrag = betrag.replaceAll("[^0-9.,-]+", "");
System.out.println(betrag); // -4.785,20
You can also use \d, which is shorthand for digit characters (written "\\d" in a Java string literal), but of course the result is the same as in Tim's answer:
String betrag = "-4.785,20 €";
betrag = betrag.replaceAll("[^\\d.,-]+", "");
By the way, you can easily test your regular expressions with online tools, e.g.:
https://www.freeformatter.com/java-regex-tester.html
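To see why the delimiters matter, here is a small sketch that runs both patterns side by side. The class and method names are made up for illustration:

```java
final class AmountCleaner {

    // Keeps only digits, dots, commas, and minus signs.
    static String keepAmountChars(String raw) {
        return raw.replaceAll("[^\\d.,-]+", "");
    }

    // The original attempt: the slashes are matched literally, so nothing is removed.
    static String withSlashDelimiters(String raw) {
        return raw.replaceAll("/[^0-9.,-]/", "");
    }

    public static void main(String[] args) {
        String betrag = "-4.785,20 \u20AC"; // the euro sign, written as an escape
        System.out.println(keepAmountChars(betrag));
        System.out.println(withSlashDelimiters(betrag));
        System.out.println(keepAmountChars("46,50 EUR"));
    }
}
```

The second call returns the input unchanged because the pattern only matches a character that sits between two literal slashes.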

Java won't replace all strings, because there is text next to the tags (post improved)

I'm working on a program that formats HTML code extracted from a PDF file.
I have a String array that contains the text, split into paragraphs.
As the PDF has hyperlinks, I decided to replace them with a footnote number such as "[1]".
This will be used for citing sources. I eventually plan to put it at the end of a paragraph or sentence, so you can look up the sources as you would in a book.
My Problem
For some reason, not all the hyperlinks are replaced.
The reason is most likely that there is text directly next to the tag.
Hell<a href="http://www.example.com">o old chap!
Specifically, the "o" part and the "Hell" part are blocking Java's .replaceAll function from doing its job.
Expected Result
Hello [1] old chap!
EDIT:
If I just added a space before and after the URL, it might split some words, like "help" into "hel p", which is also not an option.
My code would have to replace the URL tag (without the ) and create no new extra spaces.
This is some of my code, where the problem occurs:
for (int i = 0; i < EN.length; i++) {
    Pattern pattern_URL = Pattern.compile("<a(.+?)\">", Pattern.DOTALL);
    Matcher matcher_URL = pattern_URL.matcher(EN[i]); // Checks the current array element.
    if (matcher_URL.find() == true) {
        source_number++;
        String extractedURL = matcher_URL.group(0);
        //System.out.println(extractedURL);
        String extractedURL_fully = extractedURL.replaceAll("href=\"", ""); // Quotation marks
        //System.out.println(extractedURL_fully);
        String nobracketURL = extractedURL.replaceAll("\\)", ""); // Remove round brackets from the URL
        EN[i] = EN[i].replaceAll("\\)\"", "\""); /* Remove round brackets from the URL in the array. (For some reason there were href URLs with a bracket at the end. This was already in the PDF. They caused massive problems because the bracket was not escaped, so the entire replaceAll command did not work.) */
        EN[i] = EN[i].replaceAll(nobracketURL, "[" + source_number + "]"); // Replace URL tags with the number in square brackets
    }
    else {
        //System.out.println("FALSE: " + "[" + i + "]");
    }
}
The whole idea is that it loops through the array and replaces all the URLs, from the start of the opening tag <a to its end "> (which can also be seen in the pattern regex).
Correct me if I'm wrong, but what you need is to eliminate all the <a> tags from a given string, right? If that's the case, all you need to do is use code like the following:
final String string = "<a href=\"http://www.example.com\">Sen";
final Pattern pattern = Pattern.compile("<a(.+?)>", Pattern.DOTALL);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll("");
System.out.println(result); // prints "Sen"
Notice that I used replaceAll from the Matcher object rather than from the String object. This replaces all matches with the empty string "".
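If the tags should become numbered footnote markers rather than disappear, one possible approach is to build the replacement per match with Matcher.appendReplacement. The sketch below is only an illustration under assumptions: it assumes the marker should follow the word the tag interrupts (as in the expected result "Hello [1] old chap!"), it leaves any closing </a> tags untouched, and the class and method names are invented:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FootnoteLinks {

    // An opening <a ...> tag plus the rest of the word it interrupts (group 1).
    // [^>]* keeps the match inside a single tag.
    private static final Pattern OPEN_A = Pattern.compile("<a\\b[^>]*>(\\w*)");

    // Replaces each opening <a ...> tag with a footnote marker " [n]",
    // placed after the interrupted word, numbering from firstNumber.
    static String replaceLinks(String text, int firstNumber) {
        Matcher m = OPEN_A.matcher(text);
        StringBuffer out = new StringBuffer();
        int n = firstNumber;
        while (m.find()) {
            String marker = m.group(1) + " [" + n++ + "]";
            // quoteReplacement guards against '$' or '\' in the captured text.
            m.appendReplacement(out, Matcher.quoteReplacement(marker));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "Hell<a href=\"http://www.example.com\">o old chap!";
        System.out.println(replaceLinks(s, 1)); // Hello [1] old chap!
    }
}
```

Because the URL is never reused as a regex, brackets or dots inside the href cannot break the replacement.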

Safely prepare a String with Emoji Icons in Java for XML and XSLT Transformation

You get a string containing any kind of characters (UTF-8), including special characters like emoticons/emoji 👍 🏁.
You have to generate an XML element containing that received string and pass it to an XSLT transformation engine.
As I get transformation errors, I wonder how the Java code could process the string before inserting it into the final XML so that the XSLT transformation does not fail.
What I currently have in Java is this:
String inputValue = ...; // you get this string by an external client
Element target = ...; // element of an XML where you have to add the string
String xml10pattern = "[^"
+ "\u0009\r\n"
+ "\u0020-\uD7FF"
+ "\uE000-\uFFFD"
+ "\ud800\udc00-\udbff\udfff"
+ "]"; // this removes the illegal characters in XML
inputValue = inputValue.replaceAll(xml10pattern, "");
target.setAttribute("text", inputValue);
Is anything still missing to make this safer?
The Apache Commons library has StringEscapeUtils.escapeXml(string) (escapeXml10 and escapeXml11 in newer versions). This lets you have & in your attribute.
A cheap possibility would be to strip all characters outside the Latin-1 range (U+0000 to U+00FF) so that you pass only plain text to it (but with line breaks etc.):
String inputValue = ...; // you get this string by an external client
Element target = ...; // element of an XML where you have to add the string
String xml10pattern = "[^"
+ "\u0009\r\n"
+ "\u0020-\uD7FF"
+ "\uE000-\uFFFD"
+ "\ud800\udc00-\udbff\udfff"
+ "]"; // this removes the illegal characters in XML
inputValue = inputValue.replaceAll(xml10pattern, "");
inputValue = inputValue.replaceAll("[^\\x00-\\xFF]", "");
target.setAttribute("text", inputValue);
Any thoughts on this?
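An alternative that keeps emoji is to filter by code point instead of by regex. The ranges below follow the Char production of the XML 1.0 specification; the class and method names are invented for this sketch. Emoji such as 👍 lie above U+FFFF and are valid XML 1.0 characters, so if the transformation still fails the cause may lie elsewhere (for example in the output encoding), which this sketch does not address:

```java
final class XmlTextCleaner {

    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops every code point XML 1.0 does not allow, including unpaired
    // surrogates; supplementary characters such as emoji are kept.
    static String stripInvalidXml10(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(XmlTextCleaner::isXml10Char)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String thumbsUp = new String(Character.toChars(0x1F44D));
        String dirty = "ok\u0001 " + thumbsUp + "\uFFFE";
        // The control character and U+FFFE are removed, the emoji stays.
        System.out.println(stripInvalidXml10(dirty).equals("ok " + thumbsUp));
    }
}
```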

Convert Java Regex into PHP regex

I got the following Java code from Apache Commons to validate email addresses. I code in PHP, so I'm trying to see whether these regexes can be used directly in PHP without any modification.
LEGAL_ASCII_REGEX = "^\\p{ASCII}+$";
EMAIL_REGEX = "^\\s*?(.+)@(.+?)\\s*$";
IP_DOMAIN_REGEX = "^\\[(.*)\\]$";
USER_REGEX = "^\\s*" + WORD + "(\\." + WORD + ")*$";
If an email address fails any of the 4 conditions above, it is considered invalid.
I don't have any experience with Java, so any advice on the modifications these regexes need for PHP is hugely appreciated!
Best,
Update:
the code I'm using is:
$email_to_test='www.jinfu66#foxmail.com';
if (filter_var($email_to_test, FILTER_VALIDATE_EMAIL) && preg_match('/^[[:ascii:]]+$/', $email_to_test) && preg_match('/^\s*?(.+)@(.+?)\s*$/', $email_to_test))
{
echo 'It passed';
}
else
{
echo 'It did not pass';
}
I'm not sure how to add the condition that $email_to_test must match the requirement from $USER_REGEX in order for it to echo 'It passed'. Thank you!
2nd update:
Here's what WORD stands for in the original Java regex:
private static final String SPECIAL_CHARS = "\\p{Cntrl}\\(\\)<>@,;:'\\\\\\\"\\.\\[\\]";
private static final String VALID_CHARS = "[^\\s" + SPECIAL_CHARS + "]";
private static final String QUOTED_USER = "(\"[^\"]*\")";
private static final String WORD = "((" + VALID_CHARS + "|')+|" + QUOTED_USER + ")";
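PHP regexes do not need the doubled backslashes (\\) that Java string literals require.
PCRE uses [[:ascii:]] instead of \p{ASCII}, and [:cntrl:] inside a character class instead of \p{Cntrl}.
PCRE regexes need delimiters, unlike Java regexes.
PHP concatenates strings with ., not +, and WORD must be inserted as-is because it is itself a regex fragment (preg_quote would escape its metacharacters).
The following PHP regexes should work for you:
$LEGAL_ASCII_REGEX = '/^[[:ascii:]]+$/';
$EMAIL_REGEX = '/^\s*?(.+)@(.+?)\s*$/';
$IP_DOMAIN_REGEX = '/^\[(.*)\]$/';
$SPECIAL_CHARS = '[:cntrl:]\(\)<>@,;:\'\\\\\\"\.\[\]';
$VALID_CHARS = '[^\s' . $SPECIAL_CHARS . ']';
$WORD = '((' . $VALID_CHARS . '|\')+|("[^"]*"))';
$USER_REGEX = '/^\s*' . $WORD . '(\.' . $WORD . ')*$/';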
