java replace HTML_Escapecodes


I need to develop a new method that replaces all umlauts (ä, ö, ü) in an input string with the corresponding HTML escape codes, with high performance. According to statistics, only 5% of all input strings contain umlauts. Since the method is expected to be used extensively, any unnecessary instantiation should be avoided.
Could someone show me a way to do it?

These are the HTML escape codes. Additionally, HTML supports numeric character references, written in decimal or, equivalently, in hexadecimal.
A simple string-replace is not going to be efficient with so many strings to replace. I suggest you split the string by entity matches, such as this:
String[] parts = str.split("&([A-Za-z]+|[0-9]+|x[A-Fa-f0-9]+);");
if(parts.length <= 1) return str; //No matched entities.
Then you can re-build the string with the replaced parts inserted.
StringBuilder result = new StringBuilder(str.length());
result.append(parts[0]); //First part always exists.
int pos = parts[0].length() + 1; //Skip past the first part and the ampersand.
for(int i = 1;i < parts.length;i++) {
String entityName = str.substring(pos,str.indexOf(';',pos));
if(entityName.matches("x[A-Fa-f0-9]+") && entityName.length() <= 5) {
result.append((char) Integer.decode("0" + entityName).intValue()); //Unbox first: an Integer cannot be cast to char directly.
} else if(entityName.matches("[0-9]+")) {
result.append((char) Integer.parseInt(entityName)); //parseInt, so a leading zero is not read as octal.
} else {
switch(entityName) {
case "euml": result.append('ë'); break;
case "auml": result.append('ä'); break;
...
default: result.append("&" + entityName + ";"); //Unknown entity. Give the original string.
}
}
result.append(parts[i]); //Append the text after the entity.
pos += entityName.length() + parts[i].length() + 2; //Skip past the entity name, the semicolon and the following part.
}
return result.toString();
Rather than copy-pasting this code, type it in your own project by hand. This gives you the opportunity to look at how the code actually works. I didn't run this code myself, so I can't guarantee it being correct. It can also be made slightly more efficient by pre-compiling the regular expressions.
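For the question as asked (encoding umlauts rather than decoding entities), a minimal sketch could scan the string first and allocate only when an umlaut is actually found, so the roughly 95% of inputs without umlauts are returned unchanged. The class name UmlautEscaper and the decision to cover the uppercase umlauts as well are assumptions for illustration, not part of the original posts.

```java
final class UmlautEscaper {
    private UmlautEscaper() {}

    /** Returns the HTML entity for an umlaut, or null for any other character. */
    private static String entityFor(char c) {
        switch (c) {
            case '\u00e4': return "&auml;"; // ä
            case '\u00f6': return "&ouml;"; // ö
            case '\u00fc': return "&uuml;"; // ü
            case '\u00c4': return "&Auml;"; // Ä (assumed to be wanted too)
            case '\u00d6': return "&Ouml;"; // Ö
            case '\u00dc': return "&Uuml;"; // Ü
            default:       return null;
        }
    }

    static String escape(String s) {
        final int n = s.length();
        int i = 0;
        // Fast path: look for the first umlaut without allocating anything.
        while (i < n && entityFor(s.charAt(i)) == null) {
            i++;
        }
        if (i == n) {
            return s; // no umlaut found, return the same instance
        }
        // Slow path: copy the clean prefix once, then translate the rest.
        StringBuilder sb = new StringBuilder(n + 16);
        sb.append(s, 0, i);
        for (; i < n; i++) {
            char c = s.charAt(i);
            String entity = entityFor(c);
            if (entity != null) {
                sb.append(entity);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The entity strings are compile-time constants, so the only objects created on the slow path are the StringBuilder and the result string.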

Related

String Fragment Combinations Puzzle

Let's say I am given a list of String fragments. Two fragments can be concatenated on their overlapping substrings.
e.g.
"sad" and "den" = "saden"
"fat" and "cat" = cannot be combined.
Sample input:
aw was poq qo
Sample output:
awas poqo
So, what's the best way to write a method which finds the longest string that can be made by combining the strings in a list? If the string is infinite, the output should be "infinite".
public class StringUtil {
public static String combine(List<String> fragments) {
StringBuilder combined = new StringBuilder();
for (int i = 0; i < fragments.size(); i++) {
char last = (char) (fragments.get(i).length() - 1);
if (Character.toString(last).equals(fragments.get(i).substring(0))) {
combined.append(fragments.get(i)).append(fragments.get(i+1));
}
}
return combined.toString();
}
}
Here's my JUnit test:
public class StringUtilTest {
@Test
public void combine() {
List<String> fragments = new ArrayList<String>();
fragments.add("aw");
fragments.add("was");
fragments.add("poq");
fragments.add("qo");
String result = StringUtil.combine(fragments);
assertEquals("awas poqo", result);
}
}
This code doesn't seem to be working on my end... It's returning an empty string:
org.junit.ComparisonFailure: expected:<[awas poqo]> but was:<[]>
How can I get this to work? And also how can I get it to check for infinite strings?

I don't understand how fragments.get(i).length() - 1 is supposed to be a char. You clearly cast it on purpose, but I can't for the life of me tell what that purpose is. For a string of length < 63, the result is a character whose ASCII (Unicode?) code is that number, and that isn't a letter.
I'm thinking you meant to compare the last character in one fragment to the first character in another, but I don't think that's what that code is doing.
My helpful answer is to undo some of the method chaining (function().otherFunction()), store the results in temporary variables, and step through it with a debugger. Break the problem down into small steps that you understand and verify the code is doing what you think it SHOULD be doing at each step. Once it works, then go back to chaining.
Edit: ok I'm bored and I like teaching. This smells like homework so I won't give you any code.
1) method chaining is just convenience. You could (and should) do:
String tempString = fragments.get(i);
int lengthOfString = tempString.length() - 1;
char lastChar = (char) lengthOfString;//WRONG
Etc.
This lets you SEE the intermediate steps, and THINK about what you are doing. You are literally taking the length of a string, say 3, and converting that Integer to a Char. You really want the last character in the string. When you don't use method chaining, you are forced to declare a Type of intermediate variable, which of course forces you to think about what the method ACTUALLY RETURNS. And this is why I told you to forgo method chaining until you are familiar with the functions.
2) I'm guessing at the point you wrote the function, the compiler complained that it couldn't implicitly cast to char from int. You then explicitly cast to a char to get it to shut up and compile. And now you are trying to figure out why it's failing at run time. The lesson is to listen to the compiler while you are learning. If it's complaining, you're messing something up.
3) I knew there was something else. Debugging. If you want to code, you'll need to learn how to do this. Most IDEs will give you an option to set a breakpoint. Learn how to use this feature and "step through" your code line by line. THINK about exactly what step you are doing. Write down the algorithm for a short two-letter pair, and execute it by hand on paper, one step at a time. Then look at what the code DOES, step by step, until you see somewhere it does something that you don't think is right. Finally, fix the section that isn't giving you the desired result.

Looking at your unit test, the answer seems to be quite simple.
public static String combine(List<String> fragments) {
StringBuilder combined = new StringBuilder();
for (String fragment : fragments) {
if (combined.length() == 0) {
combined.append(fragment);
} else if (combined.charAt(combined.length() - 1) == fragment.charAt(0)) {
combined.append(fragment.substring(1));
} else {
combined.append(" " + fragment);
}
}
return combined.toString();
}
But looking at your inquisition example, you might be looking for something like this:
public static String combine(List<String> fragments) {
StringBuilder combined = new StringBuilder();
for (String fragment : fragments) {
if (combined.length() == 0) {
combined.append(fragment);
} else if (combined.charAt(combined.length() - 1) == fragment.charAt(0)) {
int i = 1;
while (i < fragment.length() && i < combined.length() && combined.charAt(combined.length() - i - 1) == fragment.charAt(i))
i++;
combined.append(fragment.substring(i));
} else {
combined.append(" " + fragment);
}
}
return combined.toString();
}
But note that for your test, it will generate aws poq which seems to be logical.
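The combination rule from the question ("sad" and "den" give "saden", "fat" and "cat" cannot be combined) can be sketched as a small helper that looks for the longest suffix of the left fragment that is also a prefix of the right one. The class and method names are hypothetical, and this covers only the pairwise merge; it does not search for the longest chain or detect the "infinite" case.

```java
final class FragmentOverlap {
    private FragmentOverlap() {}

    /** Length of the longest suffix of left that is also a prefix of right. */
    static int overlapLength(String left, String right) {
        int max = Math.min(left.length(), right.length());
        for (int k = max; k > 0; k--) {
            // Compare the last k chars of left with the first k chars of right.
            if (left.regionMatches(left.length() - k, right, 0, k)) {
                return k;
            }
        }
        return 0;
    }

    /** Merges two fragments on their overlap, or returns null if there is none. */
    static String merge(String left, String right) {
        int k = overlapLength(left, right);
        return k == 0 ? null : left + right.substring(k);
    }
}
```

With this helper, merge("aw", "was") yields "awas" and merge("poq", "qo") yields "poqo", which matches the sample output in the question.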

Using Files.lines with .map(line -> line.split("multiple delimiters"))

I have an input file with the following format:
Ontario:Brampton:43° 41' N:79° 45' W
Ontario:Toronto:43° 39' N:79° 23' W
Quebec:Montreal:45° 30' N:73° 31' W
...
I have a class named City where the values will go.
example:
Province: Ontario
City: Brampton
LatDegrees: 43
LatMinutes: 41
LatDirection: N
LongDegrees: 79 .... etc
I have already completed a method that parses this out correctly, but I'm trying to learn whether this can be done better with Java 8 using streams and lambdas.
If I start with the following:
Files.lines(Paths.get(inputFile))
.map(line -> line.split("\\b+")) //this delimits everything
//.filter(x -> x.startsWith(":"))
.flatMap(Arrays::stream)
.forEach(System.out::println);
Can someone please help me reproduce the following?
private void parseLine(String data) {
int counter1 = 1; //1-2 province or city
int counter2 = 1; //1-2 LatitudeDirection,LongitudeDirection
int counter3 = 1; //1-4 LatitudeDegrees,LatitudeMinutes,LongitudeDegrees,LongitudeMinutes
City city = new City(); //create City object
//String read = Arrays.toString(data); //convert array element to String
String[] splited = data.split(":"); //set delimiter
for (String part : splited) {
//System.out.println(part);
char firstChar = part.charAt(0);
if(Character.isDigit(firstChar)){ //if the first char is a digit, then this part needs to be split again
String[] splited2 = part.split(" "); //split second time with space delimiter
for (String part2: splited2){
firstChar = part2.charAt(0);
if (Character.isDigit(firstChar)){ //if the first char is a digit, then needs trimming
String parseDigits = part2.substring(0, part2.length()-1); //trim trailing degrees or radians character
switch(counter2++){
case 1:
city.setLatitudeDegrees(Integer.parseInt(parseDigits));
//System.out.println("LatitudeDegrees: " + city.getLatitudeDegrees());
break;
case 2:
city.setLatitudeMinutes(Integer.parseInt(parseDigits));
//System.out.println("LatitudeMinutes: " + city.getLatitudeMinutes());
break;
case 3:
city.setLongitudeDegrees(Integer.parseInt(parseDigits));
//System.out.println("LongitudeDegrees: " + city.getLongitudeDegrees());
break;
case 4:
city.setLongitudeMinutes(Integer.parseInt(parseDigits));
//System.out.println("LongitudeMinutes: " + city.getLongitudeMinutes());
counter2 = 1; //reset counter2
break;
}
}else{
if(counter3 == 1){
city.setLatitudeDirection(part2.charAt(0));
//System.out.println("LatitudeDirection: " + city.getLatitudeDirection());
counter3++; //increment counter3 to use longitude next
}else{
city.setLongitudeDirection(part2.charAt(0));
//System.out.println("LongitudeDirection: " + city.getLongitudeDirection());
counter3 = 1; //reset counter 3
//System.out.println("Number of cities: " + cities.size());
cities.add(city);
}
}
}
}else{
if(counter1 == 1){
city.setProvince(part);
//System.out.println("\nProvince: " + city.getProvince());
counter1++;
}else if(counter1 == 2){
city.setCity(part);
//System.out.println("City: " + city.getCity());
counter1 = 1; //reset counter1
}
}
}
}
There's probably a better solution to my parseLine() method no doubt, but I would really like to condense that as outlined above.
Thanks !!

Let’s start with some general notes.
Your sequence .map(line -> line.split("\\b+")).flatMap(Arrays::stream) isn’t recommended. These two steps will first create an array before creating another stream wrapping that array. You can skip the array step by using splitAsStream though this requires you to deal with Pattern explicitly instead of hiding it within String.split:
.flatMap(Pattern.compile("\\b+")::splitAsStream)
but note that in this case, splitting into words doesn’t really pay off.
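As a small illustration of the splitAsStream variant, here is a sketch that splits on the colon delimiter from the input format instead of on word boundaries. The class name and the choice of ":" as the pattern are assumptions made for the example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SplitAsStreamSketch {
    // Compiled once and reused for every line.
    private static final Pattern COLON = Pattern.compile(":");

    private SplitAsStreamSketch() {}

    /** Flattens all lines into one list of colon-separated tokens. */
    static List<String> tokens(Stream<String> lines) {
        // splitAsStream produces the tokens directly, with no intermediate array.
        return lines.flatMap(COLON::splitAsStream).collect(Collectors.toList());
    }
}
```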
If you want to keep your original parseLine method, you can simply do
Files.lines(Paths.get(inputFile))
.forEach(this::parseLine);
and you’re done.
But seriously, that is not a real solution. To do pattern matching, you should use a library designed for pattern matching, e.g. the regex package. You are already using it when you split via split("\\b+"), but that is far from all it can do for you.
Let’s define the pattern:
(…) forms a group that allows capturing the matching part so we can extract it for our result
[^:]* specifies a token consisting of arbitrary characters except the colon ([^:]) of arbitrary length (*)
\d+ defines a number (d = numeric digit, + = one or more)
[NS] and [WE] match a single character being either N or S, or either W or E, respectively
so the entire pattern you are looking for is
([^:]*):([^:]*):(\d+)° (\d+)' ([NS]):(\d+)° (\d+)' ([WE])
and the entire parse routine will be:
static Pattern CITY_PATTERN=Pattern.compile(
"([^:]*):([^:]*):(\\d+)° (\\d+)' ([NS]):(\\d+)° (\\d+)' ([WE])");
static City parseCity(String line) {
Matcher matcher = CITY_PATTERN.matcher(line);
if(!matcher.matches())
throw new IllegalArgumentException(line+" doesn't match "+CITY_PATTERN);
City city=new City();
city.setProvince(matcher.group(1));
city.setCity(matcher.group(2));
city.setLatitudeDegrees(Integer.parseInt(matcher.group(3)));
city.setLatitudeMinutes(Integer.parseInt(matcher.group(4)));
city.setLatitudeDirection(line.charAt(matcher.start(5)));
city.setLongitudeDegrees(Integer.parseInt(matcher.group(6)));
city.setLongitudeMinutes(Integer.parseInt(matcher.group(7)));
city.setLongitudeDirection(line.charAt(matcher.start(8)));
return city;
}
and I really hope you will never call your hard-to-read method “condensed” again…
Using the routine above, a clean Stream-based processing solution would look like
List<City> cities = Files.lines(Paths.get(inputFile))
.map(ContainingClass::parseCity).collect(Collectors.toList());
to collect a file into a new list of cities.
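One detail worth adding: the stream returned by Files.lines holds the file open until it is closed, so it is usually wrapped in try-with-resources. A minimal sketch follows, with a hypothetical helper readAll that takes the per-line parser as a parameter so it does not depend on the City class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LineReaderSketch {
    private LineReaderSketch() {}

    /** Reads a file line by line, parses each line, and closes the file afterwards. */
    static <T> List<T> readAll(Path file, Function<String, T> parser) throws IOException {
        // try-with-resources closes the underlying file even if parsing throws.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.map(parser).collect(Collectors.toList());
        }
    }
}
```

With the parse routine above, the call would presumably look like LineReaderSketch.readAll(Paths.get(inputFile), ContainingClass::parseCity).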

Java - Best Way to Convert Characters in a String (Arrays, if-then, hash-map)?

Title: (Java Beginner) - In Java, what would be the recommended way to replace a series of characters within strings?
Issue/Example: I would like certain characters within a group of strings to be replaced by other characters. (e.g. all 'a's will be replaced by 'aa' and all 'あ' characters will be replaced by 'a')
Data example:
Tammy,Tあmmy,John Jones KO'd Machida,The Drall,あい
Changed to:
Taammy,Tammy,John Johes KO'd Mあchida,The Draall,aい
I'm doing this using an if-then statement, but this isn't scalable as there are potentially hundreds of if-thens to perform. I'm currently just worried about the logic and haven't thought of how to handle the data source files yet, which will be either csv file or a flatfile format.
Question: Should I be looking at arrays? hashmaps? collections?
The current code is similar to the below, but I understand that it is inefficient. I would like to know how I might be able to do this more efficiently.
public static String formatString(String s)
{
//Declare Variables
String strInput = s;
String strChanged = "";
//Iterate through length of string
for (int i = 0; i < strInput.length(); i++)
{
if (strInput.charAt(i)=='あ')
{
strChanged = strChanged + "a";
}
else if (strInput.charAt(i)=='a')
{
strChanged = strChanged + "aa";
}
else if (strInput.charAt(i)=='c')
{
strChanged = strChanged + "k";
}
else
{
strChanged = strChanged + strInput.charAt(i);
}
}
System.out.println(strChanged);
return strChanged; //The method is declared to return a String.
}
Caveats:
-up to 200 different characters which need to be changed
-looping through potentially thousands of rows of data

Here's a solution using a HashMap to reduce the number of if statements:
String input = "Tammy,Tあmmy,John Jones KO'd Machida,The Drall,あい";
StringBuilder builder = new StringBuilder();
Map<Character, CharSequence> map = new HashMap<>();
map.put('あ', "a");
map.put('a', "aa");
map.put('c', "k");
for (char c : input.toCharArray()) {
if (map.containsKey(c)) {
builder.append(map.get(c));
} else {
builder.append(c);
}
}
System.out.println(builder.toString());

Have you checked the Java API? String.replace() and friends ought to do what you want in one or two passes.

String.replace is very inefficient. To improve performance, see Peter Lawrey's answer to Is String.replace implementation really efficient?

There are several solutions:
A different way of using your approach might be to use a switch statement.
for (int i = 0; i < strInput.length(); i++)
{
char cur = strInput.charAt(i);
switch (cur) {
case 'あ': ...
break;
case 'a': ...
break;
case 'c': ...
break;
...
}
}
System.out.println(strChanged);
Switch statements are often a good alternative to long if/else trains. Read more here: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/switch.html
A more readable solution would be to use the String.replaceAll() method for each case. The drawback to this is the runtime would be slower. A thousand lines is small enough to not make a significant difference, but it is important to keep these limitations in mind. In addition, if you replace 'あ' with 'a' then 'a' with 'aa', you might accidentally get the wrong results. Write tests!
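The ordering pitfall mentioned above is easy to reproduce. The sketch below compares chained String.replace calls with a single pass over a lookup table; the class name, the method name and the two-entry table are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class SinglePassReplace {
    private SinglePassReplace() {}

    /** Replaces each character at most once, using the table as a lookup. */
    static String replaceSinglePass(String input, Map<Character, String> table) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = table.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> table = new HashMap<>();
        table.put('\u3042', "a"); // the hiragana 'a' from the question
        table.put('a', "aa");

        String input = "\u3042a";
        // Chained: the 'a' produced by the first call is doubled by the second.
        String chained = input.replace("\u3042", "a").replace("a", "aa");
        String single = replaceSinglePass(input, table);

        System.out.println(chained); // prints aaaa
        System.out.println(single);  // prints aaa
    }
}
```

The single pass never re-examines text it has already produced, which is why it gives the intended result regardless of the order of the table entries.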

Efficient way to replace underscore with char or string

I have researched this topic for a while, but without much success. I did find the StringBuilder and it works wonders, but that's as far as I got. Here is how I got my hangman program to work like it should:
if(strGuess.equalsIgnoreCase("t")){
mainword.replace(0,1,"T");
gletters.append('T');
}
else if(strGuess.equalsIgnoreCase("e")){
mainword.replace(1,2,"E");
gletters.append('E');
}
else if(strGuess.equalsIgnoreCase("c")){
mainword.replace(2,3,"C");
gletters.append('C');
}
else if(strGuess.equalsIgnoreCase("h")){
mainword.replace(3,4,"H");
gletters.append('H');
}
else if(strGuess.equalsIgnoreCase("n")){
mainword.replace(4,5,"N");
gletters.append('N');
}
else if(strGuess.equalsIgnoreCase("o")){
mainword.replace(5,6,"O");
mainword.replace(7,8,"O");
gletters.append('O');
}
else if(strGuess.equalsIgnoreCase("l")){
mainword.replace(6,7,"L");
gletters.append('L');
}
else if(strGuess.equalsIgnoreCase("g")){
mainword.replace(8,9,"G");
gletters.append('G');
}
else if(strGuess.equalsIgnoreCase("y")){
mainword.replace(9,10,"Y");
gletters.append('Y');
}
else{
JOptionPane.showMessageDialog(null, "Sorry, that wasn't in the word!");
errors++;
gletters.append(strGuess.toUpperCase());
}
SetMain = mainword.toString();
GuessedLetters = gletters.toString();
WordLabel.setText(SetMain);
GuessedLabel.setText(GuessedLetters);
GuessText.setText(null);
GuessText.requestFocusInWindow();
However, I can't do this for EVERY letter for EVERY word, so is there a simple and efficient way to do this? What I want is to have a loop of some sort so that I would only have to use it once for whatever word. So the word could be technology (like it is above) or apple or pickles or christmas or hello or whatever.
I have tried using a for loop, and I feel the answer lies in that. And if someone could explain the charAt() method and how/where to use it, that'd be good. The closest I got to being more efficient is:
for(i = 0; i < GuessWord.length(); i++) {
if (GuessWord.charAt(i) == guess2) {
mainword.replace(i,i,strGuess.toUpperCase());
}
So if you could use that as a basis and go off of it, like fix it? Or tell me something I haven't thought of.

It's a good question. There's clearly repeated code, so how do you replace all that with something reusable? Actually, you can dispense with all of your code.
That whole code block can be replaced by just one line (that works for every word)!
String word = "TECHNOLOGY"; // This is the word the user must guess
mainword = word.replaceAll("[^" + gletters + "]", "_");
This uses replaceAll() with a regex that means "any letter not already guessed" and replaces it with an underscore character "_". Note that Strings are immutable, and the replaceAll() method returns the modified String - it doesn't modify the String it is called on.
Here's some test code to show it in action:
public static void main(String[] args) {
String word = "TECHNOLOGY"; // what the user must guess
StringBuilder gletters = new StringBuilder("GOTCHA"); // letters guessed
String mainword = word.replaceAll("[^" + gletters + "]", "_");
System.out.println(mainword);
}
Output:
T_CH_O_OG_
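Since the question also asked about charAt(), here is a loop-based sketch of the same masking idea. It is an alternative to the regex, not the answer's method, and the names HangmanMask and mask are made up for the example. One practical difference: with no letters guessed yet, the regex above becomes "[^]", which is not a valid Java pattern, whereas the loop simply masks every letter.

```java
final class HangmanMask {
    private HangmanMask() {}

    /** Shows guessed letters in upper case and hides all others behind '_'. */
    static String mask(String word, String guessedLetters) {
        String guessed = guessedLetters.toUpperCase();
        StringBuilder shown = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            // charAt(i) returns the single character at position i (0-based).
            char c = Character.toUpperCase(word.charAt(i));
            shown.append(guessed.indexOf(c) >= 0 ? c : '_');
        }
        return shown.toString();
    }
}
```

Calling HangmanMask.mask("TECHNOLOGY", "GOTCHA") gives T_CH_O_OG_, the same output as the regex version.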

Retrieve method source code from class source code file

I have a String that contains the source code of a class, and another String that contains the full name of a method in this class. The method name is e.g.
public void (java.lang.String test)
Now I want to retrieve the source code of this method from the string with the class's source code. How can I do that? With String#indexOf(methodName) I can find the start of the method source code, but how do I find the end?
====EDIT====
I used the count curly-braces approach:
internal void retrieveSourceCode()
{
int startPosition = parentClass.getSourceCode().IndexOf(this.getName());
if (startPosition != -1)
{
String subCode = parentClass.getSourceCode().Substring(startPosition, parentClass.getSourceCode().Length - startPosition);
for (int i = 0; i < subCode.Length; i++)
{
String c = subCode.Substring(0, i);
int open = c.Split('{').Count() - 1;
int close = c.Split('}').Count() - 1;
if (open == close && open != 0)
{
sourceCode = c;
break;
}
}
}
Console.WriteLine("SourceCode for " + this.getName() + "\n" + sourceCode);
}
This works more or less fine. However, if a method is defined without a body, it fails. Any hints on how to solve that?

Counting braces and stopping when the count decreases to 0 is indeed the way to go. Of course, you need to take into account braces that appear as literals and should thus not be counted, e.g. braces in comments and strings.
Overall this is kind of a thankless endeavour, comparable in complexity to say, building a command line parser if you want to get it working really reliably. If you know you can get away with it you could cut some corners and just count all the braces, although I do not recommend it.
Update:
Here's some sample code to do the brace counting. As I said, this is a thankless job and there are tons of details you have to get right (in essence, you're writing a mini-lexer). It's in C#, as this is the closest to Java I can write code in with confidence.
The code below is not complete and probably not 100% correct (for example: verbatim strings in C# do not allow spaces between the @ and the opening quote, but did I know that for a fact or did I just forget about it?)
// sourceCode is a string containing all the source file's text
var sourceCode = "...";
// startIndex is the index of the char AFTER the opening brace
// for the method we are interested in
var methodStartIndex = 42;
var openBraces = 1;
var insideLiteralString = false;
var insideVerbatimString = false;
var insideBlockComment = false;
var lastChar = ' '; // White space is ignored by the C# parser,
// so a space is a good "neutral" character
for (var i = methodStartIndex; openBraces > 0; ++i) {
var ch = sourceCode[i];
switch (ch) {
case '{':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
++openBraces;
}
break;
case '}':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
--openBraces;
}
break;
case '"':
if (insideBlockComment) {
continue;
}
if (insideLiteralString) {
// "Step out" of the string if this is the closing quote
insideLiteralString = lastChar == '\\'; // stay inside only if the quote is escaped
}
else if (insideVerbatimString) {
// If this quote is part of a two-quote pair, do NOT step out
// (it means the string contains a literal quote)
// This can throw, but only for source files with syntax errors
// I'm ignoring this possibility here...
var nextCh = sourceCode[i + 1];
if (nextCh == '"') {
++i; // skip that next quote
}
else {
insideVerbatimString = false;
}
}
else {
if (lastChar == '@') {
insideVerbatimString = true;
}
else {
insideLiteralString = true;
}
}
break;
case '/':
if (insideLiteralString || insideVerbatimString) {
continue;
}
// TODO: parse this
// It can start a line comment, if followed by /
// It can start a block comment, if followed by *
// It can end a block comment, if preceded by *
// Line comments are intended to be handled by just incrementing i
// until you see a CR and/or LF, hence no insideLineComment flag.
break;
}
lastChar = ch;
}
// From the values of methodStartIndex and i we can now do sourceCode.Substring and get the method source
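
For comparison, here is a compact Java sketch of the same brace-counting idea. It is not a translation of the C# code above: Java has no verbatim strings, and this version also returns -1 when a semicolon is reached before any opening brace, which is one possible way to handle the "method without body" case from the question. The class and method names are hypothetical, and it does not handle Java text blocks (triple-quoted strings).

```java
final class MethodBodyFinder {
    private MethodBodyFinder() {}

    /**
     * Scans from 'from' and returns the index just past the '}' that closes the
     * first brace block found, or -1 if a ';' comes first (no body) or the
     * braces never balance. Comments, string and char literals are skipped.
     */
    static int endOfBody(String src, int from) {
        final int n = src.length();
        int depth = 0;
        int i = from;
        while (i < n) {
            char ch = src.charAt(i);
            if (ch == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                int newline = src.indexOf('\n', i);      // line comment
                i = (newline < 0) ? n : newline + 1;
                continue;
            }
            if (ch == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                int end = src.indexOf("*/", i + 2);      // block comment
                i = (end < 0) ? n : end + 2;
                continue;
            }
            if (ch == '"' || ch == '\'') {
                i++;                                     // skip the opening quote
                while (i < n && src.charAt(i) != ch) {
                    if (src.charAt(i) == '\\') {
                        i++;                             // skip the escaped character
                    }
                    i++;
                }
                i++;                                     // skip the closing quote
                continue;
            }
            if (ch == ';' && depth == 0) {
                return -1;                               // declaration without a body
            }
            if (ch == '{') {
                depth++;
            } else if (ch == '}') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
                if (depth < 0) {
                    return -1;                           // stray closing brace
                }
            }
            i++;
        }
        return -1;
    }
}
```

Given the index where the method name starts, src.substring(start, MethodBodyFinder.endOfBody(src, start)) would then cover the signature and body, provided the result is not -1.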

Have a look at:- Parser for C#
It recommends using NRefactory to parse and tokenise source code, you should be able to use that to navigate your class source and pick out methods.

You will probably have to know the order in which the methods are listed in the code file. That way, you can look for the method's closing brace }, which should be right above the start of the next method.
So your code might look like:
nStartOfMethod = String.indexOf(methodName)
nStartOfNextMethod = String.indexOf(NextMethodName)
Look for .LastIndexOf(yourMethodTerminator /*probably a}*/,...) between a string of nStartOfMethod and nStartOfNextMethod
In this case, if you don't know the order of the methods, you might end up skipping over a method in between when looking for the ending brace.
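The steps above could be sketched in Java roughly as follows. The class and method names are hypothetical, and the sketch assumes the next method's name is known and that no comment containing a closing brace sits between the two methods.

```java
final class NextMethodBoundary {
    private NextMethodBoundary() {}

    /**
     * Returns the text from the start of methodName up to the last '}' that
     * appears before nextMethodName, or null if either name is not found.
     */
    static String sliceMethod(String src, String methodName, String nextMethodName) {
        int start = src.indexOf(methodName);
        if (start < 0) {
            return null;
        }
        // Search for the next method only after the current one begins.
        int next = src.indexOf(nextMethodName, start + methodName.length());
        if (next < 0) {
            return null;
        }
        // Walk backwards from the next method to the nearest closing brace.
        int close = src.lastIndexOf('}', next);
        if (close < start) {
            return null;
        }
        return src.substring(start, close + 1);
    }
}
```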
