Right now, to convert HTML escape characters to a readable String, I use this method:
public static String unescapeHTML(String text) {
return text
.replace("&trade;", "™")
.replace("&euro;", "€")
.replace("&#32;", " ")
.replace("&nbsp;", " ")
.replace("&#33;", "!")
.replace("&#34;", "\"")
.replace("&quot;", "\"")
.replace("&#35;", "#")
.replace("&#36;", "$")
.replace("&#37;", "%")
.replace("&#38;", "&")
//and the rest of HTML escape characters
.replace("&amp;", "&");
}
My goal is not to use any external library like Apache (class StringUtils), etc.
Because the list is quite long - more than 300 chars - it would be nice to know what would be the fastest way to replace them?
You can use Pattern and Matcher. If you want to avoid recalculating the buffer length, you could also keep the length difference of each replacement in some data structure, like { -4, -4, 0, -4 }, and use that instead of computing it at run time. Since StringBuffer.length() just returns an instance variable, I used the buffer length here.
private final static Pattern MY_PATTERN = Pattern.compile("\\&(.*?)\\;");
private final static HashMap<String, String> patterns = new HashMap<>();
static{
patterns.put("&amp;", "&");
patterns.put("&#33;", "!");
patterns.put("&#32;", "thick");
patterns.put("&#36;", "$");
}
public static StringBuffer escapeString(String text){
StringBuffer buffer = new StringBuffer(text);
Matcher m = MY_PATTERN.matcher(text);
int modifiedLength = 0;
while (m.find()) {
String replacement = patterns.get(m.group());
if (replacement == null) {
continue; // unknown entity: leave it unchanged instead of failing on null
}
int tmpLength = buffer.length();
// The matcher's offsets refer to the original text, so shift them by how much the buffer has shrunk so far
buffer.replace(m.start()-modifiedLength, m.end()-modifiedLength, replacement);
modifiedLength = modifiedLength + tmpLength-buffer.length();
}
return buffer;
}
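The same idea can also be written with Matcher.appendReplacement, which avoids the offset bookkeeping. This is only a sketch under assumptions: the class name EntityUnescaper, the small NAMED table and the decodeNumeric helper are made up for illustration, and a real table would need all the named entities you care about. StringBuffer is used because appendReplacement only accepts a StringBuffer on older Java versions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original posts.
final class EntityUnescaper {
    // "&name;" or "&#digits;"; group 1 is the part between '&' and ';'
    private static final Pattern ENTITY = Pattern.compile("&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    // Illustrative subset only; extend with the entities you need.
    private static final Map<String, String> NAMED = new HashMap<String, String>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("nbsp", "\u00a0");
    }

    static String unescape(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            String body = m.group(1);
            String replacement = body.charAt(0) == '#'
                    ? decodeNumeric(body.substring(1))
                    : NAMED.get(body);
            if (replacement == null) {
                replacement = m.group(); // unknown entity: keep the original text
            }
            // quoteReplacement stops '$' and '\' from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the character for a decimal code, or null if the code is not usable.
    private static String decodeNumeric(String digits) {
        try {
            int codePoint = Integer.parseInt(digits);
            if (Character.isValidCodePoint(codePoint)) {
                return new String(Character.toChars(codePoint));
            }
        } catch (NumberFormatException e) {
            // too many digits for an int: fall through and report "unknown"
        }
        return null;
    }
}
```

Because the input is scanned once, text produced by a replacement is never scanned again, so "&amp;amp;" becomes "&amp;" rather than "&".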
I have decided to do it this way:
private static final Map<Integer, Character> iMap = new HashMap<>();
static {//Numeric code, e.g. 32 for &#32;
iMap.put(32, ' ');
iMap.put(33, '!');
iMap.put(34, '\"');
iMap.put(35, '#');
iMap.put(36, '$');
iMap.put(37, '%');
iMap.put(38, '&');
//...
}
private static final Map<String, Character> sMap = new HashMap<>();
static {//Entity Name
sMap.put("&larr;", '←');
sMap.put("&uarr;", '↑');
sMap.put("&rarr;", '→');
sMap.put("&darr;", '↓');
sMap.put("&harr;", '↔');
sMap.put("&spades;", '♠');
sMap.put("&clubs;", '♣');
sMap.put("&hearts;", '♥');
//...
}
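public static String unescapeHTML(String str) {
StringBuilder sb = new StringBuilder(),
tmp = new StringBuilder();
StringReader sr = new StringReader(str);
boolean esc = false;
try {
int i;
while ((i = sr.read()) != -1) {
char c = (char) i;
if (c == '&') {
sb.append(tmp);//A previous '&' did not start an entity, keep it unchanged
tmp.setLength(0);
tmp.append(c);
esc = true;
} else if (esc) {
tmp.append(c);
if (c == ';') {
esc = false;
Character decoded = null;
if (tmp.length() > 2 && tmp.charAt(1) == '#') {
try {
//length(), not capacity(): take the digits between "&#" and ";"
decoded = iMap.get(Integer.parseInt(tmp.substring(2, tmp.length() - 1)));
} catch (NumberFormatException ex) {
//Not a decimal code, leave unchanged
}
} else {
decoded = sMap.get(tmp.toString());
}
if (decoded != null) {
sb.append(decoded.charValue());
} else {
sb.append(tmp);//Unknown entity, leave unchanged instead of appending "null"
}
tmp.setLength(0);
}
} else {
sb.append(c);
}
}
sb.append(tmp);//Flush an unterminated "&..." at the end of the input
sr.close();
} catch (IOException ex) {
Logger.getLogger(UnescapeHTML.class.getName()).log(Level.SEVERE, null, ex);
}
return sb.toString();
}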
Works perfectly and the code is simple. Still testing. It would be nice to hear your comments.
I want to write java code to convert left side strings to right ones.
1234_hello -- 1234_Hello
hello Data -- Hello Data
hELLO data -- Hello data
1234hEllo -- 1234Hello
heLLO1234hEllo -- Hello1234hEllo
$hello -- $Hello
Could you please help with the solution?
Thank you!
Here is a solution:
public static void main(String[] args) {
try {
System.out.println(convertString("1234_hello"));
System.out.println(convertString("hello Data"));
System.out.println(convertString("hELLO data"));
System.out.println(convertString("1234hEllo"));
System.out.println(convertString("heLLO1234hEllo"));
System.out.println(convertString("$hello"));
System.out.println(convertString("$1234hEllo_TTHjjZ"));
}
catch (Exception e) {
e.printStackTrace();
}
}
private static String convertString(String string) {
String result = string;
final String regex1 = "^([^a-zA-Z]+)([a-zA-Z])([a-zA-Z]*)([^a-zA-Z].*)$";
final String regex2 = "^([a-zA-Z])([a-zA-Z]*)([^a-zA-Z].*)$";
final String regex3 = "^([^a-zA-Z]+)([a-zA-Z])([a-zA-Z]*)$";
final Pattern pattern1 = Pattern.compile(regex1, Pattern.MULTILINE);
final Pattern pattern2 = Pattern.compile(regex2, Pattern.MULTILINE);
final Pattern pattern3 = Pattern.compile(regex3, Pattern.MULTILINE);
Matcher matcher1 = pattern1.matcher(string);
Matcher matcher2 = pattern2.matcher(string);
Matcher matcher3 = pattern3.matcher(string);
if (matcher1.find()) {
result = matcher1.group(1) + matcher1.group(2).toUpperCase() + matcher1.group(3).toLowerCase() + matcher1.group(4);
}
else if (matcher2.find()) {
result = matcher2.group(1).toUpperCase() + matcher2.group(2).toLowerCase() + matcher2.group(3);
}
else if (matcher3.find()) {
result = matcher3.group(1) + matcher3.group(2).toUpperCase() + matcher3.group(3).toLowerCase();
}
return result;
}
The result is as expected:
1234_Hello
Hello Data
Hello data
1234Hello
Hello1234hEllo
$Hello
$1234Hello_TTHjjZ
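Here is another solution. Note that it capitalizes the first letter of every run of letters and lowercases the rest, so "hELLO data" becomes "Hello Data" and "heLLO1234hEllo" becomes "Hello1234Hello":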
public static String toCamelCase(String input) {
StringBuilder output = new StringBuilder();
for(int i = 0; i < input.length(); i++) {
if(i == 0) {
output.append(Character.toUpperCase(input.charAt(i)));
continue;
}
if(Character.isLetter(input.charAt(i))) {
if(Character.isLetter(input.charAt(i-1))) {
output.append(Character.toLowerCase(input.charAt(i)));
} else {
output.append(Character.toUpperCase(input.charAt(i)));
}
} else {
output.append(input.charAt(i));
}
}
return output.toString();
}
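The expected outputs in the question only change the first run of letters and leave everything after it alone. A single-pass sketch of that rule, without regular expressions, might look like the following. The class and method names are made up for illustration, and Character.isLetter also accepts non-ASCII letters, unlike [a-zA-Z].

```java
// Hypothetical helper, not from the original posts.
final class FirstWordCapitalizer {
    static String capitalizeFirstWord(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        // 1. Copy any leading non-letters unchanged.
        while (i < input.length() && !Character.isLetter(input.charAt(i))) {
            out.append(input.charAt(i++));
        }
        // 2. Upper-case the first letter, if there is one.
        if (i < input.length()) {
            out.append(Character.toUpperCase(input.charAt(i++)));
        }
        // 3. Lower-case the rest of that first run of letters.
        while (i < input.length() && Character.isLetter(input.charAt(i))) {
            out.append(Character.toLowerCase(input.charAt(i++)));
        }
        // 4. Copy the remainder untouched.
        out.append(input, i, input.length());
        return out.toString();
    }
}
```

Unlike the three-regex version, this also handles an input that consists only of letters, such as "hello".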
I have written the following function, which gets rid of characters in a string that can't be represented in ISO-8859-1:
public static String convert(String str) {
if (str.length()==0) return str;
str = str.replace("–","-");
str = str.replace("“","\"");
str = str.replace("”","\"");
return new String(str.getBytes(),iso88591charset);
}
My problem is this doesn't have the behavior I require.
When it comes across a character that has no representation it is converted to multiple bytes. I want that character to be simply omitted from the result.
I would also like to somehow not have to have all those replace commands.
I have been researching CharsetEncoder. It has methods like:
CharsetEncoder encoder = iso88591charset.newEncoder();
encoder.onMalformedInput(CodingErrorAction.IGNORE);
encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
which seem to be what I want, but I have failed to even write a function that mimics what I already have using a charset encoder, let alone get to set those options.
Also I am restricted to Java 6 :(
Update:
I came up with a nasty solution for this, but there must be a better way to do it:
public static String convert(String str) {
if (str.length()==0) return str;
str = str.replace("–","-");
str = str.replace("“","\"");
str = str.replace("”","\"");
String str2 = "";
for (int c=0;c<str.length();c++) {
String cur = (new Character(str.charAt(c))).toString();
if (cur.equals(new String(cur.getBytes(),iso88591charset))) str2 += cur;
}
return new String(str2.getBytes(),iso88591charset);
}
One possible way could be:
// U+2126 - omega sign
// U+2013 - en dash
// U+201c - left double quotation mark
// U+201d - right double quotation mark
String str = "\u2126\u2013\u201c\u201d";
System.out.println("original = " + str);
str = str.replace("–", "-");
str = str.replace("“", "\"");
str = str.replace("”", "\"");
System.out.println("replaced = " + str);
StringBuilder sb = new StringBuilder();
for (char c : str.toCharArray()) {
if (c <= '\u00ff') {
sb.append(c);
}
}
System.out.println("stripped = " + sb);
Output:
original = Ω–“”
replaced = Ω-""
stripped = -""
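The CharsetEncoder options mentioned in the question can do the same filtering. The following is a sketch that should also compile on Java 6; the class and method names are hypothetical. The encoder is told to ignore anything it cannot map, and the resulting bytes are decoded back with the same charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not from the original posts.
final class Latin1Filter {
    private static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    static String stripUnmappable(String str) {
        // Encoders are not thread-safe, so create a fresh one per call.
        CharsetEncoder encoder = LATIN1.newEncoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            // Characters without an ISO-8859-1 representation are silently skipped.
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(str));
            return LATIN1.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE, but encode() declares the exception.
            throw new IllegalStateException(e);
        }
    }
}
```

Any replacements you still want, such as turning an en dash into "-", have to run before the call, because the encoder only drops characters and never substitutes them.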
I am working on a very simple application for a website, just a basic desktop application.
So I've figured out how to grab all of the JSON Data I need, and if possible, I am trying to avoid the use of external libraries to parse the JSON.
Here is what I am doing right now:
package me.thegreengamerhd.TTVPortable;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import me.thegreengamerhd.TTVPortable.Utils.Messenger;
public class Channel
{
URL url;
String data;
String[] dataArray;
String name;
boolean online;
int viewers;
int followers;
public Channel(String name)
{
this.name = name;
}
public void update() throws IOException
{
// grab all of the JSON data from selected channel, if channel exists
try
{
url = new URL("https://api.twitch.tv/kraken/channels/" + name);
URLConnection connection = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
data = new String(in.readLine());
in.close();
// clean up data a little, into an array
dataArray = data.split(",");
}
// channel does not exist, throw exception and close client
catch (Exception e)
{
Messenger.sendErrorMessage("The channel you have specified is invalid or corrupted.", true);
e.printStackTrace();
return;
}
StringBuilder sb = new StringBuilder();
for (int i = 0; i < dataArray.length; i++)
{
sb.append(dataArray[i] + "\n");
}
System.out.println(sb.toString());
}
}
So here is what is printed when I enter an example channel (which grabs data correctly)
{"updated_at":"2013-05-24T11:00:26Z"
"created_at":"2011-06-28T07:50:25Z"
"status":"HD [XBOX] Call of Duty Black Ops 2 OPEN LOBBY"
"url":"http://www.twitch.tv/zetaspartan21"
"_id":23170407
"game":"Call of Duty: Black Ops II"
"logo":"http://static-cdn.jtvnw.net/jtv_user_pictures/zetaspartan21-profile_image-121d2cb317e8a91c-300x300.jpeg"
"banner":"http://static-cdn.jtvnw.net/jtv_user_pictures/zetaspartan21-channel_header_image-7c894f59f77ae0c1-640x125.png"
"_links":{"subscriptions":"https://api.twitch.tv/kraken/channels/zetaspartan21/subscriptions"
"editors":"https://api.twitch.tv/kraken/channels/zetaspartan21/editors"
"commercial":"https://api.twitch.tv/kraken/channels/zetaspartan21/commercial"
"teams":"https://api.twitch.tv/kraken/channels/zetaspartan21/teams"
"features":"https://api.twitch.tv/kraken/channels/zetaspartan21/features"
"videos":"https://api.twitch.tv/kraken/channels/zetaspartan21/videos"
"self":"https://api.twitch.tv/kraken/channels/zetaspartan21"
"follows":"https://api.twitch.tv/kraken/channels/zetaspartan21/follows"
"chat":"https://api.twitch.tv/kraken/chat/zetaspartan21"
"stream_key":"https://api.twitch.tv/kraken/channels/zetaspartan21/stream_key"}
"name":"zetaspartan21"
"delay":0
"display_name":"ZetaSpartan21"
"video_banner":"http://static-cdn.jtvnw.net/jtv_user_pictures/zetaspartan21-channel_offline_image-b20322d22543539a-640x360.jpeg"
"background":"http://static-cdn.jtvnw.net/jtv_user_pictures/zetaspartan21-channel_background_image-587bde3d4f90b293.jpeg"
"mature":true}
Initializing User Interface - JOIN
All of this is correct. Now what I want to do is grab, for example, the 'mature' tag and its value. So when I grab it, it would be as simple as:
// pseudo code
if(mature /*this is a boolean */ == true){ // do stuff}
So if you don't understand, I need to split away the quotes and the colon between the values to retrieve a key and a value.
It's doable with the following code:
public static Map<String, Object> parseJSON (String data) throws ParseException {
if (data==null)
return null;
final Map<String, Object> ret = new HashMap<String, Object>();
data = data.trim();
if (!data.startsWith("{") || !data.endsWith("}"))
throw new ParseException("Missing '{' or '}'.", 0);
data = data.substring(1, data.length()-1);
final String [] lines = data.split("[\r\n]");
for (int i=0; i<lines.length; i++) {
String line = lines[i];
if (line.isEmpty())
continue;
line = line.trim();
if (line.indexOf(":")<0)
throw new ParseException("Missing ':'.", 0);
String key = line.substring(0, line.indexOf(":"));
String value = line.substring(line.indexOf(":")+1);
if (key.startsWith("\"") && key.endsWith("\"") && key.length()>2)
key = key.substring(1, key.length()-1);
if (value.startsWith("{"))
while (i+1<lines.length && !value.endsWith("}")) // bound by the number of lines, not the length of the current line
value = value + "\n" + lines[++i].trim();
if (value.startsWith("\"") && value.endsWith("\"") && value.length()>2)
value = value.substring(1, value.length()-1);
Object mapValue = value;
if (value.startsWith("{") && value.endsWith("}"))
mapValue = parseJSON(value);
else if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false"))
mapValue = Boolean.valueOf(value);
else {
try {
mapValue = Integer.parseInt(value);
} catch (NumberFormatException nfe) {
try {
mapValue = Long.parseLong(value);
} catch (NumberFormatException nfe2) {}
}
}
ret.put(key, mapValue);
}
return ret;
}
You can call it like this:
try {
Map<String, Object> ret = parseJSON(sb.toString());
if(((Boolean)ret.get("mature")) == true){
System.out.println("mature is true !");
}
} catch (ParseException e) {
}
But really, you shouldn't do this. Use an existing JSON parser instead, because this code will break on any complex or invalid JSON data (like a ":" in the key), and if you want to build a true JSON parser by hand, it will take a lot more code and debugging!
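If an external library really is off the table, reading the text character by character is more robust than splitting on commas or line breaks, since commas and colons inside string values no longer matter. The following is a minimal recursive-descent sketch, not a validated JSON implementation: the MiniJson class is hypothetical, it is lenient about number syntax, and it does no special handling of surrogate pairs.

```java
import java.util.*;

// Hypothetical helper for illustration: a minimal recursive-descent JSON reader.
final class MiniJson {
    private final String src;
    private int pos;
    private MiniJson(String src) { this.src = src; }

    // Objects become Map, arrays List, numbers Long or Double; also String, Boolean, null.
    static Object parse(String json) {
        MiniJson p = new MiniJson(json);
        Object value = p.readValue();
        p.skipWhitespace();
        if (p.pos != json.length()) throw p.error("trailing characters");
        return value;
    }

    private Object readValue() {
        skipWhitespace();
        char c = peek();
        if (c == '{') return readObject();
        if (c == '[') return readArray();
        if (c == '"') return readString();
        if (src.startsWith("true", pos)) { pos += 4; return Boolean.TRUE; }
        if (src.startsWith("false", pos)) { pos += 5; return Boolean.FALSE; }
        if (src.startsWith("null", pos)) { pos += 4; return null; }
        return readNumber();
    }

    private Map<String, Object> readObject() {
        Map<String, Object> map = new LinkedHashMap<String, Object>();
        pos++; // consume '{'
        skipWhitespace();
        if (peek() == '}') { pos++; return map; }
        while (true) {
            skipWhitespace();
            if (peek() != '"') throw error("expected a string key");
            String key = readString();
            skipWhitespace();
            expect(':');
            map.put(key, readValue());
            skipWhitespace();
            if (peek() != ',') { expect('}'); return map; }
            pos++; // consume ','
        }
    }

    private List<Object> readArray() {
        List<Object> list = new ArrayList<Object>();
        pos++; // consume '['
        skipWhitespace();
        if (peek() == ']') { pos++; return list; }
        while (true) {
            list.add(readValue());
            skipWhitespace();
            if (peek() != ',') { expect(']'); return list; }
            pos++; // consume ','
        }
    }

    private String readString() {
        StringBuilder sb = new StringBuilder();
        pos++; // consume the opening quote
        while (pos < src.length()) {
            char c = src.charAt(pos++);
            if (c == '"') return sb.toString();
            if (c != '\\') { sb.append(c); continue; }
            if (pos >= src.length()) break;
            char e = src.charAt(pos++);
            int simple = "nrtbf".indexOf(e);
            if (simple >= 0) {
                sb.append("\n\r\t\b\f".charAt(simple));
            } else if (e == 'u') {
                if (pos + 4 > src.length()) throw error("truncated unicode escape");
                sb.append((char) Integer.parseInt(src.substring(pos, pos + 4), 16));
                pos += 4;
            } else {
                sb.append(e); // escaped quote, backslash or slash
            }
        }
        throw error("unterminated string");
    }

    private Number readNumber() {
        int start = pos;
        while (pos < src.length() && "+-0123456789.eE".indexOf(src.charAt(pos)) >= 0) pos++;
        String text = src.substring(start, pos);
        if (text.length() == 0) throw error("unexpected character");
        if (text.matches("-?\\d+")) return Long.valueOf(text); // may throw NumberFormatException
        return Double.valueOf(text);
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }
    private void expect(char c) { if (peek() != c) throw error("expected '" + c + "'"); pos++; }
    private void skipWhitespace() { while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++; }
    private IllegalArgumentException error(String msg) { return new IllegalArgumentException(msg + " at offset " + pos); }
}
```

With this, the check from the question would be something like Boolean.TRUE.equals(((Map<?, ?>) MiniJson.parse(data)).get("mature")), applied to the raw response string rather than to the comma-split array.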
This is a parser for a simple JSON string:
public static HashMap<String, String> parseEasyJson(String json) {
final String regex = "([^{}: ]*?):(\\{.*?\\}|\".*?\"|[^:{}\" ]*)";
json = json.replaceAll("\n", "");
Matcher m = Pattern.compile(regex).matcher(json);
HashMap<String, String> map = new HashMap<>();
while (m.find())
map.put(m.group(1), m.group(2));
return map;
}
OK, I have a method that replaces text. When I use string.replace() it works, but when I switch to replaceFirst() as shown below it no longer works. What am I doing wrong or missing here?
private void acceptAccButtonActionPerformed(java.awt.event.ActionEvent evt) {
int selectedAcTableItem = validAcTable.getSelectedRow();
int selectedSugTableItem = suggestedAcTable.getSelectedRow();
if (selectedAcTableItem > 0) {
String acNameDefthmlText = htmlText;
String parensName = "";
String acName = validAcTable.getValueAt(selectedAcTableItem, 0).toString();
String acDef = validAcTable.getValueAt(selectedAcTableItem, 1).toString();
String acSent = validAcTable.getValueAt(selectedAcTableItem, 2).toString();
StringBuilder acBuilder = new StringBuilder(acDef);
acBuilder.append(" (").append(acName).append(")");
if (!acDef.equals("")) {
parensName = " (" + acName + ")";
if (htmlText.contains(acName) && !htmlText.contains(acBuilder)){
String acReplace = acBuilder.toString();
String acOrigDefName = acDefRow + parensName;
if (htmlText.contains(acOrigDefName) && parensName.contains(acOrigName)){
acNameDefthmlText = htmlText.replaceFirst(acOrigDefName, acReplace);
} else if (htmlText.contains(acName)) {
acNameDefthmlText = htmlText.replaceFirst(acName, acReplace);
}
htmlText = acNameDefthmlText;
}
validAcTable.setValueAt(true, selectedAcTableItem, 2);
Acronym acronym = createNewAcronym(acName, acSent, acDef, true);
try {
AcronymDefinitionController.sharedInstance().writeAcronymToExcelSheet(acName, acDef);
} catch (IOException ex) {
Exceptions.printStackTrace(ex);
} catch (InvalidFormatException ex) {
Exceptions.printStackTrace(ex);
}
if (validAcTable.getRowCount() - 1 >= validAcTable.getSelectedRow() + 1) {
validAcTable.changeSelection(selectedAcTableItem + 1, 0, true, true);
}
validAcTable.repaint();
}
}
Look at the signatures of the methods in question:
replace(char oldChar,char newChar);
replace(CharSequence target, CharSequence replacement);
replaceFirst(String regex, String replacement);
As you can see, in replaceFirst the first argument is treated as a regex (regular expression), which makes a difference whenever the argument contains special characters.
For example, consider:
System.out.println("abcdab".replace("ab", "ef")); //<- replaces all
System.out.println("abcdab".replaceFirst("ab", "ef"));//<-replaces first
System.out.println("\\abcdab".replace("\\ab", "ef")); //<- literal match, prints efcdab
System.out.println("\\abcdab".replaceFirst("\\ab", "ef"));
//^ doesn't replace, as `\` is a special char in a regex (here `\a` means the bell character)
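In the question's code the search string is built with parentheses (" (" + acName + ")"), which a regex reads as a capturing group rather than as literal characters. One way to keep replaceFirst but match literally is to quote both arguments. The helper name below is made up; Pattern.quote and Matcher.quoteReplacement are standard java.util.regex methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original posts.
final class LiteralReplace {
    // Replaces only the first occurrence of target, treating both arguments as plain text.
    static String replaceFirstLiteral(String text, String target, String replacement) {
        return text.replaceFirst(
                Pattern.quote(target),                  // no regex metacharacters in the search text
                Matcher.quoteReplacement(replacement)); // no $1 or backslash handling in the replacement
    }
}
```

For example, replaceFirstLiteral(htmlText, acOrigDefName, acReplace) would look for the parentheses themselves instead of treating them as a group.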
I'm trying to create a regex pattern to match the lines in the following format:
field[bii] = float4:.4f_degree // Galactic Latitude
field[class] = int2 (index) // Browse Object Classification
field[dec] = float8:.4f_degree (key) // Declination
field[name] = char20 (index) // Object Designation
field[dircos1] = float8 // 1st Directional Cosine
I came up with this pattern, which seemed to work, then suddenly seemed NOT to work:
field\[(.*)\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\.([0-9])))
Here is the code I'm trying to use (edit: provided full method instead of excerpt):
private static Map<String, String> createColumnMap(String filename) {
// create a linked hashmap mapping field names to their column types. Use LHM because I'm picky and
// would prefer to preserve the order
Map<String, String> columnMap = new LinkedHashMap<String, String>();
// define the regex patterns
Pattern columnNamePattern = Pattern.compile(columnNameRegexPattern);
try {
Scanner scanner = new Scanner(new FileInputStream(filename));
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
if (line.indexOf("field[") != -1) {
// get the field name
Matcher fieldNameMatcher = columnNamePattern.matcher(line);
String fieldName = null;
if (fieldNameMatcher.find()) {
fieldName = fieldNameMatcher.group(1);
}
String columnName = null;
String columnType = null;
String columnPrecision = null;
String columnScale = null;
//Pattern columnTypePattern = Pattern.compile(".*(float|int|char)([0-9]|[1-9][0-9])");
Pattern columnTypePattern = Pattern.compile("field\\[(.*)\\] = (float|int|char).*([0-9]|[1-9][0-9]).*(:(\\.([0-9])))");
Matcher columnTypeMatcher = columnTypePattern.matcher(line);
System.out.println(columnTypeMatcher.lookingAt());
if (columnTypeMatcher.lookingAt()) {
System.out.println(fieldName + ": " + columnTypeMatcher.groupCount());
int count = columnTypeMatcher.groupCount();
if (count > 1) {
columnName = columnTypeMatcher.group(1);
columnType = columnTypeMatcher.group(2);
}
if (count > 2) {
columnScale = columnTypeMatcher.group(3);
}
if (count >= 6) {
columnPrecision = columnTypeMatcher.group(6);
}
}
int precision = Integer.parseInt(columnPrecision);
int scale = Integer.parseInt(columnScale);
if (columnType.equals("int")) {
if (precision <= 4) {
columnMap.put(fieldName, "INTEGER");
} else {
columnMap.put(fieldName, "BIGINT");
}
} else if (columnType.equals("float")) {
if (columnPrecision==null) {
columnMap.put(fieldName,"DECIMAL(8,4)");
} else {
columnMap.put(fieldName,"DECIMAL(" + columnPrecision + "," + columnScale + ")");
}
} else {
columnMap.put(fieldName,"VARCHAR("+columnPrecision+")");
}
}
if (line.indexOf("<DATA>") != -1) {
scanner.close();
break;
}
}
scanner.close();
} catch (FileNotFoundException e) {
}
return columnMap;
}
When I get the groupCount from the Matcher object, it says there are 6 groups. However, they aren't matching the text, so I could definitely use some help... can anyone assist?
It's not entirely clear to me what you're after but I came up with the following pattern and it accepts all of your input examples:
field\\[(.*)\\] = (float|int|char)([1-9][0-9]?)?(:\\.([0-9]))?
using this code:
String columnName = null;
String columnType = null;
String columnPrecision = null;
String columnScale = null;
// Pattern columnTypePattern =
// Pattern.compile(".*(float|int|char)([0-9]|[1-9][0-9])");
// field\[(.*)\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\.([0-9])))
Pattern columnTypePattern = Pattern
.compile("field\\[(.*)\\] = (float|int|char)([1-9][0-9]?)?(:\\.([0-9]))?");
Matcher columnTypeMatcher = columnTypePattern.matcher(line);
boolean match = columnTypeMatcher.lookingAt();
System.out.println("Match: " + match);
if (match) {
int count = columnTypeMatcher.groupCount();
if (count > 1) {
columnName = columnTypeMatcher.group(1);
columnType = columnTypeMatcher.group(2);
}
if (count > 2) {
columnScale = columnTypeMatcher.group(3);
}
if (count > 4) {
columnPrecision = columnTypeMatcher.group(5);
}
System.out.println("Name=" + columnName + "; Type=" + columnType + "; Scale=" + columnScale + "; Precision=" + columnPrecision);
}
I think the problem with your regex was it needed to make the scale and precision optional.
field\[(.*)\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\.([0-9])))
The .* is overly broad, and there is a lot of redundancy in ([0-9]|[1-9][0-9]), and I think the parenthetical group that starts with : and preceding .* should be optional.
After removing all the ambiguity, I get
field\[([^\]]*)\] = (float|int|char)(0|[1-9][0-9]*)(?:[^:]*(:(\.([0-9]+))))?
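One detail behind the original confusion: Matcher.groupCount() returns the number of capturing groups in the pattern, not the number that took part in the match, so it is the same for every line. An optional group that did not participate returns null from group(n), and that is what needs checking. The following sketch shows this with a simplified variant of the pattern; the class name, the method name and the exact regex are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original posts.
final class FieldLineParser {
    // Groups: 1 = field name, 2 = base type, 3 = size digits (optional),
    // 4 = digits after ":." (optional)
    private static final Pattern FIELD = Pattern.compile(
            "field\\[([^\\]]*)\\] = (float|int|char)([1-9][0-9]*)?(?::\\.([0-9]+))?");

    // Returns {name, type, size, decimals}, or null if the line is not a field line.
    // size and decimals are null when the line does not specify them.
    static String[] parse(String line) {
        Matcher m = FIELD.matcher(line);
        if (!m.lookingAt()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }
}
```

Callers should test the last two elements for null before passing them to Integer.parseInt.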