If I am being passed a string that contains comma delimited key-value pairs like this
seller=1000,country="canada",address="123 1st st", etc.
It seems like there must be a better way than parsing and then iterating through the result.
What is the best way to retrieve a value from this string based on the key name in Java?
Since release 10, Google Guava has provided the Splitter.MapSplitter class, which does exactly this kind of thing:
Map<String, String> params = Splitter
.on(",")
.withKeyValueSeparator("=")
.split("k1=v1,k2=v2");
You can create your own CSV parser. It's not very complicated, but there are a few corner cases to be careful with, assuming of course that you are using a standard CSV format.
But why reinvent the wheel?
You can try looking up a CSV parser like
OpenCSV
SuperCSV
Apache Commons
There are others; look around and I'm sure you will find one that suits your needs.
Usually you will want to parse the string into a map because you will be pulling various values perhaps multiple times, so it often makes sense to pay the parsing cost up-front.
If not, then here is how I would solve the problem (assuming you want to differentiate between int values and String values):
public Object pullValue(String pairs, String key) {
boolean returnString = false;
int keyStart = pairs.indexOf(key + "=");
if (keyStart < 0) {
logger.error("Key " + key + " not found in key-value pairs string");
return null;
}
int valueStart = keyStart + key.length() + 1;
if (pairs.charAt(valueStart) == '"') {
returnString = true;
valueStart++; // Skip past the quote mark
}
int valueEnd;
if (returnString) {
valueEnd = pairs.indexOf('"', valueStart);
if (valueEnd < 0) {
logger.error("Unmatched double quote mark extracting value for key " + key);
return null; // Without this, substring() below would throw
}
return pairs.substring(valueStart, valueEnd);
} else {
valueEnd = pairs.indexOf(',', valueStart);
if (valueEnd < 0) { // If this is the last key value pair in string
valueEnd = pairs.length();
}
return Integer.decode(pairs.substring(valueStart, valueEnd));
}
}
Note that this solution assumes no spaces between the key, the equals sign, and the value. If spaces are possible, you will have to write some code to skip over them.
Another solution is to use a regular expression parser. You could do something like (this is untested):
Pattern lookingForString = Pattern.compile(key + "[ \t]*=[ \t]*[\"]([^\"]+)[\"]");
Pattern lookingForInt = Pattern.compile(key + "[ \t]*=[ \t]*([^,]+)");
Matcher stringFinder = lookingForString.matcher(pairs);
Matcher intFinder = lookingForInt.matcher(pairs);
if (stringFinder.find()) {
return stringFinder.group(1);
} else if (intFinder.find()) {
return Integer.decode(intFinder.group(1));
} else {
logger.error("Could not extract value for key " + key);
return null;
}
HTH
To separate the string by commas, the other posters are correct: it is best to use a CSV parser (your own or off-the-shelf). Cases such as commas inside quotes can otherwise cause a lot of unanticipated problems.
Once you have each separate token in the form:
key = "value"
I think it is easy enough to look for the first index of '='. Then the part before that will be the key, and the part after that will be the value. Then you can store them in a Map<String, String>.
This is assuming that your keys will be simple enough, and not contain = in them etc. Sometimes it's enough to take the simple route when you can restrict the problem scope.
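As a rough illustration of the two steps described above (split into tokens, then split each token at the first '='), here is a minimal sketch. It is a hand-rolled stand-in for a real CSV parser, not a replacement for one: it assumes values are optionally wrapped in double quotes, that quotes are never escaped, and that keys contain no '='. The class and method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KeyValuePairs {

    // Splits on commas that are outside double quotes, then on the first '='.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        StringBuilder token = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // quotes delimit values and are not stored
            } else if (c == ',' && !inQuotes) {
                addPair(result, token.toString());
                token.setLength(0);
            } else {
                token.append(c);
            }
        }
        addPair(result, token.toString());
        return result;
    }

    private static void addPair(Map<String, String> result, String token) {
        int eq = token.indexOf('=');
        if (eq < 0) {
            return;                            // ignore fragments without '='
        }
        result.put(token.substring(0, eq).trim(), token.substring(eq + 1).trim());
    }

    public static void main(String[] args) {
        Map<String, String> pairs =
                parse("seller=1000,country=\"canada\",address=\"123 1st st\"");
        System.out.println(pairs.get("country")); // canada
    }
}
```

Note that trim() also removes leading and trailing spaces inside a quoted value, which may or may not be what you want.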
If you just want one value out of such a string, you can use String's indexOf() and substring() methods:
String getValue(String str, String key)
{
int keyIndex = str.indexOf(key + "=");
if(keyIndex == -1) return null;
int startIndex = str.indexOf("\"", keyIndex);
int endIndex = str.indexOf("\"", startIndex + 1); // search after the opening quote
String value = str.substring(startIndex + 1, endIndex);
return value;
}
First, you should use a CSV parsing library to parse the comma-separated values. Correctly parsing CSV data isn't as trivial as it first seems, and there are lots of good arguments for not reinventing that wheel.
This will also future-proof your code, and it is code you don't have to test or maintain.
I know the temptation to do something like data.split(","); is strong, but it is a fragile and brittle solution. For just one example, what if any of the values contain a ','?
The second thing you should do is parse the pairs. Again, the temptation to use String.split("="); will be strong, but it can be brittle and fragile if the right-hand side of the = has an = in it.
I am not a blind proponent of regular expressions, but used with restraint they can be just the right tool for the job. Here is the regular expression to parse the name value pairs.
The regular expression is ^(.*?)\s?=\s?("?([^"]*)"?|"(.*)")$. It works even when the right-hand side of the name-value pair contains multiple double quotes.
Because the first group is lazy, it matches only what is to the left of the first = as the name and everything after it as the value. It also strips the optional " from string values while still matching unquoted number values.
Given a List<String> list of the encoded name-value pairs:
final Pattern p = Pattern.compile("^(.*?)\\s?=\\s?(\"?([^\"]*)\"?|\"(.*)\")$");
final Map<String, String> map = new HashMap<String, String>(list.size());
for (final String nvp : list)
{
final Matcher m = p.matcher(nvp);
if (!m.matches()) continue; // skip entries that are not name=value
final String name = m.group(1);
// Group 3 holds a simple value; group 4 holds a quoted value with inner quotes.
final String value = m.group(3) != null ? m.group(3) : m.group(4);
map.put(name, value);
System.out.format("name = %s | value = %s\n", name, value);
}
Use yourdata.split(",") and you will get a String[]. Then call split("=") on each entry to separate the property name from its value.
Ideally, you should move this data into a Properties instance. You can then save it to and load it from XML easily, and it has other useful methods.
REM: I am assuming that you are savvy enough to understand that this solution won't work if values contain the separator (i.e., the comma) in them...
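A minimal sketch of that idea, under the same caveat (no commas inside values, and here no quote handling either). The class and method names are hypothetical; Properties.setProperty and Properties.storeToXML are the standard java.util.Properties methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PairsToProperties {

    // Naive split: only safe when values contain no commas.
    static Properties toProperties(String data) {
        Properties props = new Properties();
        for (String entry : data.split(",")) {
            String[] kv = entry.split("=", 2); // limit 2 keeps any further '=' in the value
            if (kv.length == 2) {
                props.setProperty(kv[0].trim(), kv[1].trim());
            }
        }
        return props;
    }

    // Serializes the properties using the built-in XML format.
    static String toXml(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.storeToXML(out, null);       // null means no comment element
            return out.toString("UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = toProperties("seller=1000,country=canada");
        System.out.println(props.getProperty("seller")); // 1000
        System.out.println(toXml(props));
    }
}
```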
I have an ArrayMap, of which the keys are something like tag - randomWord. I want to check if the tag part of the key matches a certain variable.
I have tried messing around with Patterns, but without success. The only way I can get this working at the moment is to iterate through all the keys in a for loop, split each key on ' - ', and compare the first part to my variable.
for (String s : testArray) {
if ((s.split("(\\s)(-)(\\s)(.*)")[0]).equals(variableA)) {
// Do stuff
}
}
This seems very convoluted to me, especially since all I need to know is whether the keySet contains the variable. I was thinking about using the contains() method and passing in (variableA + "(\\s)(-)(\\s)(.*)"), but that doesn't seem to work.
Is there a way to use the .contains() method for this case, or do I have to loop the keys manually?
You should split these tasks into two steps - first extract the tag, then compare it. Your code should look something like this:
for (String s : testArray) {
if (arrayMap.keySet().contains(extractTag(s))) {
// Do stuff
}
}
Notice that we've separated our concerns into two steps, making it easier to verify each step behaves correctly individually. So now the question is "How do we implement extractTag()?"
The ( ) symbols in a regular expression create a group match, which you can retrieve via Matcher.group() - if you only care about tag you could use a Pattern like so:
"(\\S+)\\s-\\s.*"
In which case your extractTag() method would look like:
private static final Pattern TAG_PATTERN = Pattern.compile("(\\S+)\\s-\\s.*");
private static String extractTag(String s) {
Matcher m = TAG_PATTERN.matcher(s);
if (m.matches()) {
return m.group(1);
}
throw new IllegalArgumentException(
"'" + s + "' didn't match " + TAG_PATTERN.pattern());
}
If you'd rather use String.split() you just need to define a regular expression that matches the delimiter, in this case -; you could use the following regular expression in a split() call:
"\\s-\\s"
It's often a good idea to use + after \\s to support one or more spaces, but it depends on what inputs you need to process. If you know it's always exactly one-space-followed-by-one-dash-followed-by-one-space, you could just split on:
" - "
In which case your extractTag() method would look like:
private static String extractTag(String s) {
String[] parts = s.split(" - ");
if (parts.length > 1) {
return parts[0];
}
throw new IllegalArgumentException("Could not extract tag from '" + s + "'");
}
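On the original question of whether contains() can do this: Set.contains() compares with equals(), so it cannot take a pattern, and some iteration over the keys is needed. If all you want is a yes/no answer, a prefix check avoids splitting entirely. The sketch below assumes keys always have the form "tag - randomWord" with exactly one space on each side of the dash; TagLookup and containsTag are names invented for the example.

```java
import java.util.Arrays;
import java.util.Collection;

final class TagLookup {

    // True if any key starts with "<tag> - ".
    static boolean containsTag(Collection<String> keys, String tag) {
        String prefix = tag + " - ";
        return keys.stream().anyMatch(key -> key.startsWith(prefix));
    }

    public static void main(String[] args) {
        Collection<String> keys = Arrays.asList("foo - apple", "bar - pear");
        System.out.println(containsTag(keys, "foo")); // true
        System.out.println(containsTag(keys, "fo"));  // false
    }
}
```

With a map you would pass arrayMap.keySet() as the collection.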
I've got a List containing Strings like these:
device0001;sale;2013-01-01 00:00:00;30.45
device0001;sale;2013-01-02 00:00:00;41.02
device0001;sale;2013-01-03 00:00:00;30.45
...
device0001;saleCode;2013-01-01 00:00:00;10
device0001;saleCode;2013-01-02 00:00:00;55
device0001;saleCode;2013-01-03 00:00:00;55
There are multiple devices, and multiple CodeNames and dates per device. I'd like to replace the sale CodeName with the value of the matching saleCode line.
Example of what I'd like in the end :
device0001;10;2013-01-01 00:00:00;30.45
device0001;55;2013-01-02 00:00:00;41.02
device0001;55;2013-01-03 00:00:00;30.45
The saleCode String may or may not be kept, it doesn't matter.
I've made it work with two nested for loops and ifs, but it took far too long to run.
I thought about building something like this :
Map<String(device), Map<DateTime, Map<String(element), String(value)>>>
forEach device
forEach datetime
element (Codename substring) and replace by element (Value substring)
I'm pretty sure there must be a better and/or elegant way to do this.
EDIT - Since it doesn't seem clear what I'm trying to do, here is the code with for and if (which is way too slow):
for (String line : lines) {
if (line.split(SEPARATOR)[4].equals("sale")) {
for (String codeLine : lines) {
if (codeLine.split(SEPARATOR)[5].equals(line.split(SEPARATOR)[5]) &&
codeLine.split(SEPARATOR)[1].equals(line.split(SEPARATOR)[1])&&
codeLine.split(SEPARATOR)[4].equals("saleCode")) {
line = line.replaceAll("sale", codeLine.split(SEPARATOR)[7]);
}
}
}
}
The indexes don't match my example strings only because there are other, unimportant fields: index [1] is the device number, [5] the date, [4] the type (sale, saleCode) and [7] the value.
EDIT #2
I've improved the speed like so :
MultiKeyMap<String, String> multiKeyMap = new MultiKeyMap<>();
for (String line : lines) {
if (line.split(SEPARATOR)[4].equals("saleCode")) {
String device = line.split(SEPARATOR)[1];
String date = line.split(SEPARATOR)[5];
String value = line.split(SEPARATOR)[7];
multiKeyMap.put(device, date, value);
}
}
for (int i = 0; i < lines.size(); i++) {
String code = lines.get(i).split(SEPARATOR)[4];
if (code.equals("sale")) {
String device = lines.get(i).split(SEPARATOR)[1];
String date = lines.get(i).split(SEPARATOR)[5];
String newline = lines.get(i).replaceAll("sale", multiKeyMap.get(device, date));
lines.set(i, newline);
}
}
I'll go with that for the moment, but I'm always open to advice.
If I understand your question correctly you don't need to build any maps etc.
You have a list of strings with that format.
Just go over each string and use a regular expression to replace/update each string.
Update:
Your code is slow because you are processing the list over and over for each string.
Create a hashmap based on device id.
Go over the strings in lines one by one.
Check if the string exists on hashmap.
If it does not exist, then check which type of string it is and apply the proper regex for replacement. Add the string to the hashmap.
If it does exist then update the string via a regex using the newly encountered string.
When you are done the hashmap will have the strings replaced.
Note: I am mentioning regexes because you seem to have a specific format, and the replacement might be written easily and efficiently that way. If you can't use regexes (e.g. you are not familiar with them), keep parsing each string as you are doing now. It will still be faster, because you process the list only once.
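One way to read the single-pass idea is sketched below, without regexes. It is a two-pass variant (one pass to index the saleCode values, one to rewrite the sale lines) that splits each line only once. It assumes the simplified four-field layout from the example lines (device;type;date;value) rather than the real indexes, and it returns only the rewritten sale lines. SaleCodeMapper and mapSaleCodes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SaleCodeMapper {
    private static final String SEPARATOR = ";";

    static List<String> mapSaleCodes(List<String> lines) {
        // Pass 1: split every line once and index saleCode values by device + date.
        List<String[]> rows = new ArrayList<>(lines.size());
        Map<String, String> codeByDeviceAndDate = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(SEPARATOR);
            rows.add(fields);
            if (fields[1].equals("saleCode")) {
                codeByDeviceAndDate.put(fields[0] + SEPARATOR + fields[2], fields[3]);
            }
        }
        // Pass 2: replace the type field of each sale line with its code.
        List<String> result = new ArrayList<>();
        for (String[] fields : rows) {
            if (fields[1].equals("sale")) {
                String code = codeByDeviceAndDate.get(fields[0] + SEPARATOR + fields[2]);
                if (code != null) {
                    fields[1] = code;          // only the type field changes
                }
                result.add(String.join(SEPARATOR, fields));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "device0001;sale;2013-01-01 00:00:00;30.45",
                "device0001;saleCode;2013-01-01 00:00:00;10");
        System.out.println(mapSaleCodes(lines));
    }
}
```

Assigning to the type field directly also avoids replaceAll("sale", ...), which would rewrite any other occurrence of "sale" in the line.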
I want to replace some strings in a String input :
string=string.replace("<h1>","<big><big><big><b>");
string=string.replace("</h1>","</b></big></big></big>");
string=string.replace("<h2>","<big><big>");
string=string.replace("</h2>","</big></big>");
string=string.replace("<h3>","<big>");
string=string.replace("</h3>","</big>");
string=string.replace("<h4>","<b>");
string=string.replace("</h4>","</b>");
string=string.replace("<h5>","<small><b>");
string=string.replace("</h5>","</b></small>");
string=string.replace("<h6>","<small>");
string=string.replace("</h6>","</small>");
As you can see this approach is not the best, because each time I have to search for the portion to replace etc, and Strings are immutable... Also the input is large, which means that some performance issues are to be considered.
Is there any better approach to reduce the complexity of this code ?
Although StringBuilder.replace() is a huge improvement compared to String.replace(), it is still very far from being optimal.
The problem with StringBuilder.replace() is that if the replacement has different length than the replaceable part (applies to our case), a bigger internal char array might have to be allocated, and the content has to be copied, and then the replace will occur (which also involves copying).
Imagine this: You have a text with 10.000 characters. If you want to replace the "XY" substring found at position 1 (2nd character) to "ABC", the implementation has to reallocate a char buffer which is at least larger by 1, has to copy the old content to the new array, and it has to copy 9.997 characters (starting at position 3) to the right by 1 to fit "ABC" into the place of "XY", and finally characters of "ABC" are copied to the starter position 1. This has to be done for every replace! This is slow.
Faster Solution: Building Output On-The-Fly
We can build the output on-the-fly: parts that don't contain replaceable texts can simply be appended to the output, and if we find a replaceable fragment, we append the replacement instead of it. Theoretically it's enough to loop over the input only once to generate the output. Sounds simple, and it's not that hard to implement it.
Implementation:
We will use a Map preloaded with mappings of the replaceable-replacement strings:
Map<String, String> map = new HashMap<>();
map.put("<h1>", "<big><big><big><b>");
map.put("</h1>", "</b></big></big></big>");
map.put("<h2>", "<big><big>");
map.put("</h2>", "</big></big>");
map.put("<h3>", "<big>");
map.put("</h3>", "</big>");
map.put("<h4>", "<b>");
map.put("</h4>", "</b>");
map.put("<h5>", "<small><b>");
map.put("</h5>", "</b></small>");
map.put("<h6>", "<small>");
map.put("</h6>", "</small>");
And using this, here is the replacer code: (more explanation after the code)
public static String replaceTags(String src, Map<String, String> map) {
StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);
for (int pos = 0;;) {
int ltIdx = src.indexOf('<', pos);
if (ltIdx < 0) {
// No more '<', we're done:
sb.append(src, pos, src.length());
return sb.toString();
}
sb.append(src, pos, ltIdx); // Copy chars before '<'
// Check if our hit is replaceable:
boolean mismatch = true;
for (Map.Entry<String, String> e : map.entrySet()) {
String key = e.getKey();
if (src.regionMatches(ltIdx, key, 0, key.length())) {
// Match, append the replacement:
sb.append(e.getValue());
pos = ltIdx + key.length();
mismatch = false;
break;
}
}
if (mismatch) {
sb.append('<');
pos = ltIdx + 1;
}
}
}
Testing it:
String in = "Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End";
System.out.println(in);
System.out.println(replaceTags(in, map));
Output: (wrapped to avoid scroll bar)
Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End
Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End
This solution is faster than using regular expressions, which involve much overhead, like compiling a Pattern, creating a Matcher etc., and regexp is also much more general. Regular expressions also create many temporary objects under the hood which are thrown away after the replace. Here I only use a StringBuilder (plus the char array under its hood) and the code iterates over the input String only once. This solution is also much faster than using StringBuilder.replace(), as detailed at the top of this answer.
Notes and Explanation
I initialized the StringBuilder in the replaceTags() method like this:
StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);
So basically I created it with an initial capacity of 150% of the length of the original String. This is because our replacements are longer than the replaceable texts, so if replacing occurs, the output will obviously be longer than the input. Giving a larger initial capacity to StringBuilder will result in no internal char[] reallocation at all (of course the required initial capacity depends on the replaceable-replacement pairs and their frequency/occurrence in the input, but this +50% is a good upper estimation).
I also utilized the fact that all replaceable strings start with a '<' character, so finding the next potential replaceable position becomes blazing-fast:
int ltIdx = src.indexOf('<', pos);
It's just a simple loop and char comparisons inside String, and since it always starts searching from pos (and not from the start of the input), overall the code iterates over the input String only once.
And finally, to tell if a replaceable String does occur at the potential position, we use the String.regionMatches() method to check the replaceable strings. This is also blazing-fast, as all it does is compare char values in a loop and return at the very first mismatching character.
And a PLUS:
The question doesn't mention it, but our input is an HTML document. HTML tags are case-insensitive which means the input might contain <H1> instead of <h1>.
To this algorithm this is not a problem. The regionMatches() in the String class has an overload which supports case-insensitive comparison:
boolean regionMatches(boolean ignoreCase, int toffset, String other,
int ooffset, int len);
So if we want to modify our algorithm to also find and replace input tags which are the same but are written using different letter case, all we have to modify is this one line:
if (src.regionMatches(true, ltIdx, key, 0, key.length())) {
Using this modified code, replaceable tags become case-insensitive:
Yo<H1>TITLE</H1><h3>Hi!</h3>Nice day.<H6>Hi back!</H6>End
Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End
For performance - use StringBuilder.
For convenience you can use Map to store values and replacements.
Map<String, String> map = new HashMap<>();
map.put("<h1>","<big><big><big><b>");
map.put("</h1>","</b></big></big></big>");
map.put("<h2>","<big><big>");
...
StringBuilder builder = new StringBuilder(yourString);
for (String key : map.keySet()) {
replaceAll(builder, key, map.get(key));
}
... To replace all occurrences in a StringBuilder you can check here:
Replace all occurrences of a String using StringBuilder?
public static void replaceAll(StringBuilder builder, String from, String to)
{
int index = builder.indexOf(from);
while (index != -1)
{
builder.replace(index, index + from.length(), to);
index += to.length(); // Move to the end of the replacement
index = builder.indexOf(from, index);
}
}
Unfortunately StringBuilder doesn't provide a replace(string,string) method, so you might want to consider using Pattern and Matcher in conjunction with StringBuffer:
String input = ...;
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile("</?(h1|h2|...)>");
Matcher m = p.matcher( input );
while( m.find() )
{
String match = m.group();
String replacement = ...; //get replacement for match, e.g. by lookup in a map
m.appendReplacement( sb, replacement );
}
m.appendTail( sb );
You could do something similar with StringBuilder but in that case you'd have to implement appendReplacement etc. yourself.
As for the expression you could also just try and match any html tag (although that might cause problems since regex and arbitrary html don't fit very well) and when the lookup doesn't have any result you just replace the match with itself.
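To make the appendReplacement approach concrete, here is a small self-contained sketch. It assumes the tags of interest are exactly the lowercase heading tags h1 to h6 and that the replacements live in a Map, as suggested above; when the lookup has no entry, the match is replaced with itself. RegexTagReplacer and replaceTags are names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTagReplacer {
    // Matches opening and closing heading tags <h1>..<h6>.
    private static final Pattern HEADING_TAG = Pattern.compile("</?h[1-6]>");

    static String replaceTags(String input, Map<String, String> replacements) {
        Matcher m = HEADING_TAG.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String match = m.group();
            // Fall back to the match itself when the map has no entry for it.
            String replacement = replacements.getOrDefault(match, match);
            // quoteReplacement stops '$' and '\' from being treated specially.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("<h1>", "<big><big><big><b>");
        map.put("</h1>", "</b></big></big></big>");
        System.out.println(replaceTags("Yo<h1>TITLE</h1>End", map));
    }
}
```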
The particular example you provide seems to be HTML or XHTML. Trying to edit HTML or XML using regular expressions is fraught with problems. For the kind of editing you seem to be interested in doing, you should look at using XSLT. Another possibility is to use SAX, the streaming XML parser, and have your back-end write the edited output on the fly. If the text is actually HTML, you might be better off using a tolerant HTML parser, such as JSoup, to build a parsed representation of the document (like the DOM), and manipulate that before outputting it.
StringBuilder is backed by a char array. So, unlike String instances, it is mutable. Thus, you can call indexOf() and replace() on the StringBuilder.
I would do something like this
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if (tagEquals(str, i, "h1")) {
sb.append("<big><big><big><b>");
i += 3; // together with the loop's i++, this moves past the four characters of "<h1>"
} else if (tagEquals(str, i, "/h1")) {
...
} else {
sb.append(str.charAt(i));
}
}
tagEquals is a function that checks whether the given tag name starts at position i.
Use Apache Commons StringUtils.replaceEach.
String[] searches = new String[]{"<h1>", "</h1>", "<h2>", ...};
String[] replacements = new String[]{"<big><big><big><b>", "</b></big></big></big>", "<big><big>", ...};
string = StringUtils.replaceEach(string, searches, replacements);
I have a requirement to get the substring of a string based on a condition.
String str = "ABC::abcdefgh||XYZ::xyz";
If the input is "ABC", it should check whether str contains it and, if so, print abcdefgh.
In the same way, if input is "XYZ", then it should print xyz.
How can I achieve this with string manipulation in Java?
If I've guessed the format of your String correctly, then you could split it into tokens with something like this:
String[] tokens = str.split("\\|\\|"); // '|' is a regex metacharacter and must be escaped
for(String token : tokens)
{
// Cycle through each token.
String key = token.split("::")[0];
String value = token.split("::")[1];
if(key.equals(input))
{
// input being the user's typed in value.
return value;
}
}
But let's have a think for a minute. Why keep this in a String, when a HashMap is a much cleaner solution to your problem? Stick the String into a config file, and on load,
some code can perform a similar task:
Map<String, String> inputMap = new HashMap<String, String>();
String[] tokens = str.split("\\|\\|"); // '|' is a regex metacharacter and must be escaped
for(String token : tokens)
{
// Cycle through each token.
String key = token.split("::")[0];
String value = token.split("::")[1];
inputMap.put(key, value);
}
Then when the user types something in, it's as easy as:
return inputMap.get(input);
The idea is that, you should split your string with the delimiters of "::" and "||" , i.e. whichever of them is encountered it will be treated as a delimiter. So, the best way for achieving that is using regular expressions, I think.
String str = "ABC::abcdefgh||XYZ::xyz";
String[] parts = str.split("[::]|[/||]");
Map<String, String> map = new HashMap<String, String>();
for (int i = 0; i < parts.length - 2; i += 4) {
if (!parts[i].equals("")) {
map.put(parts[i], parts[i + 2]);
}
}
Short and concise, your code is ready. The for loop seems weird, if anyone comes up with a better regex for splitting (to get rid of the empty strings), it will become cleaner. I'm not a regex expert, so any suggestions are welcome.
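On the request for a cleaner split: one option is to alternate between the two full delimiters, escaping the pipes, so that no empty strings are produced and the parts simply alternate key, value, key, value. This is a sketch under the assumption that neither keys nor values contain "::" or "||"; DelimitedLookup and toMap are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

final class DelimitedLookup {

    static Map<String, String> toMap(String str) {
        Map<String, String> map = new HashMap<>();
        // Split on either "::" or "||"; '|' must be escaped in a regex.
        String[] parts = str.split("::|\\|\\|");
        for (int i = 0; i + 1 < parts.length; i += 2) {
            map.put(parts[i], parts[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = toMap("ABC::abcdefgh||XYZ::xyz");
        System.out.println(map.get("ABC")); // abcdefgh
        System.out.println(map.get("XYZ")); // xyz
    }
}
```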
Use the contains method to see if it has the sub string: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#contains%28java.lang.CharSequence%29
You could do it as follows:
String[] parts = str.split("\\|\\|"); // '|' must be escaped in a regex
if (parts[0].startsWith("ABC")) {
String[] values = parts[0].split("::");
System.out.println(values[1]);
} else {
if (parts[1].startsWith("XYZ")) {
String[] values = parts[1].split("::");
System.out.println(values[1]);
}
}
The above code will check first if ABC is there. If yes, it will print the result and then stop. If not, it will check the second section of the code to see if it starts with XYZ and then print the result. You can change it to suit your needs.
I have a string in the following format:
<!-- accountId="123" activity="add" request="add user" -->
The number of parameters and their order can vary.
I need to get the value of request, I need to parse the add user text from the string. What is the best way to do this in Java?
Sounds like a school project, so rather than solving the problem I'll just point you in the right direction: check out the StringTokenizer class.
You could parse it using regular expressions, something like this:
public static Map<String, String> parse(String s) {
Map<String, String> map = new HashMap<String, String>();
Pattern p = Pattern.compile("(\\w+)\\s*=\\s*\"(.*?)\"");
Matcher m = p.matcher(s);
while (m.find()) {
map.put(m.group(1), m.group(2));
}
return map;
}
With example usage:
String s = "<!-- accountId=\"123\" activity=\"add\" request=\"add user\" -->";
Map<String, String> m = parse(s);
// m => {accountId=123, request=add user, activity=add}
m.get("request"); // => "add user"
If you need to retain the ordering of the attributes you could use a LinkedHashMap or TreeMap, for example.
My solution would be to use brute force and split the string as needed, and update a HashMap based on that. This probably is the simplest solution.
The other way is to use String Tokenizer, as Kyle suggested.
Third alternative is to replace beginning and ending markup so that it forms a valid XML and then parse that as XML. Yes, I am aware this particular is like shooting a fly with a cannon. But sometimes it may be needed and it is an option ;)
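For what it's worth, the XML route can be sketched with the JDK's built-in DOM parser. The idea is to rewrite the comment markers so the text becomes a single self-closing element and then read the attributes from it. This assumes the string contains exactly one comment and that its content is well-formed as XML attributes; the element name x and the class and method names are arbitrary.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class CommentAttributes {

    // Returns the attribute value, or null if the attribute is absent.
    static String attribute(String comment, String name) {
        // "<!-- a="1" b="2" -->" becomes "<x  a="1" b="2"  />".
        String xml = comment.trim().replace("<!--", "<x ").replace("-->", " />");
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.hasAttribute(name) ? root.getAttribute(name) : null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not parseable as XML: " + comment, e);
        }
    }

    public static void main(String[] args) {
        String s = "<!-- accountId=\"123\" activity=\"add\" request=\"add user\" -->";
        System.out.println(attribute(s, "request")); // add user
    }
}
```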
You need to do the following steps:
Find a good character sequence to split on
Iterate over the returned String array
When the current index of said String array matches the key you are looking for retrieve the value
A hint to keep it simple: don't try to parse everything in one go. For instance, first try to retrieve the raw key-value pairs like 'activity="add"'. Then continue from there.
If you just need the value of "request", the fastest way to do that would be:
String getRequest(String str) {
int start = str.indexOf("request=\"");
if (start != -1) {
start += 9; // length of: request="
int end = str.indexOf('"', start);
if (end != -1) {
return str.substring(start, end);
}
}
// not found
return null;
}
I'm mostly experienced with Python regular expressions, but the Java syntax appears to be the same. Perhaps strip off the '<!--' and '-->' from either end, then iterate through the key-value pairs with a regex like
'\s?([\w ]+)="([\w ]+)"\s?(.*?)'
(Assuming the keys and values consist only of alphanumeric characters, spaces, and underscores; otherwise, you might substitute the \w's with other sets of characters) The three groups from this match would be the next key, the next value, and the rest of the string, which can be parsed in the same way until you find what you need.