parsing a string - java

I have a string in the following format:
<!-- accountId="123" activity="add" request="add user" -->
The number of parameters and their order are random.
I need to get the value of request, that is, to extract the text add user from the string. What is the best way to do this in Java?

Sounds like a school project, so rather than solving the problem I'll just point you in the right direction: check out the StringTokenizer class.

You could parse it using regular expressions, something like this:
public static Map<String, String> parse(String s) {
Map<String, String> map = new HashMap<String, String>();
Pattern p = Pattern.compile("(\\w+)\\s*=\\s*\"(.*?)\"");
Matcher m = p.matcher(s);
while (m.find()) {
map.put(m.group(1), m.group(2));
}
return map;
}
With example usage:
String s = "<!-- accountId=\"123\" activity=\"add\" request=\"add user\" -->";
Map<String, String> m = parse(s);
// m => {accountId=123, request=add user, activity=add}
m.get("request"); // => "add user"
If you need to retain the order of the attributes you could use a LinkedHashMap, which iterates in insertion order; a TreeMap would instead iterate the keys in sorted order.

My solution would be to use brute force and split the string as needed, and update a HashMap based on that. This probably is the simplest solution.
The other way is to use StringTokenizer, as Kyle suggested.
A third alternative is to replace the beginning and ending markup so that it forms valid XML, and then parse that as XML. Yes, I am aware this particular approach is like shooting a fly with a cannon. But sometimes it may be needed, and it is an option ;)

You need to do the following steps:
Find a good character sequence to split on
Iterate over the returned String array
When the current index of said String array matches the key you are looking for retrieve the value
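The steps above could be sketched roughly as follows. This is only an illustration: it assumes every value is wrapped in double quotes and contains no embedded quotes, and the class and method names (SplitLookupDemo, lookup) are made up for the example.

```java
class SplitLookupDemo {
    // Splits on the double quote, so the array alternates between
    // "...key=" fragments (even indexes) and values (odd indexes).
    static String lookup(String s, String key) {
        String[] parts = s.split("\"");
        for (int i = 0; i + 1 < parts.length; i += 2) {
            String label = parts[i].trim();
            // Accept "key=" on its own or preceded by a space (e.g. "<!-- key=")
            if (label.equals(key + "=") || label.endsWith(" " + key + "=")) {
                return parts[i + 1];
            }
        }
        return null; // key not present
    }

    public static void main(String[] args) {
        String s = "<!-- accountId=\"123\" activity=\"add\" request=\"add user\" -->";
        System.out.println(lookup(s, "request")); // prints add user
    }
}
```

Splitting on the quote rather than on whitespace keeps a value such as add user in one piece.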

A hint to keep it simple: don't try to parse everything in one go. For instance, first try to retrieve the raw key-value pairs like 'activity="add"'. Then continue from there.

If you just need the value of "request", the fastest way to do that would be:
static String getRequest(String str) {
int start = str.indexOf("request=\"");
if (start != -1) {
start += 9; // length of request="
int end = str.indexOf('"', start);
if (end != -1) {
return str.substring(start, end);
}
}
// not found
return null;
}

I'm mostly experienced with Python regular expressions, but the Java syntax appears to be the same. Perhaps strip off the '<!--' and '-->' from either end, then iterate through the key-value pairs with a regex like
'\s?([\w ]+)="([\w ]+)"\s?(.*?)'
(Assuming the keys and values consist only of alphanumeric characters, spaces, and underscores; otherwise, you might substitute the \w's with other sets of characters) The three groups from this match would be the next key, the next value, and the rest of the string, which can be parsed in the same way until you find what you need.


How to check any of the list of words present in the string?

I have a list of words and need to check whether any of them is present in a string. The word in the string can appear in different forms, though. Say the list is {:carloan:, :creditcard:}; in the string it can appear as car-loan, carloan, or :carloan.
I am using a lambda in Java to find a near match, but it is not working:
List<String> list = new ArrayList<>();
list.add(":carloan:");
list.add(":creditcard:");
String inputString = "i want carloan";
boolean match = list.stream().anyMatch(s -> inputString.contains(s));
But the above gives true only if a substring of the input exactly matches a word in the list.
Is there a way to get true even on a partial match? Say the user entered car-loan but the list has :carloan:. I don't want to iterate over the list and do the matching by hand. Please suggest a way to do this using a lambda in Java.
You could use a regex approach here:
List<String> list = new ArrayList<>();
list.add("carloan");
list.add("creditcard");
String regex = ".*(?:" + String.join("|", list) + ").*";
String input = "I am looking for a carloan or creditcard";
if (input.matches(regex)) {
System.out.println("MATCH");
}
Some possible changes you might want to make to the above would be to add word boundaries around the alternation. That is, you might want to use this regex pattern:
.*\b(?:carloan|creditcard)\b.*
This would avoid matching e.g. carloans when you really want to exactly match only the singular carloan.
Edit:
Here is a version using regex closer to your original starting point:
boolean result = list.stream().anyMatch(s -> input.matches(".*\\b" + s + "\\b.*"));
if (result) {
System.out.println("MATCH");
}
We can stream your list of terms, and then assert whether the input string matches any term using regex. But note that this approach means calling String#matches N times, for a list of N terms, while the above approach just makes a single call to that API. I would bet on the alternation approach being more efficient here.
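Neither regex above handles the punctuation variants from the question (car-loan versus :carloan:). One possible way to cover them, sketched here under the assumption that only letters, digits and spaces matter for the comparison, is to normalize both the list entries and the input before calling contains. The names NormalizedMatchDemo, normalize and containsAny are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class NormalizedMatchDemo {
    // Lower-cases the text and drops everything except letters, digits and spaces,
    // so ":carloan:", "car-loan" and "carloan" all become "carloan".
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9 ]", "");
    }

    static boolean containsAny(List<String> words, String input) {
        String cleaned = normalize(input);
        return words.stream()
                .map(NormalizedMatchDemo::normalize)
                .anyMatch(cleaned::contains);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList(":carloan:", ":creditcard:");
        System.out.println(containsAny(list, "i want car-loan")); // prints true
        System.out.println(containsAny(list, "i want a mortgage")); // prints false
    }
}
```

Because spaces are kept, an input written as two words ("car loan") would not match; whether it should is a decision for the application.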

Extracting digits in the middle of a string using delimiters

String ccToken = "";
String result = "ssl_transaction_type=CCGETTOKENssl_result=0ssl_token=4366738602809990ssl_card_number=41**********9990ssl_token_response=SUCCESS";
String[] elavonResponse = result.split("=|ssl");
for (String t : elavonResponse) {
System.out.println(t);
}
ccToken = (elavonResponse[6]);
System.out.println(ccToken);
I want to be able to grab a specific part of a string and store it in a variable. The way I'm currently doing it is by splitting the string and then storing the value of one array cell in my variable. Is there a way to specify that I want to store the digits after "ssl_token="?
I want my code to be able to obtain the value of ssl_token without having to worry about changes in the string that are not related to the token, since I won't have control over the string. I have searched online but I can't find answers for my specific problem, or I may be using the wrong words for searching.
You can use replaceAll with this regex .*ssl_token=(\\d+).* :
String number = result.replaceAll(".*ssl_token=(\\d+).*", "$1");
Outputs
4366738602809990
You can do it with regex. It would probably be better to change the specifications of the input string so that each key/value pair is separated by an ampersand (&) so you could split it (similar to HTTP POST parameters).
Pattern p = Pattern.compile(".*ssl_token=([0-9]+).*");
Matcher m = p.matcher(result);
if(m.matches()) {
long token = Long.parseLong(m.group(1));
System.out.println(String.format("token: [%d]", token));
} else {
System.out.println("token not found");
}
Search for the index of ssl_token=. Create a substring that starts right after it. Then convert the leading digits of that substring to a number; this works because the number sits at the beginning of the substring.
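A minimal sketch of that index-based approach might look like this. It assumes the token consists of digits only, and the names TokenExtractDemo and extractToken are hypothetical.

```java
class TokenExtractDemo {
    // Returns the run of digits that follows "ssl_token=", or null if there is none.
    static String extractToken(String response) {
        String marker = "ssl_token=";
        int start = response.indexOf(marker);
        if (start < 0) {
            return null; // marker not present
        }
        start += marker.length();
        int end = start;
        // Advance over the digits that directly follow the marker
        while (end < response.length() && Character.isDigit(response.charAt(end))) {
            end++;
        }
        return end > start ? response.substring(start, end) : null;
    }

    public static void main(String[] args) {
        String result = "ssl_transaction_type=CCGETTOKENssl_result=0ssl_token=4366738602809990"
                + "ssl_card_number=41**********9990ssl_token_response=SUCCESS";
        System.out.println(extractToken(result)); // prints 4366738602809990
    }
}
```

Including the = in the marker keeps ssl_token_response from being mistaken for the token field.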

How to find a String of last 2 items in colon separated string

I have a string = ab:cd:ef:gh. On this input, I want to return the string ef:gh (third colon intact).
The string apple:orange:cat:dog should return cat:dog (there's always 4 items and 3 colons).
I could have a loop that counts colons and makes a string of characters after the second colon, but I was wondering if there exists some easier way to solve it.
You can use the split() method for your string.
String example = "ab:cd:ef:gh";
String[] parts = example.split(":");
System.out.println(parts[parts.length-2] + ":" + parts[parts.length-1]);
String example = "ab:cd:ef:gh";
String[] parts = example.split(":",3); // create at most 3 Array entries
System.out.println(parts[2]);
The split function might be what you're looking for here. Use the colon, like in the documentation as your delimiter. You can then obtain the last two indexes, like in an array.
Yes, there is an easier way.
The first is to use the split method of the String class:
String txt = "ab:cd:ef:gh";
String[] arr = txt.split(":");
System.out.println(arr[arr.length-2] + ":" + arr[arr.length-1]);
and the second is to use the Matcher class.
Use the overloaded version of lastIndexOf(), which takes the starting index as its second parameter:
str.substring(str.lastIndexOf(":", str.lastIndexOf(":") - 1) + 1)
Another solution would be using a Pattern to match your input, something like [^:]+:[^:]+$. Using a pattern would probably be easier to maintain as you can easily change it to handle for example other separators, without changing the rest of the method.
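Using a precompiled pattern may also be more efficient than String.split() when the delimiter is a true regular expression, as split() then compiles its parameter to a Pattern internally on every call, and it does more than what you actually need. For a single literal character such as the colon, split() avoids the regex engine, so the difference is likely to be small.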
This would give something like this:
String example = "ab:cd:ef:gh";
Pattern regex = Pattern.compile("[^:]+:[^:]+$");
final Matcher matcher = regex.matcher(example);
if (matcher.find()) {
// extract the matching group, which is what we are looking for
System.out.println(matcher.group()); // prints ef:gh
} else {
// handle invalid input
System.out.println("no match");
}
Note that you would typically extract regex as a reusable constant to avoid compiling the pattern every time. Using a constant would also make the pattern easier to change without looking at the actual code.
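For illustration, extracting the pattern into a constant could look like the sketch below. The class name LastTwoDemo, the constant LAST_TWO and the method lastTwo are invented for the example, and returning null on invalid input is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastTwoDemo {
    // Compiled once and reused across calls
    private static final Pattern LAST_TWO = Pattern.compile("[^:]+:[^:]+$");

    // Returns the last two colon-separated items, or null if the input has no such suffix.
    static String lastTwo(String input) {
        Matcher matcher = LAST_TWO.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(lastTwo("ab:cd:ef:gh")); // prints ef:gh
        System.out.println(lastTwo("apple:orange:cat:dog")); // prints cat:dog
    }
}
```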

Change String in List<String> based on another String same List Java

I've got a List containing Strings like these:
device0001;sale;2013-01-01 00:00:00;30.45
device0001;sale;2013-01-02 00:00:00;41.02
device0001;sale;2013-01-03 00:00:00;30.45
...
device0001;saleCode;2013-01-01 00:00:00;10
device0001;saleCode;2013-01-02 00:00:00;55
device0001;saleCode;2013-01-03 00:00:00;55
There are multiple devices, and each device has multiple code names and dates. I'd like to replace the sale code name with the value of the matching saleCode line.
Example of what I'd like in the end :
device0001;10;2013-01-01 00:00:00;30.45
device0001;55;2013-01-02 00:00:00;41.02
device0001;55;2013-01-03 00:00:00;30.45
The saleCode String may or may not be kept, it doesn't matter.
I've made it work with two nested for loops and ifs, but it took far too long to process.
I thought about building something like this :
Map<String(device), Map<DateTime, Map<String(element), String(value)>>>
forEach device
forEach datetime
element (Codename substring) and replace by element (Value substring)
I'm pretty sure there must be a better and/or elegant way to do this.
EDIT - Since it doesn't seem clear what I'm trying to do, here is the code with for and if (which is way too slow):
for (String line : lines) {
if (line.split(SEPARATOR)[4].equals("sale")) {
for (String codeLine : lines) {
if (codeLine.split(SEPARATOR)[5].equals(line.split(SEPARATOR)[5]) &&
codeLine.split(SEPARATOR)[1].equals(line.split(SEPARATOR)[1])&&
codeLine.split(SEPARATOR)[4].equals("saleCode")) {
line = line.replaceAll("sale", codeLine.split(SEPARATOR)[7]);
}
}
}
}
The indexes don't match my example strings only because there are other, unimportant fields: index [1] is the device number, [5] the date, [4] the type (sale, saleCode) and [7] the value.
EDIT #2
I've improved the speed like so :
MultiKeyMap<String, String> multiKeyMap = new MultiKeyMap<>();
for (String line : lines) {
if (line.split(SEPARATOR)[4].equals("saleCode")) {
String device = line.split(SEPARATOR)[1];
String date = line.split(SEPARATOR)[5];
String value = line.split(SEPARATOR)[7];
multiKeyMap.put(device, date, value);
}
}
for (int i = 0; i < lines.size(); i++) {
String code = lines.get(i).split(SEPARATOR)[4];
if (code.equals("sale")) {
String device = lines.get(i).split(SEPARATOR)[1];
String date = lines.get(i).split(SEPARATOR)[5];
String newline = lines.get(i).replaceAll("sale", multiKeyMap.get(device, date));
lines.set(i, newline);
}
}
I'll go with that for the moment, but I'm always open to advice.
If I understand your question correctly you don't need to build any maps etc.
You have a list of strings with that format.
Just go over each string and use a regular expression to replace/update each string.
Update:
Your code is slow because you are processing the list over and over for each string.
Create a hashmap based on device id.
Go over the strings in lines one by one.
Check if the string exists on hashmap.
If it does not exist, then check which type of string it is and apply a proper regex for replacement. Add the string to the hashmap.
If it does exist then update the string via a regex using the newly encountered string.
When you are done the hashmap will have the strings replaced.
Note: I am mentioning regexes because it seems you have a specific format, and the replacement might be written easily and efficiently that way. If you can't use regexes, e.g. because you are not familiar with them, keep parsing each string the way you are doing now. It will still be better, as you process the list only once.
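As a rough sketch of the hashmap idea, the version below makes one pass to collect the saleCode values and one pass to rewrite the sale lines. It uses plain split instead of regexes and assumes the simplified four-field layout from the question's example (device;type;date;value), not the real indexes [1], [4], [5] and [7]; SaleCodeMergeDemo and merge are hypothetical names, and saleCode lines are simply dropped from the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SaleCodeMergeDemo {
    static List<String> merge(List<String> lines) {
        // Pass 1: remember the saleCode value for each device + date
        Map<String, String> codes = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(";");
            if (f.length == 4 && f[1].equals("saleCode")) {
                codes.put(f[0] + ";" + f[2], f[3]);
            }
        }
        // Pass 2: rewrite each sale line with the code found for the same device + date
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(";");
            if (f.length == 4 && f[1].equals("sale")) {
                String code = codes.get(f[0] + ";" + f[2]);
                if (code != null) {
                    f[1] = code;
                }
                out.add(String.join(";", f));
            }
        }
        return out;
    }
}
```

Each line is split once per pass, so the work grows linearly with the number of lines rather than quadratically as in the nested loops.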

Best way to retrieve a value from a string java

If I am being passed a string that contains comma delimited key-value pairs like this
seller=1000,country="canada",address="123 1st st", etc.
It seems like there must be a better way than parsing and then iterating through the result.
What is the best way to retrieve a value from this string based on the key name in Java?
Since release 10, Google Guava provides a class MapSplitter which does exactly that kind of thing:
Map<String, String> params = Splitter
.on(",")
.withKeyValueSeparator("=")
.split("k1=v1,k2=v2");
You can create your own CSV parser. It's not very complicated, but there are a few corner cases to be careful with, assuming of course you are using the standard CSV format.
But why reinvent the wheel?
You can try looking up a CSV parser like
OpenCSV
SuperCSV
Apache Commons
There are others, look around I'm sure you will find one that suits your needs.
Usually you will want to parse the string into a map because you will be pulling various values perhaps multiple times, so it often makes sense to pay the parsing cost up-front.
If not, then here is how I would solve the problem (assuming you want to differentiate between int values and String values):
public Object pullValue(String pairs, String key) {
boolean returnString = false;
int keyStart = pairs.indexOf(key + "=");
if (keyStart < 0) {
logger.error("Key " + key + " not found in key-value pairs string");
return null;
}
int valueStart = keyStart + key.length() + 1;
if (pairs.charAt(valueStart) == '"') {
returnString = true;
valueStart++; // Skip past the quote mark
}
int valueEnd;
if (returnString) {
valueEnd = pairs.indexOf('"', valueStart);
if (valueEnd < 0) {
logger.error("Unmatched double quote mark extracting value for key " + key);
return null; // no closing quote, so there is nothing valid to return
}
return pairs.substring(valueStart, valueEnd);
} else {
valueEnd = pairs.indexOf(',', valueStart);
if (valueEnd < 0) { // If this is the last key value pair in string
valueEnd = pairs.length();
}
return Integer.decode(pairs.substring(valueStart, valueEnd));
}
}
Note that this solution assumes no spaces between the key, the equals sign, and the value. If these are possible you will have to create some code to travel the string between them.
Another solution is to use a regular expression parser. You could do something like (this is untested):
Pattern lookingForString = Pattern.compile(key + "[ \t]*=[ \t]*[\"]([^\"]+)[\"]");
Pattern lookingForInt = Pattern.compile(key + "[ \t]*=[ \t]*([^,]+)");
Matcher stringFinder = lookingForString.matcher(pairs);
Matcher intFinder = lookingForInt.matcher(pairs);
if (stringFinder.find()) {
return stringFinder.group(1);
} else if (intFinder.find()) {
return Integer.decode(intFinder.group(1));
} else {
logger.error("Could not extract value for key " + key);
return null;
}
HTH
To separate the string by commas, the other posters are correct. It is best to use a CSV parser (your own or OTS). Considering things like commas inside quotes etc can lead to a lot of un-considered problems.
Once you have each separate token in the form:
key = "value"
I think it is easy enough to look for the first index of '='. Then the part before that will be the key, and the part after that will be the value. Then you can store them in a Map<String, String>.
This is assuming that your keys will be simple enough, and not contain = in them etc. Sometimes it's enough to take the simple route when you can restrict the problem scope.
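A small sketch of that first-index-of-= idea follows. It assumes the string has already been split into tokens (for instance by a CSV parser), and that a value is either unquoted or wrapped in one pair of double quotes; PairSplitDemo and toMap are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PairSplitDemo {
    static Map<String, String> toMap(List<String> tokens) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String token : tokens) {
            int eq = token.indexOf('='); // only the first '=' separates key from value
            if (eq < 0) {
                continue; // not a key-value token
            }
            String key = token.substring(0, eq).trim();
            String value = token.substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            map.put(key, value);
        }
        return map;
    }
}
```

Because only the first = is used, a value that itself contains = stays intact.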
If you just want one value out of such a string, you can use String's indexOf() and substring() methods:
String getValue(String str, String key)
{
int keyIndex = str.indexOf(key + "=");
if(keyIndex == -1) return null;
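int startIndex = str.indexOf("\"", keyIndex);
if(startIndex == -1) return null;
// Search from startIndex + 1; otherwise the opening quote itself is found again
int endIndex = str.indexOf("\"", startIndex + 1);
if(endIndex == -1) return null;
String value = str.substring(startIndex + 1, endIndex);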
return value;
}
First, you should use a CSV parsing library to parse the comma separated values. Correctly parsing CSV data isn't as trivial as it first seems. There are lots of good arguments for not reinventing that wheel.
This will also future-proof your code, and it is code you don't have to test or maintain.
I know the temptation to do something like data.split(","); is strong, but it is a fragile and brittle solution. For just one example, what if any of the values contain a ','?
Second thing you should do is then parse the pairs. Again the temptation to use String.split("="); will be strong, but it can be brittle and fragile if the right hand side of the = has an = in it.
I am not a blind proponent of regular expressions, but used with restraint they can be just the right tool for the job. Here is the regular expression to parse the name value pairs.
The regular expression is ^(.*?)\s?=\s?("?([^"]*)"?|"(.*)")$. This works even for multiple double quotes in the right hand side of the name value pair.
The reluctant (.*?) matches only what is on the left side of the first =, and everything else lands on the right hand side. Groups 3 and 4 hold the value with the optional surrounding " stripped, while non-quoted number values still match.
Given a List<String> list of the encoded name value pairs:
final Pattern p = Pattern.compile("^(.*?)\\s?=\\s?(\"?([^\"]*)\"?|\"(.*)\")$");
final Map<String, String> map = new HashMap<String, String>(list.size());
for (final String nvp : list)
{
final Matcher m = p.matcher(nvp);
if (!m.matches()) continue; // skip entries that are not name value pairs
final String name = m.group(1);
// group 3 (or group 4 when the value itself contains quotes) is the value without the outer quotes
final String value = m.group(3) != null ? m.group(3) : m.group(4);
map.put(name, value);
System.out.format("name = %s | value = %s\n", name, value);
}
Use yourdata.split(",") and you will get a String[]. Then call split("=") on each entry to separate the property name from the value.
Ideally, you should move this data in a Properties object instance. You can then save/load it from XML easily. It has useful methods.
REM: I am assuming that you are savvy enough to understand that this solution won't work if values contain the separator (i.e., the comma) in them...
