Split with Java a Python Dictionary String - java

I have a Java program that invokes a Python script and receives its result, a Python dictionary, as a string. A result looks something like this:
{'27': [], '10864': [u'8344', u'7769', u'7207', u'3735']}
I want to parse the result into two String arrays, one holding the keys and one holding the values, exactly as returned. In this example I want:
String[] keys >>>> ["27","10864"] and
String[] values >>>> ["[]","[u'8344', u'7769', u'7207', u'3735']"]
I am trying to do it with regular expressions, but without much success. Can anyone give an efficient solution?

You can see the regex I have used here on regex101.com.
Essentially the Matcher will run through your input and detect all matches of the regex. Then you use the .find method in a while loop to go through each match individually and extract the information.
You can use a regular array; however, in my opinion an ArrayList is more convenient if you are not sure of the size of the incoming input, because it grows as matches are added.
This code works for me:
public static void main(String[] args)
{
String input = "{'27': [], '10864': [u'8344', u'7769', u'7207', u'3735']}";
String pattern = "('([\\w]*)': (\\[[u'0-9, ]*])?)"; // \w must be written as \\w inside a Java string literal
List<String> keys = new ArrayList<>();
List<String> values = new ArrayList<>();
//Find multiple matches in your input.
Matcher m = Pattern.compile(pattern).matcher(input);
//Goes through each match individually.
while (m.find())
{
keys.add(m.group(2));
values.add(m.group(3));
}
//To display your keys and values.
System.out.println("Keys: " + keys.toString());
System.out.println("Values: " + values.toString());
}
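Since the question asks for String[] arrays rather than lists, the collected matches can be converted at the end. The following is a minimal sketch rather than the answerer's exact code: the class name DictSplit and the method splitDict are hypothetical, and it assumes that keys consist only of word characters and that every value is a flat, non-nested list in square brackets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DictSplit {
    // Group 1: the key without its quotes. Group 2: the bracketed list (assumed not nested).
    private static final Pattern ENTRY = Pattern.compile("'(\\w*)': (\\[[^\\]]*\\])");

    // Hypothetical helper: returns { keys, values } as two parallel String arrays.
    static String[][] splitDict(String dict) {
        List<String> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        Matcher m = ENTRY.matcher(dict);
        while (m.find()) {
            keys.add(m.group(1));
            values.add(m.group(2));
        }
        return new String[][] {
            keys.toArray(new String[0]),
            values.toArray(new String[0])
        };
    }

    public static void main(String[] args) {
        String[][] result = splitDict("{'27': [], '10864': [u'8344', u'7769', u'7207', u'3735']}");
        System.out.println("Keys: " + Arrays.toString(result[0]));
        System.out.println("Values: " + Arrays.toString(result[1]));
    }
}
```

If the Python side can be changed, emitting JSON and using a JSON parser would likely be more robust than any regex over the dictionary's string form.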

Since you want to use regexes, here is one approach:
import java.util.*;
class Main {
public static void main(String[] args) {
String strExample = "{'27': [], '10864': [u'8344', u'7769', u'7207', u'3735']}";
// Initialize
List<String> keys = new ArrayList<String>();
List<String> values = new ArrayList<String>();
// Hacky regex parsing -- happy now???
for (String component : strExample.replace("{","").split("][,}]")) {
String kv[] = component.trim().split(": "); // trim() drops the space left over after the comma
keys.add(kv[0]);
values.add(kv[1] + "]");
}
// Print
System.out.println("Keys: ");
for (String key : keys) {
System.out.println(key);
}
System.out.println("\nValues: ");
for (String value : values) {
System.out.println(value);
}
}
}
This first splits on the closing bracket ] (followed by , or }) and then on the colon and space that separate each key from its value.
The output is as you desired (run in repl):
Keys:
'27'
'10864'
Values:
[]
[u'8344', u'7769', u'7207', u'3735']
Note, you really should not:
use Strings for numeric keys
use regexes to parse JSONs
use anything like this code in a real system

Related

Find exact match from Array

In Java, I want to iterate over an array to find whether any of its words occurs in my input string as a whole word.
If the word is immediately preceded by digits in the input string, it should still return true.
Array arr = {"card","creditcard","debitcard"}
String inputStr = "need to discard pin" --> Return False
String inputStr = "need to 444card pin" --> Return True if its followed by number
I tried the below code, but it returns true as it takes "card" from the "discard" string and compares, but I need to do an exact match
Arrays.stream(arr).anyMatch(inputStr::contains)
Try this:
String[] arr = {"card","creditcard","debitcard"}; // array that keeps the words
String inputStr = "need to discard pin"; // String that keeps the 'sentence'
String[] wordsToBeChecked = inputStr.split(" "); // We take the string and split it at each " " (space)
HashSet<String> matchingWords = new HashSet<>(); // This will keep the matching words
for (String s : arr)
{
for (String s1 : wordsToBeChecked)
{
if(s.equalsIgnoreCase(s1)) // If first word matches with the second
{
matchingWords.add(s1); // add it to our container
}
}
}
Or using Java 8 Streams:
List<String> wordList = Arrays.asList(arr);
List<String> sentenceWordList = Arrays.asList(inputStr.split(" "));
List<String> matchedWords = wordList.stream().filter(sentenceWordList::contains)
.collect(Collectors.toList());
The problem with most answers here is that they do not take punctuation into consideration. To solve this, you could use a regular expression like below.
String[] arr = { "card", "creditcard", "debitcard" };
String inputStr = "You need to discard Pin Card.";
Arrays.stream(arr)
.anyMatch(word -> Pattern
.compile("(?<![a-z-])" + Pattern.quote(word) + "(?![a-z-])", Pattern.CASE_INSENSITIVE)
.matcher(inputStr)
.find());
With Pattern.quote(word), we escape any character within each word which is a special character in the context of a regular expression. For instance, the literal string a^b would never match, because ^ means the start of a string if used in a regular expression.
(?<![a-z-]) and (?![a-z-]) mean that there is not a word character immediately preceding or succeeding the word. For instance, discard will not match, even if it contains the word card. I have used only lowercase in these character classes because of the next bullet:
The flag CASE_INSENSITIVE passed to the compile method causes the pattern to be matched in a case-insensitive manner.
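To check the lookaround approach against the two inputs from the question, here is a small self-contained sketch. The class name WordMatch and the method containsWord are hypothetical names introduced for illustration; the pattern itself is the one described above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class WordMatch {
    // Returns true if any word occurs in the input without a letter or hyphen
    // directly before or after it. Digits next to the word are allowed.
    static boolean containsWord(String[] words, String input) {
        return Arrays.stream(words)
                .anyMatch(word -> Pattern
                        .compile("(?<![a-z-])" + Pattern.quote(word) + "(?![a-z-])",
                                Pattern.CASE_INSENSITIVE)
                        .matcher(input)
                        .find());
    }

    public static void main(String[] args) {
        String[] arr = { "card", "creditcard", "debitcard" };
        System.out.println(containsWord(arr, "need to discard pin")); // "card" is preceded by a letter
        System.out.println(containsWord(arr, "need to 444card pin")); // "card" is preceded by a digit
    }
}
```

Compiling a pattern per word on every call is fine for a short list; for a long list it would probably be worth compiling the patterns once and reusing them.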
You could split the string using a regular expression
String[] arr = {"card","creditcard","debitcard"};
String inputStr = "need to discard pin";
List<String> wordsToBeChecked = Arrays.asList(inputStr.split("[ 0-9]"));
Arrays.stream(arr).anyMatch(wordsToBeChecked::contains);
If your word list and input string are longer, consider splitting your input string into a HashSet. Lookups will then be faster:
Set<String> wordsToBeChecked = new HashSet<>(Arrays.asList(inputStr.split(" ")));
You can create a Set of the words in inputStr and then check the words list against that Set.
Set<String> inputWords = uniqueWords(inputStr);
List<String> matchedWords = Arrays.stream(arr)
.filter(inputWords::contains) // filter keeps the matching words; anyMatch would only return a boolean
.collect(Collectors.toList());
Building the Set may be non-trivial if you have to account for hyphenation, numbers, punctuation, and so forth. I'll wave my hands and ignore that - here's a naive implementation of uniqueWords(String) that assumes they are separated by spaces.
public Set<String> uniqueWords(String string) {
return Arrays.stream(string.split(" "))
.collect(Collectors.toSet());
}
One way would be
String[] arr = {"card","creditcard","debitcard"};
String inputStr = "need to discard pin";
var contains = Arrays.stream(inputStr.split(" ")).anyMatch(word -> Arrays.asList(arr).contains(word));
You can adjust the split regex to include all kinds of whitespace too.
Also: Consider an appropriate data structure for lookups. Array will be O(n), HashSet will be O(1).

Java extracting substring from sentences

There are combination of words like is, is not, does not contain. We have to match these words in a sentence and have to split it.
Input: if name is tom and age is not 45 or name does not contain tom then let me know.
Expected output:
If name is
tom and age is not
45 or name does not contain
tom then let me know
I tried below code to split and extract but the occurrence of "is" is in "is not" as well which my code is not able to find out:
public static void loadOperators(){
operators.add("is");
operators.add("is not");
operators.add("does not contain");
}
public static void main(String[] args) {
loadOperators();
for(String s : operators){
System.out.println(str.split(s).length - 1);
}
}
Since a word can occur multiple times, and since "is" and "is not" are different operators for you, split alone won't solve your use case. Ideally you would:
Iterate :
1. Find the index of the 'operator'.
2. Search for the next space _ or word.
3. Then update your string as substring from its index to length-1.
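The iteration outlined above could look roughly like the following sketch. It is one possible reading of those steps, not the answerer's code: the class name OperatorSplitter and the method split are hypothetical, operators are assumed to be surrounded by single spaces, and when two operators start at the same position the longer one is preferred so that "is not" wins over "is".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class OperatorSplitter {
    static List<String> split(String input, List<String> operators) {
        // Longest operators first, so that ties at the same index favour "is not" over "is".
        List<String> ops = new ArrayList<>(operators);
        ops.sort(Comparator.comparingInt(String::length).reversed());

        List<String> parts = new ArrayList<>();
        int start = 0;
        while (true) {
            int bestIndex = -1;
            String bestOp = null;
            // Step 1: find the earliest operator at or after the current position.
            for (String op : ops) {
                int idx = input.indexOf(" " + op + " ", start);
                if (idx >= 0 && (bestIndex < 0 || idx < bestIndex)) {
                    bestIndex = idx;
                    bestOp = op;
                }
            }
            if (bestOp == null) {
                break; // no operator left
            }
            // Steps 2 and 3: cut after the operator and continue from there.
            int end = bestIndex + 1 + bestOp.length();
            parts.add(input.substring(start, end).trim());
            start = end;
        }
        parts.add(input.substring(start).trim()); // whatever follows the last operator
        return parts;
    }

    public static void main(String[] args) {
        String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
        split(input, Arrays.asList("is", "is not", "does not contain")).forEach(System.out::println);
    }
}
```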
I am not entirely sure about what you try to achieve, but let's give it a shot.
For your case, a simple "workaround" might work just fine:
Sort the operators by their length, descending. This way the "largest match" will be found first. You can define "largest" either as literally the longest string or, preferably, by the number of words (number of spaces contained), so that "is a" takes precedence over "contains".
You'll need to make sure that no matches overlap though, which can be done by comparing all matches' start and end indices and discarding overlaps by some criteria, like first match wins
This code does what you seem to be wanting to do (or what I guessed you are wanting to do):
public static void main(String[] args) {
List<String> operators = new ArrayList<>();
operators.add("is");
operators.add("is not");
operators.add("does not contain");
String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
List<String> output = new ArrayList<>();
int lastFoundOperatorsEndIndex = 0; // First start at the beginning of input
for (String operator : operators){
int indexOfOperator = input.indexOf(operator); // Find current operator's position
if (indexOfOperator > -1) { // If operator was found
int thisOperatorsEndIndex = indexOfOperator + operator.length(); // Get length of operator and add it to the index to include operator
output.add(input.substring(lastFoundOperatorsEndIndex, thisOperatorsEndIndex).trim()); // Add operator to output (and remove trailing space)
lastFoundOperatorsEndIndex = thisOperatorsEndIndex; // Update startindex for next operator
}
}
output.add(input.substring(lastFoundOperatorsEndIndex, input.length()).trim()); // Add rest of input as last entry to output
for (String part : output) { // Output to console
System.out.println(part);
}
}
But it is highly dependent on the order of the sentence and the operators. If we're talking about user input, the task will be much more complicated.
A better method using regular expressions (regExp) would be:
public static void main(String... args) {
// Define inputs
String input1 = "if name is tom and age is not 45 or name does not contain tom then let me know.";
String input2 = "the name is tom and he is 22 years old but the name does not contain jack, but merry is 24 year old.";
// Output split strings
for (String part : split(input1)) {
System.out.println(part.trim());
}
System.out.println();
for (String part : split(input2)) {
System.out.println(part.trim());
}
}
private static String[] split(String input) {
// Define list of operators - 'is not' has to precede 'is'!!
String[] operators = { "\\sis not\\s", "\\sis\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
// Concatenate operators to regExp-String for search
StringBuilder searchString = new StringBuilder();
for (String operator : operators) {
if (searchString.length() > 0) {
searchString.append("|");
}
searchString.append(operator);
}
// Replace all operators by operator+\n and split resulting string at \n-character
return input.replaceAll("(" + searchString.toString() + ")", "$1\n").split("\n");
}
Notice the order of the operators! 'is' has to come after 'is not' or 'is not' will always be split.
You can prevent this by using a negative lookahead for the operator 'is'.
So "\\sis\\s" would become "\\sis(?! not)\\s" (reading like: "is", not followed by a " not").
A minimalist version (Java 8+ is required, since String.join was added in Java 8) could look like this:
private static String[] split(String input) {
String[] operators = { "\\sis(?! not)\\s", "\\sis not\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
return input.replaceAll("(" + String.join("|", operators) + ")", "$1\n").split("\n");
}

Regex pattern for String with multiple leading and trailing ones and zeroes

I have a search String which contains the format below:
Search String
111651311
111651303
4111650024
4360280062
20167400
It needs to be matched with sequence of numbers below
001111651311000
001111651303000
054111650024000
054360280062000
201674000000000
Please note that the search strings have been padded with additional digits on one or both sides.
I have tried the regex below in java to match the search strings but it only works for some.
Pattern pattern = Pattern.compile("([0-9])\1*"+c4MIDVal+"([0-9])\1*");
Any advice ?
Update
I have added the code I used below; it might provide some clarity on what I am trying to do.
Code Snippet
public void compare(String fileNameAdded, String fileNameToBeAdded){
List<String> midListAdded = readMID.readMIDAdded(fileNameAdded);
HashMap<String, String> midPairsToBeAdded = readMID.readMIDToBeAdded(fileNameToBeAdded);
List <String []> midCaptured = new ArrayList<String[]>();
for (Map.Entry<String, String> entry: midPairsToBeAdded.entrySet()){
String c4StoreKey = entry.getKey();
String c4MIDVal = entry.getValue();
Pattern pattern = Pattern.compile("([0-9]?)\\1*"+c4MIDVal+"([0-9]?)\\2*");
for (String mid : midListAdded){
Matcher match = pattern.matcher(mid);
// logger.info("Match Configured MID :: "+ mid+ " with Pattern "+"\\*"+match.toString()+"\\*");
if (match.find()){
midCaptured.add(new String []{ c4StoreKey +"-"+c4MIDVal, mid});
}
}
}
logger.info(midCaptured.size()+ " List of Configured MIDs ");
for (String [] entry: midCaptured){
logger.info(entry[0]+ "- "+entry[1] );
}
}
You need to refer to the second capturing group in the second part (\\2 instead of \\1), and you also need to make the pattern inside each capturing group optional.
Pattern pattern = Pattern.compile("([0-9]?)\\1*"+c4MIDVal+"([0-9]?)\\2*");
What is the problem with using the String.contains() method?
"001111651311000".contains("111651311"); // true
"201674000000000".contains("111651311"); // false
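Applied to the lists from the question, the contains approach might be sketched as follows. The class name MidLookup and the method findPadded are hypothetical. Note that contains does not check that the padding consists of repeated digits, so a short search string could also match an unrelated number; whether that matters depends on the data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MidLookup {
    // Maps each search string to the first padded number that contains it.
    static Map<String, String> findPadded(List<String> searchStrings, List<String> paddedNumbers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String search : searchStrings) {
            for (String padded : paddedNumbers) {
                if (padded.contains(search)) {
                    result.put(search, padded);
                    break; // keep only the first hit for this search string
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> search = Arrays.asList(
                "111651311", "111651303", "4111650024", "4360280062", "20167400");
        List<String> padded = Arrays.asList(
                "001111651311000", "001111651303000", "054111650024000",
                "054360280062000", "201674000000000");
        findPadded(search, padded).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```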

Java regex to match all words in a string

I am looking for a regex to match following pattern
(abc|def|ghi|abc+def+ghi|def+ghi)
essentially everything that's separated by | is an OR search
and everything joined with + all words must be present.
I have to construct the regex dynamically based on an input string in the above format.
I tried following for AND searches:
(?=.*?\babc\b)(?=.*?\bdef\b)(?=(.*?\bghi\b)
following for OR searches
.*(abc|def).*
Is there a single regex possible? any examples would help
(abc|def|ghi)
That will match every string that contains the words you're looking for.
AND searches
You list the following:
(?=.*?\babc\b)(?=.*?\bdef\b)(?=(.*?\bghi\b)
My version:
(?=.*?\babc\b)(?=.*?\bdef\b)(?=.*?\bghi\b).
Note that your version appears to have an extra ( before the ghi test.
Also note that I include a . at the end (it matches any single character). This is so that the regular expression actually consumes something; otherwise you are just doing lookaheads with no actual match.
OR searches
For a search for "abc" OR "def" I would use the following regular expression:
\babc\b|\bdef\b
OR
\b(?:abc|def)\b
Combined
So for your example of (abc|def|ghi|abc+def+ghi|def+ghi) the actual regular expression might look like this:
\babc\b|\bdef\b|\bghi\b|(?=.*?\babc\b)(?=.*?\bdef\b)(?=.*?\bghi\b).|(?=.*?\bdef\b)(?=.*?\bghi\b).
It's kind of a bad example, because it would match abc on its own via the first OR case, regardless of the requirement specified by the AND case in the middle.
Remember to specify the case sensitivity for the regular expression too.
Wrote this sample method match(String input, String searchFilter)
public static void main(String[] args) {
String input = " dsfsdf Invalid Locatio sdfsdff Invalid c Test1 xx Test2";
String searchFilter = "Invalid Pref Code|Invalid Location+Invalid company|Test|Test1+Test2";
System.out.println(match(input, searchFilter));
}
/**
* @param input
* @param searchFilter
*/
private static boolean match(String input, String searchFilter) {
List<String> searchParts = Arrays.asList(searchFilter.split("\\|"));
ArrayList<String> ms = new ArrayList<String>();
ArrayList<String> ps = new ArrayList<String>();
for (String pls : searchParts) {
if (pls.indexOf("+") > 0) {
ms.add(pls);
} else {
ps.add(pls);
}
}
ArrayList<String> patterns = new ArrayList<>();
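for (String msb : ms) {
StringBuilder sb = new StringBuilder();
for (String msbp : msb.trim().split("\\+")) {
sb.append("(?=.*?\\b").append(msbp.trim()).append("\\b)"); // one lookahead per required word
}
sb.append("."); // consume a single character once, so all lookaheads start at the same position
patterns.add(sb.toString());
}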
Pattern p = Pattern
.compile("\\b(?:" + StringUtils.join(ps, "|") + ")\\b|"+ StringUtils.join(patterns, "|"),
Pattern.CASE_INSENSITIVE);
return p.matcher(input).find();
}
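A variant of the method above that uses only the standard library (no StringUtils) is sketched below. The class name FilterMatcher and the helper alternative are hypothetical. It assumes that each term starts and ends with a word character, so that \b behaves as expected, and it quotes every term with Pattern.quote so that regex metacharacters in the filter are taken literally. Every |-separated alternative becomes a group of lookaheads anchored at the start of the input; a single word is simply an AND group of size one.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FilterMatcher {
    // Builds one pattern: alternatives are joined with |, and each alternative
    // is a run of lookaheads (one per +-joined term) anchored at the start.
    static Pattern compile(String searchFilter) {
        String regex = Arrays.stream(searchFilter.split("\\|"))
                .map(String::trim)
                .filter(part -> !part.isEmpty())
                .map(FilterMatcher::alternative)
                .collect(Collectors.joining("|"));
        // DOTALL lets .*? cross line breaks in multi-line input.
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    private static String alternative(String part) {
        return Arrays.stream(part.split("\\+"))
                .map(String::trim)
                .filter(term -> !term.isEmpty())
                .map(term -> "(?=.*?\\b" + Pattern.quote(term) + "\\b)")
                .collect(Collectors.joining("", "^", ""));
    }

    static boolean match(String input, String searchFilter) {
        return compile(searchFilter).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = " dsfsdf Invalid Locatio sdfsdff Invalid c Test1 xx Test2";
        String filter = "Invalid Pref Code|Invalid Location+Invalid company|Test|Test1+Test2";
        System.out.println(match(input, filter));
    }
}
```

An empty filter produces an empty pattern, which matches every input, so callers may want to reject empty filters up front.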
assertTrue(Pattern.matches("\\((\\w+(\\||\\+))+\\w+\\)", "(abc|def|ghi|abc+def+ghi|def+ghi)"));

How to get the specific part of a string based on condition?

I have a requirement to get the substring of a string based on a condition.
String str = "ABC::abcdefgh||XYZ::xyz";
If the input is "ABC", check whether str contains it and, if it is present, print abcdefgh.
In the same way, if the input is "XYZ", then it should print xyz.
How can I achieve this with string manipulation in Java?
If I've guessed the format of your String correctly, then you could split it into tokens with something like this:
String[] tokens = str.split("\\|\\|"); // split takes a regex, so the | characters must be escaped
for(String token : tokens)
{
// Cycle through each token.
String key = token.split("::")[0];
String value = token.split("::")[1];
if(key.equals(input))
{
// input being the user's typed in value.
return value;
}
}
But let's have a think for a minute. Why keep this in a String, when a HashMap is a much cleaner solution to your problem? Stick the String into a config file, and on load, some code can perform a similar task:
Map<String, String> inputMap = new HashMap<String, String>();
String[] tokens = str.split("\\|\\|"); // split takes a regex, so the | characters must be escaped
for(String token : tokens)
{
// Cycle through each token.
String key = token.split("::")[0];
String value = token.split("::")[1];
inputMap.put(key, value);
}
Then when the user types something in, it's as easy as:
return inputMap.get(input);
The idea is that, you should split your string with the delimiters of "::" and "||" , i.e. whichever of them is encountered it will be treated as a delimiter. So, the best way for achieving that is using regular expressions, I think.
String str = "ABC::abcdefgh||XYZ::xyz";
String[] parts = str.split("[::]|[/||]");
Map<String, String> map = new HashMap<String, String>();
for (int i = 0; i < parts.length - 2; i += 4) {
if (!parts[i].equals("")) {
map.put(parts[i], parts[i + 2]);
}
}
Short and concise, your code is ready. The for loop seems weird, if anyone comes up with a better regex for splitting (to get rid of the empty strings), it will become cleaner. I'm not a regex expert, so any suggestions are welcome.
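In response to the request above for a cleaner split: one option is to split on the literal "||" first and then on the first "::" of each pair, which produces no empty strings. This is a minimal sketch; the class name PairParser and the method parse are hypothetical, and pairs without a "::" are simply skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PairParser {
    static Map<String, String> parse(String str) {
        Map<String, String> map = new LinkedHashMap<>();
        // "|" is a regex metacharacter, so the delimiter is written as \\|\\|.
        for (String pair : str.split("\\|\\|")) {
            String[] kv = pair.split("::", 2); // limit 2: split at the first "::" only
            if (kv.length == 2) {
                map.put(kv[0], kv[1]);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("ABC::abcdefgh||XYZ::xyz");
        System.out.println(map.get("ABC"));
        System.out.println(map.get("XYZ"));
    }
}
```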
Use the contains method to see if it has the sub string: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#contains%28java.lang.CharSequence%29
You could do it as follows:
String[] parts = str.split("\\|\\|"); // "|" is a regex metacharacter and must be escaped
if (parts[0].startsWith("ABC")) {
String[] values = parts[0].split("::");
System.out.println(values[1]);
} else {
if (parts[1].startsWith("XYZ")) {
String[] values = parts[1].split("::"); // read the value from the second section
System.out.println(values[1]);
}
}
The above code will check first if ABC is there. If yes, it will print the result and then stop. If not, it will check the second section of the code to see if it starts with XYZ and then print the result. You can change it to suit your needs.
