Java sorting string based on two delimiters - java

I have a string of the following format
A34B56A12B56
And I am trying to sort the numbers into two arrays based on the prefixes.
For example:
Array A: 34,12
Array B: 56,56
What is the simplest way to go about this?
I have tried to use the StringTokenizer class and I am able to extract the numbers; however, there is no way of telling what the prefix was. Essentially, I can only extract them into a single array.
Any help would be appreciated.
Thanks!

Andreas seems to have provided a good answer already, but I wanted to practice some regular expressions in Java, so I wrote the following solution that works for any typical alphabetical prefix: (Comments are in-line.)
String str = "A34B56A12B56";
// pattern that captures the letter prefix (group 1) and the number (group 2)
String regexStr = "([A-Za-z]+)([0-9]+)";
// compile the regex pattern
Pattern regexPattern = Pattern.compile(regexStr);
// create the matcher
Matcher regexMatcher = regexPattern.matcher(str);
HashMap<String, ArrayList<Long>> prefixToNumsMap = new HashMap<>();
// retrieve all matches, add to prefix bucket
while (regexMatcher.find()) {
// get letter prefix (assuming can be more than one letter for generality)
String prefix = regexMatcher.group(1);
// get number
long suffix = Long.parseLong(regexMatcher.group(2));
// search for list in map
ArrayList<Long> nums = prefixToNumsMap.get(prefix);
// if prefix new, create new list with the number added, update the map
if (nums == null) {
nums = new ArrayList<Long>();
nums.add(suffix);
prefixToNumsMap.put(prefix, nums);
} else { // otherwise add the number to the existing list
nums.add(suffix);
}
}
// print the map once, after all matches have been collected
System.out.println(prefixToNumsMap);
Output : {A=[34, 12], B=[56, 56]}
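For what it's worth, the same bucketing can be written a little more compactly with Map.computeIfAbsent. This is only a sketch under a few assumptions: the class and method names (PrefixGrouper, group) are made up for illustration, prefixes are ASCII letters, and a LinkedHashMap is used so the prefixes print in the order they first appear.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class PrefixGrouper {
    // One or more ASCII letters followed by one or more digits.
    private static final Pattern TOKEN = Pattern.compile("([A-Za-z]+)([0-9]+)");

    static Map<String, List<Long>> group(String input) {
        // LinkedHashMap keeps prefixes in the order they first appear.
        Map<String, List<Long>> byPrefix = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // computeIfAbsent creates the list the first time a prefix is seen.
            byPrefix.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                    .add(Long.parseLong(m.group(2)));
        }
        return byPrefix;
    }

    public static void main(String[] args) {
        System.out.println(group("A34B56A12B56")); // {A=[34, 12], B=[56, 56]}
    }
}
```

Long.parseLong still throws a NumberFormatException if a run of digits does not fit in a long, so very long numbers would need BigInteger instead.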

Related

Find exact match from Array

In Java, I want to iterate over an array to find any of its words in my input string.
If the word in the input string directly follows a number, it should still return true.
String[] arr = {"card","creditcard","debitcard"};
String inputStr = "need to discard pin" --> Return False
String inputStr = "need to 444card pin" --> Return True, because the word directly follows a number
I tried the code below, but it returns true because it finds "card" inside "discard"; I need an exact match instead.
Arrays.stream(arr).anyMatch(inputStr::contains)
Try this:
String[] arr = {"card","creditcard","debitcard"}; // array that keeps the words
String inputStr = "need to discard pin"; // String that keeps the 'sentence'
String[] wordsToBeChecked = inputStr.split(" "); // We take the string and split it at each " " (space)
HashSet<String> matchingWords = new HashSet<>(); // This will keep the matching words
for (String s : arr)
{
for (String s1 : wordsToBeChecked)
{
if(s.equalsIgnoreCase(s1)) // If first word matches with the second
{
matchingWords.add(s1); // add it to our container
}
}
}
Or using Java 8 Streams:
List<String> wordList = Arrays.asList(arr);
List<String> sentenceWordList = Arrays.asList(inputStr.split(" "));
List<String> matchedWords = wordList.stream().filter(sentenceWordList::contains)
.collect(Collectors.toList());
The problem with most answers here is that they do not take punctuation into consideration. To solve this, you could use a regular expression like below.
String[] arr = { "card", "creditcard", "debitcard" };
String inputStr = "You need to discard Pin Card.";
Arrays.stream(arr)
.anyMatch(word -> Pattern
.compile("(?<![a-z-])" + Pattern.quote(word) + "(?![a-z-])", Pattern.CASE_INSENSITIVE)
.matcher(inputStr)
.find());
With Pattern.quote(word), we escape any character in each word which is a special character in the context of a regular expression. For instance, the literal string a^b would never match if left unquoted, because ^ means the start of a string when used in a regular expression.
(?<![a-z-]) and (?![a-z-]) mean that there is no letter or hyphen immediately preceding or following the word. For instance, discard will not match, even though it contains the word card. I have used only lowercase in these character classes because of the next point:
The flag CASE_INSENSITIVE passed to the compile method causes the pattern to be matched in a case-insensitive manner.
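As a rough illustration, the lookaround pattern can be wrapped in a small helper and tried against the inputs from the question. The names WholeWordMatch and containsAnyWord are invented for this sketch; the pattern itself is the one from the answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical wrapper around the lookaround pattern described above.
class WholeWordMatch {
    // True if any word occurs in the text without a letter or hyphen directly
    // before or after it. Digits are allowed as neighbours, so "444card" counts.
    static boolean containsAnyWord(String[] words, String text) {
        return Arrays.stream(words).anyMatch(word -> Pattern
                .compile("(?<![a-z-])" + Pattern.quote(word) + "(?![a-z-])",
                        Pattern.CASE_INSENSITIVE)
                .matcher(text)
                .find());
    }

    public static void main(String[] args) {
        String[] arr = { "card", "creditcard", "debitcard" };
        System.out.println(containsAnyWord(arr, "need to discard pin"));           // false
        System.out.println(containsAnyWord(arr, "need to 444card pin"));           // true
        System.out.println(containsAnyWord(arr, "You need to discard Pin Card.")); // true
    }
}
```

Compiling a pattern per word on every call is fine for a handful of words; for a large word list it would probably be worth compiling the patterns once and reusing them.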
You could split the string using a regular expression
String[] arr = {"card","creditcard","debitcard"};
String inputStr = "need to discard pin";
List<String> wordsToBeChecked = Arrays.asList(inputStr.split("[ 0-9]"));
Arrays.stream(arr).anyMatch(wordsToBeChecked::contains);
If your word list and input string are longer, consider splitting your input string into a HashSet. Lookups will then be faster:
Set<String> wordsToBeChecked = new HashSet<>(Arrays.asList(inputStr.split(" ")));
You can create a Set of the words in inputStr and then check the words list against that Set.
Set<String> inputWords = uniqueWords(inputStr);
List<String> matchedWords = Arrays.stream(arr)
.filter(word -> inputWords.contains(word))
.collect(Collectors.toList());
Building the Set may be non-trivial if you have to account for hyphenation, numbers, punctuation, and so forth. I'll wave my hands and ignore that - here's a naive implementation of uniqueWords(String) that assumes they are separated by spaces.
public Set<String> uniqueWords(String string) {
return Arrays.stream(string.split(" "))
.collect(Collectors.toSet());
}
One way would be
String[] arr = {"card","creditcard","debitcard"};
String inputStr = "need to discard pin";
var contains = Arrays.stream(inputStr.split(" ")).anyMatch(word -> Arrays.asList(arr).contains(word));
You can adjust the split regex to include all kinds of whitespace too.
Also: Consider an appropriate data structure for lookups. Array will be O(n), HashSet will be O(1).
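To make the last two remarks concrete, here is a minimal sketch that keeps the word list in a HashSet and splits the input on any run of whitespace. WordLookup, toSet and containsAnyWord are hypothetical names, and the sketch deliberately ignores punctuation and the "444card" case from the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: set-based lookup of whole, whitespace-separated words.
class WordLookup {
    // Build the set once; each contains() call is then O(1) on average.
    static Set<String> toSet(String... words) {
        return new HashSet<>(Arrays.asList(words));
    }

    static boolean containsAnyWord(Set<String> dictionary, String text) {
        // \s+ splits on any run of whitespace (spaces, tabs, newlines).
        return Arrays.stream(text.trim().split("\\s+"))
                .anyMatch(dictionary::contains);
    }

    public static void main(String[] args) {
        Set<String> dictionary = toSet("card", "creditcard", "debitcard");
        System.out.println(containsAnyWord(dictionary, "need to discard pin")); // false
        System.out.println(containsAnyWord(dictionary, "need to\tcard pin"));   // true
    }
}
```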

How I can use regex to implement contains functionality?

P.S.: If anything I describe below is unclear, please ask me.
I have a Dictionary with the list of words.
And I have String of one word with multiple characters.
Eg: Dictionary =>
String[] = {"Manager","age","range", "east".....} // list of words in dictionary
Now I have one string, tageranm.
I have to find all the words in the dictionary which can be made from the letters of this string. I have been able to find a solution by generating all permutations of the string and checking whether each one is present in the dictionary.
But I have another solution in mind, and I don't know how to do it in Java using a regex.
Algorithm:
// 1. Sort `tageranm`.
char c[] = "tageranm".toCharArray();
Arrays.sort(c);
String letters = String.valueOf(c); // letters = "aaegmnrt"
2. Sort the letters of every word in the dictionary:
Example: "range" => "aegnr" // After sorting
Now "aaegmnrt".contains("aegnr") returns false, because the 'm' comes in between.
Is there a way to use a regex that ignores the character m, so that I can get all the words in the dictionary using the above approach?
Thanks in advance.
Here is a possible solution, using the kind of regex suggested by @MattTimmermans in the comments. It's not very fast though, so there are probably plenty of ways to improve it. I'm also pretty sure there are libraries for this kind of search, which will (hopefully) use better-performing algorithms.
java.util.List<String> test(String[] words, String input){
java.util.List<String> result = new java.util.ArrayList<>();
// Sort the characters in the input-String:
byte[] inputArray = input.getBytes();
java.util.Arrays.sort(inputArray);
String sortedInput = new String(inputArray);
for(String word : words){
// Sort the characters of the word:
byte[] wordArray = word.getBytes();
java.util.Arrays.sort(wordArray);
String sortedWord = new String(wordArray);
// Create a regex to match from this word:
String wordRegex = ".*" + sortedWord.replaceAll(".", "$0.*");
// If the input matches this regex:
if(sortedInput.matches(wordRegex))
// Add the word to the result-List:
result.add(word);
}
return result;
}
For your inputs {"Manager","age","range", "east"} and "tageranm" it will return ["age", "range"].
EDIT: It doesn't match Manager because the M is in uppercase. If you want case-insensitive matching, the easiest way is to convert both the input and the words to the same case before checking:
input.getBytes() becomes input.toLowerCase().getBytes()
word.getBytes() becomes word.toLowerCase().getBytes()
With that change, the result is ["Manager", "age", "range"].
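If the regex is not a hard requirement, the same test ("can this word be built from these letters?") can also be expressed by counting letters. Note that this swaps the technique: it is a plain letter-count check, not the regex approach above. The names LetterBag, canBuild and findWords are made up for the sketch, and matching is case-insensitive by assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical alternative to the regex: compare letter counts instead.
class LetterBag {
    // True if every letter of word is available in letters, respecting how
    // often each letter occurs. Case is ignored (an assumption of this sketch).
    static boolean canBuild(String word, String letters) {
        Map<Character, Integer> available = new HashMap<>();
        for (char c : letters.toLowerCase(Locale.ROOT).toCharArray()) {
            available.merge(c, 1, Integer::sum);
        }
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            // merge returns the updated count; below zero means the letter ran out.
            if (available.merge(c, -1, Integer::sum) < 0) {
                return false;
            }
        }
        return true;
    }

    static List<String> findWords(String[] dictionary, String letters) {
        List<String> result = new ArrayList<>();
        for (String word : dictionary) {
            if (canBuild(word, letters)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = { "Manager", "age", "range", "east" };
        System.out.println(findWords(words, "tageranm")); // [Manager, age, range]
    }
}
```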

Java Regex expression to match and store any integers

Right now, using Java, I just want it to be able to tokenize any string of integers to an array
input = 1dsa23f hj23nma9123
array = 1,23,23,9123;
I have been trying a few different ways to do it, string.matches("") and then tokenising after it's in the right format and what not but it is too limiting to the user.
It looks like you are looking for something like
String[] nums = text.split("\\D+");
The regex \D is the negation of \d (it is like [^\d]), which means \D+ matches one or more non-digits.
The only problem with this solution is that if your text starts with non-digits, the result array will start with one empty string.
If you still want to use split, you can simply remove the non-digit part from the start of your text.
String[] nums = text.replaceFirst("^\\D+","").split("\\D+");
Unlike split, which focuses on finding delimiters, another approach is to focus on finding the parts that interest us. So instead of searching for non-digits, let's find digits.
We can do it in a few ways, such as Pattern/Matcher#find or Scanner. The problem here is that these approaches don't return an array but single elements, which you need to store in some resizable structure like a List.
So a solution using Pattern and Matcher could look like:
List<String> numbers = new ArrayList<>();
Matcher m = Pattern.compile("\\d+").matcher(yourText);
while(m.find()){
numbers.add(m.group());
}
The solution using Scanner is similar: we just need to set the proper delimiter (non-digits) and read everything that is not a delimiter (delimiters at the start of the text are ignored, which prevents empty strings from being returned).
List<String> nums = new ArrayList<>();
Scanner sc = new Scanner(yourText);
sc.useDelimiter("\\D+");
while(sc.hasNext()){
nums.add(sc.next());
}
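If an int[] is what you are after and you are on Java 9 or later, Matcher.results() can stream the matches directly. This is only a sketch: IntExtractor and extract are invented names, and it assumes every run of digits fits in an int (Integer.parseInt throws a NumberFormatException otherwise).

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper; requires Java 9+ for Matcher.results().
class IntExtractor {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static int[] extract(String text) {
        return DIGITS.matcher(text)
                .results()                                   // Stream<MatchResult>
                .mapToInt(r -> Integer.parseInt(r.group()))  // each run of digits
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract("1dsa23f hj23nma9123"))); // [1, 23, 23, 9123]
    }
}
```

Because this searches for digits rather than splitting on non-digits, leading non-digit text does not produce an empty first element.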
final String input = "1dsa23f hj23nma9123";
final String[] parts = input.split("[^0-9]+");
for (final String s: parts) {
final int i = Integer.parseInt(s);
}

Regex pattern for String with multiple leading and trailing ones and zeroes

I have a search String which contains the format below:
Search String
111651311
111651303
4111650024
4360280062
20167400
It needs to be matched with sequence of numbers below
001111651311000
001111651303000
054111650024000
054360280062000
201674000000000
Please note that extra digits have been added on either side of the search strings.
I have tried the regex below in java to match the search strings but it only works for some.
Pattern pattern = Pattern.compile("([0-9])\1*"+c4MIDVal+"([0-9])\1*");
Any advice?
Update
I have added the code I used below; it might provide some clarity on what I am trying to do.
Code Snippet
public void compare(String fileNameAdded, String fileNameToBeAdded){
List<String> midListAdded = readMID.readMIDAdded(fileNameAdded);
HashMap<String, String> midPairsToBeAdded = readMID.readMIDToBeAdded(fileNameToBeAdded);
List <String []> midCaptured = new ArrayList<String[]>();
for (Map.Entry<String, String> entry: midPairsToBeAdded.entrySet()){
String c4StoreKey = entry.getKey();
String c4MIDVal = entry.getValue();
Pattern pattern = Pattern.compile("([0-9]?)\\1*"+c4MIDVal+"([0-9]?)\\2*");
for (String mid : midListAdded){
Matcher match = pattern.matcher(mid);
// logger.info("Match Configured MID :: "+ mid+ " with Pattern "+"\\*"+match.toString()+"\\*");
if (match.find()){
midCaptured.add(new String []{ c4StoreKey +"-"+c4MIDVal, mid});
}
}
}
logger.info(midCaptured.size()+ " List of Configured MIDs ");
for (String [] entry: midCaptured){
logger.info(entry[0]+ "- "+entry[1] );
}
}
You need to refer to the second capturing group in the second part, and you also need to make the pattern inside each capturing group optional.
Pattern pattern = Pattern.compile("([0-9]?)\\1*"+c4MIDVal+"([0-9]?)\\2*");
What is the problem with using the String.contains() method?
"001111651311000".contains("111651311"); // true
"201674000000000".contains("111651311"); // false

how to replace parts of string using regular expressions

I am not a beginner to regular expressions, but their use in Perl seems a bit different than in Java.
Anyway, I basically have a dictionary of shorthand words and their definitions. I want to iterate over the words in the dictionary and replace them with their meanings. What is the best way to do this in Java?
I have seen String.replaceAll(), String.replace(), as well as the Pattern/Matcher classes. I wish to do a case insensitive replacement along the lines of:
word =~ s/\s?\Q$short_word\E\s?/ \Q$short_def\E /sig
While I am at it, do you think that it is best to extract all the words from the string and then apply my dictionary or just apply the dictionary to the string? I know that I need to be careful, because the shorthand words could match parts of other shorthand meanings.
Hopefully this all makes sense.
Thanks.
Clarification:
Dictionary is something like:
lol:laugh out loud, rofl:rolling on the floor laughing, ll:like lemons
string is:
lol, i am rofl
replaced text:
laugh out loud, i am rolling on the floor laughing
Notice how the ll wasn't added anywhere.
The danger is false positives inside of normal words. "fell" != "felike lemons"
One way is to split the text on whitespace (do multiple spaces need to be conserved?) and then loop over the resulting list, applying an 'if contains() { replace } else { output original }' idea.
My output class would be a StringBuffer
StringBuffer outputBuffer = new StringBuffer();
for(String s: split(inputText)) {
outputBuffer.append( dictionary.containsKey(s) ? dictionary.get(s) : s);
}
Make your split method smart enough to return word delimiters also:
split("now is the time") -> now,<space>,is,<space>,the,<space><space>,time
Then you don't have to worry about conserving white space - the loop above will just append anything that isn't a dictionary word to the StringBuffer.
Here's a recent SO thread on retaining delimiters when regexing.
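One possible way to write such a delimiter-preserving split is with lookarounds that match the boundary between whitespace and non-whitespace. This is a sketch, not the only option: DelimiterKeepingSplit, split and expand are hypothetical names, and punctuation stays attached to words (so "lol," would not be looked up as "lol").

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: split into words AND the whitespace runs between them.
class DelimiterKeepingSplit {
    // Zero-width split at every boundary between whitespace and non-whitespace,
    // so nothing is consumed and both kinds of token are returned.
    static String[] split(String text) {
        return text.split("(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)");
    }

    // Replace tokens found in the dictionary; everything else, including the
    // whitespace tokens, is appended unchanged.
    static String expand(String text, Map<String, String> dictionary) {
        StringBuilder out = new StringBuilder();
        for (String token : split(text)) {
            out.append(dictionary.getOrDefault(token, token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("now is the  time")));
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("lol", "laugh out loud");
        System.out.println(expand("lol i  fell", dictionary)); // laugh out loud i  fell
    }
}
```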
If you insist on using regex, this would work (taking Zoltan Balazs' dictionary map approach):
Map<String, String> substitutions = loadDictionaryFromSomewhere();
int lengthOfShortestKeyInMap = 3; //Calculate
int lengthOfLongestKeyInMap = 3; //Calculate
StringBuffer output = new StringBuffer(input.length());
Pattern pattern = Pattern.compile("\\b(\\w{" + lengthOfShortestKeyInMap + "," + lengthOfLongestKeyInMap + "})\\b");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
String candidate = matcher.group(1);
String substitute = substitutions.get(candidate);
if (substitute == null)
substitute = candidate; // no match, use original
matcher.appendReplacement(output, Matcher.quoteReplacement(substitute));
}
matcher.appendTail(output);
// output now contains the text with substituted words
If you plan to process many inputs, pre-compiling the pattern is more efficient than using String.split(), which compiles a new Pattern each call.
(edit) Compiling all of the keys into a single pattern yields a more efficient approach, like so:
Pattern pattern = Pattern.compile("\\b(lol|rtfm|rofl|wtf)\\b");
// rest of the method unchanged, don't need the shortest/longest key stuff
This allows the regex engine to skip over any words that happen to be short enough but aren't in the list, saving you a lot of map accesses.
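Rather than hard-coding the alternation, it could be built from the map's keys. The sketch below assumes the keys are lowercase and consist only of word characters (so that \b behaves as expected); ShorthandExpander and expand are names invented for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: one pattern built from all dictionary keys.
class ShorthandExpander {
    static String expand(String input, Map<String, String> dictionary) {
        if (dictionary.isEmpty()) {
            return input; // an empty alternation would match empty strings
        }
        // Quote each key so regex metacharacters in a key are taken literally.
        String alternation = dictionary.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(input);
        StringBuffer output = new StringBuffer(input.length());
        while (matcher.find()) {
            // Assumption: keys are stored in lowercase.
            String substitute = dictionary.get(matcher.group(1).toLowerCase(Locale.ROOT));
            if (substitute == null) {
                substitute = matcher.group(); // key not lowercase: keep the original
            }
            matcher.appendReplacement(output, Matcher.quoteReplacement(substitute));
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("lol", "laugh out loud");
        dictionary.put("rofl", "rolling on the floor laughing");
        dictionary.put("ll", "like lemons");
        // prints: laugh out loud, i am rolling on the floor laughing
        System.out.println(expand("lol, i am rofl", dictionary));
    }
}
```

For many inputs with the same dictionary, the compiled Pattern should of course be built once and reused rather than rebuilt on every call.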
The first thing that comes to mind is this:
...
// eg: lol -> laugh out loud
Map<String, String> dictionary = new HashMap<>();
ArrayList<String> originalText = new ArrayList<>(); // the words of the input text
ArrayList<String> replacedText = new ArrayList<>();
for(String string : originalText) {
if(dictionary.containsKey(string)) {
replacedText.add(dictionary.get(string));
} else {
replacedText.add(string);
}
}
...
Or you could use a StringBuffer instead of the replacedText.
