Extract string and number combination from a larger string - java

I have a number of large strings looking like this:
"text(24), text_2(5), text_4(822)..."
I'm trying to check if a specific text exists and get the corresponding value.
Is there a quick way to do this?
Edit:
I have an array with all possible text values. At the moment I use foreach to check for text values.
I have the string text_2 and what I need is the corresponding 5 as an integer.

You can use a regex to extract all the text elements from the String and store them in a map, e.g.:
String s = "text(24), text_2(5), text_4(822)";
Pattern pattern = Pattern.compile("([a-zA-Z]*(_)?[0-9]*\\([0-9]+\\))");
Matcher matcher = pattern.matcher(s);
Map<String, Integer> valuesMap = new HashMap<>();
while(matcher.find()){
String[] tokens = matcher.group().split("(?=\\([0-9]+\\),?)");
String key = tokens[0];
Integer value = Integer.parseInt(tokens[1].substring(1, tokens[1].length() - 1));
valuesMap.put(key, value);
}
System.out.println(valuesMap);
Once done, you can call valuesMap.get("text_2"); to get the corresponding value. This is how the above example works:
It splits the text into tokens of the form <text>(<value>).
It then splits each token again, into text and value, and places these into a Map.
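The two steps above can also be sketched with capture groups, so that each match yields the key and the value directly and no second split is needed. This is only a sketch: the class name CaptureGroupDemo and the helper parse are made up for illustration, and it assumes keys consist of word characters (letters, digits, underscore) and values fit into an int.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // Assumption: group 1 = key made of word characters, group 2 = digits inside the parentheses.
    private static final Pattern ENTRY = Pattern.compile("(\\w+)\\((\\d+)\\)");

    // Hypothetical helper: builds a key -> value map from a string like "text(24), text_2(5)".
    static Map<String, Integer> parse(String s) {
        Map<String, Integer> values = new HashMap<>();
        Matcher m = ENTRY.matcher(s);
        while (m.find()) {
            values.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> values = parse("text(24), text_2(5), text_4(822)");
        System.out.println(values.get("text_2")); // prints 5
    }
}
```

A key that is not present simply returns null from get, so check for null before unboxing the result to an int.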

Since you need to do this a number of times, I suggest you split the string and build a map from each text to its value; this is O(n). After that, each lookup is O(1) on average if you use a HashMap.
String text = "text(24), text_2(5), text_4(822)";
Map<String, Integer> map = new HashMap<>();
String[] split = text.split(", ");
for(String s:split){
//search for the position of "(" and ")"
int start = 0;
int end = s.length()-1;
while(s.charAt(start) != '(')
start++;
while(s.charAt(end) != ')')
end--;
//put string and matching value in the map
map.put(s.substring(0, start), Integer.parseInt(s.substring(start+1, end)));
}
System.out.println(map);
I also ran some benchmarks for a string containing 10000 entries, and this approach was about 4 times faster than a regex approach (38 ms vs 163 ms).

Related

Replace certain string's values dynamically

I have a HashMap<Integer, Double> which looks something similar like this:
{260=223.118,50, 261=1889,00, 262=305,70, 270=308,00}
From database I take a string that could look something like this:
String result = "(260+261)-(262+270)";
I want to change the string's values of 260, 261, 262... (which are always the same with the HashMap's keys) with the values so I could get a string like:
String finRes = "(223.118,50+1889,00)-(305,70+308,00)";
Also the string result can contain multiplication and division characters.
A simple regex solution here would be to match your input string against the pattern (\d+). This should yield all the integers in the arithmetic string. Then, we can lookup each match, converted to an integer, in the map to obtain the corresponding double value. Since the desired output is again a string, we have to convert the double back to a string.
Map<Integer, Double> map = new HashMap<>();
map.put(260, 223.118);
map.put(261, 1889.00);
map.put(262, 305.70);
map.put(270, 308.00);
String input = "(260+261)-(262+270)";
String result = input;
String pattern = "(\\d+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, String.valueOf(map.get(Integer.parseInt(m.group(1)))));
}
m.appendTail(sb);
System.out.println(sb.toString());
Output:
(223.118+1889.0)-(305.7+308.0)
Demo here:
Rextester
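On Java 9 or later, the same substitution can be sketched with Matcher.replaceAll and a replacer function. The class name SubstituteDemo and the helper substitute are hypothetical. Leaving unknown keys untouched is a design choice made here, not something the question specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubstituteDemo {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Hypothetical helper: replaces every integer in expr that is a key of values.
    static String substitute(String expr, Map<Integer, Double> values) {
        return NUMBER.matcher(expr).replaceAll(mr -> {
            Double v = values.get(Integer.parseInt(mr.group()));
            // Assumption: numbers without a map entry stay as they are instead of becoming "null".
            String replacement = (v == null) ? mr.group() : String.valueOf(v);
            // quoteReplacement guards against '$' or '\' being interpreted in the replacement.
            return Matcher.quoteReplacement(replacement);
        });
    }

    public static void main(String[] args) {
        Map<Integer, Double> map = new HashMap<>();
        map.put(260, 223.118);
        map.put(261, 1889.00);
        map.put(262, 305.70);
        map.put(270, 308.00);
        System.out.println(substitute("(260+261)-(262+270)", map));
        // prints (223.118+1889.0)-(305.7+308.0)
    }
}
```

Because the regex matches whole runs of digits, a key such as 260 is never replaced inside a longer number such as 1260.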
Here is a solution with explanatory comments:
// your hashmap that contains the data
HashMap<Integer,Double> myHashMap = new HashMap<Integer,Double>();
// fill your hashmap with data ..
..
// the string coming from the database
String result = "(260+261)-(262+270)";
// iterate over all the keys of your map and replace each key with its value
for(Integer n : myHashMap.keySet()) {
// String.replace takes strings, so the Integer key has to be converted first;
// note that a plain replace also hits a key inside a longer number (e.g. 260 in 1260)
result = result.replace(String.valueOf(n), Double.toString(myHashMap.get(n)));
}
// the String variable 'result' now contains the new string
Hope it helps :)

Is there a way to put lines on a vector?

I have a problem: I'm writing a simple Java program that reads regular expressions like these from a txt file:
Name:{[A-z][0-9]}
Phone:{[0-9]}
I'm looking for a way to split off the name of each regex and put it in a vector:
(0) [Name:]
(1) [Phone:]
and another vector that contains the regular expressions, like this:
(0) [[A-z][0-9]]
(1) [[0-9]]
The main idea is to split each line of the txt file into the name of the regex and the regex itself and store them in separate vectors, but the problem is that the file contains an arbitrary number n of expressions.
Is there an example or something?
Thank you.
Rather than storing two parallel vectors, I'd recommend using a Map data structure for this purpose. If you need to preserve the order of elements from the file for some reason, you can use a LinkedHashMap. For example:
Map<String, String> regexesByName = new LinkedHashMap<>();
for (String line : linesFromFile) {
int index = line.indexOf(':');
if (index < 0) {
throw new IllegalArgumentException("Cannot parse: " + line);
}
String name = line.substring(0, index);
String regex = line.substring(index + 1);
regexesByName.put(name, regex);
}
Then to enumerate all the name/regex pairs, you can do:
for (Map.Entry<String, String> entry : regexesByName.entrySet()) {
// entry.getKey() is the name, entry.getValue() is the regex
}
And to look up the regex for a given name, just do:
String regex = regexesByName.get(name);
If what you want is to later convert the regexes into compiled Pattern objects, that's easily doable as well:
Map<String, Pattern> compiledRegexesByName = new LinkedHashMap<>();
for (Map.Entry<String, String> entry : regexesByName.entrySet()) {
compiledRegexesByName.put(
entry.getKey(), Pattern.compile(entry.getValue()));
}

split a string with multiple alphabets and letters - android

I'm very new to Android Java and I have a simple question. I have a string like this, for example:
P:38,AS:31,DT:231,AR:21
I want to split this into 4 different lists in the form :
P(list) = 38
AS(list) = 31
DT(list) = 231
AR(list) = 21
I tried split but it didn't get the job done ...
As long as the keys are always letters and the values are always integers, you can use regular expressions to parse these strings:
Hashtable<String, int[]> result = new Hashtable<String, int[]>();
Pattern pattern = Pattern.compile("([A-Z]+):(\\d+(?:,\\d+)*)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
String key = matcher.group(1);
String[] fields = matcher.group(2).split(",");
int[] values = new int[fields.length];
for (int i=0; i<values.length; i++)
values[i] = Integer.parseInt(fields[i]);
result.put(key, values);
}
Edit
"([A-Z]+):(\\d+(?:,\\d+)*)" is a regular expression that matches at least one uppercase letter ([A-Z]+) followed by a colon (:) followed by one or more numbers separated by commas (\\d+(?:,\\d+)*). A single number is composed of one or more digits (\\d+). The additional parentheses allow us to later access the individual parts of the input string using the group(int) method calls.
The java.util.regex.Matcher class allows us to iterate through the individual parts of the input string that match our regular expression. The find() method returns true as long as there is another substring in our input string that matches the regular expression. So with the input string "P:38,45,AS:31,DT:231,345,678,AR:21" the while loop would execute four times and the matcher variable would point to the following four substrings of the input string:
P:38,45
AS:31
DT:231,345,678
AR:21
We can then use the matcher's group(int) method to access the individual parts of each substring. matcher.group(1) accesses the text that was captured by the first parentheses of our regular expression (([A-Z]+)) which corresponds to "P", "AS", "DT", and "AR" in the individual loop iterations. Analogously, matcher.group(2) corresponds to the second parentheses of the regular expression ((\\d+(?:,\\d+)*)) which would return "38,45", "31", "231,345,678", and "21". So in the first iteration of the while loop key would hold "P" and fields would hold an array of strings ["38", "45"]. We can then parse the fields into actual integer values using Integer.parseInt(String) and store the key and the values in a Hashtable so that we can later retrieve the values for the individual keys. For example, result.get("DT") would return an array of integers with the values [231, 345, 678].
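The walkthrough above can be checked with a small self-contained sketch. It uses the same regular expression and the example input from the explanation. As an assumption made for readability, it stores the values in a LinkedHashMap of List<Integer> instead of a Hashtable of int[], so that the result prints in input order. The class name GroupedValuesDemo and the helper parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupedValuesDemo {
    private static final Pattern ENTRY = Pattern.compile("([A-Z]+):(\\d+(?:,\\d+)*)");

    // Hypothetical helper: maps each uppercase key to the list of integers that follow it.
    static Map<String, List<Integer>> parse(String input) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        Matcher matcher = ENTRY.matcher(input);
        while (matcher.find()) {
            List<Integer> values = new ArrayList<>();
            // group(2) holds the comma-separated numbers, e.g. "231,345,678"
            for (String field : matcher.group(2).split(",")) {
                values.add(Integer.parseInt(field));
            }
            result.put(matcher.group(1), values); // group(1) holds the key, e.g. "DT"
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("P:38,45,AS:31,DT:231,345,678,AR:21"));
        // prints {P=[38, 45], AS=[31], DT=[231, 345, 678], AR=[21]}
    }
}
```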
As #pobybolek said, you can use the method he wrote to convert the string into a Hashtable that maps each String key to an int[] array holding the values that follow it.
String input = master_string;
Hashtable<String, int[]> result = new Hashtable<String, int[]>();
Pattern pattern = Pattern.compile("([A-Z]+):(\\d+(?:,\\d+)*)");
Matcher matcher = pattern.matcher(input);
while (matcher.find())
{
String key = matcher.group(1);
String[] fields = matcher.group(2).split(",");
int[] values = new int[fields.length];
for (int pqr=0; pqr<values.length; pqr++)
{
values[pqr] = Integer.parseInt(fields[pqr]);
// values[pqr] = fields[pqr];
}
result.put(key, values);
}
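The above code splits the given string into its keys and, for each key, an integer array of the values that follow it. This can also be changed to a String key with a String[] array by adjusting the declaration in the second line of my code and storing the fields directly, as in the commented-out line.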

How to fetch string after the third slash in java

I am trying to fetch the string after the third slash, but I don't know how to do it. I have used split, but that's not what I want.
for(String obj2: listKey.getCommonPrefixes()){
Map<String, String> map = new HashMap<String, String>();
String[] id = obj2.split("/");
if (id.length > 3) {
String name = id[3];
map.put("id", name);
map.put("date", "null");
map.put("size", String.valueOf(obj2.length()));
keys.add(map);
}
}
id[3] gives me only the segment between the third and fourth slashes, but I want everything after the third slash. How can I do that?
You can replace
String[] id = obj2.split("/");
by
String[] id = obj2.split("/", 4);
From the javadoc :
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
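A minimal sketch of the effect of the limit, using a made-up path (the class name SplitLimitDemo and the helper afterThirdSlash are hypothetical, and returning null for inputs with fewer than three slashes is an assumption):

```java
class SplitLimitDemo {
    // Hypothetical helper: returns everything after the third slash, or null if there is none.
    static String afterThirdSlash(String path) {
        // With limit 4 the pattern is applied at most 3 times,
        // so the fourth element keeps all remaining slashes.
        String[] parts = path.split("/", 4);
        return parts.length > 3 ? parts[3] : null;
    }

    public static void main(String[] args) {
        System.out.println(afterThirdSlash("a/b/c/d/e/f")); // prints d/e/f
    }
}
```

A leading slash counts as the first delimiter, so for "/a/b/c/d" the result is "c/d".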

Java Split string into array, by size and only split after delimiters

I have many strings of widely varying size, anywhere from 5 to 12000 characters.
Eg:
String 1 : A,b,C,d
String 2 :23,343,342,4535,4535,453,234,
String 3 : ,asdsfdfdasgfdsfsf,dsfdsfdsfdsfsdfdf,sdsfdsfdsfsdf, <- and this around another 1000 times.
I want to upload them to my database by their ID. My problem is that an Oracle database VARCHAR can contain only 4k bytes.
Edit:
So if the string is bigger than 4k, I want a String[] where each element has at most 4000 characters (let's say 3900). And of course, if I go through the array I should get back the same String, and the last "word" of each array element should be a whole word, not a sliced one.
So my idea is simply: if string.length < 1000, then go;
otherwise split it into chunks of ~4000 characters, but only split after a comma.
My solution so far (without taking care of commas):
for (My_type type: types) {
String[] tokens =
Iterables.toArray(
Splitter
.fixedLength(4000)
.split(type.area),
String.class
);
}
How can I replace this function to get a "good" array?
I don't think split() is an option. I think you need to use a Matcher to consume as much input as possible, then build a list of captured sections:
Matcher matcher = Pattern.compile(".{1,3999}(,|.$)").matcher(input);
List<String> list = new ArrayList<>();
while (matcher.find())
list.add(matcher.group());
If you really want an array (not recommended)
String[] array = list.toArray(new String[list.size()]);
This regex is greedy and will consume up to 4000 chars ending with a comma or at the end of input. A length of 3999 is used to allow 1 more for the comma itself, and the dot before the end marker $ is there to consume one more char because $ is zero-width.
This will give you such tokens, in a List<> - hoping that's fine.
for (My_type type: types) {
String longString = type.area;
List<String> tokens = new ArrayList<>();
while (longString.length() > 4000) {
int splitIndex = longString.lastIndexOf(",", 3999);
if (splitIndex < 0) {
// no comma found
throw new IllegalStateException("Cannot split string");
}
tokens.add(longString.substring(0, splitIndex));
longString = longString.substring(splitIndex + 1); // leaving out the comma
}
// add the remainder that is left after the last split
if (!longString.isEmpty()) {
tokens.add(longString);
}
}
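If concatenating the chunks has to reproduce the original string, as the question asks, one option is to keep the comma at the end of each chunk. The following is a sketch under stated assumptions: the class name CommaChunker and the helper chunk are hypothetical, and lengths are counted in Java chars, which only equals bytes for single-byte characters.

```java
import java.util.ArrayList;
import java.util.List;

class CommaChunker {
    // Hypothetical helper: splits input into chunks of at most maxLen chars,
    // breaking only directly after a comma.
    static List<String> chunk(String input, int maxLen) {
        List<String> chunks = new ArrayList<>();
        String rest = input;
        while (rest.length() > maxLen) {
            // Last comma that still fits into a chunk of maxLen chars.
            int comma = rest.lastIndexOf(',', maxLen - 1);
            if (comma < 0) {
                throw new IllegalStateException(
                        "No comma within the first " + maxLen + " characters");
            }
            chunks.add(rest.substring(0, comma + 1)); // the comma stays in the chunk
            rest = rest.substring(comma + 1);
        }
        if (!rest.isEmpty()) {
            chunks.add(rest); // remainder after the last split
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("aa,bbb,cc,dddd,e", 6));
        // prints [aa,, bbb,, cc,, dddd,e]
    }
}
```

Joining the result with String.join("", chunks) gives back the input unchanged.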
