Convert Python to Java 8 - java

I'm working on a project that uses Java 8. The project collects geographic information and processes it.
I have already done part of this work in Python, and now I'm translating that part to Java 8. In Python I use the lines below to convert coordinates from Google format to PostGIS format:
s1 = tuple(value.split(" "))
s2 = zip(s1[1::2], s1[::2])
For example:
I have an input like: value = "11.12345679 12.987655 11.3434454 12.1223323" and so on.
The Python code above changes the input to:
s2 = "12.987655 11.12345679 12.1223323 11.3434454" and so on.
It swaps the two values in each coordinate pair. Each input has thousands of coordinates.
To get the same effect with the Java I know (learned before Java 8), I would need to do this:
try {
String result = "", right = "", left = "";
String[] txt = str.split(" ");
for (int i = 0; i < txt.length; i += 2) {
right = txt[i];
left = txt[i + 1];
result += "," + left + " " + right;
}
return result.substring(1);
} catch (ArrayIndexOutOfBoundsException e) {
return null;
}
I will execute the Java code above thousands of times. My question is: does Java 8 have a new way to write this code that is closer to the Python version?
I'm asking because I came across this example of what is new in Java 8:
List<String> someList = new ArrayList<>();
// Add some elements
someList.add("Generic (1.5)");
someList.add("Functional (8)");
// Open a stream
someList.stream()
// Convert all texts to upper case
.map(String::toUpperCase)
// Loop over all elements in upper case
.forEach(System.out::println);
Update:
The solution by Jean-François Savard was perfect and uses Java 8 as I asked. Thank you so much, Jean-François Savard.
String str = "11.12345679 12.987655 11.3434454 12.1223323 11.12345679 12.987655 11.3434454 12.1223323";
String[] strs = str.split(" ");
str = IntStream.range(0, strs.length)
.filter(i -> i % 2 == 0)
.mapToObj(i -> strs[i + 1] + " " + strs[i])
.collect(Collectors.joining(","));
System.out.println(str);
>> 12.987655 11.12345679,12.1223323 11.3434454,12.987655 11.12345679,12.1223323 11.3434454
The solutions shown by Vampire and Tukayi also fit my problem perfectly. Thanks a lot, guys.
String str = "11.12345679 12.987655 11.3434454 12.1223323 11.12345679 12.987655 11.3434454 12.1223323";
str = str.replaceAll("([^\\s]+) ([^\\s]+)(?: |$)", ",$2 $1").substring(1);
System.out.println(str);

Define the following in your class to precompile a Regex pattern
private static final Pattern pattern = Pattern.compile("([^ ]++) ([^ ]++)(?: |$)");
Then in your method use
if ((new StringTokenizer(str, " ").countTokens() % 2) == 1) {
return null;
}
return pattern.matcher(str).replaceAll(",$2 $1").substring(1);
to get the same result as in your original code.
If you insist on using Streams for whatever reason, here is a Streams solution:
String[] strs = str.split(" ");
return IntStream.range(0, strs.length)
.filter(i -> i % 2 == 0)
.mapToObj(i -> strs[i + 1] + " " + strs[i])
.collect(Collectors.joining(","));

Java 8 added (among other things) lambdas, streams and functional interfaces.
You can use streams to simplify looping over objects, but Java's syntax is not as compact as the slicing you see in Python.
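As a rough illustration of the streams approach applied to the coordinate swap, here is a minimal, self-contained sketch. The class name SwapPairs and the helper swapPairs are hypothetical, and rejecting an odd number of values with an exception is an assumption (the loop in the question returned null instead).

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SwapPairs {
    // Hypothetical helper: turns "a b c d" into "b a,d c".
    static String swapPairs(String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length % 2 != 0) {
            // Assumption: an incomplete pair is treated as invalid input.
            throw new IllegalArgumentException("Odd number of coordinates");
        }
        // Iterate over pair indexes instead of filtering even positions.
        return IntStream.range(0, parts.length / 2)
                .mapToObj(i -> parts[2 * i + 1] + " " + parts[2 * i])
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints 12.987655 11.12345679,12.1223323 11.3434454
        System.out.println(swapPairs("11.12345679 12.987655 11.3434454 12.1223323"));
    }
}
```

Iterating over pair indexes avoids the filter step, but it does the same work as the stream shown in the answers above.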

Related

Cannot count how many unique dates are in each part of a string

I divided my string into three parts using a newline ('\n'). The output that I want to achieve: a count of how many unique dates are in each part of the string.
In the code below, the first part contains two unique dates, the second part contains two and the third part contains three. So the output should look like this: 2,2,3,
But when I run the code below I get this output: 5,5,5,5,1,3,1,
How do I get the output 2,2,3,
Thanks in advance.
String strH;
String strT = null;
StringBuilder sbE = new StringBuilder();
String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11," + '\n' +
"2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15," + '\n' +
"2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09,";
String[] strG = strA.split("\n");
for(int h=0; h<strG.length; h++){
strH = strG[h];
String[] words=strH.split(",");
int wrc=1;
for(int i=0;i<words.length;i++) {
for(int j=i+1;j<words.length;j++) {
if(words[i].equals(words[j])) {
wrc=wrc+1;
words[j]="0";
}
}
if(words[i]!="0"){
sbE.append(wrc).append(",");
strT = String.valueOf(sbE);
}
wrc=1;
}
}
Log.d("TAG", "Output: "+strT);
I would use a set here to count the unique dates in each line:
String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11" + "\n" +
"2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15" + "\n" +
"2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09";
String[] lines = strA.split("\n");
List<Integer> counts = new ArrayList<>();
for (String line : lines) {
counts.add(new HashSet<String>(Arrays.asList(line.split(","))).size());
}
System.out.println(counts); // [2, 2, 3]
Note that I have done a minor cleanup of the strA input by removing the trailing comma from each line.
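For what it's worth, String.split(",") discards trailing empty strings, so the trailing commas in the original input should not change the counts. A minimal sketch of the set-based approach run on the unmodified input; the class DistinctPerLine and the method countDistinctPerLine are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class DistinctPerLine {
    // Hypothetical helper: one distinct-value count per line of text.
    static List<Integer> countDistinctPerLine(String text) {
        List<Integer> counts = new ArrayList<>();
        for (String line : text.split("\n")) {
            // split(",") drops trailing empty strings, so "a,b," yields [a, b].
            counts.add(new HashSet<>(Arrays.asList(line.split(","))).size());
        }
        return counts;
    }

    public static void main(String[] args) {
        String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11," + '\n' +
                "2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15," + '\n' +
                "2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09,";
        System.out.println(countDistinctPerLine(strA)); // [2, 2, 3]
    }
}
```

One caveat: a completely empty line would count as one distinct value, because "".split(",") returns an array containing a single empty string.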
With Java 8 Streams, this can be done in a single statement:
String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11," + '\n' +
"2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15," + '\n' +
"2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09,";
String strT = Pattern.compile("\n").splitAsStream(strA)
.map(strG -> String.valueOf(Pattern.compile(",").splitAsStream(strG).distinct().count()))
.collect(Collectors.joining(","));
System.out.println(strT); // 2,2,3
Note that Pattern.compile("\n").splitAsStream(strA) can also be written as Arrays.stream(strA.split("\n")), which is shorter to write but creates an unnecessary intermediate array. Which one is better is a matter of personal preference.
String strT = Arrays.stream(strA.split("\n"))
.map(strG -> String.valueOf(Arrays.stream(strG.split(",")).distinct().count()))
.collect(Collectors.joining(","));
The first version can be further micro-optimized by only compiling the regex once:
Pattern patternComma = Pattern.compile(",");
String strT = Pattern.compile("\n").splitAsStream(strA)
.map(strG -> String.valueOf(patternComma.splitAsStream(strG).distinct().count()))
.collect(Collectors.joining(","));

How to write a lambda expression when you are using a string array?

I want to use a lambda expression instead of a classic for.
String str = "Hello, Maria has 30 USD.";
String[] FORMAT = {"USD", "CAD"};
final String simbol = "$";
// This was the initial implementation.
// for (String s: FORMAT) {
// str = str.replaceAll(s + "\\s", "\\" + FORMAT);
// }
Arrays.stream(FORMAT).forEach(country -> {
str = str.replaceAll(country + "\\s", "\\" + simbol);
});
// I tried it like this, but I received the error
// "Variable used in lambda expression should be final or effectively final"
// and I don't want the str String to be final.
For any string, I want to replace USD or CAD with the $ symbol.
How can I change this code to make it work? Thanks in advance!
I see no problem with using a loop for this. That's how I'd likely do it.
You can do it with a stream using reduce:
str = Arrays.stream(FORMAT)
.reduce(
str,
(s, country) -> s.replaceAll(country + "\\s", Matcher.quoteReplacement(simbol)));
Or, easier:
str = str.replaceAll(
Arrays.stream(FORMAT).collect(Collectors.joining("|", "(", ")")) + "\\s",
Matcher.quoteReplacement(simbol));
Consider using a traditional for loop, since you're reassigning a variable that is declared outside the lambda:
for(String country: FORMAT) {
str = str.replaceAll(country + "\\s", "\\" + simbol);
}
Using Streams in this example will make things less readable.
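If a single call is preferred, the currency codes can be combined into one alternation so the string is only scanned once. This is a hedged sketch rather than the answerers' exact code: the class CurrencySymbols and the method replaceCodes are hypothetical names. Like the original pattern, it only matches a code that is followed by whitespace, so "30 USD." in the question's sample would stay unchanged.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CurrencySymbols {
    // Hypothetical helper: replaces every "<code><whitespace>" with the symbol.
    static String replaceCodes(String text, String[] codes, String symbol) {
        // Quote each code so regex metacharacters in a code are taken literally.
        String alternation = Arrays.stream(codes)
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
        // quoteReplacement escapes "$", which is special in replacement strings.
        return text.replaceAll(alternation + "\\s", Matcher.quoteReplacement(symbol));
    }

    public static void main(String[] args) {
        String[] format = {"USD", "CAD"};
        // prints $30 and $20
        System.out.println(replaceCodes("USD 30 and CAD 20", format, "$"));
    }
}
```

Because the method returns a new string instead of reassigning a captured variable, the "effectively final" restriction never comes into play.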

Fast way to extract data from string

I have a response from my OkHttpClient like:
{"CUSTOMER_ID":"928941293291"}
{"CUSTOMER_ID":"291389218398"}
{"CUSTOMER_ID":"1C4DC4FC-02Q9-4130-S12B-762D97FS43C"}
{"CUSTOMER_ID":"219382198"}
{"CUSTOMER_ID":"282828"}
{"CUSTOMER_ID":"21268239813"}
{"CUSTOMER_ID":"1114445184"}
{"CUSTOMER_ID":"2222222222"}
{"CUSTOMER_ID":"99218492183921"}
I want to extract every customer ID that is a valid Long (so skipping 1C4DC4FC-02Q9-4130-S12B-762D97FS43C) and lies between a minId and a maxId.
This is my implementation:
final List<String> customerIds = Arrays.asList(response.body().string()
.replace("CUSTOMER_ID", "")
.replace("\"", "")
.replace("{", "").replace(":", "")
.replace("}", ",").split("\\s*,\\s*"));
for (final String id : customerIds) {
try {
final Long idParsed = Long.valueOf(id);
if (idParsed > minId && idParsed < maxId) {
ids.add(idParsed);
}
} catch (final NumberFormatException e) {
logger.debug("NumberFormatException", e);
}
}
I have a long list of customer IDs (around 1M), so performance is really important. Is this the best implementation for what I want to do?
I would use a BufferedReader to read the string line by line
https://www.mkyong.com/java/how-to-read-file-from-java-bufferedreader-example/
Then, for each line, I would reduce the number of replace calls:
String id = line.replace("{\"CUSTOMER_ID\":\"", "");
id = id.substring(0, id.length() - 2); // strip the trailing "} without another replace
and then apply the parse-as-long logic, adding successful attempts to a list.
Since you have a big file, reading the content line by line is one way to go. Rather than replacing CUSTOMER_ID piece by piece, define a better regex pattern.
Following your approach, replace CUSTOMER_ID and use a regex:
String x = "{\"CUSTOMER_ID\":\"928941293291\"}{\"CUSTOMER_ID\":\"291389218398\"}{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}"
+ "{\"CUSTOMER_ID\":\"99218492183921\"}";
x = x.replaceAll("\"CUSTOMER_ID\"", "");
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher(x);
while (m.find()) {
System.out.println(m.group(1));
}
Or implement a regex that matches everything between :" and "}
String x = "{\"CUSTOMER_ID\":\"928941293291\"}{\"CUSTOMER_ID\":\"291389218398\"}{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}"
+ "{\"CUSTOMER_ID\":\"99218492183921\"}";
Pattern p = Pattern.compile(":\"([^\"]*)\"}");
Matcher m = p.matcher(x);
while (m.find()) {
System.out.println(m.group(1));
}
so there is no need to replace CUSTOMER_ID.
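Going one step further, the pattern itself can be restricted to digits so that UUID-like values never reach the number parser. A minimal sketch under assumptions: the class NumericIdRegex and the method numericIds are hypothetical names, and capping ids at 18 digits (which always fit in a long) is an assumption about the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericIdRegex {
    // Matches only ids made of 1 to 18 digits; anything else is skipped.
    private static final Pattern NUMERIC_ID =
            Pattern.compile("\"CUSTOMER_ID\":\"(\\d{1,18})\"\\}");

    // Hypothetical helper: collects ids strictly between minId and maxId.
    static List<Long> numericIds(String body, long minId, long maxId) {
        List<Long> ids = new ArrayList<>();
        Matcher m = NUMERIC_ID.matcher(body);
        while (m.find()) {
            long id = Long.parseLong(m.group(1)); // cannot overflow: at most 18 digits
            if (id > minId && id < maxId) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String x = "{\"CUSTOMER_ID\":\"928941293291\"}"
                + "{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}"
                + "{\"CUSTOMER_ID\":\"282828\"}";
        System.out.println(numericIds(x, 0L, Long.MAX_VALUE)); // [928941293291, 282828]
    }
}
```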
Try to avoid exceptions! When 10%-20% of your number parsing fails, the code can take up to about 10x longer to execute (you can write a little test for it).
If your input is exactly like you showed it you should use cheap operations:
Read the file with a BufferedReader line by line (as mentioned before) or, if you have the whole data as a string, use a StringTokenizer to handle each line separately.
Every line starts with {"CUSTOMER_ID":" and ends with "}. Don't use replace or regex (which is even worse) to remove this! Just use one simple substring:
String input = line.substring(16, line.length() - 2);
To avoid exceptions you need to find criteria that distinguish between an id and a UUID(?) so your parsing works without exceptions. For example, your ids will be positive but your UUID contains minus signs, or a long can contain at most 19 digits but your UUID contains 35 characters. So it's a simple if-else instead of try-catch.
For those who think it's bad to not catch NumberFormatException when parsing numbers: if there is an id which cannot be parsed, the whole file is corrupt, which means you shouldn't try to continue but fail hard.
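The substring-plus-check idea can be sketched roughly as follows. The class CustomerIdFilter, its method names and the 18-digit cap (chosen because 18 digits always fit in a long) are assumptions for illustration, as is throwing on any line that does not have the expected prefix and suffix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CustomerIdFilter {
    private static final String PREFIX = "{\"CUSTOMER_ID\":\""; // 16 characters
    private static final String SUFFIX = "\"}";
    private static final int MAX_SAFE_DIGITS = 18; // assumption: longer ids are skipped

    // True if s consists only of ASCII digits and is short enough to fit in a long.
    static boolean isPlainNumber(String s) {
        if (s.isEmpty() || s.length() > MAX_SAFE_DIGITS) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    static List<Long> extract(List<String> lines, long minId, long maxId) {
        List<Long> ids = new ArrayList<>();
        for (String line : lines) {
            if (line.length() < PREFIX.length() + SUFFIX.length()
                    || !line.startsWith(PREFIX) || !line.endsWith(SUFFIX)) {
                // Fail hard on malformed input instead of silently continuing.
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            String raw = line.substring(PREFIX.length(), line.length() - SUFFIX.length());
            if (isPlainNumber(raw)) { // a plain if instead of try-catch
                long id = Long.parseLong(raw);
                if (id > minId && id < maxId) {
                    ids.add(id);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "{\"CUSTOMER_ID\":\"928941293291\"}",
                "{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}",
                "{\"CUSTOMER_ID\":\"282828\"}");
        System.out.println(extract(lines, 1_000_000L, 1_000_000_000_000_000L)); // [928941293291]
    }
}
```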
This is a little test to see the performance difference between catching exceptions and testing the input:
long REPEATS = 1_000_000, startTime;
final String[] inputs = new String[]{"0", "1", "42", "84", "168", "336", "672", "a-b", "1-2"};
for (int r = 0; r < 1000; r++) {
startTime = System.currentTimeMillis();
for (int i = 0; i < REPEATS; i++) {
try {
Integer.parseInt(inputs[i % inputs.length]);
} catch (NumberFormatException e) { /* ignore */ }
}
System.out.println("Try: " + (System.currentTimeMillis() - startTime) + " ms");
startTime = System.currentTimeMillis();
for (int i = 0; i < REPEATS; i++) {
final String input = inputs[i % inputs.length];
if (input.indexOf('-') == -1)
Integer.parseInt(inputs[i % inputs.length]);
}
System.out.println("If: " + (System.currentTimeMillis() - startTime) + " ms");
}
My results are:
~20ms (testing) and ~200ms (catching) with 20% invalid input.
~22ms (testing) and ~130ms (catching) with 10% invalid input.
Those kinds of performance tests are easy to get wrong because of JIT or other optimizations. But I think you can see a direction.
You can use Files.lines() to stream the data from your file. Here I demonstrate using a stream from a List.
List<String> sample = Arrays.asList(
"{\"CUSTOMER_ID\":\"928941293291\"}",
"{\"CUSTOMER_ID\":\"291389218398\"}",
"{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}",
"{\"CUSTOMER_ID\":\"219382198\"}",
"{\"CUSTOMER_ID\":\"282828\"}",
"{\"CUSTOMER_ID\":\"21268239813\"}",
"{\"CUSTOMER_ID\":\"1114445184\"}",
"{\"CUSTOMER_ID\":\"2222222222\"}",
"{\"CUSTOMER_ID\":\"99218492183921\"}"
);
static final long MIN_ID = 1000000L;
static final long MAX_ID = 1000000000000000000L;
public void test() {
sample.stream()
// Extract CustomerID
.map(s -> s.substring("{\"CUSTOMER_ID\":\"".length(), s.length() - 2))
// Remove any bad ones - such as UUID.
.filter(s -> s.matches("[0-9]+"))
// Convert to long - assumes no number too big, add a further filter for that.
.map(s -> Long.valueOf(s))
// Apply limits.
.filter(l -> MIN_ID <= l && l <= MAX_ID)
// For now - just print them.
.forEach(s -> System.out.println(s));
}
First, you should read the file line by line. Then, from each line, extract the id if it matches the pattern and collect it into a list. Here is a similar solution implemented in Python.
import re

customer_ids = []
# Open the file
with open('cids.json') as f:
    # Read line by line
    for line in f:
        # Try to extract a matching id with the regex pattern
        match = re.search(r'^{[\w\W]+:"([A-Z\d]+-[A-Z\d]+-[A-Z\d]+-[A-Z\d]+-[A-Z\d]+)"}', line)
        if match:
            customer_ids.append(match.group(1))
        else:
            print('No match')
You can ignore all non numeric fields
long[] ids =
Stream.of(response.body().string().split("\""))
.mapToLong(s -> parseLong(s))
.filter(l -> l > minId && l < maxId)
.toArray();
static long parseLong(String s) {
try {
if (!s.isEmpty() && Character.isDigit(s.charAt(0)))
return Long.parseLong(s);
} catch (NumberFormatException expected) {
}
return Long.MIN_VALUE;
}
Or if you are using Java 7
List<Long> ids = new ArrayList<>();
for (String s : response.body().string().split("\"")) {
long id = parseLong(s);
if (id > minId && id < maxId)
ids.add(id);
}

Splitting a string based on " " and spaces [duplicate]

This question already has answers here:
Regular Expression to Split String based on space and matching quotes in java
(3 answers)
Closed 8 years ago.
I have a String str, which is comprised of several words separated by single spaces.
If I want to create a set or list of strings I can simply call str.split(" ") and I would get what I want.
Now, assume that str is a little more complicated, for example it is something like:
str = "hello bonjour \"good morning\" buongiorno";
In this case I want to keep what is between the quotes together, so that my list of strings is:
hello
bonjour
good morning
buongiorno
Clearly, if I used split(" ") in this case it won't work because I'd get
hello
bonjour
"good
morning"
buongiorno
So, how do I get what I want?
You can create a regex that matches either a single word or several words between double quotes, like:
\w+|(\"\w+(\s\w+)*\")
and search for the matches with the Pattern and Matcher classes.
Example:
String searchedStr = "";
Pattern pattern = Pattern.compile("\\w+|(\\\"\\w+(\\s\\w+)*\\\")");
Matcher matcher = pattern.matcher(searchedStr);
while(matcher.find()){
String word = matcher.group();
}
Edit: this now works for any number of words within the quotes; I forgot that at first.
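A variation on the same Pattern and Matcher idea, sketched here with hypothetical names (QuotedSplit, tokens): use one capturing group for the text inside the quotes and another for unquoted words, so the quotes are dropped from the result. It assumes quotes are balanced and never escaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSplit {
    // Group 1: text inside double quotes. Group 2: a run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            // Exactly one of the two groups participates in each match.
            result.add(matcher.group(1) != null ? matcher.group(1) : matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "hello bonjour \"good morning\" buongiorno";
        System.out.println(tokens(str)); // [hello, bonjour, good morning, buongiorno]
    }
}
```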
You can do something like the code below. First split the String on the quote character "\"" and then split the remaining parts on the space " ". The even-numbered tokens are the ones between the quotes.
public static void main(String args[]) {
String str = "hello bonjour \"good morning\" buongiorno";
System.out.println(str);
String[] parts = str.split("\"");
List<String> myList = new ArrayList<String>();
int i = 1;
for(String partStr : parts) {
if(i%2 == 0){
myList.add(partStr);
}
else {
myList.addAll(Arrays.asList(partStr.trim().split(" ")));
}
i++;
}
System.out.println("MyList : " + myList);
}
and the output is
hello bonjour "good morning" buongiorno
MyList : [hello, bonjour, good morning, buongiorno]
You may be able to find a solution using regular expressions, but what I'd do is simply manually write a string breaker.
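List<String> splitButKeepQuotes(String s, char splitter) {
List<String> list = new ArrayList<String>();
boolean inQuotes = false;
int startOfWord = 0;
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c == '"') {
// Toggle the quote state; splitters inside quotes do not end a word
inQuotes = !inQuotes;
} else if (c == splitter && !inQuotes) {
// Emit the word (skipping empty ones) and start the next word after the splitter
if (i != startOfWord) {
list.add(s.substring(startOfWord, i));
}
startOfWord = i + 1;
}
}
// Add the last word, which is not followed by a splitter
if (startOfWord < s.length()) {
list.add(s.substring(startOfWord));
}
return list;
}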

Trim() in Java not working the way I expect? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Query about the trim() method in Java
I am parsing a site's usernames and other information, and each one has a bunch of spaces after it (in addition to the spaces between the words).
For example: "Bob the Builder " or "Sam the welder ". The number of spaces varies from name to name. I figured I'd just use .trim(), since I've used this before.
However, it's giving me trouble. My code looks like this:
for (int i = 0; i < splitSource3.size(); i++) {
splitSource3.set(i, splitSource3.get(i).trim());
}
The result is just the same; no spaces are removed at the end.
Thank you in advance for your excellent answers!
UPDATE:
The full code is a bit more complicated, since there are HTML tags that are parsed out first. It goes exactly like this:
for (String s : splitSource2) {
if (s.length() > "<td class=\"dddefault\">".length() && s.substring(0, "<td class=\"dddefault\">".length()).equals("<td class=\"dddefault\">")) {
splitSource3.add(s.substring("<td class=\"dddefault\">".length()));
}
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
splitSource3.set(i, splitSource3.get(i).substring(0, splitSource3.get(i).length() - 5));
splitSource3.set(i, splitSource3.get(i).trim());
System.out.println(i + ": " + splitSource3.get(i));
}
}
UPDATE:
Calm down. I never said the fault lay with Java, and I never said it was a bug or broken or anything. I simply said I was having trouble with it and posted my code for you to collaborate on and help solve my issue. Note the phrase "my issue" and not "java's issue". I have actually had the code printing out
System.out.println(i + ": " + splitSource3.get(i) + "*");
in a for each loop afterward.
This is how I knew I had a problem.
By the way, the problem has still not been fixed.
UPDATE:
Sample output (minus single quotes):
'0: Olin D. Kirkland                                          '
'1: Sophomore                                          '
'2: Someplace, Virginia  12345<br />VA SomeCity<br />'
'3: Undergraduate                                          '
EDIT: The OP rephrased his question at Query about the trim() method in Java, where the issue was found to be Unicode whitespace characters which are not matched by String.trim().
It just occurred to me that I used to have this sort of issue when I worked on a screen-scraping project. The key is that sometimes the downloaded HTML sources contain non-printable characters which are not whitespace characters either. These are very difficult to copy-paste to a browser. I assume that this could have happened to you.
If my assumption is correct then you've got two choices:
Use a binary reader and figure out what those characters are - and delete them with String.replace(); E.g.:
private static String cutCharacters(String fromHtml) {
String result = fromHtml;
char[] problematicCharacters = {'\000', '\001', '\003'}; // this could be a private static final constant too
for (char ch : problematicCharacters) {
// String.replace has no (char, String) overload, so convert the char to a String first
result = result.replace(String.valueOf(ch), "");
}
return result;
}
If you find some sort of reoccurring pattern in the HTML to be parsed then you can use regexes and substrings to cut the unwanted parts. E.g.:
private String getImportantParts(String fromHtml) {
Pattern p = Pattern.compile("(\\w*\\s*)"); //this could be a private static final constant as well.
Matcher m = p.matcher(fromHtml);
StringBuilder buff = new StringBuilder();
while (m.find()) {
buff.append(m.group(1));
}
return buff.toString().trim();
}
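Since the cause turned out to be Unicode whitespace that String.trim() does not remove, a Java 8 compatible trim that also handles such characters could look roughly like this. The names UnicodeTrim and trimUnicode are hypothetical, and the non-breaking space (U+00A0) in the usage example is only an assumed culprit; the actual character on the scraped page is not stated here.

```java
class UnicodeTrim {
    // trim() only removes characters <= U+0020. This also covers Unicode
    // whitespace and space separators such as the non-breaking space U+00A0.
    static boolean isBlankChar(char c) {
        return c <= ' ' || Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    static String trimUnicode(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isBlankChar(s.charAt(start))) {
            start++;
        }
        while (end > start && isBlankChar(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String scraped = "Bob the Builder\u00A0\u00A0"; // assumed: trailing non-breaking spaces
        System.out.println("|" + scraped.trim() + "|");       // the trailing characters survive
        System.out.println("|" + trimUnicode(scraped) + "|"); // |Bob the Builder|
    }
}
```

Character.isWhitespace deliberately excludes non-breaking spaces, which is why the sketch also checks Character.isSpaceChar.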
Works without a problem for me.
Here is your code, refactored a bit and (maybe) more readable:
final String openingTag = "<td class=\"dddefault\">";
final String closingTag = "</td>";
List<String> splitSource2 = new ArrayList<String>();
splitSource2.add(openingTag + "Bob the Builder " + closingTag);
splitSource2.add(openingTag + "Sam the welder " + closingTag);
for (String string : splitSource2) {
System.out.println("|" + string + "|");
}
List<String> splitSource3 = new ArrayList<String>();
for (String s : splitSource2) {
if (s.length() > openingTag.length() && s.startsWith(openingTag)) {
String nameWithoutOpeningTag = s.substring(openingTag.length());
splitSource3.add(nameWithoutOpeningTag);
}
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
String name = splitSource3.get(i);
int closingTagBegin = splitSource3.get(i).length() - closingTag.length();
String nameWithoutClosingTag = name.substring(0, closingTagBegin);
String nameTrimmed = nameWithoutClosingTag.trim();
splitSource3.set(i, nameTrimmed);
System.out.println("|" + splitSource3.get(i) + "|");
}
I know that's not a real answer, but I cannot post comments and this code wouldn't fit in a comment anyway, so I made it an answer so that Olin Kirkland can check his code.
