In my Java application I have a List<String> sbuff_Test = new ArrayList<>(); structure that I fill during execution. When sbuff_Test is ready, I put each of its strings into a JTextArea. My output looks something like this:
Please choose Node:
2 : Low s23_t0
1 : High s23_t0 (Id = 0)
* TESTPAD MAIN MENU (v10r0p0) *
----------------------------------
LowPT PAD s23_t0 on node 2
1 = Initialize PAD
2 = PRODE MENU
3 = TTC MENU
4 = CM MENU
5 = FPGA MENU
6 = LINK MENU
7 = CAN MENU
8 = ELMB MENU
9 = SPLITTER MENU
10 = Change CURRENT PAD
11 = Reset full PAD
12 = Warm Initialize PAD
13 = Change Pad configuration
14 = Change CM latencies
15 = Phase measurement
16 = Power ON/OFF
17 = Print PAD Status
18 = Measurement loop
19 = Read CM trigger frequencies
20 = fast check of locks
21 = TRIGGER MENU
22 = Read CM BC ids
23 = Test CM BC ids with prode
24 = Test CM BC ids with TTC
25 = Test init low-high
0 = Quit
TESTPAD: ELMB MENU
(1) ELMB reset
(2) power OFF/ON ELMB on Node 1
(3) ELMB firm/hard version
(4) set CAN-debug ON/OFF
(5) set the communication rate
(6) download XPG file into FLASH for localInit
(0) exit
2
Firmware Version SV22
Hardware Version pad8
TESTPAD: ELMB MENU
(1) ELMB reset
(2) power OFF/ON ELMB on Node 1
(3) ELMB firm/hard version
(4) set CAN-debug ON/OFF
(5) set the communication rate
(6) download XPG file into FLASH for localInit
(0) exit
Now I would like a hint on how to extract only the text that I need; for example, from the text above:
LowPT PAD s23_t0 on node 2
Firmware Version SV22
Hardware Version pad8
The trouble is that the part of the text that I must delete varies, and I can't find an approach to this problem. What do you suggest for a problem like this? Thanks for the hint.
EDIT:
To delete the matched lines instead of extracting them, you just need the matcher.replaceAll("") method. This example reuses the same patterns:
String text = jTextAreaName.getText();
//This is the list of the wanted groups
String[] patterns = new String[]{"(LowPT .+)[\\r\\n]", "(Firmware Version .+)[\\r\\n]", "(Hardware Version .+)[\\r\\n]"};
//Then delete the three matched groups like this; replaceAll removes every match, so no find() loop is needed
for (String regex : patterns) {
    text = Pattern.compile(regex).matcher(text).replaceAll("");
}
Here's the Updated DEMO.
In that case you need to use a Regex Matcher and matching groups to only extract the wanted parts from it:
String text = jTextAreaName.getText();
//This is the list of the wanted groups
String[] patterns = new String[]{"(LowPT .+)[\\r\\n]", "(Firmware Version .+)[\\r\\n]", "(Hardware Version .+)[\\r\\n]"};
//Then extract the three matched groups like this
String myResult="";
for(int i=0; i<patterns.length; i++) {
//compile each matching group and find matches.
Pattern pattern = Pattern.compile(patterns[i]);
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
myResult += matcher.group(1);
myResult += "\n";
}
}
This is a live DEMO where you can test it; it gives the following result:
LowPT PAD s23_t0 on node 2
Firmware Version SV22
Hardware Version pad8
Explanation:
(LowPT .+)[\\r\\n] is a matching group for the line LowPT PAD s23_t0 on node 2.
(Firmware Version .+)[\\r\\n] is a matching group for the line Firmware Version SV22.
(Hardware Version .+)[\\r\\n] is a matching group for the line Hardware Version pad8.
If you only need the LowPT, Firmware Version and Hardware Version lines, read your input line by line.
If a line contains one of the above keywords, print it and continue to the next one.
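The line-by-line idea can be sketched as follows. This is only an illustration: the class and method names are made up, and it uses startsWith rather than contains, which is a slightly stricter choice assuming the wanted lines always begin with the keyword.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeywordLineFilter {
    // Prefixes that mark the lines worth keeping (taken from the question's sample output).
    private static final List<String> KEYWORDS =
            Arrays.asList("LowPT", "Firmware Version", "Hardware Version");

    /** Returns only the lines that start with one of the keywords, in their original order. */
    static List<String> keepWanted(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            for (String keyword : KEYWORDS) {
                if (trimmed.startsWith(keyword)) {
                    kept.add(trimmed);
                    break; // one keyword is enough, move on to the next line
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the sbuff_Test list from the question.
        List<String> sbuffTest = Arrays.asList(
                "LowPT PAD s23_t0 on node 2",
                "1 = Initialize PAD",
                "(3) ELMB firm/hard version",
                "Firmware Version SV22",
                "Hardware Version pad8",
                "(0) exit");
        keepWanted(sbuffTest).forEach(System.out::println);
    }
}
```

Since the strings already live in a List<String>, filtering that list before filling the text area avoids parsing the text area's content afterwards.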
Below is some sample data; in real life the dataset is huge.
A B 1-1-2018 10
A B 2-1-2018 20
C D 1-1-2018 15
C D 2-1-2018 25
I need to group the above data by date and generate key/value pairs:
1-1-2018->key
-----------------
A B 1-1-2018 10
C D 1-1-2018 15
2-1-2018->key
-----------------
A B 2-1-2018 20
C D 2-1-2018 25
Can anyone please tell me how to do that in Spark in the most optimized way (using Java if possible)?
This is not Java, but looking at your example above, it seems you want to split your DataFrame into sub-groups by key. The best way I know of doing that is with a while loop, and it's not the easiest thing on the planet.
//You will also need to import the DataFrame type and the implicits that enable the $"..." column syntax in Scala; I don't know what the Java equivalent is.
import org.apache.spark.sql.DataFrame
import spark.implicits._
//Inputting your DF, with columns as Value_1, Value_2, Key, Output_Amount
val inputDF = //DF From above
//Need a placeholder DF to initialise the array, I just like doing it this way
val testDF = spark.sql("select 'foo' as bar")
var arrayOfDataFrames: Array[DataFrame] = Array(testDF)
val arrayOfKeys = inputDF.selectExpr("Key").distinct.rdd.map(x=>x.mkString).collect
var keyIterator = 1
//Need to overwrite the foo bar first DF
arrayOfDataFrames = Array(inputDF.where($"Key"===arrayOfKeys(keyIterator - 1)))
keyIterator = keyIterator + 1
//loop through, find each key, and place its rows into the DataFrames array
while(keyIterator <= arrayOfKeys.length) {
  arrayOfDataFrames = arrayOfDataFrames ++ Array(inputDF.where($"Key"===arrayOfKeys(keyIterator - 1)))
  keyIterator = keyIterator + 1
}
At the end you will have two arrays of the same length, one of DataFrames and one of Keys, whose positions match: the 3rd element of the Keys array corresponds to the 3rd element of the DataFrames array.
Since this isn't Java and doesn't directly answer your question, does it at least push you in a direction that might help? (I built it in Spark Scala.)
I have been working on a program which makes use of Regular Expressions. It searches for some text in the files to give me a database based on the scores of different players.
Here is the sample of the text within which it searches.
ISLAMABAD UNITED 1st innings
Player Status Runs Blls 4s 6s S/R
David Warner lbw b. Hassan 19 16 4 0 118.8%
Joe Burns b. Morkel 73 149 16 0 49.0%
Kane Wiliiamson b. Tahir 135 166 28 2 81.3%
Asad Shafiq c. Rahane b. Morkel 22 38 5 0 57.9%
Kraigg Braithwaite c. Khan b. Boult 24 36 5 0 66.7%
Corey Anderson b. Tahir 18 47 3 0 38.3%
Sarfaraz Ahmed b. Morkel 0 6 0 0 0.0%
Tim Southee c. Hales b. Morkel 0 6 0 0 0.0%
Kyle Abbbott c. Rahane b. Morkel 26 35 4 0 74.3%
Steven Finn c. Hales b. Hassan 10 45 1 0 22.2%
Yasir Shah not out 1 12 0 0 8.3%
Total: 338/10 Overs: 92.1 Run Rate: 3.67 Extras: 10
Day 2 10:11 AM
-X-
I am using the following regex to get the different fields:
((?:\/)?(?:[A-Za-z']+)?\s?(?:[A-Za-z']+)?\s?(?:[A-Za-z']+)?\s?)\s+(?:lbw)?(?:not\sout)?(?:run\sout)?\s?(?:\(((?:[A-Za-z']+)?\s?(?:['A-Za-z]+)?)\))?(?:(?:st\s)?\s?(?:((?:['A-Za-z]+)\s(?:['A-Za-z]+)?)))?(?:c(?:\.)?\s((?:(?:['A-Za-z]+)?\s(?:[A-Za-z']+)?)?(?:&)?))?\s+(?:b\.)?\s+((?:[A-Za-z']+)\s(?:[A-Za-z']+)?)?\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)
Batsman Name - Group 1
Person Affecting Stumping (if any) - Group 2
Person Affecting RunOut (if any) - Group 3
Person Taking Catch (if any) - Group 4
Person Taking the wicket (if any) - Group 5
Runs Scored - Group 6
Balls Faced - Group 7
Fours Hit - Group 8
Sixes Hit - Group 9
Here is an example of the text I need to extract...
Group 0 contains David Warner lbw b. Hassan 19 16 4 0 118.8%
Group 1 contains 'David Warner'
Group 2 does not exist in this example
Group 3 does not exist in this example
Group 4 does not exist in this example
Group 5 contains 'Hassan'
Group 6 contains '19'
Group 7 contains '16'
Group 8 contains '4'
Group 9 contains '0'
When I try this on Regexr or Regex101, Group 1 is 'David Warner', but in my Java program it comes out as 'David'. The same happens for all results, and I don't know why.
Here's the code of my program:
Matcher bat = Pattern.compile("((?:\\/)?(?:[A-Za-z']+)?\\s?(?:[A-Za-z']+)?\\s?(?:[A-Za-z']+)?\\s?)\\s+(?:lbw)?(?:not\\sout)?(?:run\\sout)?\\s?(?:\\(((?:[A-Za-z']+)?\\s?(?:['A-Za-z]+)?)\\))?(?:(?:st\\s)?\\s?(?:((?:['A-Za-z]+)\\s(?:['A-Za-z]+)?)))?(?:c(?:\\.)?\\s((?:(?:['A-Za-z]+)?\\s(?:[A-Za-z']+)?)?(?:&)?))?\\s+(?:b\\.)?\\s+((?:[A-Za-z']+)\\s(?:[A-Za-z']+)?)?\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)").matcher(batting.group(1));
while (bat.find()) {
batPos++;
Batsman a = new Batsman(bat.group(1).replace("\n", "").replace("\r", "").replace("S/R", "").replace("/R", "").trim(), batting.group(2));
if (bat.group(0).contains("not out")) {
a.bat(Integer.parseInt(bat.group(6)), Integer.parseInt(bat.group(7)), Integer.parseInt(bat.group(8)), Integer.parseInt(bat.group(9)), batting.group(2), false);
} else {
a.bat(Integer.parseInt(bat.group(6)), Integer.parseInt(bat.group(7)), Integer.parseInt(bat.group(8)), Integer.parseInt(bat.group(9)), batting.group(2), true);
}
if (!teams.contains(batting.group(2))) {
teams.add(batting.group(2));
}
boolean f = true;
Batsman clone = null;
for (Batsman b1 : batted) {
if (b1.eq(a)) {
clone = b1;
f = false;
break;
}
}
if (!f) {
if (bat.group(0).contains("not out")) {
clone.batUpdate(a.getRunScored(), a.getBallFaced(), a.getFour(), a.getSix(), false, true);
} else {
clone.batUpdate(a.getRunScored(), a.getBallFaced(), a.getFour(), a.getSix(), true, true);
}
} else {
batted.add(a);
}
}
Your regex is far too complicated for such a simple task. To simplify it (or eliminate it altogether), operate on a single line rather than on the whole block of text.
For this, do
String[] array = str.split("\\n");
Then, once you have each individual line, just split it on runs of multiple spaces, like
String[] parts = array[1].split("\\s\\s+");
Then you can access each part separately; for example, Status can be accessed like
System.out.println("Status - " + parts[1]);
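Put together, the two splits might look like the sketch below. It assumes that the columns in the real files are separated by two or more whitespace characters while names and statuses only contain single spaces; the sample rows, their spacing, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ScorecardSplit {
    /** Splits one scorecard row into columns, assuming columns are separated by 2+ whitespace characters. */
    static String[] columns(String row) {
        return row.trim().split("\\s{2,}");
    }

    /** Splits the text into non-blank lines, accepting both \n and \r\n line endings. */
    static List<String> rows(String text) {
        List<String> rows = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                rows.add(line);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical input: the column spacing is an assumption, not taken from the real files.
        String text = "David Warner     lbw b. Hassan     19   16   4   0   118.8%\n"
                    + "Yasir Shah       not out           1    12   0   0   8.3%\n";
        for (String row : rows(text)) {
            String[] parts = columns(row);
            System.out.println("Player - " + parts[0] + ", Status - " + parts[1]
                    + ", Runs - " + parts[2]);
        }
    }
}
```

If the columns turn out to be separated by single spaces only, this split cannot tell a two-word name from two columns, and a regex or fixed-width parsing would be needed instead.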
All the commenters are right, of course: this might not be a typical problem to solve with a regex. But to answer your question (why is there a difference between Java and regex101?), let's first try to pull out some of the problems in your regex that make it too complex. The next step would be to track down whether and why it behaves differently in Java.
I tried to understand your regex (and cricket at the same time!) and came up with a proposal that might help you to make us understand what your regex should look like.
First attempt reads until the number columns are reached. My guess is, that you should be looking at alternation instead of introducing a lot of groups. Take a look at this: example 1
Explanation:
( # group 1 start
\/? # not sure why there should be /?
[A-Z][a-z]+ # first name
(?:\s(?:[A-Z]['a-z]+)+) # last name
)
(?:\s+ # spaces
( # group 2 start
lbw # lbw or
|not\sout # not out or
|(c\.|st|run\sout) # group 3: c., st or run out
\s # space
\(? # optional (
(\w+) # group 4: name
\)? # optional )
))? # group 2 end
(?:\s+ # spaces
( # group 5 start
(?:b\.\s)(\w+) # b. name
))? # group 5 end
\s+ # spaces
EDIT 1: Actually, there is a 'stumped' option missing in your regex as well. Added that in mine.
EDIT 2: Stumped doesn't have a dot.
EDIT 3: The complete example can be found at example 2
Some java code to test it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Foo {
public static void main(String[] args) {
String[] examples = {
"David Warner lbw b. Hassan 19 16 4 0 118.8%",
"Joe Burns b. Morkel 73 149 16 0 49.0%",
"Asad Shafiq c. Rahane b. Morkel 22 38 5 0 57.9%",
"Yasir Shah not out 1 12 0 0 8.3%",
"Yasir Shah st Rahane 1 12 0 0 8.3%",
"Morne Morkel run out (Shah) 11 17 1 1 64.7%"
};
Pattern pattern = Pattern.compile("(\\/?[A-Z][a-z]+(?:\\s(?:[A-Z]['a-z]+)+))(?:\\s+(lbw|not\\sout|(c\\.|st|run\\sout)\\s\\(?(\\w+)\\)?))?(?:\\s+((?:b\\.\\s)(\\w+)))?\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+\\.\\d%)");
for (String text : examples) {
System.out.println("TEXT: " + text);
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
System.out.println("batsman: " + matcher.group(1));
if (matcher.group(2) != null) System.out.println(matcher.group(2));
if (matcher.group(5) != null && matcher.group(5).matches("^b.*"))
System.out.println("bowler: " + matcher.group(6));
StringBuilder sb = new StringBuilder("numbers are: ");
int[] groups = {7, 8, 9, 10, 11};
for (int i : groups) {
sb.append(" " + matcher.group(i));
}
System.out.println(sb.toString());
System.out.println();
}
}
}
}
Ok Here is my example text... everything is
THEPONDIS15AWAYLOOKATTHOSEBASS5POUNDERSWELLLITATNIGHTALLAROUNDQUIETSEMICOUNTRYAREASTILLMOREBUTCALLMENORENTALNOLEASEANDPLEASEWENEEDNOREALTORSASMYWIFEDOES3176665440ANDCANNOTKEEPALLTHEMAINTANCEOFABIGHOUSEWANNAGOSOUTHTHANKSCALLMETHANKS
As you can see, CALL and the phone number are within about 60 characters of each other. So I have been trying to write an expression that finds the phone number, determines whether CALL is within 60 characters or so of it, and pulls the phone number if it is.
I know that I would need something like...
Pattern p11 = Pattern.compile("[0-9]{11}");
Pattern p10 = Pattern.compile("[0-9]{10}");
Pattern p7 = Pattern.compile("[0-9]{7}");
in order to determine whether it is possibly an actual phone number, since it could be 13173333333, or just 3173333333, or just 3333333.
What about the rest? I know I would probably have to take some kind of substring, but it's giving me a lot more difficulty than I thought it would.
I tried doing this...
String PHONENUMBER = "";
Pattern p11 = Pattern.compile("[0-9]{11}");
Pattern p10 = Pattern.compile("[0-9]{10}");
Pattern p7 = Pattern.compile("[0-9]{7}");
Matcher m11 = p11.matcher(Number);
Matcher m10 = p10.matcher(Number);
Matcher m7 = p7.matcher(Number);
String Call = "CALL";
String Text = "TEXT";
String Message = "MESSAGE";
if (Number.contains(Call)) {
int Numindex = Number.indexOf(Call);
int low = Numindex - 30;
int high = Numindex + 35;
if (low < 0) {
low = 0;
}
if (high > Number.length()) {
high = Number.length();
}
String extract = Number.substring(low, high);
m11 = p11.matcher(extract);
m10 = p10.matcher(extract);
m7 = p7.matcher(extract);
if (m11.find() == true) {
PHONENUMBER = m11.group();
} else if (m10.find() == true) {
PHONENUMBER = m10.group();
} else if (m7.find() == true) {
PHONENUMBER = m7.group();
}
}
But for some reason it's not working out for me.
EDIT #1 Requested for Original Text....
The Pond is 15' away- look at those bass- 5 Pounders-- well lit at night all around- quiet Semi Country area...still more but Ca ll me- NO RENTAL/No Lease and Please- we need NO Realtors as my Wife does 317 6 6.6-54.4 0 and cannot keep all the maintance of a big House- wanna go South Thanks call me!Call Me Thanks!
As you can see from the original text, it only makes sense to remove the spaces and all special characters then just do a simple expression comparison to find the phone number, then just find if the word "call" is within 60 chars. Obviously this isn't the ONLY paragraph there are hundreds more.
I'll be honest, this seems like an extremely difficult way of doing it. However, here is an idea of how you could go about it.
First get the range you want to check for the number; let's say it's 0 (low) to 15 (high).
Then write a for loop over that range of characters. The code below is an example of how you could loop through the section of the string you want, checking the characters along the way to see whether they form a phone number. Keep in mind that it doesn't guard against a range that runs past the end of the String, which would result in an index out of bounds exception, nor does it handle a number that is too long, but I will let you figure those things out.
String number = "123HEY1234567890HOWIS";
int realNum = 0; //if this hits exactly 10 then it is a real phone number
int low = 0;
int high = number.length();
for(int i = low; i < high;i++){
//check if the current char is a number
if(number.substring(i, i + 1).matches("[0-9]")){
//if yes then increment
realNum++;
System.out.println(realNum);
//stop as soon as 10 consecutive digits have been seen (this does not yet check whether the next char is also a digit)
if(realNum == 10){
low = i - 9;
high = i;
System.out.println("match");
break;
}
}else{
//if no then reset the checker back to 0
realNum = 0;
}
}
System.out.println("All Done");
Hopefully this at least gets you on the right path.
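For the original requirement (a number that has CALL within roughly 60 characters of it), a regex-based variant could look like the sketch below. It is only an illustration under a few assumptions: the text has already been stripped of spaces and special characters and upper-cased, as in the question; the distance is measured from either end of the digit run; and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneNearKeyword {
    // A run of exactly 7, 10 or 11 digits that is not part of a longer digit run.
    private static final Pattern DIGITS =
            Pattern.compile("(?<!\\d)(?:\\d{11}|\\d{10}|\\d{7})(?!\\d)");

    /** Returns every candidate number that has the keyword within maxDistance characters of it. */
    static List<String> numbersNear(String text, String keyword, int maxDistance) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            // Window around the number, clamped to the bounds of the text.
            int from = Math.max(0, m.start() - maxDistance);
            int to = Math.min(text.length(), m.end() + maxDistance);
            if (text.substring(from, to).contains(keyword)) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Excerpt of the normalized sample text from the question.
        String text = "BUTCALLMENORENTALNOLEASEANDPLEASEWENEEDNOREALTORSASMYWIFEDOES"
                + "3176665440ANDCANNOTKEEP";
        System.out.println(numbersNear(text, "CALL", 60)); // prints [3176665440]
    }
}
```

Searching outward from each number, rather than from the first occurrence of CALL, matters for the sample text: CALL appears 58 characters before the number, which is outside the 35-character window used in the question's code.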
I would use https://github.com/googlei18n/libphonenumber rather than a regex for finding phone numbers. The library works as you would expect:
PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
Iterable<PhoneNumberMatch> numbers = phoneUtil.findNumbers(text, Locale.US.getCountry());
List<String> data = new ArrayList<>();
numbers.forEach(number -> {
String s = number.rawString();
data.add(s); // collect each phone number that was found
});
The following JavaScript code will list all of your browser's enabled plugins (yeah, I know it doesn't work on IE, but for IE there's always deployJava):
if ((navigator.plugins) && (navigator.plugins.length)) {
for (var bb = 0, l = navigator.plugins.length; bb < l; bb++) {
var vv = navigator.plugins[bb].name + "<br>";
document.write(vv);
}
}
I have Java 6.22 installed so the relevant line written to the page is this:
Java(TM) Platform SE 6 U22
My question is: how can I complement the above code so that it returns the major version (6) and update (22) found in my (or anyone's) browser?
I think the best way is to use a regular expression, but I am not good with them.
I think the easiest (read: hackiest) solution would be something like this:
var plugin_name = navigator.plugins[bb].name;
if (plugin_name.toLowerCase().indexOf("java") != -1) {
    var parts = plugin_name.split(" ").reverse();
    var major, update;
    // if the plugin has an update
    if (plugin_name.match(/U[0-9]+/)) {
        // grab the end of the plugin name and remove all non-numeric chars
        update = parts[0].replace(/[^0-9]/g, "");
        // grab the major version and remove all non-numeric chars
        major = parts[1].replace(/[^0-9]/g, "");
    } else {
        update = "0";
        // grab the major version and remove all non-numeric chars
        major = parts[0].replace(/[^0-9]/g, "");
    }
    // print the major number and update number
    console.log(major);
    console.log(update);
}
You can then throw this code in your loop through the plugins and replace the console.log with whatever logic is appropriate given a major and update number.
I am currently trying to make a naming convention. The idea behind this is parsing.
Let's say I obtain an XML doc. Everything else appears once, but the two elements in the code below can be submitted several times within the XML document. It could be once, or it could be 100 times.
This states that ItemNumber and ReceiptType will be grabbed for the first element.
ItemNumber1 = eElement.getElementsByTagName("ItemNumber").item(0).getTextContent();
ReceiptType1 = eElement.getElementsByTagName("ReceiptType").item(0).getTextContent();
This one states that it will grab the second submission if they were in there twice.
ItemNumber2 = eElement.getElementsByTagName("ItemNumber").item(1).getTextContent();
ReceiptType2 = eElement.getElementsByTagName("ReceiptType").item(1).getTextContent();
ItemNumber and ReceiptType must both be submitted together. So if there is 30 ItemNumbers, there must be 30 Receipt Types.
However now I would like to set this in an IF statement to create variables.
I was thinking something along the lines of:
int cnt = 2;
if (eElement.getElementsByTagName("ItemNumber").item(cnt).getTextContent();)
**MAKE VARIABLE**
Then make a loop which adds one to the count to see if there is a third or fourth. Now here comes the tricky part: I need them set to a generated variable. For example, if ItemNumber 2 existed, it would set it to
String ItemNumber2 = eElement.getElementsByTagName("ItemNumber").item(cnt).getTextContent();
I do not wish to make pre-made variable names as I don't want to code a possible 1000 variables if that 1000 were to happen.
KUDOS for anyone who can help or give tips on just small parts of this as in the naming convention etc. Thanks!
You don't know beforehand how many ItemNumbers and ReceiptTypes you'll get? Then consider using two Lists (java.util.List). Here is an example:
boolean finished = ... ; // true if there is no more item to process
List<String> listItemNumbers = new ArrayList<>();
List<String> listReceiptTypes = new ArrayList<>();
int cnt = 0;
while(!finished) {
String itemNumber = eElement.getElementsByTagName("ItemNumber").item(cnt).getTextContent();
String receiptType = eElement.getElementsByTagName("ReceiptType").item(cnt).getTextContent();
listItemNumbers.add(itemNumber);
listReceiptTypes.add(receiptType);
++cnt;
// update 'finished' (to test if there are remaining itemNumbers to process)
}
// use them :
int indexYouNeed = 32; // for example
String itemNumber = listItemNumbers.get(indexYouNeed); // indexes start at 0
String receiptType = listReceiptTypes.get(indexYouNeed);
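One way to fill in the 'finished' test is to let NodeList.getLength() bound the loop, since item(cnt) returns null once cnt runs past the last element. The sketch below is self-contained so it can be run as is; the <Receipt> wrapper element, the sample values, and the class and method names are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ReceiptItems {
    /** Collects the text of every descendant element with the given tag name, in document order. */
    static List<String> textOfAll(Element parent, String tagName) {
        NodeList nodes = parent.getElementsByTagName(tagName);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    /** Parses an XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document: the <Receipt> wrapper and the values are made up for illustration.
        String xml = "<Receipt>"
                + "<ItemNumber>A100</ItemNumber><ReceiptType>SALE</ReceiptType>"
                + "<ItemNumber>B200</ItemNumber><ReceiptType>REFUND</ReceiptType>"
                + "</Receipt>";
        Element eElement = parse(xml);
        List<String> itemNumbers = textOfAll(eElement, "ItemNumber");
        List<String> receiptTypes = textOfAll(eElement, "ReceiptType");
        // The question states both tags are always submitted together, so the sizes should match.
        if (itemNumbers.size() != receiptTypes.size()) {
            throw new IllegalStateException("ItemNumber and ReceiptType counts differ");
        }
        for (int i = 0; i < itemNumbers.size(); i++) {
            System.out.println(itemNumbers.get(i) + " -> " + receiptTypes.get(i));
        }
    }
}
```

With the values in lists, "ItemNumber2" simply becomes itemNumbers.get(1), so no generated variable names are needed however many pairs the document contains.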