I have a long string. Its format is always the same, but the message inside it may vary. How can I extract this particular message from such a complex string in Java?
charset_test=%E2%82%AC%2C%C2%B4%2C%E2%82%AC%2C%C2%B4%2C%E6%B0%B4%2C%D0%94%2C%D0%84&post_form_id=71c3b72f4049d394140cedf32d39f525&fb_dtsg=AQBY3vp-&feedback_params=%7B%22actor%22%3A%22176851262376586%22%2C%22target_fbid%22%3A%22283157315079313%22%2C%22target_profile_id%22%3A%22176851262376586%22%2C%22type_id%22%3A%227%22%2C%22source%22%3A%222%22%2C%22assoc_obj_id%22%3A%22%22%2C%22source_app_id%22%3A%220%22%2C%22extra_story_params%22%3A%7B%22photo_viewer_version%22%3A%222%22%7D%2C%22content_timestamp%22%3A%221327693760%22%2C%22check_hash%22%3A%22129f5441c4cb4266%22%7D&translate_on_load=&add_comment_text_text=I%20didn't%20got%20any%20msg%20in%20my%20mailbox%20%3A(&add_comment_text=I%20didn't%20got%20any%20msg%20in%20m%20inbox%20%3A(&comment_replace=optimistic_comment_2931473608_0&comment=1&lsd&post_form_id_source=AsyncRequest&__user=18802987&phstamp=165895111811245853
I want to extract this particular string, in the format below:
I didn't got any msg in my mailbox
Here's a regex solution:
String input = "charset_test=%E2%8...3A(&add_comment_text=I%20didn't%20got%20any"
+ "%20msg%20in%20m%20inbox%20%3A(&comment_replace=optim"
+ "istic_comment_2931473608_0&comment=1&lsd&post_form_id_source="
+ "AsyncRequest&__user=18802987&phstamp=165895111811245853";
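// [^&]* stops at the next '&' or at the end of the input (inside [&$] the $ is only a literal dollar sign)
Pattern p = Pattern.compile("add_comment_text_text=([^&]*)");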
Matcher m = p.matcher(input);
if (m.find()) {
String value = URLDecoder.decode(m.group(1), "UTF-8");
System.out.println(value);
}
Output:
I didn't got any msg in my mailbox :(
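Inside a character class, `$` is a literal dollar sign rather than an end-of-input anchor, so a pattern built around `[^&]*` is one way to also handle a parameter that happens to be the last one in the string. Below is a minimal sketch of that idea; the class name `FormParams` and the method `extract` are hypothetical names used only for illustration, and the sample body is a shortened version of the string from the question.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
class FormParams {

    // Returns the URL-decoded value of the named parameter, if present.
    static Optional<String> extract(String body, String name) {
        // (?:^|&) makes sure the whole parameter name is matched;
        // [^&]* stops at the next '&' or at the end of the input.
        Pattern p = Pattern.compile("(?:^|&)" + Pattern.quote(name) + "=([^&]*)");
        Matcher m = p.matcher(body);
        if (!m.find()) {
            return Optional.empty();
        }
        // The Charset overload of decode requires Java 10 or later.
        return Optional.of(URLDecoder.decode(m.group(1), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Shortened sample: the wanted parameter is the last one in the string.
        String body = "comment=1&add_comment_text_text="
                + "I%20didn't%20got%20any%20msg%20in%20my%20mailbox%20%3A(";
        System.out.println(extract(body, "add_comment_text_text").orElse("(not found)"));
    }
}
```

Quoting the name with `Pattern.quote` keeps any regex metacharacters in a parameter name from being interpreted as part of the pattern.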
I have a very long text and I'm extracting some specific values that are followed by some particular words. Here's an example of my long text:
.........
FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]
.........
TotalFrames[ValMin: 100000, ValMax:200000]
.........
MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]
.........
here's my code:
File file = filePath.toFile();
JSONObject jsonObject = new JSONObject();
String FPSMin="";
String FPSMax="";
String TotalFramesMin="";
String TotalFramesMax="";
String MemUsageMin="";
String MemUsageMax="";
String log = "my//log//file";
final Matcher matcher = Pattern.compile("FPS/\(FramesPerSecond/\)/\[ValMin:");
if(matcher.find()){
FPSMin= matcher.end().trim();
}
But I can't make it work. Where am I wrong? Basically I need to select, for each String, the corresponding values (max and min) coming from that long text and store them into the variables. Like
FPSMin = 29.0000
FPSMax = 35.0000
FramesMin = 100000
Etc
Thank you
EDIT:
I tried the following code (in a test case) to see if the solution could work, but I'm experiencing issues because I can't print anything except an object. Here's the code:
@Test
public void whenReadLargeFileJava7_thenCorrect()
throws IOException, URISyntaxException {
Scanner txtScan = new Scanner("path//to//file//test.txt");
String[] FPSMin= new String[0];
String FPSMax= "";
//Read File Line By Line
while (txtScan.hasNextLine()) {
// Print the content on the console
String str = txtScan.nextLine();
Pattern FPSMin= Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = FPSMin.matcher(str);
if(matcher.find()){
String MinMaxFPS= str.substring(matcher.end(), str.length()-1);
String[] splitted = MinMaxFPS.split(",");
FPSMin= splitted[0].split(": ");
FPSMax = splitted[1];
}
System.out.println(FPSMin);
System.out.println(FPSMax);
}
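Your pattern should probably look like this: ^FPS\\(FramesPerSecond\\)\\[ValMin: (parentheses and brackets are escaped with a backslash, doubled inside a Java string literal, not with /\). I've tried it and it works for me.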
String line = "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]";
Pattern pattern = Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(line.substring(matcher.end(), line.length()-1));
}
This way, matcher.end() gives you the offset just past the matched prefix, and substring returns every character from that offset up to the length of the line minus 1 (because you don't want the closing ] character).
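As an alternative to computing offsets and splitting the remainder by hand, the two numbers can be captured directly with groups. The following is a minimal sketch under the assumption that the line always has the `ValMin: ..., ValMax: ...` shape shown in the question; the class name `FpsRange` and the method `parse` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; only the sample line comes from the question.
class FpsRange {

    // Group 1 is the number after "ValMin:", group 2 the number after "ValMax:".
    private static final Pattern FPS = Pattern.compile(
            "^FPS\\(FramesPerSecond\\)\\[ValMin:\\s*([\\d.]+),\\s*ValMax:\\s*([\\d.]+)\\]");

    // Returns {min, max} as strings, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = FPS.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] range = parse("FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]");
        System.out.println("FPSMin = " + range[0]);
        System.out.println("FPSMax = " + range[1]);
    }
}
```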
The following regular expression will match and capture the name, min and max:
Pattern pattern = Pattern.compile("(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");
Usage (extracting the captured groups):
String input = (".........\n" +
"FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n" +
".........\n" +
"TotalFrames[ValMin: 100000, ValMax:200000]\n" +
".........\n" +
"MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]\n" +
".........");
for (String s : input.split("\n")) {
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + ", " + matcher.group(2) + ", " + matcher.group(3));
}
}
Output:
FPS(FramesPerSecond), 29.0000, 35.000
TotalFrames, 100000, 200000
MemoryUsage(In MB), 190000, 360000
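Since the question asks for the values to be stored rather than printed, one option is to collect them in a map keyed by the metric name. This is only a sketch: the class `MetricRanges`, the method `parse`, and the choice of a `double[]` holding {min, max} are assumptions made for illustration, while the regular expression is the one from this answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical container class for the parsed ranges.
class MetricRanges {

    // Same expression as above: name, min and max are groups 1 to 3.
    private static final Pattern LINE = Pattern.compile(
            "(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");

    // Maps each metric name to {min, max}; lines that do not match are skipped.
    static Map<String, double[]> parse(String log) {
        Map<String, double[]> ranges = new LinkedHashMap<>();
        for (String line : log.split("\\R")) { // \R matches any line break
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                ranges.put(m.group(1), new double[] {
                        Double.parseDouble(m.group(2)),
                        Double.parseDouble(m.group(3)) });
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        String log = ".........\n"
                + "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n"
                + "TotalFrames[ValMin: 100000, ValMax:200000]\n"
                + "MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]\n";
        double[] fps = parse(log).get("FPS(FramesPerSecond)");
        System.out.println("FPSMin = " + fps[0] + ", FPSMax = " + fps[1]);
    }
}
```

A `LinkedHashMap` keeps the metrics in the order in which they appear in the log, which can make later reporting easier.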
I have searched everywhere for this but couldn't get a specific solution, and the documentation also didn't cover this. So I want to extract the start date and end date from this string "1-Mar-2019 to 31-Mar-2019". The problem is I'm not able to extract both the date strings.
I found the closest solution here but couldn't post a comment asking how to extract values individually due to low reputation: https://stackoverflow.com/a/8116229/10735227
I'm using a regex pattern to look for the occurrences and to extract both occurrences to 2 strings first.
Here's what I tried:
Pattern p = Pattern.compile("(\\d{1,2}-[a-zA-Z]{3}-\\d{4})");
Matcher m = p.matcher(str);
while(m.find())
{
startdt = m.group(1);
enddt = m.group(1); //I think this is wrong, don't know how to fix it
}
System.out.println("startdt: "+startdt+" enddt: "+enddt);
Output is:
startdt: 31-Mar-2019 enddt: 31-Mar-2019
Additionally, I need to use DateFormatter to convert the string to a date (adding a leading 0 before a single-digit day if required).
You can capture both dates by simply calling the find method twice. If the string contains only one date, only the first variable is set:
String str = "1-Mar-2019 to 31-Mar-2019";
String startdt = null, enddt = null;
Pattern p = Pattern.compile("(\\d{1,2}-[a-zA-Z]{3}-\\d{4})");
Matcher m = p.matcher(str);
if(m.find()) {
startdt = m.group(1);
if(m.find()) {
enddt = m.group(1);
}
}
System.out.println("startdt: "+startdt+" enddt: "+enddt);
Note that this could also be written with a while(m.find()) loop and a List<String> to extract every date you can find.
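The loop-and-list variant mentioned above could look roughly like this. It is a sketch only; the class name `DateFinder` and the method `findAll` are hypothetical, and the pattern is the one used in this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that collects every date-like token in a text.
class DateFinder {

    private static final Pattern DATE = Pattern.compile("(\\d{1,2}-[a-zA-Z]{3}-\\d{4})");

    // Each call to find() advances to the next match, so the loop visits them all.
    static List<String> findAll(String text) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            dates.add(m.group(1));
        }
        return dates;
    }

    public static void main(String[] args) {
        List<String> dates = findAll("1-Mar-2019 to 31-Mar-2019");
        System.out.println("startdt: " + dates.get(0) + " enddt: " + dates.get(1));
    }
}
```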
If your text may be messy, and you really need to use a regex to extract the date range, you may use
String str = "Text here 1-Mar-2019 to 31-Mar-2019 and text there";
String startdt = "";
String enddt = "";
String date_rx = "\\d{1,2}-[a-zA-Z]{3}-\\d{4}";
Pattern p = Pattern.compile("(" + date_rx + ")\\s*to\\s*(" + date_rx + ")");
Matcher m = p.matcher(str);
if(m.find())
{
startdt = m.group(1);
enddt = m.group(2);
}
System.out.println("startdt: "+startdt+" enddt: "+enddt);
// => startdt: 1-Mar-2019 enddt: 31-Mar-2019
Also, consider this enhancement: match the date as whole word to avoid partial matches in longer strings:
Pattern.compile("\\b(" + date_rx + ")\\s*to\\s*(" + date_rx + ")\\b")
If the range can be expressed with - or to you may replace to with (?:to|-), or even (?:to|\\p{Pd}) where \p{Pd} matches any hyphen/dash.
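To cover the conversion to a date as well, the captured strings can be parsed with `java.time`. The sketch below combines the word-boundary and `(?:to|\p{Pd})` variants with a `DateTimeFormatter`; the class `DateRange` and its methods are hypothetical names, and it assumes English month abbreviations written as in the question (for example "Mar", not "MAR").

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts a date range and converts it to LocalDate values.
class DateRange {

    private static final String DATE_RX = "\\d{1,2}-[a-zA-Z]{3}-\\d{4}";
    private static final Pattern RANGE = Pattern.compile(
            "\\b(" + DATE_RX + ")\\s*(?:to|\\p{Pd})\\s*(" + DATE_RX + ")\\b");

    // "d" accepts one or two digits on input; "dd" always prints two digits.
    private static final DateTimeFormatter IN =
            DateTimeFormatter.ofPattern("d-MMM-uuuu", Locale.ENGLISH);
    private static final DateTimeFormatter OUT =
            DateTimeFormatter.ofPattern("dd-MMM-uuuu", Locale.ENGLISH);

    // Returns {start, end}, or null when no range is found in the text.
    static LocalDate[] parse(String text) {
        Matcher m = RANGE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new LocalDate[] {
                LocalDate.parse(m.group(1), IN),
                LocalDate.parse(m.group(2), IN) };
    }

    // Formats a date with a leading zero for single-digit days.
    static String withLeadingZero(LocalDate date) {
        return OUT.format(date);
    }

    public static void main(String[] args) {
        LocalDate[] range = parse("Text here 1-Mar-2019 to 31-Mar-2019 and text there");
        System.out.println("startdt: " + withLeadingZero(range[0])
                + " enddt: " + withLeadingZero(range[1]));
    }
}
```

`LocalDate.parse` throws a `DateTimeParseException` for tokens that look like dates but are not valid (such as 31-Feb-2019), so callers may want to catch it.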
You can simply use String::split, provided the string always has the exact form "start to end" with single spaces:
String range = "1-Mar-2019 to 31-Mar-2019";
String[] dts = range.split(" ");
System.out.println(dts[0]);
System.out.println(dts[2]);
Hi, I get this String from the server:
id_not="autoincrement"; id_obj="-"; id_tr="-"; id_pgo="-"; typ_not=""; tresc="Nie wystawił"; datetime="-"; lon="-"; lat="-";
I need to create a new String, e.g. String word, and assign it the value I get from tresc="Nie wystawił".
As @Jan suggests in a comment, you can use a regex, for example:
String str = "id_not=\"autoincrement\"; id_obj=\"-\"; id_tr=\"-\"; id_pgo=\"-\"; typ_not=\"\"; tresc=\"Nie wystawił\"; datetime=\"-\"; lon=\"-\"; lat=\"-\";";
Pattern p = Pattern.compile("tresc(.*?);");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group());
}
Output
tresc="Nie wystawił";
If you want to get only the value of tresc you can use :
Pattern p = Pattern.compile("tresc=\"(.*?)\";");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
Output
Nie wystawił
Something along the lines of
Pattern p = Pattern.compile("tresc=\"([^\"]+)\"");
Matcher m = p.matcher(stringFromServer);
if(m.find()) {
String whatYouWereLookingfor = m.group(1);
}
should do the trick. JSON parsing might be much better in the long run if you need additional values.
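If additional values are needed but the server keeps sending this key="value"; format rather than JSON, one option is to collect every pair into a map in a single pass. This is a sketch under the assumption that values never contain a double quote; the class `ServerFields` and the method `parse` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for strings of the form key="value"; key="value"; ...
class ServerFields {

    // Group 1 is the key, group 2 the (possibly empty) value between the quotes.
    private static final Pattern FIELD = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    static Map<String, String> parse(String fromServer) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(fromServer);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String str = "id_not=\"autoincrement\"; typ_not=\"\"; tresc=\"Nie wystawił\"; lat=\"-\";";
        String word = parse(str).get("tresc");
        System.out.println(word);
    }
}
```

Using `[^\"]*` instead of `[^\"]+` lets empty values such as typ_not="" appear in the map as empty strings.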
Your question is unclear, but I think you get a string from the server and want the value of tresc from it. You can first search for tresc in that string, like this:
serverString.substring(serverString.indexOf("tresc") + x, serverString.length());
Here, replace x with the number of characters to skip after the start of tresc before you begin picking characters.
Read up on substring and delimiters.
As the values are separated by semicolons, another solution could be:
int start = serverString.indexOf("tresc=");
// indexOf returns -1 when the text is not found, so check and account for it.
if (start != -1) {
    // The first ";" after tresc marks the end of that entry.
    int end = serverString.indexOf(";", start);
    if (end != -1) {
        String subString = serverString.substring(start, end);
    }
}
Here start is the index where tresc begins and end is the index of the semicolon that follows it, so subString holds the tresc part: tresc="Nie wystawił".
You can then use it any way you want.
How to edit this string and split it into two?
String asd = {RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef};
I want to make two strings.
String reponame;
String RepoID;
reponame should be CodeCommitTest
repoID should be 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
Can someone help me get it? Thanks
Here is Java code using a regular expression in case you can't use a JSON parsing library (which is what you probably should be using):
String pattern = "^\\{RepositoryName:\\s(.*?),RepositoryId:\\s(.*?)\\}$";
String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
String reponame = "";
String repoID = "";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(asd);
if (m.find()) {
reponame = m.group(1);
repoID = m.group(2);
System.out.println("Found reponame: " + reponame + " with repoID: " + repoID);
} else {
System.out.println("NO MATCH");
}
This code has been tested in IntelliJ and runs without error.
Output:
Found reponame: CodeCommitTest with repoID: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
Assuming there aren't quote marks in the input, and that the repository name and ID consist of letters, numbers, and dashes, then this should work to get the repository name:
Pattern repoNamePattern = Pattern.compile("RepositoryName: *([A-Za-z0-9\\-]+)");
Matcher matcher = repoNamePattern.matcher(asd);
if (matcher.find()) {
reponame = matcher.group(1);
}
and you can do something similar to get the ID. The above code just looks for RepositoryName:, possibly followed by spaces, followed by one or more letters, digits, or hyphen characters; then the group(1) method extracts the name, since it's the first (and only) group enclosed in () in the pattern.
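For completeness, the "something similar" for the ID might look like the sketch below. It reuses the same character class for both fields; the class `RepoFields` and the helper `first` are hypothetical names introduced for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that pulls both fields out of the repository string.
class RepoFields {

    static final Pattern NAME = Pattern.compile("RepositoryName: *([A-Za-z0-9\\-]+)");
    static final Pattern ID = Pattern.compile("RepositoryId: *([A-Za-z0-9\\-]+)");

    // Returns the first captured group, or null when the pattern is not found.
    static String first(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String asd = "{RepositoryName: CodeCommitTest,"
                + "RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
        String reponame = first(NAME, asd);
        String repoID = first(ID, asd);
        System.out.println(reponame + " / " + repoID);
    }
}
```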
public String readEmails(String fileData) {
String regex = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9]"
+ "(?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";
String emails = "", emails2 = "";
fileData = fileData.toLowerCase();
Matcher m = Pattern.compile(regex).matcher(fileData);
while (m.find()) {
emails += m.group()+", ";
}
return emails;
}
I am reading an RTF file, finding emails, and then storing them in a DB. I found that one email is extracted twice, i.e. HYPERLINK "mailto: aa@ymail.com" and then aa@ymail.com
How can I match two similar emails and keep one copy by removing all similar emails?
You can change your code as follows:
Set<String> set = new HashSet<String>();
Matcher m = Pattern.compile(regex).matcher(fileData);
while (m.find()) {
String email = m.group();
if (!set.contains(email)) {
emails += email + ", ";
set.add(email);
}
}
return emails;
}
Instead of saving the emails as a comma-separated string:
Lowercase them.
Store them in a set (HashSet) to deduplicate them.
At the end, build the output string from the elements of the set.
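The steps above can be sketched as follows. Two things here are assumptions rather than part of the original code: the email pattern is a deliberately simplified stand-in for the longer expression in the question, and a `LinkedHashSet` is used instead of a plain `HashSet` so that the emails keep the order in which they were found. The class name `EmailDedup` is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that returns each email found in the text only once.
class EmailDedup {

    // Simplified pattern for illustration; swap in the full expression if needed.
    private static final Pattern EMAIL =
            Pattern.compile("[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}");

    static String uniqueEmails(String fileData) {
        // The set silently ignores an email that was already added.
        Set<String> unique = new LinkedHashSet<>();
        Matcher m = EMAIL.matcher(fileData.toLowerCase(Locale.ROOT));
        while (m.find()) {
            unique.add(m.group());
        }
        // Build the output only at the end, without a trailing separator.
        return String.join(", ", unique);
    }

    public static void main(String[] args) {
        String data = "HYPERLINK \"mailto: aa@ymail.com\" aa@ymail.com";
        System.out.println(uniqueEmails(data));
    }
}
```

Returning the set itself instead of a joined string may be more convenient if each email is stored as a separate row in the database.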