public String readEmails(String fileData) {
String regex = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9]"
+ "(?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";
String emails = "", emails2 = "";
fileData = fileData.toLowerCase();
Matcher m = Pattern.compile(regex).matcher(fileData);
while (m.find()) {
emails += m.group()+", ";
}
return emails;
}
I am reading an RTF file, finding the emails in it, and storing them in a DB. I found that one email is extracted twice, i.e. from HYPERLINK "mailto: aa@ymail.com" and then again from aa@ymail.com.
How can I detect such duplicate emails and keep only one copy of each?
You can change your code as follows, using a set to remember which emails have already been added:
Set<String> set = new HashSet<String>();
Matcher m = Pattern.compile(regex).matcher(fileData);
while (m.find()) {
String email = m.group();
if (!set.contains(email)) {
emails += email + ", ";
set.add(email);
}
}
return emails;
}
Instead of building the result as a comma-separated string while you match:
Lowercase the emails.
Store them in a set (HashSet) to deduplicate them.
At the end, build the output string from the elements of the set.
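The steps above can be sketched as follows. This is a minimal illustration, not the asker's exact code: the class name `EmailDeduper`, the method `uniqueEmails`, and the simplified address pattern are assumptions made for the example, and `LinkedHashSet` is used instead of a plain `HashSet` only so that the output keeps first-seen order.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailDeduper {
    // Simplified address pattern, used here only for illustration (an assumption,
    // not the longer pattern from the question).
    private static final Pattern EMAIL =
            Pattern.compile("[a-z0-9._%+-]+@[a-z0-9-]+(?:\\.[a-z0-9-]+)+");

    // Returns each distinct address once, joined with ", ".
    public static String uniqueEmails(String fileData) {
        Set<String> unique = new LinkedHashSet<>();
        // Lowercase first so "AA@ymail.com" and "aa@ymail.com" count as the same address.
        Matcher m = EMAIL.matcher(fileData.toLowerCase());
        while (m.find()) {
            unique.add(m.group()); // add() silently ignores duplicates
        }
        return String.join(", ", unique);
    }

    public static void main(String[] args) {
        String data = "HYPERLINK \"mailto: aa@ymail.com\" aa@ymail.com";
        System.out.println(uniqueEmails(data));
    }
}
```

Joining at the end also avoids the trailing ", " that the string-concatenation version leaves behind.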
I have a very long text and I'm extracting some specific values that are followed by some particular words. Here's an example of my long text:
.........
FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]
.........
TotalFrames[ValMin: 100000, ValMax:200000]
.........
MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]
.........
here's my code:
File file = filePath.toFile();
JSONObject jsonObject = new JSONObject();
String FPSMin="";
String FPSMax="";
String TotalFramesMin="";
String TotalFramesMax="";
String MemUsageMin="";
String MemUsageMax="";
String log = "my//log//file";
final Matcher matcher = Pattern.compile("FPS/\(FramesPerSecond/\)/\[ValMin:");
if(matcher.find()){
FPSMin= matcher.end().trim();
}
But I can't make it work. Where am I wrong? Basically I need to select, for each String, the corresponding values (max and min) coming from that long text and store them into the variables. Like
FPSMin = 29.0000
FPSMax = 35.0000
FramesMin = 100000
Etc
Thank you
EDIT:
I tried the following code (in a test case) to see if the solution could work, but I'm experiencing issues because I can't print anything except an object. Here's the code:
@Test
public void whenReadLargeFileJava7_thenCorrect()
throws IOException, URISyntaxException {
Scanner txtScan = new Scanner("path//to//file//test.txt");
String[] FPSMin= new String[0];
String FPSMax= "";
//Read File Line By Line
while (txtScan.hasNextLine()) {
// Print the content on the console
String str = txtScan.nextLine();
Pattern FPSMin= Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = FPSMin.matcher(str);
if(matcher.find()){
String MinMaxFPS= str.substring(matcher.end(), str.length()-1);
String[] splitted = MinMaxFPS.split(",");
FPSMin= splitted[0].split(": ");
FPSMax = splitted[1];
}
System.out.println(FPSMin);
System.out.println(FPSMax);
}
Maybe your pattern should be like this ^FPS\\(FramesPerSecond\\)\\[ValMin: . I've tried it and it works for me.
String line = "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]";
Pattern pattern = Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(line.substring(matcher.end(), line.length()-1));
}
This way, matcher.end() gives you the offset just past the matched prefix, and substring returns everything from that offset up to the last character of the line, exclusive (because you don't want the closing ] character).
The following regular expression will match and capture the name, min and max:
Pattern pattern = Pattern.compile("(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");
Usage (extracting the captured groups):
String input = (".........\n" +
"FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n" +
".........\n" +
"TotalFrames[ValMin: 100000, ValMax:200000]\n" +
".........\n" +
"MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]\n" +
".........");
for (String s : input.split("\n")) {
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + ", " + matcher.group(2) + ", " + matcher.group(3));
}
}
Output:
FPS(FramesPerSecond), 29.0000, 35.000
TotalFrames, 100000, 200000
MemoryUsage(In MB), 190000, 360000
I am trying to get a regex to match, then get the value with it. For example, I want to check for 1234 as an id and if present, get the status (which is 0 in this case). Basically its id:status. Here is what I am trying:
String topicStatus = "1234:0,567:1,89:2";
String someId = "1234";
String regex = "\\b"+someId+":[0-2]\\b";
if (topicStatus.matches(regex)) {
//How to get status?
}
Not only do I not know how to get the status without splitting and looping through, I don't know why it doesn't match the regex.
Any help would be appreciated. Thanks.
Use the Pattern class
String topicStatus = "1234:0,567:1,89:2";
String someId = "1234";
String regex = "\\b"+someId+":([0-2])\\b";
Pattern MY_PATTERN = Pattern.compile(regex);
Matcher m = MY_PATTERN.matcher(topicStatus);
while (m.find()) {
String s = m.group(1);
System.out.println(s);
}
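The key here is to surround the part you want, [0-2], in parentheses, which saves it as the first capturing group. You then access it through group(1). Your original check never succeeds because String.matches() only returns true when the regex matches the entire string, whereas Matcher.find() looks for a match anywhere inside it.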
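As a compact illustration of the capturing-group approach, here is a small sketch. The class name `StatusLookup` and the method `statusFor` are hypothetical names chosen for the example; `Pattern.quote` is added on the assumption that the id might one day contain regex metacharacters.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatusLookup {
    // Returns the status digit stored for the given id, if the id is present.
    public static Optional<String> statusFor(String topicStatus, String id) {
        // The parentheses around [0-2] make the status capturing group 1.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(id) + ":([0-2])\\b");
        Matcher m = p.matcher(topicStatus);
        // find() searches inside the string; matches() would require the whole string to match.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String topicStatus = "1234:0,567:1,89:2";
        System.out.println(statusFor(topicStatus, "1234").orElse("not found"));
        System.out.println(statusFor(topicStatus, "234").orElse("not found"));
    }
}
```

The leading `\\b` is what keeps the id `234` from matching inside `1234`.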
I assumed that your pairs are always comma-separated and that each id is delimited from its status by a colon. Given that, I just used split.
String[] idsToCheck = topicStatus.split(",");
for(String idPair : idsToCheck)
{
String[] idPairArray = idPair.split(":");
if(idPairArray[0].equals(someId))
{
System.out.println("id : " + idPairArray[0]);
System.out.println("status: " + idPairArray[1]);
}
}
I get a string from the server: "Hello, my e-mail email@email.com write here please".
I need to surround the email with HTML tags and produce a string like
"Hello, my e-mail email@email.com write here please"
How can I do this in Java?
I need it because I want to pass the formatted string to TextView.setText(Html.fromHtml())
You can add android:autoLink="email"
on your TextView xml tag.
You can use a Matcher to find all of the emails inside a String and then replace each one to edit your String, like this:
yourString = getHtmlString(yourString);
private String getHtmlString(String s) {
List<String> emails = getEmailsFromString(s);
if (emails.size() < 1) {
return s;
}
String result = s;
for (String email : emails) {
// Replacement text for this address, e.g. the address wrapped in HTML tags
String toReplace = "email@email.com";
// Strings are immutable, so keep the returned value; replace() treats the address literally
result = result.replace(email, toReplace);
}
return result;
}
private List<String> getEmailsFromString(String s) {
List<String> emails = new ArrayList<>();
Matcher m = Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+").matcher(s);
while (m.find()) {
emails.add(m.group());
}
return emails;
}
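The find-then-replace idea can also be done in a single pass with Matcher.replaceAll, where `$0` in the replacement stands for the whole match. This is only a sketch under assumptions: the class name `EmailLinker`, the method `linkEmails`, and the choice of a `mailto:` anchor as the wrapping tag are all illustrative, since the question does not say which tags are wanted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailLinker {
    // Same simple address pattern as above, with the hyphen moved to the end of the class.
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+");

    // Wraps every address in an anchor tag (the tag itself is an assumption).
    public static String linkEmails(String s) {
        Matcher m = EMAIL.matcher(s);
        // $0 refers to the full text of each match.
        return m.replaceAll("<a href=\"mailto:$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkEmails("Hello, my e-mail email@email.com write here please"));
    }
}
```

Because the pattern allows dots in the domain, an address followed directly by a sentence-ending period would include that period in the match.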
How to edit this string and split it into two?
String asd = {RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef};
I want to make two strings.
String reponame;
String RepoID;
reponame should be CodeCommitTest
repoID should be 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
Can someone help me get it? Thanks
Here is Java code using a regular expression in case you can't use a JSON parsing library (which is what you probably should be using):
String pattern = "^\\{RepositoryName:\\s(.*?),RepositoryId:\\s(.*?)\\}$";
String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
String reponame = "";
String repoID = "";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(asd);
if (m.find()) {
reponame = m.group(1);
repoID = m.group(2);
System.out.println("Found reponame: " + reponame + " with repoID: " + repoID);
} else {
System.out.println("NO MATCH");
}
This code has been tested in IntelliJ and runs without error.
Output:
Found reponame: CodeCommitTest with repoID: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
Assuming there aren't quote marks in the input, and that the repository name and ID consist of letters, numbers, and dashes, then this should work to get the repository name:
Pattern repoNamePattern = Pattern.compile("RepositoryName: *([A-Za-z0-9\\-]+)");
Matcher matcher = repoNamePattern.matcher(asd);
if (matcher.find()) {
reponame = matcher.group(1);
}
and you can do something similar to get the ID. The above code just looks for RepositoryName:, possibly followed by spaces, followed by one or more letters, digits, or hyphen characters; then the group(1) method extracts the name, since it's the first (and only) group enclosed in () in the pattern.
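For completeness, the same idea applied to both fields might look like the sketch below. The class name `RepoFields` and the helper `extract` are hypothetical, and the sketch keeps the assumption stated above that values consist only of letters, digits, and hyphens.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepoFields {
    // Finds "<key>:" followed by optional spaces and captures the value after it.
    public static Optional<String> extract(String input, String key) {
        Pattern p = Pattern.compile(Pattern.quote(key) + ": *([A-Za-z0-9\\-]+)");
        Matcher m = p.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
        String reponame = extract(asd, "RepositoryName").orElse("");
        String repoID = extract(asd, "RepositoryId").orElse("");
        System.out.println(reponame);
        System.out.println(repoID);
    }
}
```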
I have a long string like the one below. Its format is always the same, but the message inside it may vary. How can I extract that message from this complex string in Java?
charset_test=%E2%82%AC%2C%C2%B4%2C%E2%82%AC%2C%C2%B4%2C%E6%B0%B4%2C%D0%94%2C%D0%84&post_form_id=71c3b72f4049d394140cedf32d39f525&fb_dtsg=AQBY3vp-&feedback_params=%7B%22actor%22%3A%22176851262376586%22%2C%22target_fbid%22%3A%22283157315079313%22%2C%22target_profile_id%22%3A%22176851262376586%22%2C%22type_id%22%3A%227%22%2C%22source%22%3A%222%22%2C%22assoc_obj_id%22%3A%22%22%2C%22source_app_id%22%3A%220%22%2C%22extra_story_params%22%3A%7B%22photo_viewer_version%22%3A%222%22%7D%2C%22content_timestamp%22%3A%221327693760%22%2C%22check_hash%22%3A%22129f5441c4cb4266%22%7D&translate_on_load=&add_comment_text_text=I%20didn't%20got%20any%20msg%20in%20my%20mailbox%20%3A(&add_comment_text=I%20didn't%20got%20any%20msg%20in%20m%20inbox%20%3A(&comment_replace=optimistic_comment_2931473608_0&comment=1&lsd&post_form_id_source=AsyncRequest&__user=18802987&phstamp=165895111811245853
I want to extract the message in the format below:
I didn't got any msg in my mailbox
Here's a regex-solution:
String input = "charset_test=%E2%8...3A(&add_comment_text=I%20didn't%20got%20any"
+ "%20msg%20in%20m%20inbox%20%3A(&comment_replace=optim"
+ "istic_comment_2931473608_0&comment=1&lsd&post_form_id_source="
+ "AsyncRequest&__user=18802987&phstamp=165895111811245853";
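// (?:&|$) ends the value at the next '&' or at the end of the input; inside [...] a '$' is only a literal character
Pattern p = Pattern.compile("add_comment_text_text=(.*?)(?:&|$)");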
Matcher m = p.matcher(input);
if (m.find()) {
String value = URLDecoder.decode(m.group(1), "UTF-8");
System.out.println(value);
}
Output:
I didn't got any msg in my mailbox :(
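Since the input is a URL-encoded form body, an alternative to a regex is to split it on `&` and `=` and decode each part. The following is a sketch under that assumption; the class name `FormBodyParser` and its methods are hypothetical names for the example, and the sample input is shortened to a few parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodyParser {
    // Decodes one percent-encoded component as UTF-8.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Splits "a=1&b=2" style input into decoded name/value pairs.
    public static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            // A parameter without '=' (such as "lsd") gets an empty value.
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(decode(key), decode(value));
        }
        return params;
    }

    public static void main(String[] args) {
        String input = "add_comment_text_text=I%20didn't%20got%20any%20msg%20in%20my%20mailbox%20%3A("
                + "&comment=1&lsd";
        System.out.println(parse(input).get("add_comment_text_text"));
    }
}
```

Looking the parameter up by its exact name also avoids confusing `add_comment_text_text` with the similarly named `add_comment_text`.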