Java string indexing makes me confused

So I need to gather data from my DB. It's the holiday dates in my country, and the data comes like this:
Example 1 : THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for
Example 2 : MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival
I need to get the day, the date, and the holiday name. To get the data from example 1 I'm using code like this:
public class Holidays {
    public static void main(String[] args) {
        String ex1 = "THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for";
        String ex2 = "MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival ";
        String[] trim1 = ex1.trim().split("\\s+"); // split on any run of whitespace
        String[] trim2 = ex1.trim().split("   ");  // split on 3 spaces, so the wide gaps between columns act as the delimiter
        System.out.println("DAY " + trim1[0]);                      // display day
        System.out.println("DATE " + trim1[1] + trim1[2] + "2020"); // display date
        System.out.println("HOLIDAY NAME " + trim2[3]);             // display holiday name
    }
}
The output looks like this:
DAY THU
DATE 21May2020
HOLIDAY NAME Ascension Day of Jesus Christ
That is just what I need, but for example 2 I can't use the same code because the spacing is different. How can I get the data I need from both example 1 and example 2 with the same code?
I'm new to Java, so I'm sorry if my question seems dumb. I hope you can help me. Thanks!

.split("\\s+") splits at any run of whitespace, i.e. at 1 space or more.
This means you can split at any number of spaces (what you want). However, it will also split your text comments. You can limit the number of pieces produced with .split(regex, limit): the pattern is applied at most limit - 1 times, so the resulting array has at most limit elements and the last element holds the rest of the input. See the String.split Javadoc for more details.
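A minimal sketch of that idea, assuming the day, date, and month fields never contain spaces themselves: with a limit of 4 the last element keeps the whole remainder of the line, so both examples parse with the same code. It does not separate the holiday name from trailing notes such as "*ICDX GOLD open for"; that text stays in the last field. The class and method names are only for illustration.

```java
public class SplitWithLimit {
    // Split into at most 4 fields: day, date, month, and everything else (the holiday name).
    // With a limit of 4 the regex is applied at most 3 times, so spaces inside the name survive.
    static String[] parse(String line) {
        return line.trim().split("\\s+", 4);
    }

    public static void main(String[] args) {
        String ex1 = "THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for";
        String ex2 = "MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival ";
        for (String line : new String[] {ex1, ex2}) {
            String[] f = parse(line);
            System.out.println("DAY " + f[0]);
            System.out.println("DATE " + f[1] + f[2] + "2020");
            System.out.println("HOLIDAY NAME " + f[3]);
        }
    }
}
```

Because "\\s+" matches any run of whitespace, the same call also works if the real data separates the columns with several spaces.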
As for separating your two textual comments, I cannot see a way to do this.
"Substitute for Commemoration of Idul Fitri Festival " contains nothing that tells you where the first text comment ends and the second begins.
It seems quite strange to me that you receive information from your database like this; I would recommend checking whether there are other options. There is almost certainly a way to get separate fields.
If you have the ability to change the data in the database, you could put single quotes (') or some other separator between the fields, which would then let you split out the two pieces of text.

This is basically what @DanielBarbarian suggested: since the information seems to always start at the same indexes, you can just use those to get what you need.
// Assumes fixed-width columns: day in [0, 8), date in [8, 14), month in [14, 22), name from 22 on
String ex1 = "THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for";
String ex2 = "MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival ";
String day = ex2.substring(0, 8).trim();
String date = ex2.substring(8, 14).trim() + ex2.substring(14, 22).trim() + "2020";
String name = ex2.substring(22).trim();
System.out.println("DAY " + day);           // display day
System.out.println("DATE " + date);         // display date
System.out.println("HOLIDAY NAME " + name); // display holiday name
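The substring approach only works if every column really has a fixed width. The example strings as they appear in this post do not (for ex2, substring(14, 22) returns "Dec Subs"), so this sketch builds a hypothetical padded row with String.format, using the widths 8, 6, and 8 implied by the indexes above, and then reads the columns back. Lines shorter than 22 characters would throw StringIndexOutOfBoundsException, so real input would need a length check.

```java
public class FixedWidthHoliday {
    // Column boundaries implied by the substring indexes:
    // day [0, 8), date [8, 14), month [14, 22), name [22, end)
    static final int DAY_END = 8, DATE_END = 14, MONTH_END = 22;

    static String[] parse(String line) {
        String day = line.substring(0, DAY_END).trim();
        String date = line.substring(DAY_END, DATE_END).trim();
        String month = line.substring(DATE_END, MONTH_END).trim();
        String name = line.substring(MONTH_END).trim();
        return new String[] {day, date, month, name};
    }

    public static void main(String[] args) {
        // Hypothetical padded row, built with String.format so the columns really are fixed-width
        String row = String.format("%-8s%-6s%-8s%s", "MON-THU", "28-31", "Dec",
                "Substitute for Commemoration of Idul Fitri Festival");
        String[] f = parse(row);
        System.out.println("DAY " + f[0]);
        System.out.println("DATE " + f[1] + f[2] + "2020");
        System.out.println("HOLIDAY NAME " + f[3]);
    }
}
```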

Related

How to detect multi-word names with OpenNLP

I'm doing NER using Java OpenNLP and I'm not sure how I can detect multi-word names (e.g. New York, Bruno Mars, Hong Kong) using the custom model I have trained.
My training data do cover multi-word spans:
<START:place> Hong Kong <END> ... <START:person> Putin <END>
I'm pretty sure my trained model and training data are working well. I just don't know how to get the multi-word names. Here is what I did:
// testing the model
NameFinderME nameFinder = new NameFinderME(nameFinderModel);
String sentence = "India may US to Japan France so Putin should Hong Kong review Trump";
WhitespaceTokenizer whitespaceTokenizer = WhitespaceTokenizer.INSTANCE;
// tokenizing the given paragraph
String[] tokens = whitespaceTokenizer.tokenize(sentence);
Span[] nameSpans = nameFinder.find(tokens);
for (Span s : nameSpans) {
    // note: this prints only the first token of each span
    System.out.println(s.toString() + " " + tokens[s.getStart()]);
}
And here is what I get:
[0..1) place India
[0..1) place US
[0..1) place Japan
[0..1) place France
[0..1) person Putin
[0..1) place Hong
[0..1) person Trump
But I want to get [0..1) place Hong Kong instead of splitting them into two categories.
Thanks.
I defined an array list containing the first word of each multi-word place name, e.g. {"Hong", "New", "North", "South" ... }, and then check whether it contains tokens[s.getStart()]. If it does, I add tokens[s.getStart()] + " " + tokens[s.getStart() + 1]; otherwise I add tokens[s.getStart()]. It's not the best approach, but it's enough for me for now.
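A sketch of that workaround in plain Java, so it runs without a trained model: the prefix set and the span start indexes are hypothetical stand-ins for the real list and for nameFinder.find(tokens). When the model does return a multi-token span, its text is the tokens from getStart() up to (not including) getEnd(), which joinSpan reproduces; OpenNLP's Span.spansToStrings(spans, tokens) does the same for a whole array of spans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class MultiWordNames {
    // Hypothetical list of first words of known multi-word place names (the workaround's list)
    static final Set<String> MULTI_WORD_PREFIXES = Set.of("Hong", "New", "North", "South");

    // Text covered by a span [start, end), e.g. called with s.getStart() and s.getEnd()
    static String joinSpan(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    // The workaround: if the span's first token is a known prefix, also take the next token
    static String nameAt(String[] tokens, int start) {
        if (MULTI_WORD_PREFIXES.contains(tokens[start]) && start + 1 < tokens.length) {
            return tokens[start] + " " + tokens[start + 1];
        }
        return tokens[start];
    }

    public static void main(String[] args) {
        String sentence = "India may US to Japan France so Putin should Hong Kong review Trump";
        String[] tokens = sentence.split("\\s+");
        // Hypothetical span starts, standing in for the result of nameFinder.find(tokens)
        int[] starts = {0, 2, 4, 5, 7, 9, 12};
        List<String> names = new ArrayList<>();
        for (int start : starts) {
            names.add(nameAt(tokens, start));
        }
        System.out.println(names);                    // [India, US, Japan, France, Putin, Hong Kong, Trump]
        System.out.println(joinSpan(tokens, 9, 11));  // Hong Kong
    }
}
```

The prefix list is brittle (a lone "New" would always swallow the next word); using the span's own end index only helps once the model actually emits two-token spans.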

Making a Regex More Dynamic

I posted this question a couple weeks ago pertaining to extracting a capture group using regex in Java, Extracting Capture Group Using Regex, and I received a working answer. I also posted this question a couple weeks ago pertaining to character replacement in Java using regex, Replace Character in Matching Regex, and received an even better answer that was more dynamic than the one I got from my first post. I'll quickly illustrate by example. I have a string like this that I want to extract the "ID" from:
String idInfo = "Any text up here\n" +
"Here is the id\n" +
"\n" +
"?a0 12 b5\n" +
"&Edit Properties...\n" +
"And any text down here";
And in this case I want the output to just be:
a0 12 b5
But it turns out the ID could be any number of octets (just has to be 1 or more octets), and I want my regex to be able to basically account for an ID of 1 octet then any number of subsequent octets (from 0 to however many). The person I received an answer from in my Replace Character in Matching Regex post did this for a similar but different use case of mine, but I'm having trouble porting this "more dynamic" regex over to the first use case.
Currently, I have ...
Pattern p = Pattern.compile("(?s)?:Here is the id\n\n\\?([a-z0-9]{2})|(?<!^)\\G:?([a-z0-9]{2})|.*?(?=Here is the id\n\n\\?)|.+");
Matcher m = p.matcher(certSerialNum);
String idNum = m.group(1);
System.out.println(idNum);
But it's throwing an exception. In addition, I would actually like it to use all known adjacent text in the pattern including "Here is the id\n\n\?" and "\n&Edit Properties...". What corrections do I need to get this working?
It seems like you want something like this:
String idInfo = "Any text up here\n" +
"Here is the id\n" +
"\n" +
"?a0 12 b5\n" +
"&Edit Properties...\n" +
"And any text down here";
Pattern regex = Pattern.compile("Here is the id\\n+\\?([a-z0-9]{2}(?:\\s[a-z0-9]{2})*)(?=\\n&Edit Properties)");
Matcher matcher = regex.matcher(idInfo);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
a0 12 b5
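A hedged sketch that wraps the answer's pattern in a helper and also checks the one-octet case, which the (?:\\s[a-z0-9]{2})* part allows. Independently of the pattern, note that Matcher.group(1) may only be called after a successful find() or matches(); calling it straight after matcher(), as in the question's snippet, throws IllegalStateException. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractId {
    // Same pattern as the answer: one octet, then zero or more " xx" octets,
    // between the known header and the "&Edit Properties" line.
    static final Pattern ID = Pattern.compile(
            "Here is the id\\n+\\?([a-z0-9]{2}(?:\\s[a-z0-9]{2})*)(?=\\n&Edit Properties)");

    // Returns the ID, or null when the block is not present
    static String extractId(String text) {
        Matcher m = ID.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String oneOctet = "Here is the id\n\n?a0\n&Edit Properties...\n";
        String fourOctets = "Any text\nHere is the id\n\n?a0 12 b5 ff\n&Edit Properties...\nMore";
        System.out.println(extractId(oneOctet));   // a0
        System.out.println(extractId(fourOctets)); // a0 12 b5 ff
        // Individual octets, if needed
        System.out.println(String.join(",", extractId(fourOctets).split(" "))); // a0,12,b5,ff
    }
}
```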

Arranging text inside saved txt file

I have made an app that takes some values and adds them to a txt file.
It writes something like this (they are String[] arrays):
product[1] quantity[1] price[1]
product[2] quantity[2] price[2]
.....
product[n] quantity[n] price[n]
The problem is that most of the time product[1] won't have the same length as product[2] or the other products, and the same goes for quantities and prices. This results in a messy text layout, something like this:
ww 2 4
wwww 1 2.5
w 1.2 1.1
Is there any way I can make it tidier? Something like creating a table or columns?
Thanks !
EDIT: To make it a bit clearer, I want to find a way to arrange the stuff in the txt file like this, instead of how it is in the example above:
ww     2     4
wwww   1     2.5
w      1.2   1.1
At the moment I'm using this:
pw.println(prod[n] + " " + cant[n] + " " + pret[n]);
But this leaves the text in the txt file unaligned (as in example 1).
Use the format method of the String class like this.
Declare a String with the format:
String yourFormat = "%-10s %-10s %-10s%n"; // choose widths that fit your data
// if a value is longer than its width, the literal space in the format
// still leaves one space before the next column
Write the output with that format:
output.write(String.format(yourFormat, firstString, secondString, thirdString));
The first string is your w's; the second and third are the columns with numbers.
For your example:
String myFormat = "%-10s %-10s %-10s%n";
for (int i = 0; i < prod.length; i++) {
    // printf instead of println: the format already ends with %n
    pw.printf(myFormat, prod[i], cant[i], pret[i]);
}
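A runnable variant of the same idea, with one added assumption: instead of a fixed width of 10, each column's width is taken from its longest entry, so no column overflows. It writes to a StringWriter so the result is easy to inspect; for a file, wrap a FileWriter in the PrintWriter instead. The class and method names are illustrative.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class AlignedColumns {
    // Width of the widest entry in a column (at least 1, since "%-0s" is not a valid format)
    static int widest(String[] column) {
        int w = 1;
        for (String s : column) {
            w = Math.max(w, s.length());
        }
        return w;
    }

    static String table(String[] prod, String[] cant, String[] pret) {
        // e.g. "%-4s %-3s %s%n": left-justified columns, one literal space between them
        String fmt = "%-" + widest(prod) + "s %-" + widest(cant) + "s %s%n";
        StringWriter out = new StringWriter();
        try (PrintWriter pw = new PrintWriter(out)) {
            for (int i = 0; i < prod.length; i++) {
                pw.printf(fmt, prod[i], cant[i], pret[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] prod = {"ww", "wwww", "w"};
        String[] cant = {"2", "1", "1.2"};
        String[] pret = {"4", "2.5", "1.1"};
        System.out.print(table(prod, cant, pret));
    }
}
```

With the question's three rows it prints:
ww   2   4
wwww 1   2.5
w    1.2 1.1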

extracting a substring to use in simple date format

I am reading log files and need to extract the date and time from certain lines so that I can then use SimpleDateFormat and find the average time between 2 actions. Here is an example of a line I would need the date from:
(INFO ) [07 Feb 2013 08:04:39,161] -- ua, navigation, fault
I can't figure out whether I should split the line twice or use the substring method. Also, I don't think I need to include the last number (the 161) when converting with SimpleDateFormat.
I would suggest using a regex to extract the required data from the log files.
Consider using regex groups to extract the String you want. You can use
Pattern p = Pattern.compile("\\[(\\d{2} \\w{3} \\d{4} \\d{2}:\\d{2}:\\d{2})"); // the () group captures the part you want to extract
Matcher m = p.matcher(theLine);
if (m.find()) {
    String date = m.group(1); // e.g. "07 Feb 2013 08:04:39", without the ",161"
}
I'd go for using substring:
final int from = line.indexOf('[') + 1;
final int to = line.indexOf(']', from); // use ',' instead of ']' if you do not want the last number included
final String timestampAsString = line.substring(from, to);
BTW: add checks that valid indices were found (indexOf returns -1 when the character is missing), for example to detect early on when the log format has changed.
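A sketch combining the substring approach with SimpleDateFormat, assuming every relevant line looks like the example: the text between '[' and ',' is parsed with the pattern "dd MMM yyyy HH:mm:ss", which drops the milliseconds. Locale.ENGLISH is set so month names like "Feb" parse regardless of the default locale. The second log line is made up; the average time between actions would then be the sum of such differences divided by their count.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LogTimestamps {
    // Matches "07 Feb 2013 08:04:39" (milliseconds dropped, as in the question)
    static final String PATTERN = "dd MMM yyyy HH:mm:ss";

    // Text between '[' and ',', with explicit checks that both characters were found
    static String timestampOf(String line) {
        int open = line.indexOf('[');
        int comma = open < 0 ? -1 : line.indexOf(',', open + 1);
        if (open < 0 || comma < 0) {
            throw new IllegalArgumentException("Unexpected log format: " + line);
        }
        return line.substring(open + 1, comma);
    }

    static Date parse(String line) throws ParseException {
        // A new instance per call: SimpleDateFormat is not thread-safe
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ENGLISH);
        return fmt.parse(timestampOf(line));
    }

    public static void main(String[] args) throws ParseException {
        String a = "(INFO ) [07 Feb 2013 08:04:39,161] -- ua, navigation, fault";
        String b = "(INFO ) [07 Feb 2013 08:05:09,002] -- ua, navigation, fault"; // hypothetical second line
        long seconds = (parse(b).getTime() - parse(a).getTime()) / 1000;
        System.out.println(seconds); // 30
    }
}
```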

Fill an XML file with test data

I have an XML file that looks like the attached image:
I got the attributes (commitId, author, date, and time) from an open-source Git repository using the "blame" command.
However, for my evaluation, I need to fill these attributes randomly with data I have. For example, I have Junio, Peter, Jiang, Jens, Jeff, Ramkumar, and Linus as authors, and the dates range between 2010 and 2012.
What is the best way to fill in this data using Java?
You can start by placing the values you need into arrays like this:
String[] names = {"Junio", "Peter", "Jiang", "Jens", "Jeff", "Ramkumar", "Linus"};
String[] dates = {"2010", "2011", "2012"};
Then, using the Random class (java.util.Random), you can do this:
Random ran = new Random();
String name = names[ran.nextInt(names.length)]; // index 0..6
String date = dates[ran.nextInt(dates.length)]; // index 0..2
That will give you a random name and date. Rinse. Repeat.
Note that nextInt(n) returns a number from 0 (inclusive) to n (exclusive); to get a random number between 0 and n inclusive you need nextInt(n + 1).
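A sketch of writing such random values back into the file with the JDK's DOM API. Because the attached image is not available, the element names (a <blame> root with <entry> children) are hypothetical; only the attribute names come from the question. Dates are drawn uniformly from 2010-01-01 to 2012-12-31 (using nextInt(n + 1) so the last day is reachable) and written in ISO yyyy-MM-dd form, which may differ from the format in the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.Random;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FillTestData {
    static final String[] AUTHORS = {"Junio", "Peter", "Jiang", "Jens", "Jeff", "Ramkumar", "Linus"};
    static final LocalDate FIRST = LocalDate.of(2010, 1, 1);
    static final LocalDate LAST = LocalDate.of(2012, 12, 31);

    // Uniformly random day in [FIRST, LAST]; nextInt(days + 1) makes LAST reachable
    static LocalDate randomDate(Random rnd) {
        int days = (int) (LAST.toEpochDay() - FIRST.toEpochDay());
        return FIRST.plusDays(rnd.nextInt(days + 1));
    }

    // Overwrites author/date/time on every <entry> element ("entry" is a hypothetical tag name)
    static String fill(String xml, Random rnd) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element e = (Element) entries.item(i);
            e.setAttribute("author", AUTHORS[rnd.nextInt(AUTHORS.length)]);
            e.setAttribute("date", randomDate(rnd).toString()); // ISO yyyy-MM-dd
            e.setAttribute("time", LocalTime.ofSecondOfDay(rnd.nextInt(24 * 60 * 60)).toString());
        }
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<blame><entry commitId=\"abc\" author=\"\" date=\"\" time=\"\"/></blame>";
        System.out.println(fill(xml, new Random(42)));
    }
}
```

Passing a seeded Random, as in main, makes the generated test data reproducible between runs.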
