Regex to extract a paragraph - java

I need a regex to extract each paragraph from a text buffer containing many similar paragraphs, and store each one as a string for additional processing.
Example: Say, the text buffer is like this:
=== Jun 11 14:05:39 - Person Details ===
Person Name = "Hurlman"
Person Address = "2nd Street Benjamin Blvd NJ"
Persion Age = 25
=== Jun 11 14:05:39 - Person Details ===
Person Name = "Greg"
Person Address = "3rd Street Benjamin Blvd NJ"
Persion Age = 26
=== Jun 11 14:05:42 - Person Details ===
Person Name = "Michel"
Person Address = "4th Street Benjamin Blvd NJ"
Persion Age = 27
And I need to iterate through all the paragraphs and store each one of them to further find the specific person details inside.
Each paragraph I need to extract should be of the below format
=== Jun 11 14:05:42 - Person Details ===
Person Name = "Michel"
Person Address = "4th Street Benjamin Blvd NJ"
Persion Age = 27
Any help is much appreciated!

You could use this pattern: (===.*===[\s\S]*?)(?====|$)
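A minimal sketch of how that pattern might be applied with java.util.regex, assuming the whole buffer is already in memory as one String. The class and method names (RecordExtractor, extract) are made up for illustration, and the approach would break if a field value itself contained "===".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordExtractor {
    // Pattern from the answer: a "=== ... ===" header line followed, lazily,
    // by everything up to the next "===" or the end of the input.
    private static final Pattern RECORD =
            Pattern.compile("(===.*===[\\s\\S]*?)(?====|$)");

    // Returns each paragraph (header plus its detail lines) as one trimmed String.
    static List<String> extract(String buffer) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(buffer);
        while (m.find()) {
            records.add(m.group(1).trim());
        }
        return records;
    }

    public static void main(String[] args) {
        String buffer =
                "=== Jun 11 14:05:39 - Person Details ===\n"
              + "Person Name = \"Hurlman\"\n"
              + "Persion Age = 25\n"
              + "=== Jun 11 14:05:42 - Person Details ===\n"
              + "Person Name = \"Michel\"\n"
              + "Persion Age = 27\n";
        for (String record : extract(buffer)) {
            System.out.println(record);
            System.out.println("-----");
        }
    }
}
```

Each returned string can then be searched on its own for the specific person details.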

Using regexes to solve this is possible, but it is likely to give you a poor (inefficient, hard to understand, hard to maintain, etc) solution.
What you have is an informal record structure represented using lines of text. (This is not natural language text, so describing it in terms of "paragraphs" doesn't make sense.)
The way to process it is to read it a line at a time and then use Scanner (or equivalent) to parse each line into name value pairs. You just need some simple logic to detect the record boundaries and / or check that they are appearing at the correct place in the input stream.
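The line-at-a-time approach could be sketched roughly as follows. This is only an illustration: it uses String.split and indexOf rather than Scanner, the class name RecordParser and the "header" map key are invented here, and it assumes every detail line has the form name = value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordParser {
    // Parses the buffer into one map per record. The header text is stored
    // under the (hypothetical) key "header"; other lines are split at the first '='.
    static List<Map<String, String>> parse(String buffer) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = null;
        for (String raw : buffer.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.length() >= 6 && line.startsWith("===") && line.endsWith("===")) {
                // A header line marks the start of a new record.
                current = new LinkedHashMap<>();
                current.put("header", line.substring(3, line.length() - 3).trim());
                records.add(current);
                continue;
            }
            int eq = line.indexOf('=');
            if (current == null || eq < 0) {
                // A detail line before any header, or a line without '=', is malformed.
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            // Strip surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            current.put(key, value);
        }
        return records;
    }
}
```

With the sample buffer from the question, parse(buffer).get(0).get("Person Name") would give Hurlman.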


Java string indexing make me confused

I need to gather data from my DB: the holiday dates in my country. The data comes like this:
Example 1 : THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for
Example 2 : MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival
I need to get the days, the dates, and the holiday name. To get the data from example 1, I'm using code like this:
public static void main(String[] args) {
String ex1 = "THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for";
String ex2 = "MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival ";
String[] trim1 = ex1.trim().split("\\s+"); //to split by space
String[] trim2 = ex1.trim().split(" "); //to split by 3 space so i got the data from multiple space as delimiter
System.out.println("DAY " +trim1[0]);//display day
System.out.println("DATE " +trim1[1] +trim1[2]+"2020");//display date
System.out.println("HOLIDAY NAME " +trim2[3]);//display holiday name
}
The output looks like this:
DAY THU
DATE 21May2020
HOLIDAY NAME Ascension Day of Jesus Christ
This is just what I need, but when it comes to example 2 I can't use the same code because the spacing is different. How can I get the data I need from both example 1 and example 2 with the same code?
I am new to Java, so I'm sorry if my question looks dumb. I hope you can help me. Thanks.
.split("\\s+") splits on any run of whitespace, i.e. one or more whitespace characters.
This means you can split on any number of spaces (which is what you want). However, it will also split your text comments. You can limit the length of the resulting array using .split(regex, n): the pattern is applied at most n-1 times, so the array has at most n elements. See the String.split(String, int) documentation for more details.
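A small sketch of the limit argument, assuming every row starts with exactly three whitespace-separated fields (day, day of month, month) and that everything after them should stay together. The class name HolidaySplit is made up for illustration; the hard-coded 2020 comes from the question.

```java
class HolidaySplit {
    // Splits a row into at most four parts: day(s), day of month, month, remainder.
    static String[] parse(String row) {
        return row.trim().split("\\s+", 4);
    }

    public static void main(String[] args) {
        String ex1 = "THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for";
        String ex2 = "MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival ";
        for (String row : new String[] {ex1, ex2}) {
            String[] p = parse(row);
            System.out.println("DAY " + p[0]);
            System.out.println("DATE " + p[1] + p[2] + "2020");
            System.out.println("HOLIDAY NAME " + p[3]);
        }
    }
}
```

The fourth element keeps its internal whitespace untouched, so the holiday name and any trailing note remain joined in one string.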
As for splitting out your two textual comments, I cannot see a way to do this.
"Substitute for Commemoration of Idul Fitri Festival " contains nothing that tells you where the first text comment ends and the second begins.
It seems quite strange to me that you receive information from your database like this; I would recommend checking whether there are other options. There is almost certainly a way to get separate fields.
If you have the ability to change the information in the database, you could add single quotes (') or some other separator, which would then let you split out the two pieces of text.
This is basically what @DanielBarbarian suggested: since the information seems to always start at the same indexes, you can just use those to get what you need.
String ex1 = "THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for";
String ex2 = "MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival ";
String day = ex2.substring(0, 8).trim();
String date = ex2.substring(8, 14).trim() + ex2.substring(14, 22).trim() + "2020";
String name = ex2.substring(22);
System.out.println("DAY " + day);// display day
System.out.println("DATE " + date);// display date
System.out.println("HOLIDAY NAME " + name);// display holiday name

Merging rows with same ID together with dynamic headers with CSV in java

Please help, I just can't figure it out.
I am trying to merge CSV rows that have the same ID number but different values in some fields.
I don't have any code yet.
For example,
Read in this CSV file:
ID NAME PHONE EMAIL
22 John 555-1111 john@aol.com
22 John 555-2222 john@aol.com
44 Bill 555-9999 Bill@aol.com
Should return:
ID NAME PHONE EMAIL PHONE0
22 John 555-1111 john@aol.com 555-2222
44 Bill 555-9999 Bill@aol.com
Thanks
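One possible way to approach this, sketched under several assumptions: the file is comma-separated with the header in the first line, the ID is the first column, no field contains a quoted comma, and extra values go into new columns named after the original header plus a counter (PHONE0, PHONE1, ...). The class name CsvMerge is invented for illustration; a real CSV library would be safer for anything beyond simple input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvMerge {
    // Merges rows sharing the same ID (first column) and returns the new CSV lines.
    static List<String> merge(List<String> lines) {
        List<String> headers = new ArrayList<>(Arrays.asList(lines.get(0).split(",")));
        int baseCols = headers.size();
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1);
            Map<String, String> row = rows.computeIfAbsent(cells[0], k -> new LinkedHashMap<>());
            for (int c = 0; c < baseCols && c < cells.length; c++) {
                String name = headers.get(c);
                int n = 0;
                // Walk NAME, NAME0, NAME1, ... until the same value or a free column is found.
                while (row.containsKey(name) && !row.get(name).equals(cells[c])) {
                    name = headers.get(c) + n++;
                }
                row.put(name, cells[c]);
                if (!headers.contains(name)) {
                    headers.add(name); // a dynamic header such as PHONE0
                }
            }
        }
        List<String> out = new ArrayList<>();
        out.add(String.join(",", headers));
        for (Map<String, String> row : rows.values()) {
            List<String> cells = new ArrayList<>();
            for (String h : headers) {
                cells.add(row.getOrDefault(h, "")); // empty cell when the row has no such column
            }
            out.add(String.join(",", cells));
        }
        return out;
    }
}
```

Rows without an extra value end with an empty field, so every output line has the same number of columns.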

How would I match names to memberships in organizations using HashMaps in Java?

So here is my predicament. I have no idea where to start with this. Let's say I'm given a list of people who are members of multiple organizations, like the following:
John NAACP PETA NRA
Bill NRA WHO
Nancy NAACP NRA WHO
Jim PETA WHO
But I want to take another file in that has a list of all the possible organizations and then output something like this (with the organizations in alphabetical order and the members also in alphabetical order, and no names next to an organization if nobody is in it):
NAACP John Nancy
NRA Bill John Nancy
PETA Jim John
WHO Bill Jim Nancy
YEO
I'm new to HashMaps and I have no idea how to go about doing this, so i'd appreciate all the help I can get.
Try something like a HashMap<String, ArrayList<String>>. Insert each organization name as a String key with an empty ArrayList<String>. Then loop over the list of people => organizations, look up the organizations one by one, and insert the person's name in the ArrayList for that organization.
Not the most elegant solution but it will work
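A sketch of that idea, with two substitutions made to get the alphabetical ordering the question asks for: a TreeMap instead of a HashMap for the organizations and a TreeSet instead of an ArrayList for the members. The class and method names (Memberships, index, format) are invented, and the input is assumed to be already read into lists of lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class Memberships {
    // organizations: every possible organization name.
    // memberLines: lines such as "John NAACP PETA NRA" (name first, then organizations).
    static Map<String, Set<String>> index(List<String> organizations, List<String> memberLines) {
        Map<String, Set<String>> byOrg = new TreeMap<>();
        for (String org : organizations) {
            byOrg.put(org, new TreeSet<>()); // start with no members
        }
        for (String line : memberLines) {
            String[] parts = line.trim().split("\\s+");
            for (int i = 1; i < parts.length; i++) {
                // computeIfAbsent also copes with an organization missing from the master list.
                byOrg.computeIfAbsent(parts[i], k -> new TreeSet<>()).add(parts[0]);
            }
        }
        return byOrg;
    }

    // One output line per organization: the name followed by its sorted members.
    static List<String> format(Map<String, Set<String>> byOrg) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : byOrg.entrySet()) {
            String line = e.getKey();
            if (!e.getValue().isEmpty()) {
                line += " " + String.join(" ", e.getValue());
            }
            lines.add(line);
        }
        return lines;
    }
}
```

An organization nobody belongs to, such as YEO in the question, is printed as its name alone.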
To add people to the list you can use the following code:
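import java.util.*; // ArrayList, LinkedHashMap, List, Map
Map<String, List<String>> storage = new LinkedHashMap<String, List<String>>();
// create the list for an organization the first time it is seen
if (!storage.containsKey("NRA")) {
    storage.put("NRA", new ArrayList<String>());
}
// add each member as a separate entry
storage.get("NRA").add("Bill");
storage.get("NRA").add("John");
storage.get("NRA").add("Nancy");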
To extract and print people you can use the following code:
for (Map.Entry<String, List<String>> entry : storage.entrySet()) {
    String line = entry.getKey(); // organization name
    for (String name : entry.getValue()) { // each member of that organization
        line += " ";
        line += name;
    }
    System.out.println(line); // print the result
}
I didn't check this code in an IDE, but apart from possible typos it should work.
Create a HashMap for each organization, e.g., HashMap<String,Boolean> memberOfNAACP = new HashMap<String,Boolean>();. As you loop through the members and you find one who is a member of NAACP, run memberOfNAACP.put("John",true). When you're done, dump out the contents of the hash with memberOfNAACP.keySet(). If you don't know all the organizations in advance, use an ArrayList of HashMaps, type ArrayList<HashMap<String,Boolean>>.

Finding document by numeric fields in Lucene

For example, I have some documents described by fields: id, date and price.
First document: id=1, date='from 10.01.2014 to 20.01.2014', price='120'
Second document: id=2, date='19.01.2014' and price='from 100 to 140'
My program receives key/value parameters and should find the most suitable documents. For example, with the parameters date='19.01.2014', price='120' the program should find both documents. With date='20.01.2014', price='120', only the first document. With date='19.01.2014', price='140', only the second one.
How can I do it with Lucene in Java? I saw examples where I type a query like 'give me docs where date is from .. to ..', and Lucene gives me docs in this range. Instead, I want to specify the range on the document, not in the query.
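You could index both the lower and the upper bound of each range for dates and prices, storing a single value as a range whose two bounds are equal.
Your document #1 would be indexed as:
id = 1
dateFrom = 10.01.2014
dateTo = 20.01.2014
priceFrom = 120
priceTo = 120
And document #2 as:
id = 2
dateFrom = 19.01.2014
dateTo = 19.01.2014
priceFrom = 100
priceTo = 140
A document matches when the requested value lies inside its range, so for date=19.01.2014 and price=120 the query would look like this:
+dateFrom:[* TO 19.01.2014] +dateTo:[19.01.2014 TO *] +priceFrom:[* TO 120] +priceTo:[120 TO *]
For the range comparisons to be meaningful, the values have to be indexed in a sortable form (for example dates as yyyyMMdd and prices as numeric fields) rather than as dd.MM.yyyy text.
This is not very efficient, but it should work.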
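The condition such a query has to express can be checked in plain Java, without any Lucene calls. In this sketch a document matches when the requested date and price both fall inside the document's stored ranges, and a single value is stored as a range whose bounds are equal. The RangeDoc class and its field names are assumptions made for illustration only.

```java
import java.time.LocalDate;

class RangeDoc {
    final int id;
    final LocalDate dateFrom;
    final LocalDate dateTo;
    final int priceFrom;
    final int priceTo;

    RangeDoc(int id, LocalDate dateFrom, LocalDate dateTo, int priceFrom, int priceTo) {
        this.id = id;
        this.dateFrom = dateFrom;
        this.dateTo = dateTo;
        this.priceFrom = priceFrom;
        this.priceTo = priceTo;
    }

    // True when both requested values lie inside this document's ranges (bounds inclusive).
    boolean matches(LocalDate date, int price) {
        return !date.isBefore(dateFrom) && !date.isAfter(dateTo)
                && price >= priceFrom && price <= priceTo;
    }

    public static void main(String[] args) {
        RangeDoc first = new RangeDoc(1, LocalDate.of(2014, 1, 10), LocalDate.of(2014, 1, 20), 120, 120);
        RangeDoc second = new RangeDoc(2, LocalDate.of(2014, 1, 19), LocalDate.of(2014, 1, 19), 100, 140);
        LocalDate date = LocalDate.of(2014, 1, 19);
        System.out.println(first.matches(date, 120));
        System.out.println(second.matches(date, 120));
    }
}
```

In an index, each of the four comparisons becomes one range clause on the corresponding From or To field.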

Querying based on available partial data of nested document (in morphia, mongoDB)

Document structure (just for illustration)
Employee
{
    name : "..",
    age : ..,
    addresses : [
        {
            "street":"...",
            "country":{
                name:"..",
                continent:"..",
                Galaxy:".."
            }
        }
    ],
    company:".."
}
Query -
I have just Addresses -> street (type String) and Addresses -> country -> name (type String). And i want to get all employees that match this criteria.
Address a1 = new Address();
a1.setStreet("bla bla");
Country c = new Country();
c.setName("sth");
a1.setCountry(c);
Query<Employee> q = ds.createQuery(Employee.class).field("addresses").hasThisElement(a1);
This DOESN'T fetch results (even when there is a real match). It looks like it's because of the partial "Country" document match. If I populate all fields of Country, it returns results as expected.
Question #1 : Is there any workaround for the above?
Question #2 : Address is an array, and I can get multiple (address#street, country#name) pairs; again I want the list of employees that match the given pairs.
Something like:
Query<Employee> q = ds.createQuery(Employee.class).field("addresses").hasThisElement(a1).field("addresses").hasThisElement(a2).field(..) // and so on
Note: i can breakdown address match something like this
Address a = new Address();
a.setStreet("bla bla");
ds.createQuery(Employee.class).field("addresses").hasThisElement(a).field("addresses.country.name").equal("hoo");
BUT this will match Employee where street="bla bla" and country.name!="hoo" in address#1 and street!="bla bla" and country.name="hoo" in address #2. You get the point. I don't want such Employees to be returned.
Please let me know if this is possible. Thanks much.
It's possible. MongoDB has a special operator for situations like this called $elemMatch. Morphia has support for it.
You are correct that the second approach is the right way (the first approach is trying to match the entire country, not the subkey). The only thing is that you want to constrain it to a single array element in which both street and country.name match, not a document with a matching street in one element and a matching country.name in another.
This doc page and this thread have some more information.
http://code.google.com/p/morphia/wiki/Query
http://groups.google.com/group/morphia/tree/browse_frm/month/2011-02/5bd3f654526fa30b?rnum=41&lnk=ol
Unfortunately I don't know morphia well, but hopefully that will give you enough information to solve it.
