String regular expression java - java

I'm working on a utility with this requirement:
There is a string that contains parameters like #p1, #p2 or #pn, where n can be any number.
For example:
Input:
It provides #p1 latest news, videos #p2 from India and #p3 the world. Get today's news headlines from #p5 Business, #p5
Replace all the parameters with #pn#. So if the parameter is #p1 it will become #p1#.
The above string will become :
Output:
It provides #p1# latest news, videos #p2# from India and #p3# the world. Get today's news headlines from #p5# Business, #p5#
Any quick help appreciated.
Thanks.

Use the String.replaceAll method like this:
string.replaceAll("(#p\\d+)", "$1#");
\d+ matches one or more digits. The parentheses () form a capturing group: it captures the characters matched by the pattern inside it and stores them in a numbered group. You can then refer to the captured text in the replacement by its index, such as $1 or $2.
Example:
String s = "It provides #p1 latest news, videos #p2 from India and #p3 the world. Get today's news headlines from #p5 Business, #p5";
System.out.println(s.replaceAll("(#p\\d+)", "$1#"));
Output:
It provides #p1# latest news, videos #p2# from India and #p3# the world. Get today's news headlines from #p5# Business, #p5#

You can try a regex like this:
public static void main(String[] args) {
String s = "it provides #p1 latest news, videos #p2 from India and #p3 the world. Get today's news headlines from #p5 Business, #p5";
System.out.println(s.replaceAll("(#p\\d+)(?=\\s+|$)", "$1\\#"));
}
Output:
it provides #p1# latest news, videos #p2# from India and #p3# the world. Get today's news headlines from #p5# Business, #p5#
Explanation:
(#p\\d+)(?=\\s+|$) --> `#p` followed by one or more digits (all captured), followed by whitespace or the end of the string (checked by the lookahead, but neither consumed nor captured).
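To see how the two patterns from the answers differ, here is a small sketch. The class name ParamMarker and the sample input are made up for illustration. The plain pattern marks every #p<digits> token, while the lookahead version only marks a token that is followed by whitespace or the end of the string.

```java
final class ParamMarker {
    // Appends '#' to every "#p<digits>" token, wherever it occurs.
    static String markAll(String input) {
        return input.replaceAll("(#p\\d+)", "$1#");
    }

    // Appends '#' only when the token is followed by whitespace or the end of the input.
    // The lookahead (?=\s|$) inspects the next character without consuming it.
    static String markStandalone(String input) {
        return input.replaceAll("(#p\\d+)(?=\\s|$)", "$1#");
    }

    public static void main(String[] args) {
        String sample = "See #p1, then #p2x and #p3";
        System.out.println(markAll(sample));        // See #p1#, then #p2#x and #p3#
        System.out.println(markStandalone(sample)); // See #p1, then #p2x and #p3#
    }
}
```

Which one is right depends on whether a parameter can be followed directly by punctuation in the real input.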

Related

Java string indexing make me confused

I need to gather data from my database: the holiday dates in my country. The data looks like this:
Example 1 : THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for
Example 2 : MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival
I need to extract the days, the dates, and the holiday name. To get the data from example 1, I'm using code like this:
public static void main(String[] args) {
String ex1 = "THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for";
String ex2 = "MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival ";
String[] trim1 = ex1.trim().split("\\s+"); // split on any run of whitespace
String[] trim2 = ex1.trim().split(" "); // split on 3 spaces, so runs of multiple spaces act as the delimiter
System.out.println("DAY " + trim1[0]); // display day
System.out.println("DATE " + trim1[1] + trim1[2] + "2020"); // display date
System.out.println("HOLIDAY NAME " + trim2[3]); // display holiday name
}
The output looks like this:
DAY THU
DATE 21May2020
HOLIDAY NAME Ascension Day of Jesus Christ
This is just what I need, but for example 2 I can't use the same code because the spacing is different. How can I get the data I need from both example 1 and example 2 with the same code?
I am new to Java, so I'm sorry if my question looks dumb. I hope you can help me. Thanks.
.split("\\s+") splits on any run of whitespace, i.e. one or more whitespace characters.
This means you can split on any amount of whitespace (which is what you want). However, it will also split your text comments. You can limit the length of the resulting array using .split(regex, n): for a positive n, the pattern is applied at most n-1 times, so the array has at most n elements and the last element holds the rest of the input. See the String.split(String, int) documentation for more details.
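As a rough sketch of the limit argument, assuming the first three whitespace-separated fields are always the day, the date, and the month (the class and method names here are made up):

```java
final class HolidayLineSplitter {
    // Splits into at most four pieces: day, date, month, and the untouched remainder.
    // With a limit of 4 the pattern is applied at most 3 times.
    static String[] splitFields(String line) {
        return line.trim().split("\\s+", 4);
    }

    public static void main(String[] args) {
        String[] fields = splitFields("MON-THU   28-31 Dec   Substitute for Commemoration of Idul Fitri Festival ");
        System.out.println("DAY " + fields[0]);
        System.out.println("DATE " + fields[1] + fields[2] + "2020");
        System.out.println("REST " + fields[3]);
    }
}
```

The fourth element still contains the holiday name and any trailing note together, so it does not solve the problem of separating those two.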
As for splitting out your two textual comments, I cannot see a way to do this.
The string "Substitute for Commemoration of Idul Fitri Festival " gives no way of telling where the first text comment ends and the second begins.
It seems quite strange to me that you receive information from your database like this. I would recommend checking whether there are other options; there is almost certainly a way to get separate fields.
If you are able to change the information in the database, you could add single quotes (') or some other separator, which would then let you split out the two pieces of text.
This is basically what @DanielBarbarian suggested: since the information always seems to start at the same indexes, you can just use those to get what you need.
String ex1 = "THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for";
String ex2 = "MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival ";
String day = ex2.substring(0, 8).trim();
String date = ex2.substring(8, 14).trim() + ex2.substring(14, 22).trim() + "2020";
String name = ex2.substring(22);
System.out.println("DAY " + day); // display day
System.out.println("DATE " + date); // display date
System.out.println("HOLIDAY NAME " + name); // display holiday name

How to compare below tokenized text using OpenNLP with database column values?

In my Java web application (JSP + Servlet + Hibernate) users can request books. The request goes to the database as text. After that I tokenize the text using Apache OpenNLP. Then I need to compare this tokenized text with the books table (which has Book ID, Book Name, Author, and Description columns) and give the most related suggestions to the user. Mostly I need to compare it with the book name and book description columns. Is this possible?
import opennlp.tools.tokenize.SimpleTokenizer;
public class SimpleTokenizerExample {
public static void main(String args[]){
String sentence = "Hello Guys , I like to read horror stories. If you have any horror story books please share with us. Also my favorite author is Stephen King";
//Instantiating SimpleTokenizer class
SimpleTokenizer simpleTokenizer = SimpleTokenizer.INSTANCE;
//Tokenizing the given sentence
String tokens[] = simpleTokenizer.tokenize(sentence);
//Printing the tokens
for(String token : tokens) {
System.out.println(token);
}
}
}
Apache OpenNLP can do Natural Language Processing, but the task you describe is Information Retrieval. Take a look at http://lucene.apache.org/solr/.
If you really need to use only the DB, you can try running a query for each token using the SQL LIKE keyword:
SELECT DISTINCT * FROM mytable WHERE description LIKE '%token%';
and rank the rows that match more tokens higher.
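The ranking step could be sketched in plain Java as follows. This is only an illustration: the class name, the minimum token length of 3 (used to skip punctuation and very short words), and the case-insensitive substring match are all assumptions, not something OpenNLP or the database provides.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class TokenMatchScore {
    // Counts how many distinct tokens occur in the text (case-insensitive substring match).
    // Tokens shorter than 3 characters are skipped; that threshold is an arbitrary assumption.
    static int score(String text, String[] tokens) {
        String haystack = text.toLowerCase(Locale.ROOT);
        Set<String> seen = new HashSet<>();
        int hits = 0;
        for (String token : tokens) {
            String needle = token.toLowerCase(Locale.ROOT);
            if (needle.length() < 3 || !seen.add(needle)) {
                continue; // too short, or already counted
            }
            if (haystack.contains(needle)) {
                hits++;
            }
        }
        return hits;
    }
}
```

Rows with a higher score would be shown first. Note that substring matching is crude: "king" also matches "looking", which is one reason a search engine such as Solr is the better fit.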
How can OpenNLP help you?
You can use the OpenNLP Stemmer. In that case you stem the book description and title before storing them in the database columns. You also need to stem the query. This will help you with inflections: "car" will match both "cars" and "car".
You can accomplish the same with the OpenNLP Lemmatizer, but you need a trained model, which is not available today for that module.
Just to add to what @wcolen says, some out-of-the-box stemmers exist for various languages in Lucene as well.
Another thing OpenNLP could help with is recognizing book authors names (e.g. Stephen King) via the NameFinderTool so that you could adjust the query so that your code creates a phrase query for such entities instead of a plain keyword based query (with the result that you won't get results containing Stephen or King but only results containing Stephen King).

Android - Get id from Geocoder location

UPDATED: If I try to use ZIP CODE I'll get neighborhood code. And I want something to identify NEW YORK not the neighborhood.
I'm using geolocation (geocoder) in my app, and I compare the city name that geocode gives me with a name I already have in my database.
For example, if the user's phone language is English, the result would be something like this:
DATABASE: New York
APP RESULT: New York (MATCH)
But if the user's phone language is Spanish, the result won't match:
DATABASE: New York
APP RESULT: Nueva York (NO MATCH)
Is there any way to obtain a UNIQUE ID for every CITY instead of names, something like "CITYNAME = 0001q5"?

Does google libphonenumber API provide CITY name based on the area code?

I am working with Google's libphonenumber to get information about a phone number. I was able to pull the state and country name, but I am unable to get the city name. Does the Google API provide the city name as well?
Please advise. Thanks in advance.
Does Google's libphonenumber API provide city names based on the area code?
Yes - but only if its internal geocoding database contains that information. Some US phone numbers are entirely non-geographic (like 1-800 numbers) and for many others the database just doesn't have that information.
Also, libphonenumber's geo lookup only returns single string values: it does not return structured data, like a class with city, state, county, and zip code fields. And the format of the returned string value varies considerably - limiting its uses in any automated system as it necessarily requires a human user to interpret the data.
For example, using the current libphonenumber (as of December 2021, that's version 8.12.38), I get the following results for these fictitious, but valid, numbers:
1-800-333-4567
This is a non-geographic "1-800" (toll-free) number.
libphonenumber's lookup returns the string "United States".
1-201-200-0000
This is a US phone number in area code 201 for New Jersey.
libphonenumber's lookup returns the string "Jersey City, NJ".
It resolved it to a city and state, giving the state's abbreviation.
1-425-666-9999
This is (almost) my own personal phone number which I got from a T-Mobile store in Redmond, WA.
But libphonenumber's lookup returns only the imprecise string "Washington State" with no city name or similar. It's curious that it returned the full US state name instead of the "WA" abbreviation.
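Because the returned description is a free-form string, any code that consumes it has to guess at its shape. A defensive sketch, assuming only the two shapes seen in the examples above ("City, ST" and a bare region name), might look like this. The helper is hypothetical and is not part of libphonenumber:

```java
final class GeoDescription {
    // Returns {city, region} for "City, ST"-style strings.
    // Returns {null, description} when there is no comma, i.e. no city could be identified.
    static String[] splitCityRegion(String description) {
        int comma = description.lastIndexOf(',');
        if (comma < 0) {
            return new String[] { null, description.trim() };
        }
        String city = description.substring(0, comma).trim();
        String region = description.substring(comma + 1).trim();
        return new String[] { city, region };
    }
}
```

This is a heuristic only. Other countries and locales may produce strings that fit neither shape.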
Remember, as with any in-proc "offline" geolocation database, data will inevitably go stale and inaccurate. Also, in the US at least, it's impossible to differentiate between mobile-phone numbers and landline numbers, and the area-code of someone's mobile-phone number is completely detached from the phone subscriber's actual physical location (it's usually the area-code of the AT&T / T-Mobile / Sprint store where they initially signed-up for service, even if they subsequently move across the country).
Anyway, here's how you use the PhoneNumberOfflineGeocoder class:
Java:
static String lookupNumber( String phoneNumber ) throws NumberParseException {
PhoneNumber parsed = PhoneNumberUtil.getInstance().parse( phoneNumber, /*defaultRegion:*/ "US" );
return PhoneNumberOfflineGeocoder.getInstance().getDescriptionForNumber( parsed, Locale.ENGLISH );
}
lookupNumber( "1-800-333-4567" ) // "United States"
lookupNumber( "+12012000000" ) // "Jersey City, NJ"
lookupNumber( "+14256669999" ) // "Washington State"
C#:
static String LookupNumber( string phoneNumber ) {
PhoneNumber parsed = PhoneNumberUtil.GetInstance().Parse( phoneNumber, defaultRegion: "US" );
return PhoneNumberOfflineGeocoder.GetInstance().GetDescriptionForNumber( parsed, Locale.English );
}
LookupNumber( "1-800-333-4567" ) // "United States"
LookupNumber( "+12012000000" ) // "Jersey City, NJ"
LookupNumber( "+14256669999" ) // "Washington State"

Google Directions API and Using maps.google.com give different Response

Here - > https://maps.google.com/
I queried for parameters
A : antim chauraha, Indore, Madhya Pradesh, India
B : India Collection, Indore, Madhya Pradesh, India
Then I used the web service to get the response for the same query as XML from the Directions API:
http://maps.googleapis.com/maps/api/directions/xml?origin=ANTIM%20CHAURAHA,Indore,Madhya%20Pradesh,India&destination=INDIA%20COLLECTION,PALSIKAR,Indore,Madhya%20Pradesh,India&sensor=false
The two responses do not match at all: not the distance, not the directions, not anything.
Am I doing something wrong?
Google Maps uses additional information to locate your "query string". The directions API uses postal addresses (for the most part).
Neither of these looks like a particularly good "address":
A : antim chauraha, Indore, Madhya Pradesh, India
B : India Collection, Indore, Madhya Pradesh, India
If you run them through the Google Maps API v3 geocoder, they both resolve to the same place (Indore, Madhya Pradesh, India).
If you need to resolve "places" like that, use the Places API to get coordinates, then use the directions service to get directions between the coordinates.
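Once the two places have been resolved to coordinates, the directions request can use latitude/longitude pairs instead of free-text names. A minimal sketch of building such a URL, where the class name, the coordinates in the usage example, and the API key are all placeholders:

```java
import java.util.Locale;

final class DirectionsUrl {
    // Builds a Directions API request between two coordinate pairs.
    // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's default locale.
    // The key is assumed to be URL-safe; it is not encoded here.
    static String build(double originLat, double originLng,
                        double destLat, double destLng, String apiKey) {
        return String.format(Locale.ROOT,
            "https://maps.googleapis.com/maps/api/directions/xml"
                + "?origin=%.6f,%.6f&destination=%.6f,%.6f&key=%s",
            originLat, originLng, destLat, destLng, apiKey);
    }

    public static void main(String[] args) {
        // Made-up coordinates near Indore; real ones would come from the Places API.
        System.out.println(build(22.7196, 75.8577, 22.75, 75.89, "YOUR_API_KEY"));
    }
}
```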
Basically, there is a difference between a name and an address. The Google database can have other symbols for the same location. You can read my answer here: http://www.stackoverflow.com/questions/12788664/google-maps-api-geocode-returns-different-co-ordinates-then-google-maps/12790012#12790012. In the Google response, the name is what you are looking for, and the placemark object holds the list of addresses and locations in the Google database. You need to loop through the placemark object to find the most similar location.
